Netflix Software Engineer Onsite Coding Questions

11+ questions from real Netflix Software Engineer Onsite Coding rounds, reported by candidates who interviewed there.

11 Questions · 7 Topic Areas · 10+ Sources

What does the Netflix Onsite Coding round test?

The Netflix onsite coding round is the core technical evaluation. Software Engineer candidates typically see 2-3 algorithm and data structure problems. Problems range from medium to hard difficulty, and interviewers evaluate both correctness and code quality.

Netflix Software Engineer Onsite Coding Questions

Let's say you have an array of similar JSON objects. Example object:

```json
{
  "field1": "bar",
  "field2": 1,
  "field3": true,
  "field4": [1, 2, 3],
  "field5": { "nested": { "other": [4, 5] } },
  ...
}
```

Design...

LeetCode #1136: Parallel Courses. Difficulty: Medium. Topics: Graph Theory, Topological Sort. Asked at Netflix in the last 6 months.
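Parallel Courses reduces to counting BFS levels in a topological sort: every course whose prerequisites are satisfied can be taken in the same semester. A minimal Python sketch of that approach (function name and signature are my own, mirroring the LeetCode interface):

```python
from collections import deque

def minimum_semesters(n, relations):
    """Minimum semesters to finish all n courses; -1 if there is a cycle.

    Kahn's algorithm (BFS topological sort), where each BFS level is one
    semester. Courses are numbered 1..n, as in the LeetCode problem.
    """
    graph = [[] for _ in range(n + 1)]
    indegree = [0] * (n + 1)
    for prev, nxt in relations:
        graph[prev].append(nxt)
        indegree[nxt] += 1

    queue = deque(c for c in range(1, n + 1) if indegree[c] == 0)
    taken, semesters = 0, 0
    while queue:
        semesters += 1
        for _ in range(len(queue)):        # drain one full level = one semester
            course = queue.popleft()
            taken += 1
            for nxt in graph[course]:
                indegree[nxt] -= 1
                if indegree[nxt] == 0:
                    queue.append(nxt)
    return semesters if taken == n else -1  # taken < n means a cycle remained
```

If the queue empties before every course is taken, the leftover courses sit on a cycle and the answer is -1.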

LeetCode #2622: Cache With Time Limit. Difficulty: Medium. Asked at Netflix in the last 6 months.
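LeetCode #2622 is posed in JavaScript, but the idea is language-agnostic: store an expiry timestamp next to each value and treat expired entries as absent. A Python sketch of the same interface (the class name mirrors the problem; the internals are one reasonable choice):

```python
import time

class TimeLimitedCache:
    """Cache whose entries expire `duration_ms` milliseconds after set().

    set() returns True if an unexpired value already existed for the key,
    get() returns -1 for missing or expired keys, and count() reports how
    many entries are still live -- the LeetCode 2622 contract.
    """
    def __init__(self):
        self._store = {}  # key -> (value, expiry time in monotonic seconds)

    def _alive(self, key):
        entry = self._store.get(key)
        return entry is not None and entry[1] > time.monotonic()

    def set(self, key, value, duration_ms):
        existed = self._alive(key)
        self._store[key] = (value, time.monotonic() + duration_ms / 1000.0)
        return existed

    def get(self, key):
        return self._store[key][0] if self._alive(key) else -1

    def count(self):
        return sum(1 for key in self._store if self._alive(key))
```

Lazy expiry (checking timestamps on read) keeps writes O(1); a production variant would also sweep dead entries so the dict cannot grow without bound.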

LeetCode #210: Course Schedule II. Difficulty: Medium. Topics: Depth-First Search, Breadth-First Search, Graph Theory, Topological Sort. Asked at Netflix in the last 6 months.
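Course Schedule II asks for an actual ordering, not just feasibility. Besides Kahn's algorithm, a common alternative is DFS with three-color marking: the reversed postorder of a DFS is a valid topological order, and a gray-to-gray edge signals a cycle. A sketch (names are mine, mirroring the LeetCode interface):

```python
def find_order(num_courses, prerequisites):
    """Return any valid course order, or [] if the prerequisites cycle.

    Three-color DFS: WHITE = unvisited, GRAY = on the current stack,
    BLACK = finished. Reversed DFS postorder is a topological order.
    """
    graph = [[] for _ in range(num_courses)]
    for course, prereq in prerequisites:
        graph[prereq].append(course)   # edge prereq -> course

    WHITE, GRAY, BLACK = 0, 1, 2
    color = [WHITE] * num_courses
    order = []

    def dfs(node):
        color[node] = GRAY
        for nxt in graph[node]:
            if color[nxt] == GRAY:                      # back edge: cycle
                return False
            if color[nxt] == WHITE and not dfs(nxt):
                return False
        color[node] = BLACK
        order.append(node)                              # postorder position
        return True

    for node in range(num_courses):
        if color[node] == WHITE and not dfs(node):
            return []
    return order[::-1]
```

Recursive DFS is the clearest to present; for very deep graphs an interviewer may ask you to convert it to an explicit stack to avoid recursion limits.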

LeetCode #220: Contains Duplicate III. Difficulty: Hard. Topics: Array, Sliding Window, Sorting, Bucket Sort, Ordered Set. Asked at Netflix in the last 6 months.
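The bucket-sort tactic listed for Contains Duplicate III is the part candidates most often fumble: with buckets of width `valueDiff + 1`, any two numbers landing in the same bucket are automatically close enough, and only neighboring buckets need an explicit check. A sketch under the LeetCode #220 interface:

```python
def contains_nearby_almost_duplicate(nums, index_diff, value_diff):
    """True if some pair i != j has |i - j| <= index_diff and
    |nums[i] - nums[j]| <= value_diff.

    Buckets of width value_diff + 1 combined with a sliding window of the
    last index_diff elements give an O(n) solution.
    """
    if index_diff <= 0 or value_diff < 0:
        return False
    width = value_diff + 1
    buckets = {}                            # bucket id -> value in window
    for i, num in enumerate(nums):
        b = num // width                    # floor division handles negatives
        if b in buckets:                    # same bucket: always within range
            return True
        if b - 1 in buckets and num - buckets[b - 1] <= value_diff:
            return True
        if b + 1 in buckets and buckets[b + 1] - num <= value_diff:
            return True
        buckets[b] = num
        if i >= index_diff:                 # evict the element leaving the window
            del buckets[nums[i - index_diff] // width]
    return False
```

Each bucket holds at most one element at a time, because two elements in one bucket would have ended the scan early, which is what keeps eviction a single `del`.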

LeetCode #981: Time Based Key-Value Store. Difficulty: Medium. Topics: Hash Table, String, Binary Search, Design. Asked at Netflix in the last 6 months.
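For Time Based Key-Value Store, the key observation is that LeetCode #981 guarantees strictly increasing timestamps per key, so each key's history is already sorted and `get` becomes a binary search. A sketch using parallel lists:

```python
import bisect
from collections import defaultdict

class TimeMap:
    """set(key, value, timestamp) appends to the key's history;
    get(key, timestamp) returns the value with the largest stored
    timestamp <= the query, or "" if none exists."""

    def __init__(self):
        self._times = defaultdict(list)    # key -> sorted timestamps
        self._values = defaultdict(list)   # key -> values, index-aligned

    def set(self, key, value, timestamp):
        # Timestamps arrive strictly increasing per key, so appending
        # keeps self._times[key] sorted with no extra work.
        self._times[key].append(timestamp)
        self._values[key].append(value)

    def get(self, key, timestamp):
        times = self._times.get(key)
        if not times:
            return ""
        i = bisect.bisect_right(times, timestamp)  # count of entries <= timestamp
        return self._values[key][i - 1] if i else ""
```

Both operations are O(log n) or better; the follow-up interviewers tend to probe is what changes if timestamps are *not* guaranteed sorted (insertion cost, or sorting lazily).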

LeetCode #1146: Snapshot Array. Difficulty: Medium. Topics: Array, Hash Table, Binary Search, Design. Asked at Netflix in the last 6 months.
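Snapshot Array uses the same binary-search-over-history trick: instead of copying the array on every `snap()`, each index keeps a list of `(snap_id, value)` writes, and `get` searches for the last write at or before the requested snapshot. A sketch under the LeetCode #1146 interface:

```python
import bisect

class SnapshotArray:
    """set(index, val) in O(1) amortized, snap() in O(1), and
    get(index, snap_id) in O(log writes-to-that-index)."""

    def __init__(self, length):
        # Sentinel (-1, 0): every index reads as 0 before any write.
        self._history = [[(-1, 0)] for _ in range(length)]
        self._snap_id = 0

    def set(self, index, val):
        self._history[index].append((self._snap_id, val))

    def snap(self):
        self._snap_id += 1
        return self._snap_id - 1           # id of the snapshot just taken

    def get(self, index, snap_id):
        hist = self._history[index]
        # Rightmost write whose snap id is <= snap_id; the inf second
        # element makes ties resolve after all real entries.
        i = bisect.bisect_right(hist, (snap_id, float("inf"))) - 1
        return hist[i][1]
```

The design point worth saying out loud in the interview: total memory is proportional to the number of writes, not `length * snapshots`, which is the whole reason this beats naive copying.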

**Problem Statement: Task Scheduler with Buffered Execution.** You are tasked with designing a system, named TaskProcessor, that efficiently handles a stream of tasks. The system should ensure that when a certain...

## Round 1 - Data Modeling

## Problem

You will be asked to design data schemas for three scenarios. For each, discuss table/collection design, relationships, indexes, and tradeoffs.

**Scenario 1 — E-Commerce:** Model `Users`, `Products`, `Orders`, and `OrderItems`. Support queries: "all orders for a user" and "total revenue per product in the last 30 days."

**Scenario 2 — Event Analytics:** You ingest 100M user click events per day `(user_id, event_type, page, timestamp, metadata JSON)`. How do you model this for fast time-range queries and funnel analysis?

**Scenario 3 — Time-Series:** Store hourly sensor readings `(sensor_id, metric, value, recorded_at)` for 10,000 sensors over 5 years. Support: last 24h readings per sensor, and min/max/avg over any time window.

## Follow-ups

1. For Scenario 1, what indexes do you add and why? What is the query plan for "total revenue per product in the last 30 days"?
2. For Scenario 2, why is a row-per-event schema problematic at 100M/day? How do columnar formats (Parquet, BigQuery) help?
3. For Scenario 3, compare storing raw rows vs pre-aggregated hourly/daily summaries. What is the staleness tradeoff?
4. A product team wants to join Scenario 2 (clicks) with Scenario 1 (orders) to compute conversion rate. Describe the pipeline.
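To make Scenario 1 concrete, here is one possible schema sketched in SQLite, with the "total revenue per product in the last 30 days" query. Table names, column names, and index choices are my own assumptions, not a prescribed answer; the indexes are the ones that support the two required queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users       (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE products    (id INTEGER PRIMARY KEY, name TEXT, price_cents INTEGER);
    CREATE TABLE orders      (id INTEGER PRIMARY KEY,
                              user_id INTEGER REFERENCES users(id),
                              created_at TEXT);
    CREATE TABLE order_items (order_id INTEGER REFERENCES orders(id),
                              product_id INTEGER REFERENCES products(id),
                              quantity INTEGER, unit_price_cents INTEGER);
    -- Indexes backing "orders for a user" and the 30-day revenue query.
    CREATE INDEX idx_orders_user    ON orders(user_id);
    CREATE INDEX idx_orders_created ON orders(created_at);
    CREATE INDEX idx_items_product  ON order_items(product_id);
""")

# "Total revenue per product in the last 30 days."
revenue_per_product = """
    SELECT p.name, SUM(oi.quantity * oi.unit_price_cents) AS revenue_cents
    FROM order_items oi
    JOIN orders o   ON o.id = oi.order_id
    JOIN products p ON p.id = oi.product_id
    WHERE o.created_at >= datetime('now', '-30 days')
    GROUP BY p.id
"""
conn.execute("INSERT INTO products VALUES (1, 'widget', 500)")
conn.execute("INSERT INTO orders VALUES (1, 1, datetime('now'))")
conn.execute("INSERT INTO order_items VALUES (1, 1, 2, 500)")
print(conn.execute(revenue_per_product).fetchall())  # [('widget', 1000)]
```

Note that `unit_price_cents` is denormalized onto `order_items` on purpose: historical revenue must not change when a product's current price does, which is a tradeoff worth naming in the interview.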

## Round 1 - System Design

## Problem

Design a contact tracing system used during a disease outbreak. The system must:

1. Record a contact event: `log_contact(person_a, person_b, timestamp, duration_minutes)` — bidirectional.
2. Report exposure risk: `get_exposed(person_id, days_back, exposure_depth)` — return all people who came in contact (directly or through a chain of depth `exposure_depth`) with `person_id` in the last `days_back` days.
3. Mark a person as confirmed positive: `report_positive(person_id, test_date)`.
4. Automatically generate notifications for anyone within `exposure_depth=2` of a confirmed positive.

Walk through your data model, graph traversal strategy, and scalability concerns.

```
log_contact(A, B, t1, 30)
log_contact(B, C, t2, 15)
report_positive(A)
get_exposed(A, days_back=14, exposure_depth=2) -> {B, C}
```

## Follow-ups

1. The exposure graph could have millions of nodes. How do you make BFS/DFS efficient? What indexes does your DB need?
2. Privacy is critical — people's contact histories are sensitive. How do you design the system to minimize data retention and exposure?
3. A person is contacted as a potential exposure but later tests negative. How does your notification system handle false positives?
4. How would Bluetooth-based proximity detection (e.g., the Apple/Google API) integrate with your backend contact log?
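An in-memory Python sketch of the API above, using a depth-limited BFS over a time-filtered adjacency list. Two assumptions of mine: timestamps are seconds since the epoch, and `get_exposed` takes an explicit `now` argument (the prompt leaves the clock implicit). A real answer would also discuss ignoring short contacts via `duration_minutes` and respecting time ordering along chains.

```python
from collections import defaultdict, deque

class ContactTracer:
    """In-memory contact-tracing sketch; a production system would back
    this with a store indexed by (person, timestamp)."""

    SECONDS_PER_DAY = 86_400

    def __init__(self):
        self._contacts = defaultdict(list)  # person -> [(other, timestamp)]
        self._positive = {}                 # person -> test_date

    def log_contact(self, a, b, timestamp, duration_minutes):
        # Bidirectional: one event appears in both adjacency lists.
        self._contacts[a].append((b, timestamp))
        self._contacts[b].append((a, timestamp))

    def report_positive(self, person, test_date=None):
        self._positive[person] = test_date

    def get_exposed(self, person, days_back, exposure_depth, now):
        cutoff = now - days_back * self.SECONDS_PER_DAY
        exposed = set()
        seen = {person}
        frontier = deque([(person, 0)])     # BFS limited to exposure_depth hops
        while frontier:
            current, depth = frontier.popleft()
            if depth == exposure_depth:
                continue
            for other, ts in self._contacts[current]:
                if ts >= cutoff and other not in seen:
                    seen.add(other)
                    exposed.add(other)
                    frontier.append((other, depth + 1))
        return exposed
```

Replaying the prompt's example with concrete timestamps: `log_contact("A", "B", t1, 30)`, `log_contact("B", "C", t2, 15)`, then `get_exposed("A", 14, 2, now)` returns `{"B", "C"}` whenever `t1` and `t2` fall inside the 14-day window.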

## Round 1 - System Design

## Problem

Design a backend system that supports two core features for a video streaming platform:

1. **Video Downloads** — authenticated users can download videos for offline playback. Downloads must expire after 30 days and be DRM-protected.
2. **Creator Subscriptions** — users can subscribe to a creator; subscribers get early access to new uploads and ad-free playback.

Cover: API design, data models, storage strategy, and at least one non-trivial scaling concern for each feature.

**Clarifying questions to consider:**

- What is the expected ratio of downloads to streams?
- Should subscription state be strongly consistent or eventually consistent?
- How many creators and subscribers are in scope (order of magnitude)?

## Key Components to Discuss

- Download token issuance, signed URL generation, and expiry enforcement.
- Subscription event fan-out: how to notify millions of subscribers when a creator uploads.
- Storage tiering: hot vs. cold for downloaded content.
- Preventing download link sharing (device binding, token fingerprinting).

## Follow-ups

1. How do you handle a creator with 10 million subscribers uploading a video — what does fan-out look like?
2. Where does the DRM key service live, and how do you protect it from abuse?
3. How do you ensure a user's downloaded library is consistent across two devices?
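The signed-URL and device-binding components above can be sketched with a standard HMAC scheme: the server signs `(video, user, device, expiry)` with a secret key, and rejects any URL whose parameters were altered or whose expiry has passed. Everything here (parameter names, URL shape, the key itself) is illustrative, not a specific platform's API; the one real rule is the constant-time comparison.

```python
import hashlib
import hmac
import time

SECRET = b"server-side-signing-key"   # hypothetical; a real key lives in a KMS

def sign_download_url(video_id, user_id, device_id, ttl_seconds=30 * 86_400):
    """Issue a download URL that expires (the 30-day window) and is bound
    to one device, so a shared link fails verification elsewhere."""
    expires = int(time.time()) + ttl_seconds
    payload = f"{video_id}:{user_id}:{device_id}:{expires}"
    sig = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return (f"/download/{video_id}?user={user_id}"
            f"&device={device_id}&expires={expires}&sig={sig}")

def verify(video_id, user_id, device_id, expires, sig):
    """True only if no parameter was tampered with and the link is unexpired."""
    payload = f"{video_id}:{user_id}:{device_id}:{expires}"
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(sig, expected) and int(expires) > time.time()
```

The expiry lives inside the signed payload, so clients cannot extend it; revoking a single user's downloads early, though, requires an extra server-side check (e.g. a token blocklist), which is a good follow-up point.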
