DoorDash

DoorDash Data Scientist Phone Screen Questions

4+ questions from real DoorDash Data Scientist Phone Screen rounds, reported by candidates who interviewed there.

4
Questions
3
Topic Areas
10+
Sources

What does the DoorDash Phone Screen round test?

The DoorDash phone screen typically lasts 45-60 minutes and evaluates core Data Scientist fundamentals. Candidates should expect 1-2 algorithmic problems, basic system design discussion at senior levels, and questions about relevant experience. The goal is to confirm technical competence before bringing candidates onsite.

Top Topics in This Round

DoorDash Data Scientist Phone Screen Questions

## Round 1 - Data Science Case Study ## Problem DoorDash is piloting a bike courier program ("Biker Dashers") in a dense urban market. You have one week of delivery data. Evaluate whether the program is performing well and recommend whether to expand, hold, or cut the program. **Data Available:** - `deliveries(delivery_id, dasher_id, dasher_type, pickup_time, delivery_time, distance_km, order_value, tip)` - `dashers(dasher_id, dasher_type, city, start_date)` **Your analysis should cover:** 1. **Efficiency metrics:** Average delivery time by dasher type (bike vs. car). Is bike faster for short distances? 2. **Coverage gap:** What fraction of deliveries are beyond typical bike range (e.g. > 3 km)? 3. **Economics:** Compare earnings-per-hour for bike vs. car dashers including tips. Are bikers fairly compensated? 4. **Customer impact:** Is there a significant difference in delivery time variance? Do customers of bikers rate orders differently? ``` Sample insight: Bike Dashers: avg 18 min, car Dashers: avg 24 min for distances < 2km Bike coverage: 68% of all deliveries in pilot zone are < 2km -> Recommendation: expand in high-density zones ``` ## Follow-ups 1. What confounding variables might make bike dashers look artificially faster? 2. How would you run an A/B test to cleanly isolate the impact of bike vs. car assignment? 3. How would weather data change your analysis? 4. What guardrail metrics would you monitor to catch problems early in an expansion?

## Round 1 - Data Science Case Study ## Problem DoorDash receives a spike in customer complaints about food arriving cold. You are asked to diagnose the root cause and propose data-driven solutions. **Data Available:** - `deliveries(delivery_id, dasher_id, restaurant_id, pickup_time, delivery_time, food_prep_ready_at, distance_km, weather)` - `ratings(delivery_id, overall_score, food_temp_score, comments)` - `restaurants(restaurant_id, category, avg_prep_time_min)` **Your analysis should cover:** 1. **Wait time at restaurant:** How long does the dasher wait after food is ready? Longer waits = colder food at pickup. 2. **Transit time:** Is delivery distance or traffic the dominant factor? 3. **Segmentation:** Are certain restaurant categories (e.g. sushi, pizza) more affected? 4. **Correlation:** Does `food_temp_score` correlate with `(delivery_time - food_prep_ready_at)`? ``` Key metric: "food exposure time" = delivery_time - food_prep_ready_at Hypothesis: food_temp_score drops ~0.3 points per 5 extra minutes of exposure ``` ## Follow-ups 1. How would you distinguish "cold at pickup" vs. "cooled during transit"? 2. What operational levers does DoorDash have to reduce cold food? Which would you prioritize? 3. How would you design an experiment to test whether insulated bags reduce cold complaints? 4. What metric would you track as a leading indicator before customers complain?

## Round 1 - SQL ## Problem You have the following DoorDash schema: ```sql CREATE TABLE customers (customer_id INT, name VARCHAR, city VARCHAR, joined_date DATE); CREATE TABLE orders (order_id INT, customer_id INT, restaurant_id INT, order_total DECIMAL, placed_at TIMESTAMP); ``` **Task 1:** Find all customers who placed orders on at least 5 distinct days in the past 30 days. **Task 2:** For each city, find the customer with the highest number of orders this month. Break ties by customer_id ascending. **Task 3:** Find customers who ordered from the same restaurant at least 3 times. Return customer_id, restaurant_id, and order_count. ```sql -- Task 1 skeleton: SELECT customer_id FROM orders WHERE placed_at >= CURRENT_DATE - INTERVAL '30 days' GROUP BY customer_id HAVING COUNT(DISTINCT DATE(placed_at)) >= 5; ``` ```sql -- Task 2 skeleton: WITH ranked AS ( SELECT c.city, o.customer_id, COUNT(*) AS order_count, RANK() OVER (PARTITION BY c.city ORDER BY COUNT(*) DESC, o.customer_id ASC) AS rnk FROM orders o JOIN customers c USING (customer_id) WHERE DATE_TRUNC('month', o.placed_at) = DATE_TRUNC('month', CURRENT_DATE) GROUP BY c.city, o.customer_id ) SELECT city, customer_id, order_count FROM ranked WHERE rnk = 1; ``` ## Follow-ups 1. In Task 1, why use `COUNT(DISTINCT DATE(placed_at))` instead of `COUNT(*)`? 2. In Task 2, what is the difference between `RANK()` and `ROW_NUMBER()` for tie-breaking? 3. For Task 3, how would you find customers who ordered the same restaurant on consecutive days? 4. How would you optimize these queries on a 500M-row orders table?

## Round 1 - Data Science Case Study ## Problem DoorDash's order volume dropped 15% overnight in one metro area. You are the analyst on call. Walk through your full investigation. **Data Available:** - `orders(order_id, customer_id, restaurant_id, dasher_id, placed_at, status, order_total)` - `restaurants(restaurant_id, name, category, is_active, city)` - `app_events(event_id, user_id, event_type, platform, occurred_at)` **Step 1 - Scope the drop:** - Is the drop on all platforms or just iOS/Android/web? - Is it all restaurant categories or specific ones? - Is it all customer cohorts (new, returning) or one? **Step 2 - Check data integrity:** - Is the drop real or a logging failure? Check if `app_events` also dropped. **Step 3 - Hypothesize causes:** - Competitor promotion launched - App deployment introduced checkout bug - Major restaurant chain went offline - Extreme weather event **Step 4 - Validate and quantify:** ```sql SELECT restaurant_id, COUNT(*) AS orders FROM orders WHERE placed_at BETWEEN '2024-06-01' AND '2024-06-02' GROUP BY restaurant_id ORDER BY orders; ``` ## Follow-ups 1. How would you determine if the drop is seasonal vs. an anomaly? 2. What would you look at if order volume is fine but revenue dropped? 3. How do you communicate your findings to stakeholders before you have a root cause? 4. After identifying the cause, what process do you put in place to catch this faster next time?

See All 4 Questions from This Round

Full question text, answer context, and frequency data for subscribers.

Get Access