
OpenAI Software Engineer Phone Screen Questions

22+ questions from real OpenAI Software Engineer Phone Screen rounds, reported by candidates who interviewed there.

22 questions · 8 topic areas · 10+ sources

What does the OpenAI Phone Screen round test?

The OpenAI phone screen typically lasts 45-60 minutes and evaluates core software engineering fundamentals. Candidates should expect one or two algorithmic problems, a basic system design discussion for senior candidates, and questions about relevant experience. The goal is to confirm technical competence before bringing candidates onsite.


Just finished my interview, so here's what I remember. System design: multi-tenant CI/CD. The prompt was the same as one I saw in a previous interview: Design a multi-tenant CI/CD system which...

This post was last edited by Anonymous on 2025-09-25 08:49. General SDE: Has anyone encountered this prompt before? This will be a coding interview, and the task will be a smaller-scale version of wha...

I crammed for two or three days before the interview, but none of what I studied came up, haha. The coding question was an OOD (object-oriented design) exercise: design the logic for a...

Hey guys, I made a post recently about applying to OpenAI, in which I was asking about what I might expect from the phone screen. That's here --> https://leetcode.com/discuss/interview-question/5908027/any-idea-what-to-expect-from-new-grad-phone-interview/2713911 I got a...

Hey guys, I recently got an interview for an OpenAI internship next summer and don't want to fumble it. Has anyone done it before and can share their experience, or give advice on how I can p...

## Round 1 - Coding

## Problem

Simulate a turn-based battle between two armies. Each army has a list of units with attack and health values. Units attack in order; a defeated unit is removed. The first army to lose all units loses the battle.

```python
def simulate_battle(
    army_a: list[tuple[int, int]],  # [(attack, health), ...]
    army_b: list[tuple[int, int]],
) -> str:  # "A", "B", or "DRAW"
    # Each round:
    #   Front unit of A attacks front unit of B (reduces health by A's attack)
    #   Front unit of B attacks front unit of A (simultaneous)
    #   Remove units with health <= 0
    # Continue until one or both armies are empty.
    ...
```

```
Example:
army_a = [(10, 20), (5, 15)]
army_b = [(8, 25)]

Round 1: A[0](10 atk) vs B[0](8 atk)
  B[0].health = 25-10 = 15, A[0].health = 20-8 = 12
Round 2: A[0] vs B[0]
  B[0].health = 15-10 = 5, A[0].health = 12-8 = 4
Round 3: A[0] vs B[0]
  B[0].health = 5-10 = -5 (dead), A[0].health = 4-8 = -4 (dead)

Both front units die simultaneously; A still has A[1]
simulate_battle(...) -> "A"
```

## Follow-ups

1. How do you handle the case where both front units die simultaneously?
2. How would you add unit abilities (e.g. splash damage, healing)?
3. How would you determine the optimal ordering of your own army to maximize win probability?
4. What data structure best models the queues of units?
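A minimal sketch of one possible solution. It uses `collections.deque` because the front unit is removed in O(1), which is one answer to follow-up 4. Applying both hits before any removal is what handles the simultaneous-death case from follow-up 1. One assumption goes beyond the prompt: if both front units have zero attack, the battle can never end, so the sketch returns `"DRAW"`.

```python
from collections import deque


def simulate_battle(
    army_a: list[tuple[int, int]],
    army_b: list[tuple[int, int]],
) -> str:
    # Copy units into mutable [attack, health] lists so the caller's tuples are untouched.
    a = deque([atk, hp] for atk, hp in army_a)
    b = deque([atk, hp] for atk, hp in army_b)
    while a and b:
        front_a, front_b = a[0], b[0]
        if front_a[0] <= 0 and front_b[0] <= 0:
            return "DRAW"  # assumption: two harmless front units is a stalemate
        # Damage depends only on attack values, so applying the hits one after
        # the other still models a simultaneous exchange.
        front_b[1] -= front_a[0]
        front_a[1] -= front_b[0]
        # Remove every front unit that dropped to 0 or below (possibly both).
        if front_a[1] <= 0:
            a.popleft()
        if front_b[1] <= 0:
            b.popleft()
    if a:
        return "A"
    if b:
        return "B"
    return "DRAW"
```

For follow-up 3, this function could act as the evaluator while you try permutations of your own army. That only works for small armies, because there are n! orderings.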

## Round 1 - Coding

## Problem

You are given a string representing a musical beat sequence using a notation where `'Q'` = quarter note (1 beat), `'H'` = half note (2 beats), `'E'` = eighth note (0.5 beats), and `'R'` = rest (1 beat). Given a measure length in beats, determine if a given sequence exactly fills one measure with no remainder.

```python
def is_valid_measure(notation: str, beats_per_measure: int) -> bool:
    # notation: string of characters from {Q, H, E, R}
    # Return True if total beats == beats_per_measure exactly
    ...


def parse_beats(notation: str) -> float:
    # Return total beat count for the notation string
    ...
```

```
Example:
parse_beats("QQHQ") -> 1+1+2+1 = 5.0
parse_beats("EEEE") -> 0.5*4 = 2.0
parse_beats("HHRR") -> 2+2+1+1 = 6.0
is_valid_measure("QQHQ", 4) -> False  # 5 != 4
is_valid_measure("QQRR", 4) -> True   # 1+1+1+1 = 4
```

**Extension:** Given a target measure length, generate all valid notation strings of length exactly `n` characters.

## Follow-ups

1. How would you handle invalid characters in the input string?
2. Can you extend the notation to support dotted notes (e.g. `'Q.'` = 1.5 beats)?
3. How would you validate an entire song (list of measure strings) against a time signature?
4. What parsing approach handles arbitrarily nested groupings (e.g. triplets)?
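A sketch of the two functions plus the extension. `generate_measures` is a hypothetical name for the extension helper. Raising `ValueError` on unknown symbols is an assumed answer to follow-up 1, not something the prompt specifies.

```python
from itertools import product

# Beat values from the problem statement.
BEATS = {"Q": 1.0, "H": 2.0, "E": 0.5, "R": 1.0}


def parse_beats(notation: str) -> float:
    total = 0.0
    for ch in notation:
        if ch not in BEATS:
            # Design choice (assumption): fail loudly on unknown symbols rather than skip them.
            raise ValueError(f"invalid note symbol: {ch!r}")
        total += BEATS[ch]
    return total


def is_valid_measure(notation: str, beats_per_measure: int) -> bool:
    # Every value here is a multiple of 0.5, which floats represent exactly,
    # so comparing with == is safe for these four symbols.
    return parse_beats(notation) == beats_per_measure


def generate_measures(beats_per_measure: int, n: int) -> list[str]:
    # Hypothetical helper for the extension: brute-force all 4**n strings of length n.
    results = []
    for combo in product("QHER", repeat=n):
        candidate = "".join(combo)
        if is_valid_measure(candidate, beats_per_measure):
            results.append(candidate)
    return results
```

Brute force grows as 4^n. A backtracking version that stops extending a prefix once its beats exceed the measure would prune most of that search.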

## Problem

Implement a text editor with operations such as insert, delete, undo, and cursor movement.

## Tags

strings, stack, coding_other
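A minimal sketch with a stack of snapshots for undo, matching the `stack` tag. The method names and semantics are assumptions, since the prompt does not fix them:

- `delete(k)` acts like backspace.
- `move(offset)` shifts the cursor and clamps it to the buffer.
- Cursor moves are not recorded for undo.

```python
class TextEditor:
    """Hypothetical editor: a text buffer, a cursor index, and an undo stack."""

    def __init__(self) -> None:
        self.text = ""
        self.cursor = 0
        self._history: list[tuple[str, int]] = []

    def _save(self) -> None:
        # Snapshot (text, cursor) before every mutating operation so undo can restore it.
        self._history.append((self.text, self.cursor))

    def insert(self, s: str) -> None:
        self._save()
        self.text = self.text[: self.cursor] + s + self.text[self.cursor :]
        self.cursor += len(s)

    def delete(self, k: int) -> int:
        """Delete up to k characters left of the cursor; return how many were deleted."""
        k = min(k, self.cursor)
        if k == 0:
            return 0
        self._save()
        self.text = self.text[: self.cursor - k] + self.text[self.cursor :]
        self.cursor -= k
        return k

    def move(self, offset: int) -> int:
        """Move the cursor (negative = left), clamped to the buffer; return the new position."""
        self.cursor = max(0, min(len(self.text), self.cursor + offset))
        return self.cursor

    def undo(self) -> bool:
        if not self._history:
            return False
        self.text, self.cursor = self._history.pop()
        return True
```

Full snapshots cost O(n) memory per edit. Storing inverse operations (for example "delete 2 chars at position 3") instead is the usual refinement if memory becomes a concern.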

## Problem

Implement a utility that traverses and manipulates a file directory tree, supporting commands like ls, cd, or find.

## Tags

strings, binary_tree, recursion
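One way to sketch this is an in-memory tree. A directory can have any number of children, so each node keeps a dict from name to child plus a parent pointer for `..`. The class and method names are hypothetical. `mkdir` behaving like `mkdir -p` is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    is_dir: bool
    children: dict[str, Node] = field(default_factory=dict)
    # Back-pointer for "..": excluded from repr/eq to avoid infinite recursion.
    parent: Node | None = field(default=None, repr=False, compare=False)


class FileSystem:
    """Hypothetical in-memory directory tree with a current working directory."""

    def __init__(self) -> None:
        self.root = Node("", is_dir=True)
        self.cwd = self.root

    def _resolve(self, path: str) -> Node:
        # Absolute paths start at the root, relative ones at the current directory.
        node = self.root if path.startswith("/") else self.cwd
        for part in path.split("/"):
            if part in ("", "."):
                continue
            if part == "..":
                node = node.parent or node  # ".." at the root stays at the root
            elif part in node.children:
                node = node.children[part]
            else:
                raise FileNotFoundError(path)
        return node

    def _path_of(self, node: Node) -> str:
        parts = []
        while node.parent is not None:
            parts.append(node.name)
            node = node.parent
        return "/" + "/".join(reversed(parts))

    def mkdir(self, path: str) -> None:
        """Create a directory plus any missing parents (like `mkdir -p`)."""
        node = self.root if path.startswith("/") else self.cwd
        for part in path.split("/"):
            if part in ("", "."):
                continue
            if part == "..":
                node = node.parent or node
                continue
            if part not in node.children:
                node.children[part] = Node(part, is_dir=True, parent=node)
            node = node.children[part]
            if not node.is_dir:
                raise NotADirectoryError(path)

    def touch(self, path: str) -> None:
        """Create an empty file; its parent directory must already exist."""
        parent_path, _, name = path.rpartition("/")
        parent = self._resolve(parent_path or ("/" if path.startswith("/") else "."))
        if not parent.is_dir:
            raise NotADirectoryError(path)
        if name not in parent.children:
            parent.children[name] = Node(name, is_dir=False, parent=parent)

    def cd(self, path: str) -> None:
        node = self._resolve(path)
        if not node.is_dir:
            raise NotADirectoryError(path)
        self.cwd = node

    def ls(self, path: str = ".") -> list[str]:
        node = self._resolve(path)
        return sorted(node.children) if node.is_dir else [node.name]

    def find(self, name: str, path: str = ".") -> list[str]:
        """Recursive DFS under `path`, returning absolute paths of nodes called `name`."""
        matches: list[str] = []

        def dfs(node: Node) -> None:
            if node.name == name:
                matches.append(self._path_of(node))
            for child_name in sorted(node.children):
                dfs(node.children[child_name])

        dfs(self._resolve(path))
        return matches
```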

## Round 1 - Coding

## Problem

Design a GPU credit system used by an ML platform. Users have a credit balance and submit GPU jobs with an instance type and duration. Different instance types have different costs per hour. Implement the credit management system.

```python
GPU_COSTS = {
    "T4": 0.35,  # credits per hour
    "A100": 3.00,
    "H100": 8.00,
}


class GPUCreditSystem:
    def __init__(self, initial_credits: float):
        ...

    def run_job(
        self, instance_type: str, duration_hours: float
    ) -> dict:
        # Returns {"success": bool, "cost": float, "remaining": float}
        # Deducts cost if sufficient credits; otherwise rejects the job
        ...

    def add_credits(self, amount: float) -> float:
        # Returns new balance
        ...

    def get_balance(self) -> float:
        ...
```

```
Example:
sys = GPUCreditSystem(initial_credits=10.0)
sys.run_job("T4", 2.0)   -> {"success": True, "cost": 0.70, "remaining": 9.30}
sys.run_job("H100", 5.0) -> {"success": False, "cost": 40.0, "remaining": 9.30}
```

## Follow-ups

1. How would you track a usage history per user for auditing?
2. What concurrency issues arise if two jobs are submitted simultaneously and the balance covers either one alone but not both?
3. How would you implement a soft limit (warn at 20% remaining) vs. a hard limit?
4. How would you handle fractional hours (job runs 90 minutes)?
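A sketch of the class that touches follow-ups 1, 2, and 4. It makes three assumptions beyond the prompt:

- Balances are kept as `decimal.Decimal`, so repeated deductions do not accumulate float error.
- Fractional hours are billed pro rata.
- A single `threading.Lock` guards the check-then-deduct step.

```python
import threading
from decimal import Decimal

# Hourly prices from the problem, held as Decimal so balances don't drift.
GPU_COSTS = {"T4": Decimal("0.35"), "A100": Decimal("3.00"), "H100": Decimal("8.00")}


class GPUCreditSystem:
    def __init__(self, initial_credits: float):
        self._balance = Decimal(str(initial_credits))
        self._lock = threading.Lock()
        self.history: list[dict] = []  # audit log of every job request (follow-up 1)

    def run_job(self, instance_type: str, duration_hours: float) -> dict:
        if instance_type not in GPU_COSTS:
            raise ValueError(f"unknown instance type: {instance_type!r}")
        # Assumption: fractional hours are billed pro rata (90 minutes = 1.5 hours).
        cost = GPU_COSTS[instance_type] * Decimal(str(duration_hours))
        # Check-then-deduct happens under one lock, so two concurrent jobs can't
        # both see a sufficient balance and overdraw it together (follow-up 2).
        with self._lock:
            success = cost <= self._balance
            if success:
                self._balance -= cost
            remaining = self._balance
            self.history.append(
                {"instance_type": instance_type, "hours": duration_hours,
                 "cost": cost, "success": success}
            )
        return {"success": success, "cost": float(cost), "remaining": float(remaining)}

    def add_credits(self, amount: float) -> float:
        if amount <= 0:
            raise ValueError("amount must be positive")
        with self._lock:
            self._balance += Decimal(str(amount))
            return float(self._balance)

    def get_balance(self) -> float:
        with self._lock:
            return float(self._balance)
```

In a multi-process or distributed deployment, an in-process lock is not enough. The same check-then-deduct would need to be atomic in the backing store, for example a conditional update.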

## Problem

Implement a spreadsheet supporting cell references, formula evaluation, and dependency resolution between cells.

## Tags

hash_table, graph, dynamic_programming
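A minimal sketch that treats formulas as a dependency graph. It evaluates with memoized DFS and detects cycles by tracking the cells currently on the DFS stack. The formula grammar is an assumption made to keep the example small: `"=A1+B1+2"`, meaning `+` over cell references and integer literals. `set_cell` and `get_cell` are hypothetical names.

```python
from __future__ import annotations


class Spreadsheet:
    def __init__(self) -> None:
        self.cells: dict[str, int | str] = {}

    def set_cell(self, cell: str, value: int | str) -> None:
        self.cells[cell] = value

    def get_cell(self, cell: str) -> int:
        # Fresh memo per query: each cell is evaluated at most once per call.
        return self._eval(cell, {}, set())

    def _eval(self, cell: str, memo: dict[str, int], visiting: set[str]) -> int:
        if cell in memo:
            return memo[cell]
        if cell in visiting:
            # Reaching a cell that is still on the DFS stack means a dependency cycle.
            raise ValueError(f"circular reference involving {cell}")
        raw = self.cells.get(cell, 0)  # assumption: empty cells evaluate to 0
        if isinstance(raw, int):
            result = raw
        else:
            visiting.add(cell)
            result = 0
            for term in raw.lstrip("=").split("+"):
                term = term.strip()
                if term.lstrip("-").isdigit():
                    result += int(term)
                else:
                    result += self._eval(term, memo, visiting)
            visiting.remove(cell)
        memo[cell] = result
        return result
```

This version recomputes on every read. A common next step is to cache values and keep a reverse-dependency map, so that changing a cell invalidates only its dependents.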

## Problem

Simulate infection spreading through a grid, finding the minimum time for all nodes to become infected.

## Likely LeetCode equivalent

LC 994 - rotting-oranges

## Tags

graph, matrix, dynamic_programming
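A sketch of the standard multi-source BFS approach. It assumes the LC 994 cell encoding (0 = empty, 1 = healthy, 2 = infected) and spread to the four orthogonal neighbours each minute. The function name is hypothetical.

```python
from collections import deque


def min_time_to_infect(grid: list[list[int]]) -> int:
    """Minutes until no healthy cell remains, or -1 if some cell can never be reached."""
    grid = [row[:] for row in grid]  # work on a copy so the caller's grid is untouched
    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    queue = deque()
    healthy = 0
    # Every initially infected cell is a BFS source at minute 0.
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 2:
                queue.append((r, c))
            elif grid[r][c] == 1:
                healthy += 1
    minutes = 0
    # Process one BFS layer per minute.
    while queue and healthy:
        for _ in range(len(queue)):
            r, c = queue.popleft()
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 1:
                    grid[nr][nc] = 2
                    healthy -= 1
                    queue.append((nr, nc))
        minutes += 1
    return minutes if healthy == 0 else -1
```

Each cell is enqueued at most once, so the run time is O(rows × cols).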

## Problem

Implement an iterator that generates or traverses IP address ranges in a specified format.

## Tags

strings, math
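The prompt does not say which format is used, so this sketch assumes IPv4 only. It supports two input forms: an inclusive dotted-quad start/end pair, and a CIDR block. The trick is to convert each address to a 32-bit integer, iterate over integers, and format back lazily. All function names are hypothetical. Python's standard `ipaddress` module already covers this, but interviewers usually expect the manual version.

```python
from collections.abc import Iterator


def ip_to_int(ip: str) -> int:
    parts = ip.split(".")
    if len(parts) != 4 or not all(p.isdigit() and int(p) <= 255 for p in parts):
        raise ValueError(f"invalid IPv4 address: {ip!r}")
    value = 0
    for p in parts:
        value = value * 256 + int(p)  # shift previous octets left by 8 bits, add this one
    return value


def int_to_ip(value: int) -> str:
    # Extract each 8-bit octet, most significant first.
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


def ip_range(start: str, end: str) -> Iterator[str]:
    """Lazily yield every IPv4 address from start to end, inclusive."""
    for value in range(ip_to_int(start), ip_to_int(end) + 1):
        yield int_to_ip(value)


def cidr_range(cidr: str) -> Iterator[str]:
    """Yield every address in a CIDR block such as "10.0.0.0/30"."""
    base, prefix_text = cidr.split("/")
    prefix = int(prefix_text)
    if not 0 <= prefix <= 32:
        raise ValueError(f"invalid prefix length: {prefix}")
    size = 1 << (32 - prefix)
    start = ip_to_int(base) & ~(size - 1)  # clear the host bits to get the network address
    for value in range(start, start + size):
        yield int_to_ip(value)
```

Because these are generators, a huge range such as a `/8` is never materialized in memory.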

## Problem

Implement a priority-based data structure supporting efficient insert, delete, and priority-ordered retrieval.

## Tags

heap, hash_table
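A sketch that pairs the two tagged structures: a `heapq` min-heap for ordering and a dict from key to live entry. Delete uses lazy deletion: it only drops the dict entry, and stale heap entries are discarded when they reach the top. The class name is hypothetical. Lower numbers are assumed to mean higher priority.

```python
import heapq
import itertools


class IndexedPriorityQueue:
    """Min-priority queue: O(log n) insert/pop, O(1) delete via lazy deletion."""

    def __init__(self) -> None:
        self._heap: list[tuple[int, int, str]] = []
        self._entries: dict[str, tuple[int, int, str]] = {}  # key -> live heap entry
        self._counter = itertools.count()  # tie-breaker: FIFO among equal priorities

    def insert(self, key: str, priority: int) -> None:
        # Re-inserting a key replaces its priority; the old heap entry becomes stale.
        entry = (priority, next(self._counter), key)
        self._entries[key] = entry
        heapq.heappush(self._heap, entry)

    def delete(self, key: str) -> bool:
        # Forget the key only; its heap entry is skipped later when it surfaces.
        return self._entries.pop(key, None) is not None

    def _discard_stale(self) -> None:
        while self._heap and self._entries.get(self._heap[0][2]) != self._heap[0]:
            heapq.heappop(self._heap)

    def peek(self) -> tuple[str, int]:
        self._discard_stale()
        if not self._heap:
            raise IndexError("peek from empty queue")
        priority, _, key = self._heap[0]
        return key, priority

    def pop(self) -> tuple[str, int]:
        key, priority = self.peek()
        heapq.heappop(self._heap)
        del self._entries[key]
        return key, priority

    def __len__(self) -> int:
        return len(self._entries)
```

The trade-off is that stale entries occupy heap memory until they surface. If deletes dominate, a periodic rebuild of the heap from the dict keeps that bounded.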

## Round 1 - Coding

## Problem

You are given the following Python function. Identify all code quality issues and produce a refactored version that is correct, readable, and efficient.

```python
# Original code (do NOT modify this block - just analyze it)
def process(data):
    result = []
    for i in range(len(data)):
        if data[i] != None:
            if data[i] > 0:
                result.append(data[i] * 2)
            else:
                if data[i] < 0:
                    result.append(data[i] * -1)
                else:
                    result.append(0)
        else:
            pass
    return result
```

Issues to identify:

1. Using `!= None` instead of `is not None`
2. The nested `else: if data[i] < 0` should be flattened into an `elif`; the positive, negative, and zero branches already cover every non-`None` value.
3. `for i in range(len(data))` should be `for x in data`
4. `else: pass` is a no-op
5. The negative and zero branches together re-implement `abs()` by hand

```python
# Expected refactored version:
def process(data: list) -> list:
    return [abs(x) * (2 if x > 0 else 1) for x in data if x is not None]
```

## Follow-ups

1. Walk through edge cases: empty list, all-None list, list with zeros.
2. How would you write unit tests to verify the refactored version is behaviorally identical?
3. If `data` is a 10M-element list, what memory concerns does a list comprehension introduce vs. a generator?
4. How does type annotation help catch bugs here before runtime?
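For follow-up 2, one approach is a differential test. Keep the original logic under another name, then assert that both versions agree on the edge cases from follow-up 1 and on many seeded random inputs. `process_original` and `check_equivalence` are hypothetical names. The random generator is the standard `random` module with a fixed seed, so failures are reproducible.

```python
import random


def process_original(data):
    # Logic of the original snippet, renamed so both versions can coexist.
    result = []
    for i in range(len(data)):
        if data[i] != None:  # noqa: E711 (kept as in the original)
            if data[i] > 0:
                result.append(data[i] * 2)
            else:
                if data[i] < 0:
                    result.append(data[i] * -1)
                else:
                    result.append(0)
        else:
            pass
    return result


def process(data: list) -> list:
    return [abs(x) * (2 if x > 0 else 1) for x in data if x is not None]


def check_equivalence(trials: int = 1000, seed: int = 0) -> None:
    # Hand-picked edge cases first, then random lists mixing None, zeros, and signs.
    cases = [[], [None, None], [0, 0], [3, -4, 0, None]]
    rng = random.Random(seed)
    for _ in range(trials):
        length = rng.randint(0, 20)
        cases.append([rng.choice([None, 0, rng.randint(-50, 50)]) for _ in range(length)])
    for case in cases:
        assert process(case) == process_original(case), case
```

A property-based testing library could generate the inputs instead, but the idea is the same: the original function serves as the oracle.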

## Problem

Implement a job scheduler that manages task execution order based on priority, deadlines, or dependencies.

## Tags

heap, greedy, sorting
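A sketch that combines dependencies and priority using Kahn's topological sort, with a heap as the set of ready jobs. Assumptions beyond the prompt:

- A lower number means higher priority, and ties are broken by job name.
- Every job named in `deps` also appears in `priorities`.
- `schedule` is a hypothetical name.

```python
import heapq


def schedule(priorities: dict[str, int], deps: dict[str, list[str]]) -> list[str]:
    """Return an order where every prerequisite runs first and, among ready jobs,
    the smallest priority number goes next. Raises ValueError on a cycle."""
    indegree = {job: 0 for job in priorities}
    dependents: dict[str, list[str]] = {job: [] for job in priorities}
    for job, prereqs in deps.items():
        for p in prereqs:
            dependents[p].append(job)
            indegree[job] += 1
    # Jobs with no unmet prerequisites are ready immediately.
    ready = [(priorities[j], j) for j, d in indegree.items() if d == 0]
    heapq.heapify(ready)
    order = []
    while ready:
        _, job = heapq.heappop(ready)
        order.append(job)
        # Finishing a job may unlock its dependents.
        for nxt in dependents[job]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                heapq.heappush(ready, (priorities[nxt], nxt))
    if len(order) != len(priorities):
        raise ValueError("dependency cycle detected")
    return order
```

For the deadline variant, passing each job's deadline in place of its priority gives earliest-deadline-first ordering among ready jobs.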

## Problem

Model and query a social network graph, likely involving friend suggestions, reachability, or shortest path queries.

## Tags

graph, hash_table
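A sketch covering two of the likely queries. Friendships are assumed to be undirected and unweighted, stored as a dict of sets:

- BFS gives the shortest path length in hops ("degrees of separation").
- Friends-of-friends ranked by mutual-friend count give suggestions.

The class and method names are hypothetical.

```python
from collections import Counter, deque


class SocialGraph:
    """Undirected friendship graph stored as an adjacency map of sets."""

    def __init__(self) -> None:
        self.friends: dict[str, set[str]] = {}

    def add_friendship(self, a: str, b: str) -> None:
        self.friends.setdefault(a, set()).add(b)
        self.friends.setdefault(b, set()).add(a)

    def degrees_apart(self, src: str, dst: str) -> int:
        """BFS shortest path length in hops; -1 if dst is unreachable."""
        if src == dst:
            return 0
        seen = {src}
        queue = deque([(src, 0)])
        while queue:
            person, dist = queue.popleft()
            for friend in self.friends.get(person, ()):
                if friend == dst:
                    return dist + 1
                if friend not in seen:
                    seen.add(friend)
                    queue.append((friend, dist + 1))
        return -1

    def suggest_friends(self, person: str, k: int = 3) -> list[str]:
        """Rank friends-of-friends by mutual-friend count (ties broken by name)."""
        direct = self.friends.get(person, set())
        counts = Counter(
            fof
            for friend in direct
            for fof in self.friends[friend]
            if fof != person and fof not in direct
        )
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        return [name for name, _ in ranked[:k]]
```

On very large graphs, bidirectional BFS (searching from both ends at once) is the usual way to cut the cost of `degrees_apart`.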

## Round 1 - SQL

## Problem

You are given the following schema and three SQL tasks. For each, either fix the bug or optimize the query.

```sql
CREATE TABLE users (user_id INT, name VARCHAR, created_at DATE);
CREATE TABLE orders (order_id INT, user_id INT, total DECIMAL, created_at DATE);
CREATE TABLE order_items (item_id INT, order_id INT, product_id INT, qty INT, price DECIMAL);
```

**Task 1 - Bug:** The following query is supposed to return users with no orders, but it returns an empty result. Fix it.

```sql
SELECT u.user_id, u.name
FROM users u
LEFT JOIN orders o ON u.user_id = o.user_id
WHERE o.user_id != u.user_id;  -- BUG
```

**Task 2 - Optimization:** The following query is slow on 50M rows. Rewrite it.

```sql
SELECT user_id, SUM(total)
FROM orders
WHERE YEAR(created_at) = 2024
GROUP BY user_id;
```

**Task 3 - Write from scratch:** Find the top 3 products by revenue in Q1 2024. Revenue = SUM(qty * price) per product. Order dates live on `orders`, so `order_items` must be joined through `order_id`.

## Follow-ups

1. For Task 1: why does `WHERE o.user_id IS NULL` correctly find users with no orders after a LEFT JOIN?
2. For Task 2: what index would you create and how does the rewrite avoid a full-table function scan?
3. In Task 3: how do you break ties in the top 3 (by product_id, name)?
4. How would you rewrite Task 3 using a window function instead of GROUP BY + LIMIT?
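One possible set of answers, checked against a toy dataset with Python's built-in `sqlite3`. Four things to know before reading it:

- The sample rows are invented for illustration.
- SQLite has no `YEAR()`, so dates are stored as ISO-8601 text, which compares correctly as strings.
- The Task 2 rewrite uses a half-open date range on the raw column; that is what lets an index on `created_at` be used.
- The index shown is one reasonable choice, not the only one.

```python
import sqlite3

SCHEMA = """
CREATE TABLE users (user_id INT, name VARCHAR, created_at DATE);
CREATE TABLE orders (order_id INT, user_id INT, total DECIMAL, created_at DATE);
CREATE TABLE order_items (item_id INT, order_id INT, product_id INT, qty INT, price DECIMAL);
CREATE INDEX idx_orders_created_at ON orders (created_at);
"""

# Original Task 1 query, kept for comparison: matched rows fail the != test, and
# for unmatched rows NULL != u.user_id is NULL, so every row is filtered out.
BUGGY_NO_ORDERS = """
SELECT u.user_id, u.name FROM users u
LEFT JOIN orders o ON u.user_id = o.user_id
WHERE o.user_id != u.user_id
"""

# Task 1 fix: keep only LEFT JOIN rows that found no matching order.
NO_ORDERS = """
SELECT u.user_id, u.name FROM users u
LEFT JOIN orders o ON u.user_id = o.user_id
WHERE o.user_id IS NULL
"""

# Task 2 rewrite: compare the bare column against a range instead of wrapping it in a function.
TOTALS_2024 = """
SELECT user_id, SUM(total) FROM orders
WHERE created_at >= '2024-01-01' AND created_at < '2025-01-01'
GROUP BY user_id
"""

# Task 3: top 3 products by Q1 2024 revenue; ties broken by product_id.
TOP_PRODUCTS_Q1 = """
SELECT oi.product_id, SUM(oi.qty * oi.price) AS revenue
FROM order_items oi
JOIN orders o ON o.order_id = oi.order_id
WHERE o.created_at >= '2024-01-01' AND o.created_at < '2024-04-01'
GROUP BY oi.product_id
ORDER BY revenue DESC, oi.product_id
LIMIT 3
"""


def demo() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO users VALUES (?, ?, ?)",
                     [(1, "ann", "2023-05-01"), (2, "bo", "2023-06-01"), (3, "cy", "2024-01-02")])
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)",
                     [(10, 1, 50, "2024-02-10"), (11, 1, 20, "2023-12-31"), (12, 2, 30, "2024-03-05")])
    conn.executemany("INSERT INTO order_items VALUES (?, ?, ?, ?, ?)",
                     [(1, 10, 100, 2, 10), (2, 10, 200, 1, 30), (3, 12, 300, 3, 10),
                      (4, 11, 400, 9, 99), (5, 12, 500, 1, 5)])
    return conn
```

On MySQL, the same range predicate replaces `YEAR(created_at) = 2024` unchanged.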

## Problem

Implement persistent string storage supporting manipulation operations like append, slice, and undo across sessions.

## Tags

strings, hash_table, coding_other
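A minimal sketch, assuming "across sessions" means undo must still work after the process restarts. The full version history is written to a JSON file after every change. Each write goes to a temp file that is then swapped in atomically with `os.replace`. Two further assumptions: `slice(start, end)` replaces the stored value with `value[start:end]`, and the class name is hypothetical.

```python
import json
import os
import tempfile


class PersistentString:
    """String whose version history is saved to disk after every change."""

    def __init__(self, path: str) -> None:
        self.path = path
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                self._versions: list[str] = json.load(f)
        else:
            self._versions = [""]

    @property
    def value(self) -> str:
        return self._versions[-1]

    def _save(self) -> None:
        # Write to a temp file in the same directory, then atomically swap it in,
        # so a crash mid-write never leaves a half-written history file.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(self._versions, f)
        os.replace(tmp, self.path)

    def _commit(self, new_value: str) -> str:
        self._versions.append(new_value)
        self._save()
        return new_value

    def append(self, s: str) -> str:
        return self._commit(self.value + s)

    def slice(self, start: int, end: int) -> str:
        # Assumption: "slice" keeps only value[start:end] as the new value.
        return self._commit(self.value[start:end])

    def undo(self) -> bool:
        if len(self._versions) == 1:
            return False
        self._versions.pop()
        self._save()
        return True
```

Storing full snapshots is simple but grows with every edit. Logging operations (such as "append 'abc'") and replaying them would be more compact, at the cost of slower loads.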
