Jane Street

Jane Street Software Engineer Phone Screen Questions

21+ questions from real Jane Street Software Engineer Phone Screen rounds, reported by candidates who interviewed there.

21
Questions
8
Topic Areas
10+
Sources

What does the Jane Street Phone Screen round test?

The Jane Street phone screen typically lasts 45-60 minutes and evaluates core Software Engineer fundamentals. Candidates should expect 1-2 algorithmic problems, basic system design discussion at senior levels, and questions about relevant experience. The goal is to confirm technical competence before bringing candidates onsite.

Top Topics in This Round

Jane Street Software Engineer Phone Screen Questions

It's basically writing the algorithm on paper. The problem is essentially a rate-limiting problem. Given a list of IP access log lines with IP address and timestamp, flag the IP addresses that access

Last month, I had my first-round interview at Jane Street, similar to the first round for an SDE intern. The main question was a simulation: simulating a 2D chessboard with two types of pieces. The fi

Has anyone here had an onsite interview at Jane Street for a software engineer position and can comment on the difficulty vs. other big tech companies? There's plenty of information on glassdoor about

## Problem Arithmetic coding compresses a message into a single number in `[0, 1)` based on character probabilities. Given a symbol frequency table and a message, implement an encoder and decoder. ```python class ArithmeticCoder: def __init__(self, frequencies: dict[str, int]): ... def encode(self, message: str) -> float: # returns a number in [0,1) def decode(self, encoded: float, length: int) -> str: ``` ## Example ``` frequencies = {"a": 3, "b": 2, "c": 1} # total = 6 # Probability ranges: # a: [0, 0.5) # b: [0.5, 0.833) # c: [0.833, 1.0) Encode "ab": Start: [0, 1) After 'a': [0, 0.5) After 'b': [0 + 0.5*0.5, 0 + 0.5*0.833) = [0.25, 0.4165) Output: any value in [0.25, 0.4165), e.g. 0.3 coder.decode(0.3, 2) -> "ab" ``` ## Follow-ups 1. Floating-point precision limits the message length. How does integer arithmetic coding solve this? 2. How does arithmetic coding compare to Huffman coding in compression ratio? 3. How do you handle symbols not seen in the frequency table (zero-probability problem)?

## Problem Parse and evaluate an arithmetic expression string, respecting operator precedence without a built-in eval. ## Likely LeetCode equivalent Similar to Basic Calculator II (LC 227) or Expression Evaluation variants. ## Tags stack, strings, math, jane_street

## Problem Build a JavaScript/TypeScript form validation library. Each field has one or more validation rules. Rules are composable — validators can be chained. When `validate()` is called, return all field errors. ```typescript type Rule = (value: string) => string | null; // null = pass, string = error message class FormValidator { addField(name: string, ...rules: Rule[]): this; validate(data: Record<string, string>): Record<string, string[]>; // field -> errors } // Built-in rules const required: Rule; const minLength = (n: number): Rule => ...; const matches = (pattern: RegExp, msg: string): Rule => ...; ``` ## Example ```typescript const validator = new FormValidator() .addField("email", required, matches(/@/, "must contain @")) .addField("password", required, minLength(8)); validator.validate({ email: "notanemail", password: "short" }); // -> { email: ["must contain @"], password: ["must be at least 8 characters"] } validator.validate({ email: "[email protected]", password: "securepass" }); // -> {} ``` ## Follow-ups 1. How do you support async validators (e.g., checking if a username is taken via API)? 2. How do you implement cross-field validation (e.g., "password" must match "confirm_password")? 3. How would you integrate this with React controlled components?

## Problem Implement a code folding engine for a source code editor. Given source code as a list of lines, detect foldable regions (blocks delimited by `{` and `}`, or by indentation for Python-style code), and support fold/unfold operations. ```python class CodeFolder: def __init__(self, lines: list[str], style: str = "brace"): # style: "brace"|"indent" def get_foldable_ranges(self) -> list[tuple[int, int]]: # (start_line, end_line) def fold(self, start_line: int) -> list[str]: # returns visible lines def unfold(self, start_line: int) -> list[str]: def fold_all(self) -> list[str]: ``` ## Example ``` lines = [ "function foo() {", # line 0 - foldable range (0, 3) " if (x) {", # line 1 - foldable range (1, 2) " return x;", # line 2 " }", # line 3 "}", # line 4 ] get_foldable_ranges() -> [(0, 4), (1, 3)] fold(0) -> ["function foo() { ... }", "}"] ``` ## Follow-ups 1. How do you handle mismatched braces in malformed source code? 2. What data structure efficiently tracks nested folds as the user types? 3. How would you extend this to support language-specific fold points (e.g., `if/else`, `try/catch`)?

## Round 1 - System Design ## Problem Design a multi-asset exchange platform where users can trade stocks, ETFs, and crypto. The system must route incoming orders to the correct order book per asset, match orders, manage user balances, and produce trade confirmations. ## Core Components - **Order Router**: parses incoming orders (`{user_id, asset, side, price, qty, type}`), routes to the correct order book instance. - **Order Book**: per-asset, price-time priority matching (same as a single matching engine). - **Balance Ledger**: tracks per-user per-asset holdings; validates sufficient balance before accepting an order. - **Trade Confirmation Service**: emits trade events to users via WebSocket. - **Risk Engine**: pre-trade checks (margin, position limits, wash-trade detection). ## Follow-ups 1. How do you prevent double-spend races when a user submits two orders simultaneously? 2. How do you design the balance ledger for consistency — is optimistic locking or two-phase commit better? 3. What is your failover strategy if one order book node crashes mid-match? 4. How do you replay all trades from a checkpoint to rebuild balances after a crash?

## Problem Design a `GamepadTracker` that records sequences of button/axis inputs with timestamps and can replay them to check whether a recorded combo was input within a timing window. Support: ```python class GamepadTracker: def record(self, input_event: dict) -> None: # event = {"type": "button"|"axis", "id": str, "value": float, "ts": int} def detect_combo( self, combo: list[dict], # sequence of events to match window_ms: int # all events must occur within this window ) -> bool: def get_recent(self, duration_ms: int) -> list[dict]: ``` ## Example ``` tracker.record({"type": "button", "id": "A", "value": 1, "ts": 1000}) tracker.record({"type": "button", "id": "B", "value": 1, "ts": 1050}) tracker.record({"type": "button", "id": "A", "value": 1, "ts": 1200}) combo = [{"id": "A"}, {"id": "B"}] tracker.detect_combo(combo, window_ms=200) # -> True (A at 1000, B at 1050: both within 200ms) ``` ## Follow-ups 1. How do you handle axis inputs where the value must exceed a threshold (e.g., joystick > 0.8)? 2. What data structure enables efficient sliding-window combo detection as new events arrive? 3. How would you extend this to support approximate matching (one missed input allowed)?

## Problem Design a game board that is conceptually infinite in all directions. Players and entities occupy cells on the grid. The board must support: ```python class InfiniteBoard: def place(self, entity_id: str, x: int, y: int) -> None: def move(self, entity_id: str, dx: int, dy: int) -> bool: # False if cell occupied def get_position(self, entity_id: str) -> tuple[int, int] | None: def get_entities_in_range(self, x: int, y: int, radius: int) -> list[str]: def get_neighbors(self, entity_id: str) -> list[str]: # adjacent 8 cells ``` ## Example ``` board.place("hero", 0, 0) board.place("goblin", 1, 0) board.move("hero", 1, 0) -> False # goblin is there board.move("goblin", 5, 5) board.move("hero", 1, 0) -> True # now clear board.get_entities_in_range(0, 0, 10) # -> ["hero", "goblin"] (goblin at (6,5) is within radius 10 of (0,0)) ``` ## Follow-ups 1. What data structure (hashmap, quadtree, spatial hash) is best for this use case? Under what entity densities? 2. How do you implement chunk-based loading as entities explore new areas? 3. How would you synchronize board state across multiple game server nodes?

## Round 1 - System Design ## Problem Design a live streaming platform (think Twitch/YouTube Live) supporting: - Streamers push video to an ingest server via RTMP - Viewers watch the stream with <5 second latency at any scale - Real-time chat: millions of viewers sending messages to the same stream - Live viewer count updated every second ## Key Design Areas - **Ingest and transcoding**: RTMP ingest -> transcode to HLS/DASH at multiple bitrates -> push to CDN edge - **Delivery**: adaptive bitrate over HLS; CDN handles viewer scale - **Chat**: fan-out problem — one message to N viewers. Use a pub-sub bus (Kafka/Redis pub-sub) per stream; clients subscribe via WebSocket to a chat gateway. - **Viewer count**: approximate real-time count using HyperLogLog or counter per server aggregated every second. ## Follow-ups 1. The stream suddenly gets 5M concurrent viewers — what breaks first and how do you recover? 2. How do you enforce chat moderation rules (rate limits, banned words) without adding latency? 3. How do you store the VOD (video on demand) after the stream ends? 4. How do you handle a streamer going live from a very low-bandwidth connection?

## Problem Design a multi-level or multi-tenant cache system with configurable eviction and access policies. ## Likely LeetCode equivalent Extends LRU Cache (LC 146) to multi-level design. ## Tags hash_table, design, jane_street, cache

## Problem You receive order book updates from two market data feeds (Feed A and Feed B) for the same instrument. Each update is `(timestamp, side, price, quantity)`. Implement a comparator that detects discrepancies: same-timestamp updates where the two feeds disagree on price levels or quantities. ```python class OrderBookComparator: def ingest_a(self, update: dict) -> None: # {"ts": int, "side": str, "price": float, "qty": float} def ingest_b(self, update: dict) -> None: def get_discrepancies( self, tolerance_ms: int = 50, qty_tolerance: float = 0.01 ) -> list[dict]: ``` ## Example ``` comp.ingest_a({"ts": 1000, "side": "bid", "price": 100.0, "qty": 10.0}) comp.ingest_b({"ts": 1000, "side": "bid", "price": 100.0, "qty": 9.5}) comp.get_discrepancies(qty_tolerance=0.1) # -> [{"ts": 1000, "side": "bid", "price": 100.0, "feed_a_qty": 10.0, "feed_b_qty": 9.5}] # qty diff = 0.5 > 0.1 tolerance ``` ## Follow-ups 1. Feed B often lags by 20-100ms — how do you match updates across feeds with timing skew? 2. What ring-buffer or sliding-window data structure best supports this real-time comparison? 3. How do you alert on systematic bias (Feed B is always lower) vs. random noise?

## Problem Simulate the Snake game where the snake grows on eating food and the game ends on self-collision or boundary hit. ## Likely LeetCode equivalent Directly matches Design Snake Game (LC 353). ## Tags hash_table, queue, matrix, jane_street

## Problem Implement a stack-based language interpreter, processing push/pop operations and arithmetic instructions. ## Likely LeetCode equivalent Similar to Evaluate Reverse Polish Notation (LC 150). ## Tags stack, jane_street, interpreter

## Problem Implement a streaming RLE (run-length encoding) compressor and decompressor that processes data chunk by chunk — input may arrive in variable-size chunks and must be processed incrementally without buffering the full input. ```python class StreamingRLE: def __init__(self, mode: str): # "compress" or "decompress" def feed(self, chunk: bytes) -> bytes: # returns compressed/decompressed bytes so far def flush(self) -> bytes: # finalize and return remaining bytes ``` Encoding format: `[count: 1 byte][byte_value: 1 byte]` for runs of 1-255. Runs longer than 255 are split. ## Example ``` compressor = StreamingRLE("compress") compressor.feed(b"AAABBC") -> b"\x03A\x02B" # C not yet flushed (run might continue) compressor.feed(b"CCC") -> b"" # accumulating CCCC run compressor.flush() -> b"\x04C" # 1+3=4 C's total decompressor = StreamingRLE("decompress") decompressor.feed(b"\x03A\x02B\x04C") -> b"AAABBCCCC" ``` ## Follow-ups 1. What is the worst-case output size for RLE, and what inputs trigger it? 2. How would you extend this to LZ77 streaming compression? 3. How do you design the chunk boundary handling to avoid emitting partial runs mid-chunk?

## Problem Implement a line-level diff tool similar to Unix `diff`. Given two lists of strings (lines of text), compute the minimal edit script (insertions and deletions) to transform one into the other. Output in unified diff format. ```python def compute_diff( original: list[str], modified: list[str] ) -> list[str]: # unified diff lines pass ``` ## Example ``` original = ["foo", "bar", "baz"] modified = ["foo", "qux", "baz"] compute_diff(original, modified) # Output: # @@ -1,3 +1,3 @@ # foo # -bar # +qux # baz ``` ## Follow-ups 1. What algorithm computes the minimum edit distance efficiently? What is its time and space complexity? 2. How does the Myers diff algorithm improve on naive LCS-based approaches? 3. How would you extend this to word-level or character-level diffs within changed lines? 4. How do you handle files with thousands of lines efficiently — can you avoid an O(n^2) DP table?

## Problem Implement a string matching game engine. The system selects a secret word; the player submits guesses. Each guess receives per-character feedback: - `'G'` (green) — correct character, correct position - `'Y'` (yellow) — character in word, wrong position - `'X'` (gray) — character not in word Multiple occurrences are handled carefully: yellows and greens together should not exceed the true count of that character in the secret. ```python class StringMatchingGame: def __init__(self, secret: str): ... def guess(self, word: str) -> list[str]: # list of 'G'/'Y'/'X' per character def is_solved(self) -> bool: def attempts(self) -> int: ``` ## Example ``` game = StringMatchingGame("apple") game.guess("paper") # secret: a p p l e # guess: p a p e r # -> ['Y','Y','G','Y','X'] # p: in word (Y), a: in word (Y), p: correct pos (G), e: in word wrong pos (Y), r: not in word (X) game.guess("apple") -> ['G','G','G','G','G'] game.is_solved() -> True ``` ## Follow-ups 1. Walk through the exact algorithm for handling duplicate characters correctly. 2. How would you implement an optimal solver (information-theory based, minimizing expected guesses)? 3. How do you validate that the guess is a real word — what data structure for O(1) lookup?

## Problem Simulate a supermarket with `k` checkout lanes. Customers arrive at given times with a number of items. Each item takes 1 unit of time to scan; no setup time between customers. Each customer joins the lane with the fewest customers currently in it (ties broken by lane index). Return the time each customer finishes checking out. ```python def simulate_queues( customers: list[tuple[int, int]], # (arrival_time, num_items) k: int # number of lanes ) -> list[int]: # finish time per customer (same order) pass ``` ## Example ``` Input: customers=[(0,3),(0,4),(2,2),(4,1)], k=2 Customer 0: arrives 0, joins lane 0 (both empty), finishes at 3 Customer 1: arrives 0, joins lane 1, finishes at 4 Customer 2: arrives 2, joins lane 0 (1 customer remaining), finishes at max(3,2)+2=5 Customer 3: arrives 4, joins lane 0 (done at 5) or lane 1 (done at 4) -> lane 1, finishes at max(4,4)+1=5 Output: [3, 4, 5, 5] ``` ## Follow-ups 1. What is the time complexity of your solution? How does it change with `k` and `n`? 2. How would you extend this to model express lanes (items < 10) and self-checkout lanes? 3. If customers can observe queue lengths and switch lanes, how does that change the simulation?

## Problem Implement the backend for a text editor supporting insert, delete, and cursor navigation operations efficiently. ## Likely LeetCode equivalent Similar to Design a Text Editor (LC 2296). ## Tags strings, stack, linked_list, jane_street, editor

See All 21 Questions from This Round

Full question text, answer context, and frequency data for subscribers.

Get Access