Sofi

Sofi Software Engineer Phone Screen Questions

17+ questions from real Sofi Software Engineer Phone Screen rounds, reported by candidates who interviewed there.

17
Questions
8
Topic Areas
10+
Sources

What does the Sofi Phone Screen round test?

The Sofi phone screen typically lasts 45-60 minutes and evaluates core Software Engineer fundamentals. Candidates should expect 1-2 algorithmic problems, basic system design discussion at senior levels, and questions about relevant experience. The goal is to confirm technical competence before bringing candidates onsite.

Top Topics in This Round

Sofi Software Engineer Phone Screen Questions

This post was last edited by juliaxx on 2025-09-25 12:10. Given a list of string wordset containing anagrams, and a list of strings containing sentences. You can replace words in sentences with anagra

## Problem Find all starting indices of anagrams of a pattern in a given string. ## Likely LeetCode equivalent Directly matches Find All Anagrams in a String (LC 438). ## Tags strings, sliding_window, hash_table, sofi

## Problem Simulate asteroid collisions in a row: same direction pass through, opposite directions collide with larger surviving. ## Likely LeetCode equivalent Directly matches Asteroid Collision (LC 735). ## Tags stack, arrays, sofi

## Problem Return the top-rated TV shows within each genre from a dataset, likely using grouped sorting or heaps. ## Likely LeetCode equivalent No confident single LC match. ## Tags heap, hash_table, sorting, sofi

## Problem You are given a list of blocked words and a document string. Return the document with every blocked word replaced by asterisks (`*`) of equal length. Matching is case-insensitive; non-alpha characters act as word boundaries. ```python def block_words(blocked: list[str], document: str) -> str: pass ``` **Example:** ``` blocked = ["bad", "spam"] document = "This is a Bad example with spam content." output -> "This is a *** example with **** content." ``` ## Follow-ups 1. How would you handle partial matches, e.g., blocking "bad" should not match "badminton"? 2. If `blocked` contains 100,000 entries, how do you reduce lookup time? (Trie vs. hash set trade-offs.) 3. The document is a 10 GB log stream arriving in chunks. How do you process it without loading it all in memory? 4. Extend the solution to support wildcard patterns like `sp*m`.

## Problem Given a list of `n` integer arrays, return all elements that appear in every array. The result should be sorted in ascending order and contain no duplicates. ```python def common_elements(arrays: list[list[int]]) -> list[int]: pass ``` **Example:** ``` arrays = [[3, 1, 2, 4], [1, 2, 5], [2, 1, 6]] output -> [1, 2] arrays = [[1, 2], [3, 4]] output -> [] ``` ## Approach Convert the first array to a set, then intersect with each subsequent array. Final result sorted. O(n * m) time where m is average array length. ## Follow-ups 1. What if `n` is 10,000 and each array has up to 1 million elements? Discuss memory trade-offs. 2. How do you handle duplicates within a single array -- does an element need to appear multiple times per array to be counted multiple times in the output? 3. Implement without using Python's built-in `set.intersection`. 4. Given the arrays arrive as a stream one at a time, maintain the running intersection incrementally.

## Problem Design a `CreditScoreSystem` that manages user credit scores. Starting score is 700 for every new user. Implement the following operations: - `add_user(user_id)` -- register a new user. - `record_payment(user_id, on_time: bool)` -- on-time payment adds 10 points; missed payment subtracts 40 points. Score is clamped to [300, 850]. - `get_score(user_id) -> int` -- return current score. - `top_k(k: int) -> list[str]` -- return the k users with the highest scores (ties broken by `user_id` lexicographically). ```python class CreditScoreSystem: def add_user(self, user_id: str) -> None: ... def record_payment(self, user_id: str, on_time: bool) -> None: ... def get_score(self, user_id: str) -> int: ... def top_k(self, k: int) -> list[str]: ... ``` **Example:** ``` sys.add_user("alice"); sys.add_user("bob") sys.record_payment("alice", True) # alice -> 710 sys.record_payment("bob", False) # bob -> 660 sys.top_k(1) -> ["alice"] ``` ## Follow-ups 1. How would you make `top_k` efficient if there are 50 million users? 2. Add a `history(user_id) -> list[int]` method returning all score snapshots. What is the storage impact? 3. How do you handle concurrent `record_payment` calls for the same user in a multithreaded context?

## Problem You are given a list of pipeline events. Each event is a tuple `(file_id, stage, status, timestamp)` where `status` is either `"start"` or `"end"`. Compute the total processing time for each file and the per-stage breakdown. ```python def file_processing_time( events: list[tuple[str, str, str, int]] ) -> dict[str, dict[str, int]]: # Returns {file_id: {stage: duration, "total": total_duration}} pass ``` **Example:** ``` events = [ ("f1", "parse", "start", 0), ("f1", "parse", "end", 5), ("f1", "compile", "start", 5), ("f1", "compile", "end", 12), ] output -> {"f1": {"parse": 5, "compile": 7, "total": 12}} ``` ## Follow-ups 1. What if events for different files are interleaved? Does your solution still work? 2. Handle missing `"end"` events -- mark those stages as `"incomplete"`. 3. Stages can overlap (parallel execution). How do you compute wall-clock total vs. CPU-sum total? 4. The events arrive out of timestamp order from distributed workers. How do you handle sorting efficiently?

## Problem Find the top-K most frequently used tags from a collection of posts or items. ## Likely LeetCode equivalent Similar to Top K Frequent Words (LC 692) or Top K Frequent Elements (LC 347). ## Tags heap, hash_table, sofi

## Problem Identify users or accounts with abnormally high transaction frequency within a time window. ## Likely LeetCode equivalent No confident single LC match. ## Tags hash_table, sliding_window, sofi, finance

## Problem You have a sorted list of GIF objects `[{"id": str, "trending_score": int}]`. Implement a `paginate` function that returns a page of results given `offset` and `limit`. If the caller passes `cursor` (the last-seen id) instead of `offset`, resolve the offset from the cursor first. ```python def paginate( gifs: list[dict], offset: int = 0, limit: int = 10, cursor: str | None = None ) -> dict: # Returns {"data": [...], "next_cursor": str | None} pass ``` **Example:** ``` gifs = [{"id": "a", ...}, {"id": "b", ...}, {"id": "c", ...}] paginate(gifs, offset=1, limit=2) -> {"data": [{"id": "b"}, {"id": "c"}], "next_cursor": None} ``` ## Follow-ups 1. Offset-based vs. cursor-based pagination -- when does each break down at scale? 2. What happens if a new GIF is inserted before the cursor position between two page fetches? How do you maintain consistency? 3. Trending score changes frequently. How do you prevent items from appearing twice or being skipped across pages? 4. Implement server-side caching so repeated identical page requests don't re-sort the list.

## Problem Parse log files to extract and aggregate metrics such as error rates, latency percentiles, or request counts. ## Likely LeetCode equivalent No confident single LC match. ## Tags hash_table, strings, sofi, log_parsing

## Problem You have a network of `n` nodes and `m` weighted edges where edge weight represents bandwidth capacity. Find the maximum bandwidth achievable between a source node `s` and destination node `t`, defined as the maximum over all paths of the minimum edge capacity along that path. ```python def max_bandwidth(n: int, edges: list[tuple[int,int,int]], s: int, t: int) -> int: # edges: (u, v, capacity) pass ``` **Example:** ``` n=4, edges=[(0,1,3),(1,2,1),(0,2,4),(2,3,2)], s=0, t=3 Paths: 0->1->2->3 bottleneck=1; 0->2->3 bottleneck=2 output -> 2 ``` ## Approach Modified Dijkstra or Kruskal. With Kruskal: sort edges by capacity descending, union-find until s and t are connected -- that edge's capacity is the answer. O(m log m). ## Follow-ups 1. How does this differ from standard shortest-path? Which data structures change? 2. The graph has 10 million edges. What is the memory-efficient approach? 3. Edges now have both capacity and latency. Find the path maximizing bandwidth subject to latency <= L. 4. How does the answer change if you can split flow across multiple paths (max-flow)?

## Problem Given an unsorted array of integers, find the pair `(a, b)` such that `|a - b|` is minimized. Return the difference. If multiple pairs tie, return any one. ```python def min_difference_pair(nums: list[int]) -> tuple[int, int, int]: # Returns (a, b, difference) pass ``` **Example:** ``` nums = [4, 9, 1, 32, 13] sorted -> [1, 4, 9, 13, 32] output -> (4, 9, 5) # or (9, 13, 4) -- minimum diff is 4 nums = [5, 5, 3] output -> (5, 5, 0) ``` ## Approach Sort the array, then scan adjacent pairs. O(n log n) time, O(1) extra space. ## Follow-ups 1. Can you solve this in O(n) time if the elements are bounded integers in range [0, M]? 2. Return all pairs that achieve the minimum difference, not just one. 3. Extend to a 2D version: given n points on a number line, find the pair of points with minimum Manhattan distance. 4. What if the array is already sorted? What is the optimal algorithm then and why?

## Problem Reverse a singly linked list iteratively or recursively. ## Likely LeetCode equivalent LeetCode 206 - Reverse Linked List. ## Tags linked_list, recursion

## Problem You are given a royal family tree. Succession follows pre-order traversal (depth-first, left-to-right). Implement two methods: - `birth(parent, child)` -- add a child to a parent. - `death(name)` -- mark a person as deceased. - `order() -> list[str]` -- return the current line of succession (pre-order, excluding deceased). ```python class RoyalSuccession: def birth(self, parent: str, child: str) -> None: ... def death(self, name: str) -> None: ... def order(self) -> list[str]: ... ``` **Example:** ``` rs.birth("King", "Alice") rs.birth("King", "Bob") rs.birth("Alice", "Carol") rs.death("Alice") rs.order() -> ["King", "Carol", "Bob"] # Alice is dead but Carol (her child) still inherits before Bob ``` ## Follow-ups 1. Why is pre-order (not in-order or post-order) the right traversal for succession? 2. How do you efficiently retrieve `order()` if `birth` and `death` are called millions of times? 3. Add `isKing(name) -> bool` that returns True if the given person is first in the succession line. 4. Handle the case where the root (current monarch) dies -- who automatically becomes monarch?

## Problem Find the shortest substring satisfying some constraint, such as containing all required characters. ## Likely LeetCode equivalent Similar to LC 76 Minimum Window Substring but not confirmed as identical. ## Tags strings, sliding_window, two_pointers

See All 17 Questions from This Round

Full question text, answer context, and frequency data for subscribers.

Get Access