point72 hackerrank pyspark data engineer online assessment experience
Interview Experience
刚刚做完了一场 HackerRank 笔试(是 Point72 的 OA),题目逻辑虽然不是地狱难度,但是非常考验对细节的把控,尤其是字符串拼接和格式化。在此给大家分享一下题目和避坑经验,希望能帮到接下来要考的同学!
---题目名称: SQL: Detection of Fraud Trends on the CryptoCoin Cryptocurrency Exchange
考察语言: PostgreSQL
核心需求: 找出同一发送者(sender)到同一接收者(recipient)之间发生 2 次及以上交易的序列,并按发送者进行聚合。
要求最终输出三列:
sender
复制代码
: 发送者钱包地址。
period
复制代码
: 该发送者首次和末次交易的时间段(格式要求严格)。
transactions
复制代码
: 将该发送者的交易金额降序排列并拼接。如果总交易数 $\le$ 3,直接拼接并求和(如
A + B + C = Sum
复制代码
);如果 $>$ 3,只显示前三大金额,剩下的折叠并求和(如
A + B + C + .. X more = Sum
复制...
Full Details
刚刚做完了一场 HackerRank 笔试(是 Point72 的 OA),题目逻辑虽然不是地狱难度,但是非常考验对细节的把控,尤其是字符串拼接和格式化。在此给大家分享一下题目和避坑经验,希望能帮到接下来要考的同学!
---题目名称: SQL: Detection of Fraud Trends on the CryptoCoin Cryptocurrency Exchange
考察语言: PostgreSQL
核心需求: 找出同一发送者(sender)到同一接收者(recipient)之间发生 2 次及以上交易的序列,并按发送者进行聚合。
要求最终输出三列:
sender
复制代码
: 发送者钱包地址。
period
复制代码
: 该发送者首次和末次交易的时间段(格式要求严格)。
transactions
复制代码
: 将该发送者的交易金额降序排列并拼接。如果总交易数 $\le$ 3,直接拼接并求和(如
A + B + C = Sum
复制代码
);如果 $>$ 3,只显示前三大金额,剩下的折叠并求和(如
A + B + C + .. X more = Sum
复制代码
)。最后按总金额降序排列。💡 核心解题思路这道题用 CTE(Common Table Expressions)分步拆解最清晰:
先
GROUP BY sender, recipient HAVING COUNT(*) >= 2
复制代码
找出符合要求的“嫌疑交易对”。
把原表和嫌疑对
JOIN
复制代码
,过滤出有效的交易明细。
用窗口函数
ROW_NUMBER() OVER(PARTITION BY sender ORDER BY amount DESC, dt ASC)
复制代码
给每个发送者的金额打个排序的标。
按
sender
复制代码
聚合,计算出
MIN(dt)
复制代码
、
MAX(dt)
复制代码
、
COUNT(*)
复制代码
和
SUM(amount)
复制代码
。
针对前三大金额(
rnk <= 3
复制代码
),用
STRING_AGG
复制代码
进行降序拼接。
最后用
CASE WHEN
复制代码
将所有字符串按照要求的格式缝合在一起。🚨 究极避坑指南(全都是血泪教训)这道题极容易遇到
Wrong Answer
复制代码
,大多数情况不是你逻辑写错了,而是掉进了以下四个坑:
- 数字格式化与“幽灵空格”(PostgreSQL 专属坑)
题目要求所有浮点数必须保留 6 位小数(如
8.000000
复制代码
)。在 Postgres 中如果直接用
TO_CHAR(amount, '9999999990.000000')
复制代码
,默认会在数字前面保留一个前导空格(用来放正负号的)。这会导致和 Expected Output 对不上!
👉 解法: 必须在格式化字符串前面加上
FM
复制代码
(Fill Mode)前缀,写成
'FM9999999990.000000'
复制代码
来强制去除多余空格。
- 隐藏的时间截取需求
原始数据里的日期
dt
复制代码
是
VARCHAR(19)
复制代码
类型(长这样
2022-07-14 03:06:10
复制代码
)。但如果你仔细看 Expected Output,
period
复制代码
这一列根本不需要日期,只需要时分秒(如
00:17:47 - 23:53:49
复制代码
)。
👉 解法: 拼接时必须用
RIGHT(dt, 8)
复制代码
把
HH:MM:SS
复制代码
单独抠出来。
- 魔鬼在细节:“两个点”还是“三个点”?
在处理超过 3 笔交易的拼接逻辑时,题目要求的后缀省略号是 两个点(
-
.. X more
复制代码
),而不是通常以为的三个点( -
... X more
复制代码
)!肉眼看极容易错过,查半天不知道哪里出错了。 -
HackerRank 的 UI 视觉错觉(千万别被骗了)
跑完测试后,你可能看到自己的输出只有 4 行,而 Expected Output 有 9 行,第一反应肯定是“我是不是漏掉了数据?”。
绝对没有! 这是因为 HackerRank 的标准输出控制台把你的结果画成了一个巨大的 ASCII 表格(带
|
复制代码
和
-
复制代码
边框),导致垂直空间被撑爆,它只在屏幕上显示了前几行。旁边其实有个隐蔽的滚动条。只要你第一行的聚合总数和目标输出严丝合缝,行数就一定没问题。
第二道题目要求处理两张表:
eligibility
复制代码
(资格表,包含个人信息)和
medical
复制代码
(医疗账单表)。主要考察了以下五个方法:
init_spark_session
复制代码
(初始化 Session)
考点:Spark 基础。需配置 master 为
local
复制代码
并准确设置 AppName。
filter_medical
复制代码
(数据过滤)
考点:高效的表间过滤。
最佳实践:推荐使用
left_semi
复制代码
join。这是一种专门用于“存在性检查”(Existence Check)的 join,它只保留左表(medical)中在右表(eligibility)有匹配项的行,既不会产生重复数据,也不会把右表多余的列带过来,性能最优。
generate_full_name
复制代码
(字段丰富与拼接)
考点:字符串操作与表关联。
最佳实践:需要先 drop 掉原来的空列,通过关联 eligibility 表,使用
F.concat_ws(" ", F.col("firstName"), F.col("lastName"))
复制代码
拼接姓名,最后 select 回指定的 Schema。
find_max_paid_member
复制代码
(极值查找)
考点:排序取 Top 1 及类型安全。
最佳实践:对金额排序前,务必使用
.cast("double")
复制代码
。因为 CSV 读进来的数据默认可能是 String 类型,直接按 String 排序会导致
99.0
复制代码
大于
100.0
复制代码
这种严重的逻辑错误。
find_total_paid_amount
复制代码
(聚合计算)
考点:全表聚合与标量(Scalar)提取。
最佳实践:使用
F.sum()
复制代码
计算后,需要通过
.collect()[0][0]
复制代码
从 DataFrame 中提取出具体的 Python 数值,并转换为整型。祝大家找工顺利,Offer 多多!
About This Question
This is a candidate experience report from a point72 interview for a data science role (newgrad level) during the oa round reported in 2026.
More Point72 Interview Questions
About Point72 Interview Reports
This question was reported by a candidate who interviewed at Point72. LeakCode aggregates interview reports from 10+ sources, including 1Point3Acres, Glassdoor, LeetCode Discuss, Blind, Reddit, Indeed, and Nowcoder. Each report is translated where necessary, deduplicated against existing entries, and tagged by company, role, round type, and reporting date.
Use this question as one calibration data point, not a memorization target. Companies typically rotate their question pools every 2-4 months; the exact wording of a 2024 question may differ from what you encounter today. The underlying pattern, difficulty level, and follow-up depth at Point72 are the higher-signal extractions to take from this report.
For broader preparation context, the Point72 interview process typically includes a recruiter screen, one or two technical phone screens, and a 4-5 round on-site loop covering coding, system design (at L4+ levels), and behavioral. Reports tagged on LeakCode show the round-by-round distribution and typical difficulty calibration. To browse questions filtered by round type and seniority, use the company hub linked above.
How To Practice This Type of Question
Solve similar problems on LeetCode under timed conditions (25-35 minutes per medium difficulty). The goal is pattern recognition: recognize the underlying technique (sliding window, two-pointer, BFS, memoized recursion, etc.) within 60-90 seconds of reading. Strong candidates verbalize their hypothesis out loud before coding, then iterate based on feedback. Weak candidates dive into implementation immediately, lose time on the wrong approach, and run out of time for follow-ups.
Companies update their question pools every 2-4 months. The exact wording of any given question may have been retired by the time you interview. Focus your prep on the pattern, not the specific problem. The patterns that appear in Point72 reports consistently are the ones worth investing in; one-off niche problems are not.
During Your Point72 Round
Apply the standard interview round template: clarify requirements (2-3 minutes), state your approach out loud and confirm direction with the interviewer (3-5 minutes), code with narration (15-25 minutes), test with concrete examples including edge cases (5 minutes), discuss optimization or trade-offs if time permits (5 minutes). This template is universally accepted across FAANG and adjacent companies; deviating from it produces weaker interviewer feedback signal.
The single most predictive failure mode in Point72 reports tagged "no hire": not asking clarifying questions. Interviewers are explicitly trained to weight this. Strong candidates ask 3-5 clarifying questions even on problems that look obvious; weak candidates dive into code immediately. The clarifying-question check is often the first signal recorded in the interviewer's written notes.