AI Coding Assistant Measurement & Developer Productivity
- Led product measurement for Meta's Code Llama-powered coding assistant.
- Defined and tracked suggestion acceptance rate, edit distance, compile and test pass rates, task completion time, latency, and session productivity.
- Diagnosed latency spikes, low-confidence suggestions, and context-window truncation in partnership with engineering and research.
- Prototyped Python simulations of alternative suggestion-ranking strategies (sketched below).
- Contributed to a 9% improvement in developer efficiency and about 14 minutes saved per engineer per day.
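A minimal sketch of the kind of offline replay used to compare suggestion-ranking strategies; the session and candidate field names and the example scoring function are assumptions for illustration, not the production system.

```python
# Illustrative replay of logged sessions to compare ranking strategies.
# Field names (candidates, id, accepted_id, confidence, expected_edit_distance)
# are hypothetical placeholders, not an actual telemetry schema.

def simulate_acceptance(logged_sessions, rank_fn, top_k=3):
    """Re-rank each session's candidate suggestions with rank_fn and estimate
    the acceptance rate if the suggestion the user actually accepted appears
    in the top_k re-ranked slots."""
    shown = accepted = 0
    for session in logged_sessions:
        reranked = sorted(session["candidates"], key=rank_fn, reverse=True)
        shown += 1
        if session.get("accepted_id") in {c["id"] for c in reranked[:top_k]}:
            accepted += 1
    return accepted / shown if shown else 0.0

# Example strategy: trade off model confidence against expected edit distance.
def confidence_minus_edit(candidate):
    return candidate["confidence"] - 0.2 * candidate["expected_edit_distance"]
```

Comparing `simulate_acceptance(sessions, strategy)` across candidate strategies gives a cheap offline signal before committing to an online test.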
Developer Efficiency & Coding Capacity Dashboard
- Defined North Star metrics including session productivity, net time saved, and coding capacity utilization.
- Established guardrail metrics covering latency, regression rate, and error rate.
- Designed telemetry events including suggestion_shown, suggestion_accepted, suggestion_edited, suggestion_deleted, compile_triggered, and test_passed (metric derivation sketched below).
- Built a self-serve dashboard that reduced ad hoc requests by roughly 40%.
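A minimal sketch, assuming a flat event log keyed by the event names above, of how the dashboard's funnel metrics could be derived; the specific ratios are illustrative, not the exact metric definitions.

```python
from collections import Counter

def funnel_metrics(events):
    """Compute suggestion-funnel metrics from lifecycle events.
    `events` is an iterable of dicts with an 'event' field such as
    'suggestion_shown', 'suggestion_accepted', or 'suggestion_edited'."""
    counts = Counter(e["event"] for e in events)
    shown = counts["suggestion_shown"] or 1  # guard against division by zero
    return {
        "acceptance_rate": counts["suggestion_accepted"] / shown,
        "edit_rate": counts["suggestion_edited"] / max(counts["suggestion_accepted"], 1),
        "deletion_rate": counts["suggestion_deleted"] / max(counts["suggestion_accepted"], 1),
        "test_pass_rate": counts["test_passed"] / max(counts["compile_triggered"], 1),
    }
```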
AI Model Offline Evaluation Framework (GenAI Evals)
- Architected a 0→1 offline evaluation system combining LM-as-a-judge scoring (sketched below), human annotation, and failure-mode analysis.
- Drove a 10%+ improvement in model quality and extended the framework to Meta AI Search.
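An illustrative sketch of LM-as-a-judge scoring of the kind such a framework relies on; the rubric prompt, judge client, and score parsing here are assumptions rather than the internal implementation.

```python
# Hypothetical rubric prompt for a judge model; real rubrics are task-specific.
JUDGE_PROMPT = (
    "Rate the assistant response from 1 (poor) to 5 (excellent) for "
    "correctness and helpfulness.\n\nPrompt: {prompt}\nResponse: {response}\nScore:"
)

def judge_score(prompt, response, call_judge_model):
    """call_judge_model is any callable that sends a prompt string to a judge
    LLM and returns its text completion (placeholder, not a specific API)."""
    completion = call_judge_model(JUDGE_PROMPT.format(prompt=prompt, response=response))
    try:
        return max(1, min(5, int(completion.strip()[0])))
    except (ValueError, IndexError):
        return None  # unparseable judgement; route to human annotation instead

def mean_quality(eval_set, call_judge_model):
    """Average judge score over an offline eval set of prompt/response pairs."""
    scores = [judge_score(ex["prompt"], ex["response"], call_judge_model) for ex in eval_set]
    scored = [s for s in scores if s is not None]
    return sum(scored) / len(scored) if scored else float("nan")
```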
Large-Scale A/B Testing & Developer Product Experimentation
- Designed A/B tests and staged rollouts with CUPED variance reduction (sketched below).
- Supported a 20% DAU uplift on Meta AI Search across tens of millions of users.
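CUPED adjusts the experiment metric with a pre-experiment covariate to shrink variance; a minimal sketch, with hypothetical column names.

```python
import numpy as np

def cuped_adjust(metric, pre_metric):
    """Return the CUPED-adjusted metric y - theta * (x - mean(x)),
    where theta = cov(x, y) / var(x) and x is the pre-period covariate."""
    x = np.asarray(pre_metric, dtype=float)
    y = np.asarray(metric, dtype=float)
    theta = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    return y - theta * (x - x.mean())

# Usage: compare variance before and after adjustment within one experiment arm.
# adjusted = cuped_adjust(df["dau_during_test"], df["dau_pre_period"])
# print(adjusted.var(), df["dau_during_test"].var())
```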
Causal ML & Incrementality Measurement
- Built uplift modeling systems (sketched below) for 10M+ advertisers, resulting in $300M+ in incremental revenue.
- Drove attribution of $100M+ in annual incremental revenue across Meta's ads ecosystem.
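A sketch of a two-model (T-learner) uplift estimator, one common way to build the kind of incrementality model described above; the feature set, model choice, and array layout are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def fit_t_learner(X, treated, converted):
    """Fit separate outcome models on treated and control users and return a
    function predicting uplift = P(convert | treated) - P(convert | control)."""
    X = np.asarray(X)
    treated = np.asarray(treated)
    converted = np.asarray(converted)
    model_t = GradientBoostingClassifier().fit(X[treated == 1], converted[treated == 1])
    model_c = GradientBoostingClassifier().fit(X[treated == 0], converted[treated == 0])

    def predict_uplift(X_new):
        return model_t.predict_proba(X_new)[:, 1] - model_c.predict_proba(X_new)[:, 1]

    return predict_uplift
```

Users with the highest predicted uplift are the ones for whom exposure is most incremental, which is what the attributed revenue estimates rest on.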