Staff Data Scientist

Product analytics, experimentation, causal inference, and AI developer-tool measurement at scale.

Staff Data Scientist with 10 years of experience in product analytics, experimentation, and causal inference, including leading measurement for Meta's internal AI coding assistant used by 2,000+ engineers. Built 0→1 developer productivity metric systems, telemetry instrumentation, self-serve dashboards, and offline/online evaluation frameworks for coding models. Deep expertise in SQL, Python, A/B testing, staged rollouts, and model-quality diagnostics.

Summary

Ning Ai is a staff-level data scientist who specializes in converting complex product, experimentation, and model-behavior questions into decision-grade measurement systems. Her work spans developer productivity, telemetry architecture, large-scale A/B testing, staged rollouts, causal ML, and evaluation frameworks for AI coding models.

At Meta, she led measurement for an internal AI coding assistant, defined North Star and guardrail metrics for developer productivity, built self-serve analytics surfaces, diagnosed model failure modes, and extended offline evaluation systems across coding and search use cases. Her track record is anchored in measurable operating leverage rather than one-off analyses:

  • 0→1 productivity metrics, telemetry instrumentation, and dashboard systems
  • Experimentation depth across A/B tests, CUPED, staged rollouts, and causal inference
  • LLM and coding-model evaluation through LM-as-a-judge, human annotation, and failure-mode diagnosis
  • Cross-functional partnership with engineering, research, and product teams

Experience

AI Coding Assistant Measurement & Developer Productivity

  • Led product measurement for Meta's Code Llama-powered coding assistant.
  • Defined and tracked suggestion acceptance, edit distance, compile and test pass rates, task completion time, latency, and session productivity.
  • Diagnosed latency spikes, low-confidence suggestions, and context-window truncation in partnership with engineering and research.
  • Prototyped Python simulations for alternative suggestion-ranking strategies.
  • Contributed to a 9% improvement in developer efficiency and about 14 minutes saved per engineer per day.
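As an illustration of the ranking-strategy simulations mentioned above, the sketch below replays logged suggestion candidates under two rankers and compares top-1 acceptance. All scores, latencies, and acceptance labels are synthetic; this is not production code.

```python
import numpy as np

# Synthetic replay: each session has 5 candidate suggestions with a model
# score and a serving latency. Acceptance correlates with score but is
# penalized by latency, so a latency-aware ranker should surface better picks.
rng = np.random.default_rng(0)
n_sessions, n_candidates = 5_000, 5
model_score = rng.random((n_sessions, n_candidates))
latency_ms = rng.uniform(50, 500, (n_sessions, n_candidates))
accept_prob = model_score * (1 - latency_ms / 1_000)
accepted = rng.random((n_sessions, n_candidates)) < accept_prob

def top1_acceptance(ranking_key):
    """Acceptance rate of the candidate each ranker would show first."""
    top = ranking_key.argmax(axis=1)
    return accepted[np.arange(n_sessions), top].mean()

by_score = top1_acceptance(model_score)
by_score_and_latency = top1_acceptance(model_score * (1 - latency_ms / 1_000))
# The latency-aware ranker wins, since acceptance depends on both signals.
```

The same replay pattern extends to any candidate-level ranking change, as long as counterfactual acceptance can be modeled or logged.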

Developer Efficiency & Coding Capacity Dashboard

  • Defined North Star metrics including session productivity, net time saved, and coding capacity utilization.
  • Established guardrail metrics covering latency, regression rate, and error rate.
  • Designed telemetry events including suggestion_shown, suggestion_accepted, suggestion_edited, suggestion_deleted, compile_triggered, and test_passed.
  • Built a self-serve dashboard that reduced ad hoc requests by roughly 40%.
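A minimal sketch of how funnel metrics roll up from those telemetry events (the event log here is invented for illustration):

```python
from collections import Counter

# Hypothetical event stream using the telemetry event names above.
events = [
    {"user": "a", "event": "suggestion_shown"},
    {"user": "a", "event": "suggestion_accepted"},
    {"user": "a", "event": "suggestion_edited"},
    {"user": "b", "event": "suggestion_shown"},
    {"user": "b", "event": "suggestion_shown"},
    {"user": "b", "event": "suggestion_accepted"},
]

counts = Counter(e["event"] for e in events)

# Funnel metrics: what fraction of shown suggestions are accepted, and what
# fraction of accepted suggestions survive without post-hoc edits.
acceptance_rate = counts["suggestion_accepted"] / counts["suggestion_shown"]
unedited_share = 1 - counts["suggestion_edited"] / counts["suggestion_accepted"]
```

In production these aggregations run in SQL over the event tables; the point is that a small, well-named event taxonomy makes every downstream metric a simple ratio.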

AI Model Offline Evaluation Framework (GenAI Evals)

  • Architected a 0→1 offline evaluation system combining LM-as-a-judge scoring, human annotation, and failure-mode analysis.
  • Drove 10%+ model quality improvement and extended the framework to Meta AI Search.
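To show the shape of the scoring layer, here is a minimal sketch of how pairwise LM-as-a-judge verdicts roll up into a win rate (verdicts are invented for illustration; the real framework also routes low-confidence cases to human annotators):

```python
# Hypothetical judge verdicts: for each prompt, a judge picks the better of
# two model outputs (candidate vs. baseline), or declares a tie.
verdicts = [
    {"prompt_id": 1, "winner": "candidate"},
    {"prompt_id": 2, "winner": "baseline"},
    {"prompt_id": 3, "winner": "candidate"},
    {"prompt_id": 4, "winner": "tie"},
]

# Win rate over decided comparisons; ties are excluded from the denominator.
decided = [v for v in verdicts if v["winner"] != "tie"]
win_rate = sum(v["winner"] == "candidate" for v in decided) / len(decided)
```

Keeping the verdict schema this simple makes it easy to swap judges, audit disagreements against human labels, and slice win rates by failure mode.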

Large-Scale A/B Testing & Developer Product Experimentation

  • Designed A/B tests and staged rollouts with CUPED variance reduction.
  • Supported a 20% DAU uplift on Meta AI Search across tens of millions of users.
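The CUPED adjustment behind that variance reduction can be sketched in a few lines (synthetic data; not Meta's implementation):

```python
import numpy as np

def cuped_adjust(metric, covariate):
    """CUPED: regress out a pre-experiment covariate to reduce variance.

    metric:    post-treatment metric values, one per user
    covariate: the same metric measured pre-experiment for the same users
    """
    theta = np.cov(metric, covariate, ddof=1)[0, 1] / np.var(covariate, ddof=1)
    return metric - theta * (covariate - covariate.mean())

# Simulated example: pre-period behavior strongly predicts the post-period
# metric, which is exactly when CUPED helps most.
rng = np.random.default_rng(0)
pre = rng.normal(10, 2, size=10_000)
post = pre + rng.normal(0.5, 1, size=10_000)

adjusted = cuped_adjust(post, pre)
# Same mean, much lower variance: the same experiment detects smaller effects.
```

Because the adjustment preserves the metric's mean, treatment-effect estimates are unchanged while confidence intervals shrink.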

Causal ML & Incrementality Measurement

  • Built uplift modeling systems for 10M+ advertisers, resulting in $300M+ incremental revenue.
  • Drove $100M+ annual incremental revenue attribution across Meta's ads ecosystem.
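A toy version of the incrementality readout underlying this work: per-segment uplift as the difference between treated and held-out conversion rates (all data simulated; segment names and effect sizes are invented):

```python
import numpy as np

# Simulated randomized holdout: only segment 2 actually responds to treatment,
# so its treated-minus-control conversion gap should stand out.
rng = np.random.default_rng(1)
n = 20_000
segment = rng.integers(0, 3, size=n)           # advertiser segment 0..2
treated = rng.random(n) < 0.5                  # random holdout split
base = np.array([0.02, 0.05, 0.10])[segment]   # baseline conversion rates
lift = np.where(segment == 2, 0.03, 0.0)       # true incremental effect
converted = rng.random(n) < base + lift * treated

# Uplift per segment: treated conversion minus control conversion.
uplift = np.array([
    converted[treated & (segment == s)].mean()
    - converted[~treated & (segment == s)].mean()
    for s in range(3)
])
# Segment 2 shows the largest incremental conversion; budget shifts there.
```

The production systems replace this difference-in-means with uplift models that score individual advertisers, but the decision logic is the same: spend where the incremental effect, not the baseline rate, is highest.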

Operational ML Systems

  • Built an ML model for property tax prediction that reduced operational cost by $1M+.
  • Created a document-understanding pipeline on payoff images that achieved 98% accuracy.

Customer Segmentation & Personalization

  • Developed user segmentation and LTV-based frameworks for marketing personalization.

Impact Numbers

2,000+
Engineers Served
Internal AI coding assistant footprint
$300M+
Revenue Impact
Incremental revenue from causal ML systems
$100M+
Annual Attribution
Ads ecosystem incremental revenue attributed
9%
Developer Efficiency Gain
Measured improvement in coding assistant outcomes
~14 min
Saved Per Engineer / Day
Average efficiency lift translated into time
20%
DAU Uplift
Meta AI Search growth through experimentation
40%
Fewer Ad Hoc Requests
Dashboarding leverage for partner teams
10%+
Model Quality Improvement
Offline evaluation framework impact

Skills

Languages & Tools

Python, pandas, numpy, scikit-learn, statsmodels, SQL, Spark, R, Git

Experimentation & Causal Inference

A/B testing, staged rollouts, CUPED, CATE/DML, geo-holdout, diff-in-diff

AI / Coding Model Evaluation

LM-as-a-judge, offline eval frameworks, human annotation design, regression analysis, failure-mode diagnosis

Developer Product Analytics

Telemetry instrumentation, metric framework design, self-serve dashboards, segmentation analysis

Education

Cornell University

M.S. Applied Statistics

GPA 3.80/4.00

2016-2017

University of Liverpool

B.S. Financial Mathematics

First Class Honours, GPA 3.86/4.00

2012-2016