A/B Testing PM Interview: How to Ace the Facebook Ads Revenue Question

Welcome to the fifth edition of PM Interview Prep Weekly! I’m Ajitesh, and this week we’re diving into one of the most technical yet business-critical areas of product management: A/B testing for revenue optimization.

The Context

A/B testing questions are particularly common when recruiting for consumer-tech PM roles at firms like Facebook, Google, and other data-driven organizations.

Having worked as a PM in both consumer tech and enterprise spaces, I’ve seen the stark differences in how A/B testing manifests. At Fitso and Sqrrl, the consumer-tech startups I was part of, A/B testing was pretty routine: almost every new feature launch, every marketing campaign. In consumer tech, data is plentiful and user verdicts come fast.

When working at Cisco and Google Cloud, enterprise territory, I found A/B testing rare. Why? You have fewer users, long development cycles, and a handful of large customers who can determine the fate of the product. Our backend APIs would go through alpha, beta, and GA (Generally Available) releases with one-year deprecation periods. There was no room for experimentation; enterprise customers demand stability. Instead, a lot of work goes in at the start to get the product and design decisions right.

But here’s what fascinated me: even within Google Cloud, our Pantheon UI team (code name for Google Cloud frontend) ran experiments constantly. While the backend stayed stable, the frontend tested everything—how to present features, where to place buttons, which workflows converted better.

Google’s culture of experimentation runs deep. There’s a famous story (possibly false, but illustrative) about a PM who changed the font color in Google Search and increased revenue by $1 billion, got promoted, then a year later someone changed it back and also got promoted. Was it novelty effect? Primacy bias? Who knows—but it shows how even tiny changes can have massive impacts when you’re operating at scale.

This dual experience taught me something crucial: while consumer-oriented companies obsess over A/B testing and metrics (rightfully so), B2B companies often undervalue these skills. But the ability to think experimentally, interpret metrics, and understand user behavior is valuable everywhere. Even in my enterprise role, understanding A/B testing principles helped me make sense of our UI team’s decisions and spot patterns in user adoption data.

The setup: Ads represent one of the trickiest three-sided marketplaces: users want relevant experiences, advertisers want ROI, and the platform wants revenue.

Facebook generates over $100 billion annually from ads—that’s roughly 97% of their revenue. When a PM at Facebook designs an A/B test, they’re not just optimizing a feature. They’re making decisions that could impact billions in revenue while balancing user experience and advertiser success.

The question: Today we’re tackling: “What are the top 3 types of A/B experiments you would run on Facebook ads to increase revenue?”

Let’s break down exactly how to nail this question and the framework that makes A/B testing interviews manageable.

P.S. One of my PM mentors from YouTube once shared: “A good PM celebrates results; a great PM questions them.” This is especially true for A/B testing. Say you run a test that shows a 3% increase in CTR—before scaling to 100% of users, question everything about it. Was it novelty effect? Did it accidentally target a specific segment? Will it sustain over time? The same scrutiny applies to negative results. At Google’s scale, even small misinterpretations can have massive consequences.

Approach to Solving A/B Testing Questions

A/B testing questions can feel overwhelming because they combine statistical concepts with business strategy.

Here’s the thing though: A/B testing, for better or worse, has been done to death by organizations across healthcare, tech, and beyond. The approach has become remarkably standardized, almost textbook. Whether you’re designing experiments at a startup, at Google, or in an interview, the approach is almost the same.

What I’m sharing today is the approach I follow, but talk to any data scientist or PM at Google, Facebook, Amazon, or any large consumer tech company, and you’ll hear similar patterns. Pick up any book on experimentation or data science, and you’ll find the same core principles.

The Three-Step Framework:

Step 1: Define Your Hypotheses

Every A/B test starts with a clear hypothesis pair:

Null Hypothesis (H₀): In A/B testing, the null hypothesis states that there is no difference between the control and variant groups. This is your default assumption until proven otherwise.

Alternative Hypothesis (H₁): The alternative hypothesis states that there is a measurable difference between the control and variant groups. This is what you’re trying to prove.

The goal is to determine whether to reject the null hypothesis in favor of the alternative with statistical significance.
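
As a concrete sketch of that decision (my own illustration, not any specific company’s tooling), the rule reduces to comparing a p-value against a pre-chosen significance level:

```python
# Minimal sketch of the hypothesis-testing decision rule (illustrative only).
ALPHA = 0.05  # conventional significance threshold

def decide(p_value: float, alpha: float = ALPHA) -> str:
    """Reject H0 only if a result at least this extreme would occur
    less than alpha of the time when the null hypothesis is true."""
    if p_value < alpha:
        return "Reject H0: the variant shows a statistically significant difference"
    return "Fail to reject H0: no detectable difference at this sample size"

print(decide(0.03))  # Reject H0 ...
print(decide(0.40))  # Fail to reject H0 ...
```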

Step 2: Apply the PICOT Methodology

The PICOT framework, commonly used in healthcare research, adapts perfectly to structure A/B testing approaches:

  • Population: Target audience for the test (e.g., mobile app users)
  • Intervention: The change or variant being tested relative to the control—a measurable product change like altering button colors or modifying checkout processes
  • Comparison: Existing product without any change (the control group)
  • Outcome: The key metrics that define intervention impact. These become your main evaluation criteria and often represent the most critical part of the interview
  • Time: Duration for running the experiment before analyzing data and reaching conclusions

Example Application: P (mobile app users in the US aged 18-35), I (increased font size in onboarding flow), C (original font size), O (completion rate of onboarding), T (2 weeks).
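
To make the structure concrete, here’s a minimal sketch of how that example could be captured as an experiment spec; the field names are mine, not any real experimentation platform’s API:

```python
from dataclasses import dataclass

@dataclass
class PicotSpec:
    """Illustrative container for a PICOT experiment definition."""
    population: str            # who is eligible for the test
    intervention: str          # the single change being tested
    comparison: str            # the control experience
    outcome_primary: str       # the metric that decides the test
    outcome_guardrails: list   # metrics that must not regress
    duration_days: int         # how long to run before analysis

onboarding_font_test = PicotSpec(
    population="US mobile app users aged 18-35",
    intervention="Increased font size in onboarding flow",
    comparison="Original font size (control)",
    outcome_primary="Onboarding completion rate",
    outcome_guardrails=["time to complete onboarding", "crash rate"],
    duration_days=14,
)
```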

Step 3: Watch for Biases and Statistical Significance

When analyzing A/B testing results, several critical considerations emerge:

Novelty Effect: Users may engage more with new or changed products initially due to curiosity, but this effect might not be sustainable long-term. For example, if a messaging app introduces animated emojis, users may send more emojis initially, but this could just be a temporary spike. Always account for novelty effects by looking for persistent changes over longer periods.

Primacy Effect: Conversely, users often prefer and stick with the original version of a product. They may resist changes or give negative feedback when a redesign first ships, simply because people prefer familiarity; long-time users are especially prone to this. When redesigning products, negative initial feedback could be a temporary primacy effect, so monitor user sentiment and behavior over a longer period.

Dealing with Interference: Care must be taken to avoid treatment groups interfering with control groups. For example, if testing a new navigation menu, users in the test group could discuss the new menu on forums seen by control group users. Without clean separation, results get skewed as the control is no longer a true baseline.

Statistical Significance: Determine whether results are statistically significant using p-values. A p-value is the probability of obtaining results at least as extreme as the observed results, assuming the null hypothesis is true. Values range from 0 to 1, and p-values below 0.05 (5%) are conventionally treated as statistically significant, i.e., sufficient evidence to reject the null hypothesis in favor of the alternative.
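
For a feel of how that p-value falls out of raw counts, here’s a back-of-the-envelope two-proportion z-test for the onboarding font-size example above, with made-up numbers:

```python
from math import sqrt
from scipy.stats import norm

# Hypothetical counts from the onboarding font-size test (not real data).
control_conv, control_n = 4_100, 10_000   # 41.0% completion
variant_conv, variant_n = 4_280, 10_000   # 42.8% completion

p_c = control_conv / control_n
p_v = variant_conv / variant_n
p_pool = (control_conv + variant_conv) / (control_n + variant_n)

# Standard error under H0 (pooled proportions), then a two-sided p-value.
se = sqrt(p_pool * (1 - p_pool) * (1 / control_n + 1 / variant_n))
z = (p_v - p_c) / se
p_value = 2 * norm.sf(abs(z))

print(f"z = {z:.2f}, p-value = {p_value:.4f}")  # ~0.01 here, so we would reject H0
```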

This framework provides structure while leaving room for creative thinking. The critical part is thinking through the right metrics and showcasing that you understand how experiments are set up.

The Case Study

Interviewer: “You’re a PM at Meta/Facebook on the Ads team. What are the top 3 types of A/B experiments you would run on Facebook ads to increase revenue?”

My Solution Using the Outlined Approach

Step 1: Clarifying Questions and Context

First, let me share my understanding of Facebook’s ads ecosystem:

Facebook generates revenue primarily through an auction system where advertisers bid to show ads to users. The key revenue metric is eCPM (effective Cost Per Mille)—how much revenue Facebook earns per thousand ad impressions. Success depends on balancing three stakeholders: users who want relevant, non-intrusive ads; advertisers who want strong ROI and reach; and Facebook, which wants to maximize revenue sustainably.

To increase revenue, we can optimize eCPM through better auction mechanics, improved ad relevance, or new ad formats that drive higher engagement and conversion.
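
For intuition on the primary metric, eCPM is simply revenue normalized per thousand impressions, so any lever that raises conversion, relevance, or bid density shows up there. A quick sketch with invented numbers:

```python
def ecpm(revenue_usd: float, impressions: int) -> float:
    """Effective cost per mille: revenue earned per 1,000 ad impressions."""
    return revenue_usd / impressions * 1_000

# Purely illustrative numbers.
print(ecpm(12_500.0, 1_000_000))   # 12.5  -> $12.50 eCPM today
print(ecpm(13_400.0, 1_000_000))   # 13.4  -> $13.40 eCPM if a variant lifts revenue ~7%
```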

My assumption: We want experiments that can drive meaningful revenue impact (double-digit percentage improvements in their segments) while preserving advertiser ROI and user experience.

Let me brainstorm potential experiments across different revenue levers:

  1. AR Try-On Ads - Let users virtually try products (makeup, glasses, clothes) using AR
  2. ML-Powered Audience Expansion - Auto-expand targeting using machine learning to find similar high-value users
  3. AI Avatar Brand Ambassadors - Interactive AI representatives that can answer questions and qualify leads
  4. Social Proof Overlays - Show real-time engagement metrics (views, purchases) on ads
  5. Sequential Retargeting Campaigns - Multi-step ad journeys that tell a story across touchpoints

Now, let me prioritize based on:

  • Implementation feasibility (can we build this in reasonable time?)
  • Revenue potential (how much lift can we expect?)
  • Strategic alignment (does it leverage Meta’s unique strengths?)

I’ll focus on the first three. While Social Proof Overlays could drive urgency and Sequential Retargeting could improve conversion through storytelling, the first three experiments better leverage Meta’s investments in AR/VR and AI and have higher long-run revenue potential, even though they are harder to implement.

Step 2: Three High-Impact Experiments

Note on metrics: Since our goal is to increase ads revenue, eCPM remains the primary metric across all experiments—it’s the direct measure of revenue per thousand impressions. What varies is the mechanism (secondary metrics) through which we achieve that eCPM lift and what could go wrong (guardrail metrics) with each approach.

Experiment 1: AR Try-On Ads (Immersive Ad Innovation)

Problem Statement: E-commerce advertisers, especially in beauty and fashion, face a fundamental challenge—users can’t physically try products before buying, leading to low conversion rates and thus lower eCPMs (effective Cost Per Mille) for these ad types.

Hypothesis Formation:

  • Null Hypothesis (H₀): AR try-on features have no impact on ad revenue
  • Alternative Hypothesis (H₁): AR try-on ads will meaningfully increase eCPM through higher conversion rates
  • Reasoning: Sizing concerns and “how will it look on me” uncertainty are my top reasons for abandoning a cart, and I suspect many shoppers are like me.

PICOT Application:

  • Population: Fashion, beauty, and eyewear advertisers currently using catalog ads, starting with top 1000 advertisers by spend
  • Intervention: Implement AR camera effects that let users virtually try products:
    • Makeup: Real-time facial tracking for lipstick, foundation
    • Eyewear: Glasses/sunglasses overlay
    • Accessories: Watches, jewelry placement
  • Comparison: 50/50 A/B split; control sees standard carousel/collection ads
  • Outcome:
    • Primary - eCPM
    • Secondary - conversion rate, time in ad
    • Guardrail - load time, crash rate
  • Time: 6 weeks to account for user adoption curve

Trade-offs: While AR drives engagement, it requires newer devices and more bandwidth. We’d need fallbacks for older phones. Implementation costs are high, but the potential for transforming e-commerce advertising is significant.

Experiment 2: ML-Powered Audience Expansion (Targeting Optimization)

Problem Statement: Many SMB advertisers use overly narrow targeting, limiting their reach and Facebook’s revenue potential. They lack the expertise to find similar high-value audiences.

Hypothesis Formation:

  • Null Hypothesis (H₀): ML audience expansion doesn’t affect revenue per advertiser
  • Alternative Hypothesis (H₁): Intelligent audience expansion will increase revenue per advertiser substantially
  • Reasoning: There’s significant untapped potential in helping SMBs reach similar audiences through machine learning, which should drive meaningful revenue growth.

PICOT Application:

  • Population: Conversion-optimized campaigns with <10K audience size (primarily SMB advertisers)
  • Intervention: ML model expands targeting by finding similar users based on:
    • Behavioral patterns across Facebook’s ecosystem
    • Purchase history and intent signals
    • Cross-advertiser learnings while preserving privacy
  • Comparison: Control maintains original targeting parameters
  • Outcome:
    • Primary - eCPM
    • Secondary - reach expansion, ROAS
    • Guardrail - advertiser opt-out rate
  • Time: 2 weeks minimum to account for ML model learning phase

Key Implementation: Show advertisers which expanded segments perform best, allow opt-out after seeing results, and maintain ROAS thresholds to preserve advertiser trust.

Experiment 3: AI Avatar Brand Ambassadors (Conversational Ads)

Problem Statement: Traditional ads are one-way communication. Users have questions but must leave Facebook to get answers, reducing conversion rates and ad effectiveness, especially for service-based businesses.

Hypothesis Formation:

  • Null Hypothesis (H₀): AI avatars don’t impact ad revenue
  • Alternative Hypothesis (H₁): Conversational AI avatars will drive higher eCPM through better engagement and lead quality
  • Reasoning: Service-based advertisers need qualification conversations. AI avatars providing this at scale should drive significant revenue improvements through better lead quality.

PICOT Application:

  • Population: Service advertisers (insurance, education, travel, real estate) currently using lead generation ads
  • Intervention: AI-powered brand representatives that can:
    • Answer product questions in real-time
    • Provide personalized recommendations based on user inputs
    • Book appointments or schedule demos directly
    • Remember conversation context for follow-up interactions
  • Comparison: Standard lead generation ads with forms
  • Outcome:
    • Primary - eCPM
    • Secondary - lead quality score, conversation completion
    • Guardrail - inappropriate responses, brand safety
  • Time: 8 weeks to allow proper AI model training and user adoption

Implementation Details: Start with rule-based responses and evolve to LLM capabilities, maintain human fallback for complex queries, and enforce strict brand voice guidelines.

Step 3: Watch for Biases and Statistical Significance

Critical Biases to Consider:

For AR Try-On Ads:

  • Novelty Effect: Users might initially engage heavily with AR features out of curiosity. Monitor for 6+ weeks to see if engagement sustains beyond the novelty phase
  • Selection Bias: Early adopters using AR might already be high-intent buyers

For ML Audience Expansion:

  • Survivorship Bias: Advertisers who opt out early might differ from those who stay, skewing results
  • Primacy Effect: SMB advertisers might resist the change initially, preferring their manual targeting

For AI Avatar Brand Ambassadors:

  • Novelty Effect: Initial fascination with AI conversations could inflate early metrics
  • Interference: Users might share their AI chat experiences on social media, influencing control group behavior

Statistical Significance Considerations:

  • Need sufficient sample size (likely 1000+ advertisers per experiment) to detect meaningful eCPM lifts; a back-of-the-envelope sizing sketch follows this list
  • Segment analysis by device type, advertiser size, and user demographics to understand true impact
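
To make the sample-size point tangible, here’s a rough power calculation for a conversion-style metric using the standard two-proportion approximation; the baseline rate and target lift are invented for illustration:

```python
from math import ceil
from scipy.stats import norm

def sample_size_per_group(p_base: float, rel_lift: float,
                          alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate observations needed per group to detect a relative lift
    in a baseline proportion with a two-sided test at the given power."""
    p_var = p_base * (1 + rel_lift)
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    var_sum = p_base * (1 - p_base) + p_var * (1 - p_var)
    return ceil(var_sum * (z_alpha + z_beta) ** 2 / (p_base - p_var) ** 2)

# Hypothetical: 2% baseline conversion, hoping to detect a 5% relative lift.
print(sample_size_per_group(0.02, 0.05))  # on the order of a few hundred thousand per group
```

The smaller the baseline rate and the smaller the lift you care about, the larger the required sample, which is exactly why resolving modest eCPM changes takes Facebook-scale traffic.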

Revenue Impact: While exact projections require historical data, I’d prioritize ML expansion for the quickest validation, followed by AR ads, then AI avatars, based on implementation complexity.

This concludes the main framework application. The interviewer might ask about specific biases, sample size calculations, or how you’d handle conflicting results across segments.

How to Excel in This Case

  • Brainstorm before narrowing: Always generate 4-5 experiment ideas before selecting your test. This keeps you from getting locked into a weak experiment that nobody cares about.

  • Follow PICOT with discipline: Don’t skip steps even if it feels tedious. It has to be done systematically, whether in an interview or in real life.

  • Show industry knowledge: Mentioning Meta’s AR/VR and AI investments isn’t name-dropping—it’s demonstrating product sense. Know what the company is betting on and align your experiments accordingly.

  • Bring real experience: When I mentioned “sizing concerns are one of the prime reasons for cart abandonment for me personally,” that’s authentic. Share your actual observations from running experiments or being a user. Interviewers want to see you’ve lived this, not just studied it.

  • Question everything: Always discuss novelty effect, primacy bias, and other confounding factors. Say things like “The 3% CTR increase could be novelty effect, so I’d monitor for 4+ weeks to see if it sustains.” This shows maturity in interpreting results.

Common Pitfalls to Avoid

  • Jumping to one solution: Going straight to “I’ll test AR ads” without exploring alternatives shows narrow thinking. Always brainstorm multiple options first.

  • Being too theoretical: Don’t recite textbook definitions. This isn’t an academic exercise—they want to know you can run real experiments. Connect to actual products and experiences.

  • Ignoring trade-offs and biases: Not mentioning novelty effect, selection bias, or interference between test groups suggests you haven’t actually run experiments at scale.

Practice This Case

Want to try this A/B testing case yourself with an AI interviewer that challenges your experimental design and provides detailed statistical feedback?

Practice here: PM Interview: A/B Testing - Facebook Ads Revenue

The AI interviewer will push you on your statistical assumptions, challenge your business logic, and test whether you can balance multiple stakeholder needs—just like a real Facebook ads team PM would.

Further Reading

Explore more A/B testing resources I’ve created:

PM Tool of the Week: Mixpanel

As PMs running A/B tests, we need analytics that actually work. This week, I’m sharing Mixpanel—been using it since Fitso, now at Tough Tongue AI.

Here’s why I like it:

  • User-level tracking: See exactly which users did what in your test vs control groups
  • Built-in experiments report: Automatic lift and confidence calculations—no Excel gymnastics
  • Instant segmentation: Slice test results by any property to find hidden patterns

For years, Mixpanel was the only tool doing user-level analytics while Google Analytics was stuck in aggregate-land. Still my go-to for A/B test analysis.

Got an analytics tool you swear by? Hit reply and tell me about it!


What experiments would you run differently? What unique angles would you bring to Facebook’s ads revenue challenge? Hit reply—different perspectives on these analytical challenges always teach me something new!


About PM Interview Prep Weekly

Every Monday, get one complete PM case study with:

  • Detailed solution walkthrough from an ex-Google PM perspective
  • AI interview partner to practice with
  • Insights on what interviewers actually look for
  • Real examples from FAANG interviews

No fluff. No outdated advice. Just practical prep that works.

— Ajitesh
CEO & Co-founder, Tough Tongue AI
Ex-Google PM (Gemini)
LinkedIn | Twitter

