How to Prioritize Experiments: ICE, PIE, and RICE Frameworks Compared
Most growth teams have more experiment ideas than they have traffic or engineering bandwidth to run them. Prioritization frameworks prevent the loudest voice in the room from winning. ICE, PIE, and RICE are the three most widely used scoring systems; here's how each works and what each one gets right and wrong.
ICE: the fastest prioritization method
ICE stands for Impact, Confidence, Ease. Each experiment idea is scored 1-10 on each dimension, and the scores are multiplied (or averaged, depending on your team's preference). Impact: how large a metric improvement would a successful version of this experiment produce? Confidence: how confident are you that this experiment will produce a positive result, based on data, user research, or proven patterns? Ease: how simple is this to implement (low engineering effort, no design work, can be done this sprint)? ICE is fast to run (a team can score 20 ideas in 30 minutes) and gives a rough prioritization that's better than gut feel. Its weakness: it doesn't explicitly account for the volume of users affected, which means a high-scoring ICE experiment might actually affect very few users if it's deep in the product funnel.
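As a rough illustration, here is a minimal Python sketch of multiplicative ICE scoring for a small backlog. The idea names and scores are hypothetical, and as noted above some teams average the three scores instead of multiplying them.

```python
# Minimal ICE scoring sketch: each idea gets 1-10 scores for
# Impact, Confidence, and Ease, and the product ranks the backlog.
# Idea names and scores are hypothetical example data.
ideas = [
    # (name, impact, confidence, ease)
    ("Shorter signup form", 7, 6, 8),
    ("New pricing page hero", 8, 4, 5),
    ("Settings page copy tweak", 3, 7, 9),
]

def ice_score(impact: int, confidence: int, ease: int) -> int:
    return impact * confidence * ease  # some teams average instead

ranked = sorted(ideas, key=lambda i: ice_score(*i[1:]), reverse=True)
for name, impact, confidence, ease in ranked:
    print(f"{name}: ICE = {ice_score(impact, confidence, ease)}")
```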
PIE: ICE with reach built in
PIE stands for Potential, Importance, Ease. Potential is similar to ICE's Impact but focuses on how much improvement there is relative to the current state (high-potential experiments are on pages or flows that are currently underperforming their benchmark). Importance captures how important the affected user segment is: an experiment on your highest-traffic onboarding screen is more important than one on your account settings page, even if both have similar conversion improvement potential. Ease is the same as ICE. PIE was developed by Widerfunnel and is especially useful for teams running CRO experiments on marketing and landing pages where page-level traffic data makes Importance easy to quantify.
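PIE is scored the same way mechanically as ICE, so the interesting part is grounding Importance in traffic data. One possible way to do that, sketched below under made-up assumptions: rank pages by their share of monthly sessions and map that share onto a 1-10 Importance score. The page names, traffic numbers, and mapping are illustrative, not a prescribed method.

```python
# Illustrative only: turn page-level traffic data into a 1-10
# Importance score for PIE by looking at each page's share of sessions.
# Traffic numbers and the scaling rule are hypothetical assumptions.
monthly_sessions = {
    "Onboarding screen": 42_000,
    "Pricing page": 18_000,
    "Account settings": 2_500,
}

total = sum(monthly_sessions.values())

def importance_score(sessions: int) -> int:
    share = sessions / total
    # Map traffic share onto a 1-10 scale, clamped at both ends.
    return max(1, min(10, round(share * 10)))

for page, sessions in monthly_sessions.items():
    print(f"{page}: Importance = {importance_score(sessions)} ({sessions:,} sessions/month)")
```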
RICE: the most rigorous framework
RICE stands for Reach, Impact, Confidence, Effort. Reach is the number of users affected per time period, not a 1-10 score but an actual estimate: 'this experiment affects users in the onboarding flow, which 2,000 users per month go through.' Impact is a multiplier (0.25, 0.5, 1, 2, 3) for how much this moves the needle per user affected. Confidence is a percentage (100% = definitive data, 80% = strong data, 50% = gut feel). Effort is person-months of engineering and design work. RICE score = (Reach × Impact × Confidence) / Effort. The output is comparable across experiments: a RICE score of 1,200 vs. 350 gives you a clear ranking. The tradeoff: RICE takes longer to score carefully, and the estimates for Reach and Impact require real data to be meaningful.
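A minimal sketch of the RICE formula above, with Confidence expressed as a fraction and Effort in person-months. The example experiments and their numbers are hypothetical.

```python
# Minimal RICE sketch using the formula from the article:
# RICE = (Reach * Impact * Confidence) / Effort.
# Reach is users per period, Impact a 0.25-3 multiplier, Confidence a
# fraction (1.0 = definitive data), Effort in person-months.
# The experiments and numbers below are hypothetical.
experiments = [
    # (name, reach, impact, confidence, effort)
    ("Onboarding checklist", 2_000, 2, 0.8, 2.0),
    ("Pricing page redesign", 6_000, 1, 0.5, 4.0),
    ("Settings copy tweak", 300, 0.5, 0.8, 0.25),
]

def rice_score(reach: float, impact: float, confidence: float, effort: float) -> float:
    return (reach * impact * confidence) / effort

for name, *params in sorted(experiments, key=lambda e: rice_score(*e[1:]), reverse=True):
    print(f"{name}: RICE = {rice_score(*params):,.0f}")
```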
Which framework should your team use?
The right choice depends on your stage and experimentation velocity. Early-stage (under 5 experiments/month): use ICE or PIE; the overhead of RICE isn't worth it, and the main benefit is simply making the prioritization conversation explicit rather than gut-driven. Mid-stage (5-20 experiments/month): add a RICE-style Reach estimate to your ICE scores. Just having a column for 'users affected per month' dramatically improves prioritization quality. High-velocity (20+ experiments/month): implement full RICE with real Reach data pulled from your analytics tool. At this velocity, the difference between high-RICE and low-RICE experiments meaningfully compounds over time. Regardless of framework: always track the outcome of prioritization decisions. If your top-scored experiments consistently underperform your lower-scored ones, your scoring calibration is off.
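The mid-stage hybrid is easy to operationalize: keep ICE scoring, but carry a 'users affected per month' column next to it so low-traffic ideas stand out even when their ICE score looks strong. A minimal sketch, with hypothetical ideas, scores, and reach figures:

```python
# ICE scoring with a Reach column carried alongside the score,
# so a high-ICE idea with tiny reach is visible at a glance.
# Idea names, scores, and reach figures are hypothetical.
backlog = [
    # (name, impact, confidence, ease, users_per_month)
    ("Checkout trust badges", 8, 7, 8, 9_000),
    ("Referral email subject", 7, 6, 9, 1_200),
    ("Invoice page tooltip", 9, 8, 9, 150),
]

for name, impact, conf, ease, reach in sorted(
    backlog, key=lambda x: x[1] * x[2] * x[3], reverse=True
):
    print(f"{name}: ICE = {impact * conf * ease}, reach = {reach:,}/month")
```

In this made-up backlog the invoice tooltip wins on ICE but reaches only 150 users a month, which is exactly the blind spot the Reach column is there to expose.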
Avoiding the most common prioritization mistakes
The most common mistake with prioritization frameworks is assigning scores that reflect personal enthusiasm rather than actual data. Impact scores of 8-10 go to ideas the team is excited about, and low scores go to ideas they're skeptical of; the framework becomes a rationalization of pre-existing preferences rather than an objective ranking tool. Two practices counteract this: have multiple team members score ideas independently before comparing scores (disagreements reveal where assumptions differ), and require that any score above 7 for Impact or Confidence be backed by a specific data point or research source. The discipline of sourcing your scores is most of the value of the framework.
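One way to make the independent-scoring practice concrete is to collect each person's scores separately and flag the ideas where scorers disagree most. A minimal sketch; the scorers, ideas, scores, and the spread threshold are all hypothetical.

```python
# Flag experiment ideas where independent scorers disagree, using the
# spread (max - min) of their Impact scores as a simple signal that
# assumptions differ and the idea needs discussion before final scoring.
# Scorers, ideas, and scores are hypothetical example data.
impact_scores = {
    "Shorter signup form": {"pm": 8, "designer": 7, "analyst": 8},
    "New pricing page hero": {"pm": 9, "designer": 4, "analyst": 5},
    "Settings copy tweak": {"pm": 3, "designer": 4, "analyst": 3},
}

DISAGREEMENT_THRESHOLD = 3  # a spread of 3+ points is worth discussing

for idea, scores in impact_scores.items():
    spread = max(scores.values()) - min(scores.values())
    if spread >= DISAGREEMENT_THRESHOLD:
        print(f"Discuss before scoring: {idea} (spread = {spread}, scores = {scores})")
```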
Frequently asked questions
What is the ICE scoring model for experiments?
ICE stands for Impact, Confidence, and Ease. Each experiment idea is scored 1-10 on each dimension: Impact (how large a metric improvement would success produce), Confidence (how likely is this to work based on data or evidence), and Ease (how little effort is required to implement and run it). The three scores are multiplied together to produce a priority score. ICE is simple, fast, and better than gut-feel prioritization; its main limitation is that it doesn't account for the volume of users affected.
What is the RICE framework for prioritization?
RICE stands for Reach, Impact, Confidence, and Effort. Unlike ICE's 1-10 scoring, RICE uses actual estimates: Reach (users affected per time period), Impact (a multiplier for how much this moves the needle per user), Confidence (a percentage reflecting evidence quality), and Effort (person-months of work). RICE score = (Reach × Impact × Confidence) / Effort. RICE produces more rigorous prioritization but requires real data to be meaningful.
How do you build an experiment backlog?
An experiment backlog should contain: a hypothesis statement ('We believe that [change] will cause [outcome] because [evidence]'), the primary metric and guardrail metrics for the test, an estimated sample size requirement, an effort estimate, and a priority score. Review the backlog in a weekly experimentation meeting, promote the highest-priority ready items to 'in flight,' and retrospectively tag each completed experiment with the actual outcome. The historical record of what worked and didn't is as valuable as the current backlog.
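To make those backlog fields concrete, here is a minimal sketch of one possible way to represent a backlog entry in Python. The field names, statuses, and example entry are illustrative assumptions, not a prescribed schema.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ExperimentBacklogItem:
    """One backlog entry with the fields described above (illustrative)."""
    hypothesis: str            # "We believe that [change] will cause [outcome] because [evidence]"
    primary_metric: str
    guardrail_metrics: list[str]
    estimated_sample_size: int
    effort_estimate: str       # e.g. person-weeks or a t-shirt size
    priority_score: float      # ICE, PIE, or RICE score
    status: str = "backlog"    # backlog -> in flight -> completed
    actual_outcome: Optional[str] = None  # filled in retrospectively

# Hypothetical example entry
item = ExperimentBacklogItem(
    hypothesis="We believe that shortening the signup form will lift activation because exit surveys cite form length",
    primary_metric="activation rate",
    guardrail_metrics=["signup completion time", "support tickets"],
    estimated_sample_size=12_000,
    effort_estimate="1 person-week",
    priority_score=448,
)
print(item.hypothesis)
```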