Why a maturity model

CRO maturity models exist. The mainstream ones are vendor-published, and they tend to share two flaws. First, the higher levels of the model conveniently require the publishing vendor's product. Second, the levels are described in terms of capabilities — what the program can do — rather than in terms of decision-making patterns — what the program actually does.

The model below is different. It is calibrated against what we see when we audit real CRO programs. The levels are about how decisions are made, not about which platform the team is on. The vendor question matters at levels 4 and 5; at levels 1-3 it is mostly noise.

Most programs we audit cluster at level 2. A few are at level 3. A handful are at level 4. We have seen one level-5 program in the past decade. The model below is descriptive of where programs actually are, not aspirational about where they could be.

Level 1: gut-feel changes

A level-1 program is not really a program. Someone — usually a marketer or a designer — has opinions about what should change on the site. The opinions get shipped. Sometimes they get measured after the fact. There is no structured testing, no statistical rigor, no hypothesis framing.

What it looks like: "We should make the CTA bigger." Two weeks later the CTA is bigger. Three weeks after that the team is debating whether conversion went up because of the CTA or because of seasonality.

How to recognize you are here: if your team cannot show you a list of tests with start dates, sample sizes, and concluded outcomes, you are at level 1, regardless of what your dashboards claim.

What it takes to move to level 2: tool selection (almost any modern testing platform will do), an explicit policy that changes to high-traffic surfaces require an A/B test, and one team member with enough statistical literacy to call BS on bad test designs.
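That statistical literacy starts with sample-size discipline. As a rough illustration, the standard closed-form approximation for a two-proportion test shows what a given lift actually costs in traffic; the function name is hypothetical and the z-values are hard-coded for the common alpha = 0.05 (two-sided), 80%-power case:

```python
from math import sqrt, ceil

def required_sample_size(p_base, p_target):
    """Visitors needed per arm to detect p_base -> p_target.

    Standard two-proportion z-test approximation, assuming a
    two-sided alpha of 0.05 and 80% power (z-values hard-coded).
    """
    z_alpha = 1.96   # two-sided, alpha = 0.05
    z_beta = 0.8416  # power = 0.80
    p_bar = (p_base + p_target) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p_base * (1 - p_base)
                                 + p_target * (1 - p_target))) ** 2
    return ceil(numerator / (p_target - p_base) ** 2)

# Detecting a lift from 2.1% to 2.4% takes roughly 38,000 visitors
# per arm, which is why low-traffic pages resist rigorous testing.
print(required_sample_size(0.021, 0.024))
```

A team member who can run this arithmetic before a test launches is the one who catches the underpowered designs.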

Level 2: structured testing without learning

A level-2 program runs structured A/B tests with reasonable statistical hygiene. Tests have predetermined sample sizes. Outcomes are recorded. Wins are shipped. Losses are documented. Aggregate site-level conversion rate slowly drifts upward.

This is where most enterprise programs are. The team is doing real work. The work is producing real lift, slowly. The hidden problem is that the tests are not connected to each other — each test is an isolated bet, and the program does not learn anything generalizable from the outcomes.

What it looks like: 40+ tests run over 18 months. Aggregate site conversion has moved from 2.1% to 2.4%. The team is busy. The CMO is starting to ask whether CRO has hit a ceiling.

How to recognize you are here: if your team cannot describe the strategic hypotheses behind the current test backlog, you are at level 2. Tests are running but they are not probing anything specific.

What it takes to move to level 3: rebuild the test backlog as a hypothesis tree. Refuse to run tests that do not connect to a tactical hypothesis. Accept that this will reduce test volume in the short term. We have written a case study on what this rebuild looks like in practice.

Level 3: hypothesis-driven program

A level-3 program has a hypothesis tree. Every test is connected to a tactical hypothesis. Every tactical hypothesis is connected to a strategic hypothesis about the buyer or the journey. The outcomes of tests update the hypotheses above them, which updates which tests come next.

This is the level where CRO starts to compound. The program is no longer optimizing the page. The program is building a model of how the buyer makes decisions, and the page is the experimental apparatus.
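As a sketch of the mechanics (the class, method names, and example hypotheses below are hypothetical, not a prescribed schema), a hypothesis tree can be as simple as nodes whose evidence rolls up from concluded tests:

```python
from dataclasses import dataclass, field

@dataclass
class Hypothesis:
    """One node in the tree: strategic at the root, tactical below it."""
    statement: str
    children: list = field(default_factory=list)
    evidence: list = field(default_factory=list)  # (test_name, supported) pairs

    def record_test(self, test_name, supported):
        """A concluded test attaches its outcome to this hypothesis."""
        self.evidence.append((test_name, supported))

    def all_evidence(self):
        """This node's evidence plus everything rolled up from below."""
        results = list(self.evidence)
        for child in self.children:
            results += child.all_evidence()
        return results

    def support(self):
        """Fraction of tests under this node that supported it (None if untested)."""
        results = self.all_evidence()
        if not results:
            return None
        return sum(1 for _, ok in results if ok) / len(results)

strategic = Hypothesis("Buyers abandon because pricing feels opaque")
tactical = Hypothesis("Showing total cost earlier lifts checkout starts")
strategic.children.append(tactical)

tactical.record_test("cost-summary-above-fold", supported=True)
tactical.record_test("price-calculator-widget", supported=False)
print(strategic.support())  # 0.5: tactical outcomes update the strategic node
```

The point is the flow of information: a test concludes, the tactical node updates, and the strategic node's weight shifts, which changes which tests are worth running next.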

[Figure: CRO program maturity. X-axis, maturity level: L1 gut-feel, L2 structured testing (most programs here), L3 hypothesis-driven, L4 pipeline-tied, L5 moat; y-axis: compounding lift over time. The L2-to-L3 transition is marked as the hardest single move.]
Fig 1. The maturity curve from level 1 to level 5. Test volume rises from level 1 to level 2 and then falls as programs move up — higher levels do fewer tests but learn more from each. Compounding lift accelerates from level 3 onward.

What it looks like: 20-25 tests in a six-month window (lower volume than level 2). Test win rate around 30-40%, lower than the inflated "vanity" win rate at level 2, because the tests now probe real uncertainty rather than safe bets. Aggregate site conversion moves materially faster. Pipeline contribution grows faster than aggregate conversion.

How to recognize you are here: a new team member joining the program can read the hypothesis tree and understand what the program is trying to learn. The PM can explain why a specific test is running this sprint and not the next.

What it takes to move to level 4: instrument the funnel below the surface metric. Tie test outcomes to pipeline at a 60-90 day lag. Build the operational muscle to roll back surface wins that do not translate downstream.

Level 4: pipeline-tied program

A level-4 program measures test outcomes against pipeline contribution, not surface metrics. Tests that win at the surface but lose at the pipeline get rolled back. Tests that lose at the surface but win at the pipeline get re-evaluated. The program is now optimizing for a measure that matters to the CFO.

This is hard. It requires instrumentation that connects anonymous visit data to closed-revenue data, often across multiple tools. It requires the team to be patient about test outcomes for 60-90 days after a test concludes. It requires CMO-level support to accept that some "wins" will be rolled back, which is politically uncomfortable.
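The core of that instrumentation is a lagged join between test-exposure records and closed pipeline from the CRM. Everything below (the record shapes, field names, and the `pipeline_per_arm` helper) is a hypothetical illustration of the lag window, not a reference implementation:

```python
from datetime import date, timedelta

# Hypothetical exposure log: which arm each visitor saw, and when.
exposures = [
    {"visitor": "v1", "arm": "control", "date": date(2024, 1, 10)},
    {"visitor": "v2", "arm": "variant", "date": date(2024, 1, 10)},
    {"visitor": "v3", "arm": "variant", "date": date(2024, 1, 12)},
]

# Closed pipeline, keyed back to visitors via CRM identity resolution.
pipeline = [
    {"visitor": "v2", "amount": 12_000, "closed": date(2024, 3, 20)},
    {"visitor": "v3", "amount": 8_000, "closed": date(2024, 6, 1)},  # outside window
]

def pipeline_per_arm(exposures, pipeline, window_days=90):
    """Average closed pipeline per exposed visitor, per arm,
    counting only deals closed within the lag window."""
    totals, counts = {}, {}
    for e in exposures:
        counts[e["arm"]] = counts.get(e["arm"], 0) + 1
        totals.setdefault(e["arm"], 0)
        for deal in pipeline:
            in_window = (e["date"] <= deal["closed"]
                         <= e["date"] + timedelta(days=window_days))
            if deal["visitor"] == e["visitor"] and in_window:
                totals[e["arm"]] += deal["amount"]
    return {arm: totals[arm] / counts[arm] for arm in totals}

# variant: 6000.0 per exposed visitor; control: 0.0
print(pipeline_per_arm(exposures, pipeline))
```

In practice the join runs across an analytics store and a CRM rather than two in-memory lists, but the decision rule is the same: a surface win whose arm does not separate on this lagged measure is a candidate for rollback.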

What it looks like: test cadence similar to level 3. Win rate at the surface lower than level 3 (more rollbacks). Pipeline contribution growing faster than aggregate conversion. CMO can show the board a chart of cumulative CRO contribution to pipeline.

How to recognize you are here: the program has rolled back a surface "win" in the past 90 days because the downstream measurement showed it did not move pipeline.

What it takes to move to level 5: the level-5 transition is qualitatively different. See below.

Level 5: program as competitive moat

A level-5 program is a competitive moat. The brand learns about its buyers faster than its competitors do. The buyer model that the program has built — across years of compounding hypothesis-driven testing — is a proprietary asset that does not exist anywhere else.

We have seen this level once. It was a consumer subscription brand that had been running level-3-and-above CRO for nine consecutive years. Their hypothesis tree had ~400 tactical hypotheses with empirical weight behind them. Their team could predict, with reasonable accuracy, how a proposed change would perform before testing it.

The competitive advantage of a level-5 program is not the next test win. It is the accumulated knowledge about how the brand's buyers actually behave. That knowledge does not transfer to a competitor when an employee leaves. The hypothesis tree is durable institutional memory.

There is no playbook to get from level 4 to level 5. It takes years of consistent level-3+ practice (nine, in the one case we have seen). You cannot accelerate this. You can only start now.

The level-5 advantage is not the next test. It is the nine-year hypothesis tree.

How to use this model

The model is descriptive, not prescriptive. The right next level for your program depends on where you are now, what your buyer cycle looks like, and how patient your leadership is.

If you are at level 1, getting to level 2 is mostly a tooling and policy decision. A quarter of focused work moves you up one level.

If you are at level 2 (where most programs live), getting to level 3 is the hardest single move on the curve. It requires giving up the freedom to test "anything interesting" and accepting that test volume will drop. The CMO has to back this for at least two quarters before the compounding benefit is visible.

If you are at level 3, getting to level 4 is mostly instrumentation work, and it pays back fast — usually within two quarters.

Level 5 is not a target. Level 5 is what happens when a level-3+ program runs for long enough.

Our team is happy to walk you through a maturity diagnostic for your specific program. Reach out if you want a second perspective on where your program actually sits and what it would take to move up one level.