How the industry got here

The conversion-rate-optimization vendor market is dominated by experimentation platforms — Optimizely, VWO, AB Tasty, Convert, and similar. These platforms run statistical tests on website variants. They are good at what they do. They are also, structurally, only able to address one of the seven disciplines that make up a complete CRO function.

The market has shaped what most enterprise teams understand "CRO" to mean. Buy an experimentation platform, hire a CRO manager who knows the platform, run tests, report results. This is CRO-as-experimentation. It is the most expensive lossy abstraction in the digital optimization market: it makes complete sense as a vendor offering and almost no sense as an operating model for sustained lift.

A complete CRO program has seven disciplines. The discipline most programs invest in (experimentation) is the fourth of the seven. Investing in the fourth discipline without the first three is why most CRO programs produce 4-8% aggregate lift per year, which decays over time as the easy wins are exhausted. Programs that invest in all seven produce sustained compounding lift because each discipline reinforces the others.

The seven disciplines

Research. Where does opportunity actually exist? This includes quantitative funnel analysis to find drop-off points; qualitative work to understand why the drop-off happens; competitive teardowns; customer interviews. Research is what turns hypotheses into grounded claims instead of guesses. Programs without research generate hypothesis lists from "things the team thinks would work."

Hypothesis development. Translating research insights into testable claims structured as hypothesis trees, where strategic claims branch into tactical claims, which in turn branch into experiments. Our piece on running a 6-month CRO program walks through the hypothesis-tree structure in more depth. Programs without hypothesis development run experiments that are not connected to each other, so wins do not compound.
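As an illustration of the structure, a hypothesis tree can be sketched as a small nested data type: strategic claims hold tactical claims, tactical claims hold concrete experiments. Everything in the sketch below, including the claim text and field names, is hypothetical and only meant to show the shape.

```python
from __future__ import annotations
from dataclasses import dataclass, field

# Sketch of a hypothesis tree: strategic claims branch into tactical claims,
# which branch into concrete experiments. Names, fields, and example claims
# are hypothetical, chosen only to show the structure.

@dataclass
class Claim:
    statement: str                                        # the testable claim at this level
    evidence: list[str] = field(default_factory=list)     # research that motivated it
    children: list[Claim] = field(default_factory=list)   # narrower claims or experiments

def experiments(node: Claim) -> list[str]:
    """Collect the leaf-level experiments under any claim."""
    if not node.children:
        return [node.statement]
    return [leaf for child in node.children for leaf in experiments(child)]

tree = Claim(
    statement="Checkout friction is the largest recoverable drop-off",
    evidence=["funnel analysis: exit spike at the payment step", "customer interviews"],
    children=[
        Claim(
            statement="Reducing perceived payment risk lifts completion",
            children=[
                Claim("Experiment: show accepted payment logos above the form"),
                Claim("Experiment: move the guarantee copy next to the CTA"),
            ],
        ),
    ],
)

print(experiments(tree))
```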

[Figure: The seven disciplines of CRO — 01 Research, 02 Hypothesis, 03 Instrumentation, 04 Experimentation, 05 Analysis, 06 Operationalization, 07 Governance. Disciplines 1-3 set the ceiling of disciplines 4-7; most programs over-invest in discipline 4 and under-invest in 1-3 by a factor of 2-3x.]
Fig 1. The seven disciplines of CRO. The fourth (experimentation) is what the vendor market has trained the industry to focus on. The other six are where durable lift comes from.

Instrumentation. The measurement substrate that makes outcomes legible. This is the layer between "we shipped a change" and "we know what it did." Tag management, event design, server-side measurement, identity resolution, attribution model selection. Programs with weak instrumentation cannot trust their own results regardless of how rigorous the rest of the program is.
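Part of that layer is event design: stable event names, explicit properties, and an identity key that lets analysis join sessions to users. A minimal sketch follows, with hypothetical event and field names that are not tied to any particular tag manager or analytics vendor.

```python
from __future__ import annotations
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

# Minimal sketch of a designed event: a stable name, an explicit schema, and
# identity fields that let downstream analysis resolve sessions to users.
# Event and field names are hypothetical, not part of any vendor's API.

@dataclass(frozen=True)
class CheckoutStepViewed:
    anonymous_id: str                 # device/session identity, resolved to a user later
    user_id: str | None               # authenticated identity, if known
    step: str                         # e.g. "shipping", "payment"
    experiment_variant: str | None    # which variant the visitor was bucketed into
    occurred_at: str                  # ISO-8601 timestamp, set server-side when possible

def build_event(anonymous_id: str, step: str, user_id: str | None = None,
                experiment_variant: str | None = None) -> dict:
    """Serialize the event for whatever collection endpoint the stack uses."""
    return asdict(CheckoutStepViewed(
        anonymous_id=anonymous_id,
        user_id=user_id,
        step=step,
        experiment_variant=experiment_variant,
        occurred_at=datetime.now(timezone.utc).isoformat(),
    ))

print(build_event(anonymous_id="anon-123", step="payment"))
```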

Experimentation. The discipline most programs treat as the whole job. Running A/B and multivariate tests with appropriate statistical rigor. Sample size calculation, sequential testing methods, multiple testing correction. This discipline is well-understood; the vendor platforms support it well. It is also entirely downstream of disciplines 1-3.
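As one example of the rigor involved, the per-arm sample size for a two-sided, two-proportion z-test can be computed from the normal quantiles alone. The sketch below uses only the Python standard library; the baseline and target rates are placeholders, not benchmarks.

```python
import math
from statistics import NormalDist

def sample_size_per_arm(p_baseline: float, p_variant: float,
                        alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-arm sample size for a two-sided, two-proportion z-test.

    Standard normal-approximation formula:
    n = (z_{1-alpha/2} + z_{power})^2 * (p1(1-p1) + p2(1-p2)) / (p1 - p2)^2
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p_baseline * (1 - p_baseline) + p_variant * (1 - p_variant)
    effect = p_variant - p_baseline
    return math.ceil((z_alpha + z_power) ** 2 * variance / effect ** 2)

# Placeholder numbers: detecting a lift from a 3.0% to a 3.3% conversion rate
# at alpha = 0.05 and 80% power needs a little over 53,000 visitors per arm.
print(sample_size_per_arm(0.030, 0.033))
```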

Analysis. Distinct from the statistical analysis of an individual experiment. This is the deeper read of what the test result means about underlying customer behavior, what to test next, and whether the result generalizes beyond the tested context. Most experiment results have three readings: one that the team's first instinct produces and two that emerge under closer analysis. Programs that ship from the first reading miss the structural insights.

Operationalization. Taking wins from experiment to production at scale, with the organizational infrastructure to maintain them. A test wins in the experimentation platform; making the win live for 100% of users in a way that compounds with future wins is a separate discipline. Most "tests we never shipped" failures are operationalization failures.
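In practice this looks less like the experimentation platform and more like release engineering: the winning variant graduates from a 50/50 split to a ramped default behind whatever flagging or configuration system the team already runs. The sketch below is hypothetical; the flag names, fields, and rollout stages are illustrative, not taken from any specific tool.

```python
# Hypothetical sketch of promoting an experiment win to production. "flags"
# stands in for whatever feature-flag or config system the team already uses;
# flag names, fields, and rollout stages are illustrative only.

ROLLOUT_STAGES = (0.25, 0.50, 1.00)   # staged ramp instead of an overnight flip

def promote_winner(flags: dict, experiment_key: str, winning_variant: str) -> dict:
    """Retire the experiment split and start ramping the winner as the default."""
    updated = dict(flags)
    updated[experiment_key] = {
        "status": "ramping",
        "default_variant": winning_variant,
        "rollout_fraction": ROLLOUT_STAGES[0],   # advanced stage by stage with monitoring
    }
    return updated

flags = {"checkout_payment_logos": {"status": "experiment",
                                    "split": {"control": 0.5, "logos": 0.5}}}
flags = promote_winner(flags, "checkout_payment_logos", "logos")
print(flags)
```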

Governance. The portfolio-level oversight that prevents the program from drifting. Prioritization frameworks, kill-the-bad-tests discipline, peer review of methodology, cross-team coordination so that one team's test does not break another team's measurement. Programs without governance produce a long tail of tests in which the bottom half of the tail destroys value.
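One concrete governance artifact is a prioritization score that gives the portfolio review a single consistent basis for ordering the backlog and killing weak proposals. The sketch below is one illustrative scoring scheme, not a standard framework; every field, weight, and proposal in it is hypothetical.

```python
from dataclasses import dataclass

# Illustrative governance sketch: score the proposed-test backlog so portfolio
# review has one consistent basis for ordering and killing tests. The fields
# and the scoring rule are hypothetical, not a standard framework.

@dataclass
class Proposal:
    name: str
    expected_lift: float     # relative lift if the hypothesis holds
    confidence: float        # 0-1, strength of the supporting research
    weekly_traffic: int      # eligible visitors per week
    build_weeks: float       # engineering cost to run the test

def score(p: Proposal) -> float:
    """Rough expected value per week of build effort."""
    return (p.expected_lift * p.confidence * p.weekly_traffic) / p.build_weeks

backlog = [
    Proposal("payment logos above the form", 0.02, 0.7, 40_000, 1.0),
    Proposal("homepage hero rewrite", 0.05, 0.2, 120_000, 3.0),
]

for p in sorted(backlog, key=score, reverse=True):
    print(f"{p.name}: {score(p):,.0f}")
```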

The most expensive lossy abstraction in the digital optimization market: CRO collapsed to experimentation.

Why the hierarchy matters

The disciplines are hierarchical in a specific sense: lower disciplines determine the ceiling of higher disciplines. A team with weak research will pick the wrong things to test, no matter how rigorous the experimentation. A team with weak instrumentation cannot trust the experiment results, no matter how good the research. A team with weak operationalization cannot ship the wins to production, no matter how clean the experimentation.

This means investing in disciplines 4-7 without first investing in 1-3 produces structurally limited returns. We have seen programs spend $400K/year on experimentation platforms and CRO managers, while spending $0 on research and instrumentation, and produce 3% aggregate lift. The same program with $100K of research and instrumentation investment would have produced 12-15% aggregate lift on the same experimentation budget, because the tests would have been pointed at things that actually mattered.

The inverse is also worth being specific about: investing heavily in disciplines 1-3 without 4-7 produces insights that never reach production. Research without experimentation produces a deck. Instrumentation without analysis produces dashboards. Operationalization without governance produces drift. The disciplines reinforce each other; programs that pick three of seven get punished by the missing four.

What a balanced program looks like

A balanced CRO program invests in all seven disciplines, with the relative weights shifting based on the program's maturity stage. Our piece on the CRO maturity model walks through the maturity progression in more depth, but the discipline weight pattern is roughly:

Early-stage programs (level 1-2 maturity) deliberately overweight research and instrumentation, because the program's ceiling is set by what it can know about customer behavior. Experimentation is a smaller share. Operationalization and governance are nearly zero, because the program does not yet have a volume of wins that requires them.

Mid-stage programs (level 3) balance research-and-instrumentation with hypothesis-and-experimentation. The first three disciplines stabilize at perhaps 40% of program effort, experimentation at 30%, analysis at 20%, operationalization at 10%.

Mature programs (level 4-5) invest heavily in operationalization and governance, because the program now has enough win volume that the bottleneck has shifted from "produce wins" to "ship wins consistently and prevent the bad wins from being shipped." Research stabilizes, experimentation runs continuously in the background, analysis is where the marginal investment goes.

If you map your current CRO program's investment to this pattern, the mismatch is usually instructive. Most programs have experimentation overweighted by a factor of 2-3x relative to what the maturity stage justifies, with research and operationalization underweighted by the same factor.
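One way to make that mapping concrete is to put your actual effort allocation next to a target allocation for your maturity stage and look at the ratios. In the sketch below, the target weights reuse the mid-stage figures above; the "actual" numbers are placeholders for whatever your own program's time tracking shows.

```python
# Illustrative sketch: compare actual effort allocation against a stage target.
# Target weights reuse the mid-stage figures above; the "actual" numbers are
# placeholders, not data from a real program.

TARGET_MID_STAGE = {
    "research + hypothesis + instrumentation": 0.40,
    "experimentation": 0.30,
    "analysis": 0.20,
    "operationalization": 0.10,
}

actual = {
    "research + hypothesis + instrumentation": 0.15,
    "experimentation": 0.60,
    "analysis": 0.15,
    "operationalization": 0.10,
}

for discipline, target in TARGET_MID_STAGE.items():
    ratio = actual[discipline] / target
    if abs(ratio - 1) < 0.1:
        label = "roughly on target"
    elif ratio > 1:
        label = "overweighted"
    else:
        label = "underweighted"
    print(f"{discipline}: {ratio:.1f}x target ({label})")
```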

How our CRO offering is structured

Our team runs CRO engagements as instrumented programs against all seven disciplines, not as test execution. Specifically, the first six weeks of any CRO engagement are research, hypothesis development, and instrumentation — not experimentation. Experimentation begins in week seven, against a hypothesis tree we built in weeks 1-5 from research we ran in weeks 1-3 on a measurement substrate we hardened in weeks 4-6.

Most prospective clients find this frustrating in the first conversation. They are paying for a CRO program and they want tests running by week two. We explain that running tests in week two means testing the team's prior hypotheses, which are usually a mixture of correct and incorrect guesses, and which the experiments will confirm at roughly a 50/50 win rate, because choosing what to test is at that point effectively a coin flip. Running tests in week seven against a researched, instrumented hypothesis tree means each test moves the program forward, and the program compounds.

Clients who push us to start testing earlier than week seven typically have programs that produce 5-8% lift over 12 months. Clients who let us complete the research-instrumentation-hypothesis cycle before testing typically have programs that produce 25-40% lift over 12 months, with the gap widening in years two and three because the program compounds.

This is the CRO offering our team takes engagements on. If you want to see how the seven disciplines apply to your specific program, the first conversation is usually about which disciplines your current program over- or under-invests in.