The incentive that produces high scores
A maturity assessment is a commercial artifact. It is paid for by a buyer, conducted by a vendor, and read by the buyer's leadership. The vendor's commercial interest is for the buyer to hire them for the remediation work the assessment implies. The buyer's political interest is for the assessment to confirm that prior investments in digital have been working.
These two interests pull the assessment's scoring toward a specific number: high enough to validate that progress has been made, low enough to justify continued investment in the vendor. In practice this lands around 4.5 to 4.7 on a five-point scale, or 75-85 on a hundred-point scale. The number is high enough that leadership is comfortable presenting it, and the narrative is the familiar one: mature enough to be credible, with enough remaining opportunity to fund the next phase of work.
This is an internally consistent calibration. The assessment compares the brand to its own prior assessments and to a peer set defined by the vendor. If everyone in the peer set scores in the 4.5-4.7 range, a 4.6 is roughly average. The math works. What the assessment is not doing is comparing the brand to what excellent looks like in 2026, which is the comparison that would actually tell leadership where to invest.
How our team calibrates
Our Digital Health Assessment uses a calibration anchored at three explicit points. A 0 means the brand has no capability in the signal: absent, or so weak that it is not contributing to the digital function. A 50 means the capability is operating at industry median for brands of comparable size in comparable categories. A 100 means the brand is operating at the best-in-class level we have observed for that specific signal anywhere in our work.
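As a rough illustration of how those anchors roll up into a composite, here is a minimal sketch. The signal names, scores, and the unweighted averaging are hypothetical choices made for illustration, not actual assessment rubrics or results.

```python
# A minimal sketch of the anchored 0/50/100 calibration described above.
# Signal names and scores are hypothetical illustrations, not real rubrics.

from dataclasses import dataclass
from statistics import mean


@dataclass
class SignalScore:
    name: str
    score: int  # 0 = no capability, 50 = industry median, 100 = best-in-class observed


def composite(signals: list[SignalScore]) -> float:
    """Unweighted mean of signal scores; a real assessment may weight pillars."""
    return mean(s.score for s in signals)


# A hypothetical brand: most signals below or at industry median.
brand = [
    SignalScore("search_visibility", 38),
    SignalScore("site_performance", 52),
    SignalScore("content_operations", 31),
    SignalScore("analytics_maturity", 44),
]

print(composite(brand))  # 41.25, inside the typical 34-48 composite band
```

The only point the sketch is making is that the anchors are absolute: a 50 is defined by the market, not by whichever peer set the vendor happens to assess.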
Most signals for most brands come in between 30 and 55, because most brands have most capabilities running below or at industry median. The composite score for most enterprise brands comes in between 34 and 48. We have never assessed a brand at composite 80+, and we would be surprised if we ever do.
The brand that gets a 4.6 out of 5 (92%) from a competitor assessment typically scores 41-48 on our scale. Both numbers describe the same brand. Both numbers are internally consistent within their respective calibrations. The 92% calibration compares the brand to other brands the vendor assesses, all of which land in roughly the same range. Our calibration compares the brand to what excellent looks like, knowing that excellent is exceedingly rare.
Why the choice of calibration matters
The calibration choice changes what the assessment produces downstream. Three specific changes are worth being explicit about.
One: the prioritization shifts. A 4.6/5 result naturally produces remediation recommendations focused on incremental improvements — the easy 10-20% gains. A 41/100 result produces remediation recommendations focused on structural gaps — the capabilities that are operating two-thirds below where they could be. The remediation work that comes out of the second framing is more painful in the short term and produces materially larger compounding returns.
Two: the conversation with leadership shifts. Presenting a 4.6/5 to a board produces a conversation about pace of progress and continuation of current investment. Presenting a 41/100 produces a conversation about whether the current strategy is delivering, and whether the current vendor partnerships are right. The second conversation is more uncomfortable and more useful.
Three: the cultural posture shifts. Teams who hear "you are at 4.6/5" reasonably conclude that they are mostly succeeding and should keep doing what they are doing. Teams who hear "you are at 41/100 and here is specifically what excellent looks like at each pillar" naturally orient toward closing the gap to excellent. Over a two-year horizon, that posture matters more than the specific roadmap.
None of this is about being harsh for harshness's sake. It is about choosing a calibration that produces useful information rather than comfortable information. Most leadership teams we present to tell us in the debrief, after the initial shock, that the honest read is the most valuable conversation they have had about their digital program in years.
The objection we hear most
The most common objection in the first sales conversation is: "Why should I pay for an assessment that is going to tell me my brand scores in the 40s when [other vendor] will tell me we score in the 4.6 range?" The implicit question is whether the calibration is just marketing positioning: scoring low to make our remediation work seem more valuable.
The honest answer is that our calibration is a methodological choice, not marketing positioning. We can show, on any specific signal, exactly what the rubric requires at 30, 50, 70, and 90, and where the assessed brand lands against it, with the observable evidence. The score is reproducible: another team running the assessment against the same brand at the same time would land within a few points. The calibration is anchored, not arbitrary.
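To make the reproducibility claim concrete, here is a minimal sketch of how an anchored rubric turns observable evidence into a score. The criteria and evidence labels are hypothetical stand-ins, and a real assessor can place a brand between anchors; the sketch snaps to anchor levels only to stay short.

```python
# A minimal sketch of how an anchored rubric makes a signal score reproducible.
# The criterion labels and evidence below are hypothetical examples, not the
# actual Digital Health Assessment rubric.

RUBRIC_ANCHORS = {
    # anchor score -> criteria that must all be evidenced at that level
    30: {"capability_exists"},
    50: {"documented_process", "output_at_industry_median"},
    70: {"instrumented", "measured_against_targets"},
    90: {"automated", "compounding_returns_demonstrated"},
}


def score_signal(evidence: set[str]) -> int:
    """Return the highest anchor whose criteria, and all lower anchors', are met."""
    score = 0
    for anchor in sorted(RUBRIC_ANCHORS):
        if RUBRIC_ANCHORS[anchor] <= evidence:  # subset check against the evidence
            score = anchor
        else:
            break  # anchors are cumulative; stop at the first unmet level
    return score


# Two assessors collecting the same observable evidence land on the same score.
observed = {"capability_exists", "documented_process", "output_at_industry_median"}
print(score_signal(observed))  # 50
```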
The cleaner version of the same objection is: "Even if the calibration is real, why would I pay to be told a worse-sounding number?" Because the worse-sounding number is the input to remediation work that actually changes the digital program. Most leadership teams we work with eventually decide they want the input that changes the program, even when it is uncomfortable to read in the first session.
The flattering number is input to a slide deck. The honest number is input to remediation work that changes the program.
How we maintain the calibration
Calibration drift is the failure mode for any maturity-assessment methodology — the natural pressure on the assessment team is to score higher over time, because higher scores are easier to deliver and produce friendlier client conversations. Our team maintains calibration discipline through three practices.
First, every assessment is peer-reviewed before delivery by a senior team member who was not on the engagement. The peer reviewer checks every signal score against the rubric and the evidence. Score inflation gets caught at this stage.
Second, the rubrics themselves are updated quarterly based on what our research team observes in the broader market. As the industry baseline moves (for example, as page speed becomes table stakes), the rubric for the corresponding signal updates so that a 50 still means "industry median for brands of comparable size today." This prevents the assessment from becoming easier over time as the world gets better.
Third, our internal record-keeping tracks score distributions across all assessments. If we notice that our recent assessments are clustering higher than our historical distribution without an obvious reason in the underlying brand quality, that triggers a calibration review.
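As an illustration of that third practice, here is a minimal sketch of a drift check over composite scores. The window, threshold, and figures are hypothetical; the actual trigger for a calibration review is the team's judgment, informed by a check like this rather than decided by it.

```python
# A minimal sketch of the drift check described above. The threshold and the
# scores are hypothetical; real reviews weigh the underlying brand quality too.

from statistics import mean, stdev


def drift_flag(historical: list[float], recent: list[float], z_threshold: float = 1.5) -> bool:
    """Flag a calibration review when recent composites cluster above history."""
    hist_mean = mean(historical)
    hist_sd = stdev(historical)
    if hist_sd == 0:
        return mean(recent) > hist_mean
    # Standardized shift of the recent mean against the historical distribution.
    z = (mean(recent) - hist_mean) / (hist_sd / len(recent) ** 0.5)
    return z > z_threshold


historical_composites = [39, 44, 36, 47, 41, 38, 45, 42, 35, 43]
recent_composites = [48, 51, 47, 50]

print(drift_flag(historical_composites, recent_composites))  # True -> trigger a review
```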
These practices are documented in detail in our diagnostic playbook and applied uniformly across every assessment our team runs. If you want to see what your brand looks like against this calibration, the Digital Health Assessment is offered pro bono alongside any managed-services engagement, and as a standalone engagement for brands evaluating their digital posture independently.