How the score is calculated
The formula is straightforward: the score is the number of SATISFIED criteria divided by the total number of criteria, expressed as a percentage.

| State | Confidence | Counts toward the score? |
|---|---|---|
| SATISFIED | ≥ 0.75 | Yes |
| PARTIAL | 0.40 – 0.74 | No |
| UNSATISFIED | < 0.40 | No |
SATISFIED criteria contribute to the percentage. PARTIAL and UNSATISFIED do not.
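The scoring rule above can be sketched in a few lines of Python (the function name and data shape are illustrative, not Waterline's actual API):

```python
def coverage_score(states: list[str]) -> float:
    """Percentage of criteria in the SATISFIED state.

    `states` holds one entry per acceptance criterion:
    "SATISFIED", "PARTIAL", or "UNSATISFIED". Only SATISFIED
    counts toward the score; PARTIAL and UNSATISFIED do not.
    """
    if not states:
        return 0.0
    satisfied = sum(1 for s in states if s == "SATISFIED")
    return 100.0 * satisfied / len(states)
```

For example, a ticket with states `["SATISFIED", "SATISFIED", "PARTIAL", "UNSATISFIED"]` scores 50.0.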
Why PARTIAL doesn’t count
A confidence of 0.6 means relevant code exists, but Waterline isn’t confident it fully implements the criterion. Counting that as done would make the score feel more complete than it actually is. The 0.75 threshold for SATISFIED is intentionally high so the score only moves when there’s strong evidence. PARTIAL is a signal that work is in progress — not that it’s finished.
The 0.40 floor for PARTIAL keeps noise out. Below that threshold, any signal found is too weak to be meaningful, so the criterion is treated as UNSATISFIED.
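The two thresholds amount to a simple mapping from confidence to state. A minimal sketch (thresholds from the table above; the function name is illustrative):

```python
def classify(confidence: float) -> str:
    """Map a per-criterion confidence score to a state.

    Fixed thresholds: >= 0.75 is SATISFIED, >= 0.40 is PARTIAL,
    and anything below 0.40 is too weak to be meaningful, so it
    is treated as UNSATISFIED.
    """
    if confidence >= 0.75:
        return "SATISFIED"
    if confidence >= 0.40:
        return "PARTIAL"
    return "UNSATISFIED"
```

Note that both thresholds are inclusive at the lower bound: 0.75 is SATISFIED and 0.40 is PARTIAL.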
Worked example
Consider a ticket with four acceptance criteria:

| Criterion | Confidence | State |
|---|---|---|
| User can log in with email and password | 0.91 | SATISFIED |
| Session persists across page reloads | 0.82 | SATISFIED |
| “Remember me” extends session to 30 days | 0.58 | PARTIAL |
| Login is rate-limited after 5 failed attempts | 0.21 | UNSATISFIED |
Two of the four criteria are SATISFIED, so the score is 50%. The PARTIAL criterion — “Remember me” — signals that something related to session duration exists in the code, but not enough to be confident the requirement is fully met.
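Putting the pieces together, the example works out like this (a self-contained sketch; the names are illustrative, not Waterline's API):

```python
# Example confidences from the table above, one per criterion.
criteria = {
    "User can log in with email and password": 0.91,
    "Session persists across page reloads": 0.82,
    '"Remember me" extends session to 30 days': 0.58,
    "Login is rate-limited after 5 failed attempts": 0.21,
}

def classify(confidence: float) -> str:
    """Apply the fixed thresholds: 0.75 and 0.40."""
    if confidence >= 0.75:
        return "SATISFIED"
    if confidence >= 0.40:
        return "PARTIAL"
    return "UNSATISFIED"

states = {name: classify(c) for name, c in criteria.items()}
satisfied = sum(1 for s in states.values() if s == "SATISFIED")
score = 100.0 * satisfied / len(criteria)  # 2 of 4 SATISFIED -> 50.0
```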
Uncertainty levels
Alongside the percentage, Waterline reports an uncertainty level for the overall analysis:

| Level | What it means |
|---|---|
| LOW | Multiple strong signals were found across the codebase. The score is reliable. |
| MEDIUM | Evidence is mixed. Some criteria have ambiguous coverage. Treat the score as a useful estimate, not a firm measure. |
| HIGH | Evidence is sparse. The codebase index may be incomplete, the ticket may be too vague for good matches, or the feature may genuinely not be implemented yet. |
A HIGH uncertainty level doesn’t mean the score is wrong — it means you should look more carefully before acting on it.
Score stability
Given the same codebase and the same ticket, Waterline always produces the same score. The aggregation step that converts confidence scores to SATISFIED, PARTIAL, and UNSATISFIED uses fixed thresholds — there’s no randomness in that step.
Scores only change when:
- New code is merged into the repository (new evidence is available)
- The ticket description is edited (changes what criteria are extracted)
- The LLM used for analysis changes
Improving a low score
If the score seems lower than you’d expect given the state of the work, here are the most common causes and what you can do.

Check the evidence list
Look at which symbols Waterline found for each criterion. Are the relevant functions and methods in that list? If not, they may not be indexed yet.
Check indexing status
If you’ve pushed code recently, confirm the sync completed. You can trigger a manual re-index if needed.
Improve acceptance criteria specificity
Vague criteria like “it should work” are hard to match to code. Specific criteria like “the login endpoint returns HTTP 401 for invalid credentials” give Waterline concrete signals to search for.
Check your repo size limits
If your repository exceeds the default limits (`REPO_MAX_FILES=2000` or `REPO_MAX_SYMBOLS=15000`), relevant code may not be indexed. Raise the limits and re-sync.