Waterline indexes your repository at the symbol level — functions, methods, and classes — not just files. This granularity is what lets Waterline tell you which specific code implements a given acceptance criterion, rather than pointing you to a file and leaving you to read through it yourself.

What gets indexed

When Waterline processes your code, it extracts symbols: individually meaningful units that represent a discrete behavior or concept.

Functions and methods

Regular, async, static, and class methods — any callable unit of logic.

Classes

Class definitions, including their docstrings and associated metadata.

Constants and type aliases

Top-level constants and type aliases in supported languages.
For each symbol, Waterline stores:

| Field | What it is |
| --- | --- |
| Name | The identifier as it appears in your code |
| File path | The path within your repository |
| Line range | The start and end lines of the symbol |
| Semantic summary | A 2–4 sentence description of what the symbol does, generated by an LLM |
| Embedding | A vector representation used for semantic search |

The semantic summary and embedding are what allow Waterline to match your ticket description to relevant code — even when the ticket doesn’t use the same words as the function names.
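
To make the stored fields concrete, here is a minimal sketch of a symbol record and a similarity ranking over it. The `SymbolRecord` class, the `rank_symbols` helper, and the use of cosine similarity are illustrative assumptions; the page does not say which similarity measure or data model Waterline uses internally.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class SymbolRecord:
    """Hypothetical in-memory shape for one indexed symbol."""
    name: str
    file_path: str
    start_line: int
    end_line: int
    summary: str
    embedding: tuple  # vector of floats derived from the summary


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_symbols(query_embedding, symbols):
    """Order symbols from most to least similar to the query vector."""
    return sorted(
        symbols,
        key=lambda s: cosine_similarity(query_embedding, s.embedding),
        reverse=True,
    )
```

In this sketch a ticket description would be embedded the same way as the summaries, so a ticket about "refunds" can rank a function highly even if its name never mentions the word.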

Initial indexing

When you first connect a repository, Waterline crawls every source file and builds the index from scratch.
1. Crawl all source files

Waterline walks the repository tree and identifies files in supported languages.

2. Extract symbols

Each source file is parsed to pull out functions, methods, and classes, along with their names and line ranges.

3. Generate semantic summaries

An LLM writes a short description of what each symbol does. This is the text Waterline uses to match code to ticket language later.

4. Generate embeddings

Each summary is converted into an embedding vector for fast semantic similarity search.

5. Store and record progress

Symbols are saved to the index and a sync cursor (the latest commit SHA) is recorded so future syncs know where to pick up.
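
The symbol-extraction step can be illustrated for Python sources with the standard library's `ast` module. This is only a sketch of the idea: Waterline's actual parser is not described here, and the `extract_symbols` function and its output shape are hypothetical.

```python
import ast


def extract_symbols(source, file_path):
    """Return name and line range for every function, method, and class in a Python file."""
    tree = ast.parse(source)
    symbols = []
    for node in ast.walk(tree):
        # Regular and async functions (including methods) plus class definitions.
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            symbols.append(
                {
                    "name": node.name,
                    "file_path": file_path,
                    "start_line": node.lineno,
                    "end_line": node.end_lineno,  # available on Python 3.8+
                }
            )
    # Report symbols in the order they appear in the file.
    return sorted(symbols, key=lambda s: s["start_line"])
```

Each returned entry would then be passed on to the summary and embedding steps.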

Size limits

Waterline stops indexing once either of the following limits is reached. These defaults are set to prevent unexpectedly large processing costs on very large monorepos.

| Limit | Default | Environment variable |
| --- | --- | --- |
| Maximum files | 2,000 | REPO_MAX_FILES |
| Maximum symbols | 15,000 | REPO_MAX_SYMBOLS |

If your repository legitimately exceeds these defaults, you can raise them by setting REPO_MAX_FILES and REPO_MAX_SYMBOLS in your .env file. Increasing the limits will increase the time and cost of the initial index build, but will give Waterline a more complete picture of your codebase.
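
As a rough illustration of how such limits could be applied, the sketch below reads both variables with their documented defaults and stops once either cap is hit. The `load_limits` and `take_within_limits` helpers are hypothetical, and truncating a file's symbols when the symbol cap is reached mid-file is an assumption; the page only states that indexing stops when a limit is reached.

```python
import os


def load_limits(env=os.environ):
    """Read the two caps, falling back to the documented defaults."""
    max_files = int(env.get("REPO_MAX_FILES", "2000"))
    max_symbols = int(env.get("REPO_MAX_SYMBOLS", "15000"))
    return max_files, max_symbols


def take_within_limits(files, max_files, max_symbols):
    """files is a list of (path, [symbol names]); keep entries until a cap is reached."""
    kept = []
    total_symbols = 0
    for path, symbols in files:
        if len(kept) >= max_files or total_symbols >= max_symbols:
            break
        # Assumption: a file that straddles the symbol cap is truncated, not dropped.
        allowed = symbols[: max_symbols - total_symbols]
        kept.append((path, allowed))
        total_symbols += len(allowed)
    return kept
```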

Symbol filtering

Not every symbol is worth indexing. Waterline applies line-count filters before processing a symbol:

| Filter | Default | Environment variable |
| --- | --- | --- |
| Minimum symbol lines | 3 | SYMBOL_EXTRACTOR_MIN_LINES |
| Maximum symbol lines | 500 | SYMBOL_EXTRACTOR_MAX_LINES |

Symbols shorter than 3 lines usually don’t have enough content to produce a meaningful summary. Symbols longer than 500 lines typically come from auto-generated code or sprawling utility modules, and would add noise without improving search quality.
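
The filter amounts to a bounds check on a symbol's line count. A minimal sketch, assuming both bounds are inclusive (which matches "shorter than 3" and "longer than 500" being excluded); the `should_index` name is hypothetical:

```python
def should_index(start_line, end_line, min_lines=3, max_lines=500):
    """Return True when the symbol's length falls within the configured bounds."""
    length = end_line - start_line + 1  # line ranges are inclusive on both ends
    return min_lines <= length <= max_lines
```

With the defaults, a two-line property accessor spanning lines 10 to 11 is skipped, while a function spanning lines 10 to 12 is indexed.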

Incremental sync

After the initial index, Waterline keeps itself up to date automatically. Every push to any branch triggers an incremental re-index.
1. Read the sync cursor

Waterline checks the last commit SHA it processed.

2. Identify changed files

Commits between the cursor and the latest SHA are compared to find which files changed.

3. Re-index changed files

Symbols in those files are re-extracted, re-summarized, and their embeddings are updated.

4. Re-index 1-hop dependents

Files that import symbols from any changed file are also re-indexed. This catches code whose effective behavior changed because a dependency was updated, even if the file itself wasn’t edited directly.

5. Advance the sync cursor

The cursor is updated to the latest SHA, ready for the next push.
The 1-hop dependency traversal means your index reflects the actual current behavior of your code after a push, not just the files that were touched in the diff.
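
The 1-hop rule can be sketched as a single pass over an import map. The `files_to_reindex` function and the `imports` mapping (file path to the set of files it imports from) are hypothetical; how Waterline builds its import graph is not covered here.

```python
def files_to_reindex(changed, imports):
    """Return the changed files plus every file that directly imports from one of them."""
    changed = set(changed)
    # One hop only: dependents of dependents are deliberately not followed.
    dependents = {path for path, deps in imports.items() if deps & changed}
    return changed | dependents
```

For example, if `api.py` imports from `billing.py` and `cli.py` imports from `api.py`, a push that edits only `billing.py` would re-index `billing.py` and `api.py`, but not `cli.py`.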

Storage

Waterline keeps your index in two stores that work together:

| Store | What is stored | Used for |
| --- | --- | --- |
| Postgres | Name, file path, line range, summary text, and metadata for each symbol | Structured lookups and result assembly |
| ChromaDB | Embedding vectors with matching IDs | Fast semantic similarity search |

From your perspective, this is invisible — you always see a single unified result. The two stores stay in sync automatically during every index operation.
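
The way two stores can share IDs and still return one result is sketched below with plain dictionaries standing in for Postgres and ChromaDB. Nothing here uses either system's real API: the `SymbolIndex` class, its methods, and the dot-product scoring are all assumptions made for illustration.

```python
def _dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


class SymbolIndex:
    """Toy two-store index: metadata and vectors are kept separately, joined by ID."""

    def __init__(self):
        self.metadata = {}  # stand-in for the relational store
        self.vectors = {}   # stand-in for the vector store

    def upsert(self, symbol_id, record, embedding):
        # Write both stores in the same operation so they cannot drift apart.
        self.metadata[symbol_id] = record
        self.vectors[symbol_id] = embedding

    def delete(self, symbol_id):
        self.metadata.pop(symbol_id, None)
        self.vectors.pop(symbol_id, None)

    def search(self, query, top_k=3):
        # Step 1: similarity search over vectors yields IDs only.
        ranked = sorted(
            self.vectors,
            key=lambda sid: _dot(query, self.vectors[sid]),
            reverse=True,
        )
        # Step 2: IDs are resolved to full records from the metadata store.
        return [self.metadata[sid] for sid in ranked[:top_k]]
```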

Feature flags

You can control indexing behavior with these environment variables:
```bash
ENABLE_SYMBOL_INDEXING=true          # Build the symbol index on each sync
ENABLE_SYMBOL_SEARCH=true            # Use the symbol index when analyzing tickets
SYMBOL_SEARCH_FALLBACK_TO_FILES=true # Fall back to file-level search if symbol results are sparse
```
Setting ENABLE_SYMBOL_INDEXING=false tells Waterline to use file-level semantic diffs instead of symbol-level indexing. This is less precise but reduces indexing cost, and can be useful while evaluating the product on a large codebase.
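
A small sketch of how a flag like this might be read and acted on. The `flag` and `indexing_mode` helpers are hypothetical, as are the accepted truthy spellings and the choice to treat an unset flag as enabled; the page does not state the defaults.

```python
def flag(env, name, default=True):
    """Interpret an environment value as a boolean (accepted spellings are an assumption)."""
    raw = env.get(name, str(default))
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def indexing_mode(env):
    """Pick symbol-level indexing unless ENABLE_SYMBOL_INDEXING is switched off."""
    return "symbol" if flag(env, "ENABLE_SYMBOL_INDEXING") else "file"
```

Passing `os.environ` as `env` would apply this to the running process; a plain dictionary works the same way in tests.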