Indexing

Waterline indexes your repository at the symbol level (functions, methods, classes, and top-level constants and type aliases), not just at the file level. This granularity lets Waterline tell you which specific code implements a given acceptance criterion, rather than pointing you to a file and leaving you to read through it yourself.

What gets indexed

When Waterline processes your code, it extracts symbols: individually meaningful units that represent a discrete behavior or concept.

Functions and methods

Regular, async, static, and class methods — any callable unit of logic.

Classes

Class definitions, including their docstrings and associated metadata.

Constants and type aliases

Top-level constants and type aliases in supported languages.

For each symbol, Waterline stores:

| Field | What it is |
| --- | --- |
| Name | The identifier as it appears in your code |
| File path | The path within your repository |
| Line range | The start and end lines of the symbol |
| Semantic summary | A 2–4 sentence description of what the symbol does, generated by an LLM |
| Embedding | A vector representation used for semantic search |

The semantic summary and embedding are what allow Waterline to match your ticket description to relevant code — even when the ticket doesn’t use the same words as the function names.
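As a rough illustration, the fields above could be modeled as a single record. This is a minimal sketch, not Waterline's actual schema: the class name `SymbolRecord`, the attribute names, and the sample values are all assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SymbolRecord:
    """Hypothetical shape of one indexed symbol (names are assumptions)."""

    name: str        # identifier as it appears in the code
    file_path: str   # path within the repository
    start_line: int  # first line of the symbol
    end_line: int    # last line of the symbol, inclusive
    summary: str     # LLM-generated description, 2-4 sentences
    embedding: tuple[float, ...] = ()  # vector used for semantic search

    @property
    def line_count(self) -> int:
        # Inclusive line range, so a one-line symbol has a count of 1.
        return self.end_line - self.start_line + 1


# Made-up example values, for illustration only.
record = SymbolRecord(
    name="apply_discount",
    file_path="billing/pricing.py",
    start_line=40,
    end_line=58,
    summary="Applies a percentage discount to an order total.",
    embedding=(0.12, -0.03, 0.88),
)
```

Keeping the line range next to the summary is what would let a search result point at exact lines rather than a whole file.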

Initial indexing

When you first connect a repository, Waterline crawls every source file and builds the index from scratch.

Crawl all source files

Waterline walks the repository tree and identifies files in supported languages.

Extract symbols

Each source file is parsed to pull out functions, methods, and classes (plus top-level constants and type aliases in supported languages), along with their names and line ranges.

Generate semantic summaries

An LLM writes a short description of what each symbol does. This is the text Waterline uses to match code to ticket language later.

Generate embeddings

Each summary is converted into an embedding vector for fast semantic similarity search.

Store and record progress

Symbols are saved to the index and a sync cursor (the latest commit SHA) is recorded so future syncs know where to pick up.
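To make the "Extract symbols" step concrete, here is a minimal sketch for Python source using the standard-library `ast` module. It is an illustration under assumptions, not a description of Waterline's own parser: the function name `extract_symbols` and the dictionary keys are hypothetical, and constants and type aliases are left out for brevity.

```python
from __future__ import annotations

import ast


def extract_symbols(source: str, file_path: str) -> list[dict]:
    """Return name, kind, and line range for each function, method, and class."""
    tree = ast.parse(source)
    symbols = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            symbols.append(
                {
                    "name": node.name,
                    "kind": "class" if isinstance(node, ast.ClassDef) else "function",
                    "file_path": file_path,
                    "start_line": node.lineno,    # line of the def/class keyword
                    "end_line": node.end_lineno,  # last line of the body
                }
            )
    # ast.walk does not guarantee source order, so sort by position.
    return sorted(symbols, key=lambda s: s["start_line"])


SAMPLE = '''class Cart:
    """A shopping cart."""

    def total(self):
        return sum(self.items)


async def checkout(cart):
    await cart.pay()
    return True
'''

symbols = extract_symbols(SAMPLE, "shop/cart.py")
for s in symbols:
    print(s["kind"], s["name"], s["start_line"], s["end_line"])
```

Each extracted entry would then be passed on to the summary and embedding steps.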

Size limits

Waterline stops indexing once either of the following limits is reached. These defaults are set to prevent unexpectedly large processing costs on very large monorepos.

| Limit | Default | Environment variable |
| --- | --- | --- |
| Maximum files | 2,000 | `REPO_MAX_FILES` |
| Maximum symbols | 15,000 | `REPO_MAX_SYMBOLS` |

If your repository legitimately exceeds these defaults, you can raise them by setting `REPO_MAX_FILES` and `REPO_MAX_SYMBOLS` in your `.env` file. Increasing the limits will increase the time and cost of the initial index build, but will give Waterline a more complete picture of your codebase.
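The "stop once either limit is reached" behavior can be sketched as a simple loop. This is an assumption-laden illustration: `read_limit` and `crawl_with_limits` are hypothetical names, and cutting off a file's symbols at the remaining budget is a guess about behavior the text above does not specify.

```python
from __future__ import annotations

import os


def read_limit(name: str, default: int) -> int:
    """Read an integer limit from the environment, falling back to the default."""
    return int(os.environ.get(name, default))


def crawl_with_limits(files, max_files: int, max_symbols: int):
    """files: iterable of (path, [symbol names]). Stops when either limit is hit."""
    files_seen = 0
    symbols: list[tuple[str, str]] = []
    for path, file_symbols in files:
        if files_seen >= max_files or len(symbols) >= max_symbols:
            break
        files_seen += 1
        remaining = max_symbols - len(symbols)
        # Assumption: a file that would overflow the budget is cut off at the limit.
        symbols.extend((path, name) for name in file_symbols[:remaining])
    return files_seen, symbols


repo = [("a.py", ["f", "g"]), ("b.py", ["h", "i"]), ("c.py", ["j"])]

# Normal use would read the documented variables and defaults:
#   crawl_with_limits(repo, read_limit("REPO_MAX_FILES", 2000),
#                     read_limit("REPO_MAX_SYMBOLS", 15000))
# Tiny limits are used here so the cutoff is visible.
files_seen, indexed = crawl_with_limits(repo, max_files=2000, max_symbols=3)
print(files_seen, indexed)
```

With a symbol limit of 3, the crawl stops partway through `b.py` and never opens `c.py`.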

Symbol filtering

Not every symbol is worth indexing. Waterline applies line-count filters before processing a symbol:

| Filter | Default | Environment variable |
| --- | --- | --- |
| Minimum symbol lines | 3 | `SYMBOL_EXTRACTOR_MIN_LINES` |
| Maximum symbol lines | 500 | `SYMBOL_EXTRACTOR_MAX_LINES` |

Symbols shorter than 3 lines usually don’t have enough content to produce a meaningful summary. Symbols longer than 500 lines are typically auto-generated code or large utility modules that would add noise without improving search quality.
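The line-count filter amounts to an inclusive range check. In this minimal sketch, the function name `should_index` is hypothetical; the bounds follow the wording above, so a symbol of exactly 3 or exactly 500 lines is kept.

```python
def should_index(start_line: int, end_line: int,
                 min_lines: int = 3, max_lines: int = 500) -> bool:
    """Keep a symbol only if its line count falls within [min_lines, max_lines]."""
    line_count = end_line - start_line + 1  # inclusive range
    return min_lines <= line_count <= max_lines


print(should_index(10, 11))  # 2 lines: too short
print(should_index(10, 12))  # 3 lines: kept
print(should_index(1, 501))  # 501 lines: too long
```

The defaults mirror `SYMBOL_EXTRACTOR_MIN_LINES` and `SYMBOL_EXTRACTOR_MAX_LINES`; passing different values simulates overriding them.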

Incremental sync

After the initial index, Waterline keeps itself up to date automatically. Every push to any branch triggers an incremental re-index.

Read the sync cursor

Waterline checks the last commit SHA it processed.

Identify changed files

Commits between the cursor and the latest SHA are compared to find which files changed.

Re-index changed files

Symbols in those files are re-extracted and re-summarized, and their embeddings are regenerated.

Re-index 1-hop dependents

Files that import symbols from any changed file are also re-indexed. This catches code whose effective behavior changed because a dependency was updated — even if the file itself wasn’t edited directly.

Advance the sync cursor

The cursor is updated to the latest SHA, ready for the next push.

The 1-hop dependency traversal means your index reflects the actual current behavior of your code after a push, not just the files that were touched in the diff.
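The file selection in the steps above can be sketched as follows. This is an illustration under assumptions, not Waterline's implementation: `changed_files` and `files_to_reindex` are hypothetical names, the import graph is a hand-written dictionary, and using `git diff --name-only` is one plausible way to list the files changed between two commits.

```python
from __future__ import annotations

import subprocess


def changed_files(cursor_sha: str, head_sha: str, repo_dir: str = ".") -> set[str]:
    """List files changed between the sync cursor and the latest commit."""
    result = subprocess.run(
        ["git", "diff", "--name-only", f"{cursor_sha}..{head_sha}"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return {line for line in result.stdout.splitlines() if line}


def files_to_reindex(changed: set[str], imports: dict[str, set[str]]) -> set[str]:
    """Changed files plus every file that directly imports from one of them.

    `imports` maps each file to the set of files it imports from.
    """
    dependents = {path for path, deps in imports.items() if deps & changed}
    return changed | dependents


# Hypothetical import graph: views -> orders -> pricing.
imports = {
    "billing/pricing.py": set(),
    "api/orders.py": {"billing/pricing.py"},
    "web/views.py": {"api/orders.py"},
}

targets = files_to_reindex({"billing/pricing.py"}, imports)
print(sorted(targets))
```

Here `api/orders.py` is re-indexed because it imports from the changed file, while `web/views.py` is two hops away and is left alone.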

Storage

Waterline keeps your index in two stores that work together:

| Store | What is stored | Used for |
| --- | --- | --- |
| Postgres | Name, file path, line range, summary text, and metadata for each symbol | Structured lookups and result assembly |
| ChromaDB | Embedding vectors with matching IDs | Fast semantic similarity search |

From your perspective, this is invisible — you always see a single unified result. The two stores stay in sync automatically during every index operation.
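To show how two stores with matching IDs can produce one unified result, here is a minimal sketch that uses plain dictionaries as stand-ins for Postgres and ChromaDB. None of this is Waterline's code or either database's API: the IDs, vectors, and the `search` function are assumptions, and cosine similarity is used as one common choice of similarity measure.

```python
from __future__ import annotations

import math

# Stand-in for the structured store: symbol metadata keyed by ID.
metadata_store = {
    "sym-1": {"name": "apply_discount", "file_path": "billing/pricing.py"},
    "sym-2": {"name": "send_receipt", "file_path": "mail/receipts.py"},
}

# Stand-in for the vector store: embeddings under the same IDs.
vector_store = {
    "sym-1": [1.0, 0.0],
    "sym-2": [0.0, 1.0],
}


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vector: list[float], top_k: int = 1) -> list[dict]:
    """Rank IDs by vector similarity, then attach metadata from the other store."""
    ranked = sorted(
        vector_store,
        key=lambda sid: cosine(query_vector, vector_store[sid]),
        reverse=True,
    )
    return [{"id": sid, **metadata_store[sid]} for sid in ranked[:top_k]]


results = search([0.9, 0.1])
print(results)
```

The shared ID is the only link between the two stores, which is why both have to be updated together on every index operation.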

Feature flags

You can control indexing behavior with these environment variables:

ENABLE_SYMBOL_INDEXING=true          # Build the symbol index on each sync
ENABLE_SYMBOL_SEARCH=true            # Use the symbol index when analyzing tickets
SYMBOL_SEARCH_FALLBACK_TO_FILES=true # Fall back to file-level search if symbol results are sparse

Setting `ENABLE_SYMBOL_INDEXING=false` tells Waterline to use file-level semantic diffs instead of symbol-level indexing. This is less precise but reduces indexing cost, and can be useful while evaluating the product on a large codebase.
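One way to picture how the two search flags interact is the sketch below. It rests on several assumptions that the text above does not confirm: the helper names `flag` and `pick_results` are hypothetical, the accepted truthy spellings are a guess, "sparse" is modeled as fewer than `min_symbol_hits` results, and file-level results are assumed to replace symbol results whenever symbol search is disabled or falls back.

```python
from __future__ import annotations

import os


def flag(name: str, default: bool = True) -> bool:
    """Parse a boolean environment variable (accepted spellings are an assumption)."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def pick_results(symbol_hits: list[str], file_hits: list[str], *,
                 symbol_search: bool, fallback_to_files: bool,
                 min_symbol_hits: int = 3) -> list[str]:
    """Choose between symbol-level and file-level results."""
    if not symbol_search:
        return file_hits  # symbol index not consulted at all
    if fallback_to_files and len(symbol_hits) < min_symbol_hits:
        return file_hits  # symbol results too sparse, fall back
    return symbol_hits


chosen = pick_results(
    ["billing/pricing.py::apply_discount"],
    ["billing/pricing.py", "api/orders.py"],
    symbol_search=flag("ENABLE_SYMBOL_SEARCH"),
    fallback_to_files=flag("SYMBOL_SEARCH_FALLBACK_TO_FILES"),
)
print(chosen)
```

With only one symbol hit and both flags at their defaults, this sketch returns the file-level results.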
