What gets indexed
When Waterline processes your code, it extracts symbols: individually meaningful units that represent a discrete behavior or concept.
Functions and methods
Any callable unit of logic, including regular, async, static, and class methods.
Classes
Class definitions, including their docstrings and associated metadata.
Constants and type aliases
Top-level constants and type aliases in supported languages.
For each symbol, Waterline records the following fields:
| Field | What it is |
|---|---|
| Name | The identifier as it appears in your code |
| File path | The path within your repository |
| Line range | The start and end lines of the symbol |
| Semantic summary | A 2–4 sentence description of what the symbol does, generated by an LLM |
| Embedding | A vector representation used for semantic search |
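A single indexed symbol could be modeled as a record like the one below. The class name `IndexedSymbol` and its attribute names are hypothetical and simply mirror the table; they are not Waterline's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class IndexedSymbol:
    """Hypothetical record mirroring the table above; not Waterline's real schema."""
    name: str         # identifier as it appears in the code
    file_path: str    # path within the repository
    start_line: int   # first line of the symbol
    end_line: int     # last line of the symbol (inclusive)
    summary: str      # 2–4 sentence LLM-generated description
    embedding: list[float] = field(default_factory=list)  # vector for semantic search

    @property
    def line_count(self) -> int:
        return self.end_line - self.start_line + 1

    @property
    def location(self) -> str:
        return f"{self.file_path}:{self.start_line}-{self.end_line}"
```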
Initial indexing
When you first connect a repository, Waterline crawls every source file and builds the index from scratch.
Crawl all source files
Waterline walks the repository tree and identifies files in supported languages.
Extract symbols
Each source file is parsed to pull out functions, methods, and classes, along with their names and line ranges.
Generate semantic summaries
An LLM writes a short description of what each symbol does. This is the text Waterline uses to match code to ticket language later.
Generate embeddings
Each summary is converted into an embedding vector for fast semantic similarity search.
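The four steps can be sketched as a small pipeline. This is a hypothetical illustration, not Waterline's code: it only handles Python files, and the LLM and embedding model are passed in as plain callables (`summarize` and `embed`) because the models Waterline actually uses are not described here. The embedding is computed from the summary text, matching the order of the steps above.

```python
import ast
import os
from typing import Callable, Iterator

# Assumption: illustrative extension list; Waterline's supported languages may differ.
SUPPORTED_EXTENSIONS = {".py"}


def crawl(repo_root: str) -> Iterator[str]:
    """Step 1: walk the tree and yield files in supported languages."""
    for dirpath, dirnames, filenames in os.walk(repo_root):
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]  # skip .git etc.
        for filename in sorted(filenames):
            if os.path.splitext(filename)[1] in SUPPORTED_EXTENSIONS:
                yield os.path.join(dirpath, filename)


def extract(path: str) -> list[tuple[str, int, int, str]]:
    """Step 2: pull out functions, methods, and classes with line ranges and source."""
    with open(path, encoding="utf-8") as f:
        source = f.read()
    results = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            code = ast.get_source_segment(source, node) or ""
            results.append((node.name, node.lineno, node.end_lineno, code))
    return results


def build_index(
    repo_root: str,
    summarize: Callable[[str], str],      # hypothetical stand-in for the LLM (step 3)
    embed: Callable[[str], list[float]],  # hypothetical stand-in for the embedder (step 4)
) -> list[dict]:
    index = []
    for path in crawl(repo_root):
        for name, start, end, code in extract(path):
            summary = summarize(code)
            index.append({
                "name": name,
                "file_path": os.path.relpath(path, repo_root),
                "line_range": (start, end),
                "summary": summary,
                "embedding": embed(summary),  # embed the summary, not the raw code
            })
    return index
```

Injecting `summarize` and `embed` keeps the pipeline testable: cheap fakes can stand in locally, and real model calls can be swapped in later.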
Size limits
Waterline stops indexing once either of the following limits is reached. These defaults are set to prevent unexpectedly large processing costs on very large monorepos.
| Limit | Default | Environment variable |
|---|---|---|
| Maximum files | 2,000 | REPO_MAX_FILES |
| Maximum symbols | 15,000 | REPO_MAX_SYMBOLS |
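The limits can be read from their environment variables with the documented defaults, as in the sketch below. The helper names are hypothetical, and the exact stopping boundary is an assumption: this version stops at file granularity, before any file that would push the symbol total past `REPO_MAX_SYMBOLS`.

```python
import os
from typing import Iterable


def limit_from_env(var: str, default: int) -> int:
    """Read an integer limit from the environment, falling back to the documented default."""
    raw = os.environ.get(var)
    return int(raw) if raw else default


def apply_size_limits(files_with_symbols: Iterable[tuple[str, list]]) -> list[tuple[str, list]]:
    """Hypothetical: keep (path, symbols) pairs until either limit is reached."""
    max_files = limit_from_env("REPO_MAX_FILES", 2000)
    max_symbols = limit_from_env("REPO_MAX_SYMBOLS", 15000)
    kept, n_symbols = [], 0
    for path, symbols in files_with_symbols:
        # Assumption: stop before a file that would exceed either limit.
        if len(kept) >= max_files or n_symbols + len(symbols) > max_symbols:
            break
        kept.append((path, symbols))
        n_symbols += len(symbols)
    return kept
```

Raising either variable trades higher processing cost for more complete coverage of a large repository.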
Symbol filtering
Not every symbol is worth indexing. Before processing a symbol, Waterline checks its length against the following line-count limits and skips symbols that fall outside them:
| Filter | Default | Environment variable |
|---|---|---|
| Minimum symbol lines | 3 | SYMBOL_EXTRACTOR_MIN_LINES |
| Maximum symbol lines | 500 | SYMBOL_EXTRACTOR_MAX_LINES |
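A minimal sketch of the line-count filter, assuming both bounds are inclusive and that a symbol's line count is `end_line - start_line + 1`. Waterline's exact boundary handling is not documented here, and `passes_line_filter` is a hypothetical name.

```python
import os

# Documented defaults, overridable through the documented environment variables.
MIN_LINES = int(os.environ.get("SYMBOL_EXTRACTOR_MIN_LINES", "3"))
MAX_LINES = int(os.environ.get("SYMBOL_EXTRACTOR_MAX_LINES", "500"))


def passes_line_filter(start_line: int, end_line: int,
                       min_lines: int = MIN_LINES, max_lines: int = MAX_LINES) -> bool:
    """Hypothetical filter; assumes inclusive bounds on both ends."""
    line_count = end_line - start_line + 1  # line ranges are inclusive
    return min_lines <= line_count <= max_lines
```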
Incremental sync
After the initial index, Waterline keeps the index up to date automatically. Every push to any branch triggers an incremental re-index.
Identify changed files
Waterline compares the commits between its sync cursor (the last indexed commit) and the latest SHA to find which files changed.
Re-index changed files
Symbols in those files are re-extracted, re-summarized, and their embeddings are updated.
Re-index 1-hop dependents
Files that import symbols from any changed file are also re-indexed. This catches code whose effective behavior changed because a dependency was updated — even if the file itself wasn’t edited directly.
The 1-hop dependency traversal means your index reflects the actual current behavior of your code after a push, not just the files that were touched in the diff.
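The 1-hop rule can be expressed as a set computation over an import graph. In this hypothetical sketch, `imports` maps each file to the set of files it imports from, and `changed` would come from comparing the cursor with the latest SHA (for example with `git diff --name-only <cursor> <latest-sha>`). Files that only import a dependent, two hops away, are deliberately left out.

```python
def files_to_reindex(changed: set[str], imports: dict[str, set[str]]) -> set[str]:
    """Hypothetical helper: changed files plus every file that directly imports one (1 hop)."""
    dependents = {
        importer
        for importer, imported in imports.items()
        if imported & changed  # importer pulls in at least one changed file
    }
    return changed | dependents
```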
Storage
Waterline keeps your index in two stores that work together:
| Store | What is stored | Used for |
|---|---|---|
| Postgres | Name, file path, line range, summary text, and metadata for each symbol | Structured lookups and result assembly |
| ChromaDB | Embedding vectors with matching IDs | Fast semantic similarity search |
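The division of labor between the two stores can be illustrated with in-memory stand-ins. This sketch does not use the Postgres or ChromaDB client libraries; the `TwoStoreIndex` class is hypothetical and uses brute-force cosine similarity in place of ChromaDB's search. What it shows is the shared-ID pattern: the vector store ranks IDs, and the structured rows for those IDs are then used to assemble results.

```python
import math


class TwoStoreIndex:
    """Hypothetical in-memory stand-ins: `rows` plays Postgres, `vectors` plays ChromaDB."""

    def __init__(self) -> None:
        self.rows: dict[str, dict] = {}            # id -> name, path, lines, summary
        self.vectors: dict[str, list[float]] = {}  # id -> embedding

    def upsert(self, symbol_id: str, row: dict, embedding: list[float]) -> None:
        # The same ID is written to both stores so results can be joined later.
        self.rows[symbol_id] = row
        self.vectors[symbol_id] = embedding

    def search(self, query: list[float], k: int = 5) -> list[dict]:
        def cosine(a: list[float], b: list[float]) -> float:
            dot = sum(x * y for x, y in zip(a, b))
            norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
            return dot / norm if norm else 0.0

        # 1. Similarity search over vectors only (the vector store's role).
        ranked = sorted(self.vectors, key=lambda i: cosine(query, self.vectors[i]), reverse=True)
        # 2. Assemble results from structured rows by ID (the relational store's role).
        return [self.rows[i] for i in ranked[:k]]
```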
Feature flags
You can control indexing behavior with the following environment variable:
ENABLE_SYMBOL_INDEXING=false tells Waterline to use file-level semantic diffs instead of symbol-level indexing. This is less precise but reduces indexing cost, which can be useful while evaluating the product on a large codebase.
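A sketch of how a service might interpret this flag. Only the value `false` is documented above; treating an unset variable as enabled, and accepting `0`, `no`, and `off` as synonyms for `false`, are assumptions made for the example, and the function names are hypothetical.

```python
import os


def symbol_indexing_enabled() -> bool:
    """Hypothetical parser for ENABLE_SYMBOL_INDEXING.

    Assumptions: unset means enabled; "0", "no", and "off" behave like "false".
    """
    raw = os.environ.get("ENABLE_SYMBOL_INDEXING", "true")
    return raw.strip().lower() not in {"false", "0", "no", "off"}


def indexing_mode() -> str:
    return "symbol-level" if symbol_indexing_enabled() else "file-level semantic diff"
```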