Waterline connects to your GitHub repository to build a symbol-level index of your codebase — functions, methods, and classes — so it can find code evidence for each acceptance criterion in a ticket. This page covers how to connect a repository, what gets indexed, and how to keep the index up to date.

Connection methods

OAuth

Recommended for most users. Authorizes through GitHub’s standard OAuth flow in a few clicks.

Personal Access Token

For self-hosted deployments, CI environments, or anywhere an OAuth app isn’t practical.
OAuth

1. Open GitHub settings
   In Waterline, go to Settings → Integrations → GitHub and click Connect GitHub.
2. Authorize on GitHub
   You’ll be redirected to GitHub’s authorization page. Grant access to the repositories you want to index.
3. Select a repository
   Back in Waterline, choose the specific repository to connect to this workspace.
Required OAuth scopes:
| Scope | Why Waterline needs it |
| --- | --- |
| repo | Read access to code, commits, and pull requests, including private repos |
| read:user | Read your GitHub username |
| user:email | Read your email address |
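As an illustration of how these scopes are requested, here is a minimal sketch that builds a GitHub authorization URL. The endpoint and query parameters are GitHub's standard OAuth web flow; the function name and the way `state` is generated are assumptions for this example, not Waterline's actual code.

```python
import secrets
from urllib.parse import urlencode

GITHUB_AUTHORIZE_URL = "https://github.com/login/oauth/authorize"
REQUIRED_SCOPES = ("repo", "read:user", "user:email")


def build_authorize_url(client_id: str, redirect_uri: str, state: str) -> str:
    """Build the URL a user is redirected to in order to grant the scopes above.

    Hypothetical helper: GitHub expects scopes as one space-separated string.
    """
    query = urlencode(
        {
            "client_id": client_id,
            "redirect_uri": redirect_uri,
            "scope": " ".join(REQUIRED_SCOPES),
            "state": state,  # random value, checked again on the callback
        }
    )
    return f"{GITHUB_AUTHORIZE_URL}?{query}"


if __name__ == "__main__":
    print(
        build_authorize_url(
            "your_client_id",
            "https://your-api.com/api/connect/github/callback",
            secrets.token_urlsafe(16),
        )
    )
```

The `state` value should be stored server-side and compared when GitHub redirects back, so that a forged callback can be rejected.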

Personal Access Token

PAT support is available for self-hosted deployments where an OAuth app isn’t practical — for example, CI environments or local development.
1. Create a token
   Go to github.com/settings/tokens and create a new token. Grant the same scopes: repo, read:user, and user:email.
2. Add the token to Waterline
   In Waterline, go to Settings → Integrations → GitHub and enter the token when prompted.
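Before pasting a token into Waterline, it can help to confirm it carries the required scopes. For classic tokens, GitHub lists the granted scopes in the `X-OAuth-Scopes` response header of API requests (for example `GET https://api.github.com/user`). The sketch below only parses that header value; the helper name is hypothetical, and the network call is left out.

```python
REQUIRED_SCOPES = {"repo", "read:user", "user:email"}

# Broader scopes that include narrower ones in GitHub's scope hierarchy.
IMPLIED = {"user": {"read:user", "user:email", "user:follow"}}


def missing_scopes(x_oauth_scopes: str) -> set[str]:
    """Return the required scopes that a token does not grant.

    `x_oauth_scopes` is the raw comma-separated header value,
    e.g. "repo, read:user, user:email".
    """
    granted = {s.strip() for s in x_oauth_scopes.split(",") if s.strip()}
    for parent, children in IMPLIED.items():
        if parent in granted:
            granted |= children
    return REQUIRED_SCOPES - granted


if __name__ == "__main__":
    print(missing_scopes("repo, read:user"))
```

An empty result means the token covers everything listed in the scopes table.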

Webhook setup

When you connect a repository, Waterline automatically registers a webhook on GitHub pointing to:
{API_BASE_URL}/api/sync/github/webhook
The webhook listens for push events. Each push triggers an incremental re-index of only the files that changed — your index stays current without a full repository crawl.
For webhooks to work, your Waterline API must be reachable from GitHub’s servers. During local development, use a tunnel like ngrok to expose your local server, then set API_BASE_URL in your .env to the tunnel URL.
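To make the webhook behavior concrete, here is a rough sketch of two pieces: deriving the webhook URL from `API_BASE_URL`, and collecting the changed paths from a push event. The `commits`, `added`, `modified`, and `removed` fields are part of GitHub's push payload; the function names and the way Waterline actually consumes the payload are assumptions.

```python
def webhook_url(api_base_url: str) -> str:
    """Join the base URL and the webhook path, tolerating a trailing slash."""
    return api_base_url.rstrip("/") + "/api/sync/github/webhook"


def changed_paths(push_payload: dict) -> tuple[set[str], set[str]]:
    """Return (paths to re-index, paths to drop) for one push event.

    Commits are walked in payload order, so a file added in one commit
    and removed in a later one ends up only in the "drop" set.
    """
    touched: set[str] = set()
    removed: set[str] = set()
    for commit in push_payload.get("commits", []):
        for path in commit.get("added", []) + commit.get("modified", []):
            touched.add(path)
            removed.discard(path)
        for path in commit.get("removed", []):
            removed.add(path)
            touched.discard(path)
    return touched, removed


if __name__ == "__main__":
    print(webhook_url("https://example.ngrok.io/"))
```

With a tunnel in place, the value printed by `webhook_url` is the address GitHub must be able to reach.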

What gets indexed

Waterline processes every source file in the repository up to the configured limits. What’s extracted per symbol:
  • Name, file path, and line range
  • LLM-generated semantic summary
  • Embedding vector for semantic search
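As a rough picture of what a symbol record might look like, the sketch below extracts names, kinds, and line ranges from Python source using the standard `ast` module. This is only an illustration: the parser Waterline actually uses, the `Symbol` type, and its field names are assumptions, and the summary and embedding are omitted because they depend on external models.

```python
import ast
from dataclasses import dataclass


@dataclass
class Symbol:
    name: str        # qualified name, e.g. "Cart.total"
    kind: str        # "class", "method", or "function"
    path: str
    start_line: int
    end_line: int


def extract_symbols(source: str, path: str) -> list[Symbol]:
    """Collect classes, methods, and functions with their line ranges."""
    symbols: list[Symbol] = []

    def visit(node: ast.AST, prefix: str, in_class: bool) -> None:
        for child in ast.iter_child_nodes(node):
            if isinstance(child, ast.ClassDef):
                name = prefix + child.name
                symbols.append(Symbol(name, "class", path, child.lineno, child.end_lineno))
                visit(child, name + ".", True)
            elif isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                name = prefix + child.name
                kind = "method" if in_class else "function"
                symbols.append(Symbol(name, kind, path, child.lineno, child.end_lineno))
                visit(child, name + ".", False)
            else:
                # Keep descending so definitions inside if/try blocks are found.
                visit(child, prefix, in_class)

    visit(ast.parse(source), "", False)
    return symbols
```

Each record would then be paired with its generated summary and embedding before being stored.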
What’s excluded:
  • Binary files
  • Lock files (package-lock.json, yarn.lock, Pipfile.lock)
  • node_modules/, .git/, and build artifact directories
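The exclusion rules above could be approximated with a small path filter like the one below. The lock file names and `node_modules/` and `.git/` come from the list; treating `dist` and `build` as the build artifact directories, and detecting binaries by a NUL byte in the first bytes of the file, are assumptions made for this sketch.

```python
from pathlib import PurePosixPath

LOCK_FILES = {"package-lock.json", "yarn.lock", "Pipfile.lock"}
# "dist" and "build" are assumed examples of build artifact directories.
EXCLUDED_DIRS = {"node_modules", ".git", "dist", "build"}


def is_excluded(path: str, head: bytes = b"") -> bool:
    """Decide whether a repository path should be skipped by the indexer.

    `head` is the first chunk of the file's content, if available.
    """
    p = PurePosixPath(path)
    if p.name in LOCK_FILES:
        return True
    # Only directory components count, so "src/build.py" is still indexed.
    if any(part in EXCLUDED_DIRS for part in p.parts[:-1]):
        return True
    # Heuristic: text files rarely contain NUL bytes.
    return b"\x00" in head
```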
Default index limits:
| Limit | Default | Environment variable |
| --- | --- | --- |
| Max files per repository | 2,000 | REPO_MAX_FILES |
| Max symbols per repository | 15,000 | REPO_MAX_SYMBOLS |
You can raise these limits in your .env file if your repository is large.
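For illustration, a loader for these limits might fall back to the documented defaults when a variable is unset. The variable names and defaults come from the table; the function itself and its validation rules are hypothetical.

```python
import os

DEFAULTS = {"REPO_MAX_FILES": 2000, "REPO_MAX_SYMBOLS": 15000}


def load_limits(env=None) -> dict[str, int]:
    """Read index limits from the environment, using defaults when unset."""
    env = os.environ if env is None else env
    limits: dict[str, int] = {}
    for name, default in DEFAULTS.items():
        raw = env.get(name, "").strip()
        value = int(raw) if raw else default  # raises ValueError on "2,000"
        if value <= 0:
            raise ValueError(f"{name} must be a positive integer, got {value}")
        limits[name] = value
    return limits
```

Note that under this sketch the value must be written without a thousands separator, for example `REPO_MAX_FILES=5000`.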

Incremental sync

Every push webhook triggers a targeted sync pipeline. Only changed files and their direct dependents are re-processed, regardless of repository size:
1. Check the cursor
   Waterline reads the last-processed commit SHA to know where to start.
2. Fetch changed commits
   All commits since the last sync are fetched from GitHub.
3. Re-index changed files
   For each changed file, Waterline re-extracts symbols, regenerates LLM summaries, and updates the search index.
4. Re-index dependents
   Files that import any changed symbol are re-indexed too (one hop of dependency traversal).
5. Advance the cursor
   The sync cursor advances to the latest commit SHA, ready for the next push.
A push that modifies three files costs about three files’ worth of processing, plus any files that directly import the changed symbols, rather than a full repository crawl.
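The pipeline above can be sketched with in-memory stand-ins. Everything here is an assumption made for illustration: the `SyncState` type, the file-level import map (the real index tracks dependencies per symbol), and the `indexed` list that stands in for symbol extraction, summarization, and embedding. Fetching commits from GitHub is left to the caller.

```python
from dataclasses import dataclass, field


@dataclass
class SyncState:
    cursor: str                   # last-processed commit SHA
    imports: dict[str, set[str]]  # file -> files it imports from
    indexed: list[str] = field(default_factory=list)


def files_to_reindex(changed: set[str], imports: dict[str, set[str]]) -> set[str]:
    """Changed files plus files that import them directly (one hop only)."""
    dependents = {f for f, deps in imports.items() if deps & changed}
    return changed | dependents


def sync(state: SyncState, commits: list[tuple[str, set[str]]]) -> set[str]:
    """Process commits newer than the cursor, given oldest first.

    Each commit is a (sha, changed_files) pair.
    """
    if not commits:
        return set()
    changed = set().union(*(files for _, files in commits))
    targets = files_to_reindex(changed, state.imports)
    for path in sorted(targets):
        state.indexed.append(path)  # stand-in for the real re-index work
    state.cursor = commits[-1][0]   # advance only after processing
    return targets
```

Because the traversal stops after one hop, a file that imports a dependent (but not the changed file itself) is not re-indexed.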

Self-hosted OAuth app setup

If you’re running Waterline yourself, create a GitHub OAuth app at github.com/settings/developers:
| Field | Value |
| --- | --- |
| Application name | Waterline (or your organization’s name) |
| Homepage URL | https://your-domain.com |
| Authorization callback URL | https://your-api.com/api/connect/github/callback |
Then add the following to your .env:
GITHUB_CLIENT_ID=your_client_id
GITHUB_CLIENT_SECRET=your_client_secret
GITHUB_REDIRECT_URI=https://your-api.com/api/connect/github/callback
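A quick sanity check of these settings can catch a missing variable or a callback path that does not match the OAuth app. The variable names and callback path come from this page; the checker itself is a hypothetical helper, not something Waterline ships.

```python
import os
from urllib.parse import urlparse

REQUIRED = ("GITHUB_CLIENT_ID", "GITHUB_CLIENT_SECRET", "GITHUB_REDIRECT_URI")
CALLBACK_PATH = "/api/connect/github/callback"


def check_oauth_env(env=None) -> list[str]:
    """Return a list of human-readable problems; empty means the config looks sane."""
    env = os.environ if env is None else env
    problems = [f"{name} is not set" for name in REQUIRED if not env.get(name)]
    uri = env.get("GITHUB_REDIRECT_URI", "")
    if uri and urlparse(uri).path.rstrip("/") != CALLBACK_PATH:
        problems.append(f"GITHUB_REDIRECT_URI should end with {CALLBACK_PATH}")
    return problems


if __name__ == "__main__":
    for problem in check_oauth_env():
        print(problem)
```

The redirect URI must also match the Authorization callback URL registered on the GitHub OAuth app, which this check cannot verify on its own.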