gitcrawl/docs/configuration.md
2026-05-05 21:23:39 +01:00

5.2 KiB

title nav_order permalink
Configuration 5 /configuration/

Configuration

{: .no_toc }

Where gitcrawl reads settings from, and how to override them. {: .fs-6 .fw-300 }

  1. TOC {:toc}

Resolution order

For each setting, gitcrawl looks in this order and uses the first match:

  1. CLI flag (e.g., --config, --summary-model)
  2. Environment variable (GITCRAWL_*, then standard GITHUB_TOKEN / OPENAI_API_KEY)
  3. [env] table inside config.toml
  4. Top-level config field inside config.toml
  5. Built-in default

Default paths

Path Purpose
~/.config/gitcrawl/config.toml Configuration file
~/.config/gitcrawl/gitcrawl.db SQLite database
~/.config/gitcrawl/cache/ Caches (PR detail, gh shim fallthrough)
~/.config/gitcrawl/cache/gh-shim/ gh shim fallthrough cache
~/.config/gitcrawl/vectors/ Vector store backing embeddings
~/.config/gitcrawl/logs/ Operational logs

Override the config root by setting GITCRAWL_CONFIG=/path/to/config.toml or by passing --config to any command.

config.toml

gitcrawl init writes a minimal config. You can edit it by hand or with gitcrawl configure:

summary_model = "gpt-5.4"
embed_model = "text-embedding-3-small"
embed_dimensions = 1024
embedding_basis = "title_original"

[env]
GITHUB_TOKEN = "<github-token>"
OPENAI_API_KEY = "<openai-api-key>"

[portable_store]
url = "https://github.com/org/portable-store.git"
db_path = "data/openclaw__openclaw.sync.db"
checkout_dir = "/Users/me/.config/gitcrawl/portable"

Notable fields

Field Default Notes
summary_model gpt-5.4 Reserved for future summary commands
embed_model text-embedding-3-small OpenAI embedding model
embed_dimensions 1024 Must match the model
embedding_basis title_original Only title_original is implemented
[env] (empty) Sets process env at startup; useful for tokens you do not want in your shell rc
[portable_store] (empty) Used when working from a shared, Git-backed cache

Environment variables

Core

Variable Purpose
GITCRAWL_CONFIG Override config path
GITCRAWL_DB_PATH Override database path
GITHUB_TOKEN GitHub API token (required for sync, gh shim fallthroughs)
OPENAI_API_KEY OpenAI API key (required for embed)

Model overrides

Variable Purpose
GITCRAWL_SUMMARY_MODEL Override summary model
GITCRAWL_EMBED_MODEL Override embedding model
GITCRAWL_OPENAI_RETRY_DISABLED Set to 1 to disable OpenAI retry/backoff
GITCRAWL_OPENAI_BASE_URL / OPENAI_BASE_URL Custom OpenAI endpoint (e.g., for a proxy)

GitHub overrides

Variable Purpose
GITCRAWL_GITHUB_BASE_URL / GITHUB_BASE_URL Custom GitHub API endpoint (used by sync and the gh shim)
GH_HOST GitHub host; included in the gh shim cache key
GH_REPO Default repo when -R is omitted; included in the gh shim cache key

gh shim

Variable Purpose
GITCRAWL_GH_PATH Path to the real gh binary used for fallthrough
GITCRAWL_GH_AUTO_HYDRATE Set to 0 to disable PR auto-hydration on cache miss
GITCRAWL_GH_CACHE_TTL Override fallthrough cache TTL (e.g., 5m, 1h)
GITCRAWL_GH_STALE_GRACE Override stale-while-revalidate grace for expired successful fallthrough entries
GITCRAWL_GH_CACHE_ERRORS Set to 0 to avoid caching non-zero read-only fallthroughs

If GITCRAWL_GH_PATH is unset, the shim probes common Homebrew install paths and then your PATH. Set it explicitly when you symlink the gitcrawl binary as gh (otherwise the shim will recurse into itself).

Global flags

These flags work on every command:

Flag Default Description
--config <path> $GITCRAWL_CONFIG or default Override config path for this invocation
--format text|json|log text Output format
--json (off) Shorthand for --format json
--no-color (off) Suppress ANSI color codes
--version (off) Print version and exit (global only)

--json overrides --format. Both are honored on subcommands that produce output.

gitcrawl configure

Interactive-friendly config edits without opening the file:

gitcrawl configure --summary-model gpt-5.4
gitcrawl configure --embed-model text-embedding-3-small
gitcrawl configure --embedding-basis title_original
gitcrawl configure --json

Returns the resolved config path, the values that were updated, and the now-current model selection. See gitcrawl configure --help.

gitcrawl doctor

A health check for everything covered above:

gitcrawl doctor          # human-readable
gitcrawl doctor --json   # for scripts

Reports config path and existence, database path, whether GITHUB_TOKEN and OPENAI_API_KEY are present (and whether they came from env vs. config), the active summary/embed models, the embedding basis, and counts of repositories, threads, open threads, clusters, plus the last sync timestamp. If the API call surface is unsupported (older Go, missing crypto), api_supported: false is reported so you can investigate.