gitcrawl/docs/reference.md
2026-05-05 04:38:46 +01:00

5.5 KiB

title nav_order permalink
Reference 16 /reference/

Reference

{: .no_toc }

Lookup tables for paths, environment variables, and defaults. {: .fs-6 .fw-300 }

  1. TOC {:toc}

Paths

Path Purpose
~/.config/gitcrawl/config.toml Configuration file
~/.config/gitcrawl/gitcrawl.db SQLite database
~/.config/gitcrawl/cache/ Caches (PR detail, gh-shim fallthrough)
~/.config/gitcrawl/cache/gh-shim/ gh-shim fallthrough cache
~/.config/gitcrawl/vectors/ Vector store backing embeddings
~/.config/gitcrawl/logs/ Operational logs
~/.config/gitcrawl/portable/ Portable-store checkout (when configured)

Override the config root with --config <path> or GITCRAWL_CONFIG.

Environment variables

Core

Variable Default Used by Purpose
GITCRAWL_CONFIG ~/.config/gitcrawl/config.toml All commands Override config path
GITCRAWL_DB_PATH ~/.config/gitcrawl/gitcrawl.db All commands Override database path
GITHUB_TOKEN (none) sync, gh shim GitHub API token
OPENAI_API_KEY (none) embed, refresh OpenAI API key

Models

Variable Default Purpose
GITCRAWL_SUMMARY_MODEL gpt-5.4 Summary model (reserved for future commands)
GITCRAWL_EMBED_MODEL text-embedding-3-small OpenAI embedding model
GITCRAWL_OPENAI_RETRY_DISABLED (off) Set 1 to disable OpenAI retry/backoff
GITCRAWL_OPENAI_BASE_URL / OPENAI_BASE_URL OpenAI default Custom OpenAI endpoint

GitHub overrides

Variable Default Purpose
GITCRAWL_GITHUB_BASE_URL / GITHUB_BASE_URL GitHub default Custom GitHub API endpoint
GH_HOST (none) Included in gh-shim cache key
GH_REPO (none) Default -R value; included in gh-shim cache key

gh shim

Variable Default Purpose
GITCRAWL_GH_PATH (probed) Path to the real gh binary
GITCRAWL_GH_AUTO_HYDRATE (on) Set 0 to disable PR auto-hydration on cache miss
GITCRAWL_GH_CACHE_TTL 30s for most commands Override fallthrough cache TTL (e.g., 5m, 1h)
GITCRAWL_GH_CACHE_ERRORS (on) Set 0 to avoid caching non-zero read-only fallthroughs

Configuration defaults

Field Default
summary_model gpt-5.4
embed_model text-embedding-3-small
embed_dimensions 1024
embedding_basis title_original
batch_size (embeddings) 64
concurrency (embeddings) 2
tui_default_sort size

Clustering defaults

Parameter Default Source
--threshold 0.80 cluster, refresh
--cross-kind-threshold 0.93 cluster, refresh
--min-size 1 cluster, refresh
--max-cluster-size 40 cluster, refresh
--k (nearest-neighbor fanout) 16 cluster, refresh
Weak-edge title overlap floor 0.18 internal
High-confidence edge score 0.90 internal
Deterministic reference edge score 0.94 internal
Body-only reference prefix length 240 chars internal

TUI defaults

Parameter Default
--min-size 5
--sort size
Working set limit 500 rows
Refresh interval 15s

gh shim cache TTLs

Cache class TTL
Most read-only fallthroughs 30s
gh api (GET) 60s
gh pr diff without stable head SHA 5m
gh pr diff with stable head SHA 7d
Override GITCRAWL_GH_CACHE_TTL
Cache read failures on by default; disable with GITCRAWL_GH_CACHE_ERRORS=0

gh shim cache key composition

A SHA-256 hash of:

  • Version tag (v2)
  • Resolved gitcrawl config path
  • Current working directory
  • GH_HOST env var
  • GH_REPO env var
  • For gh pr diff: pr-diff:owner/repo:number:head-sha (when head SHA is known)
  • Full command argument vector (null-separated)

This isolates sibling checkouts and portable stores while coalescing repeated calls from the same workspace.

Output formats

Format Where to use
text Human terminal use (default)
json Pipelines, scripts, agents (also via --json)
log Internal structured logging output

Exit codes

  • 0 — success
  • non-zero — usage error, "not implemented" command, or runtime failure

stderr always carries error messages. stdout is reserved for command output.

File-system layout (worked example)

~/.config/gitcrawl/
├── config.toml
├── gitcrawl.db                  # SQLite mirror
├── gitcrawl.db-shm              # SQLite shared-memory file
├── gitcrawl.db-wal              # SQLite write-ahead log
├── cache/
│   ├── gh-shim/                 # gh fallthrough cache; inspect with xcache
│   └── pr/                      # hydrated PR detail blobs
├── vectors/                     # vector store backing embeddings
├── logs/
└── portable/                    # portable-store checkout (optional)
    └── data/
        └── owner__repo.sync.db

See also