diff --git a/docs/automation.md b/docs/automation.md
index d348c3b..7aa0626 100644
--- a/docs/automation.md
+++ b/docs/automation.md
@@ -23,7 +23,7 @@ gitcrawl clusters owner/repo --json --sort size --min-size 5 \
   | jq '.clusters[] | {id, members: .member_count, latest: .latest_thread_number}'
 ```
 
-For the full per-command JSON shapes, see the individual feature pages and the [Commands reference](./commands).
+For the full per-command JSON shapes, see the individual feature pages and the [Commands reference](/commands/).
 
 ## Exit codes
 
@@ -51,7 +51,7 @@ Best for ad-hoc agent tools that should bound staleness but minimize sync calls.
 
 ### Auto-hydration via the gh shim
 
-Symlink the gitcrawl binary as `gh` (or `gitcrawl-gh`) and let the shim pull a single PR's detail when an agent calls `gh pr view` or `gh pr checks` against an unhydrated PR. See [gh shim → auto-hydration](./gh-shim#auto-hydration).
+Symlink the gitcrawl binary as `gh` (or `gitcrawl-gh`) and let the shim pull a single PR's detail when an agent calls `gh pr view` or `gh pr checks` against an unhydrated PR. See [gh shim → auto-hydration](/gh-shim/#auto-hydration).
 
 This is the lowest-overhead pattern for fleets of agents — no scheduling required.
 
@@ -61,7 +61,7 @@ Run `gitcrawl refresh owner/repo` on a cron, systemd timer, or `launchd` agent e
 
 ```cron
 # Every 5 minutes, refresh the active repos.
-*/5 * * * * /usr/local/bin/gitcrawl refresh openclaw/gitcrawl --json > /tmp/gitcrawl.openclaw.json 2>&1
+*/5 * * * * $HOME/bin/gitcrawl refresh openclaw/gitcrawl --json > /tmp/gitcrawl.openclaw.json 2>&1
 ```
 
 For multiple repos, loop in a small shell script — gitcrawl is happy to run sequentially against a shared SQLite file.
diff --git a/docs/clustering.md b/docs/clustering.md
index 80eab3f..552a49f 100644
--- a/docs/clustering.md
+++ b/docs/clustering.md
@@ -142,7 +142,7 @@ Or slice it manually:
 gitcrawl exclude-cluster-member owner/repo --id 12 --number 456 --reason "different repro"
 ```
 
-See [Governance](./governance) for the full override workflow.
+See [Governance](/governance/) for the full override workflow.
 
 ## Re-clustering and stable IDs
 
@@ -155,6 +155,6 @@ Cluster runs are recorded in `run_records` and visible via `gitcrawl runs --kind
 
 ## See also
 
-- [Governance](./governance) — close clusters, exclude members, set canonical
-- [TUI](./tui) — the interactive cluster browser
-- [Concepts](./concepts#cluster) — durable clusters and cluster kinds
+- [Governance](/governance/) — close clusters, exclude members, set canonical
+- [TUI](/tui/) — the interactive cluster browser
+- [Concepts](/concepts/#cluster) — durable clusters and cluster kinds
diff --git a/docs/commands.md b/docs/commands.md
index e2f99e8..b2cd642 100644
--- a/docs/commands.md
+++ b/docs/commands.md
@@ -30,78 +30,78 @@ These work on every command.
 
 | Command | Purpose | Detailed docs |
 | --- | --- | --- |
-| `gitcrawl init [--db --portable-store --portable-db --store-dir --json]` | Create config, database, runtime directories; optionally clone a portable store | [Installation](./installation), [Portable stores](./portable-stores) |
-| `gitcrawl doctor [--json]` | Health check for config, database, credentials, model selection, repo/thread counts | [Configuration](./configuration#gitcrawl-doctor) |
-| `gitcrawl configure [--summary-model --embed-model --embedding-basis --json]` | Update model fields in `config.toml` | [Configuration](./configuration#gitcrawl-configure) |
+| `gitcrawl init [--db --portable-store --portable-db --store-dir --json]` | Create config, database, runtime directories; optionally clone a portable store | [Installation](/installation/), [Portable stores](/portable-stores/) |
+| `gitcrawl doctor [--json]` | Health check for config, database, credentials, model selection, repo/thread counts | [Configuration](/configuration/#gitcrawl-doctor) |
+| `gitcrawl configure [--summary-model --embed-model --embedding-basis --json]` | Update model fields in `config.toml` | [Configuration](/configuration/#gitcrawl-configure) |
 | `gitcrawl version` | Print version | — |
 
 ## Sync
 
 | Command | Purpose | Docs |
 | --- | --- | --- |
-| `gitcrawl sync owner/repo [--state --since --numbers --limit --include-comments --include-pr-details --with pr-details --json]` | Sync issues and PRs from GitHub into local SQLite | [Sync](./sync) |
-| `gitcrawl refresh owner/repo [--no-sync --no-embed --no-cluster ...]` | Wrapper that runs sync → embed → cluster | [Refresh and embed](./refresh-and-embed) |
-| `gitcrawl embed owner/repo [--number --limit --force --include-closed --json]` | Generate OpenAI embeddings for thread documents | [Refresh and embed](./refresh-and-embed#embed) |
-| `gitcrawl runs owner/repo [--kind sync\|embedding\|cluster --limit --json]` | List recorded run history | [Refresh and embed](./refresh-and-embed#runs) |
+| `gitcrawl sync owner/repo [--state --since --numbers --limit --include-comments --include-pr-details --with pr-details --json]` | Sync issues and PRs from GitHub into local SQLite | [Sync](/sync/) |
+| `gitcrawl refresh owner/repo [--no-sync --no-embed --no-cluster ...]` | Wrapper that runs sync → embed → cluster | [Refresh and embed](/refresh-and-embed/) |
+| `gitcrawl embed owner/repo [--number --limit --force --include-closed --json]` | Generate OpenAI embeddings for thread documents | [Refresh and embed](/refresh-and-embed/#embed) |
+| `gitcrawl runs owner/repo [--kind sync\|embedding\|cluster --limit --json]` | List recorded run history | [Refresh and embed](/refresh-and-embed/#runs) |
 
 ## Inspect
 
 | Command | Purpose | Docs |
 | --- | --- | --- |
 | `gitcrawl threads owner/repo [--include-closed --numbers --limit --json]` | List threads from local cache | — |
-| `gitcrawl search owner/repo --query [--mode keyword\|semantic\|hybrid --limit --json]` | Local search (direct mode) | [Search](./search) |
-| `gitcrawl search issues\|prs -R owner/repo [--state --json --limit --sync-if-stale]` | Local search (`gh search` shape) | [Search](./search#gh-search-compatibility-mode) |
-| `gitcrawl neighbors owner/repo --number [--limit --threshold --json]` | Vector-similar threads to a specific issue/PR | [Clustering](./clustering#find-similar-threads-neighbors) |
+| `gitcrawl search owner/repo --query [--mode keyword\|semantic\|hybrid --limit --json]` | Local search (direct mode) | [Search](/search/) |
+| `gitcrawl search issues\|prs -R owner/repo [--state --json --limit --sync-if-stale]` | Local search (`gh search` shape) | [Search](/search/#gh-search-compatibility-mode) |
+| `gitcrawl neighbors owner/repo --number [--limit --threshold --json]` | Vector-similar threads to a specific issue/PR | [Clustering](/clustering/#find-similar-threads-neighbors) |
 
 ## Cluster
 
 | Command | Purpose | Docs |
 | --- | --- | --- |
-| `gitcrawl cluster owner/repo [--threshold --min-size --max-cluster-size --k --cross-kind-threshold --limit --model --basis --include-closed --json]` | Build durable clusters from vectors | [Clustering](./clustering#generate-clusters) |
-| `gitcrawl clusters owner/repo [--sort size\|recent\|oldest --min-size --limit --hide-closed --json]` | Latest-run cluster summary, merged with closed durable rows | [Clustering](./clustering#list-clusters) |
-| `gitcrawl durable-clusters owner/repo [--include-closed --sort --min-size --limit --json]` | Strict durable-cluster audit view | [Clustering](./clustering#list-clusters) |
-| `gitcrawl cluster-detail owner/repo --id [--member-limit --body-chars --include-closed --json]` | Cluster + members detail | [Clustering](./clustering#inspect-a-cluster) |
-| `gitcrawl cluster-explain owner/repo --id [...]` | Alias for `cluster-detail` | [Clustering](./clustering#inspect-a-cluster) |
+| `gitcrawl cluster owner/repo [--threshold --min-size --max-cluster-size --k --cross-kind-threshold --limit --model --basis --include-closed --json]` | Build durable clusters from vectors | [Clustering](/clustering/#generate-clusters) |
+| `gitcrawl clusters owner/repo [--sort size\|recent\|oldest --min-size --limit --hide-closed --json]` | Latest-run cluster summary, merged with closed durable rows | [Clustering](/clustering/#list-clusters) |
+| `gitcrawl durable-clusters owner/repo [--include-closed --sort --min-size --limit --json]` | Strict durable-cluster audit view | [Clustering](/clustering/#list-clusters) |
+| `gitcrawl cluster-detail owner/repo --id [--member-limit --body-chars --include-closed --json]` | Cluster + members detail | [Clustering](/clustering/#inspect-a-cluster) |
+| `gitcrawl cluster-explain owner/repo --id [...]` | Alias for `cluster-detail` | [Clustering](/clustering/#inspect-a-cluster) |
 
 ## Governance
 
 | Command | Purpose | Docs |
 | --- | --- | --- |
-| `gitcrawl close-thread owner/repo --number [--reason --json]` | Local close on a thread | [Governance](./governance#local-close) |
+| `gitcrawl close-thread owner/repo --number [--reason --json]` | Local close on a thread | [Governance](/governance/#local-close) |
 | `gitcrawl reopen-thread owner/repo --number [--json]` | Inverse | — |
-| `gitcrawl close-cluster owner/repo --id [--reason --json]` | Local close on a cluster | [Governance](./governance#local-close) |
+| `gitcrawl close-cluster owner/repo --id [--reason --json]` | Local close on a cluster | [Governance](/governance/#local-close) |
 | `gitcrawl reopen-cluster owner/repo --id [--json]` | Inverse | — |
-| `gitcrawl exclude-cluster-member owner/repo --id --number [--reason --json]` | Pull a thread out of a cluster | [Governance](./governance#member-exclusion) |
+| `gitcrawl exclude-cluster-member owner/repo --id --number [--reason --json]` | Pull a thread out of a cluster | [Governance](/governance/#member-exclusion) |
 | `gitcrawl include-cluster-member owner/repo --id --number [--reason --json]` | Inverse | — |
-| `gitcrawl set-cluster-canonical owner/repo --id --number [--reason --json]` | Pin canonical thread for a cluster | [Governance](./governance#canonical-member) |
+| `gitcrawl set-cluster-canonical owner/repo --id --number [--reason --json]` | Pin canonical thread for a cluster | [Governance](/governance/#canonical-member) |
 
 ## TUI
 
 | Command | Purpose | Docs |
 | --- | --- | --- |
-| `gitcrawl tui [owner/repo] [--min-size --sort --limit --hide-closed --json]` | Interactive cluster browser; `--json` emits a snapshot instead of launching the UI | [TUI](./tui) |
+| `gitcrawl tui [owner/repo] [--min-size --sort --limit --hide-closed --json]` | Interactive cluster browser; `--json` emits a snapshot instead of launching the UI | [TUI](/tui/) |
 
 ## gh shim
 
 | Command | Purpose | Docs |
 | --- | --- | --- |
-| `gitcrawl gh search issues\|prs -R owner/repo [...]` | Local-first `gh search` | [gh shim](./gh-shim) |
-| `gitcrawl gh issue view -R owner/repo --json ` | Local-first thread view | [gh shim](./gh-shim) |
-| `gitcrawl gh pr view -R owner/repo --json ` | Same, for PRs (with auto-hydration) | [gh shim](./gh-shim) |
-| `gitcrawl gh issue list -R owner/repo [--state --search --author --assignee --label --json]` | Local-first list | [gh shim](./gh-shim) |
-| `gitcrawl gh pr list -R owner/repo [...]` | Same, for PRs | [gh shim](./gh-shim) |
-| `gitcrawl gh pr checks -R owner/repo --json ` | Cached PR checks (auto-hydrates if stale) | [gh shim](./gh-shim) |
-| `gitcrawl gh pr diff -R owner/repo` | Falls through; cached by head SHA | [gh shim](./gh-shim) |
-| `gitcrawl gh run list -R owner/repo [--branch --commit --json]` | Cached workflow runs | [gh shim](./gh-shim) |
-| `gitcrawl gh run view -R owner/repo [--json]` | Same, single run | [gh shim](./gh-shim) |
-| `gitcrawl gh repo view\|list ...` | Falls through; cached briefly | [gh shim](./gh-shim) |
-| `gitcrawl gh release list\|view ...` | Falls through; cached briefly | [gh shim](./gh-shim#read-only-fallthroughs-cached) |
-| `gitcrawl gh workflow list\|view ...` | Falls through; cached briefly | [gh shim](./gh-shim#read-only-fallthroughs-cached) |
-| `gitcrawl gh secret list ...` / `variable get\|list ...` | Falls through; cached briefly | [gh shim](./gh-shim#read-only-fallthroughs-cached) |
-| `gitcrawl gh label list ...` | Falls through; cached briefly | [gh shim](./gh-shim) |
-| `gitcrawl gh api ` | Falls through; cached briefly (GET-only) | [gh shim](./gh-shim) |
-| `gitcrawl gh xcache stats\|keys\|gc\|flush [--json]` | Cache inspection / housekeeping | [gh shim](./gh-shim#cache-inspection-xcache) |
-| _Anything else_ | Falls through to real `gh` | [gh shim](./gh-shim) |
+| `gitcrawl gh search issues\|prs -R owner/repo [...]` | Local-first `gh search` | [gh shim](/gh-shim/) |
+| `gitcrawl gh issue view -R owner/repo --json ` | Local-first thread view | [gh shim](/gh-shim/) |
+| `gitcrawl gh pr view -R owner/repo --json ` | Same, for PRs (with auto-hydration) | [gh shim](/gh-shim/) |
+| `gitcrawl gh issue list -R owner/repo [--state --search --author --assignee --label --json]` | Local-first list | [gh shim](/gh-shim/) |
+| `gitcrawl gh pr list -R owner/repo [...]` | Same, for PRs | [gh shim](/gh-shim/) |
+| `gitcrawl gh pr checks -R owner/repo --json ` | Cached PR checks (auto-hydrates if stale) | [gh shim](/gh-shim/) |
+| `gitcrawl gh pr diff -R owner/repo` | Falls through; cached by head SHA | [gh shim](/gh-shim/) |
+| `gitcrawl gh run list -R owner/repo [--branch --commit --json]` | Cached workflow runs | [gh shim](/gh-shim/) |
+| `gitcrawl gh run view -R owner/repo [--json]` | Same, single run | [gh shim](/gh-shim/) |
+| `gitcrawl gh repo view\|list ...` | Falls through; cached briefly | [gh shim](/gh-shim/) |
+| `gitcrawl gh release list\|view ...` | Falls through; cached briefly | [gh shim](/gh-shim/#read-only-fallthroughs-cached) |
+| `gitcrawl gh workflow list\|view ...` | Falls through; cached briefly | [gh shim](/gh-shim/#read-only-fallthroughs-cached) |
+| `gitcrawl gh secret list ...` / `variable get\|list ...` | Falls through; cached briefly | [gh shim](/gh-shim/#read-only-fallthroughs-cached) |
+| `gitcrawl gh label list ...` | Falls through; cached briefly | [gh shim](/gh-shim/) |
+| `gitcrawl gh api ` | Falls through; cached briefly (GET-only) | [gh shim](/gh-shim/) |
+| `gitcrawl gh xcache stats\|keys\|gc\|flush [--json]` | Cache inspection / housekeeping | [gh shim](/gh-shim/#cache-inspection-xcache) |
+| _Anything else_ | Falls through to real `gh` | [gh shim](/gh-shim/) |
 
 The shim binary can be installed standalone by symlinking the `gitcrawl` binary as `gh` or `gitcrawl-gh`.
 
@@ -109,7 +109,7 @@ The shim binary can be installed standalone by symlinking the `gitcrawl` binary
 
 | Command | Purpose | Docs |
 | --- | --- | --- |
-| `gitcrawl portable prune [--body-chars --no-vacuum --json]` | Truncate thread bodies and (optionally) `VACUUM` for a small publishable database | [Portable stores](./portable-stores#publishing-gitcrawl-portable-prune) |
+| `gitcrawl portable prune [--body-chars --no-vacuum --json]` | Truncate thread bodies and (optionally) `VACUUM` for a small publishable database | [Portable stores](/portable-stores/#publishing-gitcrawl-portable-prune) |
 
 ## Not yet implemented
 
diff --git a/docs/concepts.md b/docs/concepts.md
index e42fe1b..62ef77f 100644
--- a/docs/concepts.md
+++ b/docs/concepts.md
@@ -17,7 +17,7 @@ The handful of nouns gitcrawl uses, and how they connect.
 
 A **repository** is the `owner/repo` you sync. Every gitcrawl command takes one, and most state in SQLite is keyed by it. You can mirror as many repos as you like into a single `gitcrawl.db`; commands always scope to the one you name.
 
-The mirror is metadata-first: titles, bodies, authors, labels, state, timestamps, and IDs land in SQLite immediately. Comments, reviews, review comments, and full PR detail (files, commits, checks, workflow runs) are opt-in on a per-sync basis (see [Sync](./sync)).
+The mirror is metadata-first: titles, bodies, authors, labels, state, timestamps, and IDs land in SQLite immediately. Comments, reviews, review comments, and full PR detail (files, commits, checks, workflow runs) are opt-in on a per-sync basis (see [Sync](/sync/)).
 
 ## Thread
 
@@ -78,7 +78,7 @@ Per-cluster maintainer overrides let you correct what the algorithm produced wit
 - **Member exclusion** (`exclude-cluster-member`/`include-cluster-member`) — pulls a specific thread out of a cluster and remembers why.
 - **Canonical member** (`set-cluster-canonical`) — pins which thread represents the cluster.
 
-See [Governance](./governance) for the full workflow.
+See [Governance](/governance/) for the full workflow.
 
 ## Run
 
@@ -88,7 +88,7 @@ Every sync, embed, and cluster operation records a **run** in `run_records` with
 
 A **portable store** is a Git-backed publish target for a `gitcrawl.db` plus its derived bodies, designed for sharing a local cache across agents or machines without a hosted service.
 
-`gitcrawl init --portable-store https://github.com/org/repo` clones a portable store into `~/.config/gitcrawl/portable/`, points the runtime at it, and `gitcrawl portable prune --body-chars 256` keeps the published payload small. Read-only commands run against portable stores refresh the checkout before reading. See [Portable stores](./portable-stores).
+`gitcrawl init --portable-store https://github.com/org/repo` clones a portable store into `~/.config/gitcrawl/portable/`, points the runtime at it, and `gitcrawl portable prune --body-chars 256` keeps the published payload small. Read-only commands run against portable stores refresh the checkout before reading. See [Portable stores](/portable-stores/).
 
 ## Cache
 
@@ -97,4 +97,4 @@ The `cache/` directory under `~/.config/gitcrawl/` holds:
 - `cache/gh-shim/` — the short-lived fallthrough cache for the `gh` shim, keyed by config path, CWD, `GH_HOST`, `GH_REPO`, and command args. Inspect or clean it with `gitcrawl gh xcache stats|keys|gc|flush`.
 - `cache/pr/` — hydrated PR detail blobs used to answer `gh pr view`, `gh pr checks`, and `gh run` reads from local SQLite.
 
-See [gh shim](./gh-shim) for the cache key composition and TTL behavior.
+See [gh shim](/gh-shim/) for the cache key composition and TTL behavior.
diff --git a/docs/configuration.md b/docs/configuration.md
index db8912b..2dd7113 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -47,8 +47,8 @@ embed_dimensions = 1024
 embedding_basis = "title_original"
 
 [env]
-GITHUB_TOKEN = "ghp_xxx"
-OPENAI_API_KEY = "sk-xxx"
+GITHUB_TOKEN = ""
+OPENAI_API_KEY = ""
 
 [portable_store]
 url = "https://github.com/org/portable-store.git"
@@ -102,6 +102,7 @@ checkout_dir = "/Users/me/.config/gitcrawl/portable"
 | `GITCRAWL_GH_PATH` | Path to the real `gh` binary used for fallthrough |
 | `GITCRAWL_GH_AUTO_HYDRATE` | Set to `0` to disable PR auto-hydration on cache miss |
 | `GITCRAWL_GH_CACHE_TTL` | Override fallthrough cache TTL (e.g., `5m`, `1h`) |
+| `GITCRAWL_GH_CACHE_ERRORS` | Set to `0` to avoid caching non-zero read-only fallthroughs |
 
 If `GITCRAWL_GH_PATH` is unset, the shim probes common Homebrew install paths and then your `PATH`. Set it explicitly when you symlink the gitcrawl binary as `gh` (otherwise the shim will recurse into itself).
 
diff --git a/docs/gh-shim.md b/docs/gh-shim.md
index 8e7abbf..cb7f9a4 100644
--- a/docs/gh-shim.md
+++ b/docs/gh-shim.md
@@ -23,14 +23,16 @@ The shim never adds GitHub write behavior. Mutating commands (`gh issue close`,
 
 ```bash
 # Side-by-side: agents opt in by calling `gitcrawl-gh`.
-ln -s "$(command -v gitcrawl)" /usr/local/bin/gitcrawl-gh
+mkdir -p "$HOME/bin"
+ln -sf "$(command -v gitcrawl)" "$HOME/bin/gitcrawl-gh"
 
 # Or replace the global `gh` so every caller picks up the cache automatically.
-ln -s "$(command -v gitcrawl)" /usr/local/bin/gh
-export GITCRAWL_GH_PATH=/opt/homebrew/bin/gh  # tell the shim where the real gh is
+REAL_GH="$(command -v gh)"  # capture this before shadowing gh
+ln -sf "$(command -v gitcrawl)" "$HOME/bin/gh"
+export GITCRAWL_GH_PATH="$REAL_GH"  # tell the shim where the real gh is
 ```
 
-If `GITCRAWL_GH_PATH` is unset, the shim probes common Homebrew paths and then `PATH`. Set it explicitly when you replace the global `gh` so the shim does not recurse into itself.
+Make sure `~/bin` is on `PATH` before the original `gh` location if you want the shim to be picked up as `gh`. If `GITCRAWL_GH_PATH` is unset, the shim probes common Homebrew paths and then `PATH`. Set it explicitly when you replace the global `gh` so the shim does not recurse into itself.
 
 ## Supported local reads
 
@@ -138,8 +140,8 @@ All accept `--json` for scripting.
     "pass_through_writes": 4
   },
   "commands": {
-    "gh pr view": { "entries": 30, "bytes": 184320 },
-    "gh search issues": { "entries": 14, "bytes": 18230 }
+    "pr diff": { "entries": 30, "bytes": 184320 },
+    "release view": { "entries": 14, "bytes": 18230 }
   }
 }
 ```
@@ -172,4 +174,4 @@ Pattern: replace `gh` with `gitcrawl-gh` (or symlink to `gh`) for every agent in
 
 For best results, schedule a periodic `gitcrawl refresh owner/repo` (every few minutes per repo, depending on activity) so the local mirror stays warm. The shim's `--sync-if-stale` (via `gitcrawl search`) and auto-hydration handle the rest.
 
-See [Automation](./automation) for full agent recipes and JSON contracts.
+See [Automation](/automation/) for full agent recipes and JSON contracts.
diff --git a/docs/governance.md b/docs/governance.md
index 950a3d9..1b18008 100644
--- a/docs/governance.md
+++ b/docs/governance.md
@@ -127,4 +127,4 @@ The thread stays open on GitHub; only your local triage view hides it.
 
 - It does not edit, label, comment on, or close GitHub issues. Use `gh` for that.
 - It does not retrain embeddings or reshape the underlying graph — it overlays decisions on top of the algorithm output.
-- It does not propagate to other gitcrawl installations unless you publish your database via a [portable store](./portable-stores).
+- It does not propagate to other gitcrawl installations unless you publish your database via a [portable store](/portable-stores/).
diff --git a/docs/index.md b/docs/index.md
index 47cebcd..0e2c2e5 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -12,7 +12,7 @@ permalink: /
 A local-first GitHub issue and pull request crawler for maintainer triage. Sync, search, cluster, and review related threads from a SQLite cache that lives entirely on your machine.
 {: .fs-6 .fw-300 }
 
-[Quickstart](./quickstart){: .btn .btn-primary .fs-5 .mb-4 .mb-md-0 .mr-2 }
+[Quickstart](/quickstart/){: .btn .btn-primary .fs-5 .mb-4 .mb-md-0 .mr-2 }
 [View on GitHub](https://github.com/openclaw/gitcrawl){: .btn .fs-5 .mb-4 .mb-md-0 }
 
 ---
@@ -34,16 +34,16 @@ A local-first GitHub issue and pull request crawler for maintainer triage. Sync,
 
 ### I want to try it
 
-[Quickstart](./quickstart) walks you from `git clone` to a populated cluster view in five minutes.
+[Quickstart](/quickstart/) walks you from `git clone` to a populated cluster view in five minutes.
 
 ### I want to wire up an agent
 
-The [`gh` shim](./gh-shim) is the fastest way to cut GitHub API load — point your agent at `gitcrawl-gh`, keep the agent's `gh` calls intact.
+The [`gh` shim](/gh-shim/) is the fastest way to cut GitHub API load — point your agent at `gitcrawl-gh`, keep the agent's `gh` calls intact.
 
 ### I want to triage a busy repo
 
-Read [Sync](./sync) to bring data local, then [Clustering](./clustering) and the [TUI](./tui) for the maintainer workflow.
+Read [Sync](/sync/) to bring data local, then [Clustering](/clustering/) and the [TUI](/tui/) for the maintainer workflow.
 
 ### I want the full reference
 
-[Commands](./commands) lists every flag and JSON field. [Configuration](./configuration) covers env vars and paths.
+[Commands](/commands/) lists every flag and JSON field. [Configuration](/configuration/) covers env vars and paths.
diff --git a/docs/installation.md b/docs/installation.md
index d72d83d..ced9281 100644
--- a/docs/installation.md
+++ b/docs/installation.md
@@ -26,13 +26,16 @@ Each tagged release publishes archives for `darwin_amd64`, `darwin_arm64`, `linu
 
 ```bash
 # Replace VERSION and PLATFORM with the values you want.
-curl -L "https://github.com/openclaw/gitcrawl/releases/download/v0.1.2/gitcrawl_0.1.2_darwin_arm64.tar.gz" \
-  | tar -xz -C /usr/local/bin gitcrawl
+VERSION=v0.1.2
+PLATFORM=darwin_arm64
+mkdir -p "$HOME/bin"
+curl -L "https://github.com/openclaw/gitcrawl/releases/download/${VERSION}/gitcrawl_${VERSION#v}_${PLATFORM}.tar.gz" \
+  | tar -xz -C "$HOME/bin" gitcrawl
 
 gitcrawl --version
 ```
 
-Browse the [releases page](https://github.com/openclaw/gitcrawl/releases) for the latest tag and the full asset list.
+Browse the [releases page](https://github.com/openclaw/gitcrawl/releases) for the latest tag and the full asset list. Use a directory that is already on your `PATH`; `~/bin` and `~/.local/bin` avoid needing elevated permissions.
 
 ## Install from source
 
@@ -54,14 +57,16 @@ The shim is the same binary. Symlink it as `gh` (replacing the real CLI) or as `
 
 ```bash
 # Side-by-side install — agents can opt in by calling `gitcrawl-gh`.
-ln -s "$(command -v gitcrawl)" /usr/local/bin/gitcrawl-gh
+mkdir -p "$HOME/bin"
+ln -sf "$(command -v gitcrawl)" "$HOME/bin/gitcrawl-gh"
 
 # Or replace the global `gh` so every agent picks up the cache automatically.
-ln -s "$(command -v gitcrawl)" /usr/local/bin/gh
-export GITCRAWL_GH_PATH="$(command -v /opt/homebrew/bin/gh)"  # point shim at the real gh
+REAL_GH="$(command -v gh)"  # capture this before shadowing gh
+ln -sf "$(command -v gitcrawl)" "$HOME/bin/gh"
+export GITCRAWL_GH_PATH="$REAL_GH"  # point shim at the real gh
 ```
 
-When invoked as `gh` or `gitcrawl-gh`, the binary auto-detects shim mode. See [the gh shim guide](./gh-shim) for details.
+When invoked as `gh` or `gitcrawl-gh`, the binary auto-detects shim mode. See [the gh shim guide](/gh-shim/) for details.
 
 ## Verify the install
 
diff --git a/docs/portable-stores.md b/docs/portable-stores.md
index 78b008e..88c8575 100644
--- a/docs/portable-stores.md
+++ b/docs/portable-stores.md
@@ -108,5 +108,5 @@ Other agents and machines pull the new commit on their next read-only command.
 
 ## See also
 
-- [Sync](./sync) — what gets written into the database that ends up in the portable store
-- [gh shim](./gh-shim) — agents reading a shared portable store benefit doubly from the shim's local-first answers
+- [Sync](/sync/) — what gets written into the database that ends up in the portable store
+- [gh shim](/gh-shim/) — agents reading a shared portable store benefit doubly from the shim's local-first answers
diff --git a/docs/quickstart.md b/docs/quickstart.md
index 6bdfa94..54a4214 100644
--- a/docs/quickstart.md
+++ b/docs/quickstart.md
@@ -19,7 +19,8 @@ Five minutes from clean machine to a populated cluster view.
 # Build (or download a release archive — see Installation).
 git clone https://github.com/openclaw/gitcrawl.git
 cd gitcrawl
-go build -o /usr/local/bin/gitcrawl ./cmd/gitcrawl
+mkdir -p "$HOME/bin"
+go build -o "$HOME/bin/gitcrawl" ./cmd/gitcrawl
 
 # Create config + database under ~/.config/gitcrawl.
 gitcrawl init
@@ -36,16 +37,16 @@ Defaults written:
 ## 2. Set credentials
 
 ```bash
-export GITHUB_TOKEN=ghp_xxx        # required for sync
-export OPENAI_API_KEY=sk-xxx       # required for embeddings
+export GITHUB_TOKEN=""             # required for sync
+export OPENAI_API_KEY=""           # required for embeddings
 ```
 
 Either set them in your shell profile or store them in `~/.config/gitcrawl/config.toml`:
 
 ```toml
 [env]
-GITHUB_TOKEN = "ghp_xxx"
-OPENAI_API_KEY = "sk-xxx"
+GITHUB_TOKEN = ""
+OPENAI_API_KEY = ""
 ```
 
 `gitcrawl doctor` confirms the credentials are visible and reports their source.
@@ -72,7 +73,7 @@ The `refresh` command runs sync → embed → cluster end to end:
 gitcrawl refresh openclaw/gitcrawl
 ```
 
-You can run the stages individually if you want finer control — see [Refresh and embed](./refresh-and-embed) and [Clustering](./clustering).
+You can run the stages individually if you want finer control — see [Refresh and embed](/refresh-and-embed/) and [Clustering](/clustering/).
 
 ## 5. Browse clusters
 
@@ -116,17 +117,18 @@ Add `--sync-if-stale 5m` to refresh the local mirror first when it is older than
 ## 7. Wire up the `gh` shim (optional)
 
 ```bash
-ln -s "$(command -v gitcrawl)" /usr/local/bin/gitcrawl-gh
+mkdir -p "$HOME/bin"
+ln -sf "$(command -v gitcrawl)" "$HOME/bin/gitcrawl-gh"
 gitcrawl-gh search issues "download stalls" -R openclaw/gitcrawl --json number,title,url
 gitcrawl-gh pr view 123 -R openclaw/gitcrawl --json number,title,state,url
 gitcrawl-gh xcache stats
 ```
 
-Most read-only `gh` calls answer locally, mutating commands pass straight through to the real `gh`. See [gh shim](./gh-shim) for the full surface.
+Most read-only `gh` calls answer locally, mutating commands pass straight through to the real `gh`. See [gh shim](/gh-shim/) for the full surface.
 
 ## Where to next
 
-- [Concepts](./concepts) — what threads, durable clusters, and embeddings actually mean
-- [Sync](./sync) — every flag for hydrating the local store
-- [Clustering](./clustering) — tuning the cluster graph for a specific repo
-- [Automation](./automation) — JSON contracts for agents and scripts
+- [Concepts](/concepts/) — what threads, durable clusters, and embeddings actually mean
+- [Sync](/sync/) — every flag for hydrating the local store
+- [Clustering](/clustering/) — tuning the cluster graph for a specific repo
+- [Automation](/automation/) — JSON contracts for agents and scripts
diff --git a/docs/reference.md b/docs/reference.md
index 672e15e..3db89a4 100644
--- a/docs/reference.md
+++ b/docs/reference.md
@@ -62,6 +62,7 @@ Override the config root with `--config ` or `GITCRAWL_CONFIG`.
 | `GITCRAWL_GH_PATH` | _(probed)_ | Path to the real `gh` binary |
 | `GITCRAWL_GH_AUTO_HYDRATE` | _(on)_ | Set `0` to disable PR auto-hydration on cache miss |
 | `GITCRAWL_GH_CACHE_TTL` | `30s` for most commands | Override fallthrough cache TTL (e.g., `5m`, `1h`) |
+| `GITCRAWL_GH_CACHE_ERRORS` | _(on)_ | Set `0` to avoid caching non-zero read-only fallthroughs |
 
 ## Configuration defaults
 
@@ -158,7 +159,7 @@ stderr always carries error messages. stdout is reserved for command output.
 
 ## See also
 
-- [Configuration](./configuration) — narrative version of this reference
-- [Commands](./commands) — every command and flag, in one table
+- [Configuration](/configuration/) — narrative version of this reference
+- [Commands](/commands/) — every command and flag, in one table
 - [SPEC.md](https://github.com/openclaw/gitcrawl/blob/main/SPEC.md) — product contract
 - [CHANGELOG.md](https://github.com/openclaw/gitcrawl/blob/main/CHANGELOG.md) — what shipped recently
diff --git a/docs/refresh-and-embed.md b/docs/refresh-and-embed.md
index d9afcba..bc8a7df 100644
--- a/docs/refresh-and-embed.md
+++ b/docs/refresh-and-embed.md
@@ -21,7 +21,7 @@ gitcrawl refresh owner/repo
 
 By default this performs:
 
-1. **Sync** — open + recently closed issues and PRs (see [Sync](./sync))
+1. **Sync** — open + recently closed issues and PRs (see [Sync](/sync/))
 2. **Embed** — fill `thread_vectors` for any thread whose document changed
 3. **Cluster** — rebuild durable clusters with the standard thresholds
 
@@ -127,4 +127,4 @@ Each row carries `started_at`, `finished_at`, `status`, and `stats_json` — use
 
 - **GitHub.** Sync uses standard REST endpoints; the API quota is the dominant cost on busy repos. Use `--include-comments` and `--with pr-details` selectively.
 - **OpenAI.** `text-embedding-3-small` is inexpensive but not free. `embed` is bounded by `--limit` if you want to stay under a budget on initial backfills.
-- **Disk.** Vectors and PR detail blobs grow with the repo. The portable-store flow includes `gitcrawl portable prune` to keep published payloads small — see [Portable stores](./portable-stores).
+- **Disk.** Vectors and PR detail blobs grow with the repo. The portable-store flow includes `gitcrawl portable prune` to keep published payloads small — see [Portable stores](/portable-stores/).
diff --git a/docs/search.md b/docs/search.md
index 9baec1f..ca6f1ec 100644
--- a/docs/search.md
+++ b/docs/search.md
@@ -100,7 +100,7 @@ There are two ways to run cached searches:
 | `gitcrawl search issues|prs ...` | Human use; mixes naturally with the rest of the gitcrawl CLI |
 | `gitcrawl gh search issues|prs ...` | Agents and scripts that call `gh` directly — symlinked as `gh` or `gitcrawl-gh` it is invisible to callers |
 
-Both paths share the same local cache and produce gh-shaped JSON. The shim adds the additional `gh issue/pr view`, `gh issue/pr list`, `gh pr checks`, `gh run`, and `xcache` surface — see [gh shim](./gh-shim).
+Both paths share the same local cache and produce gh-shaped JSON. The shim adds the additional `gh issue/pr view`, `gh issue/pr list`, `gh pr checks`, `gh run`, and `xcache` surface — see [gh shim](/gh-shim/).
 
 ## Combining with sync
 
diff --git a/docs/sync.md b/docs/sync.md
index 4e95c7a..acab314 100644
--- a/docs/sync.md
+++ b/docs/sync.md
@@ -58,7 +58,7 @@ gitcrawl sync owner/repo --numbers 123,456 --include-comments
 
 `--numbers` is the safest way to refresh specific issues or PRs — it bypasses list ordering and the updated-time window, fetching exactly the rows you ask for. Pair it with `--include-comments` and/or `--include-pr-details` to hydrate the conversation and PR-only data at the same time.
 
-This is also what the `gh` shim uses internally for [auto-hydration](./gh-shim#auto-hydration).
+This is also what the `gh` shim uses internally for [auto-hydration](/gh-shim/#auto-hydration).
 
 ## Hydration depth
 
@@ -68,7 +68,7 @@ This is also what the `gh` shim uses internally for [auto-hydration](./gh-shim#a
 | `--include-pr-details` | PR files, commits, status checks, workflow runs |
 | `--with pr-details` | Same as `--include-pr-details` (gh-style flag) |
 
-PR details land in `pr_files`, `pr_commits`, `pr_checks`, and `pr_runs` tables and back the `gh pr view`, `gh pr checks`, and `gh run list/view` shim paths. See [gh shim](./gh-shim).
+PR details land in `pr_files`, `pr_commits`, `pr_checks`, and `pr_runs` tables and back the `gh pr view`, `gh pr checks`, and `gh run list/view` shim paths. See [gh shim](/gh-shim/).
 
 `--include-code` is accepted for compatibility but is currently a no-op.
 
@@ -150,6 +150,6 @@ gitcrawl sync owner/repo --numbers "$NUMS" --with pr-details
 
 ## See also
 
-- [Refresh and embed](./refresh-and-embed) — the wrapper that runs sync, embed, and cluster end to end
-- [gh shim](./gh-shim) — how synced PR details power `gh pr view` / `gh pr checks` / `gh run` from local cache
-- [Portable stores](./portable-stores) — sharing the synced cache across machines
+- [Refresh and embed](/refresh-and-embed/) — the wrapper that runs sync, embed, and cluster end to end
+- [gh shim](/gh-shim/) — how synced PR details power `gh pr view` / `gh pr checks` / `gh run` from local cache
+- [Portable stores](/portable-stores/) — sharing the synced cache across machines
diff --git a/docs/tui.md b/docs/tui.md
index b0756f7..ff5895b 100644
--- a/docs/tui.md
+++ b/docs/tui.md
@@ -91,7 +91,7 @@ Member actions:
 - Local close / reopen this thread
 - Exclude from cluster
 
-These map directly onto the [governance](./governance) commands. Anything you can do interactively, you can also script.
+These map directly onto the [governance](/governance/) commands. Anything you can do interactively, you can also script.
 
 ## Display rules
 