docs: improve site install and links
parent fc12f81b6a
commit 1a2f5ba6e0
@@ -23,7 +23,7 @@ gitcrawl clusters owner/repo --json --sort size --min-size 5 \
   | jq '.clusters[] | {id, members: .member_count, latest: .latest_thread_number}'
 ```

-For the full per-command JSON shapes, see the individual feature pages and the [Commands reference](./commands).
+For the full per-command JSON shapes, see the individual feature pages and the [Commands reference](/commands/).

 ## Exit codes

@@ -51,7 +51,7 @@ Best for ad-hoc agent tools that should bound staleness but minimize sync calls.

 ### Auto-hydration via the gh shim

-Symlink the gitcrawl binary as `gh` (or `gitcrawl-gh`) and let the shim pull a single PR's detail when an agent calls `gh pr view` or `gh pr checks` against an unhydrated PR. See [gh shim → auto-hydration](./gh-shim#auto-hydration).
+Symlink the gitcrawl binary as `gh` (or `gitcrawl-gh`) and let the shim pull a single PR's detail when an agent calls `gh pr view` or `gh pr checks` against an unhydrated PR. See [gh shim → auto-hydration](/gh-shim/#auto-hydration).

 This is the lowest-overhead pattern for fleets of agents — no scheduling required.

@@ -61,7 +61,7 @@ Run `gitcrawl refresh owner/repo` on a cron, systemd timer, or `launchd` agent e

 ```cron
 # Every 5 minutes, refresh the active repos.
-*/5 * * * * /usr/local/bin/gitcrawl refresh openclaw/gitcrawl --json > /tmp/gitcrawl.openclaw.json 2>&1
+*/5 * * * * $HOME/bin/gitcrawl refresh openclaw/gitcrawl --json > /tmp/gitcrawl.openclaw.json 2>&1
 ```

 For multiple repos, loop in a small shell script — gitcrawl is happy to run sequentially against a shared SQLite file.
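The multi-repo note in the hunk above could be handled with a loop like the one below. This is a minimal sketch and not part of the commit: the `refresh_all` helper and the `GITCRAWL_BIN` variable are hypothetical names, and the only gitcrawl invocation it relies on is `gitcrawl refresh owner/repo --json`, as used in the cron line.

```shell
#!/bin/sh
# Hypothetical wrapper (not shipped with gitcrawl): refresh several repos in turn.
# GITCRAWL_BIN is an assumed override; it defaults to `gitcrawl` on PATH.
GITCRAWL_BIN="${GITCRAWL_BIN:-gitcrawl}"

refresh_all() {
  failed=0
  for repo in "$@"; do
    # One repo at a time, so runs against the shared SQLite file stay sequential.
    if ! "$GITCRAWL_BIN" refresh "$repo" --json; then
      echo "refresh failed for $repo" >&2
      failed=1
    fi
  done
  # Non-zero when any repo failed, so cron or a timer can surface the error.
  return "$failed"
}

# Example call (repo names are placeholders):
#   refresh_all openclaw/gitcrawl owner/repo
```

A single cron entry can then call the wrapper instead of listing one line per repo. The loop keeps going after a failure so one bad repo does not block the others.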
@@ -142,7 +142,7 @@ Or slice it manually:
 gitcrawl exclude-cluster-member owner/repo --id 12 --number 456 --reason "different repro"
 ```

-See [Governance](./governance) for the full override workflow.
+See [Governance](/governance/) for the full override workflow.

 ## Re-clustering and stable IDs

@@ -155,6 +155,6 @@ Cluster runs are recorded in `run_records` and visible via `gitcrawl runs --kind

 ## See also

-- [Governance](./governance) — close clusters, exclude members, set canonical
-- [TUI](./tui) — the interactive cluster browser
-- [Concepts](./concepts#cluster) — durable clusters and cluster kinds
+- [Governance](/governance/) — close clusters, exclude members, set canonical
+- [TUI](/tui/) — the interactive cluster browser
+- [Concepts](/concepts/#cluster) — durable clusters and cluster kinds
@@ -30,78 +30,78 @@ These work on every command.

 | Command | Purpose | Detailed docs |
 | --- | --- | --- |
-| `gitcrawl init [--db --portable-store --portable-db --store-dir --json]` | Create config, database, runtime directories; optionally clone a portable store | [Installation](./installation), [Portable stores](./portable-stores) |
-| `gitcrawl doctor [--json]` | Health check for config, database, credentials, model selection, repo/thread counts | [Configuration](./configuration#gitcrawl-doctor) |
-| `gitcrawl configure [--summary-model --embed-model --embedding-basis --json]` | Update model fields in `config.toml` | [Configuration](./configuration#gitcrawl-configure) |
+| `gitcrawl init [--db --portable-store --portable-db --store-dir --json]` | Create config, database, runtime directories; optionally clone a portable store | [Installation](/installation/), [Portable stores](/portable-stores/) |
+| `gitcrawl doctor [--json]` | Health check for config, database, credentials, model selection, repo/thread counts | [Configuration](/configuration/#gitcrawl-doctor) |
+| `gitcrawl configure [--summary-model --embed-model --embedding-basis --json]` | Update model fields in `config.toml` | [Configuration](/configuration/#gitcrawl-configure) |
 | `gitcrawl version` | Print version | — |

 ## Sync

 | Command | Purpose | Docs |
 | --- | --- | --- |
-| `gitcrawl sync owner/repo [--state --since --numbers --limit --include-comments --include-pr-details --with pr-details --json]` | Sync issues and PRs from GitHub into local SQLite | [Sync](./sync) |
-| `gitcrawl refresh owner/repo [--no-sync --no-embed --no-cluster ...]` | Wrapper that runs sync → embed → cluster | [Refresh and embed](./refresh-and-embed) |
-| `gitcrawl embed owner/repo [--number --limit --force --include-closed --json]` | Generate OpenAI embeddings for thread documents | [Refresh and embed](./refresh-and-embed#embed) |
-| `gitcrawl runs owner/repo [--kind sync\|embedding\|cluster --limit --json]` | List recorded run history | [Refresh and embed](./refresh-and-embed#runs) |
+| `gitcrawl sync owner/repo [--state --since --numbers --limit --include-comments --include-pr-details --with pr-details --json]` | Sync issues and PRs from GitHub into local SQLite | [Sync](/sync/) |
+| `gitcrawl refresh owner/repo [--no-sync --no-embed --no-cluster ...]` | Wrapper that runs sync → embed → cluster | [Refresh and embed](/refresh-and-embed/) |
+| `gitcrawl embed owner/repo [--number --limit --force --include-closed --json]` | Generate OpenAI embeddings for thread documents | [Refresh and embed](/refresh-and-embed/#embed) |
+| `gitcrawl runs owner/repo [--kind sync\|embedding\|cluster --limit --json]` | List recorded run history | [Refresh and embed](/refresh-and-embed/#runs) |

 ## Inspect

 | Command | Purpose | Docs |
 | --- | --- | --- |
 | `gitcrawl threads owner/repo [--include-closed --numbers --limit --json]` | List threads from local cache | — |
-| `gitcrawl search owner/repo --query <text> [--mode keyword\|semantic\|hybrid --limit --json]` | Local search (direct mode) | [Search](./search) |
-| `gitcrawl search issues\|prs <query> -R owner/repo [--state --json --limit --sync-if-stale]` | Local search (`gh search` shape) | [Search](./search#gh-search-compatibility-mode) |
-| `gitcrawl neighbors owner/repo --number <n> [--limit --threshold --json]` | Vector-similar threads to a specific issue/PR | [Clustering](./clustering#find-similar-threads-neighbors) |
+| `gitcrawl search owner/repo --query <text> [--mode keyword\|semantic\|hybrid --limit --json]` | Local search (direct mode) | [Search](/search/) |
+| `gitcrawl search issues\|prs <query> -R owner/repo [--state --json --limit --sync-if-stale]` | Local search (`gh search` shape) | [Search](/search/#gh-search-compatibility-mode) |
+| `gitcrawl neighbors owner/repo --number <n> [--limit --threshold --json]` | Vector-similar threads to a specific issue/PR | [Clustering](/clustering/#find-similar-threads-neighbors) |

 ## Cluster

 | Command | Purpose | Docs |
 | --- | --- | --- |
-| `gitcrawl cluster owner/repo [--threshold --min-size --max-cluster-size --k --cross-kind-threshold --limit --model --basis --include-closed --json]` | Build durable clusters from vectors | [Clustering](./clustering#generate-clusters) |
-| `gitcrawl clusters owner/repo [--sort size\|recent\|oldest --min-size --limit --hide-closed --json]` | Latest-run cluster summary, merged with closed durable rows | [Clustering](./clustering#list-clusters) |
-| `gitcrawl durable-clusters owner/repo [--include-closed --sort --min-size --limit --json]` | Strict durable-cluster audit view | [Clustering](./clustering#list-clusters) |
-| `gitcrawl cluster-detail owner/repo --id <n> [--member-limit --body-chars --include-closed --json]` | Cluster + members detail | [Clustering](./clustering#inspect-a-cluster) |
-| `gitcrawl cluster-explain owner/repo --id <n> [...]` | Alias for `cluster-detail` | [Clustering](./clustering#inspect-a-cluster) |
+| `gitcrawl cluster owner/repo [--threshold --min-size --max-cluster-size --k --cross-kind-threshold --limit --model --basis --include-closed --json]` | Build durable clusters from vectors | [Clustering](/clustering/#generate-clusters) |
+| `gitcrawl clusters owner/repo [--sort size\|recent\|oldest --min-size --limit --hide-closed --json]` | Latest-run cluster summary, merged with closed durable rows | [Clustering](/clustering/#list-clusters) |
+| `gitcrawl durable-clusters owner/repo [--include-closed --sort --min-size --limit --json]` | Strict durable-cluster audit view | [Clustering](/clustering/#list-clusters) |
+| `gitcrawl cluster-detail owner/repo --id <n> [--member-limit --body-chars --include-closed --json]` | Cluster + members detail | [Clustering](/clustering/#inspect-a-cluster) |
+| `gitcrawl cluster-explain owner/repo --id <n> [...]` | Alias for `cluster-detail` | [Clustering](/clustering/#inspect-a-cluster) |

 ## Governance

 | Command | Purpose | Docs |
 | --- | --- | --- |
-| `gitcrawl close-thread owner/repo --number <n> [--reason --json]` | Local close on a thread | [Governance](./governance#local-close) |
+| `gitcrawl close-thread owner/repo --number <n> [--reason --json]` | Local close on a thread | [Governance](/governance/#local-close) |
 | `gitcrawl reopen-thread owner/repo --number <n> [--json]` | Inverse | — |
-| `gitcrawl close-cluster owner/repo --id <n> [--reason --json]` | Local close on a cluster | [Governance](./governance#local-close) |
+| `gitcrawl close-cluster owner/repo --id <n> [--reason --json]` | Local close on a cluster | [Governance](/governance/#local-close) |
 | `gitcrawl reopen-cluster owner/repo --id <n> [--json]` | Inverse | — |
-| `gitcrawl exclude-cluster-member owner/repo --id <n> --number <m> [--reason --json]` | Pull a thread out of a cluster | [Governance](./governance#member-exclusion) |
+| `gitcrawl exclude-cluster-member owner/repo --id <n> --number <m> [--reason --json]` | Pull a thread out of a cluster | [Governance](/governance/#member-exclusion) |
 | `gitcrawl include-cluster-member owner/repo --id <n> --number <m> [--reason --json]` | Inverse | — |
-| `gitcrawl set-cluster-canonical owner/repo --id <n> --number <m> [--reason --json]` | Pin canonical thread for a cluster | [Governance](./governance#canonical-member) |
+| `gitcrawl set-cluster-canonical owner/repo --id <n> --number <m> [--reason --json]` | Pin canonical thread for a cluster | [Governance](/governance/#canonical-member) |

 ## TUI

 | Command | Purpose | Docs |
 | --- | --- | --- |
-| `gitcrawl tui [owner/repo] [--min-size --sort --limit --hide-closed --json]` | Interactive cluster browser; `--json` emits a snapshot instead of launching the UI | [TUI](./tui) |
+| `gitcrawl tui [owner/repo] [--min-size --sort --limit --hide-closed --json]` | Interactive cluster browser; `--json` emits a snapshot instead of launching the UI | [TUI](/tui/) |

 ## gh shim

 | Command | Purpose | Docs |
 | --- | --- | --- |
-| `gitcrawl gh search issues\|prs <query> -R owner/repo [...]` | Local-first `gh search` | [gh shim](./gh-shim) |
-| `gitcrawl gh issue view <n> -R owner/repo --json <fields>` | Local-first thread view | [gh shim](./gh-shim) |
-| `gitcrawl gh pr view <n> -R owner/repo --json <fields>` | Same, for PRs (with auto-hydration) | [gh shim](./gh-shim) |
-| `gitcrawl gh issue list -R owner/repo [--state --search --author --assignee --label --json]` | Local-first list | [gh shim](./gh-shim) |
-| `gitcrawl gh pr list -R owner/repo [...]` | Same, for PRs | [gh shim](./gh-shim) |
-| `gitcrawl gh pr checks <n> -R owner/repo --json <fields>` | Cached PR checks (auto-hydrates if stale) | [gh shim](./gh-shim) |
-| `gitcrawl gh pr diff <n> -R owner/repo` | Falls through; cached by head SHA | [gh shim](./gh-shim) |
-| `gitcrawl gh run list -R owner/repo [--branch --commit --json]` | Cached workflow runs | [gh shim](./gh-shim) |
-| `gitcrawl gh run view <run-id> -R owner/repo [--json]` | Same, single run | [gh shim](./gh-shim) |
-| `gitcrawl gh repo view\|list ...` | Falls through; cached briefly | [gh shim](./gh-shim) |
-| `gitcrawl gh release list\|view ...` | Falls through; cached briefly | [gh shim](./gh-shim#read-only-fallthroughs-cached) |
-| `gitcrawl gh workflow list\|view ...` | Falls through; cached briefly | [gh shim](./gh-shim#read-only-fallthroughs-cached) |
-| `gitcrawl gh secret list ...` / `variable get\|list ...` | Falls through; cached briefly | [gh shim](./gh-shim#read-only-fallthroughs-cached) |
-| `gitcrawl gh label list ...` | Falls through; cached briefly | [gh shim](./gh-shim) |
-| `gitcrawl gh api <GET path>` | Falls through; cached briefly (GET-only) | [gh shim](./gh-shim) |
-| `gitcrawl gh xcache stats\|keys\|gc\|flush [--json]` | Cache inspection / housekeeping | [gh shim](./gh-shim#cache-inspection-xcache) |
-| _Anything else_ | Falls through to real `gh` | [gh shim](./gh-shim) |
+| `gitcrawl gh search issues\|prs <query> -R owner/repo [...]` | Local-first `gh search` | [gh shim](/gh-shim/) |
+| `gitcrawl gh issue view <n> -R owner/repo --json <fields>` | Local-first thread view | [gh shim](/gh-shim/) |
+| `gitcrawl gh pr view <n> -R owner/repo --json <fields>` | Same, for PRs (with auto-hydration) | [gh shim](/gh-shim/) |
+| `gitcrawl gh issue list -R owner/repo [--state --search --author --assignee --label --json]` | Local-first list | [gh shim](/gh-shim/) |
+| `gitcrawl gh pr list -R owner/repo [...]` | Same, for PRs | [gh shim](/gh-shim/) |
+| `gitcrawl gh pr checks <n> -R owner/repo --json <fields>` | Cached PR checks (auto-hydrates if stale) | [gh shim](/gh-shim/) |
+| `gitcrawl gh pr diff <n> -R owner/repo` | Falls through; cached by head SHA | [gh shim](/gh-shim/) |
+| `gitcrawl gh run list -R owner/repo [--branch --commit --json]` | Cached workflow runs | [gh shim](/gh-shim/) |
+| `gitcrawl gh run view <run-id> -R owner/repo [--json]` | Same, single run | [gh shim](/gh-shim/) |
+| `gitcrawl gh repo view\|list ...` | Falls through; cached briefly | [gh shim](/gh-shim/) |
+| `gitcrawl gh release list\|view ...` | Falls through; cached briefly | [gh shim](/gh-shim/#read-only-fallthroughs-cached) |
+| `gitcrawl gh workflow list\|view ...` | Falls through; cached briefly | [gh shim](/gh-shim/#read-only-fallthroughs-cached) |
+| `gitcrawl gh secret list ...` / `variable get\|list ...` | Falls through; cached briefly | [gh shim](/gh-shim/#read-only-fallthroughs-cached) |
+| `gitcrawl gh label list ...` | Falls through; cached briefly | [gh shim](/gh-shim/) |
+| `gitcrawl gh api <GET path>` | Falls through; cached briefly (GET-only) | [gh shim](/gh-shim/) |
+| `gitcrawl gh xcache stats\|keys\|gc\|flush [--json]` | Cache inspection / housekeeping | [gh shim](/gh-shim/#cache-inspection-xcache) |
+| _Anything else_ | Falls through to real `gh` | [gh shim](/gh-shim/) |

 The shim binary can be installed standalone by symlinking the `gitcrawl` binary as `gh` or `gitcrawl-gh`.

@@ -109,7 +109,7 @@ The shim binary can be installed standalone by symlinking the `gitcrawl` binary

 | Command | Purpose | Docs |
 | --- | --- | --- |
-| `gitcrawl portable prune [--body-chars --no-vacuum --json]` | Truncate thread bodies and (optionally) `VACUUM` for a small publishable database | [Portable stores](./portable-stores#publishing-gitcrawl-portable-prune) |
+| `gitcrawl portable prune [--body-chars --no-vacuum --json]` | Truncate thread bodies and (optionally) `VACUUM` for a small publishable database | [Portable stores](/portable-stores/#publishing-gitcrawl-portable-prune) |

 ## Not yet implemented

@@ -17,7 +17,7 @@ The handful of nouns gitcrawl uses, and how they connect.

 A **repository** is the `owner/repo` you sync. Every gitcrawl command takes one, and most state in SQLite is keyed by it. You can mirror as many repos as you like into a single `gitcrawl.db`; commands always scope to the one you name.

-The mirror is metadata-first: titles, bodies, authors, labels, state, timestamps, and IDs land in SQLite immediately. Comments, reviews, review comments, and full PR detail (files, commits, checks, workflow runs) are opt-in on a per-sync basis (see [Sync](./sync)).
+The mirror is metadata-first: titles, bodies, authors, labels, state, timestamps, and IDs land in SQLite immediately. Comments, reviews, review comments, and full PR detail (files, commits, checks, workflow runs) are opt-in on a per-sync basis (see [Sync](/sync/)).

 ## Thread

@@ -78,7 +78,7 @@ Per-cluster maintainer overrides let you correct what the algorithm produced wit
 - **Member exclusion** (`exclude-cluster-member`/`include-cluster-member`) — pulls a specific thread out of a cluster and remembers why.
 - **Canonical member** (`set-cluster-canonical`) — pins which thread represents the cluster.

-See [Governance](./governance) for the full workflow.
+See [Governance](/governance/) for the full workflow.

 ## Run

@@ -88,7 +88,7 @@ Every sync, embed, and cluster operation records a **run** in `run_records` with

 A **portable store** is a Git-backed publish target for a `gitcrawl.db` plus its derived bodies, designed for sharing a local cache across agents or machines without a hosted service.

-`gitcrawl init --portable-store https://github.com/org/repo` clones a portable store into `~/.config/gitcrawl/portable/`, points the runtime at it, and `gitcrawl portable prune --body-chars 256` keeps the published payload small. Read-only commands run against portable stores refresh the checkout before reading. See [Portable stores](./portable-stores).
+`gitcrawl init --portable-store https://github.com/org/repo` clones a portable store into `~/.config/gitcrawl/portable/`, points the runtime at it, and `gitcrawl portable prune --body-chars 256` keeps the published payload small. Read-only commands run against portable stores refresh the checkout before reading. See [Portable stores](/portable-stores/).

 ## Cache

@@ -97,4 +97,4 @@ The `cache/` directory under `~/.config/gitcrawl/` holds:
 - `cache/gh-shim/` — the short-lived fallthrough cache for the `gh` shim, keyed by config path, CWD, `GH_HOST`, `GH_REPO`, and command args. Inspect or clean it with `gitcrawl gh xcache stats|keys|gc|flush`.
 - `cache/pr/` — hydrated PR detail blobs used to answer `gh pr view`, `gh pr checks`, and `gh run` reads from local SQLite.

-See [gh shim](./gh-shim) for the cache key composition and TTL behavior.
+See [gh shim](/gh-shim/) for the cache key composition and TTL behavior.
@@ -47,8 +47,8 @@ embed_dimensions = 1024
 embedding_basis = "title_original"

 [env]
-GITHUB_TOKEN = "ghp_xxx"
-OPENAI_API_KEY = "sk-xxx"
+GITHUB_TOKEN = "<github-token>"
+OPENAI_API_KEY = "<openai-api-key>"

 [portable_store]
 url = "https://github.com/org/portable-store.git"
@@ -102,6 +102,7 @@ checkout_dir = "/Users/me/.config/gitcrawl/portable"
 | `GITCRAWL_GH_PATH` | Path to the real `gh` binary used for fallthrough |
 | `GITCRAWL_GH_AUTO_HYDRATE` | Set to `0` to disable PR auto-hydration on cache miss |
 | `GITCRAWL_GH_CACHE_TTL` | Override fallthrough cache TTL (e.g., `5m`, `1h`) |
+| `GITCRAWL_GH_CACHE_ERRORS` | Set to `0` to avoid caching non-zero read-only fallthroughs |

 If `GITCRAWL_GH_PATH` is unset, the shim probes common Homebrew install paths and then your `PATH`. Set it explicitly when you symlink the gitcrawl binary as `gh` (otherwise the shim will recurse into itself).

@@ -23,14 +23,16 @@ The shim never adds GitHub write behavior. Mutating commands (`gh issue close`,

 ```bash
 # Side-by-side: agents opt in by calling `gitcrawl-gh`.
-ln -s "$(command -v gitcrawl)" /usr/local/bin/gitcrawl-gh
+mkdir -p "$HOME/bin"
+ln -sf "$(command -v gitcrawl)" "$HOME/bin/gitcrawl-gh"

 # Or replace the global `gh` so every caller picks up the cache automatically.
-ln -s "$(command -v gitcrawl)" /usr/local/bin/gh
-export GITCRAWL_GH_PATH=/opt/homebrew/bin/gh # tell the shim where the real gh is
+REAL_GH="$(command -v gh)" # capture this before shadowing gh
+ln -sf "$(command -v gitcrawl)" "$HOME/bin/gh"
+export GITCRAWL_GH_PATH="$REAL_GH" # tell the shim where the real gh is
 ```

-If `GITCRAWL_GH_PATH` is unset, the shim probes common Homebrew paths and then `PATH`. Set it explicitly when you replace the global `gh` so the shim does not recurse into itself.
+Make sure `~/bin` is on `PATH` before the original `gh` location if you want the shim to be picked up as `gh`. If `GITCRAWL_GH_PATH` is unset, the shim probes common Homebrew paths and then `PATH`. Set it explicitly when you replace the global `gh` so the shim does not recurse into itself.

 ## Supported local reads

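The `PATH` ordering requirement added in the hunk above can be checked before relying on the shim. The following is a small sketch rather than anything from the commit: `shim_is_first` is a hypothetical helper, and it assumes `gh` is not also defined as a shell alias or function, since `command -v` would report those instead of a file path.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when `gh` resolves to DIR/gh on the current PATH,
# i.e. when DIR comes before every other directory that contains a `gh`.
shim_is_first() {
  [ "$(command -v gh)" = "$1/gh" ]
}

# Example use after creating the symlink in ~/bin:
#   shim_is_first "$HOME/bin" || echo "move ~/bin earlier in PATH" >&2
```

If the check fails, the real `gh` is still found first, so callers bypass the shim even though the symlink exists.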
@@ -138,8 +140,8 @@ All accept `--json` for scripting.
     "pass_through_writes": 4
   },
   "commands": {
-    "gh pr view": { "entries": 30, "bytes": 184320 },
-    "gh search issues": { "entries": 14, "bytes": 18230 }
+    "pr diff": { "entries": 30, "bytes": 184320 },
+    "release view": { "entries": 14, "bytes": 18230 }
   }
 }
 ```
@@ -172,4 +174,4 @@ Pattern: replace `gh` with `gitcrawl-gh` (or symlink to `gh`) for every agent in

 For best results, schedule a periodic `gitcrawl refresh owner/repo` (every few minutes per repo, depending on activity) so the local mirror stays warm. The shim's `--sync-if-stale` (via `gitcrawl search`) and auto-hydration handle the rest.

-See [Automation](./automation) for full agent recipes and JSON contracts.
+See [Automation](/automation/) for full agent recipes and JSON contracts.
@@ -127,4 +127,4 @@ The thread stays open on GitHub; only your local triage view hides it.

 - It does not edit, label, comment on, or close GitHub issues. Use `gh` for that.
 - It does not retrain embeddings or reshape the underlying graph — it overlays decisions on top of the algorithm output.
-- It does not propagate to other gitcrawl installations unless you publish your database via a [portable store](./portable-stores).
+- It does not propagate to other gitcrawl installations unless you publish your database via a [portable store](/portable-stores/).
@@ -12,7 +12,7 @@ permalink: /
 A local-first GitHub issue and pull request crawler for maintainer triage. Sync, search, cluster, and review related threads from a SQLite cache that lives entirely on your machine.
 {: .fs-6 .fw-300 }

-[Quickstart](./quickstart){: .btn .btn-primary .fs-5 .mb-4 .mb-md-0 .mr-2 }
+[Quickstart](/quickstart/){: .btn .btn-primary .fs-5 .mb-4 .mb-md-0 .mr-2 }
 [View on GitHub](https://github.com/openclaw/gitcrawl){: .btn .fs-5 .mb-4 .mb-md-0 }

 ---
@@ -34,16 +34,16 @@ A local-first GitHub issue and pull request crawler for maintainer triage. Sync,
 <div class="code-example" markdown="1">

 ### I want to try it
-[Quickstart](./quickstart) walks you from `git clone` to a populated cluster view in five minutes.
+[Quickstart](/quickstart/) walks you from `git clone` to a populated cluster view in five minutes.

 ### I want to wire up an agent
-The [`gh` shim](./gh-shim) is the fastest way to cut GitHub API load — point your agent at `gitcrawl-gh`, keep the agent's `gh` calls intact.
+The [`gh` shim](/gh-shim/) is the fastest way to cut GitHub API load — point your agent at `gitcrawl-gh`, keep the agent's `gh` calls intact.

 ### I want to triage a busy repo
-Read [Sync](./sync) to bring data local, then [Clustering](./clustering) and the [TUI](./tui) for the maintainer workflow.
+Read [Sync](/sync/) to bring data local, then [Clustering](/clustering/) and the [TUI](/tui/) for the maintainer workflow.

 ### I want the full reference
-[Commands](./commands) lists every flag and JSON field. [Configuration](./configuration) covers env vars and paths.
+[Commands](/commands/) lists every flag and JSON field. [Configuration](/configuration/) covers env vars and paths.

 </div>

@@ -26,13 +26,16 @@ Each tagged release publishes archives for `darwin_amd64`, `darwin_arm64`, `linu

 ```bash
 # Replace VERSION and PLATFORM with the values you want.
-curl -L "https://github.com/openclaw/gitcrawl/releases/download/v0.1.2/gitcrawl_0.1.2_darwin_arm64.tar.gz" \
-  | tar -xz -C /usr/local/bin gitcrawl
+VERSION=v0.1.2
+PLATFORM=darwin_arm64
+mkdir -p "$HOME/bin"
+curl -L "https://github.com/openclaw/gitcrawl/releases/download/${VERSION}/gitcrawl_${VERSION#v}_${PLATFORM}.tar.gz" \
+  | tar -xz -C "$HOME/bin" gitcrawl

 gitcrawl --version
 ```

-Browse the [releases page](https://github.com/openclaw/gitcrawl/releases) for the latest tag and the full asset list.
+Browse the [releases page](https://github.com/openclaw/gitcrawl/releases) for the latest tag and the full asset list. Use a directory that is already on your `PATH`; `~/bin` and `~/.local/bin` avoid needing elevated permissions.

 ## Install from source

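The new download command in the hunk above depends on the `${VERSION#v}` parameter expansion: the release tag keeps its leading `v`, while the archive file name drops it. The sketch below isolates that step so it can be checked without downloading anything. `asset_url` is a hypothetical helper name, and the URL pattern is copied from the hunk rather than from any other source.

```shell
#!/bin/sh
# Hypothetical helper: build the release asset URL from a tag and a platform.
asset_url() {
  version="$1"   # release tag, e.g. v0.1.2
  platform="$2"  # e.g. darwin_arm64
  # ${version#v} strips one leading "v": the tag has it, the file name does not.
  printf 'https://github.com/openclaw/gitcrawl/releases/download/%s/gitcrawl_%s_%s.tar.gz\n' \
    "$version" "${version#v}" "$platform"
}

asset_url v0.1.2 darwin_arm64
```

With the values from the hunk this prints the same URL that the removed line hard-coded, `https://github.com/openclaw/gitcrawl/releases/download/v0.1.2/gitcrawl_0.1.2_darwin_arm64.tar.gz`.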
@@ -54,14 +57,16 @@ The shim is the same binary. Symlink it as `gh` (replacing the real CLI) or as `

 ```bash
 # Side-by-side install — agents can opt in by calling `gitcrawl-gh`.
-ln -s "$(command -v gitcrawl)" /usr/local/bin/gitcrawl-gh
+mkdir -p "$HOME/bin"
+ln -sf "$(command -v gitcrawl)" "$HOME/bin/gitcrawl-gh"

 # Or replace the global `gh` so every agent picks up the cache automatically.
-ln -s "$(command -v gitcrawl)" /usr/local/bin/gh
-export GITCRAWL_GH_PATH="$(command -v /opt/homebrew/bin/gh)" # point shim at the real gh
+REAL_GH="$(command -v gh)" # capture this before shadowing gh
+ln -sf "$(command -v gitcrawl)" "$HOME/bin/gh"
+export GITCRAWL_GH_PATH="$REAL_GH" # point shim at the real gh
 ```

-When invoked as `gh` or `gitcrawl-gh`, the binary auto-detects shim mode. See [the gh shim guide](./gh-shim) for details.
+When invoked as `gh` or `gitcrawl-gh`, the binary auto-detects shim mode. See [the gh shim guide](/gh-shim/) for details.

 ## Verify the install

@@ -108,5 +108,5 @@ Other agents and machines pull the new commit on their next read-only command.

 ## See also

-- [Sync](./sync) — what gets written into the database that ends up in the portable store
-- [gh shim](./gh-shim) — agents reading a shared portable store benefit doubly from the shim's local-first answers
+- [Sync](/sync/) — what gets written into the database that ends up in the portable store
+- [gh shim](/gh-shim/) — agents reading a shared portable store benefit doubly from the shim's local-first answers
@@ -19,7 +19,8 @@ Five minutes from clean machine to a populated cluster view.
 # Build (or download a release archive — see Installation).
 git clone https://github.com/openclaw/gitcrawl.git
 cd gitcrawl
-go build -o /usr/local/bin/gitcrawl ./cmd/gitcrawl
+mkdir -p "$HOME/bin"
+go build -o "$HOME/bin/gitcrawl" ./cmd/gitcrawl

 # Create config + database under ~/.config/gitcrawl.
 gitcrawl init
@@ -36,16 +37,16 @@ Defaults written:
 ## 2. Set credentials

 ```bash
-export GITHUB_TOKEN=ghp_xxx # required for sync
-export OPENAI_API_KEY=sk-xxx # required for embeddings
+export GITHUB_TOKEN="<github-token>" # required for sync
+export OPENAI_API_KEY="<openai-api-key>" # required for embeddings
 ```

 Either set them in your shell profile or store them in `~/.config/gitcrawl/config.toml`:

 ```toml
 [env]
-GITHUB_TOKEN = "ghp_xxx"
-OPENAI_API_KEY = "sk-xxx"
+GITHUB_TOKEN = "<github-token>"
+OPENAI_API_KEY = "<openai-api-key>"
 ```

 `gitcrawl doctor` confirms the credentials are visible and reports their source.
@@ -72,7 +73,7 @@ The `refresh` command runs sync → embed → cluster end to end:
 gitcrawl refresh openclaw/gitcrawl
 ```

-You can run the stages individually if you want finer control — see [Refresh and embed](./refresh-and-embed) and [Clustering](./clustering).
+You can run the stages individually if you want finer control — see [Refresh and embed](/refresh-and-embed/) and [Clustering](/clustering/).

 ## 5. Browse clusters

@@ -116,17 +117,18 @@ Add `--sync-if-stale 5m` to refresh the local mirror first when it is older than
 ## 7. Wire up the `gh` shim (optional)

 ```bash
-ln -s "$(command -v gitcrawl)" /usr/local/bin/gitcrawl-gh
+mkdir -p "$HOME/bin"
+ln -sf "$(command -v gitcrawl)" "$HOME/bin/gitcrawl-gh"
 gitcrawl-gh search issues "download stalls" -R openclaw/gitcrawl --json number,title,url
 gitcrawl-gh pr view 123 -R openclaw/gitcrawl --json number,title,state,url
 gitcrawl-gh xcache stats
 ```

-Most read-only `gh` calls answer locally, mutating commands pass straight through to the real `gh`. See [gh shim](./gh-shim) for the full surface.
+Most read-only `gh` calls answer locally, mutating commands pass straight through to the real `gh`. See [gh shim](/gh-shim/) for the full surface.

 ## Where to next

-- [Concepts](./concepts) — what threads, durable clusters, and embeddings actually mean
-- [Sync](./sync) — every flag for hydrating the local store
-- [Clustering](./clustering) — tuning the cluster graph for a specific repo
-- [Automation](./automation) — JSON contracts for agents and scripts
+- [Concepts](/concepts/) — what threads, durable clusters, and embeddings actually mean
+- [Sync](/sync/) — every flag for hydrating the local store
+- [Clustering](/clustering/) — tuning the cluster graph for a specific repo
+- [Automation](/automation/) — JSON contracts for agents and scripts
@@ -62,6 +62,7 @@ Override the config root with `--config <path>` or `GITCRAWL_CONFIG`.
 | `GITCRAWL_GH_PATH` | _(probed)_ | Path to the real `gh` binary |
 | `GITCRAWL_GH_AUTO_HYDRATE` | _(on)_ | Set `0` to disable PR auto-hydration on cache miss |
 | `GITCRAWL_GH_CACHE_TTL` | `30s` for most commands | Override fallthrough cache TTL (e.g., `5m`, `1h`) |
+| `GITCRAWL_GH_CACHE_ERRORS` | _(on)_ | Set `0` to avoid caching non-zero read-only fallthroughs |

 ## Configuration defaults

@@ -158,7 +159,7 @@ stderr always carries error messages. stdout is reserved for command output.

 ## See also

-- [Configuration](./configuration) — narrative version of this reference
-- [Commands](./commands) — every command and flag, in one table
+- [Configuration](/configuration/) — narrative version of this reference
+- [Commands](/commands/) — every command and flag, in one table
 - [SPEC.md](https://github.com/openclaw/gitcrawl/blob/main/SPEC.md) — product contract
 - [CHANGELOG.md](https://github.com/openclaw/gitcrawl/blob/main/CHANGELOG.md) — what shipped recently
@@ -21,7 +21,7 @@ gitcrawl refresh owner/repo

By default this performs:

1. **Sync** — open + recently closed issues and PRs (see [Sync](./sync))
1. **Sync** — open + recently closed issues and PRs (see [Sync](/sync/))
2. **Embed** — fill `thread_vectors` for any thread whose document changed
3. **Cluster** — rebuild durable clusters with the standard thresholds

@@ -127,4 +127,4 @@ Each row carries `started_at`, `finished_at`, `status`, and `stats_json` — use

- **GitHub.** Sync uses standard REST endpoints; the API quota is the dominant cost on busy repos. Use `--include-comments` and `--with pr-details` selectively.
- **OpenAI.** `text-embedding-3-small` is inexpensive but not free. `embed` is bounded by `--limit` if you want to stay under a budget on initial backfills.
- **Disk.** Vectors and PR detail blobs grow with the repo. The portable-store flow includes `gitcrawl portable prune` to keep published payloads small — see [Portable stores](./portable-stores).
- **Disk.** Vectors and PR detail blobs grow with the repo. The portable-store flow includes `gitcrawl portable prune` to keep published payloads small — see [Portable stores](/portable-stores/).

@@ -100,7 +100,7 @@ There are two ways to run cached searches:
| `gitcrawl search issues|prs ...` | Human use; mixes naturally with the rest of the gitcrawl CLI |
| `gitcrawl gh search issues|prs ...` | Agents and scripts that call `gh` directly — symlinked as `gh` or `gitcrawl-gh` it is invisible to callers |

Both paths share the same local cache and produce gh-shaped JSON. The shim adds the additional `gh issue/pr view`, `gh issue/pr list`, `gh pr checks`, `gh run`, and `xcache` surface — see [gh shim](./gh-shim).
Both paths share the same local cache and produce gh-shaped JSON. The shim adds the additional `gh issue/pr view`, `gh issue/pr list`, `gh pr checks`, `gh run`, and `xcache` surface — see [gh shim](/gh-shim/).

## Combining with sync

10
docs/sync.md
@@ -58,7 +58,7 @@ gitcrawl sync owner/repo --numbers 123,456 --include-comments

`--numbers` is the safest way to refresh specific issues or PRs — it bypasses list ordering and the updated-time window, fetching exactly the rows you ask for. Pair it with `--include-comments` and/or `--include-pr-details` to hydrate the conversation and PR-only data at the same time.

This is also what the `gh` shim uses internally for [auto-hydration](./gh-shim#auto-hydration).
This is also what the `gh` shim uses internally for [auto-hydration](/gh-shim/#auto-hydration).

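A rough sketch of feeding `--numbers` from a script, assuming the thread numbers arrive as separate arguments. `join_numbers` is a hypothetical helper, not part of gitcrawl, and the final command is printed rather than executed so the snippet can be tried without a synced repo.

```shell
# join_numbers is a hypothetical helper: it joins its arguments with commas,
# the form that --numbers takes in the example above (123,456).
join_numbers() {
  local IFS=,
  printf '%s\n' "$*"
}

NUMS=$(join_numbers 123 456 789)

# Print the command instead of running it; drop the echo to perform the sync.
echo "gitcrawl sync owner/repo --numbers $NUMS --include-comments"
```

Building the list in one place keeps the sync to a single invocation instead of one call per thread.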
## Hydration depth

@@ -68,7 +68,7 @@ This is also what the `gh` shim uses internally for [auto-hydration](./gh-shim#a
| `--include-pr-details` | PR files, commits, status checks, workflow runs |
| `--with pr-details` | Same as `--include-pr-details` (gh-style flag) |

PR details land in `pr_files`, `pr_commits`, `pr_checks`, and `pr_runs` tables and back the `gh pr view`, `gh pr checks`, and `gh run list/view` shim paths. See [gh shim](./gh-shim).
PR details land in `pr_files`, `pr_commits`, `pr_checks`, and `pr_runs` tables and back the `gh pr view`, `gh pr checks`, and `gh run list/view` shim paths. See [gh shim](/gh-shim/).

`--include-code` is accepted for compatibility but is currently a no-op.

@@ -150,6 +150,6 @@ gitcrawl sync owner/repo --numbers "$NUMS" --with pr-details

## See also

- [Refresh and embed](./refresh-and-embed) — the wrapper that runs sync, embed, and cluster end to end
- [gh shim](./gh-shim) — how synced PR details power `gh pr view` / `gh pr checks` / `gh run` from local cache
- [Portable stores](./portable-stores) — sharing the synced cache across machines
- [Refresh and embed](/refresh-and-embed/) — the wrapper that runs sync, embed, and cluster end to end
- [gh shim](/gh-shim/) — how synced PR details power `gh pr view` / `gh pr checks` / `gh run` from local cache
- [Portable stores](/portable-stores/) — sharing the synced cache across machines

@@ -91,7 +91,7 @@ Member actions:
- Local close / reopen this thread
- Exclude from cluster

These map directly onto the [governance](./governance) commands. Anything you can do interactively, you can also script.
These map directly onto the [governance](/governance/) commands. Anything you can do interactively, you can also script.

## Display rules
