docs: define gitcrawl scope
commit f13029e3d1
20
.gitignore
vendored
Normal file
@@ -0,0 +1,20 @@
.DS_Store
.env
.env.local

bin/
dist/
coverage/

*.db
*.db-shm
*.db-wal
*.sqlite
*.sqlite-shm
*.sqlite-wal

data/
tmp/
cache/
logs/
vectors/
21
LICENSE
Normal file
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2026 OpenClaw

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
45
README.md
Normal file
@@ -0,0 +1,45 @@
# gitcrawl

`gitcrawl` is a local-first GitHub issue and pull request crawler for maintainer triage.

It is the Go implementation of the `ghcrawl` product contract, minus the local HTTP API. Data stays local in SQLite. The primary runtime surfaces are the CLI, JSON command output, and a future TUI.

## Status

Early bootstrap. The implementation is being built in small commits.

## Planned Commands

```bash
gitcrawl init
gitcrawl doctor
gitcrawl sync owner/repo
gitcrawl refresh owner/repo
gitcrawl clusters owner/repo --json
gitcrawl cluster-detail owner/repo --id 123 --json
gitcrawl search owner/repo --query "download stalls" --json
gitcrawl tui owner/repo
```

`serve` is intentionally not part of `gitcrawl`.

## Local Defaults

- config: `~/.config/gitcrawl/config.toml`
- database: `~/.config/gitcrawl/gitcrawl.db`
- cache: `~/.config/gitcrawl/cache`
- vectors: `~/.config/gitcrawl/vectors`
- logs: `~/.config/gitcrawl/logs`

## Requirements

- Go 1.26+
- a GitHub token for sync commands
- an OpenAI API key only for summary and embedding commands

## Development

```bash
go test ./...
go build ./cmd/gitcrawl
```
110
SPEC.md
Normal file
@@ -0,0 +1,110 @@
# gitcrawl Spec

## Product Contract

`gitcrawl` is a Go implementation of `ghcrawl` for local-first GitHub maintainer triage.

The target is functional parity with `ghcrawl` except that `gitcrawl` does not expose a local HTTP API.

## In Scope

- local SQLite storage
- metadata-first GitHub sync for open issues and pull requests
- optional comment, review, review-comment, and PR code hydration
- canonical thread document building
- FTS search
- OpenAI summaries and embeddings
- deterministic fingerprints
- vector search
- clustering and durable cluster governance
- portable sync export/import
- CLI JSON surfaces for automation and agents
- TUI browsing after core JSON contracts settle

## Out Of Scope

- local HTTP API
- hosted service runtime
- browser web UI
- GitHub write-back actions

## Architecture

- `cmd/gitcrawl`: executable entrypoint
- `internal/cli`: command parsing and output
- `internal/config`: config and env resolution
- `internal/store`: SQLite schema and persistence
- `internal/github`: GitHub API client
- `internal/syncer`: repository sync workflows
- `internal/documents`: canonical document generation
- `internal/openai`: OpenAI summaries and embeddings
- `internal/vector`: vector search abstraction
- `internal/cluster`: similarity and durable cluster governance
- `internal/search`: keyword, semantic, and hybrid search
- `internal/portable`: compact sync export/import
- `internal/tui`: terminal UI

## Command Surface

No `serve` command.

Planned public commands:

- `init`
- `doctor`
- `configure`
- `version`
- `sync`
- `refresh`
- `summarize`
- `key-summaries`
- `embed`
- `cluster`
- `threads`
- `runs`
- `clusters`
- `durable-clusters`
- `cluster-detail`
- `cluster-explain`
- `neighbors`
- `search`
- `close-thread`
- `close-cluster`
- `exclude-cluster-member`
- `include-cluster-member`
- `set-cluster-canonical`
- `merge-clusters`
- `split-cluster`
- `export-sync`
- `import-sync`
- `validate-sync`
- `portable-size`
- `sync-status`
- `optimize`
- `tui`
- `completion`

## Config

Default config path:

```text
~/.config/gitcrawl/config.toml
```

Default database path:

```text
~/.config/gitcrawl/gitcrawl.db
```

Primary environment variables:

- `GITCRAWL_CONFIG`
- `GITHUB_TOKEN`
- `OPENAI_API_KEY`
- `GITCRAWL_DB_PATH`
- `GITCRAWL_SUMMARY_MODEL`
- `GITCRAWL_EMBED_MODEL`

Legacy `GHCRAWL_*` aliases should be supported where the compatibility cost is low.
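
A minimal sketch of how a legacy alias could be honored, assuming the new variable wins, the legacy one is consulted second, and the built-in default comes last. The alias name `GHCRAWL_DB_PATH` and the helpers `firstEnv` and `ResolveDBPath` are assumptions for illustration; the spec does not name specific aliases or a precedence order.

```go
package main

import (
	"fmt"
	"os"
)

// firstEnv returns the first non-empty value among the given variable
// names, looked up through getenv (os.Getenv in production code).
func firstEnv(getenv func(string) string, names ...string) (string, bool) {
	for _, name := range names {
		if v := getenv(name); v != "" {
			return v, true
		}
	}
	return "", false
}

// ResolveDBPath prefers GITCRAWL_DB_PATH, then the assumed legacy alias
// GHCRAWL_DB_PATH, and finally falls back to defaultPath.
func ResolveDBPath(getenv func(string) string, defaultPath string) string {
	if v, ok := firstEnv(getenv, "GITCRAWL_DB_PATH", "GHCRAWL_DB_PATH"); ok {
		return v
	}
	return defaultPath
}

func main() {
	// The tilde is not expanded here; a real implementation would
	// resolve it against the user's home directory first.
	fmt.Println(ResolveDBPath(os.Getenv, "~/.config/gitcrawl/gitcrawl.db"))
}
```

Passing the lookup function in, rather than calling `os.Getenv` directly, keeps the precedence logic testable with a plain map.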