Compare commits


1 Commits

Author SHA1 Message Date
Vincent Koc
e3dba2b005
ci(release): update goreleaser action
Some checks failed
Validation / validate (push) Has been cancelled
2026-04-27 14:14:38 -07:00
35 changed files with 319 additions and 2843 deletions


@ -1,103 +0,0 @@
---
name: notcrawl
description: Use for local Notion archive search, sync freshness, Markdown/database exports, git-share snapshots, and Notcrawl repo/release work.
---
# Notcrawl
Use local archive data first for Notion questions. Browse or hit the Notion API
only when the archive is stale, missing the requested scope, or the user asks
for current external context.
## Sources
- DB: `~/.notcrawl/notcrawl.db`
- Config: `~/.notcrawl/config.toml`
- Cache: `~/.notcrawl/cache`
- Markdown archive: `~/.notcrawl/pages`
- Git share repo: `~/.notcrawl/share`
- Repo: `~/GIT/_Perso/notcrawl`
- Preferred CLI: `notcrawl`; fallback to `go run ./cmd/notcrawl` from the repo if the installed binary is stale
## Freshness
For recent/current questions, check freshness before analysis:
```bash
sqlite3 ~/.notcrawl/notcrawl.db \
"select coalesce(max(synced_at), 0) from sync_state;"
```
Routine refresh:
```bash
notcrawl doctor
notcrawl sync --source desktop
```
API refresh:
```bash
notcrawl sync --source api
```
Use `notcrawl sync --source all` only when both desktop and API sources are
configured and the broader refresh is intentional.
## Query Workflow
1. Resolve scope: workspace, teamspace, page, database, author, keyword, or date range.
2. Check freshness for recent/current requests.
3. Use CLI for normal reads; use read-only SQL for precise counts/rankings.
4. Report absolute date spans, counts, page/database titles, and known gaps.
Common commands:
```bash
notcrawl search "query"
notcrawl databases
notcrawl report
notcrawl sql "select count(*) from pages;"
```
## SQL
Use `notcrawl sql` for exact counts, joins, and database/page inventory queries
when normal CLI reads are too coarse. The command only allows read-only
`select`, `with`, and `pragma` queries.
Useful examples:
```bash
notcrawl sql "select count(*) as pages from pages;"
notcrawl sql "select parent_table, count(*) as pages from pages group by parent_table order by pages desc;"
notcrawl sql "select title, last_edited_time from pages order by coalesce(last_edited_time, created_time, 0) desc limit 20;"
```
Do not use SQL to mutate the archive.
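The read-only rule described above can be approximated with a leading-keyword check. This is a sketch only: `looksReadOnly` and `allowedKeywords` are hypothetical names, and the code is not taken from notcrawl. A prefix check alone is weak, because a `with` statement can end in a write and some pragmas modify state, so opening the database in read-only mode remains the stronger guard.

```go
package main

import (
	"fmt"
	"strings"
)

// allowedKeywords mirrors the statement kinds listed in the skill text.
var allowedKeywords = []string{"select", "with", "pragma"}

// isWordByte reports whether b can continue an identifier.
func isWordByte(b byte) bool {
	return b == '_' || (b >= 'a' && b <= 'z') || (b >= '0' && b <= '9')
}

// looksReadOnly is a hypothetical pre-flight check: it accepts a query only
// when its first keyword is select, with, or pragma.
func looksReadOnly(query string) bool {
	q := strings.ToLower(strings.TrimSpace(query))
	for _, kw := range allowedKeywords {
		if !strings.HasPrefix(q, kw) {
			continue
		}
		rest := q[len(kw):]
		// Require a word boundary so "withdraw" is not read as "with".
		if rest == "" || !isWordByte(rest[0]) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(looksReadOnly("select count(*) from pages;")) // true
	fmt.Println(looksReadOnly("delete from pages;"))          // false
}
```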
When the installed CLI lacks a new feature, build or run from
`~/GIT/_Perso/notcrawl` before concluding the feature is missing.
## Notion Boundaries
Desktop mode snapshots the local Notion SQLite database read-only and must not
write to Notion application storage. API mode requires `NOTION_TOKEN`; do not
invent token availability. Git-share snapshots and Markdown exports must not
include secrets.
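The token rule above amounts to checking the environment before attempting API mode. A minimal sketch, assuming only that the variable is named `NOTION_TOKEN` as stated; `apiTokenPresent` is a hypothetical helper, and the lookup function is injected so the check can be exercised without touching the real environment.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// apiTokenPresent reports whether NOTION_TOKEN is set to a non-blank value.
// lookup has the same shape as os.LookupEnv.
func apiTokenPresent(lookup func(string) (string, bool)) bool {
	value, ok := lookup("NOTION_TOKEN")
	return ok && strings.TrimSpace(value) != ""
}

func main() {
	if apiTokenPresent(os.LookupEnv) {
		fmt.Println("NOTION_TOKEN is set; API sync can be attempted")
		return
	}
	fmt.Println("NOTION_TOKEN is not set; stay in desktop mode or ask for a token")
}
```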
## Verification
For repo edits, prefer existing Go gates:
```bash
GOWORK=off go test ./...
```
Then run targeted CLI smoke for the touched surface, for example:
```bash
notcrawl doctor
notcrawl status --json
notcrawl search "test"
```


@ -1,12 +0,0 @@
root = true
[*]
charset = utf-8
end_of_line = lf
insert_final_newline = true
indent_style = tab
indent_size = 4
[*.{md,yml,yaml,json,toml}]
indent_style = space
indent_size = 2

.gitattributes

@ -1,6 +0,0 @@
* text=auto
*.go text eol=lf
*.md text eol=lf
*.toml text eol=lf
*.yml text eol=lf
*.yaml text eol=lf

.github/CODEOWNERS

@ -1,12 +0,0 @@
# Protect ownership and automation rules.
/.github/CODEOWNERS @openclaw/openclaw-secops
/.github/dependabot.yml @openclaw/openclaw-secops
/.github/release-drafter.yml @openclaw/openclaw-secops
/.github/workflows/ @openclaw/openclaw-secops
# Release and package integrity surfaces.
/.goreleaser.yml @openclaw/openclaw-secops
/go.mod @openclaw/openclaw-secops
/go.sum @openclaw/openclaw-secops
/scripts/*release* @openclaw/openclaw-secops
/scripts/*publish* @openclaw/openclaw-secops


@ -1,13 +0,0 @@
version: 2
updates:
- package-ecosystem: gomod
directory: /
schedule:
interval: weekly
open-pull-requests-limit: 10
- package-ecosystem: github-actions
directory: /
schedule:
interval: weekly
open-pull-requests-limit: 10


@ -20,7 +20,7 @@ jobs:
 runs-on: ubuntu-latest
 steps:
 - name: Assign maintainer
-uses: actions/github-script@v9
+uses: actions/github-script@v8
 with:
 script: |
 const assignee = "vincentkoc";


@ -1,128 +0,0 @@
name: CI
on:
pull_request:
push:
branches:
- main
- feat/use-crawlkit
workflow_dispatch:
permissions:
contents: read
concurrency:
group: ci-${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
env:
FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: "true"
jobs:
deps:
runs-on: ubuntu-latest
timeout-minutes: 15
steps:
- name: Checkout
uses: actions/checkout@v6
- name: Setup Go
uses: actions/setup-go@v6
with:
go-version-file: go.mod
cache: true
- name: Verify module cache
run: go mod verify
- name: Check go.mod tidy
run: |
go mod tidy
git diff --exit-code -- go.mod go.sum
- name: Install govulncheck
run: go install golang.org/x/vuln/cmd/govulncheck@v1.3.0
- name: Run govulncheck
run: '"$(go env GOPATH)/bin/govulncheck" ./...'
lint:
runs-on: ubuntu-latest
timeout-minutes: 15
steps:
- name: Checkout
uses: actions/checkout@v6
- name: Setup Go
uses: actions/setup-go@v6
with:
go-version-file: go.mod
cache: true
- name: Check formatting
run: |
changed="$(gofmt -l .)"
if [ -n "$changed" ]; then
printf 'gofmt wants changes in:\n%s\n' "$changed"
exit 1
fi
- name: Vet
run: go vet ./...
test:
runs-on: ubuntu-latest
timeout-minutes: 20
steps:
- name: Checkout
uses: actions/checkout@v6
- name: Setup Go
uses: actions/setup-go@v6
with:
go-version-file: go.mod
cache: true
- name: Test
run: go test -count=1 ./...
- name: Build CLI
run: go build -ldflags "-X main.version=ci" -o bin/notcrawl ./cmd/notcrawl
- name: Smoke test CLI control surface
run: |
set -euo pipefail
output="$(./bin/notcrawl --help 2>&1)"
printf '%s\n' "$output"
printf '%s' "$output" | grep -q "Usage of notcrawl:"
printf '%s' "$output" | grep -q "metadata"
printf '%s' "$output" | grep -q "tui"
test "$(./bin/notcrawl --version)" = "ci"
./bin/notcrawl metadata --json | grep -q '"schema_version"'
cfg="$RUNNER_TEMP/notcrawl.toml"
db="$RUNNER_TEMP/notcrawl.db"
./bin/notcrawl --config "$cfg" init
./bin/notcrawl --config "$cfg" --db "$db" status --json | grep -q '"databases"'
./bin/notcrawl --config "$cfg" --db "$db" tui --json --limit 1 | grep -q '^\['
release-check:
runs-on: ubuntu-latest
timeout-minutes: 15
steps:
- name: Checkout
uses: actions/checkout@v6
with:
fetch-depth: 0
- name: Setup Go
uses: actions/setup-go@v6
with:
go-version-file: go.mod
cache: true
- name: Snapshot release build
uses: goreleaser/goreleaser-action@v7.2.1
with:
distribution: goreleaser
version: "~> v2"
args: release --snapshot --clean --skip=publish


@ -1,41 +0,0 @@
name: CodeQL
on:
pull_request:
push:
branches:
- main
- feat/use-crawlkit
schedule:
- cron: "37 4 * * 1"
workflow_dispatch:
permissions:
actions: read
contents: read
security-events: write
env:
FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: "true"
jobs:
analyze:
name: analyze
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v6
- name: Setup Go
uses: actions/setup-go@v6
with:
go-version-file: go.mod
cache: true
- name: Initialize CodeQL
uses: github/codeql-action/init@v4
with:
languages: go
- name: Perform CodeQL Analysis
uses: github/codeql-action/analyze@v4


@ -27,27 +27,19 @@ jobs:
 update-tap:
 runs-on: ubuntu-latest
 if: startsWith(github.event.release.tag_name || inputs.tag_name, 'v')
-env:
-TAP_REPO: ${{ vars.HOMEBREW_TAP_REPO || 'vincentkoc/homebrew-tap' }}
 steps:
 - name: Validate tap configuration
-env:
-GH_TOKEN: ${{ secrets.HOMEBREW_TAP_GITHUB_TOKEN }}
 run: |
 set -euo pipefail
-if [ -z "${GH_TOKEN}" ]; then
+if [ -z "${{ secrets.HOMEBREW_TAP_GITHUB_TOKEN }}" ]; then
 echo "Secret HOMEBREW_TAP_GITHUB_TOKEN is required."
 exit 1
 fi
-if [ "$(gh api "repos/${TAP_REPO}" --jq '.permissions.push // false')" != "true" ]; then
-echo "HOMEBREW_TAP_GITHUB_TOKEN must have push access to ${TAP_REPO}."
-exit 1
-fi
 - name: Checkout tap repository
-uses: actions/checkout@v6
+uses: actions/checkout@v5
 with:
-repository: ${{ env.TAP_REPO }}
+repository: ${{ vars.HOMEBREW_TAP_REPO || 'vincentkoc/tap' }}
 token: ${{ secrets.HOMEBREW_TAP_GITHUB_TOKEN }}
 - name: Update formula
@ -56,6 +48,7 @@ jobs:
 SOURCE_REPO: ${{ github.repository }}
 run: |
 set -euo pipefail
+VERSION="${TAG#v}"
 SOURCE_URL="https://github.com/${SOURCE_REPO}/archive/refs/tags/${TAG}.tar.gz"
 curl -fsSL "${SOURCE_URL}" -o /tmp/notcrawl-src.tar.gz
@ -69,11 +62,12 @@ jobs:
 url "${SOURCE_URL}"
 sha256 "${SHA256}"
 license "MIT"
+version "${VERSION}"
 depends_on "go" => :build
 def install
-system "go", "build", *std_go_args(ldflags: "-s -w -X main.version=#{version}"), "./cmd/notcrawl"
+system "go", "build", *std_go_args(ldflags: "-s -w"), "./cmd/notcrawl"
 pkgshare.install "config.example.toml"
 doc.install "README.md", "LICENSE", "SPEC.md"
 end


@ -23,7 +23,7 @@ jobs:
 runs-on: ubuntu-latest
 steps:
 - name: Checkout
-uses: actions/checkout@v6
+uses: actions/checkout@v5
 with:
 fetch-depth: 0
 ref: ${{ inputs.tag_name || github.ref }}


@ -1,63 +0,0 @@
name: "Security Gate: Secret Scanning"
on:
push:
branches: ["**"]
pull_request:
branches: [main, master]
permissions: {}
jobs:
trufflehog:
name: Scan for Verified Secrets
runs-on: ubuntu-latest
permissions:
contents: read
steps:
- name: Checkout code
uses: actions/checkout@v6
with:
fetch-depth: 0
- name: Resolve scan range
id: scan_range
env:
EVENT_NAME: ${{ github.event_name }}
PR_BASE_SHA: ${{ github.event.pull_request.base.sha }}
PR_HEAD_SHA: ${{ github.event.pull_request.head.sha }}
PUSH_BASE_SHA: ${{ github.event.before }}
PUSH_HEAD_SHA: ${{ github.sha }}
DEFAULT_BRANCH: ${{ github.event.repository.default_branch }}
run: |
set -euo pipefail
zero_sha="0000000000000000000000000000000000000000"
if [[ "$EVENT_NAME" == "pull_request" ]]; then
base="$PR_BASE_SHA"
head="$PR_HEAD_SHA"
else
base="$PUSH_BASE_SHA"
head="$PUSH_HEAD_SHA"
if [[ -z "$base" || "$base" == "$zero_sha" ]]; then
base="origin/$DEFAULT_BRANCH"
fi
fi
echo "base=$base" >> "$GITHUB_OUTPUT"
echo "head=$head" >> "$GITHUB_OUTPUT"
- name: TruffleHog OSS
id: trufflehog
uses: trufflesecurity/trufflehog@v3.95.2
with:
path: ./
base: ${{ steps.scan_range.outputs.base }}
head: ${{ steps.scan_range.outputs.head }}
extra_args: --only-verified --debug
- name: Notify on failure
if: steps.trufflehog.outcome == 'failure'
run: |
echo "::error::Verified secrets found. Rotate the credential before merging."
exit 1


@ -1,86 +0,0 @@
name: Stale
on:
schedule:
- cron: "33 4 * * *"
workflow_dispatch:
permissions: {}
jobs:
stale:
permissions:
issues: write
pull-requests: write
runs-on: ubuntu-latest
steps:
- name: Mark stale unassigned issues and pull requests
uses: actions/stale@v10
with:
days-before-issue-stale: 14
days-before-issue-close: 7
days-before-pr-stale: 14
days-before-pr-close: 7
stale-issue-label: stale
stale-pr-label: stale
exempt-issue-labels: enhancement,maintainer,pinned,security,no-stale
exempt-pr-labels: maintainer,no-stale
operations-per-run: 1000
ascending: true
exempt-all-assignees: true
remove-stale-when-updated: true
stale-issue-message: |
This issue has been automatically marked as stale due to inactivity.
Please add updated notcrawl details or it will be closed.
stale-pr-message: |
This pull request has been automatically marked as stale due to inactivity.
Please update it or it will be closed.
close-issue-message: |
Closing due to inactivity.
If this still affects notcrawl, open a new issue with current reproduction details.
close-issue-reason: not_planned
close-pr-message: |
Closing due to inactivity.
If this PR should be revived, reopen it with current context and validation.
- name: Mark stale assigned issues
uses: actions/stale@v10
with:
days-before-issue-stale: 30
days-before-issue-close: 10
days-before-pr-stale: -1
days-before-pr-close: -1
stale-issue-label: stale
exempt-issue-labels: enhancement,maintainer,pinned,security,no-stale
operations-per-run: 1000
ascending: true
include-only-assigned: true
remove-stale-when-updated: true
stale-issue-message: |
This assigned issue has been automatically marked as stale after 30 days of inactivity.
Please add an update or it will be closed.
close-issue-message: |
Closing due to inactivity.
If this still affects notcrawl, reopen or file a new issue with current evidence.
close-issue-reason: not_planned
- name: Mark stale assigned pull requests
uses: actions/stale@v10
with:
days-before-issue-stale: -1
days-before-issue-close: -1
days-before-pr-stale: 27
days-before-pr-close: 7
stale-pr-label: stale
exempt-pr-labels: maintainer,no-stale
operations-per-run: 1000
ascending: true
include-only-assigned: true
ignore-pr-updates: true
remove-stale-when-updated: true
stale-pr-message: |
This assigned pull request has been automatically marked as stale after being open for 27 days.
Please add an update or it will be closed.
close-pr-message: |
Closing due to inactivity.
If this PR should be revived, reopen it with current context and validation.

.github/workflows/validation.yml

@ -0,0 +1,41 @@
name: Validation
on:
push:
branches:
- "**"
pull_request:
env:
FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: "true"
jobs:
validate:
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v5
- name: Setup Go
uses: actions/setup-go@v6
with:
go-version-file: go.mod
cache: true
- name: Format check
run: |
set -euo pipefail
test -z "$(gofmt -l .)"
- name: Run tests
run: go test ./...
- name: Build CLI
run: go build ./cmd/notcrawl
- name: Smoke test CLI help
run: |
set -euo pipefail
output="$(go run ./cmd/notcrawl --help 2>&1)"
printf '%s\n' "$output"
printf '%s' "$output" | grep -q "Usage of notcrawl:"


@ -17,7 +17,7 @@ builds:
 flags:
 - -trimpath
 ldflags:
-- -s -w -X main.version={{ .Version }}
+- -s -w
 archives:
 - id: bundles


@ -1,35 +0,0 @@
# Changelog
## Unreleased
- Bump routine GitHub Actions dependencies.
- Add a repo-local `notcrawl` agent skill for local archive, freshness, query,
and verification workflows.
- Document `notcrawl sql` read-only query examples in the repo-local agent
skill so agents can do exact archive counts and inventory checks safely.
- Replace the single validation workflow with CI jobs for dependencies,
formatting/vet, tests, CLI control-surface smoke checks, and GoReleaser
snapshot builds.
- Add CodeQL analysis on pull requests, `main`, the crawlkit integration branch,
weekly schedule, and manual dispatch.
- Depend on `github.com/vincentkoc/crawlkit v0.4.0` for shared config,
status/control, snapshot, mirror, output, and terminal explorer mechanics.
- Keep Notion API/Desktop parsing, Markdown rendering, page/comment/database
schemas, Notion FTS body construction, and data-source compatibility
app-owned while the shared mechanics move to crawlkit.
- Document the gitcrawl-style document TUI shape: workspace/teamspace/page or
database groups, page/database rows, preview/comment detail, sorting, mouse
selection, right-click actions, and local/remote status chrome.
- Add crawlkit control metadata/status surfaces with `metadata --json`, `status --json`, and `doctor --json`.
- Report primary archive and desktop-cache SQLite inventories in status JSON for shared local control surfaces.
- Add `notcrawl tui`, a local terminal browser for archived pages and databases backed by `crawlkit/tui`.
- Render TUI rows with compact panes so page and database metadata stays in context/detail instead of crowding the row list.
- Resolve database parent names for the TUI parent pane so collection nesting is readable instead of raw IDs.
- Hide noisy block-derived Notion parent labels in the TUI by falling back to the workspace label when parent text contains raw Notion identifiers.
- Resolve block-parent pages to their owning page when possible so the TUI parent pane shows real Notion hierarchy instead of broad workspace buckets.
- Normalize workspace-level Notion parents as `Workspace: <name>` so the TUI left pane does not split the same workspace into duplicate parent groups.
- Inherit shared crawlkit TUI improvements for newest-first startup, count-header sorting, preview-first document detail panes, and gitcrawl-style metadata labels.
- Feed longer, block-shaped Notion page previews into the TUI detail pane so pages read more like documents instead of flat metadata.
- Include page comments in Notion TUI previews after block content.
- Route the TUI through read-only SQLite access and cover the JSON fallback in tests.
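
The parent-label fallback described in the entries above can be illustrated with a short sketch. It assumes raw Notion identifiers appear as dashed UUIDs and that the fallback text has the form `Workspace: <name>`, as the entries state; `parentLabel` and `rawNotionID` are hypothetical names and this is not the code notcrawl uses.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// rawNotionID matches dashed UUIDs such as 00b8cbcf-c520-4790-999a-9c2940263721.
var rawNotionID = regexp.MustCompile(`(?i)[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}`)

// parentLabel is a hypothetical helper: it keeps a readable parent label and
// falls back to the workspace label when the text is empty or contains a raw ID.
func parentLabel(label, workspace string) string {
	if strings.TrimSpace(label) == "" || rawNotionID.MatchString(label) {
		return "Workspace: " + workspace
	}
	return label
}

func main() {
	fmt.Println(parentLabel("Customer Folder", "Comet.com"))
	fmt.Println(parentLabel("ce 2fd71240-10a3-80a0-a65a-007aec07c0d9 Pods", "Comet.com"))
}
```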


@ -25,7 +25,6 @@ to without holding Notion credentials.
 - normalized Markdown export organized by Unicode-safe workspace, teamspace, and page paths
 - CSV/TSV export for crawled Notion database rows
 - compressed JSONL git-share snapshots plus import/update workflows
-- terminal archive browser for quick local page/database inspection
 - archive status, activity reporting, and SQLite maintenance commands
 - read-only SQL access for ad hoc inspection
@ -51,7 +50,6 @@ notcrawl report
 notcrawl sync --source desktop
 notcrawl export-md
 notcrawl search "launch plan"
-notcrawl tui
 ```
 Or use the official Notion API:
@ -61,7 +59,6 @@ export NOTION_TOKEN="secret_..."
 notcrawl sync --source api
 notcrawl databases
 notcrawl export-db --database DATABASE_ID --format csv --output roadmap.csv
-notcrawl export-db --all --dir exports/csv
 ```
 Default paths:
@ -77,35 +74,18 @@ Default paths:
 - `init` writes a starter config
 - `doctor` checks config, SQLite, desktop cache, and token presence
 - `status` prints archive counts, last sync time, and database/WAL size
-- `metadata --json`, `status --json`, and `doctor --json` expose crawlkit
-control/status payloads for launchers, automation, and CI
 - `report` summarizes recent page, database, space, and comment activity
 - `maintain` rebuilds FTS, optimizes SQLite indexes, and can run `VACUUM`
 - `sync` ingests from `desktop`, `api`, or `all`
 - `export-md` renders normalized Markdown files from SQLite
 - `databases` lists crawled Notion databases
-- `export-db` exports one crawled Notion database, or all databases with `--all --dir`, to CSV or TSV
+- `export-db` exports a crawled Notion database to CSV or TSV
 - `search` searches page and comment text through FTS5
-- `tui` opens the terminal archive browser for pages and databases
 - `sql` runs read-only SQL against the archive
 - `publish` exports SQLite tables and Markdown into a git share repo
 - `subscribe` clones a share repo and imports the latest snapshot
 - `update` pulls and imports a subscribed share repo
-## Shared crawlkit surfaces
-`notcrawl` uses `crawlkit` for standard config paths, SQLite open/read helpers,
-snapshot packing/import, git-backed archive sharing, output formatting, status
-payloads, and the shared terminal explorer. Notion API/Desktop parsing,
-Markdown rendering, page/comment/database schemas, and Notion FTS bodies remain
-owned by `notcrawl`.
-The TUI follows the gitcrawl-style three-pane model: workspace/teamspace/page or
-database groups on the left, pages/databases in the middle, and a readable
-document preview plus comments and metadata on the right. It supports pane
-focus, sortable headers, mouse selection, right-click actions, and a
-local/remote footer.
 ## Distribution
 Release packaging is managed with GoReleaser. Tagged releases build tarballs,


@ -163,7 +163,6 @@ those pages through `pages.collection_id`.
 ```text
 notcrawl export-db --database <database-id> --format csv --output rows.csv
 notcrawl export-db --database <database-id> --format tsv --output rows.tsv
-notcrawl export-db --all --dir exports/csv
 ```
 The first columns are stable metadata:

File diff suppressed because it is too large.


@ -1,18 +1,6 @@
package main
import (
"bytes"
"context"
"encoding/json"
"errors"
"fmt"
"os"
"path/filepath"
"strings"
"testing"
"github.com/vincentkoc/notcrawl/internal/store"
)
import "testing"
func TestSearchFieldCollapsesRecordSeparators(t *testing.T) {
got := searchField("line one\nline\ttwo line three")
@ -20,431 +8,3 @@ func TestSearchFieldCollapsesRecordSeparators(t *testing.T) {
t.Fatalf("unexpected field: %q", got)
}
}
func TestTUIJSONListsArchiveRowsWithoutMutation(t *testing.T) {
ctx := context.Background()
dir := t.TempDir()
dbPath := filepath.Join(dir, "notcrawl.db")
st, err := store.Open(dbPath)
if err != nil {
t.Fatal(err)
}
now := store.NowMS()
if err := st.UpsertCollection(ctx, store.Collection{ID: "db1", Name: "Roadmap", Source: "test", SyncedAt: now}); err != nil {
t.Fatal(err)
}
if err := st.UpsertPage(ctx, store.Page{
ID: "page1",
CollectionID: "db1",
Title: "Launch Plan",
URL: "https://example.com/launch",
Alive: true,
Source: "test",
SyncedAt: now,
LastEditedTime: now,
}); err != nil {
t.Fatal(err)
}
if err := st.UpsertBlock(ctx, store.Block{
ID: "block1",
PageID: "page1",
ParentID: "page1",
Type: "bulleted_list",
Text: "sync launch checklist",
DisplayOrder: 1,
Alive: true,
Source: "test",
SyncedAt: now,
}); err != nil {
t.Fatal(err)
}
if err := st.Close(); err != nil {
t.Fatal(err)
}
before, err := os.ReadFile(dbPath)
if err != nil {
t.Fatal(err)
}
var stdout, stderr bytes.Buffer
err = run(ctx, []string{"--config", filepath.Join(dir, "missing.toml"), "--db", dbPath, "tui", "--json"}, &stdout, &stderr)
if err != nil {
t.Fatalf("tui --json failed: %v\nstderr:\n%s", err, stderr.String())
}
var rows []map[string]any
if err := json.Unmarshal(stdout.Bytes(), &rows); err != nil {
t.Fatalf("invalid json: %v\n%s", err, stdout.String())
}
if len(rows) == 0 || rows[0]["title"] != "Launch Plan" || rows[0]["source"] != "notion" || rows[0]["kind"] != "page" || rows[0]["container"] != "Roadmap" || !strings.Contains(fmt.Sprint(rows[0]["text"]), "sync launch checklist") || !strings.Contains(fmt.Sprint(rows[0]["detail"]), "sync launch checklist") {
t.Fatalf("unexpected rows: %#v", rows)
}
after, err := os.ReadFile(dbPath)
if err != nil {
t.Fatal(err)
}
if !bytes.Equal(before, after) {
t.Fatal("tui --json mutated the sqlite database")
}
}
func TestTUIAllRowsIncludesDatabasesWhenPagesHitLimit(t *testing.T) {
ctx := context.Background()
dir := t.TempDir()
dbPath := filepath.Join(dir, "notcrawl.db")
st, err := store.Open(dbPath)
if err != nil {
t.Fatal(err)
}
now := store.NowMS()
if err := st.UpsertCollection(ctx, store.Collection{ID: "db1", Name: "Roadmap", Source: "test", SyncedAt: now}); err != nil {
t.Fatal(err)
}
for _, title := range []string{"Launch Plan", "Backlog"} {
if err := st.UpsertPage(ctx, store.Page{ID: title, CollectionID: "db1", Title: title, Alive: true, Source: "test", SyncedAt: now, LastEditedTime: now}); err != nil {
t.Fatal(err)
}
}
if err := st.Close(); err != nil {
t.Fatal(err)
}
var stdout, stderr bytes.Buffer
err = run(ctx, []string{"--config", filepath.Join(dir, "missing.toml"), "--db", dbPath, "tui", "--json", "--limit", "1"}, &stdout, &stderr)
if err != nil {
t.Fatalf("tui --json failed: %v\nstderr:\n%s", err, stderr.String())
}
var rows []map[string]any
if err := json.Unmarshal(stdout.Bytes(), &rows); err != nil {
t.Fatalf("invalid json: %v\n%s", err, stdout.String())
}
seen := map[string]bool{}
for _, row := range rows {
seen[fmt.Sprint(row["kind"])] = true
}
if !seen["page"] || !seen["database"] {
t.Fatalf("all rows should include pages and databases despite page limit: %#v", rows)
}
}
func TestCollectionTUIRowsResolveParentCollectionNames(t *testing.T) {
rows := collectionTUIRows([]store.Collection{{
ID: "child-db",
SpaceID: "space1",
ParentID: "parent-db",
ParentTable: "collection",
Name: "Child",
Source: "test",
}}, 10, nil, map[string]string{"parent-db": "Parent Database"}, map[string]string{"space1": "Workspace"})
if len(rows) != 1 {
t.Fatalf("rows = %#v", rows)
}
if rows[0].ParentID != "Parent Database" {
t.Fatalf("parent label = %q", rows[0].ParentID)
}
if rows[0].Scope != "Workspace" {
t.Fatalf("scope = %q", rows[0].Scope)
}
if !strings.Contains(rows[0].Detail, "Parent: Parent Database") {
t.Fatalf("detail = %q", rows[0].Detail)
}
}
func TestTUIRowsHideRawNotionParentIDs(t *testing.T) {
rows := pageTUIRows([]store.Page{{
ID: "page1",
SpaceID: "space1",
ParentID: "space:00b8cbcf-c520-4790-999a-9c2940263721",
ParentTable: "space",
CollectionID: "",
Title: "Launch Plan",
Alive: true,
Source: "test",
LastEditedTime: 1000,
}}, 10, nil, nil, map[string]string{"space1": "Comet.com", "00b8cbcf-c520-4790-999a-9c2940263721": "Comet.com"}, nil, nil)
if len(rows) != 1 {
t.Fatalf("rows = %#v", rows)
}
if rows[0].ParentID != "Workspace: Comet.com" {
t.Fatalf("parent label = %q", rows[0].ParentID)
}
rows = pageTUIRows([]store.Page{{
ID: "page2",
SpaceID: "space1",
ParentID: "330b54b1-d7cc-4cd7-96bc-4d705b5f37bf",
ParentTable: "block",
Title: "Nested",
Alive: true,
Source: "test",
}}, 10, nil, nil, map[string]string{"space1": "Comet.com"}, nil, nil)
if rows[0].ParentID != "Workspace: Comet.com" {
t.Fatalf("workspace fallback parent = %q", rows[0].ParentID)
}
}
func TestTUIRowsHideNoisyNotionBlockParentLabels(t *testing.T) {
rows := pageTUIRows([]store.Page{{
ID: "page1",
SpaceID: "space1",
ParentID: "block1",
ParentTable: "block",
Title: "Child",
Alive: true,
Source: "test",
}}, 10, map[string]string{
"block1": "ce 2fd71240-10a3-80a0-a65a-007aec07c0d9 00b8cbcf-c520-4790-999a-9c2940263721 Pods",
}, nil, map[string]string{"space1": "Comet.com"}, nil, nil)
if len(rows) != 1 {
t.Fatalf("rows = %#v", rows)
}
if rows[0].ParentID != "Workspace: Comet.com" {
t.Fatalf("noisy parent label = %q", rows[0].ParentID)
}
}
func TestTUIRowsResolveBlockParentToOwningPage(t *testing.T) {
rows := pageTUIRows([]store.Page{{
ID: "page1",
SpaceID: "space1",
ParentID: "block-child",
ParentTable: "block",
Title: "Nested",
Alive: true,
Source: "test",
}}, 10, map[string]string{
"parent-page": "Customer Folder",
}, nil, map[string]string{"space1": "Comet.com"}, map[string]store.ParentRef{
"block-child": {ID: "block-parent", Table: "block"},
"block-parent": {ID: "parent-page", Table: "page"},
}, nil)
if len(rows) != 1 {
t.Fatalf("rows = %#v", rows)
}
if rows[0].ParentID != "Customer Folder" {
t.Fatalf("resolved parent label = %q", rows[0].ParentID)
}
}
func TestBlockPreviewKeepsNotionPageShape(t *testing.T) {
blocks := []store.Block{
{Type: "heading_1", Text: "Launch Plan"},
{Type: "bulleted_list", Text: "ship tui"},
{Type: "to_do", Text: "verify local binary"},
{Type: "numbered_list", Text: "open terminal"},
{Type: "quote", Text: "keep it readable"},
{Type: "code", Text: "notcrawl tui"},
}
got := blockPreview(blocks, tuiPagePreviewMax)
for _, want := range []string{"# Launch Plan", "- ship tui", "- [ ] verify local binary", "1. open terminal", "> keep it readable", " notcrawl tui"} {
if !strings.Contains(got, want) {
t.Fatalf("preview missing %q:\n%s", want, got)
}
}
}
func TestBlockPreviewCleansLegacyNotionMarkers(t *testing.T) {
got := blockPreview([]store.Block{
{Type: "paragraph", Text: "Option A: b"},
{Type: "paragraph", Text: "Marketing Customer Reference Rights a https://example.com/sheet"},
}, tuiPagePreviewMax)
if strings.Contains(got, " a https://") || strings.Contains(got, ": b") {
t.Fatalf("preview leaked legacy markers:\n%s", got)
}
for _, want := range []string{"Option A:", "Marketing Customer Reference Rights <https://example.com/sheet>"} {
if !strings.Contains(got, want) {
t.Fatalf("preview missing %q:\n%s", want, got)
}
}
}
func TestBlockPreviewCompactsRepeatedLinkedPages(t *testing.T) {
got := blockPreview([]store.Block{{
Type: "paragraph",
Text: "linked page, linked page, linked page Add details",
}}, tuiPagePreviewMax)
if got != "linked pages Add details" {
t.Fatalf("got %q", got)
}
}
func TestPagePreviewIncludesComments(t *testing.T) {
got := pagePreview(
[]store.Block{{Type: "paragraph", Text: "status update"}},
[]store.Comment{{Text: "looks good"}, {Text: "ship it"}},
tuiPagePreviewMax,
)
for _, want := range []string{"status update", "## Comments", "- looks good", "- ship it"} {
if !strings.Contains(got, want) {
t.Fatalf("page preview missing %q:\n%s", want, got)
}
}
}
func TestHelpMentionsTUI(t *testing.T) {
var stdout bytes.Buffer
if err := run(context.Background(), []string{"--help"}, &stdout, &bytes.Buffer{}); err != nil {
t.Fatal(err)
}
if !strings.Contains(stdout.String(), "tui") {
t.Fatalf("help missing tui command:\n%s", stdout.String())
}
}
func TestHelpAfterGlobalFlagsHasNoSideEffects(t *testing.T) {
dir := t.TempDir()
configPath := filepath.Join(dir, "config.toml")
var stdout, stderr bytes.Buffer
err := run(context.Background(), []string{"--config", configPath, "--db", filepath.Join(dir, "notcrawl.db"), "--help"}, &stdout, &stderr)
if err != nil {
t.Fatal(err)
}
if !strings.Contains(stdout.String(), "Usage of notcrawl:") || !strings.Contains(stdout.String(), "tui") {
t.Fatalf("help missing usage:\n%s", stdout.String())
}
if stderr.String() != "" {
t.Fatalf("unexpected stderr:\n%s", stderr.String())
}
if _, err := os.Stat(configPath); !errors.Is(err, os.ErrNotExist) {
t.Fatalf("help should not write config, stat err=%v", err)
}
}
func TestInitHelpDoesNotWriteConfig(t *testing.T) {
dir := t.TempDir()
configPath := filepath.Join(dir, "config.toml")
var stdout, stderr bytes.Buffer
err := run(context.Background(), []string{"--config", configPath, "init", "--help"}, &stdout, &stderr)
if err != nil {
t.Fatal(err)
}
if !strings.Contains(stdout.String(), "Usage of init:") {
t.Fatalf("init help missing usage:\n%s", stdout.String())
}
if stderr.String() != "" {
t.Fatalf("unexpected stderr:\n%s", stderr.String())
}
if _, err := os.Stat(configPath); !errors.Is(err, os.ErrNotExist) {
t.Fatalf("init --help should not write config, stat err=%v", err)
}
}
func TestVersionFlagWorksWithOtherGlobalFlags(t *testing.T) {
var stdout bytes.Buffer
err := run(context.Background(), []string{"--config", filepath.Join(t.TempDir(), "missing.toml"), "--version"}, &stdout, &bytes.Buffer{})
if err != nil {
t.Fatal(err)
}
if got := strings.TrimSpace(stdout.String()); got != version {
t.Fatalf("version = %q", got)
}
}
func TestMetadataDoesNotMarkPlainTextCommandsAsJSON(t *testing.T) {
var stdout bytes.Buffer
if err := run(context.Background(), []string{"metadata"}, &stdout, &bytes.Buffer{}); err != nil {
t.Fatal(err)
}
var manifest struct {
Commands map[string]struct {
JSON bool `json:"json"`
} `json:"commands"`
}
if err := json.Unmarshal(stdout.Bytes(), &manifest); err != nil {
t.Fatalf("invalid metadata JSON: %v\n%s", err, stdout.String())
}
for _, name := range []string{"sync", "tap", "publish", "subscribe", "update"} {
if manifest.Commands[name].JSON {
t.Fatalf("%s should not be advertised as JSON", name)
}
}
for _, name := range []string{"status", "doctor", "tui-json"} {
if !manifest.Commands[name].JSON {
t.Fatalf("%s should be advertised as JSON", name)
}
}
}
func TestSyncEmitsProgressPercentToStderr(t *testing.T) {
dir := t.TempDir()
var stdout, stderr bytes.Buffer
err := run(context.Background(), []string{
"--config", filepath.Join(dir, "missing.toml"),
"--db", filepath.Join(dir, "notcrawl.db"),
"sync", "--source", "desktop",
}, &stdout, &stderr)
if err != nil {
t.Fatalf("sync failed: %v\nstdout:\n%s\nstderr:\n%s", err, stdout.String(), stderr.String())
}
logs := stderr.String()
for _, want := range []string{`msg="sync progress"`, `state=finished`, `percent=100.0`, `completion=100.0%`, `phase=desktop`} {
if !strings.Contains(logs, want) {
t.Fatalf("missing %q in progress logs:\n%s", want, logs)
}
}
}
func TestTUIHelpReturnsUsage(t *testing.T) {
var stdout bytes.Buffer
var stderr bytes.Buffer
if err := run(context.Background(), []string{"tui", "--help"}, &stdout, &stderr); err != nil {
t.Fatal(err)
}
if !strings.Contains(stdout.String(), "Usage of tui:") || !strings.Contains(stdout.String(), "-limit") || !strings.Contains(stdout.String(), "right-click") || !strings.Contains(stdout.String(), "# jump") {
t.Fatalf("tui help missing usage:\n%s", stdout.String())
}
if stderr.String() != "" {
t.Fatalf("unexpected stderr:\n%s", stderr.String())
}
}
func TestExportDatabaseAllWritesFilesAndIndex(t *testing.T) {
ctx := context.Background()
dir := t.TempDir()
dbPath := filepath.Join(dir, "notcrawl.db")
st, err := store.Open(dbPath)
if err != nil {
t.Fatal(err)
}
now := store.NowMS()
for _, collection := range []store.Collection{
{ID: "db1", Name: "Roadmap", Source: "test", SyncedAt: now, SchemaJSON: `{"Name":{"type":"title"}}`},
{ID: "db2", Name: "Launch 🚀 Plan ✅", Source: "test", SyncedAt: now, SchemaJSON: `{"Task":{"type":"title"}}`},
} {
if err := st.UpsertCollection(ctx, collection); err != nil {
t.Fatal(err)
}
}
if err := st.UpsertPage(ctx, store.Page{
ID: "page1", CollectionID: "db1", Title: "Ship", URL: "https://example.com/ship", Alive: true, Source: "test", SyncedAt: now,
PropertiesJSON: `{"Name":{"type":"title","title":[{"plain_text":"Ship"}]}}`,
}); err != nil {
t.Fatal(err)
}
if err := st.Close(); err != nil {
t.Fatal(err)
}
outDir := filepath.Join(dir, "csv")
var stdout, stderr bytes.Buffer
err = run(ctx, []string{"--config", filepath.Join(dir, "missing.toml"), "--db", dbPath, "export-db", "--all", "--dir", outDir}, &stdout, &stderr)
if err != nil {
t.Fatalf("export-db --all failed: %v\nstderr:\n%s", err, stderr.String())
}
if got := stdout.String(); !strings.Contains(got, "exported 2 databases and 1 rows") {
t.Fatalf("unexpected stdout: %s", got)
}
for _, name := range []string{"roadmap-db1.csv", "launch-plan-db2.csv", "index.tsv"} {
if _, err := os.Stat(filepath.Join(outDir, name)); err != nil {
t.Fatalf("missing %s: %v", name, err)
}
}
index, err := os.ReadFile(filepath.Join(outDir, "index.tsv"))
if err != nil {
t.Fatal(err)
}
for _, want := range []string{"id\tname\tsource\trows\tcolumns\tfile", "db1\tRoadmap\ttest\t1\t4\troadmap-db1.csv"} {
if !strings.Contains(string(index), want) {
t.Fatalf("index missing %q:\n%s", want, index)
}
}
}
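
The metadata test in this file decodes the `metadata` output into a `commands` map and checks each command's `json` flag. A minimal sketch of that decoding step follows, assuming only the manifest shape the test itself declares; the sample payload and the `jsonCommands` helper are hypothetical and not taken from the repository.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// manifest mirrors the anonymous struct used in the test: a map of
// command name to a flag saying whether the command can emit JSON.
type manifest struct {
	Commands map[string]struct {
		JSON bool `json:"json"`
	} `json:"commands"`
}

// jsonCommands (hypothetical helper) returns the sorted names of all
// commands whose "json" flag is true.
func jsonCommands(raw []byte) ([]string, error) {
	var m manifest
	if err := json.Unmarshal(raw, &m); err != nil {
		return nil, fmt.Errorf("invalid metadata JSON: %w", err)
	}
	var names []string
	for name, cmd := range m.Commands {
		if cmd.JSON {
			names = append(names, name)
		}
	}
	sort.Strings(names)
	return names, nil
}

func main() {
	// Sample payload invented for illustration only.
	sample := []byte(`{"commands":{"sync":{"json":false},"status":{"json":true},"doctor":{"json":true}}}`)
	names, err := jsonCommands(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(names)
}
```

Sorting the names keeps the result stable, since Go map iteration order is randomized.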


@ -10,19 +10,6 @@ go test ./...
go build ./cmd/notcrawl
```
Also smoke the crawlkit control and non-interactive TUI surfaces before a tag:
```bash
notcrawl metadata --json
notcrawl status --json
notcrawl doctor --json
notcrawl tui --json --limit 10
```
The CI workflow runs the same control-surface smoke checks, plus dependency
verification, `gofmt`, `go vet`, tests, a GoReleaser snapshot build, and
CodeQL.
If GoReleaser is installed:
```bash

34
go.mod

@ -1,32 +1,10 @@
module github.com/vincentkoc/notcrawl
go 1.26.2
require modernc.org/sqlite v1.50.0
go 1.26.0
require (
github.com/aymanbagabas/go-osc52/v2 v2.0.1 // indirect
github.com/charmbracelet/bubbles v1.0.0 // indirect
github.com/charmbracelet/bubbletea v1.3.10 // indirect
github.com/charmbracelet/colorprofile v0.4.1 // indirect
github.com/charmbracelet/lipgloss v1.1.0 // indirect
github.com/charmbracelet/x/ansi v0.11.6 // indirect
github.com/charmbracelet/x/cellbuf v0.0.15 // indirect
github.com/charmbracelet/x/term v0.2.2 // indirect
github.com/clipperhouse/displaywidth v0.9.0 // indirect
github.com/clipperhouse/stringish v0.1.1 // indirect
github.com/clipperhouse/uax29/v2 v2.5.0 // indirect
github.com/erikgeiser/coninput v0.0.0-20211004153227-1c3628e74d0f // indirect
github.com/lucasb-eyer/go-colorful v1.3.0 // indirect
github.com/mattn/go-localereader v0.0.1 // indirect
github.com/mattn/go-runewidth v0.0.19 // indirect
github.com/muesli/ansi v0.0.0-20230316100256-276c6243b2f6 // indirect
github.com/muesli/cancelreader v0.2.2 // indirect
github.com/muesli/termenv v0.16.0 // indirect
github.com/pelletier/go-toml/v2 v2.3.0 // indirect
github.com/rivo/uniseg v0.4.7 // indirect
github.com/xo/terminfo v0.0.0-20220910002029-abceb7e1c41e // indirect
golang.org/x/text v0.3.8 // indirect
github.com/pelletier/go-toml/v2 v2.2.4
modernc.org/sqlite v1.46.1
)
require (
@ -35,9 +13,9 @@ require (
github.com/mattn/go-isatty v0.0.20 // indirect
github.com/ncruces/go-strftime v1.0.0 // indirect
github.com/remyoudompheng/bigfft v0.0.0-20230129092748-24d4a6f8daec // indirect
github.com/vincentkoc/crawlkit v0.4.1
golang.org/x/sys v0.42.0 // indirect
modernc.org/libc v1.72.0 // indirect
golang.org/x/exp v0.0.0-20251023183803-a4bb9ffd2546 // indirect
golang.org/x/sys v0.37.0 // indirect
modernc.org/libc v1.67.6 // indirect
modernc.org/mathutil v1.7.1 // indirect
modernc.org/memory v1.11.0 // indirect
)

93
go.sum

@ -1,89 +1,44 @@
github.com/aymanbagabas/go-osc52/v2 v2.0.1 h1:HwpRHbFMcZLEVr42D4p7XBqjyuxQH5SMiErDT4WkJ2k=
github.com/aymanbagabas/go-osc52/v2 v2.0.1/go.mod h1:uYgXzlJ7ZpABp8OJ+exZzJJhRNQ2ASbcXHWsFqH8hp8=
github.com/charmbracelet/bubbles v1.0.0 h1:12J8/ak/uCZEMQ6KU7pcfwceyjLlWsDLAxB5fXonfvc=
github.com/charmbracelet/bubbles v1.0.0/go.mod h1:9d/Zd5GdnauMI5ivUIVisuEm3ave1XwXtD1ckyV6r3E=
github.com/charmbracelet/bubbletea v1.3.10 h1:otUDHWMMzQSB0Pkc87rm691KZ3SWa4KUlvF9nRvCICw=
github.com/charmbracelet/bubbletea v1.3.10/go.mod h1:ORQfo0fk8U+po9VaNvnV95UPWA1BitP1E0N6xJPlHr4=
github.com/charmbracelet/colorprofile v0.4.1 h1:a1lO03qTrSIRaK8c3JRxJDZOvhvIeSco3ej+ngLk1kk=
github.com/charmbracelet/colorprofile v0.4.1/go.mod h1:U1d9Dljmdf9DLegaJ0nGZNJvoXAhayhmidOdcBwAvKk=
github.com/charmbracelet/lipgloss v1.1.0 h1:vYXsiLHVkK7fp74RkV7b2kq9+zDLoEU4MZoFqR/noCY=
github.com/charmbracelet/lipgloss v1.1.0/go.mod h1:/6Q8FR2o+kj8rz4Dq0zQc3vYf7X+B0binUUBwA0aL30=
github.com/charmbracelet/x/ansi v0.11.6 h1:GhV21SiDz/45W9AnV2R61xZMRri5NlLnl6CVF7ihZW8=
github.com/charmbracelet/x/ansi v0.11.6/go.mod h1:2JNYLgQUsyqaiLovhU2Rv/pb8r6ydXKS3NIttu3VGZQ=
github.com/charmbracelet/x/cellbuf v0.0.15 h1:ur3pZy0o6z/R7EylET877CBxaiE1Sp1GMxoFPAIztPI=
github.com/charmbracelet/x/cellbuf v0.0.15/go.mod h1:J1YVbR7MUuEGIFPCaaZ96KDl5NoS0DAWkskup+mOY+Q=
github.com/charmbracelet/x/term v0.2.2 h1:xVRT/S2ZcKdhhOuSP4t5cLi5o+JxklsoEObBSgfgZRk=
github.com/charmbracelet/x/term v0.2.2/go.mod h1:kF8CY5RddLWrsgVwpw4kAa6TESp6EB5y3uxGLeCqzAI=
github.com/clipperhouse/displaywidth v0.9.0 h1:Qb4KOhYwRiN3viMv1v/3cTBlz3AcAZX3+y9OLhMtAtA=
github.com/clipperhouse/displaywidth v0.9.0/go.mod h1:aCAAqTlh4GIVkhQnJpbL0T/WfcrJXHcj8C0yjYcjOZA=
github.com/clipperhouse/stringish v0.1.1 h1:+NSqMOr3GR6k1FdRhhnXrLfztGzuG+VuFDfatpWHKCs=
github.com/clipperhouse/stringish v0.1.1/go.mod h1:v/WhFtE1q0ovMta2+m+UbpZ+2/HEXNWYXQgCt4hdOzA=
github.com/clipperhouse/uax29/v2 v2.5.0 h1:x7T0T4eTHDONxFJsL94uKNKPHrclyFI0lm7+w94cO8U=
github.com/clipperhouse/uax29/v2 v2.5.0/go.mod h1:Wn1g7MK6OoeDT0vL+Q0SQLDz/KpfsVRgg6W7ihQeh4g=
github.com/dustin/go-humanize v1.0.1 h1:GzkhY7T5VNhEkwH0PVJgjz+fX1rhBrR7pRT3mDkpeCY=
github.com/dustin/go-humanize v1.0.1/go.mod h1:Mu1zIs6XwVuF/gI1OepvI0qD18qycQx+mFykh5fBlto=
github.com/erikgeiser/coninput v0.0.0-20211004153227-1c3628e74d0f h1:Y/CXytFA4m6baUTXGLOoWe4PQhGxaX0KpnayAqC48p4=
github.com/erikgeiser/coninput v0.0.0-20211004153227-1c3628e74d0f/go.mod h1:vw97MGsxSvLiUE2X8qFplwetxpGLQrlU1Q9AUEIzCaM=
github.com/google/pprof v0.0.0-20250317173921-a4b03ec1a45e h1:ijClszYn+mADRFY17kjQEVQ1XRhq2/JR1M3sGqeJoxs=
github.com/google/pprof v0.0.0-20250317173921-a4b03ec1a45e/go.mod h1:boTsfXsheKC2y+lKOCMpSfarhxDeIzfZG1jqGcPl3cA=
github.com/google/uuid v1.6.0 h1:NIvaJDMOsjHA8n1jAhLSgzrAzy1Hgr+hNrb57e+94F0=
github.com/google/uuid v1.6.0/go.mod h1:TIyPZe4MgqvfeYDBFedMoGGpEw/LqOeaOT+nhxU+yHo=
github.com/hashicorp/golang-lru/v2 v2.0.7 h1:a+bsQ5rvGLjzHuww6tVxozPZFVghXaHOwFs4luLUK2k=
github.com/hashicorp/golang-lru/v2 v2.0.7/go.mod h1:QeFd9opnmA6QUJc5vARoKUSoFhyfM2/ZepoAG6RGpeM=
github.com/lucasb-eyer/go-colorful v1.3.0 h1:2/yBRLdWBZKrf7gB40FoiKfAWYQ0lqNcbuQwVHXptag=
github.com/lucasb-eyer/go-colorful v1.3.0/go.mod h1:R4dSotOR9KMtayYi1e77YzuveK+i7ruzyGqttikkLy0=
github.com/mattn/go-isatty v0.0.20 h1:xfD0iDuEKnDkl03q4limB+vH+GxLEtL/jb4xVJSWWEY=
github.com/mattn/go-isatty v0.0.20/go.mod h1:W+V8PltTTMOvKvAeJH7IuucS94S2C6jfK/D7dTCTo3Y=
github.com/mattn/go-localereader v0.0.1 h1:ygSAOl7ZXTx4RdPYinUpg6W99U8jWvWi9Ye2JC/oIi4=
github.com/mattn/go-localereader v0.0.1/go.mod h1:8fBrzywKY7BI3czFoHkuzRoWE9C+EiG4R1k4Cjx5p88=
github.com/mattn/go-runewidth v0.0.19 h1:v++JhqYnZuu5jSKrk9RbgF5v4CGUjqRfBm05byFGLdw=
github.com/mattn/go-runewidth v0.0.19/go.mod h1:XBkDxAl56ILZc9knddidhrOlY5R/pDhgLpndooCuJAs=
github.com/muesli/ansi v0.0.0-20230316100256-276c6243b2f6 h1:ZK8zHtRHOkbHy6Mmr5D264iyp3TiX5OmNcI5cIARiQI=
github.com/muesli/ansi v0.0.0-20230316100256-276c6243b2f6/go.mod h1:CJlz5H+gyd6CUWT45Oy4q24RdLyn7Md9Vj2/ldJBSIo=
github.com/muesli/cancelreader v0.2.2 h1:3I4Kt4BQjOR54NavqnDogx/MIoWBFa0StPA8ELUXHmA=
github.com/muesli/cancelreader v0.2.2/go.mod h1:3XuTXfFS2VjM+HTLZY9Ak0l6eUKfijIfMUZ4EgX0QYo=
github.com/muesli/termenv v0.16.0 h1:S5AlUN9dENB57rsbnkPyfdGuWIlkmzJjbFf0Tf5FWUc=
github.com/muesli/termenv v0.16.0/go.mod h1:ZRfOIKPFDYQoDFF4Olj7/QJbW60Ol/kL1pU3VfY/Cnk=
github.com/ncruces/go-strftime v1.0.0 h1:HMFp8mLCTPp341M/ZnA4qaf7ZlsbTc+miZjCLOFAw7w=
github.com/ncruces/go-strftime v1.0.0/go.mod h1:Fwc5htZGVVkseilnfgOVb9mKy6w1naJmn9CehxcKcls=
github.com/pelletier/go-toml/v2 v2.3.0 h1:k59bC/lIZREW0/iVaQR8nDHxVq8OVlIzYCOJf421CaM=
github.com/pelletier/go-toml/v2 v2.3.0/go.mod h1:2gIqNv+qfxSVS7cM2xJQKtLSTLUE9V8t9Stt+h56mCY=
github.com/pelletier/go-toml/v2 v2.2.4 h1:mye9XuhQ6gvn5h28+VilKrrPoQVanw5PMw/TB0t5Ec4=
github.com/pelletier/go-toml/v2 v2.2.4/go.mod h1:2gIqNv+qfxSVS7cM2xJQKtLSTLUE9V8t9Stt+h56mCY=
github.com/remyoudompheng/bigfft v0.0.0-20230129092748-24d4a6f8daec h1:W09IVJc94icq4NjY3clb7Lk8O1qJ8BdBEF8z0ibU0rE=
github.com/remyoudompheng/bigfft v0.0.0-20230129092748-24d4a6f8daec/go.mod h1:qqbHyh8v60DhA7CoWK5oRCqLrMHRGoxYCSS9EjAz6Eo=
github.com/rivo/uniseg v0.4.7 h1:WUdvkW8uEhrYfLC4ZzdpI2ztxP1I582+49Oc5Mq64VQ=
github.com/rivo/uniseg v0.4.7/go.mod h1:FN3SvrM+Zdj16jyLfmOkMNblXMcoc8DfTHruCPUcx88=
github.com/vincentkoc/crawlkit v0.4.1 h1:qDUF+Kk7nqADmpGMcnWTHEQMiX3bSD2DdFywKyT3kWs=
github.com/vincentkoc/crawlkit v0.4.1/go.mod h1:/ioLA/tyZ/927kAOGg0M8Mrqk7pnTZLpCKWfpul9zoE=
github.com/xo/terminfo v0.0.0-20220910002029-abceb7e1c41e h1:JVG44RsyaB9T2KIHavMF/ppJZNG9ZpyihvCd0w101no=
github.com/xo/terminfo v0.0.0-20220910002029-abceb7e1c41e/go.mod h1:RbqR21r5mrJuqunuUZ/Dhy/avygyECGrLceyNeo4LiM=
golang.org/x/exp v0.0.0-20231006140011-7918f672742d h1:jtJma62tbqLibJ5sFQz8bKtEM8rJBtfilJ2qTU199MI=
golang.org/x/exp v0.0.0-20231006140011-7918f672742d/go.mod h1:ldy0pHrwJyGW56pPQzzkH36rKxoZW1tw7ZJpeKx+hdo=
golang.org/x/mod v0.33.0 h1:tHFzIWbBifEmbwtGz65eaWyGiGZatSrT9prnU8DbVL8=
golang.org/x/mod v0.33.0/go.mod h1:swjeQEj+6r7fODbD2cqrnje9PnziFuw4bmLbBZFrQ5w=
golang.org/x/sync v0.20.0 h1:e0PTpb7pjO8GAtTs2dQ6jYa5BWYlMuX047Dco/pItO4=
golang.org/x/sync v0.20.0/go.mod h1:9xrNwdLfx4jkKbNva9FpL6vEN7evnE43NNNJQ2LF3+0=
golang.org/x/sys v0.0.0-20210809222454-d867a43fc93e/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/exp v0.0.0-20251023183803-a4bb9ffd2546 h1:mgKeJMpvi0yx/sU5GsxQ7p6s2wtOnGAHZWCHUM4KGzY=
golang.org/x/exp v0.0.0-20251023183803-a4bb9ffd2546/go.mod h1:j/pmGrbnkbPtQfxEe5D0VQhZC6qKbfKifgD0oM7sR70=
golang.org/x/mod v0.29.0 h1:HV8lRxZC4l2cr3Zq1LvtOsi/ThTgWnUk/y64QSs8GwA=
golang.org/x/mod v0.29.0/go.mod h1:NyhrlYXJ2H4eJiRy/WDBO6HMqZQ6q9nk4JzS3NuCK+w=
golang.org/x/sync v0.17.0 h1:l60nONMj9l5drqw6jlhIELNv9I0A4OFgRsG9k2oT9Ug=
golang.org/x/sync v0.17.0/go.mod h1:9KTHXmSnoGruLpwFjVSX0lNNA75CykiMECbovNTZqGI=
golang.org/x/sys v0.6.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.42.0 h1:omrd2nAlyT5ESRdCLYdm3+fMfNFE/+Rf4bDIQImRJeo=
golang.org/x/sys v0.42.0/go.mod h1:4GL1E5IUh+htKOUEOaiffhrAeqysfVGipDYzABqnCmw=
golang.org/x/text v0.3.8 h1:nAL+RVCQ9uMn3vJZbV+MRnydTJFPf8qqY42YiA6MrqY=
golang.org/x/text v0.3.8/go.mod h1:E6s5w1FMmriuDzIBO73fBruAKo1PCIq6d2Q6DHfQ8WQ=
golang.org/x/tools v0.42.0 h1:uNgphsn75Tdz5Ji2q36v/nsFSfR/9BRFvqhGBaJGd5k=
golang.org/x/tools v0.42.0/go.mod h1:Ma6lCIwGZvHK6XtgbswSoWroEkhugApmsXyrUmBhfr0=
modernc.org/cc/v4 v4.27.3 h1:uNCgn37E5U09mTv1XgskEVUJ8ADKpmFMPxzGJ0TSo+U=
modernc.org/cc/v4 v4.27.3/go.mod h1:3YjcbCqhoTTHPycJDRl2WZKKFj0nwcOIPBfEZK0Hdk8=
modernc.org/ccgo/v4 v4.32.4 h1:L5OB8rpEX4ZsXEQwGozRfJyJSFHbbNVOoQ59DU9/KuU=
modernc.org/ccgo/v4 v4.32.4/go.mod h1:lY7f+fiTDHfcv6YlRgSkxYfhs+UvOEEzj49jAn2TOx0=
modernc.org/fileutil v1.4.0 h1:j6ZzNTftVS054gi281TyLjHPp6CPHr2KCxEXjEbD6SM=
modernc.org/fileutil v1.4.0/go.mod h1:EqdKFDxiByqxLk8ozOxObDSfcVOv/54xDs/DUHdvCUU=
golang.org/x/sys v0.37.0 h1:fdNQudmxPjkdUTPnLn5mdQv7Zwvbvpaxqs831goi9kQ=
golang.org/x/sys v0.37.0/go.mod h1:OgkHotnGiDImocRcuBABYBEXf8A9a87e/uXjp9XT3ks=
golang.org/x/tools v0.38.0 h1:Hx2Xv8hISq8Lm16jvBZ2VQf+RLmbd7wVUsALibYI/IQ=
golang.org/x/tools v0.38.0/go.mod h1:yEsQ/d/YK8cjh0L6rZlY8tgtlKiBNTL14pGDJPJpYQs=
modernc.org/cc/v4 v4.27.1 h1:9W30zRlYrefrDV2JE2O8VDtJ1yPGownxciz5rrbQZis=
modernc.org/cc/v4 v4.27.1/go.mod h1:uVtb5OGqUKpoLWhqwNQo/8LwvoiEBLvZXIQ/SmO6mL0=
modernc.org/ccgo/v4 v4.30.1 h1:4r4U1J6Fhj98NKfSjnPUN7Ze2c6MnAdL0hWw6+LrJpc=
modernc.org/ccgo/v4 v4.30.1/go.mod h1:bIOeI1JL54Utlxn+LwrFyjCx2n2RDiYEaJVSrgdrRfM=
modernc.org/fileutil v1.3.40 h1:ZGMswMNc9JOCrcrakF1HrvmergNLAmxOPjizirpfqBA=
modernc.org/fileutil v1.3.40/go.mod h1:HxmghZSZVAz/LXcMNwZPA/DRrQZEVP9VX0V4LQGQFOc=
modernc.org/gc/v2 v2.6.5 h1:nyqdV8q46KvTpZlsw66kWqwXRHdjIlJOhG6kxiV/9xI=
modernc.org/gc/v2 v2.6.5/go.mod h1:YgIahr1ypgfe7chRuJi2gD7DBQiKSLMPgBQe9oIiito=
modernc.org/gc/v3 v3.1.2 h1:ZtDCnhonXSZexk/AYsegNRV1lJGgaNZJuKjJSWKyEqo=
modernc.org/gc/v3 v3.1.2/go.mod h1:HFK/6AGESC7Ex+EZJhJ2Gni6cTaYpSMmU/cT9RmlfYY=
modernc.org/gc/v3 v3.1.1 h1:k8T3gkXWY9sEiytKhcgyiZ2L0DTyCQ/nvX+LoCljoRE=
modernc.org/gc/v3 v3.1.1/go.mod h1:HFK/6AGESC7Ex+EZJhJ2Gni6cTaYpSMmU/cT9RmlfYY=
modernc.org/goabi0 v0.2.0 h1:HvEowk7LxcPd0eq6mVOAEMai46V+i7Jrj13t4AzuNks=
modernc.org/goabi0 v0.2.0/go.mod h1:CEFRnnJhKvWT1c1JTI3Avm+tgOWbkOu5oPA8eH8LnMI=
modernc.org/libc v1.72.0 h1:IEu559v9a0XWjw0DPoVKtXpO2qt5NVLAnFaBbjq+n8c=
modernc.org/libc v1.72.0/go.mod h1:tTU8DL8A+XLVkEY3x5E/tO7s2Q/q42EtnNWda/L5QhQ=
modernc.org/libc v1.67.6 h1:eVOQvpModVLKOdT+LvBPjdQqfrZq+pC39BygcT+E7OI=
modernc.org/libc v1.67.6/go.mod h1:JAhxUVlolfYDErnwiqaLvUqc8nfb2r6S6slAgZOnaiE=
modernc.org/mathutil v1.7.1 h1:GCZVGXdaN8gTqB1Mf/usp1Y/hSqgI2vAGGP4jZMCxOU=
modernc.org/mathutil v1.7.1/go.mod h1:4p5IwJITfppl0G4sUEDtCr4DthTaT47/N3aT6MhfgJg=
modernc.org/memory v1.11.0 h1:o4QC8aMQzmcwCK3t3Ux/ZHmwFPzE6hf2Y5LbkRs+hbI=
@ -92,8 +47,8 @@ modernc.org/opt v0.1.4 h1:2kNGMRiUjrp4LcaPuLY2PzUfqM/w9N23quVwhKt5Qm8=
modernc.org/opt v0.1.4/go.mod h1:03fq9lsNfvkYSfxrfUhZCWPk1lm4cq4N+Bh//bEtgns=
modernc.org/sortutil v1.2.1 h1:+xyoGf15mM3NMlPDnFqrteY07klSFxLElE2PVuWIJ7w=
modernc.org/sortutil v1.2.1/go.mod h1:7ZI3a3REbai7gzCLcotuw9AC4VZVpYMjDzETGsSMqJE=
modernc.org/sqlite v1.50.0 h1:eMowQSWLK0MeiQTdmz3lqoF5dqclujdlIKeJA11+7oM=
modernc.org/sqlite v1.50.0/go.mod h1:m0w8xhwYUVY3H6pSDwc3gkJ/irZT/0YEXwBlhaxQEew=
modernc.org/sqlite v1.46.1 h1:eFJ2ShBLIEnUWlLy12raN0Z1plqmFX9Qe3rjQTKt6sU=
modernc.org/sqlite v1.46.1/go.mod h1:CzbrU2lSB1DKUusvwGz7rqEKIq+NUd8GWuBBZDs9/nA=
modernc.org/strutil v1.2.1 h1:UneZBkQA+DX2Rp35KcM69cSsNES9ly8mQWD71HKlOA0=
modernc.org/strutil v1.2.1/go.mod h1:EHkiggD70koQxjVdSBM3JKM7k6L0FbGE5eymy9i3B9A=
modernc.org/token v1.1.0 h1:Xl7Ap9dKaEs5kLoOQeQmPWevfnk/DM5qcLcYlA8ys6Y=


@ -8,7 +8,7 @@ import (
"strings"
"time"
crawlconfig "github.com/vincentkoc/crawlkit/config"
"github.com/pelletier/go-toml/v2"
)
const (
@ -49,22 +49,12 @@ type ShareConfig struct {
StaleAfter string `toml:"stale_after"`
}
var appConfig = crawlconfig.App{Name: "notcrawl", BaseDir: "~/" + defaultDirName, LegacyBaseDir: "~/" + defaultDirName}
func Default() Config {
paths, err := appConfig.DefaultPaths()
if err != nil {
base := filepath.ToSlash(filepath.Join("~", defaultDirName))
paths = crawlconfig.Paths{
DBPath: filepath.ToSlash(filepath.Join(base, "notcrawl.db")),
CacheDir: filepath.ToSlash(filepath.Join(base, "cache")),
ShareDir: filepath.ToSlash(filepath.Join(base, "share")),
}
}
base := filepath.ToSlash(filepath.Join("~", defaultDirName))
return Config{
DBPath: filepath.ToSlash(paths.DBPath),
CacheDir: filepath.ToSlash(paths.CacheDir),
MarkdownDir: filepath.ToSlash(filepath.Join(paths.BaseDir, "pages")),
DBPath: filepath.ToSlash(filepath.Join(base, "notcrawl.db")),
CacheDir: filepath.ToSlash(filepath.Join(base, "cache")),
MarkdownDir: filepath.ToSlash(filepath.Join(base, "pages")),
Notion: NotionConfig{
Desktop: DesktopConfig{Enabled: true, Path: ""},
API: APIConfig{
@ -76,15 +66,18 @@ func Default() Config {
},
Share: ShareConfig{
Branch: "main",
RepoPath: filepath.ToSlash(paths.ShareDir),
RepoPath: filepath.ToSlash(filepath.Join(base, "share")),
StaleAfter: "1h",
},
}
}
func DefaultPath() (string, error) {
paths, err := appConfig.DefaultPaths()
return paths.ConfigPath, err
home, err := os.UserHomeDir()
if err != nil {
return "", err
}
return filepath.Join(home, defaultDirName, "config.toml"), nil
}
func Load(path string) (Config, error) {
@ -100,7 +93,8 @@ func Load(path string) (Config, error) {
return Config{}, err
}
cfg := Default()
if err := crawlconfig.LoadTOML(path, &cfg); err != nil {
b, err := os.ReadFile(path)
if err != nil {
if errors.Is(err, os.ErrNotExist) {
if err := cfg.Resolve(); err != nil {
return Config{}, err
@ -109,6 +103,9 @@ func Load(path string) (Config, error) {
}
return Config{}, err
}
if err := toml.Unmarshal(b, &cfg); err != nil {
return Config{}, fmt.Errorf("parse config: %w", err)
}
if err := cfg.Resolve(); err != nil {
return Config{}, err
}
@ -136,7 +133,11 @@ func WriteStarter(path string) (string, error) {
return "", err
}
cfg := Default()
return path, crawlconfig.WriteTOML(path, cfg, 0o600)
b, err := toml.Marshal(cfg)
if err != nil {
return "", err
}
return path, os.WriteFile(path, b, 0o600)
}
func (c *Config) Resolve() error {
@ -176,7 +177,17 @@ func ExpandPath(path string) (string, error) {
if path == "" {
return "", nil
}
return filepath.Abs(crawlconfig.ExpandHome(path))
if path == "~" || strings.HasPrefix(path, "~/") {
home, err := os.UserHomeDir()
if err != nil {
return "", err
}
if path == "~" {
return home, nil
}
return filepath.Join(home, path[2:]), nil
}
return filepath.Abs(path)
}
func (c Config) APIToken() string {
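
The inline variant of `ExpandPath` in this file handles `~` and `~/...` by hand. The sketch below isolates that home-expansion step, assuming the home directory is passed in so the behavior is easy to check; `expandHome` is a hypothetical name, and paths such as `~user/x` are deliberately left untouched, as in the diff.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// expandHome (hypothetical helper) replaces a leading "~" or "~/" with
// home. Any other path, including "~user/...", is returned unchanged.
func expandHome(path, home string) string {
	if path == "~" {
		return home
	}
	if strings.HasPrefix(path, "~/") {
		// Drop the "~/" prefix and join the remainder onto home.
		return filepath.Join(home, path[2:])
	}
	return path
}

func main() {
	// "/home/alice" is an invented home directory for illustration.
	fmt.Println(expandHome("~/.notcrawl/config.toml", "/home/alice"))
	fmt.Println(expandHome("relative/path", "/home/alice"))
}
```

In the diff, the non-home branch additionally goes through `filepath.Abs`; the sketch omits that so the result does not depend on the working directory.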


@ -79,7 +79,7 @@ func TestExporterUsesDisplayOrder(t *testing.T) {
}
}
func TestExporterRemovesEmojiFromPathNames(t *testing.T) {
func TestExporterPreservesUnicodePathNames(t *testing.T) {
ctx := context.Background()
st, err := store.Open(filepath.Join(t.TempDir(), "notcrawl.db"))
if err != nil {
@ -99,7 +99,7 @@ func TestExporterRemovesEmojiFromPathNames(t *testing.T) {
if err != nil {
t.Fatal(err)
}
want := filepath.Join(dir, "研究", "計画-q2-page1.md")
want := filepath.Join(dir, "研究-🚀", "計画-✅-q2-page1.md")
if len(s.Files) != 1 || s.Files[0] != want {
t.Fatalf("unexpected export path: %+v, want %s", s.Files, want)
}
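
The two expected paths in this test differ only in whether emoji survive slugging. A minimal sketch of two rune filters that would produce that difference follows, assuming the filter is applied rune by rune; `keepRunes` and both filter names are hypothetical, and separator handling (turning spaces into `-`) is left out.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// strictSlugRune keeps only letters and digits, so emoji are dropped.
func strictSlugRune(r rune) bool {
	return unicode.IsLetter(r) || unicode.IsNumber(r)
}

// permissiveSlugRune also keeps combining marks, non-ASCII symbols
// (which covers most emoji) and the zero-width joiner.
func permissiveSlugRune(r rune) bool {
	return unicode.IsLetter(r) || unicode.IsNumber(r) || unicode.IsMark(r) ||
		(r > unicode.MaxASCII && unicode.IsSymbol(r)) || r == '\u200d'
}

// keepRunes (hypothetical helper) returns s with every rune removed
// for which keep reports false.
func keepRunes(s string, keep func(rune) bool) string {
	var b strings.Builder
	for _, r := range s {
		if keep(r) {
			b.WriteRune(r)
		}
	}
	return b.String()
}

func main() {
	title := "Plan \u2705 Q2" // U+2705 is the check-mark emoji
	fmt.Println(keepRunes(title, strictSlugRune))
	fmt.Println(keepRunes(title, permissiveSlugRune))
}
```

The `r > unicode.MaxASCII` guard keeps ASCII symbols such as `+` or `$` out of the slug even in the permissive variant.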


@ -17,8 +17,6 @@ import (
const SourceName = "api"
const maxAPIAttempts = 4
type Client struct {
BaseURL string
Version string
@ -450,50 +448,56 @@ func (c Client) ingestComments(ctx context.Context, st *store.Store, pageID, spa
}
func (c Client) do(ctx context.Context, method, path string, body any, out any) error {
var bodyBytes []byte
var reader io.Reader
if body != nil {
b, err := json.Marshal(body)
if err != nil {
return err
}
bodyBytes = b
reader = bytes.NewReader(b)
}
for attempt := 1; attempt <= maxAPIAttempts; attempt++ {
var reader io.Reader
if bodyBytes != nil {
reader = bytes.NewReader(bodyBytes)
}
req, err := http.NewRequestWithContext(ctx, method, strings.TrimRight(c.BaseURL, "/")+path, reader)
if err != nil {
return err
}
req.Header.Set("Authorization", "Bearer "+c.Token)
req.Header.Set("Notion-Version", c.Version)
req.Header.Set("Accept", "application/json")
if body != nil {
req.Header.Set("Content-Type", "application/json")
}
resp, err := c.HTTP.Do(req)
if err != nil {
return err
}
if resp.StatusCode >= 200 && resp.StatusCode < 300 {
defer resp.Body.Close()
return json.NewDecoder(resp.Body).Decode(out)
}
b, _ := io.ReadAll(io.LimitReader(resp.Body, 4096))
resp.Body.Close()
apiErr := apiErrorFromResponse(method, path, resp, b)
if attempt < maxAPIAttempts && shouldRetry(apiErr) {
if err := waitBeforeRetry(ctx, apiErr.RetryAfter); err != nil {
return err
req, err := http.NewRequestWithContext(ctx, method, strings.TrimRight(c.BaseURL, "/")+path, reader)
if err != nil {
return err
}
req.Header.Set("Authorization", "Bearer "+c.Token)
req.Header.Set("Notion-Version", c.Version)
req.Header.Set("Accept", "application/json")
if body != nil {
req.Header.Set("Content-Type", "application/json")
}
resp, err := c.HTTP.Do(req)
if err != nil {
return err
}
defer resp.Body.Close()
if resp.StatusCode == http.StatusTooManyRequests {
if wait, err := time.ParseDuration(resp.Header.Get("Retry-After") + "s"); err == nil && wait > 0 {
timer := time.NewTimer(wait)
select {
case <-ctx.Done():
timer.Stop()
return ctx.Err()
case <-timer.C:
}
continue
return c.do(ctx, method, path, body, out)
}
}
if resp.StatusCode < 200 || resp.StatusCode >= 300 {
b, _ := io.ReadAll(io.LimitReader(resp.Body, 4096))
bodyText := strings.TrimSpace(string(b))
apiErr := notionAPIError{Method: method, Path: path, Status: resp.Status, StatusCode: resp.StatusCode, Body: bodyText}
var payload struct {
Code string `json:"code"`
Message string `json:"message"`
}
if err := json.Unmarshal(b, &payload); err == nil {
apiErr.Code = payload.Code
apiErr.Message = payload.Message
}
return apiErr
}
return nil
return json.NewDecoder(resp.Body).Decode(out)
}
type notionAPIError struct {
@ -504,8 +508,6 @@ type notionAPIError struct {
Code string
Message string
Body string
RetryAfter time.Duration
Retryable bool
}
func (e notionAPIError) Error() string {
@ -515,76 +517,6 @@ func (e notionAPIError) Error() string {
return fmt.Sprintf("notion api %s %s: %s: %s", e.Method, e.Path, e.Status, e.Body)
}
func apiErrorFromResponse(method, path string, resp *http.Response, body []byte) notionAPIError {
bodyText := strings.TrimSpace(string(body))
apiErr := notionAPIError{
Method: method,
Path: path,
Status: resp.Status,
StatusCode: resp.StatusCode,
Body: bodyText,
RetryAfter: retryAfter(resp.Header.Get("Retry-After"), body),
}
var payload struct {
Code string `json:"code"`
Message string `json:"message"`
Retryable bool `json:"retryable"`
RetryAfter float64 `json:"retry_after"`
}
if err := json.Unmarshal(body, &payload); err == nil {
apiErr.Code = payload.Code
apiErr.Message = payload.Message
apiErr.Retryable = payload.Retryable
if payload.RetryAfter > 0 && apiErr.RetryAfter == 0 {
apiErr.RetryAfter = time.Duration(payload.RetryAfter * float64(time.Second))
}
}
return apiErr
}
func shouldRetry(err notionAPIError) bool {
if err.StatusCode == http.StatusTooManyRequests || err.Retryable {
return true
}
return err.StatusCode == http.StatusBadGateway ||
err.StatusCode == http.StatusServiceUnavailable ||
err.StatusCode == http.StatusGatewayTimeout
}
func retryAfter(header string, body []byte) time.Duration {
if header != "" {
if seconds, err := time.ParseDuration(header + "s"); err == nil && seconds > 0 {
return seconds
}
if when, err := http.ParseTime(header); err == nil {
if wait := time.Until(when); wait > 0 {
return wait
}
}
}
var payload struct {
RetryAfter float64 `json:"retry_after"`
}
if err := json.Unmarshal(body, &payload); err == nil && payload.RetryAfter > 0 {
return time.Duration(payload.RetryAfter * float64(time.Second))
}
return 0
}
func waitBeforeRetry(ctx context.Context, wait time.Duration) error {
if wait <= 0 {
return nil
}
timer := time.NewTimer(wait)
defer timer.Stop()
select {
case <-ctx.Done():
return ctx.Err()
case <-timer.C:
return nil
}
}
func isIgnoredCommentError(err error) bool {
apiErr, ok := err.(notionAPIError)
if !ok {
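
Both variants of the client in this file read the `Retry-After` header, which may carry either a number of seconds or an HTTP date. The sketch below shows one way to parse it. Note that it uses `strconv.Atoi` for the numeric form instead of the `time.ParseDuration(header + "s")` call seen in the diff, and `parseRetryAfter` is a hypothetical name.

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
	"time"
)

// parseRetryAfter (hypothetical helper) converts a Retry-After header
// value into a wait duration. It returns 0 when the header is empty,
// malformed, non-positive, or names a time that has already passed.
func parseRetryAfter(header string) time.Duration {
	if header == "" {
		return 0
	}
	// Form 1: delay in whole seconds, e.g. "Retry-After: 2".
	if secs, err := strconv.Atoi(header); err == nil && secs > 0 {
		return time.Duration(secs) * time.Second
	}
	// Form 2: an HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
	if when, err := http.ParseTime(header); err == nil {
		if wait := time.Until(when); wait > 0 {
			return wait
		}
	}
	return 0
}

func main() {
	fmt.Println(parseRetryAfter("2"))
	fmt.Println(parseRetryAfter("not-a-delay"))
}
```

A zero result leaves the retry decision to the caller, which can then choose to retry immediately or give up.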


@ -32,7 +32,7 @@ func TestSyncIngestsDatabasesAndRows(t *testing.T) {
"results":[{
"object":"database",
"id":"db1",
"title":[{"type":"text","plain_text":"Roadmap","text":{"content":"Roadmap"}}],
"title":[{"plain_text":"Roadmap"}],
"parent":{"type":"workspace","workspace":true},
"properties":{
"Name":{"id":"title","type":"title","title":{}},
@ -57,7 +57,7 @@ func TestSyncIngestsDatabasesAndRows(t *testing.T) {
"url":"https://notion.so/page1",
"parent":{"type":"database_id","database_id":"db1"},
"properties":{
"Name":{"id":"title","type":"title","title":[{"type":"text","plain_text":"Ship","text":{"content":"Ship"}}]},
"Name":{"id":"title","type":"title","title":[{"plain_text":"Ship"}]},
"Status":{"id":"status","type":"select","select":{"name":"Done"}}
}
}],
@ -93,7 +93,7 @@ func TestSyncIngestsDatabasesAndRows(t *testing.T) {
if err != nil {
t.Fatal(err)
}
if len(rows) != 1 || rows[0].ID != "page1" || rows[0].CollectionID != "db1" || rows[0].Title != "Ship" {
if len(rows) != 1 || rows[0].ID != "page1" || rows[0].CollectionID != "db1" {
t.Fatalf("unexpected rows: %+v", rows)
}
}
@ -122,7 +122,7 @@ func TestSyncIngestsCurrentDataSourcesAndRows(t *testing.T) {
"results":[{
"object":"data_source",
"id":"ds1",
"title":[{"type":"text","plain_text":"Roadmap","text":{"content":"Roadmap"}}],
"title":[{"plain_text":"Roadmap"}],
"parent":{"type":"database_id","database_id":"db1"},
"database_parent":{"type":"page_id","page_id":"page-parent"},
"properties":{
@ -147,7 +147,7 @@ func TestSyncIngestsCurrentDataSourcesAndRows(t *testing.T) {
"url":"https://notion.so/page1",
"parent":{"type":"data_source_id","data_source_id":"ds1"},
"properties":{
"Name":{"id":"title","type":"title","title":[{"type":"text","plain_text":"Ship","text":{"content":"Ship"}}]},
"Name":{"id":"title","type":"title","title":[{"plain_text":"Ship"}]},
"Status":{"id":"status","type":"select","select":{"name":"Done"}}
}
}],
@ -176,14 +176,14 @@ func TestSyncIngestsCurrentDataSourcesAndRows(t *testing.T) {
if err != nil {
t.Fatal(err)
}
if len(collections) != 1 || collections[0].ID != "ds1" || collections[0].ParentID != "db1" || collections[0].Name != "Roadmap" {
if len(collections) != 1 || collections[0].ID != "ds1" || collections[0].ParentID != "db1" {
t.Fatalf("unexpected collections: %+v", collections)
}
rows, err := st.CollectionPages(context.Background(), "ds1")
if err != nil {
t.Fatal(err)
}
if len(rows) != 1 || rows[0].ID != "page1" || rows[0].CollectionID != "ds1" || rows[0].Title != "Ship" {
if len(rows) != 1 || rows[0].ID != "page1" || rows[0].CollectionID != "ds1" {
t.Fatalf("unexpected rows: %+v", rows)
}
}
@ -213,52 +213,3 @@ func TestIngestCommentsSkipsRestrictedResource(t *testing.T) {
t.Fatalf("unexpected comment count: %d", count)
}
}
func TestIngestCommentsRetriesTransientGatewayError(t *testing.T) {
attempts := 0
server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
w.Header().Set("Content-Type", "application/json")
if r.URL.Path != "/comments" {
t.Fatalf("unexpected request: %s %s", r.Method, r.URL.String())
}
attempts++
if attempts == 1 {
w.WriteHeader(http.StatusBadGateway)
_, _ = w.Write([]byte(`{"retryable":true,"retry_after":0}`))
return
}
_, _ = w.Write([]byte(`{
"object":"list",
"results":[{
"id":"comment1",
"rich_text":[{"type":"text","plain_text":"Looks good","text":{"content":"Looks good"}}],
"created_by":{"id":"user1"},
"created_time":"2026-01-01T00:00:00Z",
"last_edited_time":"2026-01-01T00:00:00Z"
}],
"has_more":false
}`))
}))
defer server.Close()
st, err := store.Open(filepath.Join(t.TempDir(), "notcrawl.db"))
if err != nil {
t.Fatal(err)
}
defer st.Close()
count, err := (Client{BaseURL: server.URL, Version: "2026-03-11", Token: "secret", HTTP: http.DefaultClient}).ingestComments(context.Background(), st, "page1", "space1")
if err != nil {
t.Fatal(err)
}
if count != 1 || attempts != 2 {
t.Fatalf("unexpected count/attempts: count=%d attempts=%d", count, attempts)
}
comments, err := st.PageComments(context.Background(), "page1")
if err != nil {
t.Fatal(err)
}
if len(comments) != 1 || comments[0].Text != "Looks good" {
t.Fatalf("unexpected comments: %+v", comments)
}
}


@ -8,39 +8,12 @@ import (
"unicode"
)
var (
spaceRE = regexp.MustCompile(`\s+`)
legacyInlineLinkArtifactRE = regexp.MustCompile(`\ba\s+((?:https?://|/)[^\s]+)`)
legacyInlineMarkArtifactRE = regexp.MustCompile(`\s+\b[bius]\b($|[\s,.;:])`)
legacyMentionArtifactRE = regexp.MustCompile(`\bm\s+[0-9a-fA-F]{8}-[0-9a-fA-F-]{8,}(?:\s+[0-9a-fA-F-]{12,})?`)
legacyPageMentionRE = regexp.MustCompile(`(?:‣\s*)?p\s+[0-9a-fA-F]{8}-[0-9a-fA-F-]{8,}(?:\s+[0-9a-fA-F-]{12,})?`)
legacyLinkedMentionRE = regexp.MustCompile(`‣\s+lm\s+`)
legacyBareMentionRE = regexp.MustCompile(`‣\s+[0-9a-fA-F]{8}-[0-9a-fA-F-]{8,}`)
spaceBeforePunctuationRE = regexp.MustCompile(`\s+([,.;:])`)
repeatedCommaRE = regexp.MustCompile(`(?:,\s*){2,}`)
repeatedLinkedPageRE = regexp.MustCompile(`linked page\b(?:,\s*linked page\b)+`)
)
var spaceRE = regexp.MustCompile(`\s+`)
func Normalize(s string) string {
return strings.TrimSpace(spaceRE.ReplaceAllString(s, " "))
}
func CleanLegacyArtifacts(s string) string {
s = legacyInlineLinkArtifactRE.ReplaceAllString(s, "<$1>")
s = legacyInlineMarkArtifactRE.ReplaceAllString(s, "$1")
s = legacyMentionArtifactRE.ReplaceAllString(s, "@mention")
s = legacyPageMentionRE.ReplaceAllString(s, "linked page")
s = legacyLinkedMentionRE.ReplaceAllString(s, "‣ ")
s = legacyBareMentionRE.ReplaceAllString(s, "@mention")
s = Normalize(s)
s = repeatedCommaRE.ReplaceAllString(s, ", ")
s = repeatedLinkedPageRE.ReplaceAllString(s, "linked pages")
s = strings.ReplaceAll(s, "linked pagess", "linked pages")
s = spaceBeforePunctuationRE.ReplaceAllString(s, "$1")
s = strings.ReplaceAll(s, " and, ", ", ")
return Normalize(s)
}
func PlainFromJSON(raw string) string {
if strings.TrimSpace(raw) == "" {
return ""
@ -130,7 +103,7 @@ func Slug(s string) string {
}
func isSlugRune(r rune) bool {
return unicode.IsLetter(r) || unicode.IsNumber(r)
return unicode.IsLetter(r) || unicode.IsNumber(r) || unicode.IsMark(r) || (r > unicode.MaxASCII && unicode.IsSymbol(r)) || r == '\u200d'
}
func isSlugSeparator(r rune) bool {
@ -154,88 +127,23 @@ func walk(v any, parts *[]string) {
*parts = append(*parts, x)
}
case []any:
if text, ok := legacyRichTextPart(x); ok {
*parts = append(*parts, text)
return
}
for _, item := range x {
walk(item, parts)
}
case map[string]any:
if text, ok := normalizedString(x["plain_text"]); ok {
*parts = append(*parts, text)
return
}
if text, ok := richTextContent(x["text"]); ok {
*parts = append(*parts, text)
return
}
if text, ok := normalizedString(x["content"]); ok {
*parts = append(*parts, text)
return
}
for _, key := range []string{"name", "title", "rich_text", "text"} {
for _, key := range []string{"plain_text", "content", "text", "name", "title"} {
if value, ok := x[key]; ok {
walk(value, parts)
}
}
}
}
func legacyRichTextPart(values []any) (string, bool) {
if len(values) == 0 {
return "", false
}
text, ok := normalizedString(values[0])
if !ok {
return "", false
}
if len(values) < 2 {
return text, true
}
if link := legacyAnnotationLink(values[1]); link != "" {
return Normalize(text + " <" + link + ">"), true
}
return text, true
}
func legacyAnnotationLink(value any) string {
values, ok := value.([]any)
if !ok {
return ""
}
for _, item := range values {
annotation, ok := item.([]any)
if !ok || len(annotation) < 2 {
continue
if rt, ok := x["rich_text"]; ok {
walk(rt, parts)
}
code, ok := annotation[0].(string)
if !ok || code != "a" {
continue
if title, ok := x["title"]; ok {
walk(title, parts)
}
if link, ok := normalizedString(annotation[1]); ok {
return link
if text, ok := x["text"].(map[string]any); ok {
walk(text["content"], parts)
}
}
return ""
}
func richTextContent(v any) (string, bool) {
m, ok := v.(map[string]any)
if !ok {
return "", false
}
return normalizedString(m["content"])
}
func normalizedString(v any) (string, bool) {
s, ok := v.(string)
if !ok {
return "", false
}
s = Normalize(s)
if s == "" {
return "", false
}
return s, true
}
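
A Notion rich-text item usually carries the same string twice, once as `plain_text` and once as `text.content`, so a walker that visits every key can emit it twice. The sketch below shows the "take the first non-empty field, then stop" idea on a typed struct. It is an illustration only: `plainOnce` is hypothetical, it does not use the repository's `walk` function, and joining segments without a separator is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// richText models only the two fields relevant to plain-text extraction.
type richText struct {
	PlainText string `json:"plain_text"`
	Text      struct {
		Content string `json:"content"`
	} `json:"text"`
}

// plainOnce (hypothetical helper) extracts the text of each rich-text
// item exactly once: plain_text when present, else text.content.
func plainOnce(raw string) (string, error) {
	var items []richText
	if err := json.Unmarshal([]byte(raw), &items); err != nil {
		return "", err
	}
	var parts []string
	for _, item := range items {
		switch {
		case item.PlainText != "":
			parts = append(parts, item.PlainText)
		case item.Text.Content != "":
			parts = append(parts, item.Text.Content)
		}
	}
	// Assumption: segments are concatenated as-is.
	return strings.Join(parts, ""), nil
}

func main() {
	raw := `[{"type":"text","plain_text":"OpenClaw","text":{"content":"OpenClaw"}}]`
	got, err := plainOnce(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(got)
}
```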


@ -9,90 +9,6 @@ func TestTitleFromProperties(t *testing.T) {
}
}
func TestTitleFromPropertiesPrefersNotionRichTextOnce(t *testing.T) {
got := TitleFromProperties(`{
"Name": {
"id": "title",
"type": "title",
"title": [{
"type": "text",
"plain_text": "OpenClaw",
"text": {"content": "OpenClaw"}
}]
}
}`)
if got != "OpenClaw" {
t.Fatalf("got %q", got)
}
}
func TestPlainPrefersNotionRichTextPlainTextOnce(t *testing.T) {
got := Plain([]any{map[string]any{
"type": "text",
"plain_text": "OpenClaw",
"text": map[string]any{
"content": "OpenClaw",
},
}})
if got != "OpenClaw" {
t.Fatalf("got %q", got)
}
}
func TestPlainFallsBackToNotionTextContentOnce(t *testing.T) {
got := Plain([]any{map[string]any{
"type": "text",
"text": map[string]any{
"content": "OpenClaw",
},
}})
if got != "OpenClaw" {
t.Fatalf("got %q", got)
}
}
func TestPlainHandlesLegacyNotionAnnotations(t *testing.T) {
got := PlainFromJSON(`{"title":[["Marketing Customer Reference Rights",[["a","https://example.com/sheet"]]],[" "],["Product Marketing",[["b"]]]]}`)
if got != "Marketing Customer Reference Rights <https://example.com/sheet> Product Marketing" {
t.Fatalf("got %q", got)
}
}
func TestCleanLegacyArtifacts(t *testing.T) {
got := CleanLegacyArtifacts("Option A: b\nMarketing Customer Reference Rights a https://example.com/sheet\nm 35171240-10a3-80ff-95be-001c31559035 It works")
if got != "Option A: Marketing Customer Reference Rights <https://example.com/sheet> @mention It works" {
t.Fatalf("got %q", got)
}
}
func TestCleanLegacyArtifactsRemovesMentionOpcodes(t *testing.T) {
got := CleanLegacyArtifacts("reach out to ‣ 1b1d872b-594c-811a-ad82-0002ea4fc797 and ‣ p 24d71240-10a3-80ae-8bde-d59bf00682c0 00b8cbcf-c520-4790-999a-9c2940263721,,, see ‣ lm Weekly Walk")
if got != "reach out to @mention and linked page, see ‣ Weekly Walk" {
t.Fatalf("got %q", got)
}
}
func TestCleanLegacyArtifactsCompactsRepeatedLinkedPages(t *testing.T) {
got := CleanLegacyArtifacts("ask ‣ p 24d71240-10a3-80ae-8bde-d59bf00682c0 00b8cbcf-c520-4790-999a-9c2940263721, ‣ p 24d71240-10a3-80d3-a3b0-c06884bad333 00b8cbcf-c520-4790-999a-9c2940263721, ‣ p 1de71240-10a3-809a-98f9-ea6f4d8702b3 00b8cbcf-c520-4790-999a-9c2940263721 Add notes")
if got != "ask linked pages Add notes" {
t.Fatalf("got %q", got)
}
}
func TestPlainWalksTitleOnlyOnce(t *testing.T) {
got := Plain(map[string]any{
"title": []any{map[string]any{
"plain_text": "Roadmap",
"text": map[string]any{
"content": "Roadmap",
},
}},
})
if got != "Roadmap" {
t.Fatalf("got %q", got)
}
}
func TestSlug(t *testing.T) {
got := Slug("Launch Plan / Q2")
if got != "launch-plan-q2" {
@ -100,9 +16,9 @@ func TestSlug(t *testing.T) {
}
}
func TestSlugRemovesEmojiPathText(t *testing.T) {
func TestSlugPreservesUnicodePathText(t *testing.T) {
got := Slug("研究 🚀 / 計画 ✅")
if got != "研究-計画" {
if got != "研究-🚀-計画-✅" {
t.Fatalf("got %q", got)
}
}
@ -114,13 +30,6 @@ func TestSlugRemovesUnsafePathText(t *testing.T) {
}
}
func TestSlugRemovesEmojiVariationSelectors(t *testing.T) {
got := Slug("⚠️ Production Incident Guide")
if got != "production-incident-guide" {
t.Fatalf("got %q", got)
}
}
func TestShortIDKeepsEnoughEntropyForDesktopIDs(t *testing.T) {
got := ShortID("24f71240-0000-0000-0000-123456789abc")
if got != "24f71240-56789abc" {

View File

@ -17,7 +17,6 @@ import (
"syscall"
"time"
"github.com/vincentkoc/crawlkit/mirror"
"github.com/vincentkoc/notcrawl/internal/store"
)
@ -118,14 +117,22 @@ func Publish(ctx context.Context, st *store.Store, opts PublishOptions) (Publish
}
s := PublishSummary{Manifest: manifest}
if opts.Commit {
committed, err := commitGenerated(ctx, opts.RepoPath, opts.Message)
if err := runGit(ctx, opts.RepoPath, "add", "manifest.json", "data", "pages"); err != nil {
return s, err
}
dirty, err := hasChanges(ctx, opts.RepoPath)
if err != nil {
return s, err
}
s.Committed = committed
if dirty {
if err := runGit(ctx, opts.RepoPath, "commit", "-m", opts.Message); err != nil {
return s, err
}
s.Committed = true
}
}
if opts.Push {
if err := mirror.Push(ctx, mirror.Options{RepoPath: opts.RepoPath, Remote: opts.Remote, Branch: opts.Branch}); err != nil {
if err := runGit(ctx, opts.RepoPath, "push", "-u", "origin", opts.Branch); err != nil {
return s, err
}
s.Pushed = true
@ -160,17 +167,28 @@ func Subscribe(ctx context.Context, st *store.Store, remote, repoPath, branch st
if branch == "" {
branch = "main"
}
if err := mirror.Pull(ctx, mirror.Options{RepoPath: repoPath, Remote: remote, Branch: branch}); err != nil {
if _, err := os.Stat(filepath.Join(repoPath, ".git")); os.IsNotExist(err) {
if err := os.MkdirAll(filepath.Dir(repoPath), 0o755); err != nil {
return Manifest{}, err
}
if err := run(ctx, "", "git", "clone", "--branch", branch, remote, repoPath); err != nil {
return Manifest{}, err
}
} else if err == nil {
if err := runGit(ctx, repoPath, "pull", "--ff-only", "origin", branch); err != nil {
return Manifest{}, err
}
} else {
return Manifest{}, err
}
return Import(ctx, st, repoPath)
}
func Update(ctx context.Context, st *store.Store, remote, repoPath, branch string) (Manifest, error) {
func Update(ctx context.Context, st *store.Store, repoPath, branch string) (Manifest, error) {
if branch == "" {
branch = "main"
}
if err := pullForUpdate(ctx, repoPath, remote, branch); err != nil {
if err := runGit(ctx, repoPath, "pull", "--ff-only", "origin", branch); err != nil {
return Manifest{}, err
}
return Import(ctx, st, repoPath)
@ -268,72 +286,35 @@ func importTable(ctx context.Context, db *sql.DB, path, table string) error {
}
func ensureRepo(ctx context.Context, repoPath, remote, branch string) error {
if err := mirror.EnsureRepo(ctx, mirror.Options{RepoPath: repoPath, Remote: remote, Branch: branch}); err != nil {
if err := os.MkdirAll(repoPath, 0o755); err != nil {
return err
}
remote = strings.TrimSpace(remote)
if remote == "" {
return nil
}
if err := runGit(ctx, repoPath, "remote", "set-url", "origin", remote); err != nil {
if strings.Contains(err.Error(), "No such remote") {
return runGit(ctx, repoPath, "remote", "add", "origin", remote)
if _, err := os.Stat(filepath.Join(repoPath, ".git")); os.IsNotExist(err) {
if err := runGit(ctx, repoPath, "init", "-b", branch); err != nil {
return err
}
} else if err != nil {
return err
}
if remote != "" {
if err := runGit(ctx, repoPath, "remote", "get-url", "origin"); err != nil {
if err := runGit(ctx, repoPath, "remote", "add", "origin", remote); err != nil {
return err
}
} else if err := runGit(ctx, repoPath, "remote", "set-url", "origin", remote); err != nil {
return err
}
}
return nil
}
func hasChanges(ctx context.Context, repoPath string) (bool, error) {
return mirror.Dirty(ctx, mirror.Options{RepoPath: repoPath})
}
func pullForUpdate(ctx context.Context, repoPath, remote, branch string) error {
if strings.TrimSpace(remote) != "" {
return mirror.Pull(ctx, mirror.Options{RepoPath: repoPath, Remote: remote, Branch: branch})
}
if err := ensureRepo(ctx, repoPath, "", branch); err != nil {
return err
}
return runGit(ctx, repoPath, "pull", "--ff-only", "origin", branch)
}
func commitGenerated(ctx context.Context, repoPath, message string) (bool, error) {
if message == "" {
message = "archive: notcrawl snapshot"
}
if err := runGit(ctx, repoPath, "add", "--", "manifest.json", "data", "pages"); err != nil {
return false, err
}
staged, err := hasStagedGeneratedChanges(ctx, repoPath)
cmd := exec.CommandContext(ctx, "git", "-C", repoPath, "status", "--porcelain")
out, err := cmd.Output()
if err != nil {
return false, err
}
if !staged {
return false, nil
}
if err := runGit(ctx, repoPath,
"-c", "commit.gpgsign=false",
"-c", "user.name=crawlkit",
"-c", "user.email=crawlkit@example.invalid",
"commit", "-m", message, "--", "manifest.json", "data", "pages",
); err != nil {
return false, err
}
return true, nil
}
func hasStagedGeneratedChanges(ctx context.Context, repoPath string) (bool, error) {
cmd := exec.CommandContext(ctx, "git", "-C", repoPath, "diff", "--cached", "--quiet", "--exit-code", "--", "manifest.json", "data", "pages")
out, err := cmd.CombinedOutput()
if err == nil {
return false, nil
}
var exitErr *exec.ExitError
if errors.As(err, &exitErr) && exitErr.ExitCode() == 1 {
return true, nil
}
return false, fmt.Errorf("git diff --cached: %w\n%s", err, strings.TrimSpace(string(out)))
return strings.TrimSpace(string(out)) != "", nil
}
func runGit(ctx context.Context, dir string, args ...string) error {

View File

@ -3,9 +3,7 @@ package share
import (
"context"
"os"
"os/exec"
"path/filepath"
"strings"
"testing"
"github.com/vincentkoc/notcrawl/internal/markdown"
@ -86,153 +84,3 @@ func TestPublishAndImportSnapshot(t *testing.T) {
t.Fatalf("expected imported search result, got %d", len(results))
}
}
func TestEnsureRepoUpdatesExistingOrigin(t *testing.T) {
ctx := context.Background()
repo := filepath.Join(t.TempDir(), "repo")
if err := os.MkdirAll(repo, 0o755); err != nil {
t.Fatal(err)
}
runGitForTest(t, repo, "init")
runGitForTest(t, repo, "remote", "add", "origin", "https://example.invalid/old.git")
const remote = "https://example.invalid/fresh.git"
if err := ensureRepo(ctx, repo, remote, "main"); err != nil {
t.Fatal(err)
}
got := gitOutputForTest(t, repo, "remote", "get-url", "origin")
if strings.TrimSpace(got) != remote {
t.Fatalf("origin = %q", got)
}
}
func TestPublishCommitsOnlyGeneratedSnapshotFiles(t *testing.T) {
ctx := context.Background()
repo := filepath.Join(t.TempDir(), "repo")
if err := os.MkdirAll(repo, 0o755); err != nil {
t.Fatal(err)
}
runGitForTest(t, repo, "init", "-b", "main")
notes := filepath.Join(repo, "notes.txt")
if err := os.WriteFile(notes, []byte("tracked\n"), 0o644); err != nil {
t.Fatal(err)
}
runGitForTest(t, repo, "add", "notes.txt")
runGitForTest(t, repo,
"-c", "commit.gpgsign=false",
"-c", "user.name=test",
"-c", "user.email=test@example.invalid",
"commit", "-m", "seed notes",
)
if err := os.WriteFile(notes, []byte("local edit\n"), 0o644); err != nil {
t.Fatal(err)
}
src, mdDir := snapshotStoreForTest(t, ctx, "Launch", "hello generated")
defer src.Close()
s, err := Publish(ctx, src, PublishOptions{RepoPath: repo, MarkdownDir: mdDir, Commit: true})
if err != nil {
t.Fatal(err)
}
if !s.Committed {
t.Fatal("expected generated snapshot commit")
}
status := gitOutputForTest(t, repo, "status", "--short", "--", "notes.txt")
if !strings.HasPrefix(status, " M notes.txt") {
t.Fatalf("expected unrelated tracked edit to remain unstaged, got %q", status)
}
committed := gitOutputForTest(t, repo, "show", "--name-only", "--format=", "HEAD")
if strings.Contains(committed, "notes.txt") {
t.Fatalf("unexpected unrelated file in snapshot commit:\n%s", committed)
}
}
func TestUpdatePullsExistingOriginWhenRemoteNotConfigured(t *testing.T) {
ctx := context.Background()
dir := t.TempDir()
remote := filepath.Join(dir, "remote.git")
runGitForTest(t, dir, "init", "--bare", remote)
seed := filepath.Join(dir, "seed")
if err := os.MkdirAll(seed, 0o755); err != nil {
t.Fatal(err)
}
runGitForTest(t, seed, "init", "-b", "main")
src, mdDir := snapshotStoreForTest(t, ctx, "Old", "old snapshot")
if _, err := Publish(ctx, src, PublishOptions{RepoPath: seed, MarkdownDir: mdDir, Commit: true}); err != nil {
t.Fatal(err)
}
if err := src.Close(); err != nil {
t.Fatal(err)
}
runGitForTest(t, seed, "remote", "add", "origin", remote)
runGitForTest(t, seed, "push", "-u", "origin", "main")
local := filepath.Join(dir, "local")
runGitForTest(t, dir, "clone", remote, local)
fresh, freshMD := snapshotStoreForTest(t, ctx, "Fresh", "fresh snapshot")
if _, err := Publish(ctx, fresh, PublishOptions{RepoPath: seed, Remote: remote, MarkdownDir: freshMD, Commit: true, Push: true}); err != nil {
t.Fatal(err)
}
if err := fresh.Close(); err != nil {
t.Fatal(err)
}
dst, err := store.Open(filepath.Join(dir, "dst.db"))
if err != nil {
t.Fatal(err)
}
defer dst.Close()
if _, err := Update(ctx, dst, "", local, "main"); err != nil {
t.Fatal(err)
}
results, err := dst.Search(ctx, "fresh", 10)
if err != nil {
t.Fatal(err)
}
if len(results) != 1 || results[0].Title != "Fresh" {
t.Fatalf("expected fresh pulled snapshot, got %#v", results)
}
}
func snapshotStoreForTest(t *testing.T, ctx context.Context, title, text string) (*store.Store, string) {
t.Helper()
st, err := store.Open(filepath.Join(t.TempDir(), "snapshot.db"))
if err != nil {
t.Fatal(err)
}
now := store.NowMS()
if err := st.UpsertPage(ctx, store.Page{ID: "page1", Title: title, Alive: true, Source: "test", SyncedAt: now}); err != nil {
t.Fatal(err)
}
if err := st.UpsertBlock(ctx, store.Block{ID: "block1", PageID: "page1", ParentID: "page1", Type: "text", Text: text, Alive: true, Source: "test", SyncedAt: now}); err != nil {
t.Fatal(err)
}
mdDir := t.TempDir()
if _, err := (markdown.Exporter{Store: st, Dir: mdDir}).Export(ctx); err != nil {
t.Fatal(err)
}
return st, mdDir
}
func runGitForTest(t *testing.T, dir string, args ...string) {
t.Helper()
cmd := exec.Command("git", args...)
cmd.Dir = dir
if out, err := cmd.CombinedOutput(); err != nil {
t.Fatalf("git %s: %v\n%s", strings.Join(args, " "), err, strings.TrimSpace(string(out)))
}
}
func gitOutputForTest(t *testing.T, dir string, args ...string) string {
t.Helper()
cmd := exec.Command("git", args...)
cmd.Dir = dir
out, err := cmd.CombinedOutput()
if err != nil {
t.Fatalf("git %s: %v\n%s", strings.Join(args, " "), err, strings.TrimSpace(string(out)))
}
return string(out)
}

View File

@ -56,10 +56,7 @@ func (s *Store) Collection(ctx context.Context, id string) (Collection, error) {
func (s *Store) CollectionPages(ctx context.Context, collectionID string) ([]Page, error) {
rows, err := s.queryContext(ctx, `select id, space_id, parent_id, parent_table, collection_id, title, url, icon, cover,
properties_json, created_time, last_edited_time, alive, source, raw_json, synced_at
from pages
where alive = 1
and (collection_id = ? or (parent_id = ? and parent_table in ('collection', 'database', 'data_source')))
order by coalesce(last_edited_time, 0) desc, title`, collectionID, collectionID)
from pages where collection_id = ? and alive = 1 order by coalesce(last_edited_time, 0) desc, title`, collectionID)
if err != nil {
return nil, err
}
@ -97,49 +94,7 @@ func (s *Store) PageBlocks(ctx context.Context, pageID string) ([]Block, error)
b.Alive = IntBool(alive)
blocks = append(blocks, b)
}
if err := rows.Err(); err != nil {
return nil, err
}
return pageBlocksDisplayOrder(pageID, blocks), nil
}
func pageBlocksDisplayOrder(pageID string, blocks []Block) []Block {
children := map[string][]Block{}
for _, block := range blocks {
if block.ID == pageID {
continue
}
children[block.ParentID] = append(children[block.ParentID], block)
}
for parent := range children {
sortBlockSiblings(children[parent])
}
ordered := make([]Block, 0, len(blocks))
seen := map[string]struct{}{}
var appendChildren func(string)
appendChildren = func(parentID string) {
for _, block := range children[parentID] {
if _, ok := seen[block.ID]; ok {
continue
}
seen[block.ID] = struct{}{}
ordered = append(ordered, block)
appendChildren(block.ID)
}
}
appendChildren(pageID)
if len(ordered) == 0 {
return blocks
}
for _, block := range blocks {
if _, ok := seen[block.ID]; ok || block.ID == pageID {
continue
}
seen[block.ID] = struct{}{}
ordered = append(ordered, block)
}
return ordered
return blocks, rows.Err()
}
func (s *Store) PageComments(ctx context.Context, pageID string) ([]Comment, error) {
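
Reviewer note: `pageBlocksDisplayOrder` in this hunk flattens blocks depth-first from the page root. The sketch below reproduces that traversal on the three blocks used by `TestStoreReturnsPageBlocksInDisplayTreeOrder`. The `block` type and the sibling ordering (by `Order`, then `ID`) are assumptions, since `sortBlockSiblings` is not shown in this diff.

```go
package main

import (
	"fmt"
	"sort"
)

// block is a reduced, hypothetical stand-in for store.Block.
type block struct {
	ID, ParentID string
	Order        int
}

// displayOrder returns block IDs in depth-first tree order below pageID.
// Sibling order is assumed to be Order first, then ID.
func displayOrder(pageID string, blocks []block) []string {
	children := map[string][]block{}
	for _, b := range blocks {
		children[b.ParentID] = append(children[b.ParentID], b)
	}
	for parent := range children {
		siblings := children[parent]
		sort.SliceStable(siblings, func(i, j int) bool {
			if siblings[i].Order != siblings[j].Order {
				return siblings[i].Order < siblings[j].Order
			}
			return siblings[i].ID < siblings[j].ID
		})
	}
	var out []string
	seen := map[string]bool{} // guards against cycles and duplicates
	var visit func(string)
	visit = func(parent string) {
		for _, b := range children[parent] {
			if seen[b.ID] {
				continue
			}
			seen[b.ID] = true
			out = append(out, b.ID)
			visit(b.ID)
		}
	}
	visit(pageID)
	return out
}

func main() {
	blocks := []block{
		{ID: "z-root", ParentID: "page1", Order: 2},
		{ID: "a-child", ParentID: "a-root", Order: 1},
		{ID: "a-root", ParentID: "page1", Order: 1},
	}
	fmt.Println(displayOrder("page1", blocks)) // prints "[a-root a-child z-root]"
}
```

Unlike the function in the diff, this sketch does not append blocks that are unreachable from the page root, and it does not fall back to the input order when the traversal is empty.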
@ -164,40 +119,6 @@ func (s *Store) PageComments(ctx context.Context, pageID string) ([]Comment, err
return comments, rows.Err()
}
func (s *Store) UserNames(ctx context.Context) (map[string]string, error) {
rows, err := s.queryContext(ctx, `select id, coalesce(nullif(name, ''), nullif(email, ''), id) from users`)
if err != nil {
return nil, err
}
defer rows.Close()
out := map[string]string{}
for rows.Next() {
var id, name string
if err := rows.Scan(&id, &name); err != nil {
return nil, err
}
out[id] = name
}
return out, rows.Err()
}
func (s *Store) PageTitles(ctx context.Context) (map[string]string, error) {
rows, err := s.queryContext(ctx, `select id, coalesce(nullif(title, ''), id) from pages where alive = 1`)
if err != nil {
return nil, err
}
defer rows.Close()
out := map[string]string{}
for rows.Next() {
var id, title string
if err := rows.Scan(&id, &title); err != nil {
return nil, err
}
out[id] = title
}
return out, rows.Err()
}
func (s *Store) SpaceNames(ctx context.Context) (map[string]string, error) {
rows, err := s.queryContext(ctx, `select id, name from spaces`)
if err != nil {

View File

@ -6,11 +6,12 @@ import (
"errors"
"fmt"
"os"
"path/filepath"
"sort"
"strings"
"time"
crawlstore "github.com/vincentkoc/crawlkit/store"
_ "modernc.org/sqlite"
)
const schemaVersion = 1
@ -24,34 +25,64 @@ type Store struct {
}
func Open(path string) (*Store, error) {
base, err := crawlstore.Open(context.Background(), crawlstore.Options{Path: path})
if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil {
return nil, err
}
if err := ensureDBFile(path); err != nil {
return nil, err
}
db, err := sql.Open("sqlite", sqliteDSN(path))
if err != nil {
return nil, err
}
db := base.DB()
db.SetMaxOpenConns(1)
db.SetMaxIdleConns(1)
if err := db.PingContext(context.Background()); err != nil {
_ = base.Close()
_ = db.Close()
return nil, err
}
st := &Store{db: db, path: path}
if err := st.init(context.Background()); err != nil {
_ = base.Close()
_ = db.Close()
return nil, err
}
return st, nil
}
func OpenReadOnly(path string) (*Store, error) {
base, err := crawlstore.OpenReadOnly(context.Background(), path)
if err != nil {
return nil, err
func sqliteDSN(path string) string {
pragmas := "_pragma=foreign_keys(1)&_pragma=journal_mode(WAL)&_pragma=synchronous(NORMAL)&_pragma=temp_store(MEMORY)&_pragma=mmap_size(268435456)&_pragma=busy_timeout(5000)"
if path == ":memory:" {
return "file::memory:?cache=shared&" + pragmas
}
db := base.DB()
if err := db.PingContext(context.Background()); err != nil {
_ = base.Close()
return nil, err
if strings.HasPrefix(path, "file:") {
sep := "?"
if strings.Contains(path, "?") {
sep = "&"
}
return path + sep + pragmas
}
return &Store{db: db, path: path}, nil
return "file:" + path + "?" + pragmas
}
func ensureDBFile(path string) error {
if path == ":memory:" || strings.HasPrefix(path, "file:") {
return nil
}
if _, err := os.Stat(path); err == nil {
return os.Chmod(path, 0o600)
} else if !errors.Is(err, os.ErrNotExist) {
return err
}
file, err := os.OpenFile(path, os.O_CREATE|os.O_EXCL|os.O_WRONLY, 0o600)
if err != nil && !errors.Is(err, os.ErrExist) {
return err
}
if file != nil {
if err := file.Close(); err != nil {
return err
}
}
return nil
}
func (s *Store) DB() *sql.DB {
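
Reviewer note: `sqliteDSN` in this hunk handles three path shapes. The sketch below shows the string each one produces, using the same branching. `exampleDSN` and the shortened `examplePragmas` list are illustrative assumptions; the real function appends the full pragma list shown above.

```go
package main

import (
	"fmt"
	"strings"
)

// examplePragmas is a shortened pragma list used only for illustration.
const examplePragmas = "_pragma=foreign_keys(1)&_pragma=busy_timeout(5000)"

// exampleDSN mirrors the branching of sqliteDSN with the shortened list.
func exampleDSN(path string) string {
	if path == ":memory:" {
		// Shared cache so every connection sees the same in-memory database.
		return "file::memory:?cache=shared&" + examplePragmas
	}
	if strings.HasPrefix(path, "file:") {
		// Already a URI: append with "&" if it has a query, else "?".
		sep := "?"
		if strings.Contains(path, "?") {
			sep = "&"
		}
		return path + sep + examplePragmas
	}
	// Plain filesystem path: wrap it as a file: URI.
	return "file:" + path + "?" + examplePragmas
}

func main() {
	for _, p := range []string{":memory:", "file:archive.db?mode=ro", "/tmp/notcrawl.db"} {
		fmt.Println(exampleDSN(p))
	}
}
```

The second input keeps its existing `mode=ro` query parameter and gains the pragmas after an `&`, so a caller-supplied URI is extended and not replaced.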

View File

@ -260,37 +260,6 @@ func TestStoreBuildsPageFTSInDisplayTreeOrder(t *testing.T) {
}
}
func TestStoreReturnsPageBlocksInDisplayTreeOrder(t *testing.T) {
st, err := Open(filepath.Join(t.TempDir(), "notcrawl.db"))
if err != nil {
t.Fatal(err)
}
defer st.Close()
ctx := context.Background()
now := NowMS()
if err := st.UpsertPage(ctx, Page{ID: "page1", Title: "Recipe", Alive: true, Source: "test", SyncedAt: now}); err != nil {
t.Fatal(err)
}
blocks := []Block{
{ID: "z-root", PageID: "page1", ParentID: "page1", Type: "text", Text: "third", DisplayOrder: 2, CreatedTime: now, Alive: true, Source: "test", SyncedAt: now},
{ID: "a-child", PageID: "page1", ParentID: "a-root", Type: "text", Text: "second", DisplayOrder: 1, CreatedTime: now, Alive: true, Source: "test", SyncedAt: now},
{ID: "a-root", PageID: "page1", ParentID: "page1", Type: "text", Text: "first", DisplayOrder: 1, CreatedTime: now, Alive: true, Source: "test", SyncedAt: now},
}
for _, block := range blocks {
if err := st.UpsertBlock(ctx, block); err != nil {
t.Fatal(err)
}
}
got, err := st.PageBlocks(ctx, "page1")
if err != nil {
t.Fatal(err)
}
if len(got) != 3 || got[0].ID != "a-root" || got[1].ID != "a-child" || got[2].ID != "z-root" {
t.Fatalf("unexpected block tree order: %+v", got)
}
}
func TestStoreResolvesPageTeamThroughCollectionParent(t *testing.T) {
st, err := Open(filepath.Join(t.TempDir(), "notcrawl.db"))
if err != nil {

View File

@ -31,16 +31,6 @@ type Summary struct {
Columns int
}
type exportColumn struct {
Key string
Header string
}
type referenceLabels struct {
Users map[string]string
Pages map[string]string
}
func (e Exporter) Export(ctx context.Context, databaseID string, format Format, w io.Writer) (Summary, error) {
if e.Store == nil {
return Summary{}, fmt.Errorf("missing store")
@ -56,29 +46,21 @@ func (e Exporter) Export(ctx context.Context, databaseID string, format Format,
if err != nil {
return Summary{}, err
}
refs, err := e.referenceLabels(ctx)
if err != nil {
return Summary{}, err
}
columns := columnsFor(collection, pages)
headers := make([]string, 0, len(columns))
for _, col := range columns {
headers = append(headers, col.Header)
}
writer := csv.NewWriter(w)
if format == FormatTSV {
writer.Comma = '\t'
} else if format != "" && format != FormatCSV {
return Summary{}, fmt.Errorf("unsupported format %q", format)
}
if err := writer.Write(headers); err != nil {
if err := writer.Write(columns); err != nil {
return Summary{}, err
}
for _, page := range pages {
props := decodeMap(page.PropertiesJSON)
row := make([]string, 0, len(columns))
for _, col := range columns {
switch col.Key {
switch col {
case "page_id":
row = append(row, page.ID)
case "page_title":
@ -86,7 +68,7 @@ func (e Exporter) Export(ctx context.Context, databaseID string, format Format,
case "url":
row = append(row, page.URL)
default:
row = append(row, propertyValueText(props[col.Key], refs))
row = append(row, propertyValueText(props[col]))
}
}
if err := writer.Write(row); err != nil {
@ -100,95 +82,45 @@ func (e Exporter) Export(ctx context.Context, databaseID string, format Format,
return Summary{Database: collection.ID, Rows: len(pages), Columns: len(columns)}, nil
}
func (e Exporter) referenceLabels(ctx context.Context) (referenceLabels, error) {
users, err := e.Store.UserNames(ctx)
if err != nil {
return referenceLabels{}, err
}
pages, err := e.Store.PageTitles(ctx)
if err != nil {
return referenceLabels{}, err
}
return referenceLabels{Users: users, Pages: pages}, nil
}
func columnsFor(collection store.Collection, pages []store.Page) []exportColumn {
seenKeys := map[string]bool{"page_id": true, "page_title": true, "url": true}
seenHeaders := map[string]bool{"page_id": true, "page_title": true, "url": true}
cols := []exportColumn{
{Key: "page_id", Header: "page_id"},
{Key: "page_title", Header: "page_title"},
{Key: "url", Header: "url"},
}
for _, prop := range schemaProperties(collection.SchemaJSON) {
if !seenKeys[prop.Key] {
seenKeys[prop.Key] = true
prop.Header = uniqueHeader(prop.Header, prop.Key, seenHeaders)
cols = append(cols, prop)
func columnsFor(collection store.Collection, pages []store.Page) []string {
seen := map[string]bool{"page_id": true, "page_title": true, "url": true}
cols := []string{"page_id", "page_title", "url"}
for _, name := range schemaPropertyNames(collection.SchemaJSON) {
if !seen[name] {
seen[name] = true
cols = append(cols, name)
}
}
var extras []exportColumn
var extras []string
for _, page := range pages {
for key := range decodeMap(page.PropertiesJSON) {
if !seenKeys[key] {
seenKeys[key] = true
extras = append(extras, exportColumn{Key: key, Header: key})
for name := range decodeMap(page.PropertiesJSON) {
if !seen[name] {
seen[name] = true
extras = append(extras, name)
}
}
}
sort.Slice(extras, func(i, j int) bool {
return extras[i].Header < extras[j].Header
})
for i := range extras {
extras[i].Header = uniqueHeader(extras[i].Header, extras[i].Key, seenHeaders)
}
sort.Strings(extras)
return append(cols, extras...)
}
func schemaProperties(raw string) []exportColumn {
func schemaPropertyNames(raw string) []string {
props := decodeMap(raw)
var title []exportColumn
var rest []exportColumn
for key, value := range props {
var title []string
var rest []string
for name, value := range props {
m, ok := value.(map[string]any)
header := key
if ok {
if name, ok := m["name"].(string); ok && strings.TrimSpace(name) != "" {
header = name
}
}
prop := exportColumn{Key: key, Header: header}
if ok && m["type"] == "title" {
title = append(title, prop)
title = append(title, name)
continue
}
rest = append(rest, prop)
rest = append(rest, name)
}
sort.Slice(title, func(i, j int) bool {
return title[i].Header < title[j].Header
})
sort.Slice(rest, func(i, j int) bool {
return rest[i].Header < rest[j].Header
})
sort.Strings(title)
sort.Strings(rest)
return append(title, rest...)
}
func uniqueHeader(header, key string, seen map[string]bool) string {
if strings.TrimSpace(header) == "" {
header = key
}
if !seen[header] {
seen[header] = true
return header
}
disambiguated := header + " (" + key + ")"
for i := 2; seen[disambiguated]; i++ {
disambiguated = fmt.Sprintf("%s (%s %d)", header, key, i)
}
seen[disambiguated] = true
return disambiguated
}
func decodeMap(raw string) map[string]any {
out := map[string]any{}
if strings.TrimSpace(raw) == "" {
@ -198,10 +130,7 @@ func decodeMap(raw string) map[string]any {
return out
}
func propertyValueText(v any, refs referenceLabels) string {
if text, ok := desktopValueText(v, refs); ok {
return text
}
func propertyValueText(v any) string {
m, ok := v.(map[string]any)
if !ok {
return notiontext.Plain(v)
@ -232,11 +161,11 @@ func propertyValueText(v any, refs referenceLabels) string {
case "people", "files":
return joinNamed(m[typ])
case "relation":
return joinIDs(m[typ], refs)
return joinIDs(m[typ])
case "formula":
return formulaText(m["formula"], refs)
return formulaText(m["formula"])
case "rollup":
return rollupText(m["rollup"], refs)
return rollupText(m["rollup"])
case "created_by", "last_edited_by":
return namedObject(m[typ])
case "unique_id":
@ -245,111 +174,6 @@ func propertyValueText(v any, refs referenceLabels) string {
return notiontext.Plain(v)
}
func desktopValueText(v any, refs referenceLabels) (string, bool) {
text, ok := desktopPlain(v, refs)
if !ok {
return "", false
}
text = notiontext.Normalize(strings.ReplaceAll(text, " , ", ", "))
return text, true
}
func desktopPlain(v any, refs referenceLabels) (string, bool) {
switch x := v.(type) {
case nil:
return "", true
case string:
if x == "‣" {
return "", true
}
return x, true
case []any:
if len(x) == 0 {
return "", true
}
if marker, ok := x[0].(string); ok {
if marker == "‣" && len(x) > 1 {
return desktopRefListText(x[1], refs), true
}
if marker == "," {
return ",", true
}
if marker != "" {
return marker, true
}
}
parts := make([]string, 0, len(x))
handled := false
for _, item := range x {
text, ok := desktopPlain(item, refs)
if !ok {
return "", false
}
handled = true
if text != "" {
parts = append(parts, text)
}
}
return strings.Join(parts, " "), handled
default:
return "", false
}
}
func desktopRefListText(v any, refs referenceLabels) string {
items, ok := v.([]any)
if !ok {
return notiontext.Plain(v)
}
parts := make([]string, 0, len(items))
for _, item := range items {
if text := desktopRefText(item, refs); text != "" {
parts = append(parts, text)
}
}
return strings.Join(parts, " ")
}
func desktopRefText(v any, refs referenceLabels) string {
item, ok := v.([]any)
if !ok || len(item) == 0 {
return notiontext.Plain(v)
}
typ, _ := item[0].(string)
switch typ {
case ",":
return ","
case "u":
if id, ok := stringAt(item, 1); ok {
return labelOrID(refs.Users, id)
}
case "p":
if id, ok := stringAt(item, 1); ok {
return labelOrID(refs.Pages, id)
}
case "d":
if len(item) > 1 {
return dateText(item[1])
}
}
return notiontext.Plain(v)
}
func stringAt(items []any, index int) (string, bool) {
if index >= len(items) {
return "", false
}
s, ok := items[index].(string)
return s, ok
}
func labelOrID(labels map[string]string, id string) string {
if label := labels[id]; label != "" {
return label
}
return id
}
func namedObject(v any) string {
m, ok := v.(map[string]any)
if !ok {
@ -358,9 +182,6 @@ func namedObject(v any) string {
if name, ok := m["name"].(string); ok {
return name
}
if value, ok := m["value"].(string); ok {
return value
}
if id, ok := m["id"].(string); ok {
return id
}
@ -381,7 +202,7 @@ func joinNamed(v any) string {
return strings.Join(parts, ", ")
}
func joinIDs(v any, refs referenceLabels) string {
func joinIDs(v any) string {
items, ok := v.([]any)
if !ok {
return ""
@ -393,7 +214,7 @@ func joinIDs(v any, refs referenceLabels) string {
continue
}
if id, ok := m["id"].(string); ok {
parts = append(parts, labelOrID(refs.Pages, id))
parts = append(parts, id)
}
}
return strings.Join(parts, ", ")
@ -405,20 +226,14 @@ func dateText(v any) string {
return ""
}
start, _ := m["start"].(string)
if start == "" {
start, _ = m["start_date"].(string)
}
end, _ := m["end"].(string)
if end == "" {
end, _ = m["end_date"].(string)
}
if end != "" {
return start + "/" + end
}
return start
}
func formulaText(v any, refs referenceLabels) string {
func formulaText(v any) string {
m, ok := v.(map[string]any)
if !ok {
return ""
@ -437,13 +252,10 @@ func formulaText(v any, refs referenceLabels) string {
case "date":
return dateText(m["date"])
}
if text, ok := desktopValueText(v, refs); ok {
return text
}
return notiontext.Plain(v)
}
func rollupText(v any, refs referenceLabels) string {
func rollupText(v any) string {
m, ok := v.(map[string]any)
if !ok {
return ""
@ -458,15 +270,12 @@ func rollupText(v any, refs referenceLabels) string {
items, _ := m["array"].([]any)
parts := make([]string, 0, len(items))
for _, item := range items {
if text := propertyValueText(item, refs); text != "" {
if text := propertyValueText(item); text != "" {
parts = append(parts, text)
}
}
return strings.Join(parts, ", ")
}
if text, ok := desktopValueText(v, refs); ok {
return text
}
return notiontext.Plain(v)
}
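
Reviewer note: the `uniqueHeader` helper that appears in this file's diff keeps export column headers distinct when two schema properties share a display name. The sketch below copies its logic into a standalone program to show the resulting headers. The property keys `status_a` and `status_b` are made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// uniqueHeader returns header unchanged the first time it is seen and a
// key-qualified variant on later collisions, recording each result in seen.
func uniqueHeader(header, key string, seen map[string]bool) string {
	if strings.TrimSpace(header) == "" {
		header = key
	}
	if !seen[header] {
		seen[header] = true
		return header
	}
	disambiguated := header + " (" + key + ")"
	for i := 2; seen[disambiguated]; i++ {
		disambiguated = fmt.Sprintf("%s (%s %d)", header, key, i)
	}
	seen[disambiguated] = true
	return disambiguated
}

func main() {
	// The fixed export columns are reserved up front, as in columnsFor.
	seen := map[string]bool{"page_id": true, "page_title": true, "url": true}
	fmt.Println(uniqueHeader("Status", "status_a", seen)) // prints "Status"
	fmt.Println(uniqueHeader("Status", "status_b", seen)) // prints "Status (status_b)"
	fmt.Println(uniqueHeader("url", "url_prop", seen))    // prints "url (url_prop)"
}
```

The version on the other side of this diff uses the raw property name as both key and header, so it has no collisions to resolve but also cannot show a schema's human-readable `name`.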

View File

@ -20,22 +20,13 @@ func TestExportDatabaseTSV(t *testing.T) {
now := store.NowMS()
if err := st.UpsertCollection(ctx, store.Collection{
ID: "db1", Name: "Roadmap", Source: "test", SyncedAt: now,
SchemaJSON: `{"title":{"name":"Name","type":"title"},"assignee_id":{"name":"Assignee","type":"person"},"due_id":{"name":"Due","type":"date"},"status_id":{"name":"Status","type":"select"},"score_id":{"name":"Score","type":"number"}}`,
SchemaJSON: `{"Name":{"type":"title"},"Status":{"type":"select"},"Score":{"type":"number"}}`,
}); err != nil {
t.Fatal(err)
}
if err := st.UpsertUser(ctx, store.User{ID: "user1", Name: "Claire Pena", Source: "test", SyncedAt: now}); err != nil {
t.Fatal(err)
}
if err := st.UpsertPage(ctx, store.Page{
ID: "page1", CollectionID: "db1", Title: "Ship", URL: "https://example.com/ship", Alive: true, Source: "test", SyncedAt: now,
PropertiesJSON: `{"title":{"type":"title","title":[{"plain_text":"Ship"}]},"status_id":{"type":"select","select":{"name":"Done"}},"score_id":{"type":"number","number":7}}`,
}); err != nil {
t.Fatal(err)
}
if err := st.UpsertPage(ctx, store.Page{
ID: "page2", ParentID: "db1", ParentTable: "collection", Title: "Draft", URL: "https://example.com/draft", Alive: true, Source: "test", SyncedAt: now,
PropertiesJSON: `{"title":[["Draft"]],"assignee_id":[["‣",[["u","user1"]]]],"due_id":[["‣",[["d",{"type":"date","start_date":"2025-05-23"}]]]],"status_id":[["In progress"]],"score_id":[["3"]]}`,
PropertiesJSON: `{"Name":{"type":"title","title":[{"plain_text":"Ship"}]},"Status":{"type":"select","select":{"name":"Done"}},"Score":{"type":"number","number":7}}`,
}); err != nil {
t.Fatal(err)
}
@ -44,15 +35,11 @@ func TestExportDatabaseTSV(t *testing.T) {
if err != nil {
t.Fatal(err)
}
if s.Rows != 2 {
t.Fatalf("expected two rows, got %d", s.Rows)
if s.Rows != 1 {
t.Fatalf("expected one row, got %d", s.Rows)
}
got := out.String()
for _, want := range []string{
"page_id\tpage_title\turl\tName\tAssignee\tDue\tScore\tStatus",
"page1\tShip\thttps://example.com/ship\tShip\t\t\t7\tDone",
"page2\tDraft\thttps://example.com/draft\tDraft\tClaire Pena\t2025-05-23\t3\tIn progress",
} {
for _, want := range []string{"page_id\tpage_title\turl\tName\tScore\tStatus", "page1\tShip\thttps://example.com/ship\tShip\t7\tDone"} {
if !strings.Contains(got, want) {
t.Fatalf("missing %q in:\n%s", want, got)
}