From 5cee6499feeda698576d8d9f060d8d060fcc7c5f Mon Sep 17 00:00:00 2001 From: Peter Steinberger Date: Fri, 8 May 2026 16:01:45 +0100 Subject: [PATCH] docs: rewrite site gogcli-style with per-feature pages MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Drop the custom Jekyll layout, CSS, and JS in favor of GitHub Pages' default theme — same approach gogcli.sh uses. Replace the marketing landing page with a plain-markdown overview that mirrors gogcli's "try it / what it does / pick your path" structure. Add one focused page per feature: install, quickstart, visualizations, palettes, decoding, rendering, pipeline (spec), and CLI reference. Verify ffmpeg pipeline (f32le) and decoder coverage against the actual audio package. --- CHANGELOG.md | 1 + docs/_config.yml | 11 - docs/_layouts/default.html | 58 ----- docs/assets/css/site.css | 436 ------------------------------------- docs/assets/js/site.js | 20 -- docs/cli.md | 93 ++++++++ docs/decoding.md | 78 +++++++ docs/index.md | 146 ++++--------- docs/install.md | 68 ++++++ docs/palettes.md | 93 ++++++++ docs/quickstart.md | 74 +++++++ docs/rendering.md | 110 ++++++++++ docs/spec.md | 157 +++++++------ docs/visualizations.md | 120 ++++++++++ 14 files changed, 756 insertions(+), 709 deletions(-) delete mode 100644 docs/_config.yml delete mode 100644 docs/_layouts/default.html delete mode 100644 docs/assets/css/site.css delete mode 100644 docs/assets/js/site.js create mode 100644 docs/cli.md create mode 100644 docs/decoding.md create mode 100644 docs/install.md create mode 100644 docs/palettes.md create mode 100644 docs/quickstart.md create mode 100644 docs/rendering.md create mode 100644 docs/visualizations.md diff --git a/CHANGELOG.md b/CHANGELOG.md index 916196f..313be40 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -3,6 +3,7 @@ ## 0.1.1 - Unreleased - New Clawd style +- Docs: rewritten gogcli-style — plain-markdown pages for install, quickstart, visualizations, palettes, 
decoding, rendering, pipeline, and CLI; removed custom Jekyll theme so songsee.sh runs on the default GitHub Pages theme ## 0.1.0 - 2026-01-02 diff --git a/docs/_config.yml b/docs/_config.yml deleted file mode 100644 index 3b8b19d..0000000 --- a/docs/_config.yml +++ /dev/null @@ -1,11 +0,0 @@ -title: songsee -description: Generate modern spectrogram images from audio files. -url: "https://songsee.sh" -baseurl: "" -markdown: kramdown -permalink: pretty - -kramdown: - input: GFM - -collections: {} diff --git a/docs/_layouts/default.html b/docs/_layouts/default.html deleted file mode 100644 index 8bec79c..0000000 --- a/docs/_layouts/default.html +++ /dev/null @@ -1,58 +0,0 @@ - - - - - - {% if page.title %}{{ page.title }} | {% endif %}{{ site.title }} - - - - - - - - - - - - - - - - - - -
- {{ content }} -
- - - - - - diff --git a/docs/assets/css/site.css b/docs/assets/css/site.css deleted file mode 100644 index d136577..0000000 --- a/docs/assets/css/site.css +++ /dev/null @@ -1,436 +0,0 @@ -:root { - color-scheme: dark; - --bg: #0d0b14; - --bg-soft: #141022; - --bg-deep: #08070f; - --ink: #f4f2ff; - --muted: #b7b1c8; - --accent: #ffb347; - --accent-2: #2cf6f6; - --accent-3: #ff5da2; - --accent-4: #9d7bff; - --card: rgba(23, 18, 38, 0.72); - --stroke: rgba(255, 255, 255, 0.08); - --shadow: 0 20px 60px rgba(7, 6, 12, 0.65); - --mono: "JetBrains Mono", ui-monospace, SFMono-Regular, Menlo, monospace; - --sans: "Manrope", system-ui, -apple-system, sans-serif; - --display: "Fraunces", "Times New Roman", serif; -} - -* { - box-sizing: border-box; -} - -html { - scroll-behavior: smooth; -} - -body { - margin: 0; - font-family: var(--sans); - color: var(--ink); - background: radial-gradient(1200px 800px at 15% 10%, rgba(157, 123, 255, 0.25), transparent 55%), - radial-gradient(800px 600px at 85% 0%, rgba(44, 246, 246, 0.18), transparent 60%), - radial-gradient(900px 700px at 85% 80%, rgba(255, 179, 71, 0.16), transparent 65%), - linear-gradient(160deg, var(--bg) 0%, var(--bg-soft) 55%, var(--bg-deep) 100%); - min-height: 100vh; - overflow-x: hidden; -} - -a { - color: inherit; - text-decoration: none; -} - -a:hover { - color: var(--accent-2); -} - -.grain { - position: fixed; - inset: 0; - z-index: 0; - pointer-events: none; - opacity: 0.2; - mix-blend-mode: soft-light; - background-image: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' width='140' height='140' viewBox='0 0 140 140'%3E%3Cfilter id='n'%3E%3CfeTurbulence type='fractalNoise' baseFrequency='0.9' numOctaves='2' stitchTiles='stitch'/%3E%3C/filter%3E%3Crect width='140' height='140' filter='url(%23n)' opacity='0.35'/%3E%3C/svg%3E"); -} - -.orb { - position: fixed; - border-radius: 999px; - filter: blur(10px); - opacity: 0.55; - z-index: 0; - animation: drift 18s ease-in-out infinite alternate; -} - 
-.orb-a { - width: 420px; - height: 420px; - background: radial-gradient(circle at 30% 30%, rgba(255, 93, 162, 0.6), rgba(13, 11, 20, 0)); - top: -120px; - left: -80px; -} - -.orb-b { - width: 520px; - height: 520px; - background: radial-gradient(circle at 60% 40%, rgba(44, 246, 246, 0.4), rgba(13, 11, 20, 0)); - bottom: -220px; - right: -140px; - animation-delay: -6s; -} - -@keyframes drift { - from { - transform: translate3d(0, 0, 0) scale(0.95); - } - to { - transform: translate3d(40px, -20px, 0) scale(1.05); - } -} - -.nav { - position: sticky; - top: 0; - z-index: 10; - backdrop-filter: blur(14px); - background: rgba(10, 9, 16, 0.75); - border-bottom: 1px solid rgba(255, 255, 255, 0.06); -} - -.nav-inner { - max-width: 1100px; - margin: 0 auto; - display: flex; - align-items: center; - justify-content: space-between; - padding: 16px 24px; -} - -.logo { - display: inline-flex; - align-items: center; - gap: 12px; - font-family: var(--display); - font-size: 22px; -} - -.logo-mark { - width: 18px; - height: 18px; - border-radius: 6px; - background: conic-gradient(from 120deg, var(--accent), var(--accent-3), var(--accent-2), var(--accent)); - box-shadow: 0 0 18px rgba(255, 179, 71, 0.35); -} - -.nav-links { - display: flex; - gap: 20px; - font-size: 14px; - color: var(--muted); -} - -.content { - position: relative; - z-index: 1; -} - -.hero { - max-width: 1100px; - margin: 0 auto; - padding: 90px 24px 60px; - display: grid; - grid-template-columns: repeat(12, 1fr); - gap: 24px; -} - -.hero-copy { - grid-column: 1 / span 7; -} - -.hero-title { - font-family: var(--display); - font-size: clamp(42px, 6vw, 82px); - line-height: 0.95; - margin: 0 0 18px; -} - -.hero-sub { - font-size: 18px; - color: var(--muted); - max-width: 520px; - line-height: 1.6; -} - -.hero-actions { - display: flex; - flex-wrap: wrap; - gap: 14px; - margin-top: 26px; -} - -.btn { - padding: 12px 20px; - border-radius: 999px; - border: 1px solid rgba(255, 255, 255, 0.12); - font-weight: 600; - 
letter-spacing: 0.02em; - transition: transform 0.2s ease, box-shadow 0.2s ease, border 0.2s ease; -} - -.btn.primary { - background: linear-gradient(135deg, rgba(255, 179, 71, 0.95), rgba(255, 93, 162, 0.95)); - color: #130d1d; - box-shadow: 0 12px 30px rgba(255, 124, 99, 0.35); - border: none; -} - -.btn:hover { - transform: translateY(-2px); -} - -.hero-meta { - margin-top: 22px; - font-family: var(--mono); - font-size: 13px; - color: rgba(255, 255, 255, 0.45); -} - -.hero-visual { - grid-column: 8 / span 5; - position: relative; -} - -.spectral-panel { - height: 420px; - border-radius: 24px; - background: radial-gradient(200px 200px at var(--mx, 70%) var(--my, 30%), rgba(44, 246, 246, 0.25), transparent 70%), - linear-gradient(140deg, rgba(24, 16, 44, 0.9), rgba(12, 9, 22, 0.95)); - border: 1px solid var(--stroke); - box-shadow: var(--shadow); - overflow: hidden; - position: relative; -} - -.spectral-panel::before { - content: ""; - position: absolute; - inset: 30px 22px 90px 22px; - border-radius: 18px; - background: linear-gradient(90deg, #0a0a12, #0a0a12 30%, rgba(255, 179, 71, 0.9), rgba(44, 246, 246, 0.9), rgba(157, 123, 255, 0.9)); - filter: saturate(1.4); - animation: sweep 6s ease-in-out infinite alternate; -} - -.spectral-panel::after { - content: ""; - position: absolute; - inset: 0; - background: repeating-linear-gradient(0deg, rgba(255, 255, 255, 0.08), rgba(255, 255, 255, 0.08) 1px, transparent 1px, transparent 6px); - mix-blend-mode: screen; - opacity: 0.35; -} - -.spectral-caption { - position: absolute; - bottom: 22px; - left: 24px; - font-family: var(--mono); - font-size: 12px; - color: rgba(255, 255, 255, 0.6); -} - -@keyframes sweep { - 0% { - transform: translateX(-6%) scaleY(0.95); - } - 100% { - transform: translateX(6%) scaleY(1.02); - } -} - -.section { - max-width: 1100px; - margin: 0 auto; - padding: 30px 24px 70px; -} - -.section-title { - font-family: var(--display); - font-size: clamp(28px, 3vw, 42px); - margin: 0 0 14px; -} - 
-.section-sub { - color: var(--muted); - max-width: 680px; - line-height: 1.6; -} - -.feature-grid { - display: grid; - grid-template-columns: repeat(auto-fit, minmax(220px, 1fr)); - gap: 18px; - margin-top: 26px; -} - -.card { - background: var(--card); - border: 1px solid var(--stroke); - border-radius: 18px; - padding: 18px; - box-shadow: var(--shadow); -} - -.card h3 { - margin: 0 0 8px; - font-size: 18px; -} - -.card p { - margin: 0; - color: var(--muted); - line-height: 1.5; - font-size: 14px; -} - -.code-block { - background: rgba(8, 8, 14, 0.9); - border-radius: 16px; - padding: 18px 20px; - border: 1px solid rgba(255, 255, 255, 0.08); - font-family: var(--mono); - font-size: 13px; - line-height: 1.6; - overflow-x: auto; -} - -.kicker { - font-family: var(--mono); - letter-spacing: 0.16em; - text-transform: uppercase; - font-size: 11px; - color: rgba(255, 255, 255, 0.45); -} - -.palette-row { - display: grid; - grid-template-columns: repeat(auto-fit, minmax(160px, 1fr)); - gap: 12px; - margin-top: 18px; -} - -.palette { - height: 64px; - border-radius: 14px; - border: 1px solid rgba(255, 255, 255, 0.12); -} - -.palette.classic { - background: linear-gradient(90deg, #000000, #002060, #00a0c8, #ffb400, #ffffff); -} - -.palette.magma { - background: linear-gradient(90deg, #000004, #3b0c57, #b4367a, #fb8c3c, #fcfdbf); -} - -.palette.inferno { - background: linear-gradient(90deg, #000004, #3d0965, #bb3754, #f98e08, #fcffa4); -} - -.palette.viridis { - background: linear-gradient(90deg, #440154, #3a528b, #20908c, #5ec962, #fde725); -} - -.palette.gray { - background: linear-gradient(90deg, #000000, #ffffff); -} - -.domain-note { - margin-top: 18px; - padding: 12px 16px; - border-radius: 12px; - background: rgba(44, 246, 246, 0.08); - border: 1px solid rgba(44, 246, 246, 0.2); - color: rgba(255, 255, 255, 0.75); - font-size: 14px; -} - -.footer { - border-top: 1px solid rgba(255, 255, 255, 0.08); - padding: 30px 24px 40px; - background: rgba(7, 6, 12, 0.65); -} - 
-.footer-inner { - max-width: 1100px; - margin: 0 auto; - display: flex; - flex-wrap: wrap; - gap: 24px; - justify-content: space-between; - align-items: center; -} - -.footer-title { - font-family: var(--display); - font-size: 20px; -} - -.footer-sub { - color: var(--muted); - font-size: 13px; -} - -.footer-links { - display: flex; - gap: 16px; - font-size: 13px; - color: var(--muted); -} - -.reveal { - opacity: 0; - transform: translateY(16px); - animation: rise 0.8s ease forwards; -} - -.reveal.delay-1 { animation-delay: 0.1s; } -.reveal.delay-2 { animation-delay: 0.2s; } -.reveal.delay-3 { animation-delay: 0.3s; } -.reveal.delay-4 { animation-delay: 0.4s; } - -@keyframes rise { - to { - opacity: 1; - transform: translateY(0); - } -} - -@media (max-width: 900px) { - .hero { - grid-template-columns: 1fr; - } - - .hero-copy, - .hero-visual { - grid-column: auto; - } - - .spectral-panel { - height: 320px; - } - - .nav-links { - display: none; - } -} - -@media (prefers-reduced-motion: reduce) { - * { - animation: none !important; - transition: none !important; - } -} diff --git a/docs/assets/js/site.js b/docs/assets/js/site.js deleted file mode 100644 index f71598f..0000000 --- a/docs/assets/js/site.js +++ /dev/null @@ -1,20 +0,0 @@ -(() => { - const panels = document.querySelectorAll('.spectral-panel'); - if (!panels.length) return; - - const update = (panel, event) => { - const rect = panel.getBoundingClientRect(); - const x = Math.min(Math.max((event.clientX - rect.left) / rect.width, 0), 1); - const y = Math.min(Math.max((event.clientY - rect.top) / rect.height, 0), 1); - panel.style.setProperty('--mx', `${(x * 100).toFixed(1)}%`); - panel.style.setProperty('--my', `${(y * 100).toFixed(1)}%`); - }; - - panels.forEach((panel) => { - panel.addEventListener('pointermove', (event) => update(panel, event)); - panel.addEventListener('pointerleave', () => { - panel.style.removeProperty('--mx'); - panel.style.removeProperty('--my'); - }); - }); -})(); diff --git 
a/docs/cli.md b/docs/cli.md new file mode 100644 index 0000000..474eca6 --- /dev/null +++ b/docs/cli.md @@ -0,0 +1,93 @@ +--- +title: CLI +description: "Every songsee flag with its default and accepted values." +--- + +# CLI + +```text +songsee [flags] +``` + +`` is a file path or `-` for stdin. + +## Inputs and output + +| flag | type | default | description | +|------|------|---------|-------------| +| `` | string (positional) | required | File path, or `-` to read encoded audio from stdin. | +| `-o`, `--output` | string | input name + extension | Output path. `-` writes the encoded image to stdout. | +| `--format` | `jpg` \| `png` | `jpg` | Output encoder. JPEG is quality 95; PNG is lossless. | +| `--width` | int | `1920` | Output width in pixels. | +| `--height` | int | `1080` | Output height in pixels. | +| `-q`, `--quiet` | bool | `false` | Suppress the stdout output-path echo. | +| `-v`, `--verbose` | bool | `false` | Print decode and slice info to stderr. | +| `--version` | bool | — | Print version and exit. | + +## FFT and windowing + +| flag | type | default | description | +|------|------|---------|-------------| +| `--window` | int | `2048` | FFT window size in samples. Must be a power of two. | +| `--hop` | int | `512` | Hop size in samples between frames. | +| `--min-freq` | float (Hz) | `0` | Lower bound of the visible frequency band. | +| `--max-freq` | float (Hz) | Nyquist | Upper bound of the visible frequency band. Must exceed `--min-freq`. | + +## Slicing + +| flag | type | default | description | +|------|------|---------|-------------| +| `--start` | float (s) | `0` | Skip this many seconds from the start of the input. | +| `--duration` | float (s) | `0` (full) | Render only this many seconds after `--start`. 
| + +## Visualization + +| flag | type | default | description | +|------|------|---------|-------------| +| `--viz` | repeated string list | `spectrogram` | One or more of: `spectrogram`, `mel`, `chroma`, `hpss`, `selfsim`, `loudness`, `tempogram`, `mfcc`, `flux`. Repeatable or comma-separated. | +| `--style` | string | `classic` | Palette name: `classic`, `magma`, `inferno`, `viridis`, `gray` (alias `grey`), `clawd`. | + +## Decoding + +| flag | type | default | description | +|------|------|---------|-------------| +| `--sample-rate` | int | `44100` | Sample rate requested from the ffmpeg fallback. Native WAV/MP3 keep the file's rate. | +| `--ffmpeg` | string | first `ffmpeg` on `PATH` | Override the ffmpeg binary used for non-WAV/MP3 inputs. | + +## Exit codes + +| code | meaning | +|------|---------| +| `0` | Render succeeded. | +| `1` | Decode, render, or write error (message on stderr). | +| `2` | Usage error — bad flag, invalid combination, or missing input. | + +## Examples + +```bash +# All defaults. +songsee track.mp3 + +# Mel + chroma in viridis at 2K. +songsee track.mp3 --viz mel,chroma --style viridis --width 2048 --height 1024 + +# Eight-second slice starting at 12.5s, written to PNG. +songsee track.mp3 --start 12.5 --duration 8 -o slice.png + +# Stream from stdin, encode to PNG, write to stdout. +cat track.mp3 | songsee - --format png -o - > spectro.png + +# Custom FFT, sub-bass focus. +songsee track.mp3 --window 4096 --hop 1024 --min-freq 20 --max-freq 200 + +# Pin a specific ffmpeg. +songsee weird.opus --ffmpeg /opt/homebrew/bin/ffmpeg +``` + +## Related pages + +- [Quickstart](quickstart.md) — first render in under a minute. +- [Visualizations](visualizations.md) — when to use each viz mode. +- [Palettes](palettes.md) — palette gradient stops. +- [Decoding](decoding.md) — input formats and ffmpeg fallback. +- [Rendering](rendering.md) — output sizing, format, batch use. 
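The constraints stated in the `docs/cli.md` tables above (`--window` must be a power of two, `--max-freq` must exceed `--min-freq`, usage errors exit with code `2`) can be sketched in Go. This is a minimal illustration under stated assumptions, not songsee's actual code: `usageError`, `validateFlags`, and `exitCodeFor` are hypothetical names, and treating a zero `--max-freq` as "use Nyquist" is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"math/bits"
)

// usageError marks problems that map to exit code 2.
// Hypothetical type; songsee's real error handling may differ.
type usageError struct{ msg string }

func (e usageError) Error() string { return e.msg }

// isPowerOfTwo reports whether n is a positive power of two.
func isPowerOfTwo(n int) bool {
	return n > 0 && bits.OnesCount(uint(n)) == 1
}

// validateFlags checks the two constraints the CLI page documents.
// Assumption: maxFreq == 0 means "not set, fall back to Nyquist".
func validateFlags(window int, minFreq, maxFreq float64) error {
	if !isPowerOfTwo(window) {
		return usageError{fmt.Sprintf("--window %d is not a power of two", window)}
	}
	if maxFreq != 0 && maxFreq <= minFreq {
		return usageError{fmt.Sprintf("--max-freq %g must exceed --min-freq %g", maxFreq, minFreq)}
	}
	return nil
}

// exitCodeFor maps an error to the documented exit codes:
// 0 for success, 2 for usage errors, 1 for everything else.
func exitCodeFor(err error) int {
	var ue usageError
	switch {
	case err == nil:
		return 0
	case errors.As(err, &ue):
		return 2
	default:
		return 1
	}
}

func main() {
	for _, w := range []int{2048, 3000} {
		err := validateFlags(w, 20, 200)
		fmt.Printf("window=%d exit=%d err=%v\n", w, exitCodeFor(err), err)
	}
}
```

Keeping usage errors in their own type lets the caller pick the exit code in one place instead of scattering `os.Exit` calls through the flag parsing.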
diff --git a/docs/decoding.md b/docs/decoding.md new file mode 100644 index 0000000..2930ca7 --- /dev/null +++ b/docs/decoding.md @@ -0,0 +1,78 @@ +--- +title: Decoding +description: "How songsee decodes audio — native WAV/MP3 paths, ffmpeg fallback, sample rate, stdin, and slicing." +--- + +# Decoding + +songsee turns the input into mono `float64` samples before any analysis runs. Two fast paths cover most files; everything else falls through to ffmpeg. + +## Inputs + +- **File path.** `songsee track.mp3` — any path the OS can open. +- **Stdin.** `songsee -` — reads the encoded stream from stdin. Useful behind `cat`, `curl`, or shell pipelines. +- **Mono mixdown.** Stereo or multichannel inputs are averaged to mono before windowing. + +## Native WAV + +Pure-Go WAV decoder. Handles: + +- PCM 8/16/24/32-bit integer +- 32-bit float, 64-bit float +- WAVE_FORMAT_EXTENSIBLE (with channel masks and sub-format GUIDs) + +No external dependency, no ffmpeg roundtrip. The decoder validates the RIFF header and rejects truncated `data` chunks before allocating sample buffers. + +## Native MP3 + +Pure-Go MP3 decoder. Handles MPEG-1/2 Layer III with VBR and CBR. Output sample rate is whatever the file declares; songsee does not resample. + +If the decoder hits a malformed frame it surfaces a structured error instead of silently truncating, so corrupt input fails loudly. + +## ffmpeg fallback + +Anything that isn't WAV or MP3 — FLAC, AAC, M4A, OGG, Opus, video containers, raw streams — is decoded by spawning `ffmpeg`. songsee asks for 32-bit float little-endian mono at the configured sample rate: + +```text +ffmpeg -hide_banner -loglevel error \ + -i -f f32le -ac 1 -ar - +``` + +Tweak the pipeline with: + +- `--sample-rate N` — output sample rate fed to ffmpeg (default `44100`). +- `--ffmpeg /path/to/ffmpeg` — override the binary lookup. + +If ffmpeg isn't on `PATH` and the file isn't WAV or MP3, songsee fails with a clear error. 
Install with `brew install ffmpeg` or your distro's package manager. + +## Slicing + +`--start` and `--duration` slice the decoded audio before analysis. Both are seconds (float). Negative values are rejected. + +```bash +songsee long.mp3 --start 60 --duration 15 -o minute1.jpg +songsee long.mp3 --start 60 # 60s to end +songsee long.mp3 --duration 30 # first 30s +``` + +Slicing happens on samples after decoding; FFT framing then runs on the slice. + +## Sample rate notes + +- Native WAV/MP3 keep the file's sample rate. `--sample-rate` only affects the ffmpeg pipeline. +- The Nyquist frequency (`sampleRate / 2`) is the upper bound visible in the spectrogram. +- At 44.1 kHz / 48 kHz, the default `--window 2048 --hop 512` advances ≈11.6 ms / ≈10.7 ms per frame, and each frame spans ≈46 ms / ≈43 ms. + +## Verbose decoding + +```bash +songsee track.flac --verbose -o out.png +``` + +`--verbose` prints decode info to stderr — sample count, sample rate, slice bounds — without polluting stdout. Combine with `--quiet` to suppress the trailing output-path echo when piping. + +## Related pages + +- [Install](install.md) — how to add ffmpeg if you need it. +- [Pipeline](spec.md) — what happens to the samples after decoding. +- [CLI](cli.md) — every flag with its default. diff --git a/docs/index.md b/docs/index.md index 62e93fc..f09e7c3 100644 --- a/docs/index.md +++ b/docs/index.md @@ -1,113 +1,51 @@ --- -layout: default -title: Home -description: Generate modern spectrogram images from audio files with a fast, scriptable CLI. -body_class: home +title: Overview +permalink: / +description: "songsee is a single Go CLI that turns audio into modern spectrogram and feature-panel images — fast WAV/MP3 decode, ffmpeg fallback, nine visualization modes, six palettes." --- -
-
-
Spectral imaging CLI
-

See sound as living color.

-

- songsee turns audio into precise, high-resolution spectrograms and feature panels. Fast decode - paths for WAV and MP3, ffmpeg fallback for everything else, and palette styles that make science - look cinematic. -

-
- Install - GitHub -
-
Hann window. Log magnitude. 2048 / 512 defaults.
-
-
- -
-
+## Try it -
-
Why songsee
-

A focused pipeline for modern spectrograms.

-

- Decode audio into mono samples, window it with Hann, run FFT, and render log-magnitude frames into - a crisp image. The CLI stays small, reliable, and scriptable. -

+After [installing](install.md), every render is a one-liner. -
-
-

Precise controls

-

Window, hop, min/max frequency, output dimensions, and time slicing for exact framing.

-
-
-

Fast decode paths

-

Native WAV/MP3 decoding with ffmpeg fallback for everything else.

-
-
-

Palette styles

-

classic, magma, inferno, viridis, and gray for a bold spectral aesthetic.

-
-
-

Feature panels

-

mel, chroma, hpss, selfsim, loudness, tempogram, mfcc, flux — rendered as single or grid views.

-
-
-

Auto-contrast

-

Percentile clamping keeps every panel readable without manual tuning.

-
-
-

Clean output

-

JPEG or PNG output, default quality 95, and stable results for batch workflows.

-
-
-
+```bash +# Default: a clean spectrogram next to the input file. +songsee track.mp3 -
-
Install
-

One command. Instant spectrograms.

-
- brew install steipete/tap/songsee - go install github.com/steipete/songsee/cmd/songsee@latest -
-
- songsee.ai, songsee.app, and songsee.dev all redirect to songsee.sh. -
-
+# Mel spectrogram, magma palette, 2K wide. +songsee track.mp3 --viz mel --style magma --width 2048 --height 1024 -
-
Usage
-

CLI ready for pipes, batches, and automation.

-
- songsee track.mp3 - songsee track.wav --style magma --width 2048 --height 1024 -o spectro.png - cat track.mp3 | songsee - --style gray --format png - songsee track.mp3 --start 12.5 --duration 8 --output slice.jpg - songsee track.mp3 --viz spectrogram,mel,chroma --width 2048 --height 1024 -
-
+# All nine modes in one grid. +songsee track.mp3 --viz spectrogram,mel,chroma,hpss,selfsim,loudness,tempogram,mfcc,flux -
-
Palettes
-

Color maps with character.

-

Pick a palette by name for instant visual tone shifts.

-
-
-
-
-
-
-
🦞
-
-
+# Slice eight seconds out of a long file. +songsee track.mp3 --start 12.5 --duration 8 -o slice.jpg -
-
Specs
-

Detailed pipeline notes.

-

- Windowing, bin mapping, normalization, and rendering details live in the spec. -

-
- Read spec -
-
+# Pipe from stdin, write PNG to stdout. +cat track.mp3 | songsee - --format png -o - > spectro.png +``` + +The default output is a 1920×1080 JPEG (quality 95) written next to the input. `--format png` switches encoder, `-o` redirects the path, and `-o -` streams to stdout for piping. + +## What songsee does + +- **One binary, nine views.** spectrogram, mel, chroma, hpss, selfsim, loudness, tempogram, mfcc, flux — pick one, combine several, or render the full grid. +- **Fast decode paths.** Native Go decoders for WAV (PCM, float, extensible) and MP3; ffmpeg fallback covers everything else. +- **Six palettes.** classic, magma, inferno, viridis, gray, and clawd 🦞 — each tuned for log-magnitude data. +- **Auto-contrast.** Per-panel percentile clamping (0.05 / 0.98) keeps every visualization readable without manual tuning. +- **Scriptable I/O.** File path, stdin (`-`), or stdout. Quiet mode for CI; verbose mode prints decode and slice details to stderr. +- **No Python.** Single static binary. No model files, no virtualenv, no GPU. + +## Pick your path + +- **Trying it.** [Install](install.md) → [Quickstart](quickstart.md). One brew formula, one command, one image. +- **Picking a view.** [Visualizations](visualizations.md) describes what each of the nine modes shows and when to use it. +- **Picking a palette.** [Palettes](palettes.md) lists the six palettes with their gradient stops. +- **Audio inputs.** [Decoding](decoding.md) covers WAV/MP3 fast paths, ffmpeg fallback, sample rate, and stdin. +- **Output and batches.** [Rendering](rendering.md) explains output sizing, grid layout, format selection, and stdout streaming. +- **Algorithm details.** [Pipeline](spec.md) documents windowing, FFT, bin mapping, and normalization. +- **Flag reference.** [CLI](cli.md) lists every flag with its default. + +## Project + +Active development; the [changelog](https://github.com/openclaw/songsee/blob/main/CHANGELOG.md) tracks what shipped. 
Released under the [MIT license](https://github.com/openclaw/songsee/blob/main/LICENSE). Source on [GitHub](https://github.com/openclaw/songsee). diff --git a/docs/install.md b/docs/install.md new file mode 100644 index 0000000..613174c --- /dev/null +++ b/docs/install.md @@ -0,0 +1,68 @@ +--- +title: Install +description: "Install songsee via Homebrew, go install, or build from source." +--- + +# Install + +`songsee` ships as a single Go binary. Pick whichever delivery mechanism fits. + +## Homebrew (macOS, Linux) + +```bash +brew install steipete/tap/songsee +songsee --version +``` + +The formula lives in [`steipete/homebrew-tap`](https://github.com/steipete/homebrew-tap). `brew upgrade songsee` brings in new releases. + +## go install + +```bash +go install github.com/steipete/songsee/cmd/songsee@latest +songsee --version +``` + +This builds against the Go version declared in `go.mod`. The binary lands in `$(go env GOBIN)` (or `$(go env GOPATH)/bin`). + +## Build from source + +```bash +git clone https://github.com/openclaw/songsee.git +cd songsee +make +./songsee --version +``` + +`make` runs `go build` with the version string injected from `git describe`. + +## ffmpeg (optional) + +WAV and MP3 decode natively in pure Go. Anything else (FLAC, AAC, OGG, M4A, video containers) falls through to `ffmpeg` on `PATH`. + +```bash +brew install ffmpeg # macOS / Linuxbrew +apt install ffmpeg # Debian / Ubuntu +``` + +Override the lookup with `--ffmpeg /custom/path/ffmpeg` when you have several builds installed. + +## Verify the install + +```bash +songsee --version +songsee --help +songsee testdata/short.wav # render a tiny known-good file +``` + +## Updating + +- **Homebrew:** `brew upgrade songsee`. +- **go install:** rerun `go install github.com/steipete/songsee/cmd/songsee@latest`. +- **Source:** `git pull && make` — version comes from `git describe`. + +## Related pages + +- [Quickstart](quickstart.md) — first render in under a minute. 
+- [Decoding](decoding.md) — WAV/MP3 fast paths and ffmpeg fallback. +- [CLI](cli.md) — every flag with its default. diff --git a/docs/palettes.md b/docs/palettes.md new file mode 100644 index 0000000..3aa245b --- /dev/null +++ b/docs/palettes.md @@ -0,0 +1,93 @@ +--- +title: Palettes +description: "Six built-in color maps for songsee: classic, magma, inferno, viridis, gray, clawd." +--- + +# Palettes + +`--style` picks a palette. Each palette is a linear gradient of two to six stops (five for most, two for `gray`, six for `clawd`) applied to normalized values in `[0, 1]`. The default is `classic`. + +```bash +songsee track.mp3 --style magma +songsee track.mp3 --viz mel --style viridis +songsee track.mp3 --viz hpss,chroma --style clawd +``` + +Unknown names error out before decoding. All palettes are deterministic — the same input always produces the same colors. + +## classic + +The default. A black → navy → cyan → amber → white sweep tuned for log-magnitude data, with strong perceptual contrast across the full range. + +| stop | color | +|------|-------| +| 0.00 | `#000000` | +| 0.20 | `#002060` | +| 0.45 | `#00a0c8` | +| 0.70 | `#ffb400` | +| 1.00 | `#ffffff` | + +## magma + +Matplotlib's magma. Black → deep purple → magenta → orange → cream. Smooth and perceptually uniform; works well for everything from spectrograms to MFCCs. + +| stop | color | +|------|-------| +| 0.00 | `#000004` | +| 0.25 | `#3b0c57` | +| 0.50 | `#b4367a` | +| 0.75 | `#fb8c3c` | +| 1.00 | `#fcfdbf` | + +## inferno + +Matplotlib's inferno. Same shape as magma with hotter highs — black → indigo → red → orange → pale yellow. + +| stop | color | +|------|-------| +| 0.00 | `#000004` | +| 0.25 | `#3d0965` | +| 0.50 | `#bb3754` | +| 0.75 | `#f98e08` | +| 1.00 | `#fcffa4` | + +## viridis + +Matplotlib's viridis. Purple → blue → teal → green → yellow. Colorblind-safe and perceptually uniform; the safest choice for publication figures. 
+ +| stop | color | +|------|-------| +| 0.00 | `#440154` | +| 0.25 | `#3a528b` | +| 0.50 | `#20908c` | +| 0.75 | `#5ec962` | +| 1.00 | `#fde725` | + +## gray + +A straight black-to-white linear ramp. Ideal for print, monochrome compositing, or downstream processing that doesn't want hue information. + +| stop | color | +|------|-------| +| 0.00 | `#000000` | +| 1.00 | `#ffffff` | + +`grey` is accepted as an alias. + +## clawd 🦞 + +The mascot palette. Abyssal navy → ocean teal → coral → lobster red → foam highlight. Six stops, designed to be unmistakable. + +| stop | color | +|------|-------| +| 0.00 | `#02040f` | +| 0.20 | `#0b264a` | +| 0.40 | `#126175` | +| 0.60 | `#c1625c` | +| 0.80 | `#cd3728` | +| 1.00 | `#ffe6d2` | + +## Related pages + +- [Visualizations](visualizations.md) — the nine viz modes that consume these palettes. +- [Pipeline](spec.md) — how values are normalized before palette mapping. diff --git a/docs/quickstart.md b/docs/quickstart.md new file mode 100644 index 0000000..d0a8b16 --- /dev/null +++ b/docs/quickstart.md @@ -0,0 +1,74 @@ +--- +title: Quickstart +description: "From a clean machine to a 1920×1080 spectrogram in under a minute." +--- + +# Quickstart + +One install, one command, one image. + +## 1. Install + +```bash +brew install steipete/tap/songsee +songsee --version +``` + +Other paths (go install, source builds, ffmpeg) live on [Install](install.md). + +## 2. Render a default spectrogram + +```bash +songsee track.mp3 +``` + +Output is a 1920×1080 JPEG (quality 95) written next to the input — `track.jpg`. The file path is echoed to stdout; everything else (decode info under `--verbose`, warnings, errors) goes to stderr so pipes stay clean. + +## 3. Pick a different view + +```bash +# Mel-scaled spectrogram (perceptual frequency). +songsee track.mp3 --viz mel + +# Chromagram (12-bin pitch class) with the magma palette. +songsee track.mp3 --viz chroma --style magma + +# Combine harmonic/percussive split with chroma in a single grid. 
+songsee track.mp3 --viz hpss,chroma --style inferno +``` + +The full list of views lives on [Visualizations](visualizations.md); palettes are documented on [Palettes](palettes.md). + +## 4. Slice a section + +```bash +songsee track.mp3 --start 12.5 --duration 8 -o slice.jpg +``` + +`--start` and `--duration` are seconds. `-o` overrides the default output path; pass `-o -` to write the encoded image to stdout. + +## 5. Stream from stdin + +```bash +cat track.mp3 | songsee - --format png -o - > spectro.png +``` + +`-` as the input reads from stdin; `--format png` switches the encoder; `-o -` writes the encoded image to stdout. Combine with `find`, `xargs`, or shell loops for batch rendering. + +## 6. Tune dimensions and FFT + +```bash +songsee track.mp3 \ + --width 2560 --height 1440 \ + --window 4096 --hop 1024 \ + --min-freq 50 --max-freq 8000 +``` + +`--window` must be a power of two. Larger windows trade time resolution for frequency resolution. `--min-freq` / `--max-freq` clamp the visible frequency band; the default upper bound is the Nyquist frequency. + +## Where next + +- [Visualizations](visualizations.md) — what each of the nine modes shows. +- [Palettes](palettes.md) — gradient stops for the six built-in palettes. +- [Pipeline](spec.md) — windowing, FFT, bin mapping, normalization. +- [CLI](cli.md) — every flag with its default. diff --git a/docs/rendering.md b/docs/rendering.md new file mode 100644 index 0000000..a27ed89 --- /dev/null +++ b/docs/rendering.md @@ -0,0 +1,110 @@ +--- +title: Rendering +description: "How songsee turns spectrogram data into images — output sizing, format, grid layout, stdout streaming, and batch use." +--- + +# Rendering + +The render stage maps numeric spectrogram and feature data onto pixels, applies the chosen palette, composes panels into a grid, and encodes the result to JPEG or PNG. 
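The "applies the chosen palette" step in the paragraph above can be sketched in Go. The stop positions and colors are the `classic` values listed in `docs/palettes.md`; everything else is an assumption for illustration: the names `stop`, `lerp8`, and `colorAt` are hypothetical, and interpolating linearly per RGB channel is one plausible reading of "linear gradient", not a confirmed detail of songsee's renderer.

```go
package main

import (
	"fmt"
	"image/color"
)

// stop is one gradient stop: a position in [0, 1] and its color.
type stop struct {
	pos float64
	c   color.RGBA
}

// classic mirrors the stop table for the classic palette.
var classic = []stop{
	{0.00, color.RGBA{0x00, 0x00, 0x00, 0xff}},
	{0.20, color.RGBA{0x00, 0x20, 0x60, 0xff}},
	{0.45, color.RGBA{0x00, 0xa0, 0xc8, 0xff}},
	{0.70, color.RGBA{0xff, 0xb4, 0x00, 0xff}},
	{1.00, color.RGBA{0xff, 0xff, 0xff, 0xff}},
}

// lerp8 interpolates between two channel values, rounding to nearest.
func lerp8(a, b uint8, t float64) uint8 {
	return uint8(float64(a) + (float64(b)-float64(a))*t + 0.5)
}

// colorAt maps a normalized value to a color by finding the surrounding
// pair of stops and interpolating each channel (assumed behavior).
// Values outside [0, 1] clamp to the first or last stop.
func colorAt(stops []stop, v float64) color.RGBA {
	if v <= stops[0].pos {
		return stops[0].c
	}
	for i := 1; i < len(stops); i++ {
		if v <= stops[i].pos {
			lo, hi := stops[i-1], stops[i]
			t := (v - lo.pos) / (hi.pos - lo.pos)
			return color.RGBA{
				R: lerp8(lo.c.R, hi.c.R, t),
				G: lerp8(lo.c.G, hi.c.G, t),
				B: lerp8(lo.c.B, hi.c.B, t),
				A: 0xff,
			}
		}
	}
	return stops[len(stops)-1].c
}

func main() {
	for _, v := range []float64{0, 0.2, 0.45, 0.7, 1} {
		c := colorAt(classic, v)
		fmt.Printf("%.2f -> #%02x%02x%02x\n", v, c.R, c.G, c.B)
	}
}
```

Because the lookup depends only on the stop table and the input value, the same normalized value always yields the same color, which matches the determinism the palettes page describes.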
+ +## Output path + +```bash +songsee track.mp3 # writes track.jpg next to the input +songsee track.mp3 -o out.png # explicit path; format inferred from extension +songsee track.mp3 -o spectro # no extension; appends ".jpg" by default +songsee - -o - # stdin in, encoded image to stdout +``` + +If `--format` is set explicitly, it overrides extension-based inference. If `-o` already ends in `.png`, `.jpg`, or `.jpeg`, the encoder follows the extension regardless of `--format`. + +When the input is `-` (stdin) and no `-o` is given, the output filename is `songsee.jpg` (or `.png`) in the current directory. + +## Format + +```bash +songsee track.mp3 --format png # PNG, lossless +songsee track.mp3 --format jpg # JPEG, quality 95 (default) +``` + +JPEG quality is fixed at 95 — high enough that compression artifacts disappear at typical viewing sizes while keeping file sizes reasonable. PNG is the right choice for archival, transparency, or downstream processing that doesn't tolerate JPEG quantization. + +## Dimensions + +```bash +songsee track.mp3 --width 2560 --height 1440 +songsee track.mp3 --width 3840 --height 2160 # 4K +songsee track.mp3 --width 800 --height 200 # banner strip +``` + +Defaults are 1920×1080. Both must be positive. With multiple visualizations, songsee divides the canvas into a grid; very small canvases with many panels can leave cells too small to render and produce an error. + +## Grid layout + +When `--viz` selects more than one mode, panels are tiled into a `ceil(sqrt(n))`-column grid with an 8 px gap, sized to fit `--width` × `--height` exactly: + +| panels | grid | +|--------|------| +| 1 | 1×1 | +| 2 | 2×1 | +| 3, 4 | 2×2 | +| 5–6 | 3×2 | +| 7–9 | 3×3 | + +Cells are equal width and height; the canvas is filled top-down, left-right, in the order panels appear in `--viz`. + +## Frequency range + +`--min-freq` and `--max-freq` clamp the visible band (Hz) for spectrogram, mel, and MFCC panels. 
The default upper bound is the Nyquist frequency (`sampleRate / 2`). + +```bash +# Vocal range, 80 Hz – 4 kHz. +songsee track.mp3 --min-freq 80 --max-freq 4000 + +# Sub-bass focus. +songsee track.mp3 --min-freq 20 --max-freq 200 +``` + +`--max-freq` must be greater than `--min-freq`; songsee rejects the run with exit code 2 otherwise. + +## Auto-contrast + +Every panel runs an independent percentile clamp on its values before palette mapping (typically 0.05 / 0.98 — ~5% black floor, ~2% white ceiling). This keeps a quiet ambient track and a loud rock track equally readable in the same grid; it also means absolute brightness is not comparable across panels. + +The base spectrogram converts magnitudes to decibels (`20·log10(mag + 1e-9)`) before normalizing. + +## Stdout streaming + +Pass `-o -` to write the encoded image bytes to stdout. Combine with `--quiet` to silence the trailing path echo: + +```bash +songsee track.mp3 -o - --quiet > spectro.jpg +songsee track.mp3 -o - --format png --quiet | imgcat +ssh host "songsee /audio/x.flac -o - --quiet" > x.jpg +``` + +Verbose decode info still goes to stderr under `--verbose`, so it doesn't corrupt the binary stream on stdout. + +## Batch usage + +There's no built-in `songsee batch`; lean on the shell. + +```bash +# All MP3s in a directory. +for f in *.mp3; do songsee "$f" --style magma; done + +# Parallel via xargs (NUL-delimited so unusual filenames survive). +printf '%s\0' *.mp3 | xargs -0 -P 8 -I{} songsee {} --width 1920 --height 540 + +# find + GNU parallel. +find . -name '*.flac' -print0 | parallel -0 songsee {} --style viridis -o {.}.png +``` + +songsee is single-threaded internally, so parallelism comes from running multiple processes. + +## Related pages + +- [Visualizations](visualizations.md) — what each panel shows. +- [Palettes](palettes.md) — color maps applied at render time. +- [Pipeline](spec.md) — windowing, FFT, normalization details. +- [CLI](cli.md) — every flag with its default. 
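As a rough illustration of the rule in the Grid layout section, the Python sketch below derives columns, rows, and per-cell size for `n` panels. It is not songsee's Go implementation. The function name `grid_layout`, the row formula `ceil(n / cols)`, and the way the 8 px gap is subtracted from the canvas are assumptions chosen to reproduce the documented table.

```python
import math

GAP_PX = 8  # gap between cells, as stated in the Grid layout section


def grid_layout(n, width=1920, height=1080, gap=GAP_PX):
    """Return (cols, rows, cell_w, cell_h) for n panels.

    Assumptions (not taken from songsee's source): rows = ceil(n / cols),
    gaps sit only between cells, and cell sizes are rounded down.
    """
    if n < 1:
        raise ValueError("need at least one panel")
    cols = math.isqrt(n - 1) + 1      # integer form of ceil(sqrt(n)) for n >= 1
    rows = -(-n // cols)              # ceil(n / cols) without floats
    cell_w = (width - gap * (cols - 1)) // cols
    cell_h = (height - gap * (rows - 1)) // rows
    if cell_w < 1 or cell_h < 1:
        raise ValueError("canvas too small for this many panels")
    return cols, rows, cell_w, cell_h


if __name__ == "__main__":
    for panels in range(1, 10):
        print(panels, grid_layout(panels))
```

Under these assumptions, a tiny canvas with many panels raises an error rather than producing empty cells, which mirrors the failure mode described in the Dimensions section.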
diff --git a/docs/spec.md b/docs/spec.md index cf1140d..56551a3 100644 --- a/docs/spec.md +++ b/docs/spec.md @@ -1,89 +1,86 @@ --- -layout: default -title: Spec -description: songsee spectral pipeline, defaults, and rendering details. +title: Pipeline +description: "songsee's spectral pipeline — decode, window, FFT, bin mapping, normalization, render." --- -
-
Spec
-

songsee spectral pipeline

-

- This page captures the core algorithm and defaults used by songsee for repeatable, high quality - spectrogram images. -

-
+# Pipeline -
-

Decode

-
-

- WAV and MP3 decode natively. Any other format falls back to ffmpeg. Input can be a file path or - stdin ("-"). Default sample rate for ffmpeg output is 44100 Hz. -

-
-
+This page documents the algorithm and the defaults songsee uses to produce repeatable, high-quality images. It complements [Visualizations](visualizations.md) (what each mode shows) and [Rendering](rendering.md) (how the canvas gets composed). -
-

Spectrogram

-
-

- Windowed frames use a Hann window. FFT runs on each frame and the magnitude is converted to - decibels using 20 * log10(mag + 1e-9). The default window size is 2048 samples with a hop size - of 512 samples. -

-

- Frames are computed as 1 + (len(samples) - window + hop - 1) / hop, and bins are window/2 + 1. - Bin spacing is sampleRate / windowSize. -

-
-
+## Stages -
-

Rendering

-
-

- Each output pixel maps to a time frame and frequency bin. Values are normalized by the global - min/max in the computed spectrogram unless clamp values are provided. Feature panels use - percentile-based clamping to preserve contrast across different visualizations. Frequency - range can be restricted via min/max frequency in Hz. -

-

- Output size defaults to 1920x1080. JPEG quality is 95. PNG output is available via --format. -

-
-
+```text +input → decode → mono mixdown → optional slice → window → FFT + ↓ + per-mode features (mel, chroma, mfcc, hpss, …) + ↓ + percentile normalize → palette map → grid compose → encode +``` -
-

Palettes

-
-

- Palettes map normalized values to RGBA colors. Available names: classic, magma, inferno, clawd, - viridis, gray. -

-
-
+Every stage is deterministic. The same input file with the same flags always produces the same output bytes. -
-

Visualizations

-
-

- Visualizations are selectable via --viz. Defaults to spectrogram. Supported names: spectrogram, - mel, chroma, hpss, selfsim, loudness, tempogram, mfcc, flux. Multiple entries render as a grid - of panels. -

-
-
+## Decode -
-

CLI defaults

-
- --format jpg - --width 1920 - --height 1080 - --window 2048 - --hop 512 - --sample-rate 44100 - --style classic - --viz spectrogram -
-
+- WAV (PCM 8/16/24/32-bit, 32/64-bit float, WAVE_FORMAT_EXTENSIBLE) and MP3 are decoded in pure Go via the bundled decoders. +- Anything else falls through to `ffmpeg` (32-bit float little-endian, mono, `--sample-rate` Hz; default `44100`). +- Stereo or multichannel input is averaged to mono. +- `--start` / `--duration` slice the decoded sample buffer in seconds before windowing. + +See [Decoding](decoding.md) for input formats, sample rate, ffmpeg lookup, and stdin usage. + +## Windowing and FFT + +- Window: **Hann**, applied per frame. +- Window size: `--window` samples (default `2048`, must be a power of two). +- Hop size: `--hop` samples (default `512`). +- Frame count: `1 + (len(samples) - window + hop - 1) / hop`. +- Bin count: `window / 2 + 1`. +- Bin spacing: `sampleRate / window` Hz per bin. + +Magnitude is converted to decibels with `20·log10(mag + 1e-9)` for the base spectrogram. Per-feature pipelines (mel, chroma, mfcc) use linear power instead. + +## Per-mode features + +| mode | source | notes | +|------|--------|-------| +| `spectrogram` | STFT magnitude in dB | clamped to 5th–98th percentile | +| `mel` | mel-warped power | log-magnitude; clamped 5th–98th percentile | +| `chroma` | 12-bin pitch class | folds octaves; clamped 10th–98th percentile | +| `mfcc` | DCT of mel power | strips pitch, keeps timbre | +| `hpss` | median filters on STFT | 9-frame harmonic + 9-frame percussive kernels | +| `selfsim` | cosine sim on chroma frames | gamma 1.4; clamped 10th–98th percentile | +| `loudness` | per-frame RMS | clamped to 95th percentile | +| `tempogram` | onset autocorrelation | 30–240 BPM, 256 bins | +| `flux` | frame-to-frame STFT delta | clamped to 95th percentile | + +The percentile sampling reservoir is capped at 20 000 values per panel for speed; this is dense enough that boundaries are stable across runs. + +## Rendering + +- Each panel maps `(time × bin)` cells onto pixels at the panel's width × height. 
+- Values are normalized into `[0, 1]` against the per-panel min/max (after the percentile clamp), then passed through the chosen palette. +- Heatmap panels (mel, chroma, mfcc, selfsim, hpss halves, tempogram) render with `flipVert` so low frequencies are at the bottom. +- Multiple panels compose into a `ceil(sqrt(n))`-column grid with an 8 px gap (see [Rendering](rendering.md)). +- Encoder: PNG (lossless) or JPEG (quality 95). + +## CLI defaults + +```text +--format jpg +--width 1920 +--height 1080 +--window 2048 +--hop 512 +--sample-rate 44100 +--style classic +--viz spectrogram +``` + +Full reference: [CLI](cli.md). + +## Related pages + +- [Visualizations](visualizations.md) — per-mode descriptions. +- [Palettes](palettes.md) — gradient stops. +- [Decoding](decoding.md) — input handling. +- [Rendering](rendering.md) — output and batch use. diff --git a/docs/visualizations.md b/docs/visualizations.md new file mode 100644 index 0000000..71449b1 --- /dev/null +++ b/docs/visualizations.md @@ -0,0 +1,120 @@ +--- +title: Visualizations +description: "The nine visualization modes songsee can render: spectrogram, mel, chroma, hpss, selfsim, loudness, tempogram, mfcc, flux." +--- + +# Visualizations + +`--viz` selects one or more visualization modes. Pass it once with a comma-separated list, or repeat it. Unknown names error out before any decoding runs. + +```bash +songsee track.mp3 --viz spectrogram +songsee track.mp3 --viz mel,chroma,hpss +songsee track.mp3 --viz spectrogram --viz flux +``` + +When more than one mode is selected, songsee composes a square-ish grid (`ceil(sqrt(n))` columns) with an 8 px gap between cells, all sized to fit `--width` × `--height`. + +## spectrogram + +Time × frequency magnitude. The base FFT view: a Hann-windowed STFT converted to decibels (`20·log10(mag + 1e-9)`). 
The X axis is time, the Y axis is linear frequency from `--min-freq` to `--max-freq` (Nyquist by default), and each pixel's brightness is the magnitude in that time-frequency cell. + +Use it when you want raw spectral truth — verifying decode, hunting harmonics, identifying transients. + +```bash +songsee track.mp3 --viz spectrogram +``` + +## mel + +Perceptual frequency scale. Same STFT, but bins are warped onto the mel scale, which weights low frequencies more heavily — closer to how humans hear pitch. + +Good for vocal and tonal content; the structure of speech and melody jumps out compared to a linear spectrogram. + +```bash +songsee track.mp3 --viz mel --min-freq 80 --max-freq 8000 +``` + +## chroma + +12-bin pitch class. Energy is folded across octaves into the twelve semitones (C, C♯, D, …). The Y axis is pitch class, the X axis is time. + +Reveals harmonic and key content — chord progressions, modulations, repetition between sections. + +```bash +songsee track.mp3 --viz chroma +``` + +## hpss + +Harmonic vs percussive separation. Median-filters the spectrogram twice (9-frame kernels) to split it into a harmonic top half (sustained tones) and a percussive bottom half (transients). + +Use it to see where the kit and where the melody live in the same track. + +```bash +songsee track.mp3 --viz hpss +``` + +## selfsim + +Self-similarity matrix on chroma frames. Each pixel `(i, j)` is the cosine similarity between chroma frame `i` and frame `j`, with a gentle gamma (1.4) for contrast. + +Brings out song structure: verses repeat as bright off-diagonal stripes; choruses form clear blocks. + +```bash +songsee track.mp3 --viz selfsim +``` + +## loudness + +Frame-wise RMS over time. A waveform-style envelope with the X axis as time and the height as energy, clamped to the 95th percentile so peaks don't crush the rest. + +Good for spotting dynamics, fade-ins, and silence. + +```bash +songsee track.mp3 --viz loudness +``` + +## tempogram + +Tempo variation over time. 
An autocorrelation-style heatmap of the onset envelope, scanning 30–240 BPM in 256 bins. + +Reveals tempo drift, rubato, and switches between rhythmic feels. + +```bash +songsee track.mp3 --viz tempogram +``` + +## mfcc + +Mel-frequency cepstral coefficients — the classic timbre fingerprint. Each row is one cepstral coefficient over time. + +Strips pitch and leaves "color"; useful for distinguishing instruments, voices, or sections that share notes but not tone. + +```bash +songsee track.mp3 --viz mfcc +``` + +## flux + +Spectral flux — frame-to-frame magnitude change. A 1-D envelope with peaks at onsets and discontinuities, clamped to the 95th percentile. + +Use it to find note onsets, edits, or anything sudden. + +```bash +songsee track.mp3 --viz flux +``` + +## Combining + +```bash +songsee track.mp3 --viz spectrogram,mel,chroma,hpss,selfsim,loudness,tempogram,mfcc,flux +``` + +All nine in one 1920×1080 grid (3×3). Mix and match with the same syntax. Each panel auto-contrasts independently, so comparing absolute values across panels is not meaningful; comparing structure is. + +## Related pages + +- [Palettes](palettes.md) — color maps applied to every panel. +- [Pipeline](spec.md) — windowing, FFT, bin mapping, normalization. +- [CLI](cli.md) — every flag with its default.
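To make the selfsim description more concrete, here is a minimal Python sketch of a self-similarity matrix built from cosine similarity between chroma frames. It is an illustration only, not songsee's Go code. The helper names, the choice to apply gamma as `sim ** gamma`, the clamp to `[0, 1]`, and the treatment of all-zero (silent) frames as similarity 0 are all assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: silent frames match nothing
    return dot / (norm_a * norm_b)


def self_similarity(frames, gamma=1.4):
    """Return an n x n matrix where cell (i, j) compares frame i with frame j.

    Assumption: gamma is applied as sim ** gamma after clamping to [0, 1].
    """
    n = len(frames)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            sim = cosine_similarity(frames[i], frames[j])
            sim = max(0.0, min(1.0, sim))
            matrix[i][j] = matrix[j][i] = sim ** gamma  # matrix is symmetric
    return matrix


if __name__ == "__main__":
    # Three toy 2-bin "chroma" frames; real frames would have 12 bins.
    toy = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]]
    for row in self_similarity(toy):
        print(row)
```

Frames 0 and 2 point in the same direction, so they score as identical despite different energy. That is why repeated sections show up as bright off-diagonal stripes regardless of how loud each repetition is.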