songsee/docs/decoding.md
Peter Steinberger 5cee6499fe
docs: rewrite site gogcli-style with per-feature pages
Drop the custom Jekyll layout, CSS, and JS in favor of GitHub Pages'
default theme — same approach gogcli.sh uses. Replace the marketing
landing page with a plain-markdown overview that mirrors gogcli's
"try it / what it does / pick your path" structure.

Add one focused page per feature: install, quickstart, visualizations,
palettes, decoding, rendering, pipeline (spec), and CLI reference.
Verify ffmpeg pipeline (f32le) and decoder coverage against the actual
audio package.
2026-05-08 16:01:45 +01:00

3.0 KiB

title description
Decoding How songsee decodes audio — native WAV/MP3 paths, ffmpeg fallback, sample rate, stdin, and slicing.

Decoding

songsee turns the input into mono float64 samples before any analysis runs. Two fast paths cover most files; everything else falls through to ffmpeg.

Inputs

  • File path. songsee track.mp3 — any path the OS can open.
  • Stdin. songsee - — reads the encoded stream from stdin. Useful behind cat, curl, or shell pipelines.
  • Mono mixdown. Stereo or multichannel inputs are averaged to mono before windowing.

Native WAV

Pure-Go WAV decoder. Handles:

  • PCM 8/16/24/32-bit integer
  • 32-bit float, 64-bit float
  • WAVE_FORMAT_EXTENSIBLE (with channel masks and sub-format GUIDs)

No external dependency, no ffmpeg roundtrip. The decoder validates the RIFF header and rejects truncated data chunks before allocating sample buffers.

Native MP3

Pure-Go MP3 decoder. Handles MPEG-1/2 Layer III with VBR and CBR. Output sample rate is whatever the file declares; songsee does not resample.

If the decoder hits a malformed frame it surfaces a structured error instead of silently truncating, so corrupt input fails loudly.

ffmpeg fallback

Anything that isn't WAV or MP3 — FLAC, AAC, M4A, OGG, Opus, video containers, raw streams — is decoded by spawning ffmpeg. songsee asks for 32-bit float little-endian mono at the configured sample rate:

ffmpeg -hide_banner -loglevel error \
       -i <input> -f f32le -ac 1 -ar <sample-rate> -

Tweak the pipeline with:

  • --sample-rate N — output sample rate fed to ffmpeg (default 44100).
  • --ffmpeg /path/to/ffmpeg — override the binary lookup.

If ffmpeg isn't on PATH and the file isn't WAV or MP3, songsee fails with a clear error. Install with brew install ffmpeg or your distro's package manager.

Slicing

--start and --duration slice the decoded audio before analysis. Both are seconds (float). Negative values are rejected.

songsee long.mp3 --start 60 --duration 15 -o minute1.jpg
songsee long.mp3 --start 60                  # 60s to end
songsee long.mp3 --duration 30               # first 30s

Slicing happens on samples after decoding; FFT framing then runs on the slice.

Sample rate notes

  • Native WAV/MP3 keep the file's sample rate. --sample-rate only affects the ffmpeg pipeline.
  • The Nyquist frequency (sampleRate / 2) is the upper bound visible in the spectrogram.
  • 44.1 kHz / 48 kHz inputs render with the default --window 2048 --hop 512 at ≈21 ms / frame.

Verbose decoding

songsee track.flac --verbose -o out.png

--verbose prints decode info to stderr — sample count, sample rate, slice bounds — without polluting stdout. Combine with --quiet to suppress the trailing output-path echo when piping.

  • Install — how to add ffmpeg if you need it.
  • Pipeline — what happens to the samples after decoding.
  • CLI — every flag with its default.