2026-01-02 14:48:52 +01:00


---
layout: default
title: Spec
description: songsee spectral pipeline, defaults, and rendering details.
---
Spec

songsee spectral pipeline

This page describes the core algorithm and defaults songsee uses to produce repeatable, high-quality spectrogram images.

Decode

WAV and MP3 are decoded natively; any other format falls back to ffmpeg. Input can be a file path or stdin ("-"). When ffmpeg is used, its output sample rate defaults to 44100 Hz.
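A minimal Go sketch of the decode dispatch. It assumes the format is detected by sniffing the first bytes of the input (which works the same for file paths and stdin) and that ffmpeg is asked for mono 32-bit float PCM on stdout. `decoderFor` and `ffmpegArgs` are hypothetical helpers, not songsee's API, and the exact ffmpeg arguments songsee passes are not specified on this page.

```go
package main

import (
	"bytes"
	"fmt"
)

// decoderFor picks a decoder from the first bytes of the input.
// Header sniffing (rather than file extensions) is an assumption.
func decoderFor(header []byte) string {
	switch {
	case len(header) >= 12 && bytes.Equal(header[0:4], []byte("RIFF")) && bytes.Equal(header[8:12], []byte("WAVE")):
		return "wav"
	case bytes.HasPrefix(header, []byte("ID3")):
		// MP3 preceded by an ID3v2 tag.
		return "mp3"
	case len(header) >= 2 && header[0] == 0xFF && header[1]&0xE0 == 0xE0 && (header[1]>>1)&0x03 == 0x01:
		// MPEG frame sync (11 set bits) with the layer field set to Layer III.
		return "mp3"
	default:
		return "ffmpeg"
	}
}

// ffmpegArgs builds a hypothetical ffmpeg invocation that decodes any input
// to raw PCM on stdout at the requested sample rate.
func ffmpegArgs(input string, sampleRate int) []string {
	if input == "-" {
		input = "pipe:0" // read from stdin
	}
	return []string{
		"-hide_banner", "-loglevel", "error",
		"-i", input,
		"-f", "f32le", // raw little-endian float32 samples
		"-ac", "1", // downmix to mono (assumption)
		"-ar", fmt.Sprint(sampleRate),
		"pipe:1", // write to stdout
	}
}

func main() {
	fmt.Println(decoderFor([]byte("RIFF\x24\x08\x00\x00WAVEfmt ")))
	fmt.Println(ffmpegArgs("song.flac", 44100))
}
```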

Spectrogram

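Each frame is multiplied by a Hann window, an FFT is computed per frame, and the magnitude is converted to decibels as 20 * log10(mag + 1e-9). The 1e-9 offset keeps silent bins at a finite floor of -180 dB instead of log10(0). The default window size is 2048 samples and the default hop size is 512 samples.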
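The per-frame step can be sketched in Go as follows. This is an illustration under stated assumptions: `hann`, `fft`, `frameDB`, `sineFrame`, and `peakBin` are hypothetical names, the window is the symmetric Hann form, the FFT is a textbook recursive radix-2 Cooley-Tukey transform (so the window length must be a power of two, as 2048 is), and magnitudes are not rescaled before the decibel conversion.

```go
package main

import (
	"fmt"
	"math"
	"math/cmplx"
)

// hann returns an n-point symmetric Hann window (n >= 2).
// The symmetric form is an assumption; a periodic window divides by n instead.
func hann(n int) []float64 {
	w := make([]float64, n)
	for i := range w {
		w[i] = 0.5 - 0.5*math.Cos(2*math.Pi*float64(i)/float64(n-1))
	}
	return w
}

// fft is a recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two.
func fft(x []complex128) []complex128 {
	n := len(x)
	if n == 1 {
		return []complex128{x[0]}
	}
	even := make([]complex128, n/2)
	odd := make([]complex128, n/2)
	for i := 0; i < n/2; i++ {
		even[i] = x[2*i]
		odd[i] = x[2*i+1]
	}
	e, o := fft(even), fft(odd)
	out := make([]complex128, n)
	for k := 0; k < n/2; k++ {
		t := cmplx.Exp(complex(0, -2*math.Pi*float64(k)/float64(n))) * o[k]
		out[k] = e[k] + t
		out[k+n/2] = e[k] - t
	}
	return out
}

// frameDB windows one frame, runs the FFT, and converts the magnitude of each
// of the len(frame)/2 + 1 non-negative frequency bins to decibels.
func frameDB(frame, window []float64) []float64 {
	x := make([]complex128, len(frame))
	for i, s := range frame {
		x[i] = complex(s*window[i], 0)
	}
	spec := fft(x)
	db := make([]float64, len(frame)/2+1)
	for k := range db {
		db[k] = 20 * math.Log10(cmplx.Abs(spec[k])+1e-9)
	}
	return db
}

// sineFrame returns n samples of a unit sine completing `cycles` periods.
func sineFrame(n int, cycles float64) []float64 {
	s := make([]float64, n)
	for i := range s {
		s[i] = math.Sin(2 * math.Pi * cycles * float64(i) / float64(n))
	}
	return s
}

// peakBin returns the index of the loudest bin.
func peakBin(db []float64) int {
	best := 0
	for k, v := range db {
		if v > db[best] {
			best = k
		}
	}
	return best
}

func main() {
	w := hann(2048)
	db := frameDB(sineFrame(2048, 64), w)
	fmt.Println(len(db), peakBin(db)) // 1025 64
}
```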

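The number of frames is 1 + (len(samples) - window + hop - 1) / hop using integer division, which equals 1 + ceil((len(samples) - window) / hop). The number of frequency bins is window/2 + 1, and the bin spacing is sampleRate / windowSize Hz.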
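The frame and bin arithmetic, written out in Go with hypothetical helper names. For one second of 44.1 kHz audio with the default window and hop, this gives 84 frames of 1025 bins spaced about 21.53 Hz apart. Because of the ceiling, the last frame can extend past the end of the input; this page does not say how that partial frame is filled.

```go
package main

import "fmt"

// frameCount implements 1 + (len(samples) - window + hop - 1) / hop with
// integer division, i.e. 1 + ceil((len(samples) - window) / hop).
// It assumes numSamples >= window; shorter inputs are not covered here.
func frameCount(numSamples, window, hop int) int {
	return 1 + (numSamples-window+hop-1)/hop
}

// binCount is the number of non-negative frequency bins for a real FFT.
func binCount(window int) int {
	return window/2 + 1
}

// binSpacingHz is the frequency distance between adjacent bins.
func binSpacingHz(sampleRate, window int) float64 {
	return float64(sampleRate) / float64(window)
}

func main() {
	// One second of 44.1 kHz audio with the default window and hop.
	fmt.Println(frameCount(44100, 2048, 512)) // 84
	fmt.Println(binCount(2048))               // 1025
	fmt.Println(binSpacingHz(44100, 2048))    // 21.533203125
}
```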

Rendering

Each output pixel maps to one time frame and one frequency bin. Values are normalized using the global minimum and maximum of the computed spectrogram unless clamp values are provided. The frequency range can be restricted with minimum and maximum frequencies in Hz.
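A Go sketch of normalization and pixel mapping. The clamp parameters, the [0, 1] target range, nearest-lower sampling, and the lowest-frequency-at-the-bottom orientation are all assumptions for illustration; `normalize`, `hzToBin`, and `pixelToCell` are hypothetical names.

```go
package main

import "fmt"

// normalize maps decibel values into [0, 1]. Without clamps it uses the
// global min/max of the spectrogram; the optional clampLo/clampHi
// (hypothetical parameters) replace those bounds, and values outside
// the clamped range are pinned to 0 or 1.
func normalize(db [][]float64, clampLo, clampHi *float64) [][]float64 {
	lo, hi := db[0][0], db[0][0]
	for _, frame := range db {
		for _, v := range frame {
			if v < lo {
				lo = v
			}
			if v > hi {
				hi = v
			}
		}
	}
	if clampLo != nil {
		lo = *clampLo
	}
	if clampHi != nil {
		hi = *clampHi
	}
	span := hi - lo
	out := make([][]float64, len(db))
	for t, frame := range db {
		out[t] = make([]float64, len(frame))
		for b, v := range frame {
			n := 0.0 // a flat spectrogram (span == 0) maps to 0
			if span > 0 {
				n = (v - lo) / span
			}
			if n < 0 {
				n = 0
			} else if n > 1 {
				n = 1
			}
			out[t][b] = n
		}
	}
	return out
}

// hzToBin converts a frequency limit in Hz to a bin index, rounding down
// (bin spacing is sampleRate / window).
func hzToBin(hz float64, sampleRate, window int) int {
	return int(hz * float64(window) / float64(sampleRate))
}

// pixelToCell maps output pixel (x, y) to a (frame, bin) pair. Columns span
// all frames left to right; rows span bins minBin..maxBin with the lowest
// frequency on the bottom row.
func pixelToCell(x, y, width, height, frames, minBin, maxBin int) (frame, bin int) {
	frame = x * frames / width
	rowFromBottom := height - 1 - y
	bin = minBin + rowFromBottom*(maxBin-minBin+1)/height
	return frame, bin
}

func main() {
	db := [][]float64{{-80, -40}, {-20, 0}}
	fmt.Println(normalize(db, nil, nil)) // [[0 0.5] [0.75 1]]
	maxBin := hzToBin(8000, 44100, 2048)
	fmt.Println(pixelToCell(1919, 0, 1920, 1080, 84, 0, maxBin)) // 83 371
}
```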

Output size defaults to 1920x1080. JPEG quality is 95. PNG output is available via --format.

Palettes

Palettes map normalized values to RGBA colors and are selected with --style. Available names: classic, magma, inferno, viridis, gray.

Visualizations

Visualizations are selected with --viz and default to spectrogram. Supported names: spectrogram, mel, chroma, hpss, selfsim, loudness, tempogram, mfcc, flux. When several visualizations are requested, they render as a grid of panels.

CLI defaults

--format jpg --width 1920 --height 1080 --window 2048 --hop 512 --sample-rate 44100 --style classic --viz spectrogram
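The same defaults expressed with Go's standard `flag` package, as a hypothetical sketch: the `config` struct and `parseFlags` are illustrative, only the flag names and default values come from this page, and the help strings are placeholders. Go's `flag` package accepts both `-name` and `--name`.

```go
package main

import (
	"flag"
	"fmt"
)

// config mirrors the documented CLI defaults; field names are hypothetical.
type config struct {
	Format     string
	Width      int
	Height     int
	Window     int
	Hop        int
	SampleRate int
	Style      string
	Viz        string
}

// parseFlags registers each documented flag with its default and parses args.
func parseFlags(args []string) (config, error) {
	var c config
	fs := flag.NewFlagSet("songsee", flag.ContinueOnError)
	fs.StringVar(&c.Format, "format", "jpg", "output format")
	fs.IntVar(&c.Width, "width", 1920, "output width in pixels")
	fs.IntVar(&c.Height, "height", 1080, "output height in pixels")
	fs.IntVar(&c.Window, "window", 2048, "FFT window size in samples")
	fs.IntVar(&c.Hop, "hop", 512, "hop size in samples")
	fs.IntVar(&c.SampleRate, "sample-rate", 44100, "ffmpeg output sample rate in Hz")
	fs.StringVar(&c.Style, "style", "classic", "palette name")
	fs.StringVar(&c.Viz, "viz", "spectrogram", "visualization(s) to render")
	err := fs.Parse(args)
	return c, err
}

func main() {
	c, err := parseFlags([]string{"--style", "magma", "--hop", "256"})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", c)
}
```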