| layout | title | description |
|---|---|---|
| default | Spec | songsee spectral pipeline, defaults, and rendering details. |
songsee spectral pipeline
This page captures the core algorithm and defaults used by songsee for repeatable, high-quality spectrogram images.
Decode
WAV and MP3 are decoded natively; any other format falls back to ffmpeg. Input can be a file path or stdin ("-"). The default sample rate for ffmpeg output is 44100 Hz.
Spectrogram
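Each frame is multiplied by a Hann window and transformed with an FFT. The magnitude of each bin is then converted to decibels using 20 * log10(mag + 1e-9); the 1e-9 offset keeps the logarithm finite for silent bins, which bottom out at about -180 dB. The default window size is 2048 samples with a hop size of 512 samples.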
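The per-frame steps can be sketched in a few lines of Python. This is only an illustration of the formula above, not songsee's code: the function names are made up for this example, the symmetric form of the Hann window is an assumption (the page does not say which variant is used), and a naive DFT stands in for the FFT to keep the example dependency-free.

```python
import cmath
import math


def hann_window(size):
    # Symmetric Hann window. Assumption: the page does not state whether
    # the symmetric or the periodic variant is used.
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (size - 1)) for i in range(size)]


def frame_db(frame):
    """Return dB magnitudes for bins 0..len(frame)//2 of a single frame."""
    n = len(frame)
    windowed = [s * w for s, w in zip(frame, hann_window(n))]
    out = []
    for k in range(n // 2 + 1):
        # Naive O(n^2) DFT for clarity; a real pipeline would use an FFT.
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(windowed))
        # Same conversion as in the text: 20 * log10(mag + 1e-9).
        out.append(20 * math.log10(abs(acc) + 1e-9))
    return out


# A 64-sample frame holding a sine that completes 8 cycles.
frame = [math.sin(2 * math.pi * 8 * t / 64) for t in range(64)]
db = frame_db(frame)
peak_bin = max(range(len(db)), key=db.__getitem__)
```

With this input the strongest bin is bin 8, and an all-zero frame yields the -180 dB floor in every bin.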
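The frame count is 1 + (len(samples) - window + hop - 1) / hop, and each frame has window/2 + 1 frequency bins (1025 at the default window size). Bin spacing is sampleRate / windowSize, which is about 21.5 Hz at 44100 Hz with a 2048-sample window.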
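A quick way to check these numbers is to evaluate the formulas directly. The sketch below assumes the division in the frame formula is integer division and that the input is at least one window long; the helper names are hypothetical and exist only for this example.

```python
def frame_count(num_samples, window=2048, hop=512):
    # Assumes integer division and num_samples >= window; this page does
    # not specify the behavior for shorter inputs.
    return 1 + (num_samples - window + hop - 1) // hop


def bin_count(window=2048):
    # Bins from DC up to and including the Nyquist frequency.
    return window // 2 + 1


def bin_spacing_hz(sample_rate=44100, window=2048):
    return sample_rate / window


one_second = frame_count(44100)   # 84 frames for one second at 44100 Hz
bins = bin_count()                # 1025 bins
spacing = bin_spacing_hz()        # 21.533203125 Hz per bin
```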
Rendering
Each output pixel maps to a time frame and frequency bin. Values are normalized by the global min/max in the computed spectrogram unless clamp values are provided. Frequency range can be restricted via min/max frequency in Hz.
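As a rough illustration of the normalization and frequency restriction described above, the following sketch scales dB values to the 0..1 range and converts a frequency range in Hz to bin indices. The function and parameter names are hypothetical, and the rounding of the range edges to whole bins is an assumption; the page does not describe how songsee handles partial bins.

```python
import math


def normalize(spec, clamp_min=None, clamp_max=None):
    """Scale dB values to [0, 1] using the global min/max or the given clamps."""
    lo = min(min(row) for row in spec) if clamp_min is None else clamp_min
    hi = max(max(row) for row in spec) if clamp_max is None else clamp_max
    span = hi - lo
    if span <= 0:
        # Flat input: avoid dividing by zero.
        return [[0.0 for _ in row] for row in spec]
    return [[min(1.0, max(0.0, (v - lo) / span)) for v in row] for row in spec]


def bin_range(min_hz, max_hz, sample_rate=44100, window=2048):
    """Return inclusive bin indices that lie within [min_hz, max_hz]."""
    spacing = sample_rate / window
    lo = max(0, math.ceil(min_hz / spacing))
    hi = min(window // 2, math.floor(max_hz / spacing))
    return lo, hi


spec = [[-80.0, -40.0], [-60.0, 0.0]]
auto = normalize(spec)                    # [[0.0, 0.5], [0.25, 1.0]]
clamped = normalize(spec, -60.0, -20.0)   # [[0.0, 0.5], [0.0, 1.0]]
lo_bin, hi_bin = bin_range(100, 8000)     # (5, 371)
```

Values outside the clamp range saturate at 0 or 1, which is why the -80 dB and 0 dB entries land on the extremes in the clamped result.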
Output size defaults to 1920x1080. JPEG quality is 95. PNG output is available via --format.
Palettes
Palettes map normalized values to RGBA colors. Available names: classic, magma, inferno, viridis, gray.
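A palette in this sense is a function from a value in 0..1 to an RGBA tuple. The sketch below shows one plausible shape for the gray palette and a generic gradient lookup; the color stops are hypothetical placeholders, not songsee's actual palette data, and the linear interpolation between evenly spaced stops is an assumption.

```python
def gray(t):
    """Map a normalized value to an opaque gray RGBA tuple."""
    t = min(1.0, max(0.0, t))
    level = round(t * 255)
    return (level, level, level, 255)


def gradient(stops, t):
    """Linearly interpolate between evenly spaced RGB stops; alpha is fixed at 255."""
    t = min(1.0, max(0.0, t))
    pos = t * (len(stops) - 1)
    i = min(int(pos), len(stops) - 2)
    frac = pos - i
    a, b = stops[i], stops[i + 1]
    return tuple(round(a[c] + (b[c] - a[c]) * frac) for c in range(3)) + (255,)


# Hypothetical stops for illustration only (black -> red -> white).
HYPOTHETICAL_STOPS = [(0, 0, 0), (255, 0, 0), (255, 255, 255)]

darkest = gradient(HYPOTHETICAL_STOPS, 0.0)   # (0, 0, 0, 255)
middle = gradient(HYPOTHETICAL_STOPS, 0.5)    # (255, 0, 0, 255)
brightest = gradient(HYPOTHETICAL_STOPS, 1.0) # (255, 255, 255, 255)
```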
Visualizations
Visualizations are selectable via --viz; the default is spectrogram. Supported names: spectrogram, mel, chroma, hpss, selfsim, loudness, tempogram, mfcc, flux. Multiple entries render as a grid of panels.