docs: rewrite site gogcli-style with per-feature pages
Drop the custom Jekyll layout, CSS, and JS in favor of GitHub Pages' default theme — same approach gogcli.sh uses. Replace the marketing landing page with a plain-markdown overview that mirrors gogcli's "try it / what it does / pick your path" structure. Add one focused page per feature: install, quickstart, visualizations, palettes, decoding, rendering, pipeline (spec), and CLI reference. Verify ffmpeg pipeline (f32le) and decoder coverage against the actual audio package.
parent b177ce9d35
commit 5cee6499fe
@ -3,6 +3,7 @@
|
||||
## 0.1.1 - Unreleased
|
||||
|
||||
- New Clawd style
|
||||
- Docs: rewritten gogcli-style — plain-markdown pages for install, quickstart, visualizations, palettes, decoding, rendering, pipeline, and CLI; removed custom Jekyll theme so songsee.sh runs on the default GitHub Pages theme
|
||||
|
||||
## 0.1.0 - 2026-01-02
|
||||
|
||||
|
||||
@ -1,11 +0,0 @@
|
||||
title: songsee
|
||||
description: Generate modern spectrogram images from audio files.
|
||||
url: "https://songsee.sh"
|
||||
baseurl: ""
|
||||
markdown: kramdown
|
||||
permalink: pretty
|
||||
|
||||
kramdown:
|
||||
input: GFM
|
||||
|
||||
collections: {}
|
||||
@ -1,58 +0,0 @@
|
||||
<!doctype html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="utf-8">
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1">
|
||||
<title>{% if page.title %}{{ page.title }} | {% endif %}{{ site.title }}</title>
|
||||
<meta name="description" content="{{ page.description | default: site.description }}">
|
||||
<meta name="theme-color" content="#0d0b14">
|
||||
<link rel="canonical" href="{{ site.url }}{{ page.url | replace: 'index.html', '' }}">
|
||||
<meta property="og:site_name" content="{{ site.title }}">
|
||||
<meta property="og:title" content="{% if page.title %}{{ page.title }}{% else %}{{ site.title }}{% endif %}">
|
||||
<meta property="og:description" content="{{ page.description | default: site.description }}">
|
||||
<meta property="og:type" content="website">
|
||||
<meta property="og:url" content="{{ site.url }}{{ page.url | replace: 'index.html', '' }}">
|
||||
<link rel="stylesheet" href="https://fonts.googleapis.com/css2?family=Fraunces:opsz,wght@9..144,400,600,700&family=Manrope:wght@300;400;500;600;700&family=JetBrains+Mono:wght@400;600&display=swap">
|
||||
<link rel="stylesheet" href="{{ '/assets/css/site.css' | relative_url }}">
|
||||
</head>
|
||||
<body class="{{ page.body_class | default: 'page' }}">
|
||||
<div class="grain" aria-hidden="true"></div>
|
||||
<div class="orb orb-a" aria-hidden="true"></div>
|
||||
<div class="orb orb-b" aria-hidden="true"></div>
|
||||
|
||||
<header class="nav">
|
||||
<div class="nav-inner">
|
||||
<a class="logo" href="{{ '/' | relative_url }}">
|
||||
<span class="logo-mark"></span>
|
||||
<span class="logo-text">songsee</span>
|
||||
</a>
|
||||
<nav class="nav-links">
|
||||
<a href="{{ '/' | relative_url }}#install">Install</a>
|
||||
<a href="{{ '/' | relative_url }}#usage">Usage</a>
|
||||
<a href="{{ '/spec/' | relative_url }}">Spec</a>
|
||||
<a href="https://github.com/openclaw/songsee">GitHub</a>
|
||||
</nav>
|
||||
</div>
|
||||
</header>
|
||||
|
||||
<main class="content">
|
||||
{{ content }}
|
||||
</main>
|
||||
|
||||
<footer class="footer">
|
||||
<div class="footer-inner">
|
||||
<div>
|
||||
<div class="footer-title">songsee</div>
|
||||
<div class="footer-sub">Spectrograms that feel alive.</div>
|
||||
</div>
|
||||
<div class="footer-links">
|
||||
<a href="https://github.com/openclaw/songsee">GitHub</a>
|
||||
<a href="{{ '/spec/' | relative_url }}">Spec</a>
|
||||
<a href="https://songsee.sh">songsee.sh</a>
|
||||
</div>
|
||||
</div>
|
||||
</footer>
|
||||
|
||||
<script src="{{ '/assets/js/site.js' | relative_url }}" defer></script>
|
||||
</body>
|
||||
</html>
|
||||
@ -1,436 +0,0 @@
|
||||
:root {
|
||||
color-scheme: dark;
|
||||
--bg: #0d0b14;
|
||||
--bg-soft: #141022;
|
||||
--bg-deep: #08070f;
|
||||
--ink: #f4f2ff;
|
||||
--muted: #b7b1c8;
|
||||
--accent: #ffb347;
|
||||
--accent-2: #2cf6f6;
|
||||
--accent-3: #ff5da2;
|
||||
--accent-4: #9d7bff;
|
||||
--card: rgba(23, 18, 38, 0.72);
|
||||
--stroke: rgba(255, 255, 255, 0.08);
|
||||
--shadow: 0 20px 60px rgba(7, 6, 12, 0.65);
|
||||
--mono: "JetBrains Mono", ui-monospace, SFMono-Regular, Menlo, monospace;
|
||||
--sans: "Manrope", system-ui, -apple-system, sans-serif;
|
||||
--display: "Fraunces", "Times New Roman", serif;
|
||||
}
|
||||
|
||||
* {
|
||||
box-sizing: border-box;
|
||||
}
|
||||
|
||||
html {
|
||||
scroll-behavior: smooth;
|
||||
}
|
||||
|
||||
body {
|
||||
margin: 0;
|
||||
font-family: var(--sans);
|
||||
color: var(--ink);
|
||||
background: radial-gradient(1200px 800px at 15% 10%, rgba(157, 123, 255, 0.25), transparent 55%),
|
||||
radial-gradient(800px 600px at 85% 0%, rgba(44, 246, 246, 0.18), transparent 60%),
|
||||
radial-gradient(900px 700px at 85% 80%, rgba(255, 179, 71, 0.16), transparent 65%),
|
||||
linear-gradient(160deg, var(--bg) 0%, var(--bg-soft) 55%, var(--bg-deep) 100%);
|
||||
min-height: 100vh;
|
||||
overflow-x: hidden;
|
||||
}
|
||||
|
||||
a {
|
||||
color: inherit;
|
||||
text-decoration: none;
|
||||
}
|
||||
|
||||
a:hover {
|
||||
color: var(--accent-2);
|
||||
}
|
||||
|
||||
.grain {
|
||||
position: fixed;
|
||||
inset: 0;
|
||||
z-index: 0;
|
||||
pointer-events: none;
|
||||
opacity: 0.2;
|
||||
mix-blend-mode: soft-light;
|
||||
background-image: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' width='140' height='140' viewBox='0 0 140 140'%3E%3Cfilter id='n'%3E%3CfeTurbulence type='fractalNoise' baseFrequency='0.9' numOctaves='2' stitchTiles='stitch'/%3E%3C/filter%3E%3Crect width='140' height='140' filter='url(%23n)' opacity='0.35'/%3E%3C/svg%3E");
|
||||
}
|
||||
|
||||
.orb {
|
||||
position: fixed;
|
||||
border-radius: 999px;
|
||||
filter: blur(10px);
|
||||
opacity: 0.55;
|
||||
z-index: 0;
|
||||
animation: drift 18s ease-in-out infinite alternate;
|
||||
}
|
||||
|
||||
.orb-a {
|
||||
width: 420px;
|
||||
height: 420px;
|
||||
background: radial-gradient(circle at 30% 30%, rgba(255, 93, 162, 0.6), rgba(13, 11, 20, 0));
|
||||
top: -120px;
|
||||
left: -80px;
|
||||
}
|
||||
|
||||
.orb-b {
|
||||
width: 520px;
|
||||
height: 520px;
|
||||
background: radial-gradient(circle at 60% 40%, rgba(44, 246, 246, 0.4), rgba(13, 11, 20, 0));
|
||||
bottom: -220px;
|
||||
right: -140px;
|
||||
animation-delay: -6s;
|
||||
}
|
||||
|
||||
@keyframes drift {
|
||||
from {
|
||||
transform: translate3d(0, 0, 0) scale(0.95);
|
||||
}
|
||||
to {
|
||||
transform: translate3d(40px, -20px, 0) scale(1.05);
|
||||
}
|
||||
}
|
||||
|
||||
.nav {
|
||||
position: sticky;
|
||||
top: 0;
|
||||
z-index: 10;
|
||||
backdrop-filter: blur(14px);
|
||||
background: rgba(10, 9, 16, 0.75);
|
||||
border-bottom: 1px solid rgba(255, 255, 255, 0.06);
|
||||
}
|
||||
|
||||
.nav-inner {
|
||||
max-width: 1100px;
|
||||
margin: 0 auto;
|
||||
display: flex;
|
||||
align-items: center;
|
||||
justify-content: space-between;
|
||||
padding: 16px 24px;
|
||||
}
|
||||
|
||||
.logo {
|
||||
display: inline-flex;
|
||||
align-items: center;
|
||||
gap: 12px;
|
||||
font-family: var(--display);
|
||||
font-size: 22px;
|
||||
}
|
||||
|
||||
.logo-mark {
|
||||
width: 18px;
|
||||
height: 18px;
|
||||
border-radius: 6px;
|
||||
background: conic-gradient(from 120deg, var(--accent), var(--accent-3), var(--accent-2), var(--accent));
|
||||
box-shadow: 0 0 18px rgba(255, 179, 71, 0.35);
|
||||
}
|
||||
|
||||
.nav-links {
|
||||
display: flex;
|
||||
gap: 20px;
|
||||
font-size: 14px;
|
||||
color: var(--muted);
|
||||
}
|
||||
|
||||
.content {
|
||||
position: relative;
|
||||
z-index: 1;
|
||||
}
|
||||
|
||||
.hero {
|
||||
max-width: 1100px;
|
||||
margin: 0 auto;
|
||||
padding: 90px 24px 60px;
|
||||
display: grid;
|
||||
grid-template-columns: repeat(12, 1fr);
|
||||
gap: 24px;
|
||||
}
|
||||
|
||||
.hero-copy {
|
||||
grid-column: 1 / span 7;
|
||||
}
|
||||
|
||||
.hero-title {
|
||||
font-family: var(--display);
|
||||
font-size: clamp(42px, 6vw, 82px);
|
||||
line-height: 0.95;
|
||||
margin: 0 0 18px;
|
||||
}
|
||||
|
||||
.hero-sub {
|
||||
font-size: 18px;
|
||||
color: var(--muted);
|
||||
max-width: 520px;
|
||||
line-height: 1.6;
|
||||
}
|
||||
|
||||
.hero-actions {
|
||||
display: flex;
|
||||
flex-wrap: wrap;
|
||||
gap: 14px;
|
||||
margin-top: 26px;
|
||||
}
|
||||
|
||||
.btn {
|
||||
padding: 12px 20px;
|
||||
border-radius: 999px;
|
||||
border: 1px solid rgba(255, 255, 255, 0.12);
|
||||
font-weight: 600;
|
||||
letter-spacing: 0.02em;
|
||||
transition: transform 0.2s ease, box-shadow 0.2s ease, border 0.2s ease;
|
||||
}
|
||||
|
||||
.btn.primary {
|
||||
background: linear-gradient(135deg, rgba(255, 179, 71, 0.95), rgba(255, 93, 162, 0.95));
|
||||
color: #130d1d;
|
||||
box-shadow: 0 12px 30px rgba(255, 124, 99, 0.35);
|
||||
border: none;
|
||||
}
|
||||
|
||||
.btn:hover {
|
||||
transform: translateY(-2px);
|
||||
}
|
||||
|
||||
.hero-meta {
|
||||
margin-top: 22px;
|
||||
font-family: var(--mono);
|
||||
font-size: 13px;
|
||||
color: rgba(255, 255, 255, 0.45);
|
||||
}
|
||||
|
||||
.hero-visual {
|
||||
grid-column: 8 / span 5;
|
||||
position: relative;
|
||||
}
|
||||
|
||||
.spectral-panel {
|
||||
height: 420px;
|
||||
border-radius: 24px;
|
||||
background: radial-gradient(200px 200px at var(--mx, 70%) var(--my, 30%), rgba(44, 246, 246, 0.25), transparent 70%),
|
||||
linear-gradient(140deg, rgba(24, 16, 44, 0.9), rgba(12, 9, 22, 0.95));
|
||||
border: 1px solid var(--stroke);
|
||||
box-shadow: var(--shadow);
|
||||
overflow: hidden;
|
||||
position: relative;
|
||||
}
|
||||
|
||||
.spectral-panel::before {
|
||||
content: "";
|
||||
position: absolute;
|
||||
inset: 30px 22px 90px 22px;
|
||||
border-radius: 18px;
|
||||
background: linear-gradient(90deg, #0a0a12, #0a0a12 30%, rgba(255, 179, 71, 0.9), rgba(44, 246, 246, 0.9), rgba(157, 123, 255, 0.9));
|
||||
filter: saturate(1.4);
|
||||
animation: sweep 6s ease-in-out infinite alternate;
|
||||
}
|
||||
|
||||
.spectral-panel::after {
|
||||
content: "";
|
||||
position: absolute;
|
||||
inset: 0;
|
||||
background: repeating-linear-gradient(0deg, rgba(255, 255, 255, 0.08), rgba(255, 255, 255, 0.08) 1px, transparent 1px, transparent 6px);
|
||||
mix-blend-mode: screen;
|
||||
opacity: 0.35;
|
||||
}
|
||||
|
||||
.spectral-caption {
|
||||
position: absolute;
|
||||
bottom: 22px;
|
||||
left: 24px;
|
||||
font-family: var(--mono);
|
||||
font-size: 12px;
|
||||
color: rgba(255, 255, 255, 0.6);
|
||||
}
|
||||
|
||||
@keyframes sweep {
|
||||
0% {
|
||||
transform: translateX(-6%) scaleY(0.95);
|
||||
}
|
||||
100% {
|
||||
transform: translateX(6%) scaleY(1.02);
|
||||
}
|
||||
}
|
||||
|
||||
.section {
|
||||
max-width: 1100px;
|
||||
margin: 0 auto;
|
||||
padding: 30px 24px 70px;
|
||||
}
|
||||
|
||||
.section-title {
|
||||
font-family: var(--display);
|
||||
font-size: clamp(28px, 3vw, 42px);
|
||||
margin: 0 0 14px;
|
||||
}
|
||||
|
||||
.section-sub {
|
||||
color: var(--muted);
|
||||
max-width: 680px;
|
||||
line-height: 1.6;
|
||||
}
|
||||
|
||||
.feature-grid {
|
||||
display: grid;
|
||||
grid-template-columns: repeat(auto-fit, minmax(220px, 1fr));
|
||||
gap: 18px;
|
||||
margin-top: 26px;
|
||||
}
|
||||
|
||||
.card {
|
||||
background: var(--card);
|
||||
border: 1px solid var(--stroke);
|
||||
border-radius: 18px;
|
||||
padding: 18px;
|
||||
box-shadow: var(--shadow);
|
||||
}
|
||||
|
||||
.card h3 {
|
||||
margin: 0 0 8px;
|
||||
font-size: 18px;
|
||||
}
|
||||
|
||||
.card p {
|
||||
margin: 0;
|
||||
color: var(--muted);
|
||||
line-height: 1.5;
|
||||
font-size: 14px;
|
||||
}
|
||||
|
||||
.code-block {
|
||||
background: rgba(8, 8, 14, 0.9);
|
||||
border-radius: 16px;
|
||||
padding: 18px 20px;
|
||||
border: 1px solid rgba(255, 255, 255, 0.08);
|
||||
font-family: var(--mono);
|
||||
font-size: 13px;
|
||||
line-height: 1.6;
|
||||
overflow-x: auto;
|
||||
}
|
||||
|
||||
.kicker {
|
||||
font-family: var(--mono);
|
||||
letter-spacing: 0.16em;
|
||||
text-transform: uppercase;
|
||||
font-size: 11px;
|
||||
color: rgba(255, 255, 255, 0.45);
|
||||
}
|
||||
|
||||
.palette-row {
|
||||
display: grid;
|
||||
grid-template-columns: repeat(auto-fit, minmax(160px, 1fr));
|
||||
gap: 12px;
|
||||
margin-top: 18px;
|
||||
}
|
||||
|
||||
.palette {
|
||||
height: 64px;
|
||||
border-radius: 14px;
|
||||
border: 1px solid rgba(255, 255, 255, 0.12);
|
||||
}
|
||||
|
||||
.palette.classic {
|
||||
background: linear-gradient(90deg, #000000, #002060, #00a0c8, #ffb400, #ffffff);
|
||||
}
|
||||
|
||||
.palette.magma {
|
||||
background: linear-gradient(90deg, #000004, #3b0c57, #b4367a, #fb8c3c, #fcfdbf);
|
||||
}
|
||||
|
||||
.palette.inferno {
|
||||
background: linear-gradient(90deg, #000004, #3d0965, #bb3754, #f98e08, #fcffa4);
|
||||
}
|
||||
|
||||
.palette.viridis {
|
||||
background: linear-gradient(90deg, #440154, #3a528b, #20908c, #5ec962, #fde725);
|
||||
}
|
||||
|
||||
.palette.gray {
|
||||
background: linear-gradient(90deg, #000000, #ffffff);
|
||||
}
|
||||
|
||||
.domain-note {
|
||||
margin-top: 18px;
|
||||
padding: 12px 16px;
|
||||
border-radius: 12px;
|
||||
background: rgba(44, 246, 246, 0.08);
|
||||
border: 1px solid rgba(44, 246, 246, 0.2);
|
||||
color: rgba(255, 255, 255, 0.75);
|
||||
font-size: 14px;
|
||||
}
|
||||
|
||||
.footer {
|
||||
border-top: 1px solid rgba(255, 255, 255, 0.08);
|
||||
padding: 30px 24px 40px;
|
||||
background: rgba(7, 6, 12, 0.65);
|
||||
}
|
||||
|
||||
.footer-inner {
|
||||
max-width: 1100px;
|
||||
margin: 0 auto;
|
||||
display: flex;
|
||||
flex-wrap: wrap;
|
||||
gap: 24px;
|
||||
justify-content: space-between;
|
||||
align-items: center;
|
||||
}
|
||||
|
||||
.footer-title {
|
||||
font-family: var(--display);
|
||||
font-size: 20px;
|
||||
}
|
||||
|
||||
.footer-sub {
|
||||
color: var(--muted);
|
||||
font-size: 13px;
|
||||
}
|
||||
|
||||
.footer-links {
|
||||
display: flex;
|
||||
gap: 16px;
|
||||
font-size: 13px;
|
||||
color: var(--muted);
|
||||
}
|
||||
|
||||
.reveal {
|
||||
opacity: 0;
|
||||
transform: translateY(16px);
|
||||
animation: rise 0.8s ease forwards;
|
||||
}
|
||||
|
||||
.reveal.delay-1 { animation-delay: 0.1s; }
|
||||
.reveal.delay-2 { animation-delay: 0.2s; }
|
||||
.reveal.delay-3 { animation-delay: 0.3s; }
|
||||
.reveal.delay-4 { animation-delay: 0.4s; }
|
||||
|
||||
@keyframes rise {
|
||||
to {
|
||||
opacity: 1;
|
||||
transform: translateY(0);
|
||||
}
|
||||
}
|
||||
|
||||
@media (max-width: 900px) {
|
||||
.hero {
|
||||
grid-template-columns: 1fr;
|
||||
}
|
||||
|
||||
.hero-copy,
|
||||
.hero-visual {
|
||||
grid-column: auto;
|
||||
}
|
||||
|
||||
.spectral-panel {
|
||||
height: 320px;
|
||||
}
|
||||
|
||||
.nav-links {
|
||||
display: none;
|
||||
}
|
||||
}
|
||||
|
||||
@media (prefers-reduced-motion: reduce) {
|
||||
* {
|
||||
animation: none !important;
|
||||
transition: none !important;
|
||||
}
|
||||
}
|
||||
@ -1,20 +0,0 @@
|
||||
(() => {
|
||||
const panels = document.querySelectorAll('.spectral-panel');
|
||||
if (!panels.length) return;
|
||||
|
||||
const update = (panel, event) => {
|
||||
const rect = panel.getBoundingClientRect();
|
||||
const x = Math.min(Math.max((event.clientX - rect.left) / rect.width, 0), 1);
|
||||
const y = Math.min(Math.max((event.clientY - rect.top) / rect.height, 0), 1);
|
||||
panel.style.setProperty('--mx', `${(x * 100).toFixed(1)}%`);
|
||||
panel.style.setProperty('--my', `${(y * 100).toFixed(1)}%`);
|
||||
};
|
||||
|
||||
panels.forEach((panel) => {
|
||||
panel.addEventListener('pointermove', (event) => update(panel, event));
|
||||
panel.addEventListener('pointerleave', () => {
|
||||
panel.style.removeProperty('--mx');
|
||||
panel.style.removeProperty('--my');
|
||||
});
|
||||
});
|
||||
})();
|
||||
93
docs/cli.md
Normal file
@ -0,0 +1,93 @@
|
||||
---
|
||||
title: CLI
|
||||
description: "Every songsee flag with its default and accepted values."
|
||||
---
|
||||
|
||||
# CLI
|
||||
|
||||
```text
|
||||
songsee <input> [flags]
|
||||
```
|
||||
|
||||
`<input>` is a file path or `-` for stdin.
|
||||
|
||||
## Inputs and output
|
||||
|
||||
| flag | type | default | description |
|
||||
|------|------|---------|-------------|
|
||||
| `<input>` | string (positional) | required | File path, or `-` to read encoded audio from stdin. |
|
||||
| `-o`, `--output` | string | input name + extension | Output path. `-` writes the encoded image to stdout. |
|
||||
| `--format` | `jpg` \| `png` | `jpg` | Output encoder. JPEG is quality 95; PNG is lossless. |
|
||||
| `--width` | int | `1920` | Output width in pixels. |
|
||||
| `--height` | int | `1080` | Output height in pixels. |
|
||||
| `-q`, `--quiet` | bool | `false` | Suppress the stdout output-path echo. |
|
||||
| `-v`, `--verbose` | bool | `false` | Print decode and slice info to stderr. |
|
||||
| `--version` | bool | — | Print version and exit. |
|
||||
|
||||
## FFT and windowing
|
||||
|
||||
| flag | type | default | description |
|
||||
|------|------|---------|-------------|
|
||||
| `--window` | int | `2048` | FFT window size in samples. Must be a power of two. |
|
||||
| `--hop` | int | `512` | Hop size in samples between frames. |
|
||||
| `--min-freq` | float (Hz) | `0` | Lower bound of the visible frequency band. |
|
||||
| `--max-freq` | float (Hz) | Nyquist | Upper bound of the visible frequency band. Must exceed `--min-freq`. |
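
The constraints in the table above can be checked before any audio is decoded. The following is a minimal sketch of such a check, not songsee's actual validation code: the function names and error messages are hypothetical, and rejecting a non-positive hop is an assumption the table does not state.

```python
from typing import Optional, Tuple


def is_power_of_two(n: int) -> bool:
    """True for 1, 2, 4, 8, ...: a power of two has exactly one set bit."""
    return n > 0 and (n & (n - 1)) == 0


def validate_fft_flags(window: int, hop: int, min_freq: float,
                       max_freq: Optional[float],
                       sample_rate: int) -> Tuple[float, float]:
    """Return the resolved (min_freq, max_freq) band or raise ValueError."""
    if not is_power_of_two(window):
        raise ValueError("--window must be a power of two")
    if hop <= 0:  # assumption: a zero or negative hop is treated as invalid
        raise ValueError("--hop must be positive")
    nyquist = sample_rate / 2
    # --max-freq defaults to Nyquist when the flag is not given.
    upper = nyquist if max_freq is None else max_freq
    if upper <= min_freq:
        raise ValueError("--max-freq must exceed --min-freq")
    return (min_freq, upper)
```

With the defaults and a 44.1 kHz input, `validate_fft_flags(2048, 512, 0.0, None, 44100)` resolves the band to `(0.0, 22050.0)`.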
|
||||
|
||||
## Slicing
|
||||
|
||||
| flag | type | default | description |
|
||||
|------|------|---------|-------------|
|
||||
| `--start` | float (s) | `0` | Skip this many seconds from the start of the input. |
|
||||
| `--duration` | float (s) | `0` (full) | Render only this many seconds after `--start`. |
|
||||
|
||||
## Visualization
|
||||
|
||||
| flag | type | default | description |
|
||||
|------|------|---------|-------------|
|
||||
| `--viz` | repeated string list | `spectrogram` | One or more of: `spectrogram`, `mel`, `chroma`, `hpss`, `selfsim`, `loudness`, `tempogram`, `mfcc`, `flux`. Repeatable or comma-separated. |
|
||||
| `--style` | string | `classic` | Palette name: `classic`, `magma`, `inferno`, `viridis`, `gray` (alias `grey`), `clawd`. |
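
"Repeatable or comma-separated" means `--viz mel,chroma` and `--viz mel --viz chroma` should resolve to the same list. A rough sketch of that parsing, using Python's `argparse` purely for illustration; songsee itself is a Go binary, so the parser setup, the `parse_viz` name, and the choice to keep duplicates are all assumptions.

```python
import argparse
from typing import List, Sequence

VIZ_MODES = ("spectrogram", "mel", "chroma", "hpss", "selfsim",
             "loudness", "tempogram", "mfcc", "flux")


def parse_viz(argv: Sequence[str]) -> List[str]:
    """Flatten repeated and comma-separated --viz values into one list."""
    parser = argparse.ArgumentParser(prog="songsee-sketch")
    # action="append" collects every occurrence of the flag.
    parser.add_argument("--viz", action="append", default=None)
    args = parser.parse_args(list(argv))
    raw = args.viz or ["spectrogram"]  # documented default
    modes = []
    for chunk in raw:
        for name in chunk.split(","):
            name = name.strip()
            if name not in VIZ_MODES:
                raise ValueError("unknown viz mode: %r" % name)
            modes.append(name)
    return modes
```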
|
||||
|
||||
## Decoding
|
||||
|
||||
| flag | type | default | description |
|
||||
|------|------|---------|-------------|
|
||||
| `--sample-rate` | int | `44100` | Sample rate requested from the ffmpeg fallback. Native WAV/MP3 keep the file's rate. |
|
||||
| `--ffmpeg` | string | first `ffmpeg` on `PATH` | Override the ffmpeg binary used for non-WAV/MP3 inputs. |
|
||||
|
||||
## Exit codes
|
||||
|
||||
| code | meaning |
|
||||
|------|---------|
|
||||
| `0` | Render succeeded. |
|
||||
| `1` | Decode, render, or write error (message on stderr). |
|
||||
| `2` | Usage error — bad flag, invalid combination, or missing input. |
|
||||
|
||||
## Examples
|
||||
|
||||
```bash
|
||||
# All defaults.
|
||||
songsee track.mp3
|
||||
|
||||
# Mel + chroma in viridis at 2K.
|
||||
songsee track.mp3 --viz mel,chroma --style viridis --width 2048 --height 1024
|
||||
|
||||
# Eight-second slice starting at 12.5s, written to PNG.
|
||||
songsee track.mp3 --start 12.5 --duration 8 -o slice.png
|
||||
|
||||
# Stream from stdin, encode to PNG, write to stdout.
|
||||
cat track.mp3 | songsee - --format png -o - > spectro.png
|
||||
|
||||
# Custom FFT, sub-bass focus.
|
||||
songsee track.mp3 --window 4096 --hop 1024 --min-freq 20 --max-freq 200
|
||||
|
||||
# Pin a specific ffmpeg.
|
||||
songsee weird.opus --ffmpeg /opt/homebrew/bin/ffmpeg
|
||||
```
|
||||
|
||||
## Related pages
|
||||
|
||||
- [Quickstart](quickstart.md) — first render in under a minute.
|
||||
- [Visualizations](visualizations.md) — when to use each viz mode.
|
||||
- [Palettes](palettes.md) — palette gradient stops.
|
||||
- [Decoding](decoding.md) — input formats and ffmpeg fallback.
|
||||
- [Rendering](rendering.md) — output sizing, format, batch use.
|
||||
78
docs/decoding.md
Normal file
@ -0,0 +1,78 @@
|
||||
---
|
||||
title: Decoding
|
||||
description: "How songsee decodes audio — native WAV/MP3 paths, ffmpeg fallback, sample rate, stdin, and slicing."
|
||||
---
|
||||
|
||||
# Decoding
|
||||
|
||||
songsee turns the input into mono `float64` samples before any analysis runs. Two fast paths cover most files; everything else falls through to ffmpeg.
|
||||
|
||||
## Inputs
|
||||
|
||||
- **File path.** `songsee track.mp3` — any path the OS can open.
|
||||
- **Stdin.** `songsee -` — reads the encoded stream from stdin. Useful behind `cat`, `curl`, or shell pipelines.
|
||||
- **Mono mixdown.** Stereo or multichannel inputs are averaged to mono before windowing.
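
The mono mixdown described above amounts to averaging the channels of each frame. A minimal sketch, assuming interleaved samples (`L R L R ...`) and an unweighted mean; the `mix_to_mono` helper is hypothetical and not taken from songsee's source.

```python
from typing import List, Sequence


def mix_to_mono(interleaved: Sequence[float], channels: int) -> List[float]:
    """Average each group of `channels` interleaved samples into one sample."""
    if channels < 1:
        raise ValueError("channels must be at least 1")
    if len(interleaved) % channels != 0:
        raise ValueError("sample count is not a multiple of the channel count")
    mono = []
    for i in range(0, len(interleaved), channels):
        frame = interleaved[i:i + channels]
        mono.append(sum(frame) / channels)
    return mono
```

For example, a stereo frame of `1.0` and `-1.0` averages to `0.0`, which is why out-of-phase content can vanish in a mono spectrogram.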
|
||||
|
||||
## Native WAV
|
||||
|
||||
Pure-Go WAV decoder. Handles:
|
||||
|
||||
- PCM 8/16/24/32-bit integer
|
||||
- 32-bit float, 64-bit float
|
||||
- WAVE_FORMAT_EXTENSIBLE (with channel masks and sub-format GUIDs)
|
||||
|
||||
No external dependency, no ffmpeg roundtrip. The decoder validates the RIFF header and rejects truncated `data` chunks before allocating sample buffers.
|
||||
|
||||
## Native MP3
|
||||
|
||||
Pure-Go MP3 decoder. Handles MPEG-1/2 Layer III with VBR and CBR. Output sample rate is whatever the file declares; songsee does not resample.
|
||||
|
||||
If the decoder hits a malformed frame it surfaces a structured error instead of silently truncating, so corrupt input fails loudly.
|
||||
|
||||
## ffmpeg fallback
|
||||
|
||||
Anything that isn't WAV or MP3 — FLAC, AAC, M4A, OGG, Opus, video containers, raw streams — is decoded by spawning `ffmpeg`. songsee asks for 32-bit float little-endian mono at the configured sample rate:
|
||||
|
||||
```text
|
||||
ffmpeg -hide_banner -loglevel error \
|
||||
-i <input> -f f32le -ac 1 -ar <sample-rate> -
|
||||
```
|
||||
|
||||
Tweak the pipeline with:
|
||||
|
||||
- `--sample-rate N` — output sample rate fed to ffmpeg (default `44100`).
|
||||
- `--ffmpeg /path/to/ffmpeg` — override the binary lookup.
|
||||
|
||||
If ffmpeg isn't on `PATH` and the file isn't WAV or MP3, songsee fails with a clear error. Install with `brew install ffmpeg` or your distro's package manager.
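
To make the fallback concrete, here is a sketch of the same pipeline driven from Python: build the argument list shown above, run ffmpeg, and unpack the raw `f32le` bytes into floats. It illustrates the data format only and is not songsee's Go implementation; the function names are hypothetical, and dropping a trailing partial sample is an assumption.

```python
import struct
import subprocess
from typing import List


def ffmpeg_args(ffmpeg: str, input_path: str, sample_rate: int) -> List[str]:
    """Argument list matching the command shown above."""
    return [ffmpeg, "-hide_banner", "-loglevel", "error",
            "-i", input_path,
            "-f", "f32le", "-ac", "1", "-ar", str(sample_rate), "-"]


def decode_f32le(raw: bytes) -> List[float]:
    """Unpack little-endian 32-bit floats; ignore a trailing partial sample."""
    usable = len(raw) - (len(raw) % 4)
    count = usable // 4
    return list(struct.unpack("<%df" % count, raw[:usable]))


def decode_with_ffmpeg(input_path: str, sample_rate: int = 44100,
                       ffmpeg: str = "ffmpeg") -> List[float]:
    """Run ffmpeg and return mono samples; raises if ffmpeg exits non-zero."""
    proc = subprocess.run(ffmpeg_args(ffmpeg, input_path, sample_rate),
                          stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                          check=True)
    return decode_f32le(proc.stdout)
```

Because the stream is headerless, the sample rate is known only from the `-ar` value passed in; nothing in the bytes records it.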
|
||||
|
||||
## Slicing
|
||||
|
||||
`--start` and `--duration` slice the decoded audio before analysis. Both are seconds (float). Negative values are rejected.
|
||||
|
||||
```bash
|
||||
songsee long.mp3 --start 60 --duration 15 -o minute1.jpg
|
||||
songsee long.mp3 --start 60 # 60s to end
|
||||
songsee long.mp3 --duration 30 # first 30s
|
||||
```
|
||||
|
||||
Slicing happens on samples after decoding; FFT framing then runs on the slice.
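
Slicing on decoded samples comes down to converting seconds into sample indices. A minimal sketch under stated assumptions: `slice_samples` is a hypothetical helper, fractional sample positions are truncated, out-of-range bounds are clamped to the end of the audio, and a duration of `0` means "to the end", as in the CLI defaults.

```python
from typing import List, Sequence


def slice_samples(samples: Sequence[float], sample_rate: int,
                  start: float = 0.0, duration: float = 0.0) -> List[float]:
    """Return the samples covered by [start, start + duration) in seconds."""
    if start < 0 or duration < 0:
        raise ValueError("--start and --duration must not be negative")
    first = min(int(start * sample_rate), len(samples))
    if duration == 0:
        return list(samples[first:])  # 0 means "render to the end"
    last = min(first + int(duration * sample_rate), len(samples))
    return list(samples[first:last])
```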
|
||||
|
||||
## Sample rate notes
|
||||
|
||||
- Native WAV/MP3 keep the file's sample rate. `--sample-rate` only affects the ffmpeg pipeline.
|
||||
- The Nyquist frequency (`sampleRate / 2`) is the upper bound visible in the spectrogram.
|
||||
- At the default `--window 2048 --hop 512`, each window spans ≈46 ms at 44.1 kHz (≈43 ms at 48 kHz) and frames advance by ≈11.6 ms (≈10.7 ms at 48 kHz).
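
Window and hop durations follow directly from the sample rate, so they are easy to recompute for any setting. A small worked sketch using only the defaults listed above; the helper names are hypothetical.

```python
from typing import Tuple


def nyquist_hz(sample_rate: int) -> float:
    """Highest frequency representable at this sample rate."""
    return sample_rate / 2


def frame_timing_ms(sample_rate: int, window: int = 2048,
                    hop: int = 512) -> Tuple[float, float]:
    """Return (window duration, hop duration) in milliseconds."""
    window_ms = 1000.0 * window / sample_rate
    hop_ms = 1000.0 * hop / sample_rate
    return (window_ms, hop_ms)
```

At 44.1 kHz this gives a window of about 46.4 ms and a hop of about 11.6 ms; at 48 kHz, about 42.7 ms and 10.7 ms.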
|
||||
|
||||
## Verbose decoding
|
||||
|
||||
```bash
|
||||
songsee track.flac --verbose -o out.png
|
||||
```
|
||||
|
||||
`--verbose` prints decode info to stderr — sample count, sample rate, slice bounds — without polluting stdout. Combine with `--quiet` to suppress the trailing output-path echo when piping.
|
||||
|
||||
## Related pages
|
||||
|
||||
- [Install](install.md) — how to add ffmpeg if you need it.
|
||||
- [Pipeline](spec.md) — what happens to the samples after decoding.
|
||||
- [CLI](cli.md) — every flag with its default.
|
||||
146
docs/index.md
@ -1,113 +1,51 @@
|
||||
---
|
||||
layout: default
|
||||
title: Home
|
||||
description: Generate modern spectrogram images from audio files with a fast, scriptable CLI.
|
||||
body_class: home
|
||||
title: Overview
|
||||
permalink: /
|
||||
description: "songsee is a single Go CLI that turns audio into modern spectrogram and feature-panel images — fast WAV/MP3 decode, ffmpeg fallback, nine visualization modes, six palettes."
|
||||
---
|
||||
|
||||
<section class="hero">
|
||||
<div class="hero-copy">
|
||||
<div class="kicker reveal delay-1">Spectral imaging CLI</div>
|
||||
<h1 class="hero-title reveal delay-2">See sound as living color.</h1>
|
||||
<p class="hero-sub reveal delay-3">
|
||||
songsee turns audio into precise, high-resolution spectrograms and feature panels. Fast decode
|
||||
paths for WAV and MP3, ffmpeg fallback for everything else, and palette styles that make science
|
||||
look cinematic.
|
||||
</p>
|
||||
<div class="hero-actions reveal delay-4">
|
||||
<a class="btn primary" href="#install">Install</a>
|
||||
<a class="btn" href="https://github.com/openclaw/songsee">GitHub</a>
|
||||
</div>
|
||||
<div class="hero-meta reveal delay-4">Hann window. Log magnitude. 2048 / 512 defaults.</div>
|
||||
</div>
|
||||
<div class="hero-visual">
|
||||
<div class="spectral-panel" role="img" aria-label="Animated spectrogram preview">
|
||||
<div class="spectral-caption">Spectrogram preview</div>
|
||||
</div>
|
||||
</div>
|
||||
</section>
|
||||
## Try it
|
||||
|
||||
<section class="section">
|
||||
<div class="kicker">Why songsee</div>
|
||||
<h2 class="section-title">A focused pipeline for modern spectrograms.</h2>
|
||||
<p class="section-sub">
|
||||
Decode audio into mono samples, window it with Hann, run FFT, and render log-magnitude frames into
|
||||
a crisp image. The CLI stays small, reliable, and scriptable.
|
||||
</p>
|
||||
After [installing](install.md), every render is a one-liner.
|
||||
|
||||
<div class="feature-grid">
|
||||
<div class="card">
|
||||
<h3>Precise controls</h3>
|
||||
<p>Window, hop, min/max frequency, output dimensions, and time slicing for exact framing.</p>
|
||||
</div>
|
||||
<div class="card">
|
||||
<h3>Fast decode paths</h3>
|
||||
<p>Native WAV/MP3 decoding with ffmpeg fallback for everything else.</p>
|
||||
</div>
|
||||
<div class="card">
|
||||
<h3>Palette styles</h3>
|
||||
<p>classic, magma, inferno, viridis, and gray for a bold spectral aesthetic.</p>
|
||||
</div>
|
||||
<div class="card">
|
||||
<h3>Feature panels</h3>
|
||||
<p>mel, chroma, hpss, selfsim, loudness, tempogram, mfcc, flux — rendered as single or grid views.</p>
|
||||
</div>
|
||||
<div class="card">
|
||||
<h3>Auto-contrast</h3>
|
||||
<p>Percentile clamping keeps every panel readable without manual tuning.</p>
|
||||
</div>
|
||||
<div class="card">
|
||||
<h3>Clean output</h3>
|
||||
<p>JPEG or PNG output, default quality 95, and stable results for batch workflows.</p>
|
||||
</div>
|
||||
</div>
|
||||
</section>
|
||||
```bash
|
||||
# Default: a clean spectrogram next to the input file.
|
||||
songsee track.mp3
|
||||
|
||||
<section class="section" id="install">
|
||||
<div class="kicker">Install</div>
|
||||
<h2 class="section-title">One command. Instant spectrograms.</h2>
|
||||
<div class="code-block">
|
||||
brew install steipete/tap/songsee
|
||||
go install github.com/steipete/songsee/cmd/songsee@latest
|
||||
</div>
|
||||
<div class="domain-note">
|
||||
songsee.ai, songsee.app, and songsee.dev all redirect to songsee.sh.
|
||||
</div>
|
||||
</section>
|
||||
# Mel spectrogram, magma palette, 2K wide.
|
||||
songsee track.mp3 --viz mel --style magma --width 2048 --height 1024
|
||||
|
||||
<section class="section" id="usage">
|
||||
<div class="kicker">Usage</div>
|
||||
<h2 class="section-title">CLI ready for pipes, batches, and automation.</h2>
|
||||
<div class="code-block">
|
||||
songsee track.mp3
|
||||
songsee track.wav --style magma --width 2048 --height 1024 -o spectro.png
|
||||
cat track.mp3 | songsee - --style gray --format png
|
||||
songsee track.mp3 --start 12.5 --duration 8 --output slice.jpg
|
||||
songsee track.mp3 --viz spectrogram,mel,chroma --width 2048 --height 1024
|
||||
</div>
|
||||
</section>
|
||||
# All nine modes in one grid.
|
||||
songsee track.mp3 --viz spectrogram,mel,chroma,hpss,selfsim,loudness,tempogram,mfcc,flux
|
||||
|
||||
<section class="section">
|
||||
<div class="kicker">Palettes</div>
|
||||
<h2 class="section-title">Color maps with character.</h2>
|
||||
<p class="section-sub">Pick a palette by name for instant visual tone shifts.</p>
|
||||
<div class="palette-row">
|
||||
<div class="palette classic" title="classic"></div>
|
||||
<div class="palette magma" title="magma"></div>
|
||||
<div class="palette inferno" title="inferno"></div>
|
||||
<div class="palette viridis" title="viridis"></div>
|
||||
<div class="palette gray" title="gray"></div>
|
||||
<div class="palette clawd" title="clawd">🦞</div>
|
||||
</div>
|
||||
</section>
|
||||
# Slice eight seconds out of a long file.
|
||||
songsee track.mp3 --start 12.5 --duration 8 -o slice.jpg
|
||||
|
||||
<section class="section">
|
||||
<div class="kicker">Specs</div>
|
||||
<h2 class="section-title">Detailed pipeline notes.</h2>
|
||||
<p class="section-sub">
|
||||
Windowing, bin mapping, normalization, and rendering details live in the spec.
|
||||
</p>
|
||||
<div class="hero-actions">
|
||||
<a class="btn" href="{{ '/spec/' | relative_url }}">Read spec</a>
|
||||
</div>
|
||||
</section>
|
||||
# Pipe from stdin, write PNG to stdout.
|
||||
cat track.mp3 | songsee - --format png -o - > spectro.png
|
||||
```
|
||||
|
||||
The default output is a 1920×1080 JPEG (quality 95) written next to the input. `--format png` switches encoder, `-o` redirects the path, and `-o -` streams to stdout for piping.
|
||||
|
||||
## What songsee does
|
||||
|
||||
- **One binary, nine views.** spectrogram, mel, chroma, hpss, selfsim, loudness, tempogram, mfcc, flux — pick one, combine several, or render the full grid.
|
||||
- **Fast decode paths.** Native Go decoders for WAV (PCM, float, extensible) and MP3; ffmpeg fallback covers everything else.
|
||||
- **Six palettes.** classic, magma, inferno, viridis, gray, and clawd 🦞 — each tuned for log-magnitude data.
|
||||
- **Auto-contrast.** Per-panel percentile clamping (0.05 / 0.98) keeps every visualization readable without manual tuning.
|
||||
- **Scriptable I/O.** File path, stdin (`-`), or stdout. Quiet mode for CI; verbose mode prints decode and slice details to stderr.
|
||||
- **No Python.** Single static binary. No model files, no virtualenv, no GPU.
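
The auto-contrast step in the list above can be pictured as a percentile clamp followed by a rescale to the 0..1 range. The sketch below is illustrative only: it assumes `0.05` and `0.98` are fractions (5th and 98th percentile), uses a nearest-rank percentile, and the function names are hypothetical rather than taken from songsee.

```python
from typing import List, Sequence


def percentile(sorted_values: Sequence[float], fraction: float) -> float:
    """Nearest-rank percentile on an already sorted sequence."""
    index = int(round(fraction * (len(sorted_values) - 1)))
    return sorted_values[index]


def auto_contrast(values: Sequence[float], low: float = 0.05,
                  high: float = 0.98) -> List[float]:
    """Clamp to the [low, high] percentile band and rescale to 0..1."""
    if not values:
        return []
    ordered = sorted(values)
    lo = percentile(ordered, low)
    hi = percentile(ordered, high)
    if hi <= lo:  # flat panel: nothing to stretch
        return [0.0 for _ in values]
    span = hi - lo
    return [min(max((v - lo) / span, 0.0), 1.0) for v in values]
```

Clamping at percentiles rather than at the raw minimum and maximum keeps a few outlier bins from flattening the rest of the panel.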
|
||||
|
||||
## Pick your path
|
||||
|
||||
- **Trying it.** [Install](install.md) → [Quickstart](quickstart.md). One brew formula, one command, one image.
|
||||
- **Picking a view.** [Visualizations](visualizations.md) describes what each of the nine modes shows and when to use it.
|
||||
- **Picking a palette.** [Palettes](palettes.md) lists the six palettes with their gradient stops.
|
||||
- **Audio inputs.** [Decoding](decoding.md) covers WAV/MP3 fast paths, ffmpeg fallback, sample rate, and stdin.
|
||||
- **Output and batches.** [Rendering](rendering.md) explains output sizing, grid layout, format selection, and stdout streaming.
|
||||
- **Algorithm details.** [Pipeline](spec.md) documents windowing, FFT, bin mapping, and normalization.
|
||||
- **Flag reference.** [CLI](cli.md) lists every flag with its default.
|
||||
|
||||
## Project
|
||||
|
||||
Active development; the [changelog](https://github.com/openclaw/songsee/blob/main/CHANGELOG.md) tracks what shipped. Released under the [MIT license](https://github.com/openclaw/songsee/blob/main/LICENSE). Source on [GitHub](https://github.com/openclaw/songsee).
|
||||
|
||||
68
docs/install.md
Normal file
@ -0,0 +1,68 @@
|
||||
---
|
||||
title: Install
|
||||
description: "Install songsee via Homebrew, go install, or build from source."
|
||||
---
|
||||
|
||||
# Install
|
||||
|
||||
`songsee` ships as a single Go binary. Pick whichever delivery mechanism fits.
|
||||
|
||||
## Homebrew (macOS, Linux)
|
||||
|
||||
```bash
|
||||
brew install steipete/tap/songsee
|
||||
songsee --version
|
||||
```
|
||||
|
||||
The formula lives in [`steipete/homebrew-tap`](https://github.com/steipete/homebrew-tap). `brew upgrade songsee` brings in new releases.
|
||||
|
||||
## go install
|
||||
|
||||
```bash
|
||||
go install github.com/steipete/songsee/cmd/songsee@latest
|
||||
songsee --version
|
||||
```
|
||||
|
||||
This builds against the Go version declared in `go.mod`. The binary lands in `$(go env GOBIN)` if `GOBIN` is set, otherwise in `$(go env GOPATH)/bin`.
|
||||
|
||||
## Build from source
|
||||
|
||||
```bash
|
||||
git clone https://github.com/openclaw/songsee.git
|
||||
cd songsee
|
||||
make
|
||||
./songsee --version
|
||||
```
|
||||
|
||||
`make` runs `go build` with the version string injected from `git describe`.
|
||||
|
||||
## ffmpeg (optional)
|
||||
|
||||
WAV and MP3 decode natively in pure Go. Anything else (FLAC, AAC, OGG, M4A, video containers) falls through to `ffmpeg` on `PATH`.
|
||||
|
||||
```bash
brew install ffmpeg        # macOS / Linuxbrew
sudo apt install ffmpeg    # Debian / Ubuntu
```
|
||||
|
||||
Override the lookup with `--ffmpeg /custom/path/ffmpeg` when you have several builds installed.
|
||||
|
||||
## Verify the install
|
||||
|
||||
```bash
songsee --version
songsee --help
songsee testdata/short.wav   # from a source checkout: render a tiny known-good file
```
|
||||
|
||||
## Updating
|
||||
|
||||
- **Homebrew:** `brew upgrade songsee`.
|
||||
- **go install:** rerun `go install github.com/steipete/songsee/cmd/songsee@latest`.
|
||||
- **Source:** `git pull && make` — version comes from `git describe`.
|
||||
|
||||
## Related pages
|
||||
|
||||
- [Quickstart](quickstart.md) — first render in under a minute.
|
||||
- [Decoding](decoding.md) — WAV/MP3 fast paths and ffmpeg fallback.
|
||||
- [CLI](cli.md) — every flag with its default.
|
||||
93
docs/palettes.md
Normal file
@ -0,0 +1,93 @@
|
||||
---
|
||||
title: Palettes
|
||||
description: "Six built-in color maps for songsee: classic, magma, inferno, viridis, gray, clawd."
|
||||
---
|
||||
|
||||
# Palettes
|
||||
|
||||
`--style` picks a palette. Each palette is a linear gradient applied to normalized values in `[0, 1]`: five or six stops for the color palettes, two for `gray`. The default is `classic`.
|
||||
|
||||
```bash
|
||||
songsee track.mp3 --style magma
|
||||
songsee track.mp3 --viz mel --style viridis
|
||||
songsee track.mp3 --viz hpss,chroma --style clawd
|
||||
```
|
||||
|
||||
Unknown names error out before decoding. All palettes are deterministic — the same input always produces the same colors.
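To illustrate how a multi-stop linear gradient turns a normalized value into a color, here is a minimal Go sketch that uses the `classic` stops listed further down this page. The `stop` type and `lerpColor` function are hypothetical names for this example, not songsee's API, and per-channel linear interpolation in RGB is an assumption about how the stops are blended.

```go
package main

import "fmt"

// stop is a hypothetical gradient stop: a position in [0, 1] plus an RGB color.
type stop struct {
	pos     float64
	r, g, b float64
}

// classic mirrors the stops documented for the classic palette.
var classic = []stop{
	{0.00, 0x00, 0x00, 0x00},
	{0.20, 0x00, 0x20, 0x60},
	{0.45, 0x00, 0xa0, 0xc8},
	{0.70, 0xff, 0xb4, 0x00},
	{1.00, 0xff, 0xff, 0xff},
}

// lerpColor clamps v to the gradient's range and linearly interpolates,
// channel by channel, between the two stops that bracket it.
func lerpColor(stops []stop, v float64) (r, g, b uint8) {
	if v <= stops[0].pos {
		s := stops[0]
		return uint8(s.r), uint8(s.g), uint8(s.b)
	}
	for i := 1; i < len(stops); i++ {
		lo, hi := stops[i-1], stops[i]
		if v <= hi.pos {
			t := (v - lo.pos) / (hi.pos - lo.pos)
			// Round to the nearest 8-bit channel value.
			mix := func(from, to float64) uint8 { return uint8(from + t*(to-from) + 0.5) }
			return mix(lo.r, hi.r), mix(lo.g, hi.g), mix(lo.b, hi.b)
		}
	}
	last := stops[len(stops)-1]
	return uint8(last.r), uint8(last.g), uint8(last.b)
}

func main() {
	for _, v := range []float64{0, 0.1, 0.5, 1} {
		r, g, b := lerpColor(classic, v)
		fmt.Printf("%.2f -> #%02x%02x%02x\n", v, r, g, b)
	}
}
```

Values outside `[0, 1]` are clamped to the first or last stop, so the sketch never indexes past the gradient.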
|
||||
|
||||
## classic
|
||||
|
||||
The default. A black → navy → cyan → amber → white sweep tuned for log-magnitude data, with strong perceptual contrast across the full range.
|
||||
|
||||
| stop | color |
|------|-------|
| 0.00 | `#000000` |
| 0.20 | `#002060` |
| 0.45 | `#00a0c8` |
| 0.70 | `#ffb400` |
| 1.00 | `#ffffff` |
|
||||
|
||||
## magma
|
||||
|
||||
Matplotlib's magma. Black → deep purple → magenta → orange → cream. Smooth and perceptually uniform; works well for everything from spectrograms to MFCCs.
|
||||
|
||||
| stop | color |
|------|-------|
| 0.00 | `#000004` |
| 0.25 | `#3b0c57` |
| 0.50 | `#b4367a` |
| 0.75 | `#fb8c3c` |
| 1.00 | `#fcfdbf` |
|
||||
|
||||
## inferno
|
||||
|
||||
Matplotlib's inferno. Same shape as magma with hotter highs — black → indigo → red → orange → pale yellow.
|
||||
|
||||
| stop | color |
|------|-------|
| 0.00 | `#000004` |
| 0.25 | `#3d0965` |
| 0.50 | `#bb3754` |
| 0.75 | `#f98e08` |
| 1.00 | `#fcffa4` |
|
||||
|
||||
## viridis
|
||||
|
||||
Matplotlib's viridis. Purple → blue → teal → green → yellow. Colorblind-safe and perceptually uniform; the safest choice for publication figures.
|
||||
|
||||
| stop | color |
|------|-------|
| 0.00 | `#440154` |
| 0.25 | `#3a528b` |
| 0.50 | `#20908c` |
| 0.75 | `#5ec962` |
| 1.00 | `#fde725` |
|
||||
|
||||
## gray
|
||||
|
||||
A straight black-to-white linear ramp. Ideal for print, monochrome compositing, or downstream processing that doesn't want hue information.
|
||||
|
||||
| stop | color |
|------|-------|
| 0.00 | `#000000` |
| 1.00 | `#ffffff` |
|
||||
|
||||
`grey` is accepted as an alias.
|
||||
|
||||
## clawd 🦞
|
||||
|
||||
The mascot palette. Abyssal navy → ocean teal → coral → lobster red → foam highlight. Six stops, designed to be unmistakable.
|
||||
|
||||
| stop | color |
|------|-------|
| 0.00 | `#02040f` |
| 0.20 | `#0b264a` |
| 0.40 | `#126175` |
| 0.60 | `#c1625c` |
| 0.80 | `#cd3728` |
| 1.00 | `#ffe6d2` |
|
||||
|
||||
## Related pages
|
||||
|
||||
- [Visualizations](visualizations.md) — the nine viz modes that consume these palettes.
|
||||
- [Pipeline](spec.md) — how values are normalized before palette mapping.
|
||||
74
docs/quickstart.md
Normal file
@ -0,0 +1,74 @@
|
||||
---
|
||||
title: Quickstart
|
||||
description: "From a clean machine to a 1920×1080 spectrogram in under a minute."
|
||||
---
|
||||
|
||||
# Quickstart
|
||||
|
||||
One install, one command, one image.
|
||||
|
||||
## 1. Install
|
||||
|
||||
```bash
|
||||
brew install steipete/tap/songsee
|
||||
songsee --version
|
||||
```
|
||||
|
||||
Other paths (go install, source builds, ffmpeg) live on [Install](install.md).
|
||||
|
||||
## 2. Render a default spectrogram
|
||||
|
||||
```bash
|
||||
songsee track.mp3
|
||||
```
|
||||
|
||||
Output is a 1920×1080 JPEG (quality 95) written next to the input — `track.jpg`. The file path is echoed to stdout; everything else (decode info under `--verbose`, warnings, errors) goes to stderr so pipes stay clean.
|
||||
|
||||
## 3. Pick a different view
|
||||
|
||||
```bash
|
||||
# Mel-scaled spectrogram (perceptual frequency).
|
||||
songsee track.mp3 --viz mel
|
||||
|
||||
# Chromagram (12-bin pitch class) with the magma palette.
|
||||
songsee track.mp3 --viz chroma --style magma
|
||||
|
||||
# Combine harmonic/percussive split with chroma in a single grid.
|
||||
songsee track.mp3 --viz hpss,chroma --style inferno
|
||||
```
|
||||
|
||||
The full list of views lives on [Visualizations](visualizations.md); palettes are documented on [Palettes](palettes.md).
|
||||
|
||||
## 4. Slice a section
|
||||
|
||||
```bash
|
||||
songsee track.mp3 --start 12.5 --duration 8 -o slice.jpg
|
||||
```
|
||||
|
||||
`--start` and `--duration` are seconds. `-o` overrides the default output path; pass `-o -` to write the encoded image to stdout.
|
||||
|
||||
## 5. Stream from stdin
|
||||
|
||||
```bash
|
||||
cat track.mp3 | songsee - --format png -o - > spectro.png
|
||||
```
|
||||
|
||||
`-` as the input reads from stdin; `--format png` switches the encoder; `-o -` writes the encoded image to stdout. Combine with `find`, `xargs`, or shell loops for batch rendering.
|
||||
|
||||
## 6. Tune dimensions and FFT
|
||||
|
||||
```bash
songsee track.mp3 \
  --width 2560 --height 1440 \
  --window 4096 --hop 1024 \
  --min-freq 50 --max-freq 8000
```
|
||||
|
||||
`--window` must be a power of two. Larger windows trade time resolution for frequency resolution. `--min-freq` / `--max-freq` clamp the visible frequency band; the default upper bound is the Nyquist frequency.
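The trade-off between window size and resolution can be made concrete with a little arithmetic. The following Go sketch is illustrative only: the function names are hypothetical, and the 44100 Hz sample rate is simply the documented default.

```go
package main

import "fmt"

// isPowerOfTwo reports whether n is a positive power of two.
func isPowerOfTwo(n int) bool { return n > 0 && n&(n-1) == 0 }

// binSpacingHz is the frequency step between adjacent FFT bins.
func binSpacingHz(sampleRate, window int) float64 {
	return float64(sampleRate) / float64(window)
}

// hopMillis is the time step between adjacent frames, in milliseconds.
func hopMillis(sampleRate, hop int) float64 {
	return 1000 * float64(hop) / float64(sampleRate)
}

func main() {
	const sampleRate = 44100
	// Compare the default window/hop pair with the larger pair used above.
	for _, cfg := range [][2]int{{2048, 512}, {4096, 1024}} {
		window, hop := cfg[0], cfg[1]
		fmt.Printf("window=%d hop=%d valid=%v bin=%.2f Hz step=%.2f ms\n",
			window, hop, isPowerOfTwo(window),
			binSpacingHz(sampleRate, window), hopMillis(sampleRate, hop))
	}
}
```

Doubling the window halves the bin spacing (finer frequency detail), while doubling the hop doubles the time step between columns.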
|
||||
|
||||
## Where next
|
||||
|
||||
- [Visualizations](visualizations.md) — what each of the nine modes shows.
|
||||
- [Palettes](palettes.md) — gradient stops for the six built-in palettes.
|
||||
- [Pipeline](spec.md) — windowing, FFT, bin mapping, normalization.
|
||||
- [CLI](cli.md) — every flag with its default.
|
||||
110
docs/rendering.md
Normal file
@ -0,0 +1,110 @@
|
||||
---
|
||||
title: Rendering
|
||||
description: "How songsee turns spectrogram data into images — output sizing, format, grid layout, stdout streaming, and batch use."
|
||||
---
|
||||
|
||||
# Rendering
|
||||
|
||||
The render stage maps numeric spectrogram and feature data onto pixels, applies the chosen palette, composes panels into a grid, and encodes the result to JPEG or PNG.
|
||||
|
||||
## Output path
|
||||
|
||||
```bash
|
||||
songsee track.mp3 # writes track.jpg next to the input
|
||||
songsee track.mp3 -o out.png # explicit path; format inferred from extension
|
||||
songsee track.mp3 -o spectro # no extension; appends ".jpg" by default
|
||||
songsee - -o - # stdin in, encoded image to stdout
|
||||
```
|
||||
|
||||
If `--format` is set explicitly, it overrides extension-based inference. If `-o` already ends in `.png`, `.jpg`, or `.jpeg`, the encoder follows the extension regardless of `--format`.
|
||||
|
||||
When the input is `-` (stdin) and no `-o` is given, the output filename is `songsee.jpg` (or `.png`) in the current directory.
|
||||
|
||||
## Format
|
||||
|
||||
```bash
|
||||
songsee track.mp3 --format png # PNG, lossless
|
||||
songsee track.mp3 --format jpg # JPEG, quality 95 (default)
|
||||
```
|
||||
|
||||
JPEG quality is fixed at 95 — high enough that compression artifacts disappear at typical viewing sizes while keeping file sizes reasonable. PNG is the right choice for archival, transparency, or downstream processing that doesn't tolerate JPEG quantization.
|
||||
|
||||
## Dimensions
|
||||
|
||||
```bash
|
||||
songsee track.mp3 --width 2560 --height 1440
|
||||
songsee track.mp3 --width 3840 --height 2160 # 4K
|
||||
songsee track.mp3 --width 800 --height 200 # banner strip
|
||||
```
|
||||
|
||||
The default canvas is 1920×1080, and `--width` and `--height` must both be positive. With multiple visualizations, songsee divides the canvas into a grid; a very small canvas with many panels can leave cells too small to render, which produces an error.
|
||||
|
||||
## Grid layout
|
||||
|
||||
When `--viz` selects more than one mode, panels are tiled into a `ceil(sqrt(n))`-column grid with an 8 px gap, sized to fit `--width` × `--height` exactly:
|
||||
|
||||
| panels | grid |
|--------|------|
| 1 | 1×1 |
| 2 | 2×1 |
| 3–4 | 2×2 |
| 5–6 | 3×2 |
| 7–9 | 3×3 |
|
||||
|
||||
All cells share the same width and height. Panels fill the grid row by row, left to right and then top to bottom, in the order they appear in `--viz`.
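The grid arithmetic can be sketched in a few lines of Go. The column rule (`ceil(sqrt(n))`) and the 8 px gap come from this page; the cell-size formula and the row-major placement in `cellOrigin` are assumptions made for illustration, and all function names are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

const gap = 8 // pixels between cells, as documented

// gridShape returns columns and rows for n >= 1 panels: ceil(sqrt(n))
// columns, and as many rows as needed to hold every panel.
func gridShape(n int) (cols, rows int) {
	cols = int(math.Ceil(math.Sqrt(float64(n))))
	rows = (n + cols - 1) / cols
	return cols, rows
}

// cellSize subtracts the gaps and divides the rest evenly (assumed formula;
// integer division may leave a few unused pixels at the edges).
func cellSize(width, height, cols, rows int) (w, h int) {
	w = (width - gap*(cols-1)) / cols
	h = (height - gap*(rows-1)) / rows
	return w, h
}

// cellOrigin places panel i (0-based) in row-major order (assumed).
func cellOrigin(i, cols, w, h int) (x, y int) {
	return (i % cols) * (w + gap), (i / cols) * (h + gap)
}

func main() {
	for n := 1; n <= 9; n++ {
		cols, rows := gridShape(n)
		w, h := cellSize(1920, 1080, cols, rows)
		fmt.Printf("%d panels: %dx%d grid, cells %dx%d px\n", n, cols, rows, w, h)
	}
}
```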
|
||||
|
||||
## Frequency range
|
||||
|
||||
`--min-freq` and `--max-freq` clamp the visible band (Hz) for spectrogram, mel, and MFCC panels. The default upper bound is the Nyquist frequency (`sampleRate / 2`).
|
||||
|
||||
```bash
|
||||
# Vocal range, 80 Hz – 4 kHz.
|
||||
songsee track.mp3 --min-freq 80 --max-freq 4000
|
||||
|
||||
# Sub-bass focus.
|
||||
songsee track.mp3 --min-freq 20 --max-freq 200
|
||||
```
|
||||
|
||||
`--max-freq` must be greater than `--min-freq`; songsee rejects the run with exit code 2 otherwise.
|
||||
|
||||
## Auto-contrast
|
||||
|
||||
Every panel runs an independent percentile clamp on its values before palette mapping (typically 0.05 / 0.98 — ~5% black floor, ~2% white ceiling). This keeps a quiet ambient track and a loud rock track equally readable in the same grid; it also means absolute brightness is not comparable across panels.
|
||||
|
||||
The base spectrogram converts magnitudes to decibels (`20·log10(mag + 1e-9)`) before normalizing.
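A minimal Go sketch of the two steps described in this section, decibel conversion followed by a percentile clamp, might look like this. The dB formula is the documented one; the percentile lookup (truncating `p·(n−1)` to an index into a sorted copy) is an assumption, and the function names are hypothetical rather than songsee's own.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// toDB converts a linear magnitude to decibels with the documented epsilon.
func toDB(mag float64) float64 { return 20 * math.Log10(mag+1e-9) }

// percentile returns the value at fraction p (0..1) of a sorted copy of xs.
// The index is p*(n-1) truncated toward zero (assumed method); xs must be non-empty.
func percentile(xs []float64, p float64) float64 {
	s := append([]float64(nil), xs...)
	sort.Float64s(s)
	return s[int(p*float64(len(s)-1))]
}

// normalize clamps values to the [pLo, pHi] percentile band and rescales
// the result into [0, 1], ready for palette mapping.
func normalize(xs []float64, pLo, pHi float64) []float64 {
	if len(xs) == 0 {
		return nil
	}
	lo, hi := percentile(xs, pLo), percentile(xs, pHi)
	out := make([]float64, len(xs))
	if hi <= lo {
		return out // flat panel: everything maps to 0
	}
	for i, x := range xs {
		out[i] = math.Min(1, math.Max(0, (x-lo)/(hi-lo)))
	}
	return out
}

func main() {
	mags := []float64{0.001, 0.01, 0.1, 1, 10}
	db := make([]float64, len(mags))
	for i, m := range mags {
		db[i] = toDB(m)
	}
	fmt.Println(normalize(db, 0.05, 0.98))
}
```

Because the clamp bounds are computed per panel, the same normalized value in two panels can correspond to very different absolute levels.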
|
||||
|
||||
## Stdout streaming
|
||||
|
||||
Pass `-o -` to write the encoded image bytes to stdout. Combine with `--quiet` to silence the trailing path echo:
|
||||
|
||||
```bash
|
||||
songsee track.mp3 -o - --quiet > spectro.jpg
|
||||
songsee track.mp3 -o - --format png --quiet | imgcat
|
||||
ssh host "songsee /audio/x.flac -o -" > x.jpg
|
||||
```
|
||||
|
||||
Verbose decode info still goes to stderr under `--verbose`, so it doesn't corrupt the binary stream on stdout.
|
||||
|
||||
## Batch usage
|
||||
|
||||
There's no built-in `songsee batch`; lean on the shell.
|
||||
|
||||
```bash
# All MP3s in a directory.
for f in *.mp3; do songsee "$f" --style magma; done

# Parallel via xargs (NUL-delimited so unusual filenames survive).
printf '%s\0' *.mp3 | xargs -0 -P 8 -I{} songsee {} --width 1920 --height 540

# find + GNU parallel.
find . -name '*.flac' -print0 | parallel -0 songsee {} --style viridis -o {.}.png
```
|
||||
|
||||
songsee is single-threaded internally, so parallelism comes from running multiple processes.
|
||||
|
||||
## Related pages
|
||||
|
||||
- [Visualizations](visualizations.md) — what each panel shows.
|
||||
- [Palettes](palettes.md) — color maps applied at render time.
|
||||
- [Pipeline](spec.md) — windowing, FFT, normalization details.
|
||||
- [CLI](cli.md) — every flag with its default.
|
||||
157
docs/spec.md
@ -1,89 +1,86 @@
|
||||
---
|
||||
layout: default
|
||||
title: Spec
|
||||
description: songsee spectral pipeline, defaults, and rendering details.
|
||||
title: Pipeline
|
||||
description: "songsee's spectral pipeline — decode, window, FFT, bin mapping, normalization, render."
|
||||
---
|
||||
|
||||
<section class="section">
|
||||
<div class="kicker">Spec</div>
|
||||
<h1 class="section-title">songsee spectral pipeline</h1>
|
||||
<p class="section-sub">
|
||||
This page captures the core algorithm and defaults used by songsee for repeatable, high quality
|
||||
spectrogram images.
|
||||
</p>
|
||||
</section>
|
||||
# Pipeline
|
||||
|
||||
<section class="section">
|
||||
<h2 class="section-title">Decode</h2>
|
||||
<div class="card">
|
||||
<p>
|
||||
WAV and MP3 decode natively. Any other format falls back to ffmpeg. Input can be a file path or
|
||||
stdin ("-"). Default sample rate for ffmpeg output is 44100 Hz.
|
||||
</p>
|
||||
</div>
|
||||
</section>
|
||||
This page documents the algorithm and the defaults songsee uses to produce repeatable, high-quality images. It complements [Visualizations](visualizations.md) (what each mode shows) and [Rendering](rendering.md) (how the canvas gets composed).
|
||||
|
||||
<section class="section">
|
||||
<h2 class="section-title">Spectrogram</h2>
|
||||
<div class="card">
|
||||
<p>
|
||||
Windowed frames use a Hann window. FFT runs on each frame and the magnitude is converted to
|
||||
decibels using 20 * log10(mag + 1e-9). The default window size is 2048 samples with a hop size
|
||||
of 512 samples.
|
||||
</p>
|
||||
<p>
|
||||
Frames are computed as 1 + (len(samples) - window + hop - 1) / hop, and bins are window/2 + 1.
|
||||
Bin spacing is sampleRate / windowSize.
|
||||
</p>
|
||||
</div>
|
||||
</section>
|
||||
## Stages
|
||||
|
||||
<section class="section">
|
||||
<h2 class="section-title">Rendering</h2>
|
||||
<div class="card">
|
||||
<p>
|
||||
Each output pixel maps to a time frame and frequency bin. Values are normalized by the global
|
||||
min/max in the computed spectrogram unless clamp values are provided. Feature panels use
|
||||
percentile-based clamping to preserve contrast across different visualizations. Frequency
|
||||
range can be restricted via min/max frequency in Hz.
|
||||
</p>
|
||||
<p>
|
||||
Output size defaults to 1920x1080. JPEG quality is 95. PNG output is available via --format.
|
||||
</p>
|
||||
</div>
|
||||
</section>
|
||||
```text
input → decode → mono mixdown → optional slice → window → FFT
                                   ↓
           per-mode features (mel, chroma, mfcc, hpss, …)
                                   ↓
   percentile normalize → palette map → grid compose → encode
```
|
||||
|
||||
<section class="section">
|
||||
<h2 class="section-title">Palettes</h2>
|
||||
<div class="card">
|
||||
<p>
|
||||
Palettes map normalized values to RGBA colors. Available names: classic, magma, inferno, clawd,
|
||||
viridis, gray.
|
||||
</p>
|
||||
</div>
|
||||
</section>
|
||||
Every stage is deterministic. The same input file with the same flags always produces the same output bytes.
|
||||
|
||||
<section class="section">
|
||||
<h2 class="section-title">Visualizations</h2>
|
||||
<div class="card">
|
||||
<p>
|
||||
Visualizations are selectable via --viz. Defaults to spectrogram. Supported names: spectrogram,
|
||||
mel, chroma, hpss, selfsim, loudness, tempogram, mfcc, flux. Multiple entries render as a grid
|
||||
of panels.
|
||||
</p>
|
||||
</div>
|
||||
</section>
|
||||
## Decode
|
||||
|
||||
<section class="section">
|
||||
<h2 class="section-title">CLI defaults</h2>
|
||||
<div class="code-block">
|
||||
--format jpg
|
||||
--width 1920
|
||||
--height 1080
|
||||
--window 2048
|
||||
--hop 512
|
||||
--sample-rate 44100
|
||||
--style classic
|
||||
--viz spectrogram
|
||||
</div>
|
||||
</section>
|
||||
- WAV (PCM 8/16/24/32-bit, 32/64-bit float, WAVE_FORMAT_EXTENSIBLE) and MP3 are decoded in pure Go via the bundled decoders.
|
||||
- Anything else falls through to `ffmpeg` (32-bit float little-endian, mono, `--sample-rate` Hz; default `44100`).
|
||||
- Stereo or multichannel input is averaged to mono.
|
||||
- `--start` / `--duration` slice the decoded sample buffer in seconds before windowing.
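The mixdown and slicing steps in the list above can be sketched as follows. This is an illustration under stated assumptions: samples are taken to be interleaved `float32` values, a non-positive duration is taken to mean "until the end", and the function names are hypothetical.

```go
package main

import "fmt"

// mixToMono averages interleaved multichannel samples into a single channel.
func mixToMono(interleaved []float32, channels int) []float32 {
	if channels < 1 {
		return nil
	}
	frames := len(interleaved) / channels
	mono := make([]float32, frames)
	for i := 0; i < frames; i++ {
		var sum float32
		for c := 0; c < channels; c++ {
			sum += interleaved[i*channels+c]
		}
		mono[i] = sum / float32(channels)
	}
	return mono
}

// sliceSeconds keeps `duration` seconds starting at `start` seconds.
// A duration <= 0 keeps everything up to the end (assumed behavior).
func sliceSeconds(samples []float32, sampleRate int, start, duration float64) []float32 {
	from := int(start * float64(sampleRate))
	if from < 0 {
		from = 0
	}
	if from > len(samples) {
		from = len(samples)
	}
	to := len(samples)
	if duration > 0 {
		if end := from + int(duration*float64(sampleRate)); end < to {
			to = end
		}
	}
	return samples[from:to]
}

func main() {
	stereo := []float32{1, 0, 0.5, 0.5, -1, 1, 0.25, 0.75}
	mono := mixToMono(stereo, 2)
	fmt.Println(mono)
	fmt.Println(sliceSeconds(mono, 2, 1, 1)) // 1 s starting at 1 s, at 2 samples/s
}
```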
|
||||
|
||||
See [Decoding](decoding.md) for input formats, sample rate, ffmpeg lookup, and stdin usage.
|
||||
|
||||
## Windowing and FFT
|
||||
|
||||
- Window: **Hann**, applied per frame.
|
||||
- Window size: `--window` samples (default `2048`, must be a power of two).
|
||||
- Hop size: `--hop` samples (default `512`).
|
||||
- Frame count: `1 + (len(samples) - window + hop - 1) / hop`.
|
||||
- Bin count: `window / 2 + 1`.
|
||||
- Bin spacing: `sampleRate / window` Hz per bin.
|
||||
|
||||
Magnitude is converted to decibels with `20·log10(mag + 1e-9)` for the base spectrogram. Per-feature pipelines (mel, chroma, mfcc) use linear power instead.
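The frame and bin formulas in this section translate directly into integer arithmetic. The Go sketch below uses hypothetical function names; treating inputs no longer than one window as a single frame is an assumption, since the page does not describe that case.

```go
package main

import "fmt"

// frameCount implements 1 + (n - window + hop - 1) / hop with integer division.
// Inputs no longer than one window are treated as a single frame (assumption).
func frameCount(n, window, hop int) int {
	if n <= window {
		return 1
	}
	return 1 + (n-window+hop-1)/hop
}

// binCount is the number of non-negative frequency bins of a real FFT.
func binCount(window int) int { return window/2 + 1 }

// binFrequency is the center frequency of bin k, using the documented
// spacing of sampleRate / window Hz per bin.
func binFrequency(k, sampleRate, window int) float64 {
	return float64(k) * float64(sampleRate) / float64(window)
}

func main() {
	const sampleRate, window, hop = 44100, 2048, 512
	n := sampleRate // one second of audio
	fmt.Println("frames:", frameCount(n, window, hop))
	fmt.Println("bins:", binCount(window))
	fmt.Printf("top bin: %.1f Hz\n", binFrequency(binCount(window)-1, sampleRate, window))
}
```

The last bin lands on the Nyquist frequency, which matches the default upper bound for `--max-freq`.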
|
||||
|
||||
## Per-mode features
|
||||
|
||||
| mode | source | notes |
|------|--------|-------|
| `spectrogram` | STFT magnitude in dB | clamped to 5th–98th percentile |
| `mel` | mel-warped power | log-magnitude; clamped 5th–98th percentile |
| `chroma` | 12-bin pitch class | folds octaves; clamped 10th–98th percentile |
| `mfcc` | DCT of mel power | strips pitch, keeps timbre |
| `hpss` | median filters on STFT | 9-frame harmonic + 9-frame percussive kernels |
| `selfsim` | cosine sim on chroma frames | gamma 1.4; clamped 10th–98th percentile |
| `loudness` | per-frame RMS | clamped to 95th percentile |
| `tempogram` | onset autocorrelation | 30–240 BPM, 256 bins |
| `flux` | frame-to-frame STFT delta | clamped to 95th percentile |
|
||||
|
||||
The percentile sampling reservoir is capped at 20 000 values per panel for speed; this is dense enough that boundaries are stable across runs.
|
||||
|
||||
## Rendering
|
||||
|
||||
- Each panel maps `(time × bin)` cells onto pixels at the panel's width × height.
|
||||
- Values are normalized into `[0, 1]` against the per-panel min/max (after the percentile clamp), then passed through the chosen palette.
|
||||
- Heatmap panels (mel, chroma, mfcc, selfsim, hpss halves, tempogram) render with `flipVert` so low frequencies are at the bottom.
|
||||
- Multiple panels compose into a `ceil(sqrt(n))`-column grid with an 8 px gap (see [Rendering](rendering.md)).
|
||||
- Encoder: PNG (lossless) or JPEG (quality 95).
|
||||
|
||||
## CLI defaults
|
||||
|
||||
```text
--format jpg
--width 1920
--height 1080
--window 2048
--hop 512
--sample-rate 44100
--style classic
--viz spectrogram
```
|
||||
|
||||
Full reference: [CLI](cli.md).
|
||||
|
||||
## Related pages
|
||||
|
||||
- [Visualizations](visualizations.md) — per-mode descriptions.
|
||||
- [Palettes](palettes.md) — gradient stops.
|
||||
- [Decoding](decoding.md) — input handling.
|
||||
- [Rendering](rendering.md) — output and batch use.
|
||||
|
||||
120
docs/visualizations.md
Normal file
@ -0,0 +1,120 @@
|
||||
---
|
||||
title: Visualizations
|
||||
description: "The nine visualization modes songsee can render: spectrogram, mel, chroma, hpss, selfsim, loudness, tempogram, mfcc, flux."
|
||||
---
|
||||
|
||||
# Visualizations
|
||||
|
||||
`--viz` selects one or more visualization modes. Pass it once with a comma-separated list, or repeat it. Unknown names error out before any decoding runs.
|
||||
|
||||
```bash
|
||||
songsee track.mp3 --viz spectrogram
|
||||
songsee track.mp3 --viz mel,chroma,hpss
|
||||
songsee track.mp3 --viz spectrogram --viz flux
|
||||
```
|
||||
|
||||
When more than one mode is selected, songsee composes a square-ish grid (`ceil(sqrt(n))` columns) with an 8 px gap between cells, all sized to fit `--width` × `--height`.
|
||||
|
||||
## spectrogram
|
||||
|
||||
Time × frequency magnitude. The base FFT view: a Hann-windowed STFT converted to decibels (`20·log10(mag + 1e-9)`). The X axis is time, the Y axis is linear frequency from `--min-freq` to `--max-freq` (Nyquist by default), and each pixel's brightness is the magnitude in that time-frequency cell.
|
||||
|
||||
Use it when you want raw spectral truth — verifying decode, hunting harmonics, identifying transients.
|
||||
|
||||
```bash
|
||||
songsee track.mp3 --viz spectrogram
|
||||
```
|
||||
|
||||
## mel
|
||||
|
||||
Perceptual frequency scale. Same STFT, but bins are warped onto the mel scale, which weights low frequencies more heavily — closer to how humans hear pitch.
|
||||
|
||||
Good for vocal and tonal content; the structure of speech and melody jumps out compared to a linear spectrogram.
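To show what "warping onto the mel scale" means numerically, here is a short Go sketch. It assumes the common HTK-style formula `mel = 2595·log10(1 + f/700)`; songsee may use a different mel variant, and the function names are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// hzToMel converts Hz to mels using the HTK-style formula (assumed variant).
func hzToMel(hz float64) float64 { return 2595 * math.Log10(1+hz/700) }

// melToHz is the inverse of hzToMel.
func melToHz(mel float64) float64 { return 700 * (math.Pow(10, mel/2595) - 1) }

// melCenters returns n band centers evenly spaced on the mel scale
// between minHz and maxHz, expressed back in Hz.
func melCenters(minHz, maxHz float64, n int) []float64 {
	lo, hi := hzToMel(minHz), hzToMel(maxHz)
	out := make([]float64, n)
	for i := range out {
		t := float64(i+1) / float64(n+1)
		out[i] = melToHz(lo + t*(hi-lo))
	}
	return out
}

func main() {
	for _, hz := range melCenters(80, 8000, 8) {
		fmt.Printf("%.0f Hz\n", hz)
	}
}
```

The centers bunch together at low frequencies and spread out toward the top, which is why low-frequency detail takes up more of a mel panel than of a linear spectrogram.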
|
||||
|
||||
```bash
|
||||
songsee track.mp3 --viz mel --min-freq 80 --max-freq 8000
|
||||
```
|
||||
|
||||
## chroma
|
||||
|
||||
12-bin pitch class. Energy is folded across octaves into the twelve semitones (C, C♯, D, …). The Y axis is pitch class, the X axis is time.
|
||||
|
||||
Reveals harmonic and key content — chord progressions, modulations, repetition between sections.
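Octave folding can be sketched as mapping each FFT bin's frequency to one of twelve pitch classes and summing the energy per class. The Go example below assumes equal temperament with A4 = 440 Hz and nearest-semitone rounding; the actual songsee mapping may differ, and the function names are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

var names = [12]string{"C", "C♯", "D", "D♯", "E", "F", "F♯", "G", "G♯", "A", "A♯", "B"}

// pitchClass maps a frequency to 0..11 (0 = C), or -1 for non-positive input.
// Assumes equal temperament tuned to A4 = 440 Hz.
func pitchClass(hz float64) int {
	if hz <= 0 {
		return -1
	}
	midi := 69 + 12*math.Log2(hz/440)
	pc := int(math.Round(midi)) % 12
	if pc < 0 {
		pc += 12
	}
	return pc
}

// foldChroma sums one frame of bin magnitudes into twelve pitch classes.
// binHz is the spacing between bins (sampleRate / window).
func foldChroma(mags []float64, binHz float64) [12]float64 {
	var chroma [12]float64
	for k, m := range mags {
		if pc := pitchClass(float64(k) * binHz); pc >= 0 {
			chroma[pc] += m
		}
	}
	return chroma
}

func main() {
	for _, hz := range []float64{261.63, 440, 880} {
		fmt.Printf("%.2f Hz -> %s\n", hz, names[pitchClass(hz)])
	}
}
```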
|
||||
|
||||
```bash
|
||||
songsee track.mp3 --viz chroma
|
||||
```
|
||||
|
||||
## hpss
|
||||
|
||||
Harmonic vs percussive separation. Median-filters the spectrogram twice (9-frame kernels) to split it into a harmonic top half (sustained tones) and a percussive bottom half (transients).
|
||||
|
||||
Use it to see where the kit and the melody each live in the same track.
|
||||
|
||||
```bash
|
||||
songsee track.mp3 --viz hpss
|
||||
```
|
||||
|
||||
## selfsim
|
||||
|
||||
Self-similarity matrix on chroma frames. Each pixel `(i, j)` is the cosine similarity between chroma frame `i` and frame `j`, with a gentle gamma (1.4) for contrast.
|
||||
|
||||
Brings out song structure: verses repeat as bright off-diagonal stripes; choruses form clear blocks.
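The matrix described above can be sketched in Go as a pairwise cosine similarity followed by a gamma curve. Applying gamma as `sim^gamma` on non-negative similarities is an assumption about how the 1.4 factor is used, and the function names are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// cosine returns the cosine similarity of two equal-length vectors,
// or 0 when either vector is all zeros.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / math.Sqrt(na*nb)
}

// selfSim builds an n×n matrix where cell (i, j) compares frame i with
// frame j, then shapes contrast with a gamma curve (assumed: sim^gamma).
func selfSim(frames [][]float64, gamma float64) [][]float64 {
	m := make([][]float64, len(frames))
	for i := range m {
		m[i] = make([]float64, len(frames))
		for j := range m[i] {
			s := math.Max(0, cosine(frames[i], frames[j]))
			m[i][j] = math.Pow(s, gamma)
		}
	}
	return m
}

func main() {
	// Three toy "chroma" frames: the first and third are identical.
	frames := [][]float64{{1, 0}, {0, 1}, {1, 0}}
	for _, row := range selfSim(frames, 1.4) {
		fmt.Println(row)
	}
}
```

Repeated material shows up as high values away from the main diagonal, which is what produces the stripes and blocks in the rendered panel.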
|
||||
|
||||
```bash
|
||||
songsee track.mp3 --viz selfsim
|
||||
```
|
||||
|
||||
## loudness
|
||||
|
||||
Frame-wise RMS over time. A waveform-style envelope with the X axis as time and the height as energy, clamped to the 95th percentile so peaks don't crush the rest.
|
||||
|
||||
Good for spotting dynamics, fade-ins, and silence.
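Frame-wise RMS is simple enough to show in full. This Go sketch assumes the loudness frames use the same window and hop stepping as the STFT, which the page does not state; the function name is hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// frameRMS returns the root-mean-square level of each full window,
// stepping through the samples by hop.
func frameRMS(samples []float64, window, hop int) []float64 {
	var out []float64
	for start := 0; start+window <= len(samples); start += hop {
		var sum float64
		for _, s := range samples[start : start+window] {
			sum += s * s
		}
		out = append(out, math.Sqrt(sum/float64(window)))
	}
	return out
}

func main() {
	// A quiet section followed by a loud one.
	samples := []float64{0.1, -0.1, 0.1, -0.1, 0.8, -0.8, 0.8, -0.8}
	fmt.Println(frameRMS(samples, 4, 2))
}
```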
|
||||
|
||||
```bash
|
||||
songsee track.mp3 --viz loudness
|
||||
```
|
||||
|
||||
## tempogram
|
||||
|
||||
Tempo variation over time. An autocorrelation-style heatmap of the onset envelope, scanning 30–240 BPM in 256 bins.
|
||||
|
||||
Reveals tempo drift, rubato, and switches between rhythmic feels.
|
||||
|
||||
```bash
|
||||
songsee track.mp3 --viz tempogram
|
||||
```
|
||||
|
||||
## mfcc
|
||||
|
||||
Mel-frequency cepstral coefficients — the classic timbre fingerprint. Each row is one cepstral coefficient over time.
|
||||
|
||||
Strips pitch and leaves "color"; useful for distinguishing instruments, voices, or sections that share notes but not tone.
|
||||
|
||||
```bash
|
||||
songsee track.mp3 --viz mfcc
|
||||
```
|
||||
|
||||
## flux
|
||||
|
||||
Spectral flux — frame-to-frame magnitude change. A 1-D envelope with peaks at onsets and discontinuities, clamped to the 95th percentile.
|
||||
|
||||
Use it to find note onsets, edits, or anything sudden.
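One common way to compute spectral flux is to sum only the positive bin-to-bin increases between consecutive frames (half-wave rectification), so that onsets register and decays do not. Whether songsee rectifies this way is an assumption; the Go sketch below is illustrative and the function name is hypothetical.

```go
package main

import "fmt"

// spectralFlux returns, for each frame, the sum of positive magnitude
// increases relative to the previous frame. The first frame has flux 0.
func spectralFlux(frames [][]float64) []float64 {
	flux := make([]float64, len(frames))
	for t := 1; t < len(frames); t++ {
		var sum float64
		for k := range frames[t] {
			if d := frames[t][k] - frames[t-1][k]; d > 0 {
				sum += d
			}
		}
		flux[t] = sum
	}
	return flux
}

func main() {
	// Silence, then an onset, then a partial decay.
	frames := [][]float64{{0, 0}, {1, 2}, {1, 0}}
	fmt.Println(spectralFlux(frames))
}
```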
|
||||
|
||||
```bash
|
||||
songsee track.mp3 --viz flux
|
||||
```
|
||||
|
||||
## Combining
|
||||
|
||||
```bash
|
||||
songsee track.mp3 --viz spectrogram,mel,chroma,hpss,selfsim,loudness,tempogram,mfcc,flux
|
||||
```
|
||||
|
||||
All nine in one 1920×1080 grid (3×3). Mix and match with the same syntax. Each panel auto-contrasts independently — comparing absolute values across panels is not meaningful, comparing structure is.
|
||||
|
||||
## Related pages
|
||||
|
||||
- [Palettes](palettes.md) — color maps applied to every panel.
|
||||
- [Pipeline](spec.md) — windowing, FFT, bin mapping, normalization.
|
||||
- [CLI](cli.md) — every flag with its default.
|
||||