PROJECT TALAMASKA

Architecture A · Single-Instrument Text-to-Music Pipeline

Status: Prototype
Target: Solo Piano

Visual architecture and work breakdown for Project Talamaska. Epics 1–5 implement the symbolic-first solo-piano pipeline; Epics 6–8 cover data, evaluation and UX. This page mirrors the EPIC/STORY map used in the GitHub project.

1.0 System Architecture Overview

Text-to-piano pipeline with symbolic control (MidiLikeScore) before audio. Epics 1–5 map directly onto these five stages.

⌨️ 1. Input · Raw Text

User prompt captured via web UI and passed into the orchestration layer. Linked to Epic 8 (UX).

🧠 2. Parser · Text → ControlDict

Rule- or LLM-based prompt conditioning that maps text to a ControlDict. Backed by Epic 2 (Prompt Controls).

🎼 3. Composer · Symbolic Generation

Conditional Transformer generating MidiLikeScore from controls + optional prefix. Backed by Epic 3 (Composer Model).

🎹 4. Humanizer · Performance Layer

Converts the quantized score into an expressive MidiLikePerformance, controlled via a single slider. Backed by Epic 4 (Humanizer).

🔊 5. Renderer · Audio / Export

Sample-based piano rendering to WAV/FLAC plus MIDI export. Backed by Epic 5 (Renderer & Export).

2.0 Module Specifications: Deep Dive

The five core architectural blocks and how they tie back to epics and stories.

Block 1: Orchestrator & Types

Epic 1 – Pipeline Skeleton & APIs

Latency: < 200ms overhead

Defines MidiLikeScore and MidiLikePerformance, the shared module APIs, and the end-to-end orchestrator. Provides caching and partial-regeneration hooks.

>>> talamaska.generate("rainy sad day piano")
# E1-01: build score/performance data structures
# E1-02: orchestrate parser → composer → humanizer → renderer
# E1-03: cache scores for fast humanization/regeneration
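
A minimal sketch of what the shared symbolic types (E1-01) could look like; the field names and units below are illustrative assumptions, not the project's actual schema.

from dataclasses import dataclass, field

@dataclass
class NoteEvent:
    pitch: int          # MIDI pitch, 0-127
    start: float        # onset (beats in a score, seconds in a performance)
    duration: float
    velocity: int = 64

@dataclass
class MidiLikeScore:
    # Quantized "sheet music" layer produced by the Composer (hypothetical fields).
    tempo_bpm: float
    meter: str                                             # e.g. "4/4"
    notes: list[NoteEvent] = field(default_factory=list)

@dataclass
class MidiLikePerformance:
    # Expressive layer produced by the Humanizer: same notes, with
    # shifted timings, shaped velocities and pedal events.
    notes: list[NoteEvent] = field(default_factory=list)
    pedal: list[tuple[float, bool]] = field(default_factory=list)  # (time, down)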

Block 2: Prompt Conditioning

Epic 2 – Prompt Controls & Encoding

Latency: < 800ms

Maps free text to a ControlDict and control tokens (mood, tempo, meter, density, length, key). Includes the ControlDict schema, a rule-based parser, and metadata-to-labels tooling.

Parameter extraction weights · E2-01/02/03
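
A rule-based pass (E2-02) could be as simple as keyword tables over the prompt; the mappings and defaults below are assumptions for illustration, not the shipped grammar.

import re

MOOD_WORDS = {"sad": "sad", "melancholy": "sad", "happy": "bright", "rainy": "calm"}  # illustrative
TEMPO_WORDS = {"slow": 70, "fast": 140}                                               # illustrative

def parse_prompt(text: str) -> dict:
    # Map free text to a ControlDict with defaulted fields.
    control = {"mood": "neutral", "tempo_bpm": 100, "meter": "4/4",
               "density": "medium", "length_bars": 16, "key": None}
    for tok in re.findall(r"[a-z]+", text.lower()):
        if tok in MOOD_WORDS:
            control["mood"] = MOOD_WORDS[tok]
        if tok in TEMPO_WORDS:
            control["tempo_bpm"] = TEMPO_WORDS[tok]
    return control

# parse_prompt("rainy sad day piano") -> {"mood": "sad", "tempo_bpm": 100, ...}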

Block 3: Composer (Symbolic Generation)

Epic 3 – Symbolic Composer Model

Latency: ~2–3s

Generates the MidiLikeScore: the “sheet music” layer. Uses an event-based token vocabulary and a conditional Transformer trained on clean solo-piano datasets.

  • Stories: E3-01 vocabulary, E3-02 preprocessing, E3-03 training & inference.
  • Controls: length, density, style conditioning.
  • Features: prefix continuation & partial regeneration.
Visualization: MidiLikeScore buffer (0s–4s timeline)
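
A common way to condition an event-token Transformer is to prepend control tokens to the event stream; the vocabulary below is a hypothetical sketch of E3-01, not the actual token set.

def encode_controls(control: dict) -> list[str]:
    # Serialize a ControlDict into control tokens that prefix the
    # event sequence fed to the conditional Transformer.
    return [f"<mood={control['mood']}>",
            f"<tempo={control['tempo_bpm']}>",
            f"<density={control['density']}>",
            f"<bars={control['length_bars']}>"]

def encode_note(pitch: int, dur_steps: int, velocity: int) -> list[str]:
    # Event-based tokens: one NOTE_ON / DUR / VEL triplet per note.
    return [f"NOTE_ON_{pitch}", f"DUR_{dur_steps}", f"VEL_{velocity // 16}"]

# Model input: control tokens, then (optionally) tokens of a prefix score
# to continue, enabling prefix continuation and partial regeneration.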

Block 4: Humanizer

Epic 4 – Performance Humanization

Latency: < 150ms

Applies timing jitter, velocity shaping, pedal and phrase dynamics, controlled by PerformanceSettings and a single “humanization” slider.

Quantized grid vs. humanized performance · E4-01/02/03
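
As a sketch of the idea (the jitter range, velocity shaping, and naming are assumptions): one slider value scales seeded, deterministic deviations from the grid.

import random

def humanize(notes, amount: float, seed: int = 0):
    # notes: (pitch, start_sec, dur_sec, velocity) tuples; amount is the
    # single humanization slider in [0, 1]. A seeded RNG keeps renders
    # repeatable for A/B tests (E4-03).
    rng = random.Random(seed)
    out = []
    for pitch, start, dur, vel in notes:
        jitter = rng.uniform(-0.02, 0.02) * amount          # up to ±20 ms
        shaped = round(vel + rng.uniform(-8, 8) * amount)   # velocity shaping
        out.append((pitch, start + jitter, dur, max(1, min(127, shaped))))
    return out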

Block 5: Audio Renderer

Epic 5 – Renderer & Export

Latency: ~3–4s

Renders MidiLikePerformance into WAV/FLAC and MIDI with loudness normalization and clipping checks.

Sample piano · 44.1 kHz · Stereo
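
Loudness normalization and the clipping check could reduce to a small post-render pass like the one below; the -1 dBFS target and NumPy types are assumptions, not the project's actual render path.

import numpy as np

def normalize_and_check(audio: np.ndarray, peak_dbfs: float = -1.0) -> np.ndarray:
    # Scale float audio to the target peak, then guard against clipping.
    target = 10 ** (peak_dbfs / 20)
    peak = float(np.max(np.abs(audio))) or 1.0   # avoid divide-by-zero on silence
    out = audio * (target / peak)
    assert np.max(np.abs(out)) <= 1.0, "clipping detected after normalization"
    return out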

3.0 Engineering Roadmap (by Epic)

Relative effort in weeks for Epics 1–8. Phase 1 focuses on Epics 1–5 (walking skeleton); Phase 2 adds Epics 6–8 (data, evaluation, UX).

3.1 Latency Budget (Mapped to Epics)

Target processing time allocation per request (Total: < 10s). Segments map directly to epics: Parser (E2), Composer (E3), Humanizer (E4), Renderer (E5), Overhead (E1 + E8).
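
Taking the per-block upper bounds from section 2.0, the budget can be written as a simple table; the headroom figure is derived arithmetic, not a separate target.

BUDGET_MS = {
    "parser (E2)":      800,
    "composer (E3)":    3000,
    "humanizer (E4)":   150,
    "renderer (E5)":    4000,
    "overhead (E1+E8)": 200,
}
assert sum(BUDGET_MS.values()) <= 10_000   # 8,150 ms worst case, ~1.85 s headroom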

4.0 Epic Overview

Eight epics grouped around architecture blocks and supporting capabilities. Each card lists the main outcomes and key stories.

Epic 1

Pipeline Skeleton & APIs
2/3 ✅ Core

End-to-end Python skeleton, symbolic types and shared API contracts for all modules, including caching and partial regeneration hooks.

Epic 2

Prompt Controls & Encoding
2/3 🧪 Controls

Map free-text prompts to a well-typed ControlDict and control tokens usable in both training and inference.

Epic 3

Symbolic Composer Model
Model

Conditional Transformer that generates MidiLikeScore from controls and optional prefix score.

Epic 4

Performance Humanizer
Perf

Deterministic humanization layer that turns quantized scores into expressive performances with a single “humanization” slider.

Epic 5

Renderer & Export
Audio

Stable, repeatable rendering from performance events to audio and MIDI with quality checks and simple export API.

Epic 6

Data & Training Pipeline
Data

Clean, reproducible pipeline from raw piano datasets to tokenized train/val/test sets and training jobs.

Epic 7

Evaluation & Experiments
Eval

Symbolic and audio metrics, plus prompt controllability and humanization A/B experiments.

Epic 8

UX & Product Shell
1/3 🚧 UX

Minimal but robust web shell that exposes prompt → audio, partial regeneration and clip telemetry for future governance.

5.0 Work Breakdown Structure

Sunburst view of the EPIC/STORY map used in the GitHub project. Center: Project Talamaska. Ring 1: Epics 1–8. Ring 2: key stories (E*-0*).

6.0 Logic: Mood → PerformanceSettings

Epic 4 maps ControlDict and slider values to concrete performance parameters, keeping the mapping explainable for creators; a sketch of this mapping follows the story list below.

  • E4-01 – PerformanceSettings: one struct carrying velocity curve, rubato, pedal density.
  • E4-02 – Humanization: scale-aware timing jitter and phrase-level dynamics.
  • E4-03 – Determinism: seeded randomness so A/B tests are repeatable.
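
A sketch of that mapping, assuming hypothetical struct fields and per-mood values chosen for readability:

from dataclasses import dataclass

@dataclass
class PerformanceSettings:
    velocity_curve: float   # >1 widens dynamics, <1 compresses them
    rubato: float           # timing flexibility, 0 = metronomic
    pedal_density: float    # fraction of beats with sustain pedal

MOOD_PRESETS = {
    "sad":     PerformanceSettings(0.8, 0.6, 0.7),
    "bright":  PerformanceSettings(1.2, 0.2, 0.3),
    "neutral": PerformanceSettings(1.0, 0.3, 0.5),
}

def settings_for(control: dict, slider: float) -> PerformanceSettings:
    # Blend the mood preset with the neutral grid by the single
    # humanization slider (0 = quantized, 1 = full preset).
    base = MOOD_PRESETS.get(control.get("mood", "neutral"), MOOD_PRESETS["neutral"])
    return PerformanceSettings(
        velocity_curve=1.0 + (base.velocity_curve - 1.0) * slider,
        rubato=base.rubato * slider,
        pedal_density=base.pedal_density * slider,
    )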

Parameter Mapping Visualization

Comparison of performance parameters for two prompts, using the same model but different ControlDict and slider inputs.