# GLaDOS GUI — Local LLM Style Transfer + Realtime TTS

GLaDOS GUI lets you type text, have a local LLM rewrite it in GLaDOS’ voice, and hear it spoken immediately by a high‑quality Piper TTS model. LLM output streams into the UI while audio is synthesized in real time, so you hear each sentence as soon as it is complete.

- Left: editable input box
- Center: image of GLaDOS; her eye glow follows audio loudness
- Right: read‑only output (streamed from the LLM) with Copy, Play/Stop, and Save Audio buttons

Works fully offline with a local Ollama model and a local Piper voice.

## Highlights

- Realtime: streams LLM tokens, splits into sentences, and synthesizes with Piper as they arrive
- Low‑latency audio: raw PCM straight to PortAudio via `sounddevice`
- Eye glow: smoothed visual loudness indicator from the audio envelope
- Persistent settings: remembers Ollama URL and selected model, and Piper ONNX path
- Simple launcher: `run.sh` creates a venv, installs deps, and launches the app

## Quick Start

1) Install system prerequisites
   - Install Python 3.10+ and ensure `python3` is on your PATH
   - Install Ollama (and start it): https://ollama.com/
   - Download the Piper TTS voice model (ONNX) from Dave’s Armoury (see the TTS Voice section below)

2) Clone and run
   - macOS/Linux
     - `chmod +x run.sh`
     - `./run.sh`
   - Or manually
     - `python3 -m venv .venv && source .venv/bin/activate`
     - `pip install --upgrade pip`
     - `pip install -r requirements.txt`
     - `python glados_gui.py`

3) In the GUI
   - Model: pick your local Ollama model (see the recommendation below)
   - Ollama URL: default is `http://localhost:11434/api/generate`
   - Voice model: pick the `glados_piper_medium.onnx` file you downloaded
   - Type your text (left) → GLaDOSify → output streams (right) + audio plays

## Recommended LLM (Ollama)

- Best results: `mistral3.2:24b`
  - 32 GB unified memory works well
  - 16 GB can work, but generation may be slower
- If that tag isn’t available on your system, fall back to another LLM. The app will list locally‑available models via Ollama.

Pull the model in Ollama (example):
- `ollama pull mistral3.2:24b`

Ensure the Ollama service is running:
- `ollama serve`

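
The model list mentioned above can also be checked outside the GUI. The following is a minimal sketch, assuming the default Ollama port and Ollama's `GET /api/tags` endpoint; `model_names` and `fetch_local_models` are illustrative names rather than functions from this repo, and the app's own lookup may work differently.

```python
import json
import urllib.request


def model_names(tags_payload: dict) -> list[str]:
    """Extract sorted model names from an /api/tags response body."""
    return sorted(m["name"] for m in tags_payload.get("models", []) if "name" in m)


def fetch_local_models(base_url: str = "http://localhost:11434") -> list[str]:
    """Ask a running Ollama server which models are installed locally.

    Assumption: base_url is the server root, not the /api/generate URL
    that the GUI stores in its settings.
    """
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return model_names(json.load(resp))
```

With `ollama serve` running, `fetch_local_models()` should return the same names that `ollama list` prints; if the server is not reachable, `urlopen` raises an `OSError` subclass.
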
## TTS Voice (Piper / ONNX)

- Voice: GLaDOS Piper medium ONNX from Dave’s Armoury
- Download from: https://huggingface.co/DavesArmoury/GLaDOS_TTS
- Files needed, both in the same folder (any folder works; point the app at it via the picker):
  - `glados_piper_medium.onnx`
  - `glados_piper_medium.onnx.json`
- Credits: massive thanks to Dave’s Armoury for the GLaDOS TTS model!

Piper CLI
- The app expects the `piper` binary on your PATH
- Installing `piper-tts` (Python package) provides the CLI on most setups
- Alternatively, install Piper via your OS package manager if available

## Running from the CLI (no GUI)

You can still use the original streaming script directly:
- `python glados_say_stream.py --stream -t "input text"`

This streams rewritten text from Ollama and speaks it sentence‑by‑sentence with Piper.

## Files

- `glados_gui.py`: Main GUI
- `glados_say_stream.py`: Streaming LLM → TTS engine (also usable as a CLI)
- `speaker.svg`: Speaker icon used in the GUI (can be replaced)
- `glados_head.png`, `glados_eye.png`: Center visuals (eye opacity tracks loudness)
- `run.sh`: One‑shot launcher; creates venv, installs deps, runs GUI
- `requirements.txt`: Python dependencies for the GUI + streaming

## Configuration

The GUI writes a small `config.json` next to the app with:
- `ollama_url`: Ollama HTTP endpoint (default `http://localhost:11434/api/generate`)
- `ollama_model`: last selected model name
- `piper_model`: last selected Piper ONNX path

You can change these at any time in the GUI; the app remembers them.

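
For readers who want to inspect or pre-seed these settings, here is a minimal sketch of loading and saving a file with the three keys listed above. The key names and the default URL come from this README; the empty-string defaults and the function names `load_config` and `save_config` are assumptions made for illustration, not the app's actual code.

```python
import json
from pathlib import Path

# Defaults: the URL is documented above; the empty strings are placeholders (assumption).
DEFAULTS = {
    "ollama_url": "http://localhost:11434/api/generate",
    "ollama_model": "",
    "piper_model": "",
}


def load_config(path: Path) -> dict:
    """Return saved settings merged over DEFAULTS; tolerate a missing or corrupt file."""
    try:
        saved = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):  # missing file, bad JSON, or bad encoding
        saved = {}
    if not isinstance(saved, dict):
        saved = {}
    # Ignore unknown keys so a stale file cannot inject unexpected settings.
    return {**DEFAULTS, **{k: v for k, v in saved.items() if k in DEFAULTS}}


def save_config(path: Path, config: dict) -> None:
    """Write the settings as pretty-printed JSON."""
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
```

Merging over defaults means a partially filled or hand-edited file still yields a complete settings dictionary.
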
## How it works (tech overview)

- Prompt: The app uses a carefully crafted “GLaDOS style transfer” prompt. Your input is rewritten; no extra narration or meta text is added.
- Streaming: The GUI posts to Ollama’s `/api/generate` endpoint with `stream=true`. In case you are using a reasoning model, chunks are filtered to drop any hidden `<think>` blocks before they are appended to the right textbox.
- Sentence splitting: As streamed text arrives, it’s split at safe sentence boundaries; each complete sentence is sent to Piper immediately.
- Piper: A single Piper process runs for the whole session. The GUI writes sentences to Piper’s stdin and plays the raw PCM via `sounddevice` in real time. Audio is also mirrored to a WAV file so you can Save Audio at the end.
- Eye flicker: The audio reader computes an RMS envelope and calls back into the GUI. Opacity is smoothed and slightly delayed to match perceived audio timing.

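
The Streaming step can be sketched as two small pieces: decoding Ollama's newline-delimited JSON stream and hiding `<think>` spans. This is an illustration under stated assumptions, not the code in `glados_gui.py`: `response_chunks` and `ThinkFilter` are hypothetical names, and the only Ollama details relied on are that each streamed line is a JSON object with a `response` string and a `done` flag.

```python
import json
from typing import Iterable, Iterator

OPEN, CLOSE = "<think>", "</think>"


def response_chunks(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield the text pieces from a newline-delimited JSON stream."""
    for raw in lines:
        if not raw.strip():
            continue  # skip keep-alive / blank lines
        event = json.loads(raw)
        yield event.get("response", "")
        if event.get("done"):
            break


class ThinkFilter:
    """Drop <think>...</think> spans, even when a tag is split across chunks."""

    def __init__(self) -> None:
        self.buffer = ""
        self.hidden = False  # True while inside a <think> block

    def feed(self, chunk: str) -> str:
        self.buffer += chunk
        visible = []
        while True:
            tag = CLOSE if self.hidden else OPEN
            pos = self.buffer.find(tag)
            if pos >= 0:
                if not self.hidden:
                    visible.append(self.buffer[:pos])
                self.buffer = self.buffer[pos + len(tag):]
                self.hidden = not self.hidden
                continue
            # No complete tag: hold back only a tail that might start one.
            limit = min(len(tag) - 1, len(self.buffer))
            keep = next((n for n in range(limit, 0, -1)
                         if self.buffer.endswith(tag[:n])), 0)
            cut = len(self.buffer) - keep
            if not self.hidden:
                visible.append(self.buffer[:cut])
            self.buffer = self.buffer[cut:]
            return "".join(visible)

    def flush(self) -> str:
        """Call at end of stream to release any held-back visible tail."""
        tail, self.buffer = ("" if self.hidden else self.buffer), ""
        return tail
```

With the `requests` library, the input to `response_chunks` could be `requests.post(url, json={"model": name, "prompt": text, "stream": True}, stream=True).iter_lines()`; each yielded piece then goes through `ThinkFilter.feed` before being shown.
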
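
The Sentence splitting step can be illustrated with a small buffer that releases a sentence only once its end is unambiguous. This is a sketch, not the app's splitter: `SentenceBuffer` is a hypothetical name, and "safe boundary" is interpreted here as terminal punctuation followed by whitespace, which is an assumption.

```python
import re

# Terminal punctuation, optional closing quote/bracket, then whitespace (lookahead).
# Requiring the whitespace avoids cutting "3.14" or a chunk that merely ends in ".".
_END = re.compile(r"""[.!?]+["')\]]*(?=\s)""")


class SentenceBuffer:
    """Accumulate streamed text and emit complete sentences."""

    def __init__(self) -> None:
        self.pending = ""

    def feed(self, text: str) -> list[str]:
        self.pending += text
        sentences = []
        while True:
            match = _END.search(self.pending)
            if match is None:
                return sentences
            sentence = self.pending[:match.end()].strip()
            self.pending = self.pending[match.end():].lstrip()
            if sentence:
                sentences.append(sentence)

    def flush(self) -> list[str]:
        """Call at end of stream: whatever remains is the final sentence."""
        rest, self.pending = self.pending.strip(), ""
        return [rest] if rest else []
```

A rule this simple also splits after abbreviations such as "Dr. "; a production splitter would need an exception list or a smarter heuristic.
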
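
For the Piper step, two details matter when wiring up playback: the command line for a long-lived process that emits raw PCM, and the sample rate to play it at. The sketch below assumes the `piper` CLI accepts `--model` and `--output-raw` and that the voice's `.onnx.json` file carries `audio.sample_rate`; both match the upstream Piper documentation, but the helper names and the 22050 Hz fallback are illustrative assumptions, not this app's code.

```python
import json
from pathlib import Path


def piper_sample_rate(model_path: Path, default: int = 22050) -> int:
    """Read the output sample rate from the voice's .onnx.json sidecar file."""
    sidecar = Path(str(model_path) + ".json")
    try:
        meta = json.loads(sidecar.read_text(encoding="utf-8"))
        return int(meta["audio"]["sample_rate"])
    except (OSError, ValueError, KeyError, TypeError):
        return default  # assumption: fall back to a typical medium-voice rate


def piper_command(model_path: Path) -> list[str]:
    """Command for a Piper process that writes raw 16-bit mono PCM to stdout."""
    return ["piper", "--model", str(model_path), "--output-raw"]
```

One way to use these: start the process with `subprocess.Popen(piper_command(model), stdin=subprocess.PIPE, stdout=subprocess.PIPE)`, write each sentence plus a newline to its stdin, and copy the bytes read from stdout into a `sounddevice.RawOutputStream(samplerate=rate, channels=1, dtype="int16")`.
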
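
The Eye flicker step combines an RMS measurement with smoothing. The sketch below uses only the standard library to stay dependency-free (the app itself lists `numpy` for this); `rms_level`, `EyeGlow`, and the attack/release constants are hypothetical choices for illustration, and the small display delay described above is left out.

```python
import array
import math


def rms_level(pcm: bytes) -> float:
    """RMS of signed 16-bit mono PCM (native byte order), scaled to 0.0..1.0."""
    samples = array.array("h")
    samples.frombytes(pcm[: len(pcm) - len(pcm) % 2])  # drop a dangling odd byte
    if not samples:
        return 0.0
    mean_square = sum(s * s for s in samples) / len(samples)
    return math.sqrt(mean_square) / 32768.0


class EyeGlow:
    """Attack/release smoothing: rise quickly with speech, fade slowly in pauses."""

    def __init__(self, attack: float = 0.6, release: float = 0.15) -> None:
        self.attack = attack      # fraction of the gap closed per update when rising
        self.release = release    # fraction of the gap closed per update when falling
        self.opacity = 0.0

    def update(self, level: float) -> float:
        rate = self.attack if level > self.opacity else self.release
        self.opacity += rate * (level - self.opacity)
        return self.opacity
```

Calling `update(rms_level(block))` once per audio block gives an opacity value in the 0..1 range that can be mapped onto the eye image; speech levels are usually well below 1.0, so a gain factor before display is likely needed.
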
## Fonts / Visuals

- UI typeface: Lucida Console (please install it locally for best results).
- Colors: black background (`#000000`), amber text (`#e1a101`).
- Borders: dashed “retro” borders around textboxes and control buttons.

## Dependencies

Core Python packages (see `requirements.txt`):
- `requests`: Ollama HTTP client
- `sounddevice`: PortAudio bridge for low‑latency playback
- `numpy`: audio level calculations (RMS)
- `Pillow`: image scaling and eye opacity levels; icon rendering
- `cairosvg` (optional): renders `speaker.svg` to a crisp PNG at UI size
- `piper-tts`: provides the `piper` CLI (or install Piper separately)

Non‑Python:
- Ollama (local LLM runtime): https://ollama.com/
- Piper (CLI TTS engine): https://github.com/rhasspy/piper

## Usage Tips

- If audio starts clipping, reduce the Piper length/noise parameters (the GUI uses sensible defaults).
- If the eye glow feels out of sync, small variations in audio driver buffering may be the cause. You can compensate by adjusting the internal smoothing/delay values (ask for help in the issues).

## Credits

- GLaDOS TTS voice: Dave’s Armoury, https://huggingface.co/DavesArmoury/GLaDOS_TTS
- Piper TTS engine: Rhasspy / Piper, https://github.com/rhasspy/piper
- LLM runtime: Ollama, https://ollama.com/
- Python libraries: `requests`, `sounddevice`, `numpy`, `Pillow`, `cairosvg`, `piper-tts`, and the standard Tkinter GUI toolkit.

## Trademarks & Attribution

- Portal and GLaDOS are properties/trademarks of Valve Corporation. Portal is a registered trademark of Valve. See: https://www.valvesoftware.com/
- The GLaDOS image shown in the center panel is a screenshot of an in‑game model from the Portal series by Valve and is used here for demonstration/fan purposes. No ownership is claimed.
- The AI voice used by this app was not created by the author of this GUI. The Piper ONNX model was created by Dave’s Armoury using audio from the Portal games by Valve. Download: https://huggingface.co/DavesArmoury/GLaDOS_TTS
- This project is a fan‑made tool intended for personal/educational use and is not affiliated with, endorsed by, or sponsored by Valve Corporation.

## Troubleshooting

- “piper not found”: ensure the `piper` CLI is on your PATH. Installing `piper-tts` via pip often provides it; otherwise install Piper for your OS.
- “No audio device”: some systems need `sounddevice` to be configured with a valid output device; use the CLI’s `--list-devices` option to find device indices.
- “Ollama connection error”: make sure `ollama serve` is running and that the URL in the GUI is correct. Pull the model you want with `ollama pull ...` first.
- Icon looks blurry: install `cairosvg` so the SVG is rendered at the exact UI size (or drop in a tuned `speaker.png`).