Enterprise-grade STT, TTS, LLM & Voice-to-Voice pipeline. One API. One gateway. All voice AI.
Self-hosted · OpenAI-compatible · Docker-ready · Multi-provider LLM
Everything you need to build voice-powered applications, exposed as simple HTTP endpoints.
Full pipeline: Audio in → STT → LLM → TTS → Audio out. One API call, complete voice conversation.
Ollama, OpenAI, or any compatible provider. Hot-swap without code changes. One endpoint, any model.
Live metrics, health monitoring, animated charts. Enterprise-grade visibility into every request.
Auth, rate limiting, per-key quotas, structured logging. Production-ready from day one.
A single gateway routes requests to specialized AI services running in Docker containers.
Real responses from the running gateway. No mocks.
OpenAI-compatible. Drop-in replacement for your existing voice stack.
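Because the gateway is OpenAI-compatible, an existing client can often be repointed by changing only its base URL. A minimal sketch, assuming your client honors the standard OpenAI environment variables (many official and community SDKs do):

```bash
# Point an OpenAI-style client at the local gateway instead of api.openai.com
export OPENAI_BASE_URL="http://localhost:3100/v1"   # standard env var read by many OpenAI SDKs
export OPENAI_API_KEY="sk-your-key"                 # an API key issued by your gateway deployment
```

No application code changes are needed when the client resolves its endpoint from these variables.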
| Method | Endpoint | Description |
|---|---|---|
| POST | /v1/stt | Speech-to-Text. Transcribe audio files. |
| POST | /v1/tts | Text-to-Speech. Synthesize audio from text. |
| POST | /v1/chat | LLM Chat with SSE streaming support. |
| POST | /v1/converse | Full voice pipeline: Audio → STT → LLM → TTS → Audio. |
| GET | /v1/voices | List available TTS voices. |
| GET | /v1/models | List available LLM models. |
| GET | /health | Health check for all backend services. |
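As a sketch of the two simpler endpoints, assuming the same conventions as the /v1/converse example below: the `audio` form field is taken from that example, while the JSON body shape for /v1/tts (`text` and `voice` fields) is an assumption, not a documented contract.

```bash
# Transcribe an audio file ("audio" field name matches the /v1/converse example)
curl -X POST http://localhost:3100/v1/stt \
  -H "Authorization: Bearer sk-your-key" \
  -F "audio=@question.wav"

# Synthesize speech (JSON body with "text" and "voice" is an assumed shape;
# the voice ID "af_heart" comes from the /v1/converse example)
curl -X POST http://localhost:3100/v1/tts \
  -H "Authorization: Bearer sk-your-key" \
  -H "Content-Type: application/json" \
  -d '{"text":"Hello from the gateway","voice":"af_heart"}' \
  --output hello.wav
```

Both calls require the gateway to be running locally on port 3100, as in the example below.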
```bash
# Voice-to-Voice in one curl command
curl -X POST http://localhost:3100/v1/converse \
  -H "Authorization: Bearer sk-your-key" \
  -F "audio=@question.wav" \
  -F 'bot_config={"system_prompt":"You are a helpful assistant","voice":"af_heart"}' \
  --output response.wav
```
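For /v1/chat with SSE streaming, a comparable sketch — the OpenAI-style `messages` array and `stream` flag are assumptions based on the OpenAI-compatibility claim, and the model name is a placeholder (list real ones via GET /v1/models):

```bash
# Stream chat tokens over SSE; -N disables curl's output buffering so
# events print as they arrive. Payload shape assumed from OpenAI compatibility.
curl -N -X POST http://localhost:3100/v1/chat \
  -H "Authorization: Bearer sk-your-key" \
  -H "Content-Type: application/json" \
  -d '{"model":"your-model","messages":[{"role":"user","content":"Hi"}],"stream":true}'
```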