tts Plugin
How the tts plugin receives speech synthesis through its constructor and returns audio as assistant file parts
tts is the text-to-speech plugin. It defines the agent-side protocol only. It does not bind to a local model, Python dependency, or provider.
The actual speech synthesis capability is injected through the constructor. The recommended setup is to wire it to City AI:
```ts
import { Agent } from "@downcity/agent";
import { TtsPlugin } from "@downcity/plugins";

const agent = new Agent({
  id: "speech-agent",
  path: "/path/to/project",
  plugins: [
    new TtsPlugin({
      // `city` is assumed to be a City AI client created elsewhere in the project.
      tts: (input) => city.ai.tts(input),
      language: "auto",
      format: "wav",
    }),
  ],
});
```

Design Boundary
tts is responsible for:
- exposing the `synthesize` action
- calling the constructor-injected `tts` function
- normalizing the result into an AI SDK UIMessage
- letting audio file parts enter the agent resource materialization flow
- adding system guidance for how the agent should use `tts`
tts is not responsible for:
- installing local models
- choosing a TTS provider
- managing Python dependencies
- persisting project config
- providing CLI management commands
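Because provider choice sits outside the plugin, any function with the right shape can stand in for the real synthesizer. The following is a minimal sketch of a stub for tests, not the published typings: `TtsInput`, `SimpleAudioResult`, `TtsFunction`, `buildStubResult`, and `stubTts` are hypothetical names, and the returned base64 payload is placeholder bytes rather than playable audio.

```typescript
// Hypothetical shapes for illustration; the real typings may differ.
interface TtsInput {
  text: string;
  language?: string;
  voice?: string;
  format?: string;
}

interface SimpleAudioResult {
  data_url: string;
  media_type: string;
  filename: string;
}

type TtsFunction = (input: TtsInput) => Promise<SimpleAudioResult>;

// Standard media types for the formats this page mentions as examples.
const MEDIA_TYPES: Record<string, string> = {
  wav: "audio/wav",
  mp3: "audio/mpeg",
  ogg: "audio/ogg",
};

// Builds a placeholder result. "AAAA" decodes to three zero bytes,
// so the data URL is well-formed but does not contain real audio.
function buildStubResult(input: TtsInput): SimpleAudioResult {
  const format = input.format ?? "wav";
  const mediaType = MEDIA_TYPES[format];
  if (mediaType === undefined) {
    throw new Error(`unsupported format: ${format}`);
  }
  return {
    data_url: `data:${mediaType};base64,AAAA`,
    media_type: mediaType,
    filename: `speech.${format}`,
  };
}

// A stand-in that could be passed as the `tts` constructor option in tests.
const stubTts: TtsFunction = async (input) => buildStubResult(input);
```

Swapping `stubTts` for a City AI call, or for any other provider, would not require changes on the plugin side, which is the point of keeping provider selection out of `tts`.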
Constructor Options
| Option | Purpose | Default |
|---|---|---|
| `tts` | Required real speech synthesis function | none |
| `language` | Default language hint, such as `auto`, `zh`, or `en` | none |
| `voice` | Default voice hint | none |
| `format` | Default output format hint, such as `wav`, `mp3`, or `ogg` | none |
| `name` | Plugin name | `tts` |
| `title` | Display title | `TTS` |
| `description` | Plugin description | built-in description |
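One plausible reading of the "default hint" options is that they fill in fields a call leaves out. The sketch below assumes that per-call values take precedence over constructor defaults; the page does not state the precedence rule, and `TtsDefaults`, `SynthesizeFields`, and `applyDefaults` are hypothetical names.

```typescript
// Hypothetical shapes; only the field names come from this page.
interface TtsDefaults {
  language?: string;
  voice?: string;
  format?: string;
}

interface SynthesizeFields {
  text: string;
  language?: string;
  voice?: string;
  format?: string;
}

// Assumption: a value given on the call wins, and the constructor
// default is used only when the call omits that field.
function applyDefaults(
  defaults: TtsDefaults,
  payload: SynthesizeFields,
): SynthesizeFields {
  return {
    ...payload,
    language: payload.language ?? defaults.language,
    voice: payload.voice ?? defaults.voice,
    format: payload.format ?? defaults.format,
  };
}

// Example: the call overrides `language` and inherits `format`.
const merged = applyDefaults(
  { language: "auto", format: "wav" },
  { text: "Hello", language: "en" },
);
```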
Explicit Call
```ts
const result = await agent.plugins.runAction({
  plugin: "tts",
  action: "synthesize",
  payload: {
    text: "Hello, welcome to Downcity",
    language: "en",
    format: "wav",
  },
});
```

`text` is required. Common optional fields include:

- `language`
- `voice`
- `format`
- `speed`
- `provider_options`
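The payload rules above can be sketched as a small validator. This is an illustration under stated assumptions, not the plugin's own validation: `SynthesizePayload` and `parseSynthesizePayload` are hypothetical names, `speed` is assumed to be a positive number, and `provider_options` is assumed to be a free-form object.

```typescript
// Hypothetical payload type; field names come from this page, types are assumed.
interface SynthesizePayload {
  text: string;
  language?: string;
  voice?: string;
  format?: string;
  speed?: number; // assumed numeric
  provider_options?: Record<string, unknown>; // assumed free-form object
}

function parseSynthesizePayload(raw: unknown): SynthesizePayload {
  if (typeof raw !== "object" || raw === null) {
    throw new TypeError("payload must be an object");
  }
  const source = raw as Record<string, unknown>;

  // `text` is the only required field.
  const text = source["text"];
  if (typeof text !== "string" || text.trim() === "") {
    throw new TypeError("payload.text is required");
  }
  const payload: SynthesizePayload = { text };

  // Optional string hints are copied only when present.
  for (const key of ["language", "voice", "format"] as const) {
    const value = source[key];
    if (value === undefined) continue;
    if (typeof value !== "string") {
      throw new TypeError(`payload.${key} must be a string`);
    }
    payload[key] = value;
  }

  const speed = source["speed"];
  if (speed !== undefined) {
    if (typeof speed !== "number" || !(speed > 0)) {
      throw new TypeError("payload.speed must be a positive number");
    }
    payload.speed = speed;
  }

  const options = source["provider_options"];
  if (options !== undefined) {
    if (typeof options !== "object" || options === null) {
      throw new TypeError("payload.provider_options must be an object");
    }
    payload.provider_options = options as Record<string, unknown>;
  }
  return payload;
}
```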
Result Shape
The injected tts function can return an AI SDK UIMessage directly, or a simple audio result:
```ts
return {
  data_url: "data:audio/wav;base64,...",
  media_type: "audio/wav",
  filename: "speech.wav",
};
```

The plugin normalizes this into a UIMessage with a file part. When called through `plugin_call`, the agent materializes the audio under `.downcity/resources` and keeps an Agent-root relative path such as `.downcity/resources/...` in the final assistant message.
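The normalization step might look roughly like the sketch below. It is not the plugin's source: `FilePart` and `UIMessageLike` are local stand-ins that mirror the file-part fields of an AI SDK UI message as assumed here (`type`, `mediaType`, `filename`, `url`), and `normalizeTtsResult` plus its `id` parameter are hypothetical.

```typescript
// Local stand-ins for the AI SDK UI message shapes (assumed field names).
interface FilePart {
  type: "file";
  mediaType: string;
  filename?: string;
  url: string;
}

interface UIMessageLike {
  id: string;
  role: "assistant";
  parts: FilePart[];
}

// The simple audio result shown on this page; `filename` is treated as optional here.
interface SimpleAudioResult {
  data_url: string;
  media_type: string;
  filename?: string;
}

// Treats anything carrying a `parts` array as an already-built message.
function isUIMessageLike(value: unknown): value is UIMessageLike {
  return (
    typeof value === "object" &&
    value !== null &&
    Array.isArray((value as { parts?: unknown }).parts)
  );
}

// Passes a UI message through unchanged and wraps a simple audio
// result in an assistant message with a single file part.
function normalizeTtsResult(
  result: SimpleAudioResult | UIMessageLike,
  id: string,
): UIMessageLike {
  if (isUIMessageLike(result)) {
    return result;
  }
  const part: FilePart = {
    type: "file",
    mediaType: result.media_type,
    url: result.data_url,
    ...(result.filename !== undefined ? { filename: result.filename } : {}),
  };
  return { id, role: "assistant", parts: [part] };
}
```

In this sketch the file part still holds the data URL. Replacing it with a path under `.downcity/resources` would belong to the agent's materialization step, which is not modeled here.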
How The Agent Calls It
```ts
plugin_call({
  plugin: "tts",
  action: "synthesize",
  payload: {
    text: "...",
  },
});
```

TTS is an output-generation capability. It does not automatically intercept the message flow. Call it only when the user explicitly wants voice, narration, or an audio file.