asr Plugin
How the asr plugin receives speech transcription through its constructor and writes voice text blocks during chat inbound processing
asr is the speech-to-text plugin. It defines the agent-side protocol only. It does not bind to a local model, Python dependency, or provider.
The actual transcription capability is injected through the constructor. The recommended setup is to connect it to City AI:
    import { Agent } from "@downcity/agent";
    import { AsrPlugin } from "@downcity/plugins";

    // `city` is not defined in this snippet; it is assumed to be an
    // already-initialized City AI client.
    const agent = new Agent({
      id: "voice-agent",
      path: "/path/to/project",
      plugins: [
        new AsrPlugin({
          asr: (input) => city.ai.asr(input),
          auto: true,
          language: "auto",
        }),
      ],
    });

Design Boundary
asr is responsible for:
- exposing the `transcribe` action
- listening to inbound chat voice attachments when `auto: true`
- appending transcription results into user text
- adding system guidance for how the agent should use `asr`
asr is not responsible for:
- installing local models
- choosing an ASR provider
- managing Python dependencies
- persisting project config
- providing CLI management commands
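Because the provider choice stays outside the plugin, any function of the right shape can stand in for the real one. The following is a minimal sketch of a canned provider for tests. The `AsrRequest` and `AsrResult` types and the `createFakeAsr` helper are hypothetical; the actual input and result shapes expected by `AsrPlugin` are not stated here and may differ.

```typescript
// Hypothetical request/result shapes, assumed for illustration only.
type AsrRequest = {
  audio_path?: string;
  url?: string;
  data_url?: string;
  language?: string;
};
type AsrResult = { text: string };
type AsrProvider = (input: AsrRequest) => Promise<AsrResult>;

// Builds a fake provider that returns canned transcripts keyed by input source.
function createFakeAsr(transcripts: Record<string, string>): AsrProvider {
  return async (input) => {
    // Use whichever input source is present as the lookup key.
    const key = input.audio_path ?? input.url ?? input.data_url ?? "";
    const text = transcripts[key];
    if (text === undefined) {
      throw new Error(`no canned transcript for "${key}"`);
    }
    return { text };
  };
}
```

If the shapes line up with the real ones, such a function could be passed as the `asr` constructor option in place of a City AI call.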
Constructor Options
| Option | Purpose | Default |
|---|---|---|
| `asr` | Required real transcription function | none |
| `auto` | Whether to auto-transcribe inbound chat voice attachments | `false` |
| `language` | Default language hint, such as `auto`, `zh`, or `en` | none |
| `name` | Plugin name | `asr` |
| `title` | Display title | `ASR` |
| `description` | Plugin description | built-in description |
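The defaults in the table can be read as a small resolution step. This is a sketch only: `AsrPluginOptionsSketch`, `resolveAsrOptions`, the `{ text }` result shape, and the `BUILT_IN_DESCRIPTION` placeholder are hypothetical names, not the plugin's actual internals.

```typescript
// Hypothetical option shape mirroring the table above.
interface AsrPluginOptionsSketch {
  asr: (input: Record<string, unknown>) => Promise<{ text: string }>; // required
  auto?: boolean;
  language?: string;
  name?: string;
  title?: string;
  description?: string;
}

// Placeholder for the plugin's built-in description text (not the real string).
const BUILT_IN_DESCRIPTION = "built-in description";

// Applies the documented defaults to whatever the caller omitted.
function resolveAsrOptions(options: AsrPluginOptionsSketch) {
  if (typeof options.asr !== "function") {
    throw new TypeError("asr: a transcription function is required");
  }
  return {
    asr: options.asr,
    auto: options.auto ?? false, // documented default: false
    language: options.language, // no default; stays undefined when omitted
    name: options.name ?? "asr",
    title: options.title ?? "ASR",
    description: options.description ?? BUILT_IN_DESCRIPTION,
  };
}
```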
Explicit Call
If you only need to transcribe one audio file, call the action directly:
    const result = await agent.plugins.runAction({
      plugin: "asr",
      action: "transcribe",
      payload: {
        audio_path: "/path/to/voice.ogg",
        language: "zh",
      },
    });

The payload must provide at least one input source:
- `audio_path`
- `url`
- `data_url`
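The "at least one input source" rule can be checked before calling the action. This is a minimal sketch, assuming each source is passed as a non-empty string; `TranscribePayloadSketch` and `hasInputSource` are hypothetical names, and the plugin's own validation may behave differently.

```typescript
// Hypothetical payload shape based on the fields named above.
interface TranscribePayloadSketch {
  audio_path?: string;
  url?: string;
  data_url?: string;
  language?: string;
}

// Returns true when at least one of the three input sources is a non-empty string.
function hasInputSource(payload: TranscribePayloadSketch): boolean {
  return [payload.audio_path, payload.url, payload.data_url].some(
    (value) => typeof value === "string" && value.trim().length > 0
  );
}
```

A caller could run this check first and skip the `transcribe` call when it returns `false`.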
Automatic Transcription
When auto: true, inbound chat messages with voice or audio attachments are processed at CHAT_PLUGIN_POINTS.augmentInbound.
After successful transcription, asr does not replace the original message. It appends the result to the text body:
    <voice src="voice.ogg">Remind me about the meeting tomorrow at 3 PM</voice>

If automatic transcription fails, the main chat flow continues. This keeps a single voice attachment or external ASR failure from blocking the agent.
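The append-and-continue behavior can be sketched as follows. This is an illustration under assumptions, not the plugin's implementation: `augmentWithVoice` and `TranscribeFn` are hypothetical names, and joining the existing text and the voice block with a newline is an assumed detail.

```typescript
// Hypothetical transcription callback: takes an attachment source, returns text.
type TranscribeFn = (src: string) => Promise<string>;

// Appends a <voice> block to the message text, or returns the text unchanged
// when transcription fails, so the chat flow is never blocked.
async function augmentWithVoice(
  text: string,
  src: string,
  transcribe: TranscribeFn
): Promise<string> {
  try {
    const transcript = await transcribe(src);
    const block = `<voice src="${src}">${transcript}</voice>`;
    // Assumption: the block goes on its own line after any existing text.
    return text ? `${text}\n${block}` : block;
  } catch {
    // Transcription failed: keep the original message as-is.
    return text;
  }
}
```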
How The Agent Calls It
The agent still calls asr through the standard plugin action path:
    plugin_call({
      plugin: "asr",
      action: "transcribe",
      payload: {
        audio_path: "...",
      },
    });

When auto: true is enabled, many voice messages do not need an explicit model-triggered action because the transcript has already been written into the message before it reaches the agent.