# Pipecat quickstart
Build a local voice bot with Speechmatics and Pipecat in minutes.
Pipecat is a framework for building real-time voice bots using a pipeline architecture. In this quickstart, you’ll run a local WebRTC server and connect to your bot from your browser.
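Pipecat's pipeline is a chain of frame processors: each stage consumes the previous stage's output and passes its result downstream. The toy sketch below illustrates the idea in plain asyncio — these are illustrative stand-ins, not Pipecat's actual classes:

```python
import asyncio


class Processor:
    """Toy stand-in for a frame processor (illustrative only)."""

    async def process(self, frame):
        return frame


class ToySTT(Processor):
    # Pretend speech-to-text: "transcribes" audio bytes into text.
    async def process(self, frame):
        return frame.decode() if isinstance(frame, bytes) else frame


class ToyLLM(Processor):
    # Pretend LLM: produces a reply from the transcript.
    async def process(self, frame):
        return f"You said: {frame}"


async def run_pipeline(processors, frame):
    # Each processor's output becomes the next processor's input —
    # the core idea behind Pipecat's Pipeline([...]).
    for p in processors:
        frame = await p.process(frame)
    return frame


print(asyncio.run(run_pipeline([ToySTT(), ToyLLM()], b"hello")))
# -> You said: hello
```

In the real pipeline you build below, the stages are transport input, STT, context aggregation, LLM, TTS, and transport output.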
## Features

- Real-time transcription — Low-latency speech-to-text as users speak
- Natural text to speech — Give your bot a clear, natural voice
- Local web client — Test your bot in a browser at http://localhost:7860/client
- No infrastructure — No cloud deployment or room setup required
## Prerequisites
- Python 3.10+
- Speechmatics API key
- OpenAI API key (for the LLM)
## Setup

### 1. Create project

```shell
mkdir voice-agent && cd voice-agent
```
### 2. Install dependencies

Create a `requirements.txt` file:

```text
pipecat-ai[local-smart-turn-v3,silero,speechmatics,webrtc,openai,runner]
pipecat-ai-small-webrtc-prebuilt
python-dotenv
loguru
```

Install with uv:

```shell
uv venv
source .venv/bin/activate
uv pip install -r requirements.txt
```
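To sanity-check the install before writing any bot code, you can query the installed distributions with the standard library's `importlib.metadata`. This is an optional check, not part of the quickstart itself:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str):
    """Return the installed version of a distribution, or None if missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Check the distributions pulled in by requirements.txt.
for dist in ("pipecat-ai", "python-dotenv", "loguru"):
    print(dist, "->", installed_version(dist) or "NOT INSTALLED")
```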
### 3. Configure environment

Create a `.env` file:

```shell
SPEECHMATICS_API_KEY=your_speechmatics_key
OPENAI_API_KEY=your_openai_key
```
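A missing key only surfaces later as an authentication error, so it can help to fail fast at startup. A small optional helper (a sketch, not part of the `main.py` below) that reports unset variables:

```python
import os


def missing_env(names):
    """Return the subset of environment variable names that are unset or empty."""
    return [n for n in names if not os.getenv(n)]


missing = missing_env(["SPEECHMATICS_API_KEY", "OPENAI_API_KEY"])
print("missing:", missing or "none")
```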
### 4. Create your bot

Create a `main.py` file:

```python
import os

import aiohttp
from dotenv import load_dotenv
from loguru import logger

from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
    LLMContextAggregatorPair,
    LLMUserAggregatorParams,
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.services.speechmatics.stt import SpeechmaticsSTTService
from pipecat.services.speechmatics.tts import SpeechmaticsTTSService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.turns.user_stop.turn_analyzer_user_turn_stop_strategy import (
    TurnAnalyzerUserTurnStopStrategy,
)
from pipecat.turns.user_turn_strategies import UserTurnStrategies

load_dotenv(override=True)


async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    logger.info("Starting bot")

    async with aiohttp.ClientSession() as session:
        # Speech-to-text: turn detection is delegated to the external analyzer below.
        stt = SpeechmaticsSTTService(
            api_key=os.getenv("SPEECHMATICS_API_KEY"),
            params=SpeechmaticsSTTService.InputParams(
                turn_detection_mode=SpeechmaticsSTTService.TurnDetectionMode.EXTERNAL,
            ),
        )

        llm = OpenAILLMService(
            api_key=os.getenv("OPENAI_API_KEY"),
            model="gpt-4o-mini",
        )

        tts = SpeechmaticsTTSService(
            api_key=os.getenv("SPEECHMATICS_API_KEY"),
            voice_id="sarah",
            aiohttp_session=session,
        )

        messages = [
            {
                "role": "system",
                "content": "You are a helpful voice assistant. Be concise and friendly.",
            },
        ]

        context = LLMContext(messages)
        user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
            context,
            user_params=LLMUserAggregatorParams(
                vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
                user_turn_strategies=UserTurnStrategies(
                    stop=[
                        TurnAnalyzerUserTurnStopStrategy(
                            turn_analyzer=LocalSmartTurnAnalyzerV3()
                        )
                    ]
                ),
            ),
        )

        pipeline = Pipeline(
            [
                transport.input(),
                stt,
                user_aggregator,
                llm,
                tts,
                transport.output(),
                assistant_aggregator,
            ]
        )

        task = PipelineTask(
            pipeline,
            params=PipelineParams(
                enable_metrics=True,
                enable_usage_metrics=True,
            ),
        )

        @transport.event_handler("on_client_connected")
        async def on_client_connected(transport, client):
            logger.info("Client connected")
            await task.queue_frames([LLMRunFrame()])

        @transport.event_handler("on_client_disconnected")
        async def on_client_disconnected(transport, client):
            logger.info("Client disconnected")
            await task.cancel()

        runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
        await runner.run(task)


async def bot(runner_args: RunnerArguments):
    transport_params = {
        "webrtc": lambda: TransportParams(
            audio_in_enabled=True,
            audio_out_enabled=True,
        ),
    }

    transport = await create_transport(runner_args, transport_params)
    await run_bot(transport, runner_args)


if __name__ == "__main__":
    from pipecat.runner.run import main

    main()
```
### 5. Run your bot

```shell
python main.py
```

Open http://localhost:7860/client in your browser and allow microphone access.

The first run can take a little longer while dependencies and models load.
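Because of that slower first run, the local server may not respond immediately. If you script your testing, a small polling helper can wait for it to come up. This is an optional sketch using only the standard library; the URL and timeout values are assumptions you can adjust:

```python
import time
import urllib.error
import urllib.request


def wait_until_ready(url: str, timeout_s: float = 60.0, interval_s: float = 1.0) -> bool:
    """Poll a URL until it responds, or give up after timeout_s seconds."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                if resp.status < 500:
                    return True
        except (urllib.error.URLError, OSError):
            pass
        time.sleep(interval_s)
    return False


# Example: wait_until_ready("http://localhost:7860/client", timeout_s=90)
```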
## Next steps
- Speech to text — Configure diarization, turn detection, and more
- Text to speech — Choose voices and adjust settings
- Speechmatics Academy — Full working examples
- Pipecat quickstart — Learn more patterns and deployment options