Technology stack: Python, Node.js, React, http.server, boto3, Strands Agents SDK, Amazon Nova Sonic & Lite
AWS services: IAM, EC2/VPC, ECS Fargate, ECR, CloudWatch, Amazon Bedrock
IaC framework: AWS Cloud Development Kit
This project is a fork of an official Amazon Nova S2S workshop but has been heavily modified and is maintained independently.
This application demonstrates the Amazon Nova Sonic model as a voice assistant, extended with tool use through a Strands Agents integration. By default the assistant ships with two tools, web search via Tavily and Bedrock Knowledge Base RAG, both provided by the strands-agents-tools package. Because the repository is organized as a monolith, additional tools can be added programmatically with little effort.
The UI is built with React, and the agent flow runs over WebSocket speech-to-speech (S2S) sessions with mid-speech interruption enabled. The backend entrypoint can be run locally for development and testing, or the server can be deployed with CDK and hosted on ECS Fargate, with the Dockerized code pushed to an ECR repository; the latter is the option suited to production.
- Python version 3.12 or later
- AWS CLI profile with administrator-like permissions on the services listed above
- AWS access to the Amazon Nova Lite (amazon.nova-lite-v1:0) and Amazon Nova Sonic (amazon.nova-sonic-v1:0) models
- Node.js 20.x or later installed (check current Node.js versions supported by AWS CDK)
- AWS CDK v2 npm package installed
- CDK bootstrapped in the AWS account, done by running:
  cdk bootstrap [--profile PROFILE_NAME]
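The local part of the prerequisites above can be checked with a short script. This is a minimal sketch, not part of the repository: the helper names are hypothetical, and the executable names (aws, node, npm, cdk) are assumed to match how the tools are installed on the machine. It does not verify the Node.js version, Bedrock model access, or whether the account has been bootstrapped.

```python
import shutil
import sys

MIN_PYTHON = (3, 12)
# Assumption: these are the executable names the tools expose on PATH.
REQUIRED_TOOLS = ("aws", "node", "npm", "cdk")


def python_ok(version=None):
    """Return True when the interpreter meets the minimum Python version."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= MIN_PYTHON


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the CLI tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]
```

Calling python_ok() and missing_tools() with no arguments inspects the current interpreter and PATH; the optional parameters exist only to make the helpers easy to test.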
Before deploying the project, install the dependencies either globally or in a virtual environment that is created in the root directory and then activated. To manage the stack without changing internal project settings, run CDK-related commands inside the src/backend/ directory, which contains the preconfigured CDK source files. Example for Unix-based shells:
python -m venv .venv
source .venv/bin/activate # PowerShell command: .venv/Scripts/activate
pip install -r src/backend/requirements.txt
Three Bash scripts perform the essential operations on the stack and project, each with an option to set a custom AWS CLI profile. First, change into the scripts folder in the repository root.
Deploy the project to the cloud and push the Docker image to ECR by running:
./deploy.sh -t TAVILY_API_KEY -k KNOWLEDGE_BASE_ID
Note
Setting the Tavily and Bedrock Knowledge Base context variables at deployment is optional, but both are required for the Strands Agent tools to work properly.
To push any updates in the backend code to the ECR repository, run:
./build.sh
Delete the project and all of its AWS resources by running:
./destroy.sh
Additionally, include the -p PROFILE_NAME flag if a specific AWS CLI profile has been configured.
The CDK stack provisions the following resources:
- An ECR repository for hosting the Dockerized WebSocket server image
- A VPC with two public subnets across two availability zones and no NAT gateways
- A security group allowing inbound TCP traffic on ports 8081 (WebSocket) and 8082 (health check)
- An ECS Fargate cluster with a service initially set to 0 desired tasks, to be scaled up after the Docker image is pushed to ECR
- An IAM task role with permissions for Amazon Bedrock
- A CloudWatch Log Group for container logging
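The list above notes that the Fargate service starts with 0 desired tasks and is scaled up once an image exists in ECR. The project scripts take care of this; purely as an illustration, the equivalent step could be sketched with the boto3 ECS client as follows. The helper name is hypothetical, and the cluster and service names must be the ones created by the CDK stack.

```python
def scale_service(ecs_client, cluster, service, desired_count=1):
    """Set the desired task count and force a fresh deployment.

    `ecs_client` is expected to be a boto3 ECS client, for example
    boto3.client("ecs"); it is passed in so the helper can be exercised
    without AWS credentials.
    """
    response = ecs_client.update_service(
        cluster=cluster,
        service=service,
        desiredCount=desired_count,
        forceNewDeployment=True,  # pull the newly pushed image
    )
    return response["service"]["desiredCount"]
```

With real credentials, the call would look like scale_service(boto3.client("ecs"), CLUSTER_NAME, SERVICE_NAME), where both names are placeholders.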
The service is deployed with assignPublicIp enabled and runs on 1 vCPU / 2 GB memory Fargate tasks. A container health check is configured against the http://localhost:8082/health endpoint.
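A health endpoint of the kind probed by the container health check can be sketched with the standard library's http.server module. This is an assumption-laden illustration rather than the repository's implementation: the handler and function names, the response body, and the bind address are hypothetical; only the /health path and port 8082 come from the text above.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Answer GET /health with 200 and a small JSON body; 404 otherwise."""

    def do_GET(self):
        if self.path == "/health":
            body = b'{"status": "healthy"}'  # response body is an assumption
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # keep health probes out of the console output


def start_health_server(host="0.0.0.0", port=8082):
    """Start the health endpoint on a daemon thread and return the server."""
    server = ThreadingHTTPServer((host, port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Running the endpoint on a daemon thread keeps it from blocking the WebSocket event loop and lets the process exit without an explicit shutdown.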
The application supports an optional Strands Agent that orchestrates external tool calls during a voice conversation. When enabled via the --enable-strands flag, the WebSocket server initializes a Strands Agent running on Amazon Nova Lite with two native tools implemented in the integration file (strands_agent.py):
- tavily for real-time web search queries (e.g. current events, locations, facts)
- retrieve for domain-specific RAG queries against a configured Bedrock Knowledge Base
The tool definitions are declared in the S2S session configuration (config.json) and are sent to Nova Sonic as part of the prompt setup. When Nova Sonic determines that a tool should be invoked, it emits a toolUse event which the session manager intercepts, delegates to the Strands Agent for reasoning and execution, and returns the result back into the S2S stream as a toolResult event.
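The interception step described above can be sketched as a single function that turns a toolUse event into a toolResult event. The event layout here is simplified and assumed; the real Nova Sonic events carry additional fields. The delegate callable is a hypothetical stand-in for the call into the Strands Agent.

```python
import json


def handle_tool_use(event, delegate):
    """Run the requested tool through `delegate` and wrap the answer.

    `delegate(tool_name, arguments)` stands in for the Strands Agent call.
    Failures are returned as an error payload so the voice session can
    continue instead of crashing.
    """
    tool_use = event["event"]["toolUse"]
    # Assumption: tool arguments arrive as a JSON string in "content".
    arguments = json.loads(tool_use.get("content") or "{}")
    try:
        result = delegate(tool_use["toolName"], arguments)
    except Exception as exc:
        result = {"error": str(exc)}
    return {
        "event": {
            "toolResult": {
                "toolUseId": tool_use["toolUseId"],
                "content": json.dumps(result),
            }
        }
    }
```

Echoing the toolUseId back in the result is what lets the model match an answer to the request that triggered it.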
Additional tools can be added programmatically by extending the tools list in src/backend/websocket/integration/strands_agent.py and registering the corresponding tool specification in the DEFAULT_TOOL_CONFIG section of src/frontend/src/agent/config.json. The same change should be mirrored in the local backend counterpart (src/backend/websocket/config.json), which the Dockerfile copies into the image pushed to the ECR repository.
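Registering a tool specification could look roughly like the sketch below, which builds an entry in the Bedrock-style toolSpec layout and appends it to a configuration dictionary. The helper names, the "tools" key under DEFAULT_TOOL_CONFIG, and storing the input schema as a JSON string are assumptions; check them against the existing entries in config.json before relying on them.

```python
import json


def make_tool_spec(name, description, properties, required=()):
    """Build one tool specification entry in the Bedrock-style layout."""
    schema = {
        "type": "object",
        "properties": properties,
        "required": list(required),
    }
    return {
        "toolSpec": {
            "name": name,
            "description": description,
            # Assumption: the schema is stored as a JSON string.
            "inputSchema": {"json": json.dumps(schema)},
        }
    }


def register_tool(config, spec):
    """Append `spec` to DEFAULT_TOOL_CONFIG unless its name is already present."""
    tools = config.setdefault("DEFAULT_TOOL_CONFIG", {}).setdefault("tools", [])
    names = {tool["toolSpec"]["name"] for tool in tools}
    if spec["toolSpec"]["name"] not in names:
        tools.append(spec)
    return config
```

Applying the same register_tool call to the parsed contents of both config.json files is one way to keep the frontend and backend copies aligned.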
Other voice assistant behavior properties can be customized through the config.json file:
- Inference configuration: maxTokens, topP, and temperature for the Nova Sonic model
- System prompt: The default instructions given to the assistant for conversation behavior
- Audio input/output configuration: Media type, sample rate, bit depth, channel count, and voice ID (review configVoices.js)
- Tool configuration: The list of tools exposed to Nova Sonic during the S2S session
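When changing the inference values listed above, a small guard can catch typos and out-of-range numbers before a session starts. The following is only a sketch: the default values are illustrative, the accepted ranges are assumptions, and the function name is hypothetical.

```python
# Illustrative defaults; the values shipped in config.json may differ.
DEFAULT_INFERENCE = {"maxTokens": 1024, "topP": 0.9, "temperature": 0.7}


def merge_inference_config(overrides, defaults=DEFAULT_INFERENCE):
    """Return defaults updated with overrides, rejecting unknown keys and bad ranges."""
    unknown = set(overrides) - set(defaults)
    if unknown:
        raise KeyError(f"unknown inference keys: {sorted(unknown)}")
    merged = {**defaults, **overrides}
    if merged["maxTokens"] <= 0:
        raise ValueError("maxTokens must be positive")
    # Assumption: topP and temperature are both expected in the range [0, 1].
    if not 0.0 <= merged["topP"] <= 1.0:
        raise ValueError("topP must be between 0 and 1")
    if not 0.0 <= merged["temperature"] <= 1.0:
        raise ValueError("temperature must be between 0 and 1")
    return merged
```

The function returns a new dictionary, so the defaults are never mutated by a caller's overrides.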
The frontend also provides a settings modal accessible during a session, allowing runtime adjustments to the voice ID, system prompt, and tool configuration without restarting the server.
To run the application locally without deploying to AWS, both the Python WebSocket server and the React frontend need to be started independently.
- Install dependencies globally or, preferably, in a virtual environment at src/backend
- Set the required environment variables for AWS authentication:
  export AWS_ACCESS_KEY_ID="YOUR_AWS_ACCESS_KEY_ID"
  export AWS_SECRET_ACCESS_KEY="YOUR_AWS_SECRET_ACCESS_KEY"
  export AWS_DEFAULT_REGION="us-east-1"
- The WebSocket host and port are optional. If not specified, the server defaults to localhost:8081:
  export HOST="localhost"
  export WS_PORT=8081
  The health check port is optional and intended for container deployments (ECS/EKS). If not set, the HTTP health check endpoint will not start:
  export HEALTH_PORT=8082
- Start the server:
  python server.py
  To enable the Strands Agent integration for tool usage (Tavily web search and Bedrock Knowledge Base retrieval), pass the --enable-strands flag:
  python server.py --enable-strands
  For the Strands Agent tools to work, the following environment variables must also be set:
  export TAVILY_API_KEY="YOUR_TAVILY_API_KEY"
  export KNOWLEDGE_BASE_ID="YOUR_BEDROCK_KNOWLEDGE_BASE_ID"
  Debug mode can be enabled with the --debug flag for verbose logging.
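The defaults and flags described in these steps can be summarized in one function. This is a minimal sketch of how a server might resolve them, not the actual server.py: the function name and the returned dictionary are hypothetical, while the environment variable names, default values, and flags are the ones listed above.

```python
import argparse
import os


def resolve_settings(argv=None, env=None):
    """Combine command-line flags and environment variables into one dict."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(description="S2S WebSocket server (sketch)")
    parser.add_argument("--enable-strands", action="store_true")
    parser.add_argument("--debug", action="store_true")
    args = parser.parse_args(argv)

    health_port = env.get("HEALTH_PORT")
    missing = []
    if args.enable_strands:
        # Both variables are needed for the Strands Agent tools to work.
        missing = [
            name for name in ("TAVILY_API_KEY", "KNOWLEDGE_BASE_ID") if not env.get(name)
        ]
    return {
        "host": env.get("HOST", "localhost"),
        "ws_port": int(env.get("WS_PORT", 8081)),
        # None means the HTTP health check endpoint is not started.
        "health_port": int(health_port) if health_port else None,
        "enable_strands": args.enable_strands,
        "debug": args.debug,
        "missing_tool_vars": missing,
    }
```

Reporting the missing tool variables instead of raising mirrors the behavior described above, where the server can start without them but the tools will not work.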
Note
Keep the WebSocket server running, then launch the React frontend in a separate terminal.
- Navigate to the frontend directory and install dependencies:
  cd src/frontend
  npm install
- If the backend is deployed, set the WebSocket URL environment variable to the WebSocket address of the ECS task's public IP. If not provided, the application defaults to ws://localhost:8081:
  export REACT_APP_WEBSOCKET_URL="YOUR_WEBSOCKET_URL"
- Start the development server:
  npm start
If there is no audio output in Chrome, check that the site's sound setting is set to Allow.
Warning
This UI is intended for demonstration purposes and may encounter state management issues after frequent conversation start/stop actions. Refreshing the page can help resolve the issue.
amazon-nova-websocket/
│
├── scripts/                       # Bash scripts for handling essential operations over the project
│   ├── build.sh                   # Script for building and pushing the Docker image and updating the ECS services
│   ├── deploy.sh                  # Script for deploying the CDK stack and building and pushing the Docker image
│   ├── destroy.sh                 # Script for destroying the CDK stack
│   └── lib/                       # Auxiliary functions and helpers for the main scripts
│
├── docs/                          # Files related to the project's documentation
│   └── diagram.png                # Infrastructure flow diagram
│
├── src/                           # Source files for the CDK stack, WebSocket server, and React UI
│   ├── backend/                   # Folder containing the CDK and Strands Agents definitions, and the WebSocket logic
│   │   ├── cdk/                   # CDK stack definition
│   │   ├── websocket/             # WebSocket + Strands integration code to be Dockerized along with config.json
│   │   ├── app.py                 # Application file to be referenced by CDK
│   │   ├── cdk.json               # CDK configuration file with execution, tags, and context attributes
│   │   └── requirements.txt       # All backend dependencies to be installed using pip
│   │
│   └── frontend/                  # Folder containing assets for the web UI and the agent configuration
│       ├── package.json           # All frontend dependencies to be installed using npm
│       ├── public/                # Static web metadata assets
│       └── src/
│           ├── agent/             # Files related to the full stack agent settings
│           │   └── config.json    # The agent configuration, including tool definitions
│           │
│           ├── components/        # React components
│           ├── helper/            # Files for audio handling and agent initialization
│           ├── static/            # Static visual assets
│           ├── App.js             # Main React application component
│           ├── s2s.js             # S2S chatbot React component
│           ├── index.js           # React entrypoint including AWS Amplify settings
│           ├── s2s.css            # CSS file for application-specific components
│           └── index.css          # CSS file for the general web interface
│
├── CODE_OF_CONDUCT.md             # Information on the Amazon Open Source Code of Conduct
├── CONTRIBUTING.md                # Guidelines and information for contributing to the project
├── README.md                      # Project overview, instructions, and architecture details
├── LICENSE                        # License information for the repository
└── .gitignore                     # Files and directories to be ignored by Git
