Enable Real Time Human-to-Human Voice Translation with Twilio ConversationRelay

February 12, 2025
Written by
Reviewed by
Paul Kamp
Twilion

Imagine a world where language barriers dissolve in real time, like the Star Trek universal translator making intergalactic diplomacy seamless. We may not be traveling the stars just yet, but Twilio’s ConversationRelay product brings us one step closer to that vision.

With this blog post, learn how to set up a real time simultaneous translation service using Twilio’s ConversationRelay, allowing anyone to communicate in a call using their primary language—or the one they’re most comfortable speaking—without missing a beat. Equally important, the human on the other end of the call can use their preferred language as well! Whether you’re helping customers navigate complex issues or connecting patients with the care they need, this technology makes conversations flow naturally and authentically across languages.

So how does Twilio’s ConversationRelay achieve this magic? And how can it revolutionize the way businesses and organizations connect with the world? Let’s dive into the details.

How the application works

ConversationRelay launched as public beta in November 2024 and focused on connecting humans to AI Agents. ConversationRelay removes the complexity of building voice applications by abstracting the speech-to-text and text-to-speech components and providing a simple websocket-based orchestration system to connect to your application.

The AI Agent use case, rightfully, has gotten plenty of attention. However, humans still need to communicate with each other and ConversationRelay can help supercharge these conversations, too. In addition to connecting to Generative AI, ConversationRelay can connect to other types of text-processing services to power additional use cases. One obvious use case is human-to-human translation.

Below, we’ll introduce a proof of concept to show human-to-human translation using Twilio ConversationRelay. The reference application is built using AWS and their serverless products, but the architecture provides a framework for building solutions on other cloud providers. Translation services are powered by Amazon Translate.

Let’s dive in.

Proof of Concept Application

Let’s turn our attention to building a proof of concept application using ConversationRelay. Here’s the reference architecture:

Diagram showing IVR flow between caller, Twilio, AWS Lambda functions, and callee agent.

Let’s walk through the diagram using the notes indicated by the numbers 1 - 6.

  1. Inbound Voice Call: The caller places a call to a Twilio number and the inbound call handler routes the call to your application. Your application will use the caller’s phone number to pull language preferences and establish a ConversationRelay session.
  2. Establish State for Caller and Callee: Since the caller needs to speak to another live human, configure and save the state for that other human (the “callee”) and link both calls together in state management. Finally, trigger an event to place a call to the other human.
  3. Place an Outbound Call: Use the Twilio platform to place the second call and establish a second and independent ConversationRelay session.
  4. Caller Says Something: With both separate calls in place, the caller says something and the ConversationRelay session uses speech-to-text to turn the spoken words into text and passes them via websocket messages to your application.
  5. Translation Step: Your application receives the text from the caller and your application maintains the state of both of these calls. It uses the translate service to convert the text spoken by the caller to the language being used by the callee.
  6. Speak the Translated Words: Your application posts the translated text to the other websocket connection – the one that belongs to the callee. ConversationRelay takes that text and speaks the words in the callee’s language. The callee can then, in turn, speak in their language to initiate the process going the other direction.
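Steps 4 through 6 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the repo: the in-memory `sessions` map stands in for the state store, each `send` function stands in for a WebSocket connection, and `linkCalls`, `relayPrompt`, and the injected `translate` function are hypothetical names. The `prompt` and `text` message shapes are assumed to follow ConversationRelay's WebSocket message format, so check the current documentation before relying on them.

```javascript
// Sketch only: in-memory stand-ins for state management and WebSocket connections.
// All function and field names here are hypothetical, not taken from the repo.
const sessions = new Map(); // callSid -> { lang, peerSid, send }

// Step 2: save state for both calls and link them to each other.
function linkCalls(caller, callee) {
  sessions.set(caller.callSid, { lang: caller.lang, peerSid: callee.callSid, send: caller.send });
  sessions.set(callee.callSid, { lang: callee.lang, peerSid: caller.callSid, send: callee.send });
}

// Steps 4-6: handle one WebSocket message received for `callSid`.
// `translate(text, fromLang, toLang)` is injected and returns a Promise of the
// translated text; in a real deployment it would wrap a translation service.
async function relayPrompt(callSid, rawMessage, translate) {
  const msg = JSON.parse(rawMessage);
  // Only transcribed speech is relayed; other message types are ignored here.
  if (msg.type !== "prompt" || !msg.voicePrompt) return false;

  const speaker = sessions.get(callSid);
  const listener = speaker && sessions.get(speaker.peerSid);
  if (!listener) return false;

  const translated = await translate(msg.voicePrompt, speaker.lang, listener.lang);
  // Post the translated text to the OTHER party's connection to be spoken.
  listener.send(JSON.stringify({ type: "text", token: translated, last: true }));
  return true;
}

// Usage with a fake translator and fake sockets.
const heard = [];
linkCalls(
  { callSid: "CA-caller", lang: "en-US", send: (m) => heard.push(["caller", m]) },
  { callSid: "CA-callee", lang: "es-US", send: (m) => heard.push(["callee", m]) }
);
const fakeTranslate = async (text, from, to) => `[${from}->${to}] ${text}`;
relayPrompt(
  "CA-caller",
  JSON.stringify({ type: "prompt", voicePrompt: "Hello", last: true }),
  fakeTranslate
).then(() => console.log(heard));
```

Injecting `translate` keeps the relay logic independent of the translation provider, which is what lets the same flow run on a different cloud.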

This all sounds great, but does it work?

It sure does! Watch the video below for an overview of the architecture and a demo.

Introduction video with demo


Want to watch a video of the installation instead?

Prerequisites

This is not a beginner level build! You should have some knowledge of AWS, serverless computing, and programming before continuing.

Let’s build it!

1. Download the code for this application

Download the code from this repo, and then open up the folder in your preferred development environment.

GitHub repository page showing code directory and recent commits for ConversationRelay-Translator project.

The repo contains everything needed to deploy this solution.

First, we need to install the Node packages used by the Twilio client layer. From a terminal window in the parent directory, run the following command to install the Twilio SDK:

$ npm --prefix ./layers/layer-cr-twilio-client/nodejs install

2. Update template.yaml with your variables

To make API calls and run this solution, you will need to enter your credentials. To get you up and running quickly for this blog post, you can enter your credentials directly into the yaml file. As a best practice (and certainly for production use), store your credentials using methods approved by your organization (AWS Secrets Manager would work well for this solution!).

Open up the file template.yaml in the parent directory. This yaml file contains the instructions needed to provision the AWS resources.

In the template.yaml file use FIND and enter TWILIO_ (this is under the TwilioInitiateCallFunction resource).

Uncomment the lines that look like this:

#TWILIO_ACCOUNT_SID: "YOUR-TWILIO-ACCOUNT-SID"
#TWILIO_AUTH_TOKEN: "YOUR-TWILIO-AUTH-TOKEN"          
#AGENT_PHONE_NUMBER: "<ENTER-A-DEFAULT-TWILIO-NUMBER>"
#TWILIO_DEFAULT_FROM: "ENTER-YOUR-DEFAULT-TWILIO-NUMBER"

…and replace values of each of those with your Twilio Account SID, the Auth Token, a default phone number for your callee (person you want to connect to), and a default Twilio number that is used to call the callee.

Use FIND TWILIO_ in the template.yaml file again to find a second place where you need to update the Twilio Account SID and Auth Token (this is under the OnDisconnectLambdaFunction resource).

Uncomment the lines that look like this:

#TWILIO_ACCOUNT_SID: "YOUR-TWILIO-ACCOUNT-SID"
#TWILIO_AUTH_TOKEN: "YOUR-TWILIO-AUTH-TOKEN"

…and replace the values of each of those with your Twilio Account SID and Auth Token.
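Before deploying, it can help to confirm that no placeholder values were left behind. The following is a small sketch, not part of the repo: `findUnsetSettings` and `isE164` are hypothetical helpers, and the placeholder pattern is an assumption based on the sample values shown above. Phone numbers passed to Twilio should be in E.164 format (for example, +15555550100).

```javascript
// Hypothetical pre-deploy check; not part of the repo.
const REQUIRED = [
  "TWILIO_ACCOUNT_SID",
  "TWILIO_AUTH_TOKEN",
  "AGENT_PHONE_NUMBER",
  "TWILIO_DEFAULT_FROM",
];

// Return the names of settings that are missing or still hold a sample
// placeholder such as "YOUR-..." or "<ENTER-...>".
function findUnsetSettings(env, required) {
  return required.filter((name) => {
    const value = env[name];
    return !value || /^<?(YOUR|ENTER)-/.test(value);
  });
}

// E.164: a leading "+", then up to 15 digits, the first of which is not 0.
function isE164(number) {
  return /^\+[1-9]\d{1,14}$/.test(number);
}

// Usage: check the current environment and report anything still unset.
console.log("Unset settings:", findUnsetSettings(process.env, REQUIRED));
```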

3. Update your AWS profile

In order to deploy the SAM application to your AWS account, you need to be sure that you have the proper AWS credentials configured. Follow these instructions. This application uses your locally saved AWS profile to deploy to your AWS account.

Copy aws-profile.profile.sample to aws-profile.profile and in the new file enter your local AWS profile name. This will allow you to run AWS commands locally to deploy to your AWS account.

Once you have authenticated into your AWS account, you can proceed to the next step.

4. Deploy code

With those settings in place, we are ready to deploy! From a terminal window, be sure you are in the parent directory, and run:

$ sam build

This command goes through the yaml file template.yaml and prepares the stack to be deployed.

$ sam deploy --guided --stack-name CR-TRANSLATOR --template template.yaml --profile $(cat ./aws-profile.profile) --capabilities CAPABILITY_NAMED_IAM

Note that the command references aws-profile.profile in order to authenticate and deploy to your AWS account.

This will start an interactive command prompt session to set basic configurations and then deploy all of your resources via a stack in CloudFormation. Here are the answers to enter after running that command (except, substitute your AWS Region of choice):

Configuring SAM deploy
======================
        Looking for config file [samconfig.toml] :  Not found
        Setting default arguments for 'sam deploy'
        =========================================
        Stack Name [CR-TRANSLATOR]: CR-TRANSLATOR
        AWS Region [us-east-1]: us-east-1 
        #Shows you resources changes to be deployed and require a 'Y' to initiate deploy
        Confirm changes before deploy [y/N]: N
        #SAM needs permission to be able to create roles to connect to the resources in your template
        Allow SAM CLI IAM role creation [Y/n]: Y
        #Preserves the state of previously provisioned resources when an operation fails
        Disable rollback [y/N]: N
        CallSetupFunction has no authentication. Is this okay? [y/N]: y
        Save arguments to configuration file [Y/n]: Y
        SAM configuration file [samconfig.toml]: 
        SAM configuration environment [default]:

After answering the last questions, SAM will create a changeset that lists all of the resources that will be deployed. Answer “y” to the last question to have AWS actually start to create the resources.

Previewing CloudFormation changeset before deployment
======================================================
Deploy this changeset? [y/N]:

The SAM command prompt will let you know when it has finished deploying all of the resources. You can then go to CloudFormation in your AWS Console and browse through the new stack you just created. The Lambdas, Lambda Layers, API Gateways, IAM Roles, and SNS topics are all created automatically. (IaC, or Infrastructure as Code, is awesome!)

You should see a new stack called CR-TRANSLATOR:

Screenshot of AWS CloudFormation Stacks list showing one stack CR-TRANSLATOR with status UPDATE_COMPLETE.

Click into that stack and then go into the Outputs tab and copy the value for TwimlAPI as you will need it in the next step! It should look like this:

Screenshot of CR-TRANSLATOR outputs with keys, values, descriptions, and highlighted Twilio API initialization link

5. Connect your Twilio phone number

With the value copied from the previous step, go to your Twilio Console and navigate to the phone number that you want to use for this application. In the Voice Handler section under A call comes in, select webhook and then paste in the value copied above to connect your Twilio phone number to the application you just spun up.

The correct Twilio page section will look like this:

Configuration screen showing webhook URL for voice call routing

6. Load the configuration details

Now, we need to load a profile into the new DynamoDB table that is linked to the caller’s phone number. This profile will be used to set up the ConversationRelay sessions for both the caller and the callee.
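To make the role of the profile concrete, here is a rough sketch of how profile settings could be turned into the TwiML that starts a ConversationRelay session. It is an illustration only: `buildConversationRelayTwiml` and the profile field names are hypothetical, the WebSocket URL and voice are placeholders, and the repo's own implementation may differ. The attribute names used (`url`, `language`, `ttsProvider`, `voice`, `transcriptionProvider`) are assumed from the ConversationRelay TwiML noun.

```javascript
// Escape characters that are not safe inside an XML attribute value.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build TwiML that connects a call to a ConversationRelay session.
// The profile field names are hypothetical; settings left undefined are omitted.
function buildConversationRelayTwiml(wsUrl, profile) {
  const attrs = {
    url: wsUrl,
    language: profile.language,
    ttsProvider: profile.ttsProvider,
    voice: profile.voice,
    transcriptionProvider: profile.transcriptionProvider,
  };
  const attrText = Object.entries(attrs)
    .filter(([, value]) => value !== undefined && value !== null)
    .map(([name, value]) => `${name}="${escapeXml(value)}"`)
    .join(" ");
  return (
    '<?xml version="1.0" encoding="UTF-8"?>' +
    `<Response><Connect><ConversationRelay ${attrText}/></Connect></Response>`
  );
}

// Usage with placeholder values.
console.log(
  buildConversationRelayTwiml("wss://example.com/translate", {
    language: "es-US",
    ttsProvider: "Google",
    voice: "EXAMPLE-VOICE-ID",
  })
);
```

Because the caller and the callee each get their own session, a function like this would be called twice, once per party, with that party's language and voice.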

First, open the file profile-caller-example.js in the configuration/dynamo-loaders folder.

In that file, review the comments at the top, then edit the properties of the JSON object to set up the caller and the callee and the languages and voices to be used by both.

Once you have finished editing that file, run this command from the parent directory:

$ aws dynamodb put-item --table-name CR-TRANSLATOR-AppDatabase --item "$(node ./configuration/dynamo-loaders/profile-caller-example.js | cat)" --profile $(cat ./aws-profile.profile)

This will create an item in your DynamoDB instance that will be used to configure each caller session. It is worth taking a look at this item in your DynamoDB Console.
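The --item argument of aws dynamodb put-item expects DynamoDB's attribute-value JSON, where every value is wrapped in a type descriptor such as S for strings or N for numbers. The sketch below shows that shape. It is not the repo's loader: `toDynamoItem` and every attribute name in `exampleProfile` are hypothetical, so use the property names documented in profile-caller-example.js for the real item.

```javascript
// Convert a flat object of strings, numbers, and booleans into
// DynamoDB attribute-value JSON (the format put-item expects for --item).
function toDynamoItem(obj) {
  const item = {};
  for (const [key, value] of Object.entries(obj)) {
    if (typeof value === "string") {
      item[key] = { S: value };
    } else if (typeof value === "number" && Number.isFinite(value)) {
      item[key] = { N: String(value) }; // DynamoDB numbers are sent as strings
    } else if (typeof value === "boolean") {
      item[key] = { BOOL: value };
    } else {
      throw new TypeError(`Unsupported value for attribute "${key}"`);
    }
  }
  return item;
}

// Hypothetical profile; attribute names are illustrative, not from the repo.
const exampleProfile = {
  pk: "+15555550100", // caller's phone number as the primary key
  sourceLanguage: "en-US",
  calleeLanguage: "es-US",
};

// Print the item as a single JSON string, ready to pass to --item.
console.log(JSON.stringify(toDynamoItem(exampleProfile)));
```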

In your AWS Console, navigate to DynamoDB and then select TABLES, and then the table called CR-TRANSLATOR-AppDatabase. Finally, click on EXPLORE TABLE ITEMS.

Once there you should find and click on an item with a primary key that matches the phone number of the caller. The item will look like this:

Screenshot showing various attributes for call details configuration, including language, voice, and providers.

Make a call!

To make this work you are going to need two phones. One phone will call the Twilio phone number, and the other phone will receive the second call from the second ConversationRelay session.

You should now be able to call your Twilio phone number and start talking with your new real time voice translation application! In the sample configuration provided in the repo, the caller speaks English and the callee speaks Spanish. You can of course change either of these to any supported language.

To change languages and voices for both the caller and the callee, try changing the configuration parameters directly from the DynamoDB console. Inspect the session items generated by your calls to see how this application stitches together conversations.

Cleanup

To avoid any undesired costs, you can delete the application CloudFormation Stack from the AWS Console. Select the stack and the DELETE option as shown below:

Interface showing AWS CloudFormation stack 'CR-TRANSLATOR' selected for deletion.

Deploy to production

While you can get this system working pretty quickly, it is not ready for your production environment. This blog post and repo are intended to inspire and help you start building awesome real time translation voice applications with Twilio and ConversationRelay.

Conclusion

In this blog post we showed how you could quickly spin up a proof-of-concept translation system using Twilio Voice and Twilio ConversationRelay. Again, I want to stress that this is a proof of concept. However, I hope you agree that the latency and basic functionality are very promising for a PoC. Twilio ConversationRelay is making use cases like this one much more attainable!

Some use cases that could be powered by human-to-human translation demonstrated in this proof of concept are:

  • Education and Online Learning: Teachers could connect with students from different linguistic backgrounds, fostering inclusive, global classrooms. Schools can also communicate effectively with parents during conferences, regardless of language barriers.
  • Travel and Hospitality: Hotels, airlines, and travel agencies could provide real time language support to tourists, enhancing their experience.
  • E-Commerce and Retail: Online businesses could scale globally by offering real time multilingual customer service, increasing trust and conversions. Help centers can assist customers in navigating technical products in their native languages.
  • Government and Public Services: Emergency hotlines and public service organizations can communicate with multilingual communities, ensuring accessibility and faster response times.
  • Humanitarian Aid and Disaster Relief: Relief organizations can coordinate efforts in crisis zones more effectively and support refugees in accessing essential services.

With ConversationRelay, Twilio is responsible for managing the complexities of telephony and the orchestration of speech-to-text and text-to-speech. Your enterprise gets to focus on your application. (In this case, the translation.)

The latency required to translate text from language to language is pretty minimal, and conversations using short phrases are well served with this repo. When a person speaks several sentences at once, there is additional latency added by the time it takes to speak the words in the other language. This is more of a “perceived” latency because it is not actually the system being slow; it is just that the words need to be spoken twice. It would be straightforward to address this by adding some sound effects while the translated text is being spoken, or even by speaking the translated text to both parties so that each party knows when the translated words have completed. There are several possible solutions to this issue and the one you choose will depend on your distinct requirements.
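As one illustration of the "speak to both parties" idea, the sketch below sends the same translated text to the listener and back to the original speaker. It is an assumption-laden sketch rather than a tested solution: `speakToBoth` is a hypothetical helper, the two `send` functions stand in for each party's WebSocket connection, and the optional `lang` field on the `text` message is assumed from ConversationRelay's message format. The speaker's session would also need a voice configured for the listener's language for the echo to sound right.

```javascript
// Hypothetical helper: speak the translated text to the listener and echo it
// to the original speaker so both parties know when the translation has finished.
function speakToBoth(speakerSend, listenerSend, translatedText, lang) {
  const message = JSON.stringify({
    type: "text",
    token: translatedText,
    last: true,
    lang, // omitted from the JSON when undefined
  });
  listenerSend(message);
  speakerSend(message);
  return message;
}

// Usage with fake connections that simply record what they were sent.
const toSpeaker = [];
const toListener = [];
speakToBoth((m) => toSpeaker.push(m), (m) => toListener.push(m), "Hola, ¿cómo estás?", "es-US");
console.log(toSpeaker.length, toListener.length);
```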

One thing is for sure: real time translation has massive potential, and we look forward to seeing, and hearing, what you can build with these awesome Twilio tools!

Bonus Material

Some of my fellow Twilions teamed up recently for an internal event to show how this solution could be used in a contact center – specifically, Twilio Flex.

We were able to connect the second “callee” ConversationRelay session to a Flex-backed agent. That solution uses a proxy to link the two calls together. Some of the code for a proxy solution is in this repo.

Here is our presentation:


Dan Bartlett has been building web applications since the first dotcom wave. The core principles from those days remain the same but these days you can build cooler things faster. He can be reached at dbartlett [at] twilio.com.