Live Transcribing Simultaneous Phone Calls
With Twilio Media Streams, you can extend the capabilities of your Twilio-powered voice application with real-time access to the raw audio stream of phone calls.
This blog post follows on from my previous post, which shows you how to get started with Twilio Media Streams and live transcription. If you haven’t set up live call transcription before, I recommend working through that tutorial before moving on to this one. In this post we will scale our application to handle multiple phone calls at the same time. We will be able to monitor the transcribed speech from multiple phone calls, live, in the browser, using Twilio and Google Speech-to-Text with Node.js.
You can quickly spin up working code by cloning my GitHub repository and following the README to get set up. If you’d like to see how to refactor your code to accommodate simultaneous calls, follow these steps.
Requirements
Before we can get started, make sure you have:
- A free Twilio account
- A Google Cloud account
- ngrok installed
- The Twilio CLI installed
Recap
Let’s recap how our basic call transcription application works. This picks up from a previous post. You can find that working code here: Basic Transcription Application. Follow the README to get it working.
- Our Twilio number receives a call and Twilio makes a POST request to our web server
- Our Express application responds with TwiML, instructing Twilio to stream the audio from the call to our websocket server
- Our websocket server uses the Google Speech-to-Text API to transcribe the audio into text
- Finally our websocket server broadcasts the text to any browser clients that are connected and, like magic, words appear in our web browser
When we call our Twilio number from more than one phone simultaneously, the transcription text gets confused: every browser receives the text from every call, so the words from different callers end up mixed together. Let’s fix this.
Differentiating incoming calls
First we need to differentiate incoming calls to our server. Thankfully, with Twilio we can add custom parameters to the Twilio Media Stream. Head over to the TwiML that your application returns and let’s add a custom parameter to hold the caller’s phone number. You could also add parameters for the caller’s name or other information that you may have collected.
index.js
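As a rough sketch, the TwiML response with a custom parameter might be built like this. The parameter name number, the spoken message, and the helpers buildStreamTwiml and escapeXml are assumptions made for illustration and are not part of any Twilio library. The Parameter element nested inside Stream is the TwiML mechanism for attaching custom parameters to a Media Stream.

```javascript
// Escape characters that are not safe inside an XML attribute value.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build TwiML that starts a Media Stream and tags it with the caller's number.
// `host` is the public hostname of the websocket server (for example an ngrok domain).
// The parameter name "number" is an assumption made for this sketch.
function buildStreamTwiml(host, callerNumber) {
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    "<Response>",
    "  <Start>",
    `    <Stream url="wss://${escapeXml(host)}/">`,
    `      <Parameter name="number" value="${escapeXml(callerNumber)}" />`,
    "    </Stream>",
    "  </Start>",
    "  <Say>This call is being transcribed.</Say>",
    '  <Pause length="60" />',
    "</Response>",
  ].join("\n");
}

// In an Express route (with a urlencoded body parser enabled) this could be used as:
//   app.post("/", (req, res) => {
//     res.type("text/xml").send(buildStreamTwiml(req.headers.host, req.body.From));
//   });
```

Twilio passes custom parameters back to the websocket server inside the start message, under start.customParameters.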
Tracking on-going calls
Every time we receive a new phone call to our number, Twilio will establish a new websocket connection. We need to keep tabs on all of these calls and their respective transcription responses. Let’s create a global variable to hold our active calls. I have placed mine just before the websocket on connection event listener.
index.js
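A minimal sketch of that variable, assuming each call is stored as a plain object. The field names streamSid and number are assumptions chosen for illustration.

```javascript
// One entry per in-progress call, for example:
//   { streamSid: "MZ...", number: "+15551234567" }
// Declared at module scope so every websocket connection handler shares it.
const activeCalls = [];
```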
Whenever a new audio stream from Twilio starts, we want to add the new call’s details to this array. Let’s head over to the start case in our switch statement. We’ll make a few changes.
First, we will attach the streamSid to this websocket client as a property. This will be important when we are ending the call. Next we’ll add the streamSid to the information that we send out to browser clients and we will also push the information about the new call to our active calls array. Finally, just to keep track, we will log how many active calls we currently have to the console.
index.js
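The changes to the start case could look roughly like the sketch below. It is written as a standalone function so the logic is easy to follow. handleStart and transcriptMessage are hypothetical helper names, the event name interim-transcription is assumed from the basic application, and number is the assumed name of the custom parameter.

```javascript
// Handle Twilio's "start" message for a newly connected media stream.
// `ws` is the websocket connection Twilio opened for this call and
// `activeCalls` is the shared array of in-progress calls.
function handleStart(ws, msg, activeCalls) {
  const { streamSid, customParameters = {} } = msg.start;

  // Remember which stream this connection carries; needed again when the call ends.
  ws.streamSid = streamSid;

  // "number" is the assumed name of the custom parameter set in the TwiML.
  const call = { streamSid, number: customParameters.number };
  activeCalls.push(call);

  console.log(`There are ${activeCalls.length} active calls`);
  return call;
}

// Shape of a transcript message sent to browsers, now tagged with its stream.
function transcriptMessage(streamSid, text) {
  return JSON.stringify({ stream: streamSid, event: "interim-transcription", text });
}
```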
We still have a problem. We are sending the transcripts from every call to every client connected to our websocket server. In order to fix that we’ll only send transcription data to the clients that have subscribed to this particular media stream. We will handle subscriptions in the next step.
index.js
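One way to express that filter is sketched below. sendTranscript is a hypothetical helper, and the message shape is an assumption. The subscribedStream property is the one our server sets when a browser subscribes to a call.

```javascript
const OPEN = 1; // numeric value of WebSocket.OPEN

// Send a transcript only to clients subscribed to the stream it came from.
// `clients` is any iterable of websocket connections, such as `wss.clients`.
function sendTranscript(clients, streamSid, text) {
  const payload = JSON.stringify({
    stream: streamSid,
    event: "interim-transcription",
    text,
  });

  let delivered = 0;
  for (const client of clients) {
    if (client.readyState === OPEN && client.subscribedStream === streamSid) {
      client.send(payload);
      delivered += 1;
    }
  }
  return delivered; // number of clients that received the transcript
}
```

Clients that have not subscribed to anything have no subscribedStream property, so they never match and receive no transcripts.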
Subscribing from the browser
Now we need to add functionality to our web page to allow browser clients to see all the active calls, subscribe to a call and then display the transcript from that call.
First let’s restructure our index.html file. We have modified the contents to also include a div to hold a list for all the active calls as well as the transcription text we had before.
index.html
We need to populate this list with active calls. First, we need to have our server broadcast the list of active calls out to all the connected browser clients. Let’s head back over to our index.js file and add the following lines of code.
index.js
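A sketch of such a broadcast is shown below. broadcastActiveCalls is a hypothetical helper, and the payload field activeCalls is an assumption. The event name updateCalls is the one our browser script listens for.

```javascript
// Tell every connected client which calls are currently in progress.
// `clients` is any iterable of websocket connections, such as `wss.clients`.
function broadcastActiveCalls(clients, activeCalls) {
  const payload = JSON.stringify({ event: "updateCalls", activeCalls });
  for (const client of clients) {
    // 1 is the numeric value of WebSocket.OPEN; skip sockets that are not ready.
    if (client.readyState === 1) {
      client.send(payload);
    }
  }
  return payload;
}
```

A natural place to call it is right after a call has been pushed onto the active calls array in the start case.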
Back in our HTML, let’s write a script that populates our ‘active calls’ list whenever the updateCalls event is emitted from our websocket server.
index.html
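The script could be sketched as below. renderActiveCalls is a hypothetical helper, the element id calls is an assumption, and the message is assumed to carry an activeCalls array of objects with number and streamSid fields. The document and the list element are passed in as arguments so the function does not depend on page globals.

```javascript
// Rebuild the list of call buttons from an "updateCalls" message.
// `doc` is the page's `document` and `listEl` is the element that holds the buttons.
function renderActiveCalls(doc, listEl, activeCalls) {
  const buttons = activeCalls.map((call) => {
    const button = doc.createElement("button");
    button.textContent = call.number; // label the button with the caller's number
    button.value = call.streamSid;    // keep the stream id for use on click
    return button;
  });

  // Drop buttons from the previous update and show the current calls.
  listEl.replaceChildren(...buttons);
  return buttons;
}

// In the page this could be wired to the websocket roughly as:
//   socket.onmessage = (event) => {
//     const data = JSON.parse(event.data);
//     if (data.event === "updateCalls") {
//       renderActiveCalls(document, document.getElementById("calls"), data.activeCalls);
//     }
//   };
```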
Let’s pause for a moment and run a test. Save all the files, restart your web server and navigate with your browser to ‘http://localhost:8080’. Give your Twilio number a call. You should see a new button appear with your phone number as a label.
If you try clicking on the button, nothing happens. Let’s fix that. Next we’ll add some code that will send a ‘subscribe’ message to our server whenever a call button is clicked. Add the following lines of code to your HTML file.
index.html
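A sketch of the click handling is shown below. subscribeToCall and wireCallButton are hypothetical helpers, and the streamSid field name in the message is an assumption that must match whatever the server reads.

```javascript
// Ask the server to send this browser the transcript of one call only.
// `socket` is the browser's WebSocket connection to our server.
function subscribeToCall(socket, streamSid) {
  socket.send(JSON.stringify({ event: "subscribe", streamSid }));
}

// Make a call button subscribe to its stream when clicked.
function wireCallButton(button, socket, streamSid) {
  button.addEventListener("click", () => subscribeToCall(socket, streamSid));
}
```

Clearing the previous transcript text when the user switches calls is a sensible addition, so words from two callers do not run together on the page.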
Let’s go back to our server and add code to handle these subscriptions. Back in our switch statement, we’ll add a new case for incoming websocket messages with the ‘subscribe’ event. We will set the client’s subscribedStream property to the streamSid we received in its message. Now that client will only receive transcript data from the stream it is subscribed to.
index.js
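The new case might look like the sketch below, shown inside a cut-down message handler. handleMessage is a hypothetical name, and the streamSid field is assumed to match what the browser sends.

```javascript
// Route one incoming websocket message; only the "subscribe" case is shown.
// `ws` is the connection the message arrived on.
function handleMessage(ws, rawMessage) {
  const msg = JSON.parse(rawMessage);

  switch (msg.event) {
    case "subscribe":
      // From now on this client only receives transcripts for this stream.
      ws.subscribedStream = msg.streamSid;
      console.log(`Client subscribed to ${msg.streamSid}`);
      break;
    default:
      // "connected", "start", "media" and "stop" are handled by their own cases.
      break;
  }
}
```

Subscribing again simply overwrites the property, which is what lets a browser switch from one call to another.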
Let’s test it out! Restart your server, head back to the browser and refresh the page. Now give your Twilio number a ring and click on the button for your call. Start talking! You should see the words start to appear.
Ending Calls
One more loose end we need to tie up is to remove calls from the active calls list when calls have ended. Let’s go back to our switch statement and edit the stop case. We’ll search through the array of activeCalls to find the index that matches the streamSid of the call that has just ended. Once we find it we’ll splice it out of the array and then send an updated active calls list to all the connected clients.
index.js
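The removal step could be sketched as follows. handleStop is a hypothetical helper, and it relies on the streamSid property that was attached to the websocket connection when the stream started.

```javascript
// Remove the call carried by this connection from the list of active calls.
// Returns the removed entry, or null if the stream was not being tracked.
function handleStop(ws, activeCalls) {
  const index = activeCalls.findIndex((call) => call.streamSid === ws.streamSid);
  if (index === -1) {
    return null; // unknown stream, nothing to remove
  }

  const [ended] = activeCalls.splice(index, 1);
  console.log(`Call ended. There are ${activeCalls.length} active calls`);
  return ended;
}
```

After the splice, the server sends the shortened list to the connected clients with another updateCalls message, so the ended call's button disappears from every browser.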
One last test, bring a friend
Now for the last test, we’ll call our Twilio number from multiple phones simultaneously. If you have any willing friends or colleagues, ask them to call your Twilio number and all speak at the same time. You should see multiple phone numbers appearing and you can switch between their transcriptions by clicking on their phone numbers. As they hang up you should see the active calls disappear.
Wrapping up
Congratulations! You can now harness the power of Twilio Media Streams to extend your voice applications. This could be useful in a call center where multiple agents need to see the transcribed text from the calls that they are on. Now that you have live transcription, try translating the text with Google’s Translate API to create live speech translation, or run sentiment analysis on the audio stream to work out the emotions behind the speech.
If you have any questions, feedback or just want to show me what you build, feel free to reach out to me:
- Twitter: @chatterboxcoder
- GitHub: nokenwa
- Email: nokenwa@twilio.com