Transcribe Your Voicemails with Python, Flask, and Twilio
Voice-to-text (VTT), or speech recognition, is a relatively new feature of many business software systems. It automatically transcribes spoken words and enters them into a system, turning raw speech into data. The business can then act on this data in any number of ways: storage and analysis, automatic responses, or, as we’re going to do here, sending the transcribed messages out via SMS.
In this tutorial we’ll set up a voicemail phone line, where each incoming call is recorded and transcribed. The transcriptions will be sent by SMS to the number of your choice.
By the end of this tutorial you’ll be able to:
- Set up a free Twilio Account
- Set up a phone number linked to a Python application that records voicemails and sends their transcriptions by SMS
- Start working with TwiML markup and the Twilio suite of APIs
Requirements:
- Python 3.6 or later - if your operating system doesn’t have a Python interpreter, you can go to python.org to download an installer
- A text editor or IDE - I prefer Visual Studio Code, which is super lightweight and has a ton of great plugins, but there’s also Atom, Notepad++, and lots more
- A Twilio Account - If you are new to Twilio, create a free account now. Use the link to register and you will get a $10 credit to start using their services. You can also review the features and limitations of a free Twilio Account.
- Ngrok - We’ll use this convenient little application to expose our local web server to the Internet and generate a URL for our Twilio webhook to call. If you don’t have ngrok, you can download a copy for Windows, macOS, or Linux
Set up your Twilio account
After you set up your free Twilio account using the link in the Requirements section, you can access your Twilio Console and provision a phone number. Click the “Get a Trial Number” button and follow the prompts to choose and activate your new test number.
If you already have a Twilio account, you can use a number that you already have, or otherwise provision a new one by navigating to the “Phone Numbers” section of the Console. Click the ellipsis on the left side and then click “Phone Numbers” to change your number or create a new one:
Your Twilio dashboard should now show a phone number attached to your account:
Create a Python virtual environment
Now let’s create a Python virtual environment for our project. Here we’ll install the Flask framework and the Twilio helper library.
For Linux/macOS:
For Windows/PowerShell:
Record an incoming call
Before we can get started, you’ll need your Twilio Account SID and Auth Token from the Twilio Console:
You should store them securely as environment variables before you proceed. This is an important step as the code below relies on authenticating to the Twilio REST API via these credentials. Once you have the credentials stored in your TWILIO_ACCOUNT_SID and TWILIO_AUTH_TOKEN environment variables, you’re ready to get started.
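As a quick check that both variables are visible to Python, a small helper along these lines can fail early with a readable message. This is a minimal sketch; the function name load_twilio_credentials and the error wording are assumptions made for illustration, not part of the Twilio library.

```python
import os

REQUIRED_VARIABLES = ("TWILIO_ACCOUNT_SID", "TWILIO_AUTH_TOKEN")


def load_twilio_credentials(env=None):
    """Return (account_sid, auth_token) read from environment variables.

    `env` defaults to os.environ; passing a plain dict makes the helper
    easy to try out without touching the real environment.
    """
    env = os.environ if env is None else env
    # Collect every missing or empty variable so the error names all of them.
    missing = [name for name in REQUIRED_VARIABLES if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env["TWILIO_ACCOUNT_SID"], env["TWILIO_AUTH_TOKEN"]
```

Calling load_twilio_credentials() with no arguments reads the real environment, so it can be run once before starting the server.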
The Twilio Programmable Voice service will be configured to notify our application when an incoming call is placed, and the application will respond to the call with a greeting and then record a message. We will be using the Twilio Markup Language (TwiML), an XML-based language, to tell Twilio how to handle the incoming call. Doing so is very simple using the Flask framework. Put the following code in a file named record_incoming_voice.py:
The above is a simple yet complete Flask server. We import our libraries, set our application and credential variables, and define a Flask view, /record, that takes a POST call. Within that view we have a function, record(), that creates a TwiML <Response> object using the VoiceResponse() helper class from the Twilio library. We then use the TwiML verbs say, record, and hangup to control the call flow.
You’ll notice we have a conditional statement to parse the request to our view. That’s because Twilio will invoke our endpoint twice. The first invocation occurs when a call is received. At this point we return a TwiML response that plays a greeting via text-to-speech and then records a message. Twilio will invoke the endpoint a second time when the recording ends, at which point we just hang up the call. We’re looking at the contents of request.form to determine if we are being invoked at the start or end of a call. The RecordingUrl field contains a URL for the recorded call, and is available only on the second invocation. See the docs here for more information.
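The two-invocation flow described above can be sketched as follows. This is a reconstruction under stated assumptions rather than the exact listing: the greeting text, the recording_finished helper, and the create_app factory layout are illustrative choices, and transcribe=True is assumed because the later steps read transcriptions back from the API.

```python
import os


def recording_finished(form):
    """Return True when Twilio is calling back after the recording ended.

    Twilio includes a RecordingUrl field only on that second request.
    """
    return "RecordingUrl" in form


def create_app():
    # Third-party imports live inside the factory so the helper above can be
    # imported and tested without Flask or the Twilio library installed.
    from flask import Flask, request
    from twilio.twiml.voice_response import VoiceResponse

    app = Flask(__name__)

    # Credentials are read from the environment. Building TwiML does not need
    # them, but calls to the Twilio REST API later on do.
    app.config["TWILIO_ACCOUNT_SID"] = os.environ.get("TWILIO_ACCOUNT_SID")
    app.config["TWILIO_AUTH_TOKEN"] = os.environ.get("TWILIO_AUTH_TOKEN")

    @app.route("/record", methods=["POST"])
    def record():
        response = VoiceResponse()
        if recording_finished(request.form):
            # Second invocation: the message has been recorded, so hang up.
            response.hangup()
        else:
            # First invocation: greet the caller, then record the message.
            response.say("Hello. Please leave a message after the beep.")
            response.record(transcribe=True)
        return str(response)

    return app
```

Saved as record_incoming_voice.py, this sketch could be served with flask run after setting FLASK_APP=record_incoming_voice, since the Flask command-line runner recognizes a factory named create_app.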
You can execute the application from within your IDE (if it is linked to a Python interpreter), and see something like this:
Or if you prefer you can run the Python file from the command line after activating your virtual environment:
Now open another command prompt or terminal window and activate ngrok as follows:
If you are successful, you should see a window like below:
We now have our application running a development Flask server on port 5000 of the local system and ngrok exposing that port to the Internet. We can quickly test that everything is working by navigating to the forwarding link that starts with https:// in a web browser:
You should get some version of the above 404/Not Found error. You can also see in the ngrok terminal window that a request came in, but we didn’t have a route set up, so the 404 error was returned.
This happens because our Flask application does not have any views associated with the root URL (our only view is mapped to the /record URL), so getting the Not Found error here is the expected result and just means that you have configured everything correctly.
Let’s tell Twilio about our endpoint. Navigate to your Twilio Console and hit the ellipsis button on the left side:
Scroll to your “Phone Numbers” and select your number:
Scroll down the page to the “Voice and Fax” section and paste the link ngrok generated under the field “A Call Comes In”. Make sure to append “/record” to the URL so the call can be routed to our endpoint. For example the full URL would look like: https://122fd757.ngrok.io/record.
Click “Save” to store this change.
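If you restart ngrok often, building the webhook URL in code avoids a missing or doubled slash. The sketch below uses only the standard library; the function name webhook_url is hypothetical, and the forwarding URL is the example one from the text above.

```python
from urllib.parse import urljoin, urlsplit


def webhook_url(forwarding_url, path="/record"):
    """Join an ngrok forwarding URL and a Flask route into a webhook URL."""
    parts = urlsplit(forwarding_url)
    # Reject values that are not full web URLs, such as a bare host name.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError("expected a full forwarding URL, got %r" % forwarding_url)
    # urljoin handles a trailing slash on the forwarding URL correctly.
    return urljoin(forwarding_url, path)


print(webhook_url("https://122fd757.ngrok.io"))  # prints https://122fd757.ngrok.io/record
```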
We’re now ready to run an initial test on our app. Grab your smartphone and call your Twilio phone number. You’ll hear a robotic voice ask you to leave a message after the beep. Do so!
You should also see the call come in on both ngrok and within your application. If you get a status code other than 200, you’ve got an error somewhere. Check your webhook link and use the Debug Console to figure out where the break is. You’ll know you’re good when you see this:
Retrieve message transcriptions from the Twilio REST API
Next, let’s write a function to retrieve our message from the API. You can put the code below in a file named message.py:
Here we create a Twilio client object, then call the transcriptions list method to return the most recent transcription (limit=1). Then we get the SID identifier of the transcription, pass it to the transcriptions.fetch() method, and print the transcription_text string.
Note that the TWILIO_ACCOUNT_SID and TWILIO_AUTH_TOKEN environment variables must be set for the Twilio client to authenticate with the service.
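The retrieval steps described above might look like the sketch below. It is written under assumptions: the function names latest_transcription_text and main are illustrative, and the client is passed in as a parameter so the logic can be tried with a stand-in object. The Twilio calls used are transcriptions.list(limit=1), transcriptions(sid).fetch(), and the transcription_text attribute.

```python
import os


def latest_transcription_text(client):
    """Return the text of the most recent transcription, or None if there is none.

    `client` is expected to behave like twilio.rest.Client.
    """
    transcriptions = client.transcriptions.list(limit=1)
    if not transcriptions:
        return None
    # Fetch the full transcription by its SID and read the text from it.
    sid = transcriptions[0].sid
    transcription = client.transcriptions(sid).fetch()
    return transcription.transcription_text


def main():
    # Imported here so the function above can be exercised with a stand-in
    # client when the Twilio library is not installed.
    from twilio.rest import Client

    client = Client(os.environ["TWILIO_ACCOUNT_SID"], os.environ["TWILIO_AUTH_TOKEN"])
    print(latest_transcription_text(client))
```

Calling main() after leaving a voicemail prints the transcription, or None if nothing has been transcribed yet.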
Calling the function, whether in your IDE or via the command line, should show the transcription for the message you left:
Disclaimer: Voice to Text, while it has come a long way in recent years, can still get parts of the message wrong depending on how clearly you speak, your dialect, background noise, etc. Your mileage may vary, but it worked pretty well for me.
Putting it all together
Now we’ve got our Flask server and a message retrieval system for the transcript. Let’s put it all together and send the transcriptions of the recordings that we receive as SMS messages to ourselves.
First we need to add a few things to our initial view and then create a new one and extract our message function to it. Below you can see the updated version of our record_incoming_voice.py file:
There are a few things going on, so let’s break it down:
- We’ve given the response object’s record verb a new TwiML attribute: transcribe_callback. We also commented out transcribe because it becomes implied when you pass an argument to transcribe_callback
- The transcribe_callback attribute is set to the newly created /message view. Now our server will receive an asynchronous POST call from Twilio’s API when the transcription is ready, and we can activate the retrieval process
- We have incorporated our message retrieval code from earlier into the server, with a tweak at the end where we create an SMS message object and pass it the retrieved text. Make sure you enter your Twilio and personal phone numbers in the from_ and to arguments of that final call to send the SMS. Use the E.164 format for all phone numbers.
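The changes listed above can be sketched like this. In the /record view, the record verb would become response.record(transcribe_callback="/message"). The rest is a hedged outline of the /message side: the names is_e164, send_latest_transcription, and register_message_view are hypothetical, the regular expression is a common approximation of E.164 (a plus sign followed by up to 15 digits), and the client and Flask app are passed in as parameters.

```python
import re

# A plus sign, a non-zero leading digit, and at most 15 digits in total.
E164 = re.compile(r"\+[1-9][0-9]{1,14}")


def is_e164(number):
    """Return True for numbers written like +15551234567."""
    return E164.fullmatch(number) is not None


def send_latest_transcription(client, from_number, to_number):
    """Text the most recent transcription from `from_number` to `to_number`.

    `client` is expected to behave like twilio.rest.Client. Returns the
    created message, or None when there is no transcription text to send.
    """
    for number in (from_number, to_number):
        if not is_e164(number):
            raise ValueError("not an E.164 phone number: %r" % number)
    transcriptions = client.transcriptions.list(limit=1)
    if not transcriptions:
        return None
    text = client.transcriptions(transcriptions[0].sid).fetch().transcription_text
    if not text:
        return None
    return client.messages.create(body=text, from_=from_number, to=to_number)


def register_message_view(app, client, from_number, to_number):
    """Attach the /message view that Twilio calls when a transcription is ready."""

    @app.route("/message", methods=["POST"])
    def message():
        send_latest_transcription(client, from_number, to_number)
        # No TwiML is needed in reply to this callback, so return an empty body.
        return "", 204

    return app
```

Validating the numbers before calling the API is a design choice in this sketch: it turns a malformed from_ or to value into an immediate, readable error.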
Run the server once again and make sure ngrok is still running. If for any reason you need to restart ngrok, keep in mind that it will assign a different URL, so you will need to go back to the Twilio Console and update your webhook.
If everything goes right, you will now receive a text message with the transcription whenever someone leaves a message on your Twilio phone line. Try it out!
Success!
Wrap up and further resources
Be aware that there are legal implications to recording someone’s voice. Make sure you take that into account before moving to any production scenario.
I hope you had as much fun making this as I did. We really only scratched the surface of what the Twilio Voice API is capable of. You can retrieve multiple transcriptions at once, write more views that take parameters and respond to incoming calls and messages, define special logic and call handling, or just gather general analytics about traffic as you send recordings out. Whatever you do, working with a robust suite of well-documented APIs enables rapid, performant application development. Head over to the docs and start building!
James Putterman is an Integration Data Architect and Full Stack Developer in Kansas City. Reach out on LinkedIn to connect!