How to Transcribe a Voice Message Using Twilio, Python, and Flask

April 28, 2021
Written by
Diane Phan
Twilion

In this tutorial you’ll leverage Twilio Programmable Voice to receive phone calls at your Twilio phone number, and transcribe any voice messages left by the caller. This guide can be used as a foundation to build your own voicemail system.

Prerequisites

To get started with this tutorial, you’ll need the following:

Project setup

In this section you are going to set up a brand new Flask project. To keep things nicely organized, open a terminal or command prompt, find a suitable place and create a new directory where the project you are about to create will live:

mkdir python-flask-transcription 
cd python-flask-transcription

Create a virtual environment

Following Python best practices, you are now going to create a virtual environment, where you are going to install the Python dependencies needed for this project.

If you are using a Unix or Mac OS system, open a terminal and enter the following commands to create and activate your virtual environment:

python3 -m venv venv
source venv/bin/activate

If you are following the tutorial on Windows, enter the following commands in a command prompt window:

python -m venv venv
venv\Scripts\activate

Now you are ready to install the Python dependencies used by this project:

pip install flask twilio pyngrok python-dotenv

The four Python packages that are needed by this project are:

  • The Flask framework, to create the web application that will receive incoming call webhooks from Twilio.
  • The Twilio Python Helper library, to generate TwiML and work with the Twilio REST API.
  • Pyngrok, to make the Flask application temporarily accessible on the Internet for testing via the ngrok utility.
  • The python-dotenv package, to read a configuration file.

Set up a development Flask server

Make sure that you are currently in the virtual environment of your project’s directory in the terminal or command prompt. Since we will be utilizing Flask throughout the project, we will need to set up the development server. Add a .flaskenv file (make sure you have the leading dot) to your project with the following lines:

FLASK_APP=app.py
FLASK_ENV=development

These incredibly helpful lines will save you time when it comes to testing and debugging your project.

  • FLASK_APP tells the Flask framework where our application is located.
  • FLASK_ENV configures Flask to run in debug mode. (FLASK_ENV was removed in Flask 2.3; on newer versions, set FLASK_DEBUG=1 instead.)

Run the command flask run in your terminal to start the Flask development server.

screenshot showing app running in terminal

The screenshot above displays what your console will look like after running the command flask run. The service is running privately on your computer’s port 5000 and will wait for incoming connections there. You will also notice that debugging mode is active. When in this mode, the Flask server will automatically restart to incorporate any further changes you make to the source code.

However, since you don't have an app.py file yet, there is no application for the server to run. Even so, getting this far is a good indicator that everything is installed properly.

Feel free to have Flask running in the background as you explore the code. We will be testing the entire project at the end.

Authenticate against Twilio Services

We need to safely store some important credentials that will be used to authenticate against the Twilio services.

Create a file named .env in your working directory and paste the following text:

TWILIO_ACCOUNT_SID=<YOUR_TWILIO_ACCOUNT_SID>
TWILIO_AUTH_TOKEN=<YOUR_TWILIO_AUTH_TOKEN>

Look for the Account SID and Auth Token values on the Twilio Console and add them to the .env file in place of the placeholders.

Twilio Account Credentials
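
Before going further, it can help to confirm that both values are actually visible to Python. The sketch below is one way to check. The helper name missing_credentials is hypothetical and not part of the Twilio library or python-dotenv, and it assumes the variable names used in the .env file above.

```python
import os

REQUIRED = ("TWILIO_ACCOUNT_SID", "TWILIO_AUTH_TOKEN")


def missing_credentials(env=None):
    """Return the required variable names that are unset or still placeholders.

    Hypothetical helper for illustration; not part of Twilio or python-dotenv.
    """
    env = os.environ if env is None else env
    missing = []
    for name in REQUIRED:
        value = env.get(name, "").strip()
        # Treat an empty value or an unedited <PLACEHOLDER> as missing.
        if not value or (value.startswith("<") and value.endswith(">")):
            missing.append(name)
    return missing


if __name__ == "__main__":
    # In the project you would call load_dotenv() first so .env is read.
    print(missing_credentials() or "Both credentials are set.")
```

An empty list means both variables are present and no longer contain the placeholder text.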

Start an ngrok tunnel

The problem with the Flask web server is that it is local, which means that it cannot be accessed over the Internet. Twilio needs to send web requests to this server, so during development, a trick is necessary to make the local server available on the Internet.

On a second terminal window, activate the virtual environment and then run the following command:

ngrok http 5000

The ngrok screen should look as follows:

ngrok

While ngrok is running, you can access the application from anywhere in the world using the temporary forwarding URL shown in the output of the command. All web requests that arrive into the ngrok URL will be forwarded to the Flask application by ngrok.

Record an incoming call

Twilio uses the concept of webhooks to handle any incoming calls to your Twilio phone number.

Create a file named app.py and paste the following code:

from dotenv import load_dotenv
from flask import Flask, request
from twilio.twiml.voice_response import VoiceResponse

# Read the variables in the .env file into the environment.
load_dotenv()

app = Flask(__name__)


@app.route("/record", methods=["POST"])
def record():
    response = VoiceResponse()

    if 'RecordingSid' not in request.form:
        # No recording yet: greet the caller and start recording.
        response.say("Hello, please leave your message after the tone.")
        response.record(transcribe=True)
    else:
        # A request that includes RecordingSid means the recording is done.
        print("Hanging up... ")
        response.hangup()

    return str(response)


if __name__ == "__main__":
    app.run()

The record() function defines a response object using the Twilio library's VoiceResponse() helper class. There are a few TwiML verbs that are referenced in the code such as say, record, and hangup in order to control the call flow.

You'll learn more about the record verb in the next section.

The TwiML <Record> verb

Before I dive into the TwiML <Record> verb it’s important to mention that recording phone calls or voice messages has a variety of legal considerations and you must ensure that you’re adhering to local, state, and federal laws when recording anything.

The code above first creates a new variable called response that holds a reference to a new TwiML Voice Response object.

TwiML, which stands for Twilio Markup Language, is XML that has special tags defined by Twilio. You can use TwiML to tell Twilio how to handle an incoming phone call or SMS. Instead of writing XML, you can also write TwiML programmatically, which is what you’re doing in this function.
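
To make the XML concrete, here is a rough sketch of the document that record() returns on the first request, built with Python's standard xml.etree module rather than the Twilio helper library. The element and attribute names follow the verbs used above. The helper library's exact output (for example, an XML declaration or whitespace) may differ.

```python
import xml.etree.ElementTree as ET


def build_record_twiml():
    """Build the TwiML for the greeting-and-record step by hand."""
    root = ET.Element("Response")
    say = ET.SubElement(root, "Say")
    say.text = "Hello, please leave your message after the tone."
    # transcribe="true" asks Twilio to transcribe the recording.
    ET.SubElement(root, "Record", {"transcribe": "true"})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_record_twiml())
```

In the app itself, the VoiceResponse class takes care of this, so you never have to assemble the tags manually.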

The <Record> verb will create an audio recording of anything the caller says after the call connects, and it can be modified with a number of different attributes. The attributes most relevant for this tutorial are transcribe and transcribeCallback.

transcribe is an optional attribute that, when included and set to true, will tell Twilio to create a speech-to-text transcription of any message left by the caller, with the caveat that the message has to be between 2 and 120 seconds in length. This means that some very short messages and very long messages will not be transcribed, though the actual audio recordings of the message will not be impacted.
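
As a small illustration of that length limit, the hypothetical helper below checks whether a recording's duration falls inside the window described above. Whether the exact boundary values of 2 and 120 seconds are themselves accepted is an assumption here, so check Twilio's documentation before relying on the edges.

```python
MIN_SECONDS = 2
MAX_SECONDS = 120


def within_transcription_window(duration_seconds):
    """Return True if a recording of this length falls in the 2 to 120 second window.

    Hypothetical helper; treating both bounds as inclusive is an assumption.
    """
    return MIN_SECONDS <= duration_seconds <= MAX_SECONDS


if __name__ == "__main__":
    for seconds in (1, 30, 300):
        print(seconds, within_transcription_window(seconds))
```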

The content of the transcription will be stored by Twilio for you, and can be accessed via the transcription API.

Alternatively, you can provide a transcription callback to the <Record> verb that will execute when the transcription is finished. In this callback, you can access the contents of the transcription and perform an action on it, like save it to a database or print it to a webpage.

If you use the transcribeCallback attribute, you don’t also need to set transcribe to true.
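
For comparison, the core of a transcription callback could look roughly like the sketch below. It is written as a plain function over the posted form fields so it runs without Flask. In the app you would call it from a POST route with request.form, and pass that route's URL as transcribe_callback to response.record(). The function name summarize_transcription is hypothetical. TranscriptionStatus, TranscriptionText, and RecordingSid are fields Twilio includes in the callback request.

```python
def summarize_transcription(form):
    """Turn the fields Twilio posts to a transcribeCallback URL into one line.

    `form` is any mapping of field names to strings, such as Flask's
    request.form. Hypothetical helper for illustration.
    """
    status = form.get("TranscriptionStatus", "unknown")
    recording_sid = form.get("RecordingSid", "unknown recording")
    if status != "completed":
        # Anything other than "completed" carries no usable text.
        return f"{recording_sid}: transcription {status}"
    text = form.get("TranscriptionText", "").strip()
    return f"{recording_sid}: {text}"


if __name__ == "__main__":
    # Sample payload with a made-up recording SID.
    sample = {
        "TranscriptionStatus": "completed",
        "TranscriptionText": "Hi, call me back.",
        "RecordingSid": "RExxxxxxxx",
    }
    print(summarize_transcription(sample))
```

From here, the returned line could be saved to a database or logged instead of printed.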

This tutorial uses the first approach: a short script that fetches the stored transcription through the API.

Add the transcription function

Create a new file named transcribe.py and paste the following code:

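from dotenv import load_dotenv
from twilio.rest import Client

# Read TWILIO_ACCOUNT_SID and TWILIO_AUTH_TOKEN from the .env file.
load_dotenv()


def message():
    # With no arguments, Client() reads the credentials from the environment.
    client = Client()

    # Request a single transcription; the tutorial relies on the newest
    # one being listed first.
    transcription = client.transcriptions.list(limit=1)
    if not transcription:
        print("No transcriptions found yet.")
        return None

    sid = transcription[0].sid
    t = client.transcriptions(sid).fetch()
    print(t.transcription_text)
    return str(sid)


if __name__ == '__main__':
    message()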

In this file, you create a Twilio client object in order to fetch the transcription of the phone call. The client asks the Twilio REST API for the list of transcriptions, limited to the most recent one, and stores the result in the transcription variable. Then you read the sid of that entry and use it to fetch the transcription text of the voicemail.

Configure the webhook for your Twilio phone number

Make sure the Flask server and ngrok are still running. You will need to set the ngrok URL as the voice webhook for your Twilio phone number before testing out the app in the next step.

Go to the Twilio Console and find the phone number you’re using for this tutorial in the list to open the configuration page for that number.

Scroll down until you see a section titled Voice & Fax.

Make the following adjustments to the information shown in this section:

  • For Accept Incoming, select Voice Calls
  • For Configure With, select Webhooks, TwiML Bins, Functions, Studio, or Proxy
  • For A Call Comes In, select Webhook

On the same line as A Call Comes In, paste the temporary ngrok URL with "/record" appended at the end. Remember to leave it as "HTTP POST". You can see an example below:

Screenshot showing webhook configuration for twilio phone number

After making these changes, click the Save button.
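
If you would rather build the webhook URL in code than by hand, for example to print it when the server starts, a small helper like the following would do. The function name and the example.ngrok.io address are placeholders; use the forwarding URL from your own ngrok output.

```python
def webhook_url(forwarding_url, path="/record"):
    """Join an ngrok forwarding URL and a route path with exactly one slash."""
    return forwarding_url.rstrip("/") + "/" + path.lstrip("/")


if __name__ == "__main__":
    # Placeholder address; substitute the URL ngrok prints for you.
    print(webhook_url("https://example.ngrok.io"))
```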

Test your app

Call your Twilio phone number from your personal phone. You’ll hear a beep after which you can speak into the phone and say a few words. Make sure you speak for at least a few seconds to ensure that there is enough content for the transcription to be triggered. After leaving your message, hang up the call.

On a third terminal window, activate the virtual environment and then run the following command:

python transcribe.py

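Your transcribed message should show up on the terminal. Transcription happens after the call ends, so if the text is missing, wait a few seconds and run the script again.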
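
Instead of rerunning the script by hand, one option is to poll a few times before giving up. The sketch below is hypothetical: fetch stands for any zero-argument callable that returns an object with status and transcription_text attributes, such as lambda: client.transcriptions(sid).fetch(), and the attempt count and delay are arbitrary choices.

```python
import time
from types import SimpleNamespace


def wait_for_text(fetch, attempts=5, delay=2.0):
    """Call fetch() until the transcription is completed or attempts run out.

    Returns the transcription text, or None if it failed or never completed.
    """
    for attempt in range(attempts):
        result = fetch()
        if result.status == "completed":
            return result.transcription_text
        if result.status == "failed":
            return None
        if attempt < attempts - 1:
            time.sleep(delay)  # Give Twilio time before asking again.
    return None


if __name__ == "__main__":
    # Stand-in for the Twilio client: in progress twice, then completed.
    replies = iter([
        SimpleNamespace(status="in-progress", transcription_text=None),
        SimpleNamespace(status="in-progress", transcription_text=None),
        SimpleNamespace(status="completed", transcription_text="Hello there"),
    ])
    print(wait_for_text(lambda: next(replies), delay=0))
```

Passing the fetch step in as a callable keeps the retry logic separate from the Twilio client, which also makes it easy to try out with stand-in objects as shown.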

Congratulations, now that you’ve learned how to record transcriptions, what will you do next?

Diane Phan is a Developer Network editor on the Developer Voices team. She loves to help programmers tackle difficult challenges that might prevent them from bringing their projects to life. She can be reached at dphan [at] twilio.com or LinkedIn.