How to control and record voice calls with Node.js serverless functions

May 13, 2019

Graphic showing Twilio call flow with serverless functions

There are many ways to communicate with your friends and family. You might use good old SMS, WhatsApp, email and, of course, phone calls. Recently, I came across a surprising new way of communicating. It all started with my friend Michaela asking a question on Twitter.

Would it be possible to generate an RSS feed out of Twilio voice calls?

What she was looking for was a way to transform phone calls into an RSS feed that references the recorded calls. Why? Because she listens to many podcasts and sends very long voice messages to one of her friends via WhatsApp. As these messages usually don’t need a timely response, they would both prefer that the voice messages appear in their podcatchers rather than in WhatsApp.

Let me introduce you to the idea of “PodBot - the surprise podcast from your friends”.

Podbot preview in a browser window

The idea is as follows: all you have to do to publish a new podcast episode is make a phone call. There is no need for a microphone setup, and you don’t need to record, store or upload audio files to a podcast platform.

Requirements for a voice call driven podcast site

To create a phone call driven podcast site you need several things.

You need a phone number that you can control programmatically. Luckily, Twilio Voice provides this exact functionality.

Additionally, you need a way to store the information and transcriptions of the incoming phone calls. Google Sheets or cloud databases can be accessed from Node.js and can therefore serve as a quick data store for Twilio serverless functions.

Moreover, you need a hosting provider that can serve your podcast site and feed.

Configuring programmable voice calls, downloading audio files and building a site with all the information is a lot of work. That’s why we split building PodBot and all its tasks into several tutorials.

In this first tutorial, we’ll use Twilio serverless functions to accept and manage phone calls, recordings, and voice transcriptions. In later tutorials, we’ll extend the functions to write to a data store, and we’ll use this data to build the podcast site using Gatsby, including the mandatory RSS podcast feed. Sound good? Let’s get started writing functions, then! 🎉

Here’s what you need today:

  • A Twilio account to buy a phone number and accept calls
  • Node.js and npm installed

Function-driven voice calls to the rescue

To define what happens when someone calls a Twilio phone number you have to provide some configuration in an XML-based format that is called TwiML. The Twilio API requests this configuration right at the moment a call comes in. You can serve TwiML via TwiML bins, your custom-tailored application via webhooks, or serverless functions.

Before we implement this functionality, let’s recap and think about what we need to build. When someone calls PodBot we need to gather the following episode information:

  • the title
  • the recording URL
  • the episode transcription

Luckily, Twilio provides ways to retrieve all this call information. Below you see the TwiML configuration to ask for the episode title, record it and get a transcript of the recording.

Graphic showing the call flow with serverless functions

Let’s go into detail. When someone calls your Twilio number (step 1), Twilio asks your serverless function for TwiML configuration (step 2). The returned TwiML tells Twilio to ask for the title of the episode and to record the answer (step 3). Information about the completed recording is sent to a /call-exit/ endpoint, and the response of /call-exit/ controls what happens next by responding with more TwiML. The initial TwiML also specifies that the generated transcript should be sent to /transcribe-title/.

After Twilio receives this initial TwiML configuration, PodBot speaks to the caller saying “Tell me the title of your episode.” (step 4). Then it waits and records the answer until five seconds of silence has passed. Magic? Magic!
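The initial TwiML described above would look roughly like the output of the following sketch. In this tutorial the XML is generated by Twilio’s helper library rather than written by hand, so the buildEntryTwiml function is only a hypothetical illustration, and the voice and language attributes are assumptions about how PodBot should sound.

```javascript
// Hand-written approximation of the TwiML for the first step of the call.
// buildEntryTwiml is a hypothetical helper used only for illustration.
function buildEntryTwiml() {
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<Response>',
    // Ask the caller for the episode title.
    '  <Say voice="woman" language="en-US">Tell me the title of your episode.</Say>',
    // Record the answer, stop after 5 seconds of silence, then request
    // /call-exit; send the transcript to /transcribe-title when it is ready.
    '  <Record action="/call-exit" timeout="5" transcribe="true" transcribeCallback="/transcribe-title" />',
    '</Response>'
  ].join('\n');
}

console.log(buildEntryTwiml());
```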

TwiML configurations like the one above can be chained together. This option makes it possible to ask for the episode title and record it followed by another action to end the phone call or to record more data like the episode itself.

Let’s set up the call handling in a new project. 🎉

To keep this tutorial crisp and short we’ll only record and transcribe the episode title. You can find a solution at the end of the article that you can tweak to your needs and run locally quickly.

The creation of serverless functions

Create a new directory for this project and also create three JavaScript files in the functions directory: call-enter.js, transcribe-title.js and call-exit.js.

$ mkdir podbot-functions
$ cd podbot-functions
$ mkdir functions
$ touch functions/call-enter.js functions/transcribe-title.js functions/call-exit.js

Each of these JavaScript files represents one HTTP endpoint. These endpoints have to respond with TwiML when the Twilio API asks for the configuration. To build a TwiML response you can use the Twilio JS client which is available globally in Twilio functions.

The serverless entry point of your phone calls

call-enter.js is the first endpoint Twilio requests when someone calls your number.

// File: /functions/call-enter.js
'use strict';

exports.handler = function(context, event, callback) {
 const response = new Twilio.twiml.VoiceResponse();

 // documentation for say
 // -> https://www.twilio.com/docs/voice/twiml/say
 response.say(
   { voice: 'woman', language: 'en-US' },
   'Welcome to PodBot. Tell me the title of your episode.'
 );

 // documentation for record
 // -> https://www.twilio.com/docs/voice/twiml/record
 response.record({
   action: '/call-exit',
   timeout: '5',
   transcribe: 'true',
   transcribeCallback: '/transcribe-title'
 });

 callback(null, response);
};

The serverless function defined above is called with context, event and callback. The context object provides information about the current execution environment, event contains the request parameters passed into your function, and callback is the function you call to respond to the request.

By calling the callback with null as the first argument, you signal that there were no errors. The second argument is a VoiceResponse which you can create by using the global Twilio object.
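To make the callback contract more concrete, here is a minimal sketch that runs in plain Node.js outside of Twilio. The names exampleHandler and invoke, as well as the plain object used as a response, are assumptions made for illustration; a real voice function would pass a VoiceResponse as the second argument.

```javascript
// Hypothetical handler with the same (context, event, callback) signature.
function exampleHandler(context, event, callback) {
  if (!event.CallSid) {
    // A non-null first argument signals that something went wrong.
    return callback(new Error('CallSid is missing'));
  }
  // null as the first argument signals success; the second is the response.
  return callback(null, { message: `Handling call ${event.CallSid}` });
}

// Minimal local invocation helper. It only captures the outcome of handlers
// that call the callback synchronously, which is enough for this sketch.
function invoke(handler, event) {
  let outcome = null;
  handler({}, event, (error, result) => {
    outcome = { error, result };
  });
  return outcome;
}

console.log(invoke(exampleHandler, { CallSid: 'CA123' }));
console.log(invoke(exampleHandler, {}).error.message);
```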

By defining the Record verb and its included action attribute, the second endpoint will be called after the caller is silent for five seconds.

Additionally, transcribeCallback defines the endpoint to retrieve the transcription when it’s ready.

Log the recording and say goodbye

After the caller gives the name of the episode and remains silent for 5 seconds, the next endpoint (/call-exit/) is called to request additional configuration and continue the phone call.

// File: /functions/call-exit.js
'use strict';

exports.handler = function(context, event, callback) {
 // do something with the data here
 console.log(event.CallSid, event.RecordingUrl);
 // CallSid: 'CA3ac5f19…'
 // RecordingUrl: 'https://api.twilio.com/2010-04-01/Accounts/ACa3.../Recordings/RE92…'

 const response = new Twilio.twiml.VoiceResponse();

 // documentation for say
 // -> https://www.twilio.com/docs/voice/twiml/say
 response.say({ voice: 'woman', language: 'en-US' }, 'Thanks');

 callback(null, response);
};

Above you see the first important part of logging phone calls in serverless functions. Using the event object, you can access the data Twilio included in the request. The CallSid is a unique identifier for the phone call. This identifier stays the same for a phone call across the three functions.

The recording URL is also accessible. To request the recording in MP3 format, append .mp3 to the RecordingUrl property of the event object.
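As a small sketch of that step, the helper below appends the extension only when it is not already present. The function name toMp3Url and the example URL are placeholders and not part of the tutorial’s code.

```javascript
// Hypothetical helper: turn a RecordingUrl into its MP3 variant.
function toMp3Url(recordingUrl) {
  // Avoid adding the suffix twice if the URL already points to the MP3.
  return recordingUrl.endsWith('.mp3') ? recordingUrl : `${recordingUrl}.mp3`;
}

// Placeholder URL for illustration; a real RecordingUrl comes from event.RecordingUrl.
console.log(toMp3Url('https://example.com/Recordings/RE123'));
```

Inside call-exit.js you could then log or store toMp3Url(event.RecordingUrl) instead of the raw URL.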

Right now this function only logs the information, but with CallSid and RecordingUrl available you can store this data in a database or other stores.

To finish the phone call, configure the VoiceResponse to say “Thanks”.

The transcript logging

The last function you need is transcribe-title.js. The /transcribe-title/ endpoint’s only job is to log the transcript of the episode title. It doesn’t have to provide any additional configuration. Call callback with null to signal that there were no errors and you’re good to go.

// File: /functions/transcribe-title.js
'use strict';

exports.handler = function(context, event, callback) {
 // do something with the data here
 console.log(event.CallSid, event.TranscriptionText);
 // CallSid: 'CA3ac5f19…'
 // TranscriptionText: 'Hello everybody I hope…'

 callback(null);
};
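Because the recording URL and the transcript arrive in two separate requests that share the same CallSid, one way to combine them later is to merge both pieces under that identifier. The following is only a sketch under stated assumptions: the Map, the upsertEpisode helper and the field names are hypothetical, and an in-memory Map is not expected to survive across serverless invocations, so a real setup would write to an external data store.

```javascript
// Illustration only: this Map lives in memory and stands in for a real data store.
const episodes = new Map();

// Merge a partial update into the record stored for the given call.
function upsertEpisode(callSid, fields) {
  const merged = { ...(episodes.get(callSid) || {}), ...fields };
  episodes.set(callSid, merged);
  return merged;
}

// Data as it might arrive in call-exit.js (placeholder values).
upsertEpisode('CA123', { recordingUrl: 'https://example.com/Recordings/RE456' });

// Data as it might arrive later in transcribe-title.js.
upsertEpisode('CA123', { title: 'Hello everybody' });

console.log(episodes.get('CA123'));
```

Keying everything by CallSid means the order in which the two requests arrive does not matter.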

At this point, you have three endpoints in place that can accept and control Twilio voice messages and log the recording URL and a transcript. With this setup, it’s time to test these with your first phone call.

Run Twilio functions locally

To run Twilio functions locally, you can use twilio-run. You can install the tool into your project, but thanks to npx, which comes with recent npm versions, all you have to do is run a single command in the root of your project directory.

$ npx twilio-run --live

This command downloads twilio-run if it’s not available in your environment and runs it. twilio-run looks for a functions directory, which you have already prepared. The --live flag makes sure that the functions aren’t cached when the local server is started, so you can edit your functions without restarting the command.

Terminal showing the local functions endpoints

After running the command, the local server is started to debug and test your functions. Unfortunately, your localhost is not accessible on the internet. That’s the reason why twilio-run comes with another nifty configuration. If you run it with the --ngrok flag, it automatically spins up a publicly available URL which tunnels all requests to your local machine.

$ npx twilio-run --live --ngrok

Terminal showing the live endpoints of Twilio run

The provided URLs are what you need to finally set up your call handling.

Connect your Twilio number with serverless functions

After you buy a number, you can set it up to use your local functions when a call comes in. On the configuration page for a particular number, you’ll find the setting for incoming calls. Select the webhook option for incoming calls and copy/paste the public URL for /call-enter/. Hit “save” and call your number.

Number configuration to call a local function when a call comes in

When you call your number and tell PodBot the title of the podcast episode you should see the CallSid, RecordingUrl, and TranscriptionText logged to your terminal.

Terminal showing the call logs including sid and transcript

Using twilio-run you can develop functions in your local environment right from the command line. If you’re happy with the functionality of your local functions, you can then move them to the functions area in your Twilio console and adjust your number to use your functions instead of webhooks.

Overview of pasted functions in Twilio console and set call handling in number configuration

After you move the functions to Twilio and adjust the incoming call handling, you have a voice bot running in the cloud that is ready to log information for further usage. At that point, you'll no longer require a local setup.

This tutorial was the first step of building PodBot. With these three functions, you’re able to manage and log phone calls using Twilio. If you want to play around with it, you can check out my podbot-functions repo on GitHub or have a look at the function docs.

If you want to start your own podcast business or just want to say “Hi” you can reach me under the following channels.