Promises, Promises: Building a Voice Scripting Toolkit in JavaScript
In this blog post, I'm going to teach you how to build sophisticated Twilio Programmable Voice apps, with the help of a JavaScript toolkit, incorporating Node.js and the Express web application framework.
If you've ever wanted to build an app to handle a Voice use case, beyond what you can easily do with Twilio Studio, but were daunted by the complexities of writing it from scratch, then this blog post is for you. You'll need some JavaScript and Node.js skills, and familiarity with TwiML and Twilio's Programmable Voice API.
I'm going to introduce you to the Twilio Programmable Voice Toolkit, which uses these three ingredients:
- Promises. Promises are a fundamental component of asynchronous programming with JavaScript, and are used in many APIs, including Twilio's own helper libraries. I will go quite deep into Promises, as a good understanding of how they work will help you build your own applications.
- The Express web application framework, running on Node.js.
- Ngrok, a service that provides publicly-accessible URLs that connect securely to a local agent running inside your firewall.
The dream that became a GitHub repo
Before we dive in, let me explain how we got here in the first place. Prior to Twilio Studio, there was often a lot of drudgery in building Twilio apps. The code would typically consist of a series of discrete functions to handle each webhook and status callback running in an application server. The developer would have to construct a state-event machine to thread them all together. Contrast this opaque and hard-to-debug approach with a Studio canvas: start at the top with the Trigger widget and work downwards, following the transitions between the widgets, until the call ends.
It's no exaggeration to say that every Twilio solutions engineer loves Studio. It's easy to build applications and demo them to customers, and if we need to do something that requires complex logic, we can offload that to a Twilio Function. Our customers love Studio, too, but sometimes they reach the limits of what Studio can easily support. Sometimes it's just better to turn a Studio flow into a web application.
This hit home hard one day when I was trying to dissect a complex Studio flow, writing it out as pseudo-code. This particular flow had nested loops, and made heavy use of the Liquid Template Language to manipulate variables, a task for which, frankly, it was not designed. If only, I thought, there was a way of combining the expressiveness and ease-of-use of Studio, on the one hand, with the power of a proper scripting language such as JavaScript, on the other.
As it turns out, there is. Let's go to GitHub and download it.
Getting started
As a prerequisite, you'll need to install Node.js (version 18 or later is required) and, optionally, the Git version control software.
Once you've done that, download and install the Twilio Programmable Voice Toolkit into your project folder as follows:
If you prefer not to use Git, you can instead download the package as a zip file and then run `npm install` in the project folder.
Next, if you don't already have it, install Ngrok and sign up for a free account. Ngrok is a service which provides you with a public URL for an application that runs behind your firewall. The URL is connected to your application over a secure tunnel, which terminates at the Ngrok agent. In my case the Ngrok agent is a stand-alone program running on my laptop, but there are also versions of the agent available as libraries that you can incorporate directly into your own application, for example, this one for Node.js.
Note that you don't need Ngrok to use the toolkit, but you will otherwise need to open up your firewall so that Twilio webhooks and status callbacks can reach your app.
Start the Ngrok agent on your local machine, using the command `ngrok http 3000`.
This creates a secure tunnel between a random, dynamically created public URL—in this case, https://27faede46a65.ngrok.app—and the Ngrok agent on your computer. This URL will change every time you restart the agent, but if you want a permanent URL for configuring your Twilio webhooks, you can get one through a paid Ngrok subscription.
The tunnel will take HTTPS traffic on port 443 and feed it to your local web server as unencrypted HTTP on port 3000. You may notice that there's a local web interface, and this allows you to interrogate the agent to determine the public URL and the local port the server is running on. You can make use of this when setting up your local server.
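As a rough sketch of how a local server might interrogate the agent, the following code reads the public URL from the agent's local API. It assumes the web interface is on its default address, http://127.0.0.1:4040, and that the /api/tunnels endpoint returns a JSON document whose tunnels array carries public_url and config.addr fields. The function names are made up for illustration and are not part of the toolkit.

```javascript
// Hypothetical helper: pick the HTTPS public URL that forwards to a given
// local port, from the JSON returned by the agent's /api/tunnels endpoint.
function findPublicUrl(tunnelsJson, localPort) {
  const tunnel = (tunnelsJson.tunnels ?? []).find(
    (t) =>
      String(t.public_url ?? "").startsWith("https://") &&
      String(t.config?.addr ?? "").endsWith(`:${localPort}`)
  );
  return tunnel ? tunnel.public_url : null;
}

// Assumes the agent's local web interface is on its default port, 4040.
// Uses the global fetch() available in Node.js 18 and later.
async function getNgrokUrl(localPort = 3000, apiBase = "http://127.0.0.1:4040") {
  const response = await fetch(`${apiBase}/api/tunnels`);
  if (!response.ok) {
    throw new Error(`Ngrok agent API returned HTTP ${response.status}`);
  }
  return findPublicUrl(await response.json(), localPort);
}
```

Keeping the parsing in its own function means it can be checked against a saved response without a running agent.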
A Promising first script
As a gentle introduction to the toolkit and how it uses Promises, create the following simple script to dial a phone number and leave a message, and place it in your sample_apps subfolder:
What is a Promise, anyway?
This code makes heavy use of JavaScript Promises. A Promise, at its most basic, is an action that will take place at some point in the future. Once the action has been taken, the Promise is said to be fulfilled. If the action fails, the Promise is said to be rejected. Either way, it is settled. Prior to the action taking place, the Promise is pending.
Here you can see the two idiomatic ways of using Promises. One of these is the call to `setup()`, using the following `then()…catch()…finally()` notation:
The call to `setup()` starts up the Express web server and prepares it to receive web traffic through the Ngrok tunnel. When the server is ready, the program runs the script using the `then()` method of the returned Promise. The `script()` function itself returns another Promise, and when that is settled, its `finally()` method shuts down the web server.
You can think of Promises as a standardized way of packaging the callback functions that are frequently used in asynchronous operations. (By 'asynchronous', I mean that an action will happen at some point in the future, but not necessarily immediately, and quite possibly out of order from the program flow.) The `then()`, `catch()` and `finally()` methods themselves return Promises, and the callback you pass to `then()` can return either a plain value or another Promise. This arrangement allows Promises to be chained together, as a series of asynchronous actions.
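To illustrate the chaining, here is a small self-contained sketch. The `setup()` and `script()` functions below are stand-ins invented for the example; they only record steps and wait on timers, and are not the toolkit's real implementations.

```javascript
// Stand-ins for the toolkit's functions; names and behavior are assumptions.
const steps = [];
const pause = (ms) => new Promise((fulfill) => setTimeout(fulfill, ms));

function setup() {
  // Pretend to start a web server, then fulfill once it is "ready".
  return pause(10).then(() => steps.push("server ready"));
}

function script() {
  // Pretend to run a call script that yields a final call status.
  return pause(10).then(() => {
    steps.push("script ran");
    return "completed"; // a plain value, passed on to the next then()
  });
}

const chain = setup()
  .then(() => script()) // returns a Promise, so the chain waits for it
  .then((status) => steps.push(`call ${status}`))
  .catch((err) => steps.push(`failed: ${err.message}`)) // any rejection above lands here
  .finally(() => steps.push("server stopped")); // runs either way
```

Because each method returns a new Promise, a rejection anywhere in the chain skips the remaining `then()` callbacks and goes straight to `catch()`.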
Making things happen in the right order, top to bottom
If you think the `catch()` and `finally()` method names are suspiciously similar to the keywords in `try…catch…finally` exception-handling blocks, you would not be wrong. We can rewrite the Promise chain using the second idiom:
Here you can quite clearly see that the code pauses while the Promises are pending, as evidenced by the `await` keyword. That is not to say that the program is blocked and doing nothing while waiting; it may be performing other actions in parallel.
`await` yields the value with which the Promise was fulfilled, or throws the error if the Promise was rejected. Inside a function or method, you can only use `await` if it is declared with the `async` keyword. As a corollary, declaring a function or method as `async` means that any returned value will be a Promise. If you do not return a Promise object explicitly, JavaScript will wrap your returned value in a Promise for you.
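A minimal sketch of this idiom follows. The `placeCall()` function is a hypothetical stand-in for any asynchronous step; it is not part of the toolkit.

```javascript
// Hypothetical stand-in for an asynchronous step, such as placing a call.
function placeCall(shouldFail) {
  return new Promise((fulfill, reject) => {
    setTimeout(() => {
      if (shouldFail) reject(new Error("no answer"));
      else fulfill("completed");
    }, 10);
  });
}

// Declaring the function async means it always returns a Promise.
async function runScript(shouldFail) {
  const log = [];
  try {
    const status = await placeCall(shouldFail); // pauses here until settled
    log.push(`call ${status}`);
  } catch (err) {
    log.push(`failed: ${err.message}`); // a rejection surfaces as a thrown error
  } finally {
    log.push("cleaned up"); // runs whether the call succeeded or not
  }
  return log; // wrapped in a Promise automatically
}
```

Calling `runScript(false)` gives the caller a Promise straight away; the array only becomes available once that Promise is fulfilled.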
I personally find that this way of using Promises is a lot more intuitive than chaining them together, particularly when it is combined with control structures such as `if…else` statements and loops. I will use `await` in most of the examples presented here. However, there is one place where you can't use `await`, and that is at the top level of a CommonJS module or script, because there is no enclosing function that can be labelled as `async`. (ES modules in current versions of Node.js do allow `await` at the top level.)
Generating TwiML from the script
Returning to the `script()` function, there are two places where it awaits results. The first is when it places the outbound call in `Call.makeCall()`, and waits for the Twilio API to make a request to the webhook to obtain the TwiML from our script. The second is in `call.sendResponse()`, when it delivers the TwiML and waits for the status callback to indicate that the call has ended. We will explore this further in the next section.
Most of the heavy lifting in this script is done by the call.js module. It is responsible for managing the Express web server and processing the webhooks and status callbacks from the Twilio API. It also provides the Call class, which encapsulates the state and properties of a call. `Call.makeCall()` is a factory method which uses the Twilio REST API to initiate a call. It sets up a webhook for the script to return TwiML, and a status callback to inform the script of the call's fate, which is written into the `call.status` property.
If you are familiar with the Twilio JavaScript helper library, you may have noticed that the Call class supports many of the same methods for generating TwiML. This is because instances of the Call class contain a VoiceResponse object, and the corresponding methods of the Call class are delegated to that object. There are some restrictions on what options can be used with some of the methods, to ensure that side-effects do not break the toolkit in unexpected ways.
Finally, the toolkit has some utility modules. One such is phonenumbers.js, which provides a thin wrapper around the libphonenumber-js library, to enable parsing of 'friendly' phone numbers in the local country format, and to convert them into the E.164 international format that Twilio requires. Another module, timeout.js, provides a "Promise-ified" version of timeouts.
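A "Promise-ified" timeout usually amounts to wrapping `setTimeout()` in a Promise. The sketch below shows the general idea only; the name `sleep` is made up for illustration, and the toolkit's timeout.js may expose a different interface.

```javascript
// Hypothetical helper: fulfill with `value` after `ms` milliseconds.
function sleep(ms, value) {
  // Extra arguments to setTimeout() are passed on to the callback.
  return new Promise((fulfill) => setTimeout(fulfill, ms, value));
}

// Usage: pause inside an async function without blocking the event loop.
async function demo() {
  const started = Date.now();
  const result = await sleep(50, "done");
  return { result, elapsed: Date.now() - started };
}
```

A helper like this can be combined with `Promise.race()` to put a time limit on any other pending Promise.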
The call.js module
To recap, the call.js module does three main things:
- It configures the Express web application framework through the `setup()` function.
- It handles webhooks and status callbacks from the Twilio API.
- It provides the Call class which your scripts will use to create and handle calls.
In this section, we'll look at the second and third items in more detail.
The Promise Tango
To allow scripts to interact with the Twilio API, a Call object relies on a pair of Promises: one to wait for a Twilio webhook to request TwiML or deliver some result; and another used by the webhooks to wait for the script to deliver a TwiML document. Let's take a look at how those Promises are constructed and interact with each other.
You create a Promise as follows:
`fulfill()` and `reject()` are a pair of functions that your code will call to fulfill or reject the Promise as appropriate. They each take one optional parameter: in the case of `fulfill()`, it will be the result of the async action (although it might be another Promise that yields the eventual result); in the case of `reject()`, it is conventionally an Error object.
The body of the executor function is executed synchronously in the constructor and will typically store the functions for future use. Once the Promise is settled, one way or the other, its state is locked in, and calling `fulfill()` or `reject()` again will have no further effect. The handlers registered through `then()`, `catch()` and `finally()` do not run immediately; they are queued and run after the currently executing code has finished.
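The following sketch shows the resolver functions being stored for later use, and what happens when they are called more than once. The names `makeDeferred` and `deferred` are illustrative only and do not come from the toolkit.

```javascript
// A Promise whose resolver functions are kept for later use.
function makeDeferred() {
  const deferred = {};
  deferred.promise = new Promise((fulfill, reject) => {
    // The executor runs synchronously, inside the constructor.
    deferred.fulfill = fulfill;
    deferred.reject = reject;
  });
  return deferred;
}

const order = [];
const d = makeDeferred();
d.promise.then((value) => order.push(`then: ${value}`));

d.fulfill("first");              // settles the Promise
d.fulfill("second");             // ignored: the state is already locked in
d.reject(new Error("too late")); // also ignored
order.push("sync code done");    // runs before the then() handler
```

Once the handler has run, `order` holds "sync code done" followed by "then: first", which shows both the single settlement and the deferred execution of the handler.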
You can see a Promise being created in the invocation of `Call.makeCall()`:
The Promise in this factory method resolves to a Call object, which is initialized with a set of properties returned by the Twilio API. The Call is slotted into an array of current calls, indexed by the call SID, which will be used to look up the call when a related webhook or status callback is received. The webhook and callback handlers will invoke the resolver methods of the Promise, to allow the script to resume execution.
Let's take a closer look at the private properties and the constructor of a Call object:
You can see two pairs of resolver functions, one pair for when a webhook is received, allowing the script to proceed, and the other for when TwiML is ready to be returned by the webhook. The Call constructor stores the first pair of these, in addition to creating the contained VoiceResponse object. There is also a flag to indicate whether the script expects to receive further requests for TwiML, and an array to store the details of connected (child) calls.
The second pair of resolver functions are updated when a webhook is received:
First of all, you can see the webhook Promise being fulfilled, enabling the script to continue. Next, a new Promise is created to await the resulting TwiML. When that Promise is fulfilled, the response to the webhook is returned.
Once the script has generated its TwiML, it invokes the `sendResponse()` method:
This method fulfills the TwiML Promise, and creates a new one that will be resolved by the next webhook or status callback. The `#scriptContinues` flag is set to `true` unless the last TwiML verb terminates the call, as `<Hangup>` and `<Reject>` do; if the script is indeed to continue, the TwiML `<Redirect>` verb is appended to the XML document, to ensure that it does so.
There is a related method, `sendFinalResponse()`, which can be used if the call is complete and no more webhooks are expected. If you use this method, then it is not necessary to invoke `hangup()` prior to returning the TwiML.
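To make the handshake between the script and the webhook handler more concrete, here is a stripped-down model of the two-Promise exchange. It is a sketch under simplifying assumptions, not the toolkit's code: the `MiniCall` class and its method names are invented, there is no web server, no rejection handling, and TwiML is represented by plain strings.

```javascript
// Simplified model of the two-Promise handshake; not the toolkit's code.
class MiniCall {
  #webhookFulfill; // resolver the webhook handler calls to wake the script
  #twimlFulfill;   // resolver the script calls to answer the webhook

  // Script side: wait for the next webhook request.
  nextWebhook() {
    return new Promise((fulfill) => { this.#webhookFulfill = fulfill; });
  }

  // Webhook side: wake the script, then wait for its TwiML.
  handleWebhook(params) {
    const twiml = new Promise((fulfill) => { this.#twimlFulfill = fulfill; });
    this.#webhookFulfill(params);
    return twiml; // the HTTP response is sent when this is fulfilled
  }

  // Script side: hand over TwiML and wait for the following webhook.
  sendResponse(twiml) {
    const next = this.nextWebhook();
    this.#twimlFulfill(twiml);
    return next;
  }
}

// A two-step "script", written top to bottom.
async function script(call) {
  const first = await call.nextWebhook();
  const second = await call.sendResponse(`<Say>Hello ${first.From}</Say>`);
  call.sendResponse(`<Say>You pressed ${second.Digits}</Say><Hangup/>`);
}
```

Each side only ever waits on the Promise that the other side will fulfill, which is what lets the script read as a straight-line sequence while the web server stays event-driven.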
Status callbacks
An important aspect of handling a call is dealing with status callbacks:
You might have wondered what happens if the call is hung up before the script ends; the answer is that the webhook Promise will be rejected with a CallEndedException, which is also exported by the call.js module. For all but the simplest scripts, you should make it a practice to catch this exception.
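The pattern for catching the exception might look like the sketch below. The `CallEndedException` class here is a hypothetical stand-in defined only so that the example runs on its own (in a real script you would import the one exported by call.js), and `waitForWebhook()` simulates a hang-up with a timer.

```javascript
// Hypothetical stand-in; the toolkit exports its own CallEndedException.
class CallEndedException extends Error {
  constructor(status) {
    super(`Call ended with status ${status}`);
    this.name = "CallEndedException";
    this.status = status;
  }
}

// Simulate a script that is waiting for a webhook when the caller hangs up.
function waitForWebhook(hangUpAfterMs) {
  return new Promise((fulfill, reject) => {
    // Here the status callback arrives first, so the Promise is rejected.
    setTimeout(() => reject(new CallEndedException("completed")), hangUpAfterMs);
  });
}

async function script() {
  try {
    await waitForWebhook(10);
    return "gathered input";
  } catch (err) {
    if (err instanceof CallEndedException) {
      return `caller hung up (${err.status})`; // expected; clean up and exit
    }
    throw err; // anything else is a genuine bug, so let it propagate
  }
}
```

Checking the error type before swallowing it keeps real programming errors visible instead of hiding them behind the hang-up handling.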
By default, only call completed events will generate status callbacks. If you wanted to get the initiated, ringing or answered events, then you can request them as follows:
Notice the use of `nextEvent()` to get the next event in sequence. Be aware that the webhook and status callback events for the call being answered may arrive in any order, so check the `eventSource` property of the Call before attempting to return TwiML.
The PV Toolkit also supports asynchronous Answering Machine Detection, which uses a separate status callback. Its use is shown in the apptreminder2.js sample app.
Call properties
With the exception of `eventSource`, all the public properties of a Call object are derived from the returned values of the initial API call that created it, or the last webhook, or the last status callback. There's a class property, `Call.propertyMappings`, which lists the parameters of interest, and their JavaScript property names, which have been normalized into camelCase. You can see the webhook and callback contents by examining them in the Request Inspector of your Twilio Console, or from the Ngrok local agent. You can also see the very same contents and the generated TwiML by setting the debug level for the call.js module. On macOS and Linux the commands are:
The equivalent commands for Windows are `set DEBUG=debug` and `set DEBUG=`.
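Returning to the property mappings, the idea of normalizing Twilio's request parameters into camelCase properties can be sketched as follows. The mapping table and the `applyMappings()` function are assumptions made for illustration; the real `Call.propertyMappings` may be shaped differently and covers many more parameters.

```javascript
// Illustrative mapping from Twilio webhook parameters to property names.
// The toolkit's real Call.propertyMappings may be shaped differently.
const propertyMappings = {
  CallSid: "sid",
  CallStatus: "status",
  From: "from",
  To: "to",
  Direction: "direction",
  Digits: "digits",
  SpeechResult: "speechResult",
};

// Copy the parameters of interest from a request body onto a target object.
function applyMappings(target, body, source) {
  for (const [param, property] of Object.entries(propertyMappings)) {
    if (param in body) target[property] = body[param];
  }
  target.eventSource = source; // e.g. "webhook" or "status"
  return target;
}
```

Parameters that are not listed in the table are simply ignored, so unexpected fields in a request never leak onto the object.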
If there are child calls (created through the `<Dial>` verb), the array of child calls can be obtained through the instance property `childCalls`, and they will each have their own properties.
The following table describes some of the more noteworthy Call instance properties (in some cases these will actually be a property of a child call); a complete description can be found in the module documentation.
Property | Values |
---|---|
eventSource | The last event that updated the Call object: `api`, `webhook`, `status`, `dial`, `incoming` |
sid | The call SID, which is the call's globally unique identifier. |
to, from | The called and calling numbers, as an E.164 number, SIP URI or Client identifier. |
callerName | If available, the CNAM (caller id display name) of the calling party. |
forwardedFrom | If present, indicates that the call was forwarded by this number. |
status | The call status |
dialCallStatus | The status of a child call leg |
answeredBy, machineDetectionDuration | The results of Answering Machine Detection. |
callToken | Presented on an inbound call, it can be used to make outbound API calls with the incoming call's caller id. |
stirVerstat, stirPassportToken | Indicates the SHAKEN/STIR status of an incoming call, and the encoded JSON Web Token that contains the passport. |
stirStatus | The SHAKEN/STIR status given to an outbound call from Twilio. |
direction | The direction of the call: `inbound`, `outbound-api` or `outbound-dial` |
digits, finishedOnKey, speechResult, confidence | Gather results. The previous results will be deleted with each new Gather. |
sipResponseCode | The SIP response code that indicates the final state of an outbound call. |
errorCode, errorMessage | Error code and short message for failed calls, from the Twilio Errors and Warnings Dictionary. |
childCalls | The array of child calls; each call has its own set of properties. |
A more complex example
The introductory example could have been trivially written as a Twilio Studio flow, so let's look at something more complex that certainly could not, not least because it involves parallel actions.
Getting set up
In preparation, take a look in the sample_apps folder of the repository, and you will find a file called datasets.js, which contains template datasets that you can edit to add your own phone numbers. It contains entries like these:
In the following example, we'll use one of the entries in the datasets.js file to drive an app, conference.js, which will allow members of a group to create an on-demand conference by calling a Twilio phone number. An inbound call from one member will result in outbound calls to all the other members, and if they respond appropriately, they will be added to the conference.
The outline of the program looks like this:
To run the program using our test dataset, invoke it thus:
This gets the names and numbers of the conference participants defined in the dataset member test, and registers an inbound script.
But wait, how do we wire up a phone number to invoke the script? Well, if you're using a paid version of Ngrok, you can configure the webhook of a Twilio phone number with a custom Ngrok subdomain:
In this example, I would configure one of my phone numbers with the voice URL https://welbourn.ngrok.io/inbound. If I didn't have a reserved custom Ngrok subdomain, I would have to copy the random Ngrok domain name into my webhook configuration, which will change every time I run the Ngrok local agent. This is inconvenient, so to make life easier, there is an option to pass in a phone number to the `setup()` function:
This will use the Twilio Incoming Phone Numbers API to configure the given phone number to use the current Ngrok agent URL.
Handling inbound calls
Here is the inbound call handler of call.js:
This creates a Call object, adds it to the array of current calls, and uses the `_respondToInboundCall()` method to get a Promise to return some TwiML. Once this is done, the script is invoked:
The script first checks that the caller is a member of the group, and connects them to the conference. Following this, the script calls the other members of the group, to invite them to join the conference.
Notice that there is no `await` involved here; the action to connect the caller is initiated, and the script immediately moves on to invite the other parties. The logging of the call leg ending will only take place once the webhook is invoked for further instructions.
Making outbound calls—in parallel!
Do you remember my warning about making sure you have an `await` or `then()` to handle the results of an async function? Well, with `makeOutboundCall()`, we have an exception that proves that rule: the other parties are dialed in parallel, and there is no need to handle any results. Nor will the program terminate prematurely, as the Express web server will keep running until you stop the program at the command line by pressing Control-C.
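The general shape of starting several asynchronous calls without awaiting each one is sketched below. The `makeOutboundCall()` here is a timer-based stand-in, not the function from conference.js, and the `catch()` is a precaution added in this sketch so that a failed call cannot turn into an unhandled rejection.

```javascript
// Hypothetical stand-in for an async function that dials one person.
const finished = [];
async function makeOutboundCall(name, ringMs) {
  await new Promise((fulfill) => setTimeout(fulfill, ringMs));
  finished.push(name);
}

// Start every call without awaiting: the loop does not pause between calls.
function inviteEveryone(members) {
  return members.map((m) =>
    // Attach a catch() so a failed call cannot become an unhandled rejection.
    makeOutboundCall(m.name, m.ringMs).catch((err) => console.error(err))
  );
}

const pending = inviteEveryone([
  { name: "Alice", ringMs: 30 },
  { name: "Bob", ringMs: 10 },
]);
```

Both calls are in flight at once, so they finish in order of their own duration rather than the order in which they were started. If you did need the results, `Promise.all(pending)` would collect them.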
There's one feature of this script that I must point out: the use of a call token to make an outbound API call with the inbound call's caller id. Normally Twilio forbids you from making outbound API calls with a caller id you don't own; but the call token is in effect a permission slip to reuse the inbound caller id, while that inbound call leg is still in progress. This means that recipients of the outbound calls will recognize the caller id, instead of hanging up on some Twilio phone number which may not be familiar.
The `makeOutboundCall()` function consists mainly of an Interactive Voice Response system (IVR) that asks the called party whether or not they'd like to join the conference:
Again, the program logs the end of the call, which will happen when it gets the end-of-call status callback. There's also a `catch` statement which looks for a premature ending, which would indicate that the recipient hung up without responding to the IVR.
This is just one example of a more complex application built using the Programmable Voice Toolkit. The GitHub repo contains other sample apps, including one for simple appointment reminders (apptreminder.js), a slightly more complex one which handles answering machine detection (apptreminder2.js), and another for on-call incident notifications (oncall.js).
Troubleshooting
Working with Promises takes some getting used to. In particular, you need to understand where to use `await`, to stop and wait for a result, and where instead to use a Promise's `then()` method, to immediately carry on and process the result later. If you accidentally omit the `await` keyword, and have no corresponding `then()` method, Ngrok will report a 502 Bad Gateway in response to a webhook or status callback.
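The underlying pitfall can be reproduced without the toolkit. In this sketch, `getStatus()` is a hypothetical async step; the only difference between the two callers is the `await` keyword.

```javascript
// Hypothetical async step that yields a call status after a short delay.
async function getStatus() {
  await new Promise((fulfill) => setTimeout(fulfill, 10));
  return "completed";
}

async function withoutAwait() {
  const status = getStatus();       // oops: this is a pending Promise
  return status === "completed";    // so the comparison is always false
}

async function withAwait() {
  const status = await getStatus(); // waits for the fulfilled value
  return status === "completed";
}
```

Without `await`, the code carries on with a Promise object where it expected a value, and nothing downstream waits for the real result.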
While you can also use the Ngrok local web interface, the best source for debugging information is your Twilio console's Request Inspector. Here you can find a call in the logs, and drill down into the content of the requests and responses:
Final thoughts
In this blog post, I have explained how to build Twilio Programmable Voice apps using a JavaScript toolkit. The toolkit features the use of Promises to hide the handling of webhooks and status callbacks, which are delegated to an Express-based web application running under Node.js. The developer is free to create their application in a natural, top-down way, without the complexity of writing a state machine.
I have shown the use of Ngrok in a typical behind-the-firewall development environment, but clearly you need to do much more to make your Express-based applications scalable and resilient. You would need to consider:
- Scaling: whether to use vertical scaling, using the Node cluster module; or horizontal scaling, using multiple servers, possibly using containerization, together with a load balancer.
- Redundancy, or what to do if a server goes down. Twilio Studio stores the state of flows and their widgets in a back-end database, and is resilient to outages in individual components. This capability, however, comes at a cost. Your applications, on the other hand, may not require this level of resilience; you may simply want to retry calls that were in flight, using a backup server. If you are using horizontal scaling, this should be readily achievable.
- Logging. The call.js module uses an extremely simple logging package, which logs to the console. This should be replaced by something much more robust and configurable, capable of sending logs to one of the many enterprise-grade logging platforms.
Needless to say, these are topics for another day.
Additional resources
The best explanation of Promises that I have read can be found at JavaScript.info. The Mozilla Developer Network, as always, has excellent reference material.
Related to Promises is how they fit into the JavaScript event loop. Vishwas Gopinath has written a good tutorial on the Builder.io website. Node.js documentation also covers this.
Shelley Vohr covers a wide range of asynchronous programming considerations, at a talk given at JSConf EU 2018.
Resources on Node.js and Express are plentiful. One thing I should point out is that I have used the ECMAScript module import syntax in my code, which is not Node's default. A good explanation of the various flavors of module import/export syntax is given in Robert Dey's blog post at Reflectoring.io. The syntax used is controlled through the "type" field in the package.json file, which is described in the Node.js documentation.
Robert Welbourn is a Solutions Architect in Twilio's North American solutions engineering team. He started his career in the era of mainframes, when source code version control was remembering which deck of punched cards to hand to the computer operators. He is sometimes referred to by his colleagues as "That cantankerous old [Yorkshireman]."