How to Make Outgoing Calls with Twilio Voice, the OpenAI Realtime API, and Node.js
Time to read: 9 minutes
Our friends at OpenAI recently launched their Realtime API, which exposed the multimodal capabilities of their GPT-4o model. At launch, we shared how you could build a voice AI assistant in Node.js you could call from your phone.
Since the launch, we’ve had many requests to show the opposite scenario: how do you call a phone number with Twilio, using OpenAI’s Realtime API and Node.js? In this tutorial, I’ll show you some demo code that dials a phone number using Twilio Voice and Media Streams, and the OpenAI Realtime API. I’ll show a function that checks whether a phone number you provide is allowed to be called, and then begins a phone call. Finally, after the person you’re calling picks up, we’ll trigger the OpenAI API to have the AI talk first.
Let’s get started.
Prerequisites
To follow along, ensure you have:
- Node.js 18+. Download it from here. (I used 18.20.4 for this tutorial; please check your version if you run into issues.)
- A Twilio account. If you don’t have one yet, you can sign up for a free trial here.
- A Twilio number with Voice capabilities to make an outbound call. Here are instructions to purchase a phone number.
- An OpenAI account and an OpenAI API Key with OpenAI Realtime API Access. You can sign up here.
- ngrok or another tunneling solution to expose your local server to the internet for testing. You can download ngrok here.
- Either:
  - A second Twilio phone number that you can answer with the Twilio Dev Phone, or
  - The phone number of a device that can receive phone calls, which you’ve added to your Twilio Verified Caller IDs. You can find a tutorial here.
Awesome, let’s start building.
Build the AI phone call application
Step 1: Set up your project
Start by creating and navigating to your project directory, then setting up a new Node.js project.
Step 2: Install the necessary dependencies
Next, install the required packages:
As with our dial an AI with Node.js tutorial, we’ll use Fastify as our web framework.
Step 3: Create the project files
We will create a file named index.js for our main server code. We’ll also have a .env file to store environment variables. (More information on this strategy here.)
Create a .env file to securely store your API credentials:
Add the following to your .env file, replacing the placeholders with your actual keys. Find your TWILIO_ACCOUNT_SID and TWILIO_AUTH_TOKEN in your Twilio Console. The PHONE_NUMBER_FROM should be the Twilio phone number you purchased in the Prerequisites, in E.164 format (e.g., +18885551212).
Now, create the index.js file:
Open it with your favorite text editor or IDE – it’s editing time!
Step 4: Write the server code
Excellent work! That was quite a bit of setup with the keys, configuration, and the prerequisites, but we’re ready to get down to business – or silliness, depending on your goals with this build. I’ll go step by step and explain some of the more interesting parts of the code.
Step 4.1 Import dependencies, set constants, and set environment variables
As with most Node apps, we start with a bit of boilerplate. I’ll explain the goofy regular expression afterward.
Add this at the top of the file:
The imports come first; I’ll skip explaining those.
Next, we define constants for the system message, voice, and server port, and choose which OpenAI events to log to the console. SYSTEM_MESSAGE is the set of instructions we send to the AI when we open the WebSocket: essentially a system prompt that controls the overall tenor of the conversation. You can find more information on setting the voice and events in OpenAI’s Realtime API Reference.
Then, we load environment variables from the .env file (and check that you set them all!).
The const DOMAIN … line is a convenience regular expression that removes accidental trailing slashes or leading protocols when you set the DOMAIN variable later in this tutorial.
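To make the environment check and the DOMAIN cleanup concrete, here is a minimal sketch of both ideas in isolation. The helper names (findMissingEnv, sanitizeDomain) and the exact regular expressions are my own illustrations, not necessarily what the tutorial’s index.js contains, and OPENAI_API_KEY is an assumed variable name for your OpenAI key.

```javascript
// Minimal sketch; helper names are hypothetical, not taken from index.js.

// Return the names of any required variables that are missing or empty.
function findMissingEnv(env, requiredNames) {
  return requiredNames.filter((name) => !env[name] || env[name].trim() === "");
}

// Strip a leading http:// or https:// and any trailing slashes, so that
// "https://abc123.ngrok.app/" becomes "abc123.ngrok.app".
function sanitizeDomain(rawDomain) {
  return rawDomain.replace(/^https?:\/\//i, "").replace(/\/+$/, "");
}

// Example usage. OPENAI_API_KEY is an assumed variable name.
const requiredNames = [
  "TWILIO_ACCOUNT_SID",
  "TWILIO_AUTH_TOKEN",
  "PHONE_NUMBER_FROM",
  "DOMAIN",
  "OPENAI_API_KEY",
];
const missingNames = findMissingEnv(process.env, requiredNames);
if (missingNames.length > 0) {
  // A real server would likely exit here instead of just logging.
  console.error(`Missing environment variables: ${missingNames.join(", ")}`);
}
```

Cleaning the value once at startup means every later use (such as building a wss:// URL) can assume a bare hostname.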
Step 4.2 Define a number filter
Now, paste our isNumberAllowed filter function:
As I warn in the code, making outbound calls requires that you comply with the various rules and regulations in your jurisdiction. For example, in the United States, your outbound calls have to comply with the Telephone Consumer Protection Act (or TCPA). We at Twilio ask you to do your own due diligence when determining whether your usage is compliant.
In this app, though, the filter function shows how to check that you’re dialing a number you know you have permission to call: other numbers you own with Twilio, and your verified Caller IDs.
incomingPhoneNumbers sounds like a mistake, but these are regular Twilio phone numbers. Using one allows you to test this app by making calls to the Twilio Dev Phone.
OutgoingCallerIDs are numbers you control and have verified with Twilio, so they can appear as the Caller ID on your outgoing calls. For example, I verified my cell phone, which made testing this tutorial straightforward!
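Here is a minimal sketch of how such a check can look with the Twilio Node.js Helper Library’s list filters. It assumes client is a helper library client created elsewhere with twilio(accountSid, authToken); the function name and the choice to pass the client in (which makes it easy to test with a fake) are illustrative, not the tutorial’s exact code.

```javascript
// Minimal sketch of a permission check. Assumes `client` is a Twilio Node.js
// helper library client; the injected-client design is illustrative.
async function isNumberAllowedSketch(client, to) {
  try {
    // Is `to` one of the Twilio phone numbers on this account?
    const incoming = await client.incomingPhoneNumbers.list({ phoneNumber: to });
    if (incoming.length > 0) {
      return true;
    }

    // Is `to` a number you've verified as an outgoing Caller ID?
    const verified = await client.outgoingCallerIds.list({ phoneNumber: to });
    return verified.length > 0;
  } catch (error) {
    // Treat lookup failures as "not allowed" rather than risking a call.
    console.error("Error checking phone number:", error);
    return false;
  }
}
```

Failing closed on errors is a deliberate choice here: if the lookup breaks, the safest default is not to dial.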
Step 4.3 Make an outbound call function
Below our filter function, create our outbound calling function:
This one is straightforward: first, we call the number filter function. If the number is allowed, we place a phone call with the Twilio Node.js Helper Library.
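As a rough sketch of that flow, the call can carry inline TwiML that connects the answered call to the /media-stream WebSocket route. This assumes client is a Twilio helper library client and isAllowed is a permission check such as the isNumberAllowed function from Step 4.2; makeCallSketch and buildStreamTwiml are hypothetical names, and the TwiML shown is one reasonable way to do it rather than the tutorial’s verbatim code.

```javascript
// Minimal sketch; function names are hypothetical.
// Build TwiML that streams the call audio to our WebSocket route.
function buildStreamTwiml(domain) {
  return (
    '<?xml version="1.0" encoding="UTF-8"?>' +
    "<Response><Connect>" +
    `<Stream url="wss://${domain}/media-stream" />` +
    "</Connect></Response>"
  );
}

async function makeCallSketch(client, { to, from, domain, isAllowed }) {
  // Refuse to dial anything the filter doesn't approve.
  if (!(await isAllowed(to))) {
    throw new Error(`The number ${to} is not approved for outbound calls.`);
  }

  // Twilio runs this TwiML once the call is answered, opening the stream.
  const call = await client.calls.create({
    from,
    to,
    twiml: buildStreamTwiml(domain),
  });
  return call.sid;
}
```

Passing TwiML inline avoids needing a separate HTTP route just to return call instructions.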
Step 4.4 Initialize Twilio and Fastify, and define the root route
Next, we’ll do a little more initializing and define our root (/) route. It isn’t needed for the calling flow, but it’s a handy way to check that your server is running!
Paste this next:
Step 4.5 Set up the WebSocket route
In this step, you'll configure the WebSocket route in your server to handle media streams. This route will proxy audio between Twilio's media streams and OpenAI's Realtime API.
Add this code right after your root route definition:
This snippet sets up a WebSocket route at /media-stream. When a connection is established, you log a message indicating the client has connected.
Step 4.6 Connect and configure the OpenAI Realtime WebSocket
Next, you'll connect to the OpenAI Realtime API using a WebSocket. This connection allows you to send and receive audio data in real time. Paste this code below the previous code (but inside the fastify.register(async (fastify) => { block):
I explain similar code in more detail in the previous Node.js tutorial, but there are a few differences in this post. Here’s a brief explanation of what’s going on:
- WebSocket Initialization: You initialize a WebSocket connection to OpenAI's Realtime API.
- Session Update: We use the sendInitialSessionUpdate function to configure the session with desired settings, such as the AI voice and system message (set above in the constants). Then we send a session.update event to OpenAI to update our session’s configuration (more details). Note that we set the inbound and outbound audio format to g711_ulaw. Twilio Media Streams supports this format, so we don’t have to do any transcoding.
- AI talks first: Since we’re dialing out, we send a manual conversation update with conversation.item.create and response.create, so the AI speaks as soon as the call connects.
We send everything 0.1 seconds after the WebSocket opens. This gives OpenAI time to send its default session configuration, and gives us time to send our preferences.
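The three messages involved can be sketched as plain JSON. This is a minimal sketch assuming the beta-era Realtime API event shapes this tutorial was written against (field names may differ in newer API versions, so check the current reference). buildInitialMessages and sendInitialMessagesSketch are hypothetical names, and the greeting text is a placeholder you supply.

```javascript
// Minimal sketch assuming beta-era Realtime API event shapes.
// Helper names are hypothetical; `greeting` is a placeholder prompt.
function buildInitialMessages({ voice, instructions, greeting }) {
  const sessionUpdate = {
    type: "session.update",
    session: {
      turn_detection: { type: "server_vad" },
      // G.711 mu-law in both directions matches Twilio Media Streams.
      input_audio_format: "g711_ulaw",
      output_audio_format: "g711_ulaw",
      voice,
      instructions,
      modalities: ["text", "audio"],
    },
  };

  // Seed the conversation, then ask the model to respond so it speaks first.
  const firstItem = {
    type: "conversation.item.create",
    item: {
      type: "message",
      role: "user",
      content: [{ type: "input_text", text: greeting }],
    },
  };
  const responseCreate = { type: "response.create" };

  return [sessionUpdate, firstItem, responseCreate].map((m) => JSON.stringify(m));
}

// Send everything after a short delay (100 ms, matching the text above).
// `ws` is any object with a send(string) method, such as a `ws` WebSocket.
function sendInitialMessagesSketch(ws, options, delayMs = 100) {
  setTimeout(() => {
    for (const message of buildInitialMessages(options)) {
      ws.send(message);
    }
  }, delayMs);
}
```

Keeping the message construction in a pure function makes it easy to inspect exactly what gets sent before any socket is involved.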
Step 4.7 Handle OpenAI and Twilio WebSocket messages
Next, you'll need to handle messages from both OpenAI’s and Twilio’s WebSockets, proxying audio data between the two.
Place this code right below the previous segment (it’s a bit longer, I’ll explain more after):
Here’s the general algorithm for this code:
- Event Checking: For each incoming message from either WebSocket, determine its type and, if necessary, shuttle it over to the other channel. Specifically, media messages from Twilio contain audio data, while response.audio.delta messages contain audio from OpenAI.
- Handle WebSocket Start for Twilio: Log that the WebSocket connection with Twilio started. We don’t send any sort of configuration update here; Twilio expects audio/x-mulaw data by default, so we can work with a default configuration.
- Handle WebSocket close messages gracefully: Deal with call ends, socket closures, and errors.
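The routing decisions in that algorithm can be sketched as two pure functions. This assumes Twilio Media Streams JSON messages and the beta-era Realtime API event names (input_audio_buffer.append, response.audio.delta); the function names and the state object are hypothetical. One detail worth knowing: media you send back to Twilio must include the streamSid from the stream’s start message, so the sketch remembers it.

```javascript
// Minimal sketch; function names and the `state` object are hypothetical.

// Twilio -> OpenAI: returns a JSON string to forward, or null.
function routeTwilioMessage(raw, state) {
  const data = JSON.parse(raw);
  switch (data.event) {
    case "start":
      // Media sent back to Twilio must carry this streamSid.
      state.streamSid = data.start.streamSid;
      console.log("Incoming stream has started", state.streamSid);
      return null;
    case "media":
      // Base64 mu-law audio from the caller goes straight to OpenAI.
      return JSON.stringify({
        type: "input_audio_buffer.append",
        audio: data.media.payload,
      });
    default:
      return null;
  }
}

// OpenAI -> Twilio: returns a JSON string to forward, or null.
function routeOpenAIMessage(raw, state) {
  const event = JSON.parse(raw);
  if (event.type === "response.audio.delta" && event.delta && state.streamSid) {
    // The delta is already base64 g711_ulaw, which Twilio can play as-is.
    return JSON.stringify({
      event: "media",
      streamSid: state.streamSid,
      media: { payload: event.delta },
    });
  }
  return null;
}
```

In the WebSocket message handlers, you would forward any non-null result to the other socket and ignore the rest.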
Step 4.8 Initialize and launch the server
Finally, we set up code to launch our server when you run index.js. Compared to our previous tutorial, though, this time we also place an outbound call, which then connects to that media-stream route.
Paste this at the end of your file:
Here, we check that when you launch the server, you pass in a --call parameter, for example --call=+18885551212. If you do, we’ll run through the earlier logic to check you can make outbound calls, then initiate a call to the number you passed in.
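Reading that parameter only takes a few lines. This is a minimal sketch; parseCallArgument is a hypothetical name, and the E.164 check (a "+" followed by a nonzero digit and up to 15 digits total) is an illustrative safeguard rather than the tutorial’s exact validation.

```javascript
// Minimal sketch; the helper name and E.164 check are illustrative.
function parseCallArgument(argv) {
  const arg = argv.find((a) => a.startsWith("--call="));
  if (!arg) {
    return null; // No --call given, so no outbound call.
  }
  const number = arg.slice("--call=".length).trim();
  if (!/^\+[1-9]\d{1,14}$/.test(number)) {
    throw new Error(`"${number}" is not an E.164 number like +18885551212.`);
  }
  return number;
}

// Usage: `node index.js --call=+18885551212` gives process.argv.slice(2)
// equal to ["--call=+18885551212"].
console.log(parseCallArgument(["--call=+18885551212"])); // prints +18885551212
```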
Okay, great! You’re good to go. Close the file, and let’s look at how to run and test the code.
Run and test your code
In the next steps, I’ll cover how to get the code to run so you can have the AI make an outbound call to you.
Step 1: Launch ngrok
You need ngrok or a similar solution (such as a VPS) to expose your server to the internet. Twilio requires a public URL to send requests to your server and to receive instructions back from your code.
Download and install ngrok if you still need to, then run the following command. If you have changed the port from 6060, be sure to update it here:
Step 1.1 Set the DOMAIN variable
Remember earlier when I told you to wait on the DOMAIN variable in the .env file? Let’s set it now. When you launch ngrok, you’ll see a screen like the following:
In your .env file, you’ll want to change DOMAIN to the Forwarding address from ngrok, without the protocol (https:// in my image).
Here’s an example using my .env (with fake values, other than DOMAIN):
Save that – let’s continue.
Step 2: Run the Twilio Dev Phone
As you saw, we have a filter function which makes sure we’re only calling numbers we have permission to call. While you’ll write a different function for your use case, my demo function allows you to call Twilio numbers you own.
If you haven’t yet, go through the Twilio Dev Phone tutorial. It will ask you to install the Twilio CLI, and add your account credentials.
When you’re done, run twilio dev-phone in your console. You should see a screen like this:
In the Phone Number box, choose the Twilio number you’re going to call. If, like me, you already have that number configured, it’ll warn you before overwriting the config. Double-check that the number is okay to use, then hit Use this phone number.
Step 3: Place an outbound call
We’re almost there, can you feel it? Well, you’re about to hear it – run the following in your console, replacing the placeholder number with your Twilio Dev Phone number (or alternatively, a Verified Caller ID number):
Accept the call on the Dev Phone (or your other device), and you should hear a greeting from the AI. Enjoy your chat!
Debugging your setup
Assuming your server is running, here are the first places to check if you have issues placing an outbound call:
- Is ngrok running? Is the DOMAIN variable properly set in the .env file?
- Is your code calling OpenAI correctly? See more information in their documentation.
- Have you checked the Error Logs in the Developer Tools?
- Did you get error 21216 from Twilio? Do you need to add a Primary Caller Profile in TrustHub?
Conclusion
Congratulations! You’ve successfully created an AI voice assistant that places an outbound call using Twilio Voice and the OpenAI Realtime API. The code is now ready for your modifications, though be sure to check our Node app first to see if we already have a demo.
Happy chatting!
Next steps:
- Check out the Twilio documentation and OpenAI's Realtime API docs for more advanced features.
- Try our Code Exchange app or repo for an example of inbound calling to an AI Voice Assistant. (The repo also demonstrates one way to handle interruptions).
- See OpenAI’s documentation on concepts.
Paul Kamp is the Technical Editor-in-Chief of the Twilio Blog. He had the AI call his wife quite a few times while creating this tutorial. (Sorry Christine!) You can reach him at pkamp [at] twilio.com.