How to Make Outgoing Calls with Twilio Voice, the OpenAI Realtime API, and Node.js

November 14, 2024
Written by
Paul Kamp
Twilion

Our friends at OpenAI recently launched their Realtime API, which exposed the multimodal capabilities of their GPT-4o model. At launch, we shared how you could build a voice AI assistant in Node.js you could call from your phone.

Since the launch, we’ve had many requests to show the opposite scenario: how do you call a phone number using Twilio, OpenAI’s Realtime API, and Node.js?

In this tutorial, I’ll show you demo code that dials a phone number using Twilio Voice, Media Streams, and the OpenAI Realtime API. I’ll walk through a function that checks whether the phone number you provide is allowed to be called, and then begins a phone call. Finally, after the user picks up, we’ll trigger the OpenAI API so the AI talks first.

Let’s get started.

Prerequisites

To follow along, ensure you have:

  • Node.js 18+. Download it from here. (I used 18.20.4 for this tutorial; please check your version if you run into issues.)
  • A Twilio account. If you don’t have one yet, you can sign up for a free trial here.
  • A Twilio number with Voice capabilities to make an outbound call. Here are instructions to purchase a phone number.
  • An OpenAI account and an OpenAI API Key with OpenAI Realtime API Access. You can sign up here.
  • ngrok or another tunneling solution to expose your local server to the internet for testing. You can download ngrok here.
  • Either:
      • A second Twilio phone number that can receive the call through the Twilio Dev Phone, or
      • A phone number for a device where you can receive phone calls, which you’ve added to your Twilio Verified Caller IDs. You can find a tutorial here.

Awesome, let’s start building.

Build the AI phone call application

Step 1: Set up your project

Start by creating and navigating to your project directory, then setting up a new Node.js project.

mkdir outbound-calling-speech-assistant-openai-realtime-api-node
cd outbound-calling-speech-assistant-openai-realtime-api-node
npm init -y; npm pkg set type="module"

Step 2: Install the necessary dependencies

Next, install the required packages:

npm install fastify ws dotenv @fastify/formbody @fastify/websocket twilio

As with our dial an AI with Node.js tutorial, we’ll use Fastify as our web framework.

Step 3: Create the project files

We will create a file named index.js for our main server code. We’ll also have a .env file to store environment variables. (More information on this strategy here.)

Create a .env file to securely store your API credentials:

touch .env

Add the following to your .env file, replacing placeholders with your actual keys. Find your TWILIO_ACCOUNT_SID and TWILIO_AUTH_TOKEN in your Twilio Console. The PHONE_NUMBER_FROM should be the Twilio phone number you purchased in the Prerequisites, in E.164 format (e.g., +18885551212).

Leave DOMAIN alone for now; I’ll show you where to fill that in later.

TWILIO_ACCOUNT_SID="your_twilio_account_sid"
TWILIO_AUTH_TOKEN="your_twilio_auth_token"
PHONE_NUMBER_FROM="your_twilio_phone_number"
DOMAIN="your_ngrok_domain"
OPENAI_API_KEY="your_openai_api_key"

Now, create the index.js file:

touch index.js

Open it with your favorite text editor or IDE – it’s editing time!

Step 4: Write the Server Code

Excellent work! That was quite a bit of setup with the keys, configuration, and the prerequisites, but we’re ready to get down to business – or silliness, depending on your goals with this build. I’ll go step by step and explain some of the more interesting parts of the code.

Step 4.1 Import dependencies, set constants, and set environment variables

Like most Node apps, we start with a bit of boilerplate. I’ll explain the goofy regular expression afterward.

Add this at the top of the file:

import Fastify from 'fastify';
import WebSocket from 'ws';
import fs from 'fs';
import dotenv from 'dotenv';
import fastifyFormBody from '@fastify/formbody';
import fastifyWs from '@fastify/websocket';
import twilio from 'twilio';
// Load environment variables from .env file
dotenv.config();

// Retrieve the OpenAI API key, Twilio Account Credentials, outgoing phone number, and public domain address from environment variables.
const {
  TWILIO_ACCOUNT_SID,
  TWILIO_AUTH_TOKEN,
  PHONE_NUMBER_FROM,
  DOMAIN: rawDomain,
  OPENAI_API_KEY,
} = process.env;

// Constants
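// Clean protocols and slashes. The `|| ''` fallback keeps a missing DOMAIN from throwing here, so the check below can report it.
const DOMAIN = (rawDomain || '').replace(/(^\w+:|^)\/\//, '').replace(/\/+$/, '');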
const SYSTEM_MESSAGE = 'You are a helpful and bubbly AI assistant who loves to chat about anything the user is interested in and is prepared to offer them facts. You have a penchant for dad jokes, owl jokes, and rickrolling – subtly. Always stay positive, but work in a joke when appropriate.';
const VOICE = 'alloy';
const PORT = process.env.PORT || 6060; // Allow dynamic port assignment
const outboundTwiML = `<?xml version="1.0" encoding="UTF-8"?><Response><Connect><Stream url="wss://${DOMAIN}/media-stream" /></Connect></Response>`;

// List of Event Types to log to the console. See the OpenAI Realtime API Documentation.
const LOG_EVENT_TYPES = [
    'error',
    'response.content.done',
    'rate_limits.updated',
    'response.done',
    'input_audio_buffer.committed',
    'input_audio_buffer.speech_stopped',
    'input_audio_buffer.speech_started',
    'session.created'
];
if (!TWILIO_ACCOUNT_SID || !TWILIO_AUTH_TOKEN || !PHONE_NUMBER_FROM || !rawDomain || !OPENAI_API_KEY) {
  console.error('One or more environment variables are missing. Please ensure TWILIO_ACCOUNT_SID, TWILIO_AUTH_TOKEN, PHONE_NUMBER_FROM, DOMAIN, and OPENAI_API_KEY are set.');
  process.exit(1);
}

Like any other Node project, we start with some imports. I’ll skip the explanation here.

Next, we define constants for the system message, voice, and server port. We’ll also choose the OpenAI events to log to the console. SYSTEM_MESSAGE holds the instructions we send to the AI when we open the WebSocket: essentially a system prompt that controls the overall tenor of the conversation. You can find more information on setting the voice and events in OpenAI’s Realtime API Reference.

We also load environment variables from the .env file (and check that you set them all!).
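If you'd like the error message to name exactly which variable is missing, here's a minimal sketch of one way to do it. The helper name findMissingEnvVars and the fake environment object are my own for illustration; they aren't part of the tutorial code.

```javascript
// Hypothetical helper (not in the tutorial code): returns the names of
// required variables that are unset or contain only whitespace.
function findMissingEnvVars(env, requiredNames) {
  return requiredNames.filter((name) => {
    const value = env[name];
    return typeof value !== 'string' || value.trim() === '';
  });
}

// The same five variables the tutorial expects in .env
const REQUIRED_ENV_VARS = [
  'TWILIO_ACCOUNT_SID',
  'TWILIO_AUTH_TOKEN',
  'PHONE_NUMBER_FROM',
  'DOMAIN',
  'OPENAI_API_KEY',
];

// Example with a fake environment object; in index.js you would pass process.env
const missing = findMissingEnvVars(
  { TWILIO_ACCOUNT_SID: 'ACxxx', TWILIO_AUTH_TOKEN: '', OPENAI_API_KEY: 'sk-xxx' },
  REQUIRED_ENV_VARS
);
console.log('Missing variables:', missing.join(', '));
```

In index.js, you could call it with process.env and exit if the returned array isn't empty.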

const DOMAIN … is a convenience regular expression, to remove accidental trailing slashes or leading protocols when you set the DOMAIN variable later in this tutorial.
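To see what that regular expression does, here's a small sketch that wraps the same two replace calls in a function and runs them on a few inputs. The function name normalizeDomain and the sample domains are mine, purely for illustration.

```javascript
// Same two replacements as the DOMAIN constant, wrapped in a
// hypothetical helper so we can try different inputs.
function normalizeDomain(rawDomain) {
  return rawDomain
    .replace(/(^\w+:|^)\/\//, '') // strip a leading "https://", "wss://", or bare "//"
    .replace(/\/+$/, '');         // strip one or more trailing slashes
}

// Made-up sample values for DOMAIN
const samples = [
  'https://a1fe24b64cad.ngrok.app/',
  'a1fe24b64cad.ngrok.app',
  'wss://a1fe24b64cad.ngrok.app///',
];

for (const sample of samples) {
  console.log(`${sample} becomes ${normalizeDomain(sample)}`);
}
```

All three samples end up as the bare host name, which is what the wss:// URL in the TwiML needs.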

Step 4.2 Define a number filter

Now, paste our isNumberAllowed filter function:

// Function to check if a number is allowed to be called. With your own function, be sure 
// to do your own diligence to be compliant.
async function isNumberAllowed(to) {
  try {

    // Uncomment these lines to test numbers. Only add numbers you have permission to call
    // const consentMap = {"+18005551212": true}
    // if (consentMap[to]) return true;

    // Check if the number is a Twilio phone number in the account, for example, when making a call to the Twilio Dev Phone
    const incomingNumbers = await client.incomingPhoneNumbers.list({ phoneNumber: to });
    if (incomingNumbers.length > 0) {
      return true;
    }

    // Check if the number is a verified outgoing caller ID. https://www.twilio.com/docs/voice/api/outgoing-caller-ids
    const outgoingCallerIds = await client.outgoingCallerIds.list({ phoneNumber: to });
    if (outgoingCallerIds.length > 0) {
      return true;
    }

    return false;
  } catch (error) {
    console.error('Error checking phone number:', error);
    return false;
  }
}

Like I warn in the code, making outbound calls requires you to comply with the various rules and regulations in your jurisdiction. For example, in the United States, your outbound calls have to comply with the Telephone Consumer Protection Act (or TCPA). We at Twilio ask you to do your own due diligence when determining whether your usage is compliant.

In this app, though, the filter function shows how to check that you’re dialing a number you have permission to call: other numbers you own with Twilio, and Verified Caller IDs.

incomingPhoneNumbers sounds like a mistake, but these are regular Twilio Phone Numbers. Using one allows you to test this app by making calls to the Twilio Dev Phone.

outgoingCallerIds are other numbers you verify with Twilio so that a number you control can appear as an outgoing Caller ID. For example, I verified my cell phone, which made testing this tutorial straightforward!

Your usage in your final app also has to comply with Twilio’s Terms of Service and Voice Services Policies.
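If you uncomment the consentMap lines in the filter, you might eventually want something a little sturdier than an inline object. Here's a minimal sketch of a consent lookup, assuming you keep your own list of numbers with documented permission. The names buildConsentSet and hasConsent are hypothetical, and the numbers are placeholders.

```javascript
// Hypothetical consent list helpers (not part of the tutorial code).
// In a real app, the list would come from your own consent records.
function buildConsentSet(numbers) {
  // Trim stray whitespace so " +18005551212 " still matches
  return new Set(numbers.map((number) => number.trim()));
}

function hasConsent(consentSet, to) {
  return consentSet.has(to.trim());
}

// Placeholder numbers; only add numbers you have permission to call
const consentSet = buildConsentSet(['+18005551212', ' +18885551212 ']);

console.log(hasConsent(consentSet, '+18005551212')); // a listed number
console.log(hasConsent(consentSet, '+15005550006')); // a number not on the list
```

A lookup like this only answers "is this number on my list?". It doesn't replace your own compliance review.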

Step 4.3 Make an outbound call function

Below our filter function, create our outbound calling function:

// Function to make an outbound call
async function makeCall(to) {
  try {
    const isAllowed = await isNumberAllowed(to);
    if (!isAllowed) {
      console.warn(`The number ${to} is not recognized as a valid outgoing number or caller ID.`);
      process.exit(1);
    }

    const call = await client.calls.create({
      from: PHONE_NUMBER_FROM,
      to,
      twiml: outboundTwiML,
    });
    console.log(`Call started with SID: ${call.sid}`);
  } catch (error) {
    console.error('Error making call:', error);
  }
}

This one is straightforward: first, we call the number filter function. If the number is allowed, we make a phone call with the Twilio Node.js Helper Library, passing our TwiML directly in the twiml parameter. If it isn’t, we log a warning and exit.
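To make the shape of that request easier to see, here's a sketch that builds the same parameters as a plain object without calling Twilio. The helper names buildOutboundTwiML and buildCallParams are my own, and the phone numbers and domain are placeholders.

```javascript
// Hypothetical helpers that mirror the tutorial's outboundTwiML constant
// and the object passed to client.calls.create(). No network calls here.
function buildOutboundTwiML(domain) {
  return `<?xml version="1.0" encoding="UTF-8"?><Response><Connect><Stream url="wss://${domain}/media-stream" /></Connect></Response>`;
}

function buildCallParams({ from, to, domain }) {
  return {
    from,                              // your Twilio number, in E.164 format
    to,                                // the number to dial
    twiml: buildOutboundTwiML(domain), // instructions Twilio runs when the call connects
  };
}

// Placeholder values for illustration only
const params = buildCallParams({
  from: '+18885551212',
  to: '+18005551212',
  domain: 'example.ngrok.app',
});
console.log(params);
```

The TwiML tells Twilio to connect the call's audio to the /media-stream WebSocket route once the call is answered.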

Step 4.4 Initialize Twilio and Fastify, and define the root route

Next, we’ll do a little more initializing, and define our root (/) route. It isn’t used in the functionality, but it might be useful to check your server is running!

Paste this next:

// Initialize the Twilio client (the outgoing call TwiML is defined in the constants above)
const client = twilio(TWILIO_ACCOUNT_SID, TWILIO_AUTH_TOKEN);

// Initialize Fastify
const fastify = Fastify();
fastify.register(fastifyFormBody);
fastify.register(fastifyWs);

// Root Route
fastify.get('/', async (request, reply) => {
    reply.send({ message: 'Twilio Media Stream Server is running!' });
});

Step 4.5 Set up the WebSocket route

In this step, you'll configure the WebSocket route in your server to handle media streams. This route will proxy audio between Twilio's media streams and OpenAI's Realtime API.

Add this code right after your root route definition:

// WebSocket route for media-stream
fastify.register(async (fastify) => {
    // Setup WebSocket server for handling media streams
    fastify.get('/media-stream', { websocket: true }, (connection, req) => {
        console.log('Client connected');

This snippet sets up a new WebSocket server for the /media-stream route. When a connection is established, you log a message indicating the client has connected.

Step 4.6 Connect and configure the OpenAI Realtime WebSocket

Next, you'll connect to the OpenAI Realtime API using a WebSocket. This connection allows you to send and receive audio data in real time. Paste this code below the previous code (but inside the fastify.register(async (fastify) => { block):

        const openAiWs = new WebSocket('wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview-2024-10-01', {
            headers: {
                Authorization: `Bearer ${OPENAI_API_KEY}`,
                "OpenAI-Beta": "realtime=v1"
            }
        });

        let streamSid = null;

        const sendInitialSessionUpdate = () => {
            const sessionUpdate = {
                type: 'session.update',
                session: {
                    turn_detection: { type: 'server_vad' },
                    input_audio_format: 'g711_ulaw',
                    output_audio_format: 'g711_ulaw',
                    voice: VOICE,
                    instructions: SYSTEM_MESSAGE,
                    modalities: ["text", "audio"],
                    temperature: 0.8,
                }
            };

            console.log('Sending session update:', JSON.stringify(sessionUpdate));
            openAiWs.send(JSON.stringify(sessionUpdate));

            const initialConversationItem = {
                type: 'conversation.item.create',
                item: {
                    type: 'message',
                    role: 'user',
                    content: [
                        {
                            type: 'input_text',
                            text: 'Greet the user with "Hello there! I\'m an AI voice assistant from Twilio and the OpenAI Realtime API. How can I help?"'
                        }
                    ]
                }
            };

            openAiWs.send(JSON.stringify(initialConversationItem));
            openAiWs.send(JSON.stringify({ type: 'response.create' }));
        };

        // Open event for OpenAI WebSocket
        openAiWs.on('open', () => {
            console.log('Connected to the OpenAI Realtime API');
            setTimeout(sendInitialSessionUpdate, 100); // Ensure connection stability, send after .1 second
        });

I explain similar code in more detail in the previous Node.js tutorial. But there are a few differences in this post – here’s a brief explanation of what’s going on here:

  • WebSocket Initialization: You initialize a WebSocket connection to OpenAI's Realtime API.
  • Session Update: We use the sendInitialSessionUpdate function to configure the session with desired settings, such as the AI voice and system message (set above in the constants). Then we send a session.update event to OpenAI to update our session’s configuration (more details). Note that we set the input and output audio formats to g711_ulaw. Twilio Media Streams supports this format, so we don’t have to do any transcoding.
  • AI talks first: Since we’re dialing out, we manually add a conversation item with conversation.item.create, then request a response with response.create.

We send everything 0.1 seconds after the WebSocket opens. This gives OpenAI time to send its default session configuration before we send our preferences.

If you modify the greeting, we suggest that you always disclose that one side of your conversation is powered by AI.
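If you plan to experiment with different voices or greetings, it can help to build these three events with small functions. Here's a sketch that produces the same event shapes as the code above. The function names buildSessionUpdate and buildGreetingEvents are hypothetical, and the sample instructions and greeting are placeholders.

```javascript
// Hypothetical builders for the events sent in sendInitialSessionUpdate.
// They only create plain objects; sending them is still up to openAiWs.send().
function buildSessionUpdate({ voice, instructions, temperature = 0.8 }) {
  return {
    type: 'session.update',
    session: {
      turn_detection: { type: 'server_vad' },
      input_audio_format: 'g711_ulaw',  // matches Twilio Media Streams audio
      output_audio_format: 'g711_ulaw',
      voice,
      instructions,
      modalities: ['text', 'audio'],
      temperature,
    },
  };
}

function buildGreetingEvents(greeting) {
  return [
    {
      type: 'conversation.item.create',
      item: {
        type: 'message',
        role: 'user',
        content: [{ type: 'input_text', text: `Greet the user with "${greeting}"` }],
      },
    },
    { type: 'response.create' }, // asks the model to respond right away
  ];
}

// Placeholder values; the greeting discloses that the caller is an AI
const sessionUpdate = buildSessionUpdate({ voice: 'alloy', instructions: 'You are a helpful assistant.' });
const greetingEvents = buildGreetingEvents('Hello there! I am an AI voice assistant. How can I help?');

console.log(JSON.stringify(sessionUpdate, null, 2));
console.log(JSON.stringify(greetingEvents, null, 2));
```

Inside the open handler, you would send sessionUpdate first and then each greeting event, in order, with JSON.stringify.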

Step 4.7 Handle OpenAI and Twilio WebSocket messages

Next, you'll need to handle messages from both OpenAI’s and Twilio’s WebSockets, proxying audio data between the two.

Place this code right below the previous segment (it’s a bit longer, I’ll explain more after):

        // Listen for messages from the OpenAI WebSocket (and send to Twilio if necessary)
        openAiWs.on('message', (data) => {
            try {
                const response = JSON.parse(data);

                if (LOG_EVENT_TYPES.includes(response.type)) {
                    console.log(`Received event: ${response.type}`, response);
                }

                if (response.type === 'session.updated') {
                    console.log('Session updated successfully:', response);
                }

                if (response.type === 'response.audio.delta' && response.delta) {
                    const audioDelta = {
                        event: 'media',
                        streamSid: streamSid,
                        media: { payload: Buffer.from(response.delta, 'base64').toString('base64') }
                    };
                    connection.send(JSON.stringify(audioDelta));
                }
            } catch (error) {
                console.error('Error processing OpenAI message:', error, 'Raw message:', data);
            }
        });

        // Handle incoming messages from Twilio
        connection.on('message', (message) => {
            try {
                const data = JSON.parse(message);

                switch (data.event) {
                    case 'media':
                        if (openAiWs.readyState === WebSocket.OPEN) {
                            const audioAppend = {
                                type: 'input_audio_buffer.append',
                                audio: data.media.payload
                            };

                            openAiWs.send(JSON.stringify(audioAppend));
                        }
                        break;
                    case 'start':
                        streamSid = data.start.streamSid;
                        console.log('Incoming stream has started', streamSid);
                        break;
                    default:
                        console.log('Received non-media event:', data.event);
                        break;
                }
            } catch (error) {
                console.error('Error parsing message:', error, 'Message:', message);
            }
        });

        // Handle connection close
        connection.on('close', () => {
            if (openAiWs.readyState === WebSocket.OPEN) openAiWs.close();
            console.log('Client disconnected.');
        });

        // Handle WebSocket close and errors
        openAiWs.on('close', () => {
            console.log('Disconnected from the OpenAI Realtime API');
        });

        openAiWs.on('error', (error) => {
            console.error('Error in the OpenAI WebSocket:', error);
        });
    });
});

Here’s the general algorithm for this code:

  • Event Checking: For each incoming message – from either WebSocket – determine its type and, if necessary, shuttle it over to the other channel. Specifically, media messages from Twilio contain audio data, while response.audio.delta contains audio from OpenAI.
  • Handle WebSocket Start for Twilio: When Twilio sends its start event, save the streamSid (we need it to address audio back to the call) and log that the stream started. We don’t send any sort of configuration update here; Twilio expects audio/x-mulaw data by default so we can work with a default configuration.
  • Handle WebSocket close messages gracefully: deal with call ends, socket closures, and errors.

For simplicity, this code doesn’t implement interruption handling. After finishing the tutorial, see our repo for one way to handle interruptions.
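The shuttling logic above boils down to two small translations. Here's a sketch of them as pure functions so you can see the message shapes side by side. The function names are hypothetical and the payloads are fake. This sketch also forwards the base64 payload as-is rather than decoding and re-encoding it, which I'm assuming is equivalent whenever the payload is already valid base64.

```javascript
// Hypothetical translators between the two WebSocket message formats.
// Each returns null when the message isn't one we need to forward.

// Twilio "media" message -> OpenAI "input_audio_buffer.append" event
function twilioMediaToOpenAi(twilioMessage) {
  if (twilioMessage.event !== 'media') return null;
  return {
    type: 'input_audio_buffer.append',
    audio: twilioMessage.media.payload, // base64-encoded mu-law audio
  };
}

// OpenAI "response.audio.delta" event -> Twilio "media" message
function openAiDeltaToTwilio(openAiEvent, streamSid) {
  if (openAiEvent.type !== 'response.audio.delta' || !openAiEvent.delta) return null;
  return {
    event: 'media',
    streamSid, // identifies which Twilio stream should play the audio
    media: { payload: openAiEvent.delta },
  };
}

// Fake messages for illustration
const toOpenAi = twilioMediaToOpenAi({ event: 'media', media: { payload: 'AAAA' } });
const toTwilio = openAiDeltaToTwilio({ type: 'response.audio.delta', delta: 'BBBB' }, 'MZ123');
const ignored = twilioMediaToOpenAi({ event: 'mark' });

console.log(toOpenAi, toTwilio, ignored);
```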

Step 4.8 Initialize and launch the server

Finally, we set up code to launch our server when you run index.js. But compared to our previous tutorial, this time we also initiate an outbound call before hitting that media-stream route.

Paste this at the end of your file:

// Initialize server
fastify.listen({ port: PORT }, (err) => {
    if (err) {
        console.error(err);
        process.exit(1);
    }
    console.log(`Server is listening on port ${PORT}`);

    // Parse command-line arguments to get the phone number
    const args = process.argv.slice(2);
    const phoneNumberArg = args.find(arg => arg.startsWith('--call='));
    if (!phoneNumberArg) {
        console.error('Please provide a phone number to call, e.g., --call=+18885551212');
        process.exit(1);
    }
    console.log(
        'Our recommendation is to always disclose the use of AI for outbound or inbound calls.\n'+
        'Reminder: all of the rules of TCPA apply even if a call is made by AI \n' +
        'Check with your counsel for legal and compliance advice.'
    );
    const phoneNumberToCall = phoneNumberArg.split('=')[1].trim();
    console.log('Calling ', phoneNumberToCall);
    makeCall(phoneNumberToCall);
});

Here, we check that when you launch the server, you pass in a --call parameter, for example --call=+18885551212. If you do, we’ll run through the earlier logic to check you can make outbound calls, then initiate a call to the number you passed in.
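The code above accepts whatever follows --call= and lets Twilio reject anything malformed. If you'd like to catch typos earlier, here's a sketch that also checks the value looks like an E.164 number. The function name parseCallArgument and the regular expression are my own additions, not part of the tutorial code, and the pattern only checks the general shape (a plus sign followed by up to 15 digits).

```javascript
// Rough E.164 shape check: "+" then 2 to 15 digits, not starting with 0.
// This is an illustrative pattern, not a full phone number validator.
const E164_PATTERN = /^\+[1-9]\d{1,14}$/;

// Hypothetical helper: returns the number from --call=..., or null
// if the flag is missing or the value doesn't look like E.164.
function parseCallArgument(args) {
  const callArg = args.find((arg) => arg.startsWith('--call='));
  if (!callArg) return null;
  const phoneNumber = callArg.slice('--call='.length).trim();
  return E164_PATTERN.test(phoneNumber) ? phoneNumber : null;
}

// Example argument lists; in index.js you would pass process.argv.slice(2)
console.log(parseCallArgument(['--call=+18885551212']));
console.log(parseCallArgument(['--call=8885551212']));
console.log(parseCallArgument([]));
```

Only the first example returns a number; the other two return null, so you could print the usage hint and exit.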

Okay, great! You’re good to go. Close the file, and let’s run and test the code.

Run and test your code

In the next steps, I’ll cover how to get the code to run so you can have the AI make an outbound call to you.

Step 1: Launch ngrok

You need to use ngrok or a similar service (or a VPS or another solution, etc.) to expose your server to the internet. Twilio requires a public URL to send requests to your server and to receive instructions back from your code.

I’ll provide instructions for ngrok in this post. You can find other reverse proxy or tunneling options here, and some notes on further options here.

Download and install ngrok if you still need to, then run the following command. If you have changed the port from 6060, be sure to update it here:

ngrok http 6060

Step 1.1 Set the DOMAIN variable

Remember earlier when I told you to wait on the DOMAIN variable in the .env file? Let’s set it now. When you launch ngrok, you’ll see a screen like the following:

Terminal window showing Ngrok session status, update availability, region, latency, and web interface forwarding URL.

In your .env file, you’ll want to change DOMAIN to the Forwarding address from ngrok, without the protocol (https:// in my image).

Here’s an example using my .env (with fake values, other than DOMAIN):

OPENAI_API_KEY=sk-proj-U.........
TWILIO_ACCOUNT_SID=ACe......
TWILIO_AUTH_TOKEN=........
PHONE_NUMBER_FROM=+140120.....
DOMAIN=a1fe24b64cad.ngrok.app

Save that – let’s continue.

Step 2: Run the Twilio Dev Phone

If you’d rather not use the Dev Phone, add a Verified Caller ID to Twilio for a phone that can receive an inbound call, then call that number in the next step instead.

As you saw, we have a filter function which makes sure we’re only calling numbers we have permission to call. While you’ll write a different function for your use case, my demo function allows you to call Twilio numbers you own.

If you haven’t yet, go through the Twilio Dev Phone tutorial. It will ask you to install the Twilio CLI, and add your account credentials.

When you’re done, run twilio dev-phone in your console. You should see a screen like this:

Screenshot of Twilio Dev Phone setup page with phone number selection and configuration options.

In the Phone Number box, choose the Twilio number you’re going to call. If – like me – you have that number configured, it’ll warn you before overwriting the config. Double check the number is okay to use, then hit Use this phone number.

Step 3: Place an outbound call

We’re almost there, can you feel it? Well, you’re about to hear it – run the following in your console, replacing the placeholder number with your Twilio Dev Phone number (or alternatively, a Verified Caller ID number):

node index.js --call=+18005551212

Accept the call from the Dev Phone (or your other device), and you should hear a greeting from the AI. Enjoy your chat!

Debugging your setup

Assuming your server is running, here are the first places to check if you have issues placing an outbound call:

Conclusion

Congratulations! You successfully created an AI voice assistant that will place an outbound call using Twilio Voice and the OpenAI Realtime API. The code is now ready for your modifications, though be sure to check our Node app first to see if we already have a demo.

Happy chatting!


Paul Kamp is the Technical Editor-in-Chief of the Twilio Blog. He had the AI call his wife quite a few times while creating this tutorial. (Sorry Christine!) You can reach him at pkamp [at] twilio.com.