How to Make Outgoing Calls with Twilio Voice, the OpenAI Realtime API, and Python

November 14, 2024
Written by
Paul Kamp
Twilion

OpenAI recently launched the Realtime API, exposing the multimodal capabilities of the GPT-4o model. When it launched, we posted our tutorial on how you could build a voice AI assistant in Python. Since then, many of you have asked for a demonstration of how to have the AI call out to a number.

Don’t worry, I’ve got you covered. In this tutorial, I’ll show you how to make an outbound phone call using Python, Twilio Voice and Media Streams, and the OpenAI Realtime API. I’ll show an example filter function, which demonstrates how to check if a phone number is allowed to be called, then (assuming it is!) begins a phone call. Finally, after a user picks up the call, we’ll have OpenAI’s Realtime API talk first to kick off a conversation.

Sounds good? Well, the AI will sound even better… let’s code.

Prerequisites

To follow along, ensure you have:

  • Python 3.9+ installed. Download it from here. (I used 3.9.13 here, but newer versions should work too. Verify your version if issues arise.)
  • A Twilio account. If you don’t have one yet, you can sign up for a free trial here.
  • A Twilio number with Voice capabilities to make an outbound call. Here are instructions to purchase one.
  • An OpenAI account and an OpenAI API Key with OpenAI Realtime API access. Sign up here to get one.
  • ngrok or another tunneling solution to expose your local server to the internet for testing. You can download ngrok here.
  • Either:
      • A second Twilio phone number where you can place a call using the Twilio Dev Phone, or
      • A phone number for a device where you can receive phone calls, which you’ve added to your Twilio Verified Caller IDs. You can find a tutorial here.

Awesome, let’s do this.

Build the Python outbound AI call application

Step 1: Set up your project

To start, create a project directory and set up your Python environment:

mkdir outbound-calling-speech-assistant-openai-realtime-api-python
cd outbound-calling-speech-assistant-openai-realtime-api-python
python -m venv venv

As you can see there, we’ll do our work in a virtual environment. Activate the virtual environment:

  • On Windows: .\venv\Scripts\activate
  • On macOS/Linux: source venv/bin/activate

Step 2: Install the required packages

Once the virtual environment is active, install the necessary Python packages using pip:

pip install fastapi uvicorn twilio websockets python-dotenv

These packages provide the tools needed to handle HTTP requests and WebSockets, and to simplify interactions with Twilio and OpenAI.

I’m using FastAPI here, just like in the Python inbound OpenAI Realtime example. I found it more straightforward to handle websockets and the asynchronous code than some other frameworks.
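
If you want to confirm the install worked before writing any code, a quick check along these lines can help. This is a minimal sketch and not part of the tutorial's application: missing_modules is a hypothetical helper name, and the list holds the import names I'd expect for the packages above (python-dotenv is imported as dotenv).

```python
from importlib.util import find_spec

# Import names for the packages installed above (python-dotenv imports as "dotenv").
REQUIRED_MODULES = ["fastapi", "uvicorn", "twilio", "websockets", "dotenv"]

def missing_modules(names):
    """Return the module names that cannot be found in the active environment."""
    return [name for name in names if find_spec(name) is None]

if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

Run it inside the activated virtual environment; if anything is listed as missing, re-run the pip install command.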

Step 3: Create the project files

We will create a file named main.py for our main server code. We’ll also use a .env file to store sensitive environment variables. (More information on this strategy here.)

Create a .env file to securely store API keys and other variables:

touch .env

Add the following to your .env file, replacing my placeholders with your actual keys. Find your TWILIO_ACCOUNT_SID and TWILIO_AUTH_TOKEN in your Twilio Console. The PHONE_NUMBER_FROM should be the Twilio phone number you purchased in the Prerequisites, formatted as E.164 (e.g., +18885551212). Leave DOMAIN unset for now; we'll fill it in once ngrok is running. You can copy my PORT value of 6060.

TWILIO_ACCOUNT_SID="your_twilio_account_sid"
TWILIO_AUTH_TOKEN="your_twilio_auth_token"
PHONE_NUMBER_FROM="your_twilio_phone_number"
DOMAIN="your_ngrok_domain"
OPENAI_API_KEY="your_openai_api_key"
PORT=6060

Now, create the main.py file:

touch main.py

Great! Now, open main.py with your favorite text editor or IDE and let’s get to it.

Step 4: Write the server code

With the project's structure ready, the following steps will guide you through writing the server code. I’ll try to explain the trickier parts, but you can skip the explanations for the parts you understand (and paste the code directly).

Step 4.1 Import dependencies, load environment variables, and define constants

import os
import json
import base64
import asyncio
import argparse
from fastapi import FastAPI, WebSocket, BackgroundTasks
from fastapi.responses import JSONResponse
from fastapi.websockets import WebSocketDisconnect
from twilio.rest import Client
import websockets
from dotenv import load_dotenv
import uvicorn
import re

load_dotenv()

# Configuration
TWILIO_ACCOUNT_SID = os.getenv('TWILIO_ACCOUNT_SID')
TWILIO_AUTH_TOKEN = os.getenv('TWILIO_AUTH_TOKEN')
PHONE_NUMBER_FROM = os.getenv('PHONE_NUMBER_FROM')
OPENAI_API_KEY = os.getenv('OPENAI_API_KEY')
raw_domain = os.getenv('DOMAIN', '')
DOMAIN = re.sub(r'(^\w+:|^)\/\/|\/+$', '', raw_domain) # Strip protocols and trailing slashes from DOMAIN

PORT = int(os.getenv('PORT', 6060))
SYSTEM_MESSAGE = (
    "You are a helpful and bubbly AI assistant who loves to chat about "
    "anything the user is interested in and is prepared to offer them facts. "
    "You have a penchant for dad jokes, owl jokes, and rickrolling – subtly. "
    "Always stay positive, but work in a joke when appropriate."
)
VOICE = 'alloy'
LOG_EVENT_TYPES = [
    'error', 'response.content.done', 'rate_limits.updated', 'response.done',
    'input_audio_buffer.committed', 'input_audio_buffer.speech_stopped',
    'input_audio_buffer.speech_started', 'session.created'
]

app = FastAPI()

if not (TWILIO_ACCOUNT_SID and TWILIO_AUTH_TOKEN and PHONE_NUMBER_FROM and OPENAI_API_KEY):
    raise ValueError('Missing Twilio and/or OpenAI environment variables. Please set them in the .env file.')

# Initialize Twilio client
client = Client(TWILIO_ACCOUNT_SID, TWILIO_AUTH_TOKEN)

As you can see, we first import all of the packages we’ll use, then load all the environment variables in the .env file (that we discussed above) using load_dotenv(). We then initialize a FastAPI instance for routing as well as the Twilio client we’ll be using to make our outbound call.

We also define the system message, voice, and server port. Then, we choose the OpenAI events to log to the console.

SYSTEM_MESSAGE holds the instructions we send to OpenAI, which control the AI’s behavior during the phone call, while VOICE controls how the AI will sound. (You can find more information in OpenAI’s Realtime API Reference.)
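
One line in the configuration that's easy to skim past is the regular expression that cleans up DOMAIN. Here's a small sketch that isolates it so you can see what it does to a few inputs. The normalize_domain wrapper is a name I'm introducing for illustration; the pattern itself is the one from the code above.

```python
import re

def normalize_domain(raw_domain: str) -> str:
    """Strip a leading protocol (such as https://) and any trailing slashes."""
    return re.sub(r'(^\w+:|^)\/\/|\/+$', '', raw_domain)

# Sample inputs: with a protocol, already clean, and with extra trailing slashes
for value in (
    "https://a1fe24b64cad.ngrok.app/",
    "a1fe24b64cad.ngrok.app",
    "wss://a1fe24b64cad.ngrok.app///",
):
    print(f"{value!r} -> {normalize_domain(value)!r}")
```

All three inputs reduce to the bare hostname, which is why pasting the full forwarding URL into .env should still work.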

Step 4.2 Define FastAPI Routes for HTTP and WebSocket handling

After the above code, implement the main HTTP and WebSocket routes for server interactions:

@app.get('/', response_class=JSONResponse)
async def index_page():
    return {"message": "Twilio Media Stream Server is running!"}

@app.websocket('/media-stream')
async def handle_media_stream(websocket: WebSocket):
    """Handle WebSocket connections between Twilio and OpenAI."""
    print("Client connected")
    await websocket.accept()

    async with websockets.connect(
        'wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview-2024-10-01',
        extra_headers={
            "Authorization": f"Bearer {OPENAI_API_KEY}",
            "OpenAI-Beta": "realtime=v1"
        }
    ) as openai_ws:
        await initialize_session(openai_ws)
        stream_sid = None

        async def receive_from_twilio():
            """Receive audio data from Twilio and send it to the OpenAI Realtime API."""
            nonlocal stream_sid
            try:
                async for message in websocket.iter_text():
                    data = json.loads(message)
                    if data['event'] == 'media' and openai_ws.open:
                        audio_append = {
                            "type": "input_audio_buffer.append",
                            "audio": data['media']['payload']
                        }
                        await openai_ws.send(json.dumps(audio_append))
                    elif data['event'] == 'start':
                        stream_sid = data['start']['streamSid']
                        print(f"Incoming stream has started {stream_sid}")
            except WebSocketDisconnect:
                print("Client disconnected.")
                if openai_ws.open:
                    await openai_ws.close()

        async def send_to_twilio():
            """Receive events from the OpenAI Realtime API, send audio back to Twilio."""
            nonlocal stream_sid
            try:
                async for openai_message in openai_ws:
                    response = json.loads(openai_message)
                    if response['type'] in LOG_EVENT_TYPES:
                        print(f"Received event: {response['type']}", response)
                    if response['type'] == 'session.updated':
                        print("Session updated successfully:", response)
                    if response['type'] == 'response.audio.delta' and response.get('delta'):
                        try:
                            audio_payload = base64.b64encode(base64.b64decode(response['delta'])).decode('utf-8')
                            audio_delta = {
                                "event": "media",
                                "streamSid": stream_sid,
                                "media": {
                                    "payload": audio_payload
                                }
                            }
                            await websocket.send_json(audio_delta)
                        except Exception as e:
                            print(f"Error processing audio data: {e}")
            except Exception as e:
                print(f"Error in send_to_twilio: {e}")
        await asyncio.gather(receive_from_twilio(), send_to_twilio())

The /media-stream WebSocket route maintains a live connection for continuous data exchange between Twilio and OpenAI. As audio events come in, audio is proxied between the two – response.audio.delta from OpenAI, and media payloads from Twilio.
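
To make the proxying easier to picture, here's a minimal sketch of just the message translation, pulled out of the WebSocket loops. The two function names and the sample values are hypothetical; the message shapes mirror the ones in the route above. The route decodes and re-encodes the delta, which for well-formed base64 should give back the same string, so this sketch passes it through unchanged.

```python
import json

def twilio_media_to_openai(twilio_event: dict) -> dict:
    """Wrap the base64 audio payload from a Twilio 'media' event for OpenAI."""
    return {
        "type": "input_audio_buffer.append",
        "audio": twilio_event["media"]["payload"],
    }

def openai_delta_to_twilio(openai_event: dict, stream_sid: str) -> dict:
    """Wrap the base64 audio from a 'response.audio.delta' event for Twilio."""
    return {
        "event": "media",
        "streamSid": stream_sid,
        "media": {"payload": openai_event["delta"]},
    }

# Made-up sample values, for illustration only
sample_twilio_event = {"event": "media", "media": {"payload": "AAEC"}}
sample_openai_event = {"type": "response.audio.delta", "delta": "AwQF"}

print(json.dumps(twilio_media_to_openai(sample_twilio_event)))
print(json.dumps(openai_delta_to_twilio(sample_openai_event, "MZ-example")))
```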

For simplicity, this code doesn’t implement interruption handling. After finishing the tutorial, see our repo for one way to handle interruptions.

There is a lot going on here. I’m skipping some explanations, but you can read more details in our initial tutorial.

Step 4.3 Set up the initial OpenAI Session

Next, we initialize the session with OpenAI to configure our phone interaction, and send a conversation item to get the AI to talk first. Paste this next:

async def send_initial_conversation_item(openai_ws):
    """Send initial conversation so AI talks first."""
    initial_conversation_item = {
        "type": "conversation.item.create",
        "item": {
            "type": "message",
            "role": "user",
            "content": [
                {
                    "type": "input_text",
                    "text": (
                        "Greet the user with 'Hello there! I am an AI voice assistant powered by "
                        "Twilio and the OpenAI Realtime API. You can ask me for facts, jokes, or "
                        "anything you can imagine. How can I help you?'"
                    )
                }
            ]
        }
    }
    await openai_ws.send(json.dumps(initial_conversation_item))
    await openai_ws.send(json.dumps({"type": "response.create"}))

async def initialize_session(openai_ws):
    """Control initial session with OpenAI."""
    session_update = {
        "type": "session.update",
        "session": {
            "turn_detection": {"type": "server_vad"},
            "input_audio_format": "g711_ulaw",
            "output_audio_format": "g711_ulaw",
            "voice": VOICE,
            "instructions": SYSTEM_MESSAGE,
            "modalities": ["text", "audio"],
            "temperature": 0.8,
        }
    }
    print('Sending session update:', json.dumps(session_update))
    await openai_ws.send(json.dumps(session_update))

    # Have the AI speak first
    await send_initial_conversation_item(openai_ws)

I explain similar code in more detail in the previous Python tutorial. However, you’re here, so here’s a brief explanation of what’s going on… well, here:

  • Session Update/Initialization: The initialize_session function builds a session.update event with our desired settings, such as the AI voice and system message (set in the constants in Step 4.1), and sends it to OpenAI to update our session’s configuration (more details can be found here). Another important detail is that we set the input and output audio formats to g711_ulaw. Twilio Media Streams supports this format, so we don’t have to do any transcoding.
  • AI talks first: This code is new for this tutorial. Since we’re dialing a number, we want the AI to talk when the call is picked up. We send a manual conversation update with conversation.item.create and response.create. This causes the OpenAI Realtime API to “go first” in this conversation, and greet the person who answers the phone.

Be cautious when modifying the greeting. We suggest that you always disclose that one side of your conversation is powered by AI.
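
If you do want to experiment with the greeting, it can help to separate building the events from sending them. The sketch below assumes a hypothetical build_greeting_events helper (not part of the tutorial's code) that returns the same two events as send_initial_conversation_item, in the order they need to be sent.

```python
import json

def build_greeting_events(greeting: str) -> list:
    """Return the two events that make the AI speak first, in send order."""
    item_create = {
        "type": "conversation.item.create",
        "item": {
            "type": "message",
            "role": "user",
            "content": [
                {"type": "input_text", "text": f"Greet the user with '{greeting}'"}
            ],
        },
    }
    # response.create asks the model to respond to the item we just added
    return [item_create, {"type": "response.create"}]

events = build_greeting_events(
    "Hello there! I am an AI voice assistant. How can I help you?"
)
for event in events:
    print(json.dumps(event))
```

Inside the server, each event would then go out with await openai_ws.send(json.dumps(event)), just as the original function does.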

Step 4.4 Implement the outbound call functionality

In this section, we'll implement the functionality to make an outbound call using the Twilio API. This involves verifying that you are allowed to make calls to the number you specify, and only then making the call.

Step 4.4.1 Phone number validation

Next, paste in my example phone number validation code:

async def check_number_allowed(to):
    """Check if a number is allowed to be called."""
    try:
        # Uncomment these lines to test numbers. Only add numbers you have permission to call.
        # OVERRIDE_NUMBERS = ['+18005551212']
        # if to in OVERRIDE_NUMBERS:
        #     return True

        incoming_numbers = client.incoming_phone_numbers.list(phone_number=to)
        if incoming_numbers:
            return True

        outgoing_caller_ids = client.outgoing_caller_ids.list(phone_number=to)
        if outgoing_caller_ids:
            return True

        return False
    except Exception as e:
        print(f"Error checking phone number: {e}")
        return False

This function checks if the given phone number to is allowed to receive calls from your application.

Working through exactly who you are allowed to call is beyond the scope of this tutorial, but if it’s a Twilio phone number you control or one of your validated Outgoing Caller IDs, it’s a safe bet. client.incoming_phone_numbers.list(phone_number=to) is checking the former, while client.outgoing_caller_ids.list(phone_number=to) is checking the latter.
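
One thing the function above doesn't do is check the shape of the number before asking Twilio about it. As an optional extra, and purely as an assumption on my part rather than something the tutorial's code requires, you could reject anything that doesn't look like E.164 first. The looks_like_e164 helper is a hypothetical name.

```python
import re

# E.164: a plus sign, then up to 15 digits in total, with no leading zero.
E164_PATTERN = re.compile(r'\+[1-9]\d{1,14}')

def looks_like_e164(number: str) -> bool:
    """Return True if the whole string is a plus sign followed by 2 to 15 digits."""
    return E164_PATTERN.fullmatch(number) is not None

for candidate in ("+18885551212", "8885551212", "+1 (888) 555-1212"):
    print(candidate, looks_like_e164(candidate))
```

This is only a format check. It says nothing about whether you have permission to call the number, so it would sit in front of check_number_allowed, not replace it.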

Step 4.4.2 Create the outbound call function and a Call SID logger

Next, paste in the outbound calling code:

async def make_call(phone_number_to_call: str):
    """Make an outbound call."""
    if not phone_number_to_call:
        raise ValueError("Please provide a phone number to call.")

    is_allowed = await check_number_allowed(phone_number_to_call)
    if not is_allowed:
        raise ValueError(f"The number {phone_number_to_call} is not recognized as a valid outgoing number or caller ID.")

    # Ensure compliance with applicable laws and regulations
    # All of the rules of TCPA apply even if a call is made by AI.
    # Do your own diligence for compliance.

    outbound_twiml = (
        f'<?xml version="1.0" encoding="UTF-8"?>'
        f'<Response><Connect><Stream url="wss://{DOMAIN}/media-stream" /></Connect></Response>'
    )

    call = client.calls.create(
        from_=PHONE_NUMBER_FROM,
        to=phone_number_to_call,
        twiml=outbound_twiml
    )

    await log_call_sid(call.sid)

async def log_call_sid(call_sid):
    """Log the call SID."""
    print(f"Call started with SID: {call_sid}")

The make_call function initiates an outbound call to the specified phone number using Twilio's Python Helper Library. When the call connects, Twilio opens a connection to your WebSocket route to start proxying audio between OpenAI and Twilio. (The TwiML that tells Twilio to do that is in the outbound_twiml variable.)

Finally, we define the log_call_sid function to print out the Call SID when we make the outbound call.
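
The TwiML string is short, but the nesting matters: a Stream inside a Connect inside a Response. Here's a sketch that builds the same document with Python's standard library and parses it back, which is a handy way to check the URL you're about to hand to Twilio. The build_stream_twiml helper is a hypothetical name, and the domain is the example value from my .env later in this post.

```python
import xml.etree.ElementTree as ET

def build_stream_twiml(domain: str) -> str:
    """Build <Response><Connect><Stream url=... /></Connect></Response>."""
    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    ET.SubElement(connect, "Stream", url=f"wss://{domain}/media-stream")
    body = ET.tostring(response, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>' + body

twiml = build_stream_twiml("a1fe24b64cad.ngrok.app")
print(twiml)

# Parse it back to confirm the nesting and the WebSocket URL
root = ET.fromstring(twiml.encode("utf-8"))
stream = root.find("./Connect/Stream")
print(stream.get("url"))
```

The hand-written f-string in make_call produces the same structure; building it with an XML library mostly buys you escaping if the URL ever contains special characters.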

Making outbound calls requires that you comply with the various rules and regulations in your jurisdiction. For example, in the United States, your outbound calls have to comply with the Telephone Consumer Protection Act (or TCPA). We ask that you seek your own counsel when determining whether your usage is compliant. Your app also has to comply with Twilio’s Terms of Service and Voice Services Policies.

Step 4.5 Launch the server

Next, we’ll run through our logic while starting the server. Paste this at the end of main.py, then save.

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Run the Twilio AI voice assistant server.")
    parser.add_argument('--call', required=True, help="The phone number to call, e.g., '--call=+18005551212'")
    args = parser.parse_args()

    phone_number = args.call
    print(
        'Our recommendation is to always disclose the use of AI for outbound or inbound calls.\n'
        'Reminder: All of the rules of TCPA apply even if a call is made by AI.\n'
        'Check with your counsel for legal and compliance advice.'
    )

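    # Place the call first, then start the server that handles the media stream.
    # asyncio.run() avoids the deprecated get_event_loop() pattern on newer Python versions.
    asyncio.run(make_call(phone_number))
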
    uvicorn.run(app, host="0.0.0.0", port=PORT)

This segment employs argument parsing for phone number input, then executes the call setup and starts the server using Uvicorn.

You must pass in a --call parameter when you start the code, for example --call=+18885551212. (That’s controlled with required=True.) If you do, we’ll run through the logic to check your outbound call permissions, then initiate the call.
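
If you haven't used argparse much, this small sketch shows the behavior in isolation: the parser is the same as the one above, wrapped in a hypothetical build_parser function so it can be exercised without starting the server or placing a call.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Create the same parser the server uses, with a required --call argument."""
    parser = argparse.ArgumentParser(description="Run the Twilio AI voice assistant server.")
    parser.add_argument('--call', required=True, help="The phone number to call, e.g., '--call=+18005551212'")
    return parser

parser = build_parser()

# With the argument present, the number is available as args.call
args = parser.parse_args(['--call=+18885551212'])
print(args.call)

# Without it, argparse prints a usage message and exits before any call is made
try:
    parser.parse_args([])
except SystemExit as exit_info:
    print("Missing --call, exit code:", exit_info.code)
```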

Okay, awesome! Let’s move on to running and testing it.

Run and test your code

In the next steps, I’ll cover how to get the code to run so you can have the AI make an outbound call.

Step 1: Launch ngrok

You need to use ngrok or a similar product (a VPS, reverse proxy, etc.) to expose your server to Twilio.

I’ll provide instructions for using ngrok here. You can find other reverse proxy or tunneling options here, and some notes on further options.

Run the following command. (If you changed the port from 6060, update it here):

ngrok http 6060

Step 1.1 Set the DOMAIN variable

Earlier, we left the DOMAIN variable in the .env file blank – let’s set it now.

Screenshot showing Ngrok session status as online with update available and forwarding URLs provided.

Copy the Forwarding address from ngrok, without the protocol (omitting the https:// in my image).

Here’s an example using my .env (with fake values other than DOMAIN and PORT):

OPENAI_API_KEY=sk-proj-U.........
TWILIO_ACCOUNT_SID=ACe......
TWILIO_AUTH_TOKEN=........
PHONE_NUMBER_FROM=+140120.....
DOMAIN=a1fe24b64cad.ngrok.app
PORT=6060

Step 2: Set up the Twilio Dev Phone

You can instead add a Verified Caller ID to Twilio for a phone that can receive an inbound call. My cell phone is verified, so I could test this tutorial by calling both my cell and my Dev Phone.

Further up the digital page, we built a filter function which makes sure we’re only calling numbers we have permission to call. One part of that function allows you to call Twilio numbers you own.

If you’re new to the Dev Phone, go through the Twilio Dev Phone tutorial. It will ask you to install the Twilio CLI and add your account credentials.

When you’re done, run twilio dev-phone in your console. A screen should pop up that looks like this:

Interface for configuring Twilio Dev Phone with a welcome message and options to select and configure a phone number.

In the Phone Number box, choose the Twilio number you’ll call to test this app. If you have that number configured, it’ll warn you before overwriting the config. Quadruple check the number is okay to use (there’s no Undo!), then hit Use this phone number.

Step 3: Place an outbound call

Run the following in your console, replacing the placeholder number with your Twilio Dev Phone number (or, alternatively, a Verified Caller ID number):

python main.py --call=+18005551212

Pick it up and you’ll hear a greeting – go ahead and respond. Enjoy your call with the Realtime API!

Debugging your setup

Assuming your server is running, here are the first places to check if you have issues placing an outbound call:

Conclusion

Congratulations! You successfully created an AI voice assistant in Python that places an outbound call using Twilio Voice and the OpenAI Realtime API. The code is now ready for your modifications – but check our Python repo first to see if we’ve already implemented some of your dream functionality.

Have fun!

Next steps:

Paul Kamp is the Technical Editor-in-Chief of the Twilio Blog. He struck up a conversation about various Python frameworks with the AI to test this tutorial. But you don’t have to call to get in touch with him – reach Paul at pkamp [at] twilio.com.