A Minimalist Integration of Twilio and OpenAI Realtime

Time to read: 4 minutes

October 28, 2024

Written by

Phil Bredeson

Twilion

Al Kiramoto

Twilion

Reviewed by

Hao Wang

Twilion

Paul Kamp

Twilion

Building with Twilio and the OpenAI Realtime API

Do you want to talk to OpenAI’s Realtime API on the phone? Here’s the simplest way to get started.

This blog post is the first in a series that will show you how to integrate Twilio Voice with OpenAI to solve common real-world problems.

Interpreting human speech is an incredibly hard problem. For example, when collecting personal information like home addresses, how do you account for the variability across countries’ formats while handling abbreviations, inconsistent input, language and cultural differences, and errors? (Collecting alphanumeric Canadian postal codes, anyone?) Or, when rescheduling an appointment, how can you make sure all the information is accurate and up to date?

AI is uniquely suited to help with this. In this post, we’ll take the first step towards better phone interactions using AI. We’ll show you how to register for an OpenAI API key, set up your environment to build a Twilio Voice and Media Streams integration with OpenAI’s Realtime API using TypeScript, then connect your app to Twilio so you can call it!

Before you begin: architecting the integration

Twilio and OpenAI Clonable app architecture

Integrating OpenAI’s Realtime API with Twilio Voice is straightforward. Twilio connects the PSTN (Public Switched Telephone Network) to a WebSocket-based media stream. OpenAI’s Realtime API ingests the raw audio from the media stream and emits a raw audio stream in response. Your server simply proxies the audio between the two.
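To make the proxying step concrete, here is a minimal sketch of the two message translations involved. It is not the code from the repository. It assumes both sides are configured for the same audio encoding (Twilio Media Streams carry base64-encoded mu-law audio, and the Realtime API can be configured for `g711_ulaw`), and the function names `toOpenAI` and `toTwilio` are hypothetical.

```typescript
// Simplified shapes of the two messages this sketch handles.
interface TwilioMediaEvent {
  event: "media";
  streamSid: string;
  media: { payload: string }; // base64-encoded audio from the caller
}

interface OpenAIAudioDelta {
  type: "response.audio.delta";
  delta: string; // base64-encoded audio generated by the model
}

// Twilio -> OpenAI: wrap the caller's audio chunk in an append event.
// (Hypothetical helper name; assumes matching audio formats on both sides.)
function toOpenAI(msg: TwilioMediaEvent): string {
  return JSON.stringify({
    type: "input_audio_buffer.append",
    audio: msg.media.payload,
  });
}

// OpenAI -> Twilio: wrap the model's audio chunk in a media message
// addressed to the active stream.
function toTwilio(msg: OpenAIAudioDelta, streamSid: string): string {
  return JSON.stringify({
    event: "media",
    streamSid,
    media: { payload: msg.delta },
  });
}
```

In a real server, each function would be called from the `message` handler of the opposite WebSocket, and the result sent on the other socket.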

We created baseline code you can use to jump-start your application development. You can clone the code here.

Prerequisites

To get started you will need:

A Twilio account - sign up for a free developer account here.
A paid OpenAI account - sign up for a free developer account here and upgrade to a paid account here.
ngrok or another tunneling solution to expose your local server to the internet for testing. Download ngrok here.

OpenAI’s Realtime API is in beta. At the time of writing, it offers no client-side authentication, so audio must be relayed through your server in order to authenticate securely.
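As a rough illustration of why the server holds the key, the sketch below builds the WebSocket URL and headers a server would use to open the Realtime connection. The endpoint and the `OpenAI-Beta` header reflect the beta API as we understand it at the time of writing and may change. The function name `realtimeConnection` is hypothetical, and the model name is passed in rather than assumed.

```typescript
// Connection details for a server-side Realtime API WebSocket.
interface RealtimeConnection {
  url: string;
  headers: Record<string, string>;
}

// Hypothetical helper: the API key never leaves the server, which is
// why the caller's audio has to be relayed through it.
function realtimeConnection(apiKey: string, model: string): RealtimeConnection {
  if (!apiKey) {
    throw new Error("OPENAI_API_KEY is required");
  }
  return {
    url: `wss://api.openai.com/v1/realtime?model=${encodeURIComponent(model)}`,
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "OpenAI-Beta": "realtime=v1", // beta opt-in header; may change after beta
    },
  };
}
```

With the `ws` package, the result could be passed along as `new WebSocket(conn.url, { headers: conn.headers })`.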

Putting it together

This project is broken into three sections: setting up OpenAI, setting up your working environment, and working with Twilio.

Set up OpenAI

Log in to your OpenAI account or create a new account. Once logged in, click your profile icon at the top right, then select Billing. Then select Add payment details.

Once your payment method is set up and you have added credits, you are ready to generate an API key.

Select Dashboard on the top right panel, then API keys:

Click + Create new secret key. This is the OPENAI_API_KEY you'll use to make API calls to OpenAI in the next step.

Set up your Application

Clone the repository and install dependencies

Clone this repo.

git clone https://github.com/pBread/twilio-openai-voicebot-simple
cd twilio-openai-voicebot-simple

Install the dependencies:

npm install

Start an ngrok Tunnel

The application needs to know the domain it is deployed to in order to function correctly. Set this domain in the HOSTNAME environment variable before starting the app.

Start ngrok by running this command.

ngrok http 3000

Then copy the domain. This will be your HOSTNAME, in the form your-ngrok-domain.ngrok.app (b8b22eacf803.ngrok.app in the screenshot below):

Add Environment Variables

OPENAI_API_KEY=your-openai-api-key

HOSTNAME=your-ngrok-domain.ngrok.app
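Since the app depends on both variables, it can help to fail fast when one is missing. The sketch below is one way to validate them at startup. It is not taken from the repository: `loadConfig` and `AppConfig` are hypothetical names, and the check that HOSTNAME is a bare domain (no `https://` prefix) is an assumption based on the example value above.

```typescript
// Validated configuration the rest of the app can rely on.
interface AppConfig {
  openaiApiKey: string;
  hostname: string;
}

// Hypothetical helper: reads the two variables from an env-like object
// and throws a descriptive error if either is missing or malformed.
function loadConfig(env: Record<string, string | undefined>): AppConfig {
  const openaiApiKey = env["OPENAI_API_KEY"]?.trim();
  const hostname = env["HOSTNAME"]?.trim();

  if (!openaiApiKey) {
    throw new Error("OPENAI_API_KEY is not set");
  }
  if (!hostname) {
    throw new Error("HOSTNAME is not set");
  }
  // Assumption: HOSTNAME is a bare domain such as your-ngrok-domain.ngrok.app.
  if (/^[a-z]+:\/\//i.test(hostname)) {
    throw new Error("HOSTNAME should be a bare domain without a scheme");
  }
  return { openaiApiKey, hostname };
}
```

In a Node.js app this would typically be called once at startup as `loadConfig(process.env)`.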

Run the App

The following command starts the Express server, which handles incoming Twilio webhook requests and Media Streams.

npm run dev

Set up your Twilio account

Go to twilio.com and sign up for a free account (or log in to your existing account).

If you don't currently own a Twilio phone number with Voice functionality, you'll need to purchase one. Follow this guide to purchase one.

Once you have purchased the number, go to Voice Configuration and, under A call comes in, select Webhook from the drop-down.

Configure the Voice webhooks for your Twilio phone number:

- Incoming Call Webhook: Select POST and set the URL to: https://your-ngrok-domain.ngrok.app/incoming-call

- Call Status Update Webhook: Select POST and set the URL to: https://your-ngrok-domain.ngrok.app/call-status-update

Click Save Configuration.
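For context on what the incoming-call webhook does, a handler like this usually responds with TwiML that tells Twilio to open a bidirectional Media Stream back to your server. The sketch below builds such a response as a string. It is an illustration rather than the repository's code: the function name `incomingCallTwiml` and the `/media-stream` path are assumptions, so check the cloned app for the actual route.

```typescript
// Escape characters that are not safe inside an XML attribute value.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical helper: returns TwiML that connects the call to a
// WebSocket on the given hostname. The default path is an assumption.
function incomingCallTwiml(hostname: string, path: string = "/media-stream"): string {
  const url = escapeXml(`wss://${hostname}${path}`);
  return (
    `<?xml version="1.0" encoding="UTF-8"?>` +
    `<Response><Connect><Stream url="${url}" /></Connect></Response>`
  );
}
```

An Express route would send this string with a `text/xml` content type in response to the POST from Twilio.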

Test it out

Place a call to your Twilio phone number.

You should see the real-time transcript logged to your local terminal.

> twilio-openai-voice-agent@1.0.0 dev
> nodemon
[nodemon] 3.1.7
[nodemon] to restart at any time, enter `rs`
[nodemon] watching path(s): src/**/*.ts config.js
[nodemon] watching extensions: ts,json,js
[nodemon] starting `ts-node src/index.ts`
2024-10-15T21:49:42.734Z [APP] [INFO] server running on http://localhost:3000
2024-10-15T21:49:46.493Z [TWL] [INFO] incoming-call from +15558880001 to +15558880002
2024-10-15T21:49:47.086Z [OAI] [INFO] openai websocket opened
2024-10-15T21:49:47.244Z [TWL] [INFO] incoming websocket
2024-10-15T21:49:47.296Z [TWL] [SUCCESS] media stream started
2024-10-15T21:49:48.631Z [OAI] [INFO] bot transcript (final):  Hello, how can I help you today?
2024-10-15T21:49:54.186Z [APP] [INFO] user started speaking
2024-10-15T21:50:20.342Z [APP] [INFO] user started speaking
…

Success!!

Taking it to the next level

To recap, we built a simple integration between Twilio Voice and the OpenAI Realtime API that shows the power of voice and AI, and demonstrates how to connect a Twilio number to OpenAI using speech-to-speech.

We hope you will customize this implementation, connect it to your (real) application, and use Twilio Media Streams to connect to OpenAI.

You're now ready to personalize your customer communications, and we can’t wait to see what you build!

Phil Bredeson is a Solutions Architect at Twilio who helps engineers solve complex communication problems. He can be reached at pbredeson [at] twilio.com

Al Kiramoto is a Solutions Architect at Twilio. He lives in Dallas, TX, and enjoys working with customers and solving business problems, as well as good barbecue and Tex-Mex food. He can be reached at akiramoto [at] twilio.com
