A Minimalist Integration of Twilio and OpenAI Realtime
Time to read: 4 minutes
Do you want to talk to OpenAI’s Realtime API on the phone? Here’s the simplest way to get started.
This blog post is the first in a series that will show you how to integrate Twilio Voice with OpenAI to solve common real-world problems.
Understanding and interpreting human speech is an incredibly hard problem. For example, when collecting personal information like home addresses, how do you program for the variability across countries’ formats while handling abbreviations, inconsistent input, language and cultural differences, and errors? (How about collecting Canadian alphanumeric postal codes, anyone?) Or, when rescheduling an appointment, how can you make sure all the information is accurate and up to date?
AI is well suited to help with this. In this post, we’ll take the first step towards better phone interactions using AI. We’ll show you how to register for an OpenAI API key, set up your environment to build a Twilio Voice and Media Streams integration with OpenAI’s Realtime API using TypeScript, and then connect your app to Twilio so you can call it!
Before you begin: architecting the integration
Integrating OpenAI’s Realtime API with Twilio Voice is straightforward. Twilio connects the PSTN (Public Switched Telephone Network) to a WebSocket-based media stream. OpenAI’s Realtime API ingests the raw audio from the media stream and emits a raw audio stream in return. Your server simply proxies the audio between the two.
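To make the proxying idea concrete, here is a minimal sketch of the message translation a proxy server might perform in each direction. It is not the code from the repository. It assumes both sides are configured for G.711 μ-law audio, so the base64 payload can be passed through without transcoding, and it uses the Realtime API event names `input_audio_buffer.append` and `response.audio.delta` as we understand them at the time of writing; these may differ between API versions. The function names are hypothetical.

```typescript
// Only the fields this sketch needs are modeled here.
interface TwilioMediaMessage {
  event: "media";
  streamSid: string;
  media: { payload: string }; // base64-encoded audio
}

// Assumed Realtime API client event that appends caller audio.
interface OpenAIAudioAppend {
  type: "input_audio_buffer.append";
  audio: string; // base64-encoded audio
}

// Assumed Realtime API server event that carries a chunk of model audio.
interface OpenAIAudioDelta {
  type: "response.audio.delta";
  delta: string; // base64-encoded audio
}

// Caller audio from Twilio -> event to send to OpenAI.
function twilioToOpenAI(msg: TwilioMediaMessage): OpenAIAudioAppend {
  return { type: "input_audio_buffer.append", audio: msg.media.payload };
}

// Model audio from OpenAI -> message to send back on the Twilio stream.
function openAIToTwilio(
  evt: OpenAIAudioDelta,
  streamSid: string
): TwilioMediaMessage {
  return { event: "media", streamSid, media: { payload: evt.delta } };
}
```

In a real server, each function would be called from the `message` handler of the corresponding WebSocket, and the result would be serialized with `JSON.stringify` before being sent to the other side.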
We created baseline code you can use to jump-start your application development. You can clone the code here.
Prerequisites
To get started you will need:
- A Twilio account - sign up for a free developer account here.
- A paid OpenAI account - sign up for a free developer account here and upgrade to a paid account here.
- ngrok or another tunneling solution to expose your local server to the internet for testing. Download ngrok here.
This blog post is a complement to the following posts:
- Building Conversational AI Applications with Twilio and the OpenAI Realtime API
- Build an AI Voice Assistant with Twilio Voice, OpenAI’s Realtime API and Node.js
Putting it together
This project is broken into three sections: one for setting up OpenAI, one for setting up your working environment, and one for working with Twilio.
Set up OpenAI
Log in to your OpenAI account or create a new one. Once logged in, click your profile icon at the top right, select Billing, then select Add payment details.
Once your payment method is set up and you have added credits, you are ready to generate an API key.
Select Dashboard on the top right panel, then API keys:
Click + Create new secret key. This is the OPENAI_API_KEY you'll use to make API calls to OpenAI in the next step.
Install the dependencies:
Start an ngrok Tunnel
The application needs to know the domain (HOSTNAME) it is deployed to in order to function correctly. This domain is set in the HOSTNAME environment variable, and it must be configured before starting the app.
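Since the app depends on both OPENAI_API_KEY and HOSTNAME being set before it starts, it can help to fail fast when either is missing. The sketch below is one way to do that, not the repository's actual startup code. The `loadConfig` function and `AppConfig` shape are hypothetical, and the normalization step (dropping a pasted `https://` prefix or trailing slash) is our own assumption about what a bare domain should look like.

```typescript
interface AppConfig {
  openaiApiKey: string;
  hostname: string; // bare domain, e.g. "your-ngrok-domain.ngrok.app"
}

// Takes the environment as a parameter so it can be checked without
// touching the real process environment.
function loadConfig(env: Record<string, string | undefined>): AppConfig {
  const openaiApiKey = env["OPENAI_API_KEY"]?.trim();
  const rawHostname = env["HOSTNAME"]?.trim();

  if (!openaiApiKey) {
    throw new Error("OPENAI_API_KEY is not set");
  }
  if (!rawHostname) {
    throw new Error("HOSTNAME is not set");
  }

  // Tolerate a pasted URL by stripping the scheme and any trailing slashes.
  const hostname = rawHostname
    .replace(/^[a-z]+:\/\//i, "")
    .replace(/\/+$/, "");

  return { openaiApiKey, hostname };
}
```

At startup you would call `loadConfig(process.env)` once and pass the result to the rest of the app.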
Start ngrok by running this command.
Then copy the domain. This will be your HOSTNAME (“your-ngrok-domain.ngrok.app”, or b8b22eacf803.ngrok.app in my screenshot below):
Run the App
This command will start the Express server which handles incoming Twilio webhook requests and Media Streams.
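For readers curious about what the incoming-call webhook has to return, the sketch below builds a TwiML response that tells Twilio to open a bidirectional Media Stream back to your server using `<Connect>` and `<Stream>`. It is a minimal illustration under our own assumptions, not the repository's handler: the `/media-stream` WebSocket path and the function names are hypothetical.

```typescript
// Escape characters that are not safe inside an XML attribute value.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build the TwiML returned from the incoming-call webhook.
// `streamPath` is a hypothetical route where the server accepts the
// Media Streams WebSocket connection.
function buildIncomingCallTwiml(
  hostname: string,
  streamPath: string = "/media-stream"
): string {
  const url = `wss://${hostname}${streamPath}`;
  return (
    '<?xml version="1.0" encoding="UTF-8"?>' +
    "<Response><Connect>" +
    `<Stream url="${escapeXml(url)}" />` +
    "</Connect></Response>"
  );
}
```

An Express handler would send this string with a `text/xml` content type. This is also why HOSTNAME must be correct: Twilio uses the URL in the `<Stream>` element to reach your server.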
Set up your Twilio account
Go to twilio.com and sign up for a free account (or log in to your existing account).
If you don't currently own a Twilio phone number with Voice functionality, you'll need to purchase one. Follow this guide to purchase one.
Once you have purchased the number, under Voice Configuration and A call comes in, select Webhook from the drop-down menu.
Configure the Voice webhooks for your Twilio phone number:
- Incoming Call Webhook: Select POST and set the URL to: https://your-ngrok-domain.ngrok.app/incoming-call
- Call Status Update Webhook: Select POST and set the URL to: https://your-ngrok-domain.ngrok.app/call-status-update
Click Save Configuration.
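To give a sense of what arrives at the call status webhook, here is a small sketch that summarizes a status callback. Twilio posts form-encoded parameters such as `CallSid` and `CallStatus`; the sketch assumes the body has already been parsed into an object (for example by Express's URL-encoded body parser). The function name, the summary shape, and the choice of which statuses count as final are our own assumptions.

```typescript
// Statuses after which we assume no further updates arrive for the call.
const FINAL_STATUSES: readonly string[] = [
  "completed",
  "busy",
  "failed",
  "no-answer",
  "canceled",
];

interface CallStatusSummary {
  callSid: string;
  status: string;
  isFinal: boolean;
}

// Returns null when the expected parameters are missing, so the caller
// can decide how to respond to a malformed request.
function summarizeCallStatus(
  body: Record<string, unknown>
): CallStatusSummary | null {
  const callSid = body["CallSid"];
  const status = body["CallStatus"];
  if (typeof callSid !== "string" || typeof status !== "string") {
    return null;
  }
  return {
    callSid,
    status,
    isFinal: FINAL_STATUSES.indexOf(status) !== -1,
  };
}
```

A handler for `/call-status-update` could log the summary and, when `isFinal` is true, clean up any state it holds for that call.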
Test it out
Place a call to your Twilio phone number.
You should see the real-time transcript logged to your local terminal.
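If you want to see how a transcript line could be derived from the Realtime API's events, the sketch below formats one line per transcript event. It is an illustration rather than the repository's logging code. The event names `response.audio_transcript.done` and `conversation.item.input_audio_transcription.completed`, each carrying a `transcript` field, reflect our understanding of the API at the time of writing and may differ between versions. Caller transcripts also assume input audio transcription is enabled for the session.

```typescript
// Assumed Realtime API event names; verify against the current API reference.
const ASSISTANT_TRANSCRIPT = "response.audio_transcript.done";
const CALLER_TRANSCRIPT =
  "conversation.item.input_audio_transcription.completed";

// Turns a raw JSON event from the OpenAI WebSocket into a log line,
// or returns null for events that carry no transcript.
function formatTranscriptLine(rawEvent: string): string | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawEvent);
  } catch {
    return null; // not JSON, nothing to log
  }
  if (typeof parsed !== "object" || parsed === null) {
    return null;
  }

  const evt = parsed as { type?: unknown; transcript?: unknown };
  if (typeof evt.transcript !== "string") {
    return null;
  }

  const text = evt.transcript.trim();
  if (evt.type === ASSISTANT_TRANSCRIPT) {
    return `assistant: ${text}`;
  }
  if (evt.type === CALLER_TRANSCRIPT) {
    return `caller: ${text}`;
  }
  return null;
}
```

Calling this for each message received from OpenAI and printing the non-null results would produce a running transcript in the terminal.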
Success!!
Taking it to the next level
To recap, we built a simple integration between Twilio Voice and the OpenAI Realtime API that shows the power of voice and AI and demonstrates how to connect a Twilio phone number to a speech-to-speech model.
We hope you will customize this implementation, connect it to your own application, and use Twilio Media Streams to connect to OpenAI.
You're now ready to personalize your customer communications, and we can’t wait to see what you build!
Phil Bredeson is a Solutions Architect at Twilio who helps engineers solve complex communication problems. He can be reached at pbredeson [at] twilio.com
Al Kiramoto is a Solutions Architect at Twilio. He lives in Dallas, TX, and enjoys working with customers and solving business problems, as well as good barbecue and Tex-Mex food. He can be reached at akiramoto [at] twilio.com