Integrate OpenAI with Twilio Voice Using ConversationRelay

March 20, 2025
Written by Amanda Lange

ConversationRelay is a product from Twilio that allows you to build real-time, human-like voice applications for conversations with any AI Large Language Model, or LLM. It opens a WebSocket to your server so you can integrate with any AI API, giving you a fluid, event-based interface over a fast two-way connection.

This tutorial is a quick overview of a basic integration of OpenAI’s models with Twilio Voice using ConversationRelay. By the end, you will have deployed a Node.js server that lets you call a Twilio phone number and hold a conversation with an LLM, and you’ll have a solid base for adding more advanced features.

Let's get started!

Prerequisites

To deploy this tutorial you will need:

  1. Node.js installed on your machine
  2. A Twilio phone number (Sign up for Twilio here)
  3. Your IDE of choice (such as Visual Studio Code)
  4. The ngrok tunneling service (or other tunneling service)
  5. An OpenAI Account to generate an API Key
  6. A phone to place your outgoing call to Twilio

Write the code

Start by creating a new folder for your project.

mkdir conversationRelayNode
cd conversationRelayNode

Next, initialize a new Node.js project and install the dependencies.

npm init -y
npm pkg set type="module"
npm install fastify @fastify/websocket openai dotenv

For this tutorial, you’ll use Fastify as your framework. It lets you quickly spin up a server that handles both the WebSocket you'll need and the route that serves instructions to Twilio.

To view all of the code for this quickstart, please visit the repo on GitHub.

Start by creating the files you will need to run your connection.

To store your OpenAI API key, you will need a .env file. Create this file in your project folder, then open it in your favorite editor.

Use the following line of code, replacing the placeholder shown with your actual key from the OpenAI API keys page.

OPENAI_API_KEY="YOUR_OPENAI_API_KEY"

If you're going to save your project on GitHub, be sure not to expose any API keys to the internet. Add your .env file to a .gitignore, or blank out any keys before committing your build, as in the provided GitHub example.
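For example, a minimal .gitignore for this project might look like this (keeping both your secrets and installed dependencies out of version control):

# Secrets
.env
# Installed dependencies
node_modules/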

Next, create a new file called server.js. This is where the primary code for your project server is going to be stored. Create this file in the same directory as your .env file.

Nice work – next, you'll add your imports and define the constants that shape the behavior of the LLM.

Add the imports and constants

First, add the necessary imports and constants to server.js with this code.

import Fastify from "fastify";
import fastifyWs from "@fastify/websocket";
import OpenAI from "openai";
import dotenv from "dotenv";

dotenv.config();

const PORT = process.env.PORT || 8080;
const DOMAIN = process.env.NGROK_URL; // your ngrok domain, without the scheme
const WS_URL = `wss://${DOMAIN}/ws`; // the WebSocket URL Twilio will connect to
const WELCOME_GREETING = "Hi! I am a voice assistant powered by Twilio and Open A I . Ask me anything!";
const SYSTEM_PROMPT = "You are a helpful assistant. This conversation is being translated to voice, so answer carefully. When you respond, please spell out all numbers, for example twenty not 20. Do not include emojis in your responses. Do not include bullet points, asterisks, or special symbols.";
const sessions = new Map(); // per-call conversation history, keyed by call SID

Here, notice that you are adding the system prompt that shapes the personality of your AI. This prompt keeps it simple – and lets your AI know the conversation will be spoken aloud. Therefore, you want the AI to avoid special characters and formatting that sound awkward when read out by text-to-speech.

Crafting the prompt is an art in itself. You’ll want to bookmark our Prompt Engineering best practices to read through once you’ve finished the tutorial and continue the build.
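For instance, you could give the assistant a persona or tighter guardrails just by editing the constant. Here's one hypothetical variation:

// Hypothetical alternative system prompt with a persona and length limit
const SYSTEM_PROMPT = "You are a friendly receptionist for a dental office. Keep every answer under three sentences. This conversation is being translated to voice, so spell out all numbers, and do not use emojis, bullet points, asterisks, or special symbols.";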

You also add the greeting the AI says when a caller rings in, using the variable WELCOME_GREETING. As you can see, the greeting spaces out the letters in "Open A I" so the text-to-speech engine pronounces them correctly.

Write the Fastify server code

Great stuff. Now, you'll move on to the heart of the code: the server.

Next, add the following lines of code to server.js, below where you left off:

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// Send the full conversation history to OpenAI and return the assistant's reply
async function aiResponse(messages) {
  const completion = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: messages,
  });
  return completion.choices[0].message.content;
}

This code block sets up the connection to OpenAI, and the process.env.OPENAI_API_KEY line reads your API key from the .env file.
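To make the expected shape of the messages array concrete, here is a quick hypothetical invocation. You won't need this in server.js – the WebSocket handler below builds the array for you – but it shows what aiResponse consumes and returns:

// Hypothetical standalone usage of aiResponse
const reply = await aiResponse([
  { role: "system", content: SYSTEM_PROMPT },
  { role: "user", content: "What is two plus two?" },
]);
console.log(reply); // e.g., "Two plus two is four."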

Finally, beneath that, add the code to get your server started and complete the webhook connection.

const fastify = Fastify();
fastify.register(fastifyWs);

// Return TwiML telling Twilio to connect the call to your WebSocket
fastify.get("/twiml", async (request, reply) => {
  reply.type("text/xml").send(
    `<?xml version="1.0" encoding="UTF-8"?>
    <Response>
      <Connect>
        <ConversationRelay url="${WS_URL}" welcomeGreeting="${WELCOME_GREETING}" />
      </Connect>
    </Response>`
  );
});

// The WebSocket route that ConversationRelay connects to
fastify.register(async function (fastify) {
  fastify.get("/ws", { websocket: true }, (ws, req) => {
    ws.on("message", async (data) => {
      const message = JSON.parse(data);
      switch (message.type) {
        case "setup":
          // Start a new conversation history for this call
          const callSid = message.callSid;
          console.log("Setup for call:", callSid);
          ws.callSid = callSid;
          sessions.set(callSid, [{ role: "system", content: SYSTEM_PROMPT }]);
          break;
        case "prompt":
          // The caller said something: get the LLM's reply and send it back
          console.log("Processing prompt:", message.voicePrompt);
          const conversation = sessions.get(ws.callSid);
          conversation.push({ role: "user", content: message.voicePrompt });
          const response = await aiResponse(conversation);
          conversation.push({ role: "assistant", content: response });
          ws.send(
            JSON.stringify({
              type: "text",
              token: response,
              last: true,
            })
          );
          console.log("Sent response:", response);
          break;
        case "interrupt":
          console.log("Handling interruption.");
          break;
        default:
          console.warn("Unknown message type received:", message.type);
          break;
      }
    });
    ws.on("close", () => {
      console.log("WebSocket connection closed");
      sessions.delete(ws.callSid);
    });
  });
});

try {
  await fastify.listen({ port: PORT });
  console.log(`Server running at http://localhost:${PORT} and wss://${DOMAIN}/ws`);
} catch (err) {
  console.error(err);
  process.exit(1);
}

This block of code does most of the heavy lifting. First, it defines the /twiml route that Twilio requests when your call connects. That route returns TwiML, the Twilio Markup Language, an XML dialect that gives Twilio instructions about how to connect the call to your WebSocket.

Then, it sets up a /ws route that Twilio opens a WebSocket connection to. This WebSocket is where you communicate with ConversationRelay: you receive messages from Twilio, and you pass messages from your LLM back to Twilio to run the text-to-speech step.

We won’t go into all of the messages that can go in either direction. Here, you're handling the setup, prompt, and interrupt message types from ConversationRelay. You can find more detail on these message types here.
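For reference, each message arrives as JSON. Simplified to just the fields this tutorial's handler reads (see Twilio's ConversationRelay documentation for the full schema), a prompt message looks something like this:

{
  "type": "prompt",
  "voicePrompt": "What is the capital of France?"
}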

You can see the message types you can send back to ConversationRelay here. You’ll note that this tutorial only demonstrates text messages (in the ws.send() call), but know that you can also ask Twilio to play media, send DTMF digits, or even hand off the call!


Run and test

To finish setting up ConversationRelay, there are a few more critical steps to connect your code to Twilio.

The first step is to return to your terminal and open a tunnel using ngrok:

ngrok http 8080

You need to start the tunnel first because you will need the ngrok URL in two places: in the Twilio Console, and in your .env file.

Copy the forwarding URL that ngrok displays and add it to your .env file using this line:

NGROK_URL="1234abcd.ngrok.app"

Replace the placeholder with your actual ngrok URL. Note that you do not include the scheme (the “https://” or “http://”) in the environment variable.
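With that value set, the constants you defined earlier in server.js resolve against your tunnel. For example, given the placeholder above:

// With NGROK_URL="1234abcd.ngrok.app" in .env:
const DOMAIN = process.env.NGROK_URL; // "1234abcd.ngrok.app"
const WS_URL = `wss://${DOMAIN}/ws`;  // "wss://1234abcd.ngrok.app/ws"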

Now you are ready to run your server.

node server
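If the server starts cleanly, you should see the startup log from server.js, something like this (with your own ngrok domain):

Server running at http://localhost:8080 and wss://1234abcd.ngrok.app/ws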

Go into your Twilio Console and find the phone number that you registered.

Set the configuration under A call comes in with the Webhook option as shown below.

In the URL space, add your ngrok URL (this time including the “https://”), and follow that up with /twiml for the correct routing.

Finally, set the HTTP option on the right to GET.

[Screenshot: the phone number's voice configuration in the Twilio Console]

When a call comes in, Twilio will first fetch your TwiML, which includes the greeting message you provided. Then it will use the provided ngrok URL to connect directly to the WebSocket. That WebSocket connection opens up the line for you to have a conversation with OpenAI.

Save your configuration in the Console. Now dial the number from your phone.

If everything is hooked up correctly, you should hear your customized greeting. Say hello and ask the AI anything you like!

What's Next for ConversationRelay?

This simple demonstration works well, but it has limits.

For example, though you may be able to interrupt the conversation verbally, the full response text is generated before it's spoken aloud. With this code, the server does not know exactly where in the response you interrupted it, which might lead to a misunderstanding later in the conversation. You’ll also notice that this version of the code introduces quite a bit of latency when your prompt generates a lot of text from the LLM (try asking it to count to 100!).
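To give a flavor of the fix before the next post covers it properly: the latency comes from aiResponse awaiting the entire completion before anything is sent to Twilio. A rough, hypothetical sketch of the streaming alternative, assuming ConversationRelay accepts intermediate text tokens marked with last: false, might look like this:

// Hypothetical sketch: forward tokens to Twilio as the LLM produces them.
// Assumes intermediate tokens may be sent with last: false (verify against
// the ConversationRelay docs before relying on this).
async function streamingAiResponse(messages, ws) {
  const stream = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: messages,
    stream: true,
  });
  let full = "";
  for await (const chunk of stream) {
    const token = chunk.choices[0]?.delta?.content || "";
    full += token;
    ws.send(JSON.stringify({ type: "text", token: token, last: false }));
  }
  // Signal that the response is complete
  ws.send(JSON.stringify({ type: "text", token: "", last: true }));
  return full;
}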

In our next post, we’ll show you how to improve latency by streaming tokens from the LLM. We’ll also show you one way to maintain local conversation state with OpenAI using ConversationRelay’s interruption handling. Finally, we’ll show you how to add external tools your LLM can call using OpenAI function calling, and integrate everything into this same app.

We hope you had fun building with ConversationRelay! Let's build - and talk to - something amazing!

Appendix

Our colleagues have built some awesome sample applications and demos on top of ConversationRelay covering a range of use cases.


Amanda Lange is a .NET Engineer of Technical Content. She is here to teach how to create great things using C# and .NET programming. She can be reached at amlange [at] twilio.com.