Creating a Voice AI Assistant using Twilio, Meta LLaMA 3 with Together.ai and Flask

March 24, 2025
Written by
Kumarasubrahmanya Hosamane
Contributor
Opinions expressed by Twilio contributors are their own

Introduction

Twilio Programmable Voice lets you build scalable voice experiences with the Voice API and SDKs that connect millions of people around the world. It offers extensive customization options for automating telephone workflows.

In this article, you will use Twilio, Meta LLaMA 3 served through Together.ai, and Flask to build an AI assistant that answers customer queries over a voice call, tailored to your use case.

Meta LLaMA 3 is the latest generation of open-source large language models (LLMs) from Meta (formerly Facebook). The model is available in 8B and 70B parameter sizes, each with a base and an instruction-tuned variant. Together.ai provides the computational resources and infrastructure needed to run and deploy Meta LLaMA 3 efficiently, making it accessible to users without extensive hardware or technical expertise.

Prerequisites

This tutorial is written for Windows, but Mac and Linux users can follow along by setting up a virtual environment in the same way. Refer to this link for instructions. To proceed with the tutorial, ensure you meet the following prerequisites:

  • Python 3.6 or a more recent version. If your system doesn't come with a Python interpreter, you can download an installer from python.org.
  • A Twilio account. If you don’t have one, you can register for a trial account here.
  • A phone that can make phone calls, to test the project. You can also use Twilio Dev Phone for this tutorial.
  • ngrok, installed and authenticated. You can install ngrok from here.

Building the app

Setting up a together.ai account

As a first step, create a new account on Together.ai. It hosts pre-configured instances of popular models such as LLaMA, Gemma, and Mixtral, which you can use with just your personal API key. You can find the full list of available models here - https://docs.together.ai/docs/inference-models

You can get started with a free account, which gives $1 credit for your development work. Refer to the Pricing page for more details - https://www.together.ai/pricing

Once the account is created, you’ll be redirected to the homepage where you can find an API key just for you. Copy it for your future reference. Remember to never share API keys with anyone.

Setting up your development environment

Before building the AI assistant, you need to set up your development environment. The first step is to create a new folder twilio-meta-ai, and move into it:

mkdir twilio-meta-ai
cd twilio-meta-ai

Create a virtual environment:

python -m venv venv

This command makes a venv folder inside your project. That folder contains the Python interpreter and all installed packages.

Activate the virtual environment (on Mac or Linux, run source venv/bin/activate instead):

venv\Scripts\activate

When activated, your shell prompt should change to show (venv), indicating you’re now using the virtual environment.
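If the prompt does not change, or you are unsure which interpreter is in use, a small check like the following sketch can confirm it. It relies only on the standard library; the function name in_virtualenv is just an illustrative choice.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when this interpreter is running inside a virtual environment."""
    # Inside a venv, sys.prefix points at the venv folder, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Virtual environment active:", in_virtualenv())
    print("Interpreter path:", sys.executable)
```

When the environment is active, the interpreter path printed should sit inside your venv folder.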

Create a src folder for your Python code:

mkdir src
cd src

Inside src is where you’ll keep your Python files or project code.

If you’re using Git, you’ll typically want to ignore the venv folder so it isn’t committed to version control. From the root of your project twilio-meta-ai (run cd .. first if you are still inside src), you can create a .gitignore file:

echo "venv/" >> .gitignore

Now you are ready to install Flask and the other dependencies. In your working directory twilio-meta-ai, create a file called requirements.txt and add the following lines to it:

twilio
flask
together

Here is a breakdown of these dependencies:

  • twilio - A package that allows you to interact with the Twilio API.
  • flask - Flask is a lightweight web application framework for Python, designed to make it easy to build web applications quickly.
  • together - The official Python client for the Together.ai API platform, providing a convenient way to interact with different open-source AI models.

Then install all the packages with pip from your terminal:

pip install -r requirements.txt
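To confirm that the three packages landed in the active environment, you could run a quick check along these lines. This is a minimal sketch that assumes Python 3.8 or newer (for importlib.metadata); the helper name installed_versions is hypothetical.

```python
from importlib import metadata  # standard library from Python 3.8 onward


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(["twilio", "flask", "together"]).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

Any package reported as NOT INSTALLED usually means pip ran outside the virtual environment.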

Create a Flask server

Let us set up a Flask server. In your source code directory src, create a file app.py and paste the code snippet given below:

import os

from flask import Flask, request, Response
from together import Together
from twilio.twiml.voice_response import Gather, VoiceResponse

app = Flask(__name__)

# Replace API_KEY with your Together.ai API key.
client = Together(api_key="API_KEY")

# In-memory conversation history, one entry per call.
sessions = {}

The code begins by importing the necessary dependencies. It then creates a Flask application and initializes a client for the Together.ai API, supplying an API key for authentication. Make sure to replace API_KEY with your Together.ai API key.

For the purposes of testing this application, you can paste your API key into the code provided. However, this is not a secure way to store an API key, and for production, you should store the API key in environment variables or an .env file.
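As a rough illustration of the environment-variable approach, the sketch below reads the key at startup and fails early if it is missing. The variable name TOGETHER_API_KEY and the helper load_api_key are assumptions made for this example, not part of the original code.

```python
import os


def load_api_key(env=None, name="TOGETHER_API_KEY"):
    """Read the API key from the environment and fail early if it is absent."""
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {name} environment variable before starting the app.")
    return key


# In app.py, the client line would then become (assumed usage):
#     client = Together(api_key=load_api_key())
```

Set the variable in the same terminal session before running the app, so the key never appears in your source files or in version control.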

To manage the conversation history for each session, the code initializes a dictionary named sessions. This dictionary serves as a storage mechanism for retaining the context and progression of conversations between users and the chat assistant.
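To make the shape of this dictionary concrete, here is a minimal sketch of the lookup-or-create pattern the handler later in this tutorial follows. The helper get_or_create_session is hypothetical and shown only for illustration; the tutorial code inlines the same logic.

```python
def get_or_create_session(sessions, call_sid, system_prompt):
    """Return (session, is_new) for a call, creating the session on first contact."""
    session = sessions.get(call_sid)
    if session is not None:
        return session, False
    # A new call starts with a history that holds only the system prompt.
    session = {"conversationHistory": [{"role": "system", "content": system_prompt}]}
    sessions[call_sid] = session
    return session, True


if __name__ == "__main__":
    store = {}
    session, is_new = get_or_create_session(store, "CA-example", "You are a helpful agent.")
    session["conversationHistory"].append({"role": "user", "content": "Hi"})
    same, is_new_again = get_or_create_session(store, "CA-example", "ignored on later requests")
    print(is_new, is_new_again, len(same["conversationHistory"]))
```

Because the dictionary lives in the server's memory, every conversation is lost when the Flask process restarts.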

Design the prompt

To give your assistant context, create an initial prompt based on your use case. The following example prompt creates an assistant that answers product-related queries; adapt it to your own use case. Paste the following code into your app.py file, below the code you already pasted.

prompt = """You are a customer service agent for Doms company. Your role is to assist customers with queries related to our products. Please only provide information about our products and refrain from answering any other questions. Currently we do not provide any discounts on any products.
Product list:
DomsCare Toothpaste
   - Description: DomsCare Toothpaste is formulated with natural ingredients to provide maximum protection against cavities and maintain overall oral hygiene. It leaves your mouth feeling fresh and clean.
   - Available Variants:
        - Mint
            - Price = $2.99
        - Fresh Mint
            - Price = $5.99
        - Whitening
            - Price = $8.99
DomsGlow Facial Moisturizer
   - Description: DomsGlow Facial Moisturizer is a lightweight, non-greasy formula enriched with vitamins and antioxidants to hydrate and rejuvenate your skin. It leaves your skin feeling soft, smooth, and radiant.
   - Available Variants: 
        - Normal Skin
            - Price = $3.99
        - Dry Skin
            - Price = $7.99
        - Oily Skin
            - Price = $9.99
DomsFit Protein Bars
   - Description: DomsFit Protein Bars are a delicious and convenient way to fuel your body with high-quality protein on the go. Each bar is packed with nutrients to support muscle recovery and energy levels.
   - Price: $2.49 per bar
   - Available Flavors: Chocolate Chip, Peanut Butter, Cookies and Cream
"""
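If your product data already lives in a structured form, you could generate a prompt like the one above instead of writing it by hand. The following is only a sketch: the build_prompt helper and the catalog dictionary layout are assumptions made for illustration.

```python
def build_prompt(company, catalog, policy=""):
    """Render a system prompt from a product catalog (hypothetical structure)."""
    intro = (
        f"You are a customer service agent for {company}. "
        "Your role is to assist customers with queries related to our products. "
        "Please only provide information about our products and refrain from "
        "answering any other questions."
    )
    if policy:
        intro += " " + policy
    lines = [intro, "Product list:"]
    for name, info in catalog.items():
        lines.append(name)
        lines.append(f"   - Description: {info['description']}")
        lines.append("   - Available Variants:")
        for variant, price in info["variants"].items():
            lines.append(f"        - {variant}: ${price:.2f}")
    return "\n".join(lines)


if __name__ == "__main__":
    catalog = {
        "DomsCare Toothpaste": {
            "description": "Formulated with natural ingredients to protect against cavities.",
            "variants": {"Mint": 2.99, "Fresh Mint": 5.99, "Whitening": 8.99},
        }
    }
    print(build_prompt("Doms", catalog, "Currently we do not provide any discounts on any products."))
```

Keeping prices in data rather than in prose makes it easier to update the prompt when the catalog changes.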

Create an incoming voice handler

To enable users to interact with the chatbot via voice calls, you will create a route named /voiceHandler. Copy and paste the following code into the app.py file to implement the /voiceHandler route.

@app.route("/voiceHandler", methods=["POST"])
def process_input():
    # Twilio sends the same CallSid with every request that belongs to one call.
    session_id = request.form.get("CallSid")
    session = sessions.get(session_id)

    if not session:
        # First request of the call: greet the caller and seed the history
        # with the system prompt.
        print(f"\n\n[{session_id}]")
        msg = "Hey there! Thanks for calling Doms! How can I help you?"
        session = {
            "conversationHistory": [
                {"role": "system", "content": prompt}
            ]
        }
        sessions[session_id] = session
    else:
        # Follow-up request: SpeechResult holds the caller's transcribed speech.
        user_response = request.form.get("SpeechResult")
        print(f"\nUSER: {user_response}")
        session["conversationHistory"].append({"role": "user", "content": user_response})

        # Ask the model for the next reply, given the whole conversation so far.
        assistant_response = client.chat.completions.create(
            model="meta-llama/Llama-3-70b-chat-hf",
            max_tokens=200,
            temperature=0.3,
            top_p=0.7,
            top_k=50,
            repetition_penalty=1,
            messages=session["conversationHistory"],
        )
        msg = assistant_response.choices[0].message.content
        session["conversationHistory"].append({"role": "assistant", "content": msg})

    print(f"AI: {msg}")

    # Speak the reply, then listen for the caller's next utterance. Twilio
    # posts the transcription back to this same endpoint.
    twiml = VoiceResponse()
    gather = Gather(input="speech", action="/voiceHandler", enhanced=True, speech_model="phone_call")
    gather.say(msg, voice="Polly.Joanna")
    twiml.append(gather)
    return Response(str(twiml), mimetype="text/xml")
if __name__ == "__main__":
    app.run(port=3000)

The /voiceHandler endpoint handles your incoming voice call. It first extracts the unique call identifier (CallSid) from the incoming request and uses it to look up the session for that call in the sessions dictionary. If no session exists yet, the call is new: the handler creates a conversation history containing only the system prompt and uses the greeting as the message to speak.

On every later request, the caller's transcribed speech (SpeechResult) is added to the conversation history in the session. The Together.ai model then generates the assistant's response from the accumulated history, and that response is appended to the history as well.

In both cases, the handler builds a Twilio VoiceResponse containing a Gather verb that speaks the message, listens for the caller's reply, and posts the transcription back to /voiceHandler.

Finally, the app.run(port=3000) line starts Flask's development server on port 3000, which exposes the /voiceHandler endpoint that handles the incoming voice call requests.
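To visualize what the handler sends back to Twilio, the sketch below rebuilds roughly the same TwiML with the standard library's XML module. It is for illustration only: in the app, the Twilio helper library generates this document for you, and its exact output (attribute order, XML declaration) may differ. The function name sketch_twiml is hypothetical.

```python
import xml.etree.ElementTree as ET


def sketch_twiml(message, action="/voiceHandler"):
    """Build a <Response><Gather><Say> document shaped like the handler's reply."""
    response = ET.Element("Response")
    gather = ET.SubElement(
        response,
        "Gather",
        {
            "input": "speech",            # collect speech rather than keypad digits
            "action": action,             # where Twilio posts the transcription
            "enhanced": "true",
            "speechModel": "phone_call",
        },
    )
    say = ET.SubElement(gather, "Say", {"voice": "Polly.Joanna"})
    say.text = message
    return ET.tostring(response, encoding="unicode")


if __name__ == "__main__":
    print(sketch_twiml("Hey there! Thanks for calling Doms! How can I help you?"))
```

Nesting Say inside Gather lets Twilio listen for the caller's reply while the message is still being spoken.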

Set up the server

Now, start your server so that it can handle incoming calls. Navigate to the source code directory src and run the following command to start your Flask server:

python app.py

Starting the Flask server
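Once the server is running, you can check the endpoint locally before involving Twilio by sending it a form-encoded POST similar to Twilio's webhook request. This is a sketch under assumptions: the CallSid value is made up, the helper names are hypothetical, and a real Twilio request carries many more fields.

```python
import urllib.parse
import urllib.request


def build_webhook_body(call_sid, speech_result=None):
    """Form-encode the two fields the handler reads from Twilio's request."""
    fields = {"CallSid": call_sid}
    if speech_result is not None:
        fields["SpeechResult"] = speech_result
    return urllib.parse.urlencode(fields).encode("utf-8")


def post_webhook(url, body):
    """POST the body to the local server and return the TwiML it responds with."""
    # urllib sends data as application/x-www-form-urlencoded by default.
    req = urllib.request.Request(url, data=body, method="POST")
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


# Example usage, with the Flask server running in another terminal:
#     url = "http://localhost:3000/voiceHandler"
#     print(post_webhook(url, build_webhook_body("CA-local-test")))
```

The first request for a new CallSid should return the greeting. A second request with the same CallSid and a speech_result value calls the Together.ai API, so it needs a valid API key and uses your credit.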

Because the port is set to 3000, your server is reachable at http://localhost:3000. Twilio, however, can only call a URL that is publicly accessible. For testing purposes, you can rely on ngrok to expose your local server. If you're unfamiliar with ngrok, you can consult this blog post for guidance on creating an account. To start the ngrok tunnel, run the following command in a separate terminal tab:

ngrok http 3000

The command above connects your local server on port 3000 to a public domain generated by ngrok. After running it, you'll see output like the one shown below. For security reasons, the ngrok domain is redacted in the screenshot.

The terminal output of the command "ngrok http 3000". It contains the ngrok public URL which can be used to set the incoming voice webhook handler of your Twilio number.

Now, your application is accessible via the ngrok URL. Copy the URL provided by ngrok, as you'll need it in the next steps.

Configuring the Twilio webhook

In the Twilio Console, open Explore Products in the left-hand menu, then go to Phone Numbers → Manage → Active numbers to see the list of available numbers.

Next, click on any available Twilio number and open the Configure tab. Under Voice Configuration, set the incoming call webhook to the URL provided by ngrok. Remember to append '/voiceHandler' to the base URL to match the endpoint on your server, and ensure the HTTP method is set to POST.

Incoming webhook configuration of your Twilio number
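A stray trailing slash or a missing scheme is an easy mistake when pasting the URL. The small sketch below shows one way to assemble the webhook URL from the ngrok base URL; the helper voice_webhook_url and the example domain are hypothetical.

```python
from urllib.parse import urlsplit


def voice_webhook_url(ngrok_base_url, path="/voiceHandler"):
    """Join the ngrok forwarding URL and the Flask route into one webhook URL."""
    base = ngrok_base_url.strip().rstrip("/")
    parts = urlsplit(base)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError("Expected a full URL such as the forwarding URL printed by ngrok")
    return base + path


if __name__ == "__main__":
    # Placeholder domain; use the forwarding URL from your own ngrok session.
    print(voice_webhook_url("https://example-subdomain.ngrok.app/"))
```

Paste the resulting URL into the webhook field exactly as printed.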

Once configured, click on the Save configuration button at the bottom to save the changes. Kudos! Now you're all set to perform your initial test.

Test your application

Now that you have completed setting up the incoming webhook handler for your Twilio phone number, make a phone call from your mobile phone to the Twilio number. You should observe an HTTP request in your ngrok console. Your Flask app will then process the incoming request and respond with your initial TwiML message: 'Hey there! Thanks for calling Doms! How can I help you?' Feel free to ask any questions to the AI assistant, and it will respond.

For testing purposes, you can also use Twilio Dev Phone, a developer tool for testing SMS and Voice applications. Please follow the instructions here to set up Dev Phone locally. Once the setup is completed, you can run the Dev Phone with the following command:

twilio dev-phone

The command above will open the Dev Phone in your browser. Simply dial the configured phone number to start interacting with the chatbot. Your interactions will be displayed in your terminal similar to the screenshot below.

Terminal showing interactions with the chatbot

Let's Keep the Conversation Going!

You have successfully built a virtual AI assistant using Meta LLaMA 3 with Together.ai and Twilio Programmable Voice. Callers can now ask the assistant questions over the phone and receive spoken answers based on the prompt you designed.

To personalize your new bot, try changing the language in the bot's prompt to something relevant to you, your business, or your interests. Happy building!

Kumarasubrahmanya Hosamane is a software developer based in San Jose, an IT hub of the United States, who is keenly interested in exploring the latest technologies. He can be reached at Kumarasubrahmanya Hosamane.