Implementing Multi-Party Calls with VoIP and GSM using the Programmable Voice API
At Tarjimly, we provide free on-demand translation services for refugees and people in need of humanitarian service. We are supported by the Twilio.org Impact Fund, as we are a tech nonprofit tackling the world’s toughest problems through the power of communications.
Using Twilio Programmable Voice, our new feature allows translators and aid workers to add additional people to two-way calls (for example, a lawyer or a doctor who can give advice remotely). You could do this with a regular conference call, but our approach makes the experience feel like a regular call to the end user, while the backend infrastructure supports up to 250 people on the line.
In this guide, we take your existing two-user VoIP call capability and build an infrastructure that can dial in other users using only their phone numbers.
This guide will focus on the backend, a Python Flask App for Twilio. The endpoints in the backend are called from your frontend (Web/Mobile App) and the Twilio console.
Getting Started with VoIP Calls
To complete this tutorial, you will need the following:
- A free Twilio Account
- Basic Working Python Flask App for Twilio (see this guide on how to get set up)
- A Twilio Phone Number (see this guide on how to buy a phone number)
- A Programmable Voice TwiML App (see this guide on how to set one up)
- A Frontend App (e.g. Web App or Mobile App) that communicates with the Twilio Console / Flask App
- Amazon S3 Bucket (to hold the Wait Music)
Twilio Console Setup
Now that you have your Twilio account and all of the prerequisites, we’ll go through Twilio’s console setup.
Conference Type Setup
To make it seamless to add people to a call, we now run every call as a conference behind the scenes, including two-way calls. The conferences must be Agent Conferences so that we can make outbound calls from the conference and add the people we dial as participants.
In the Twilio Programmable Voice Console go to Programmable Voice > Conferences > Settings and ensure your conferences are Agent Conferences. There is no difference in cost for an agent conference.
Set up a Voice Request URL for your TwiML App
In the Twilio Voice Console, go to Programmable Voice > TwiML > TwiML Apps.
Click on your app and set its Voice Request URL. This is the endpoint that Twilio calls when your frontend app asks Twilio to make a call.
For example, with "myURL.com/makeCall" (a placeholder URL), Twilio will send its request to the makeCall endpoint each time your app starts a call.
Make sure the method is "HTTP POST", because Twilio passes request values to your endpoint, such as the to and from for the call.
Here is how you would set up your endpoint for Twilio Console
Start a two-user call from your App
Now, let’s write the logic for the endpoint you configured earlier to create a call between two users. For reference, the caller is the person who initiates the call and the callee is the person who receives an incoming call as the target of the caller.
The entire code is presented here for completeness and each significant portion is repeated in the subsections below for clarity.
Start the call
This endpoint receives the following values:
- From - the client id of the caller
- To - the client id of the callee
- SessionID - a unique, friendly name for the conference so we can be sure to add all users to the conference
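As a rough illustration of reading these values, the sketch below pulls them out of the POST form data and rejects incomplete requests. It assumes the form keys are spelled exactly From, To, and SessionID as listed above; the helper name parse_call_request and the CallRequest type are hypothetical and not part of the original code.

```python
from typing import Mapping, NamedTuple


class CallRequest(NamedTuple):
    caller: str      # client id of the caller ("From")
    callee: str      # client id of the callee ("To")
    session_id: str  # friendly name of the conference ("SessionID")


def parse_call_request(form: Mapping[str, str]) -> CallRequest:
    """Extract the three request values (key spellings are an assumption)."""
    missing = [key for key in ("From", "To", "SessionID") if not form.get(key)]
    if missing:
        raise ValueError("missing request values: " + ", ".join(missing))
    return CallRequest(form["From"], form["To"], form["SessionID"])
```

In a Flask view, request.form could be passed in directly, since it behaves like a mapping of strings.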
Add the caller to the conference call
Next, we add the caller into the conference call using the session_id as the friendly name.
Note that during the wait between when the caller is added to the conference and the callee joins the conference, the caller will hear wait music (they will be in the conference alone). To make this experience replicate a regular call you can customize the wait music to a phone dialing tune, so the caller feels like they are just waiting on the callee to pick up.
You can provide a waitUrl inside the console which serves TwiML and the MP3 wait music of your choice. Here is a detailed guide on how to do this; you will need a public Amazon S3 bucket and an audio file.
- Soundbible and soundsnap are examples of websites where you can obtain common dial tunes. Here is a good option for a dialing tune.
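For orientation, the TwiML that a wait URL serves can be as small as a single Play verb. The sketch below builds such a document with Python's standard library so that it runs without extra dependencies; the function name and the example bucket URL are placeholders, not values from our setup.

```python
import xml.etree.ElementTree as ET


def wait_music_twiml(audio_url: str) -> str:
    """Return TwiML that plays one audio file on repeat while the caller waits."""
    response = ET.Element("Response")
    # loop="0" asks Twilio to keep repeating the file instead of playing it once.
    play = ET.SubElement(response, "Play", {"loop": "0"})
    play.text = audio_url
    return ET.tostring(response, encoding="unicode")


# Placeholder URL; substitute the public URL of your own MP3.
print(wait_music_twiml("https://example-bucket.s3.amazonaws.com/dialing.mp3"))
```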
We also specify a status_callback_event and status_callback URL to handle the case where only 1 person is left in the conference. We use the leave event and will use it in other cases as well. For this case, we also include the join event, as we need to get the conference_sid when the callee rejects the call. The code and explanation for these endpoints come later in this guide.
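To make the shape of this response concrete, here is a minimal sketch of the TwiML that places the caller into the conference. It is built with the standard library to stay dependency-free; with the Twilio helper library the same document could be produced with VoiceResponse and Dial. The function name and the /leave callback path are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def caller_conference_twiml(session_id: str, base_url: str, wait_url: str) -> str:
    """Build TwiML that dials the caller into the conference named session_id."""
    response = ET.Element("Response")
    dial = ET.SubElement(response, "Dial")
    conference = ET.SubElement(
        dial,
        "Conference",
        {
            "waitUrl": wait_url,                     # dialing tune heard while alone
            "statusCallback": base_url + "/leave",   # assumed callback path
            "statusCallbackEvent": "join leave",     # both events for the caller
        },
    )
    conference.text = session_id  # the friendly name of the conference
    return ET.tostring(response, encoding="unicode")
```

The endpoint would return this string to Twilio with an XML content type.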
joinConf Endpoint
The joinConf endpoint adds the callee to the conference call if they accept the call.
Its logic is very similar to how we add the caller to the conference call. We specify no wait music in this case, because the caller is already in the conference when the callee joins. For the same reason we do not need the join event; leave is the only status_callback_event we need.
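A sketch of the callee's TwiML, under the same assumptions as before (standard-library XML, a hypothetical function name, and an assumed /leave callback path), differs only in what it leaves out:

```python
import xml.etree.ElementTree as ET


def callee_conference_twiml(session_id: str, base_url: str) -> str:
    """Build TwiML that joins the callee to the existing conference."""
    response = ET.Element("Response")
    dial = ET.SubElement(response, "Dial")
    conference = ET.SubElement(
        dial,
        "Conference",
        {
            # No waitUrl: the caller is already waiting in the conference.
            "statusCallback": base_url + "/leave",  # assumed callback path
            "statusCallbackEvent": "leave",         # only the leave event
        },
    )
    conference.text = session_id
    return ET.tostring(response, encoding="unicode")
```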
Dial additional participants into the call using their phone number
This endpoint is called by your frontend app when you attempt to add a third person into your call (it doesn’t go through Twilio unlike when you start a call).
This endpoint receives the following values:
- phone_number - the phone number of the person you wish to add to the call
- session_ID - a unique, friendly name for the conference so we can add this phone number to the conference
You will also need a Twilio Phone Number, in order to dial another number into this conference.
Using the participant property of a conference, we add the phone number to this conference. The from_ is our Twilio Phone Number and the to is the phone number we wish to add.
As earlier, we specify a conference_status_callback_event and conference_status_callback URL. This will handle the case where only 1 person is left in the conference. We use the ‘leave’ event here so no one gets stuck in conference limbo.
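The call described above might look roughly like the following sketch. It takes the Twilio REST client as a parameter so the function can be exercised without network access; the function name, the parameter names, and the /leave callback path are assumptions, and the conference is addressed by its session ID as described in this section.

```python
def add_phone_participant(client, session_id, phone_number, twilio_number, base_url):
    """Dial phone_number from our Twilio number and add it to the conference.

    `client` is expected to be a twilio.rest.Client (or a stand-in with the
    same interface when testing).
    """
    return client.conferences(session_id).participants.create(
        from_=twilio_number,   # our Twilio Phone Number
        to=phone_number,       # the person being added to the call
        conference_status_callback=base_url + "/leave",  # assumed path
        conference_status_callback_event=["leave"],
    )
```

Passing the client in, rather than creating it inside the function, is a design choice made here for testability and is not necessarily how the original endpoint is written.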
Ensure the call ends when only one person is left
This endpoint helps us end the call when only 1 participant is left. It also handles the edge case when the original caller leaves the call before a callee picks up.
When this event is fired, we first get the sequence number. We then record the mapping from the session_id to the conference_sid in sessionID_to_conflsid, a global dictionary defined at the top of the file.
This is useful when we handle the edge case of the callee rejecting the call (covered next). Note that this part of the code runs every time the endpoint is called, including when the caller first joins the conference. That first call is what gives the global dictionary its mapping from session_id to conference_sid, so the mapping already exists if the callee later rejects the call.
We then handle participants leaving the call by determining if that’s why the endpoint was called (this excludes the case when the caller first joins the conference call).
We determine the number of active participants left in the call. Twilio calls this endpoint whenever a person leaves the conference; the one other time it is called is the join event fired when the original caller first joins. So if only 1 person is left, we end the conference by updating its status to `completed`, which ends the call for the last remaining user.
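A simplified sketch of this handler logic follows. It assumes the callback form carries the ConferenceSid, FriendlyName, and StatusCallbackEvent fields that Twilio sends with conference events; the function name and the dictionary name session_to_conference_sid are stand-ins for illustration, and the client is passed in so the logic can be tested without network access.

```python
def handle_conference_event(client, form, session_to_conference_sid):
    """Record the conference SID, then end the conference if one person is left.

    Returns True when the conference was ended, False otherwise.
    """
    conference_sid = form["ConferenceSid"]
    # Runs on every event, including the caller's initial join.
    session_to_conference_sid[form["FriendlyName"]] = conference_sid

    # Only participant departures should trigger the head count.
    if form.get("StatusCallbackEvent") != "participant-leave":
        return False

    participants = client.conferences(conference_sid).participants.list()
    if len(participants) == 1:
        # Ending the conference hangs up the last remaining participant.
        client.conferences(conference_sid).update(status="completed")
        return True
    return False
```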
The next statements handle the edge case to end the conference call if the caller leaves before the callee picks up.
In this case, there are no participants in the call when the endpoint is triggered. Since the first person joined (Sequence Number 0) and was removed (Sequence Number 1), the Sequence Number will now be 2. When both conditions hold, we use the global dictionary from the makeCall endpoint, which maps a sessionID to a specific call_sid, to end the call to the callee (as the caller has ended the call).
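The check described here could be sketched as follows. It assumes a dictionary from session ID to the callee's call SID was filled in earlier by the makeCall endpoint and that the participant count has already been computed; the function and parameter names are hypothetical.

```python
def end_unanswered_call(client, form, participant_count, session_to_call_sid):
    """Hang up the pending call to the callee when the caller left first.

    Returns True when a call was ended, False otherwise.
    """
    # Both conditions from the text: an empty conference and Sequence Number 2.
    caller_left_first = participant_count == 0 and int(form["SequenceNumber"]) == 2
    if not caller_left_first:
        return False

    call_sid = session_to_call_sid.get(form["FriendlyName"])
    if call_sid is None:
        return False  # nothing recorded for this session

    # Completing the call stops the callee's phone from ringing.
    client.calls(call_sid).update(status="completed")
    return True
```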
Edge Case: Callee rejects call
The completeCall endpoint handles the edge case when the callee rejects the original call from the caller.
Taking advantage of the global dictionary used in the leave endpoint that maps a sessionID to the conference_sid, we first find out the number of participants in the conference.
If there is only 1 participant in the call (after the callee rejects the call), then we end the conference by updating the status of the conference to completed.
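As an illustration, the core of this endpoint might look like the sketch below. The function name and the dictionary name are placeholders, and the Twilio client is passed in as a parameter so the logic can be tried out with a stand-in object.

```python
def complete_rejected_call(client, session_id, session_to_conference_sid):
    """End the conference if the caller is alone after the callee rejects.

    Returns True when the conference was ended, False otherwise.
    """
    conference_sid = session_to_conference_sid.get(session_id)
    if conference_sid is None:
        return False  # the caller's join event has not been recorded

    conference = client.conferences(conference_sid)
    if len(conference.participants.list()) == 1:
        # Only the caller remains, so close the conference for them.
        conference.update(status="completed")
        return True
    return False
```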
Config File
Make sure to define a config file and have the following values defined:
- TWILIO_ACCOUNT_SID - obtained from Twilio Console
- TWILIO_AUTH_TOKEN - obtained from Twilio Console
- MY_URL - the URL where your Flask App is hosted
- MY_TWILIO_PHONE_NUMBER - obtained from Twilio console
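One possible way to supply these values, shown here only as a sketch, is to read them from environment variables instead of hard-coding them in a file, which keeps the auth token out of source control. The load_config helper is hypothetical; the four key names are the ones listed above.

```python
import os
from typing import Dict, Mapping

REQUIRED_KEYS = (
    "TWILIO_ACCOUNT_SID",
    "TWILIO_AUTH_TOKEN",
    "MY_URL",
    "MY_TWILIO_PHONE_NUMBER",
)


def load_config(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Collect the required settings and fail early if any are missing."""
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError("missing config values: " + ", ".join(missing))
    return {key: env[key] for key in REQUIRED_KEYS}
```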
Translating for Humanity: one conference call at a time
All over the world, the inability to communicate has stopped people from gaining access to basic human rights and needs. If you speak multiple languages, even if you can just hold a conversation, you can be very useful. While reading and writing are useful skills, they aren't necessary for helping! A minute or two of your time can change a life. Download Tarjimly, sign up and help!
If you know of people who would benefit from our service, connect us! Caseworkers, asylum offices, doctors' offices, food banks, civil rights activists, and journalists are just some of the people who benefit from Tarjimly!
About the Author
Sasankh Munukutla is a Software Engineer Intern at Tarjimly, a tech nonprofit that provides on-demand humanitarian language translation, and an undergraduate student at Stanford University. sasankh@tarjim.ly