Improve Call Experience with New Twilio Conference Jitter Buffer Controls

June 18, 2020
Written by Michael Carpenter


Twilio Conference uses a jitter buffer to smooth out irregularity in media packet arrival times when mixing audio for conference participants. This buffer results in fewer audio artifacts, but introduces a fixed delay for the audio of each participant.

If a participant suffers from extremely high jitter (commonly seen with applications or browsers on WiFi networks), the jitter buffer may swell to compensate, significantly delaying their media. Once the jitter buffer has grown, it will not shrink, even if the jitter on the media stream disappears. At sizes greater than ~250ms, the buffer's delay can be perceived by participants as audio latency.

Twilio customers have told us they want visibility into jitter buffer behavior and a programmatic way to control it. With this in mind, we have added a jitterBufferSize parameter to conferences. It lets developers configure the buffer setting, tune the behavior of their conferences, and optimize the experience for participants with different network conditions.

How does it work?

A new jitterBufferSize attribute has been added to the <Conference> TwiML noun.
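The sketch below shows how the attribute is used. It is a shell function that prints a TwiML document joining the caller to a conference with a chosen jitterBufferSize. It rejects any value other than small, medium, large, or off. The helper name conference_twiml and the room name MyRoom are hypothetical. The function does no XML escaping, so it assumes a plain alphanumeric room name.

```shell
#!/bin/sh
# Hypothetical helper: print TwiML that dials the caller into a conference
# with the given jitterBufferSize. Helper and room names are illustrative.
conference_twiml() {
  room="$1"
  size="$2"
  # Accept only the four jitterBufferSize values.
  case "$size" in
    small|medium|large|off) ;;
    *) echo "invalid jitterBufferSize: $size" >&2; return 1 ;;
  esac
  # Room name is inserted verbatim (no XML escaping).
  cat <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Dial>
    <Conference jitterBufferSize="$size">$room</Conference>
  </Dial>
</Response>
EOF
}

conference_twiml "MyRoom" small
```

Serve the printed document from your voice webhook. Omitting the attribute leaves the participant on the default large buffer.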

The same setting is available as the JitterBufferSize parameter of the AddParticipant API:

curl -v https://api.twilio.com/2010-04-01/Accounts/{{ACCOUNT_SID}}/Conferences/{{FRIENDLY_NAME}}/Participants.json \
    -u "{{ACCOUNT_SID}}:{{AUTH_TOKEN}}" \
    --data-urlencode "From=+14258675309" \
    --data-urlencode "To=+1234567890" \
    --data-urlencode "JitterBufferSize=small"

The buffer may be set to small, medium, large, or off.

The small, medium, and large buffers are fixed buffer implementations with different target durations and maximum sizes. Conferences use large by default.

The off setting completely disables the buffer and mixes packets every 20ms. Packets with even fairly low levels of jitter will be completely dropped, but Twilio will add no extra latency when mixing.
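The model below makes the trade-off concrete, but it is deliberately simplified and is not Twilio's actual mixer. In it, packets are due every 20ms and each arrives some number of milliseconds late. A fixed buffer of B ms lets a packet be mixed only if it is no more than B ms late. The lateness values are invented for illustration. Mapping small, medium, and large to 20, 40, and 60ms follows the Buffer (ms) column of the test results. "off" is modeled as a 0ms buffer.

```shell
#!/bin/sh
# Simplified fixed-buffer model, NOT Twilio's mixer implementation.
# Packet i is due at 20*i; it arrives j ms late. With a fixed buffer of B ms
# it is mixed at 20*i + B, so it is dropped whenever j > B.
# The lateness values below are made up, not measured.
LATENESS_MS="5 0 35 10 55 0 15 70 5 25"

simulate_fixed_buffer() {
  for setting in off:0 small:20 medium:40 large:60; do
    name=${setting%%:*}     # setting name before the colon
    buffer=${setting#*:}    # buffer size in ms after the colon
    total=0
    dropped=0
    for j in $LATENESS_MS; do
      total=$((total + 1))
      if [ "$j" -gt "$buffer" ]; then
        dropped=$((dropped + 1))
      fi
    done
    echo "$name: +${buffer}ms delay, dropped $dropped of $total packets"
  done
}

simulate_fixed_buffer
```

With these made-up inputs, drops fall from 8 of 10 packets with the buffer off to 4, 2, and 1 with small, medium, and large. Meanwhile the added delay grows from 0 to 60ms, the same direction of trade-off reported in the test results.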

The buffer value is a participant-level setting: the value provided for participant A does not apply to participant B.
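Because the setting is per participant, each AddParticipant request carries its own value. The script below is a minimal sketch of that. The helper name add_participant and the environment variables ACCOUNT_SID, AUTH_TOKEN, CONFERENCE, FROM, and DRY_RUN are hypothetical placeholders. DRY_RUN defaults to 1, so the script prints each curl command instead of calling the API.

```shell
#!/bin/sh
# Hypothetical sketch: add participants to one conference, each with its own
# JitterBufferSize. All defaults are placeholders; DRY_RUN=1 (the default)
# prints the curl command instead of sending it. Set DRY_RUN=0 to send.
: "${ACCOUNT_SID:=ACXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX}"
: "${AUTH_TOKEN:=your_auth_token}"
: "${CONFERENCE:=MyRoom}"
: "${FROM:=+14258675309}"
: "${DRY_RUN:=1}"

add_participant() {
  to="$1"
  size="$2"
  case "$size" in
    small|medium|large|off) ;;
    *) echo "invalid JitterBufferSize: $size" >&2; return 1 ;;
  esac
  # Build the request as positional parameters so values stay quoted.
  set -- curl -sS \
    "https://api.twilio.com/2010-04-01/Accounts/$ACCOUNT_SID/Conferences/$CONFERENCE/Participants.json" \
    -u "$ACCOUNT_SID:$AUTH_TOKEN" \
    --data-urlencode "From=$FROM" \
    --data-urlencode "To=$to" \
    --data-urlencode "JitterBufferSize=$size"
  if [ "$DRY_RUN" = 1 ]; then
    printf '%s ' "$@"
    printf '\n'
  else
    "$@"
  fi
}

# A browser participant on WiFi gets a small buffer; a phone caller keeps large.
add_participant "client:alice" small
add_participant "+1234567890" large
```

The two calls target the same conference but leave each participant with a different buffer. In dry-run mode the printed command includes the placeholder credentials, so avoid logging it once real values are set.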

Twilio's Test Results

Our tests indicate that reducing the size of the jitter buffer lowers latency at the expense of more audio artifacts: robotic and choppy audio is introduced instead of delay. With the small and medium fixed buffer settings, we saw lower overall latencies, with some increase in dropped or corrupted audio throughout the call.

jitterBufferSize   Buffer (ms)   Average Latency   Max Latency
small              20            ~150ms            ~220ms
medium             40            ~200ms            ~300ms
large              60            ~300ms            ~400ms

jitterBufferSize   Buffer (ms)   Average Latency   Max Latency
small              20            ~200ms            ~280ms
medium             40            ~360ms            ~460ms
large              60            ~1000ms           ~1500ms

We validated this behavior with customers in real-world implementations as part of our private beta, so you can feel confident that the results described above map to the actual experience of your users.

Implementation Recommendations

Tolerance for different types of call quality issues is highly subjective and varies significantly. By exposing the jitter buffer controls, we let developers tailor jitter buffer behavior to their users' preferences. For example, one set of users may be highly sensitive to choppy or robotic audio and prefer large jitter buffer settings to smooth out audio performance. Another set may prefer to hear audio dropouts and degradation rather than latency.

Here are some general best practices we recommend:

  • Types of calls: In most cases, we recommend that the jitter buffer value be set on the Client or SIP participants only. PSTN participants (those from wireless or landline phones) are less likely to introduce jitter or packet loss, and are therefore unlikely to benefit from modifying the buffer settings. Twilio's Super Network team is monitoring our PSTN connections 24x7, but feel free to experiment as you wish.
  • Buffer settings for moderate jitter: If a participant has consistent, moderate jitter, the large jitter buffer will perform best by providing clean audio with slightly higher latency. This is the default behavior of the conference mixer if no jitter buffer setting is provided.
  • Buffer settings for high jitter: If a participant has bursts of extremely high jitter due to fluctuating local network performance, setting their buffer to small will pass some audio artifacts through to the other participants. However, that participant's audio should reach the others with reduced latency.
  • Troubleshooting: If you are trying to pinpoint the source of audio degradation, setting a participant's jitter buffer to off removes all buffering from their audio stream, so jitter on their connection is no longer masked. This is sometimes useful when you need to convince the "it sounded fine to me" crowd that their local network conditions are degrading everyone else's experience.
  • Types of latency: The latency we are describing here is in-mixer latency, i.e. how long audio packets are rattling around inside the conference mixer itself. This is separate and distinct from transport latency caused by geographic distance. For example, a user in Singapore dialing into a conference that is being mixed in Brazil will always have some transport latency. The voice packets must traverse a great distance before they even reach the conference mixer. On their own, jitter buffer settings won't affect transport latency. However, greater distance creates more opportunity for transport issues like jitter to arise. You may therefore find that changing the jitter buffer values for geographically remote participants improves their overall experience of latency by reducing or eliminating in-mixer latency.
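The recommendations above can be summarized as a small decision helper. This is a sketch under our own assumptions. The function name choose_jitter_buffer_size is hypothetical. So are its endpoint labels (pstn, client, sip) and jitter profiles (moderate, bursty, debug); these are not Twilio API values. Only the returned strings are real jitterBufferSize values.

```shell
#!/bin/sh
# Hypothetical helper encoding the best practices above. Endpoint types and
# jitter profiles are illustrative labels chosen for this sketch.
choose_jitter_buffer_size() {
  endpoint="$1"
  profile="$2"
  case "$endpoint" in
    pstn)
      # PSTN legs rarely benefit from tuning: keep the default (large).
      echo large
      return 0 ;;
    client|sip) ;;
    *) echo "unknown endpoint type: $endpoint" >&2; return 1 ;;
  esac
  case "$profile" in
    moderate) echo large ;;  # clean audio, slightly higher latency
    bursty)   echo small ;;  # lower latency, some artifacts passed through
    debug)    echo off ;;    # no buffering: expose whose network degrades audio
    *) echo "unknown jitter profile: $profile" >&2; return 1 ;;
  esac
}

choose_jitter_buffer_size client bursty
```

The printed value can be passed as JitterBufferSize when adding a participant. How your application classifies a participant's jitter profile, for example from call quality data you already collect, is outside the scope of this sketch.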

What now?

There is no single answer for what's right for your users. But with this new parameter, applications that use Twilio conferences can programmatically adjust jitter buffer behavior, customize their implementations for different scenarios, and deliver the best experience for their users. We can't wait to see what you build!

Michael Carpenter (aka MC) is a telecom API lifer who has been making phones ring with software since 2001. As a Product Manager for Voice & Video Insights at Twilio, the Venn diagram of his interests is the intersection of APIs, SIP, WebRTC, and mobile SDKs. He also knows a lot about Depeche Mode. Hit him up at mc@twilio.com.