Developing High Quality Video Applications


This guide provides advice for developing high-quality Twilio Video applications. For an optimal end user experience, we highly recommend that you read the complete Twilio Programmable Video documentation and tailor our general recommendations provided here to your specific use-case.


Settings overview

Use this table as a fast guide to find the recommended settings for your application.

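Choosing the column

  • Choose the column that matches the platform your end users run your application on: desktop browser, mobile browser, or mobile SDK.

Choosing the row

  • To choose your mode (i.e. grid, collaboration, or presentation), use the "Network Bandwidth Profile API: Selecting the mode" section of this guide. If in doubt, use collaboration.

Mode | Desktop Browser | Mobile Browser | Mobile SDK
grid | Recommended Settings | Recommended Settings | Recommended Settings
collaboration | Recommended Settings | Recommended Settings | Recommended Settings
presentation | Recommended Settings | Recommended Settings | Recommended Settings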

What does quality mean?

Quality is an elusive concept that may mean different things in different contexts. With Twilio Programmable Video, quality is synonymous with Quality of Experience (QoE), understood as how well a video application meets end users' needs and expectations.

Videoconferencing is the most typical use-case for real-time video applications. These applications allow end users to communicate "as they do face-to-face." Hence, end users expect high fidelity (for example, high resolution and high frame-rate) and low latency (for example, real-time conversational interactions). However, the quality of experience may also be affected by other aspects such as battery consumption and the availability of computing and networking resources. Some of the variables affecting quality impact one another. For example, increasing the video resolution also increases battery consumption and networking costs.

Hence, before you start developing a high-quality video application, first ask yourself: what do end users need and expect? A precise answer to that question will help you make the most appropriate decisions for quality optimization.


Concepts and terminology

The following concepts and definitions may be useful:

Resolution

Video tracks can be understood as sequences of still images each of which is encoded as a matrix of pixels. The resolution refers to the dimensions of such a matrix expressed as width x height. The following resolutions are common:

Resolution | Dimensions (pixels)
FullHD (Full High Definition) - aka 1080p | 1920x1080 [1]
HD (High Definition) - aka 720p | 1280x720
qHD (Quarter High Definition) - aka 540p | 960x540 [1]
VGA (Video Graphics Array) | 640x480
QCIF (Quarter Common Interface Format) | 176x144

Frame-rate

The frame-rate refers to the number of still images that the video stream includes per time unit. It is typically expressed in terms of fps (frames per second). Hence, an HD@30fps video will comprise a sequence of 30 HD still images per second.

Bitrate

The bitrate refers to the number of bits per unit of time that a given video or audio stream consumes when transported over a digital network. It is typically measured in bps (bits per second), often with a decimal (power-of-10) prefix (e.g. kbps = 1,000 bps, Mbps = 1,000,000 bps).

Codecs: VP8, H.264, and VP8 Simulcast

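A codec is an algorithm that encodes a video signal, typically compressing it in the process. VP8 and H.264 are the two main codecs used for videoconferencing. VP8 Simulcast is a scalable mode of the VP8 codec in which the publisher encodes the same video simultaneously at several qualities (resolutions and bitrates), so that each subscriber can receive the one that best fits its conditions. For further information, you may read our Managing Codecs and Working with VP8 Simulcast developer guides.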
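To see why compression matters, the arithmetic below compares the raw size of an HD@30fps stream with the 650 kbps minimum listed for VP8 at 1280x720 in the Minimum Bandwidth Recommendations section. It is a rough sketch that assumes uncompressed frames in I420 (YUV 4:2:0) format at 12 bits per pixel, a common input format for WebRTC video encoders; other pixel formats give different numbers, and the function names are illustrative only.

```typescript
// Raw (uncompressed) bitrate = width x height x frame-rate x bits per pixel.
// ASSUMPTION: I420 (YUV 4:2:0) frames at 12 bits per pixel.
function rawBitrateBps(width: number, height: number, fps: number, bitsPerPixel = 12): number {
  return width * height * fps * bitsPerPixel;
}

// How many times smaller the encoded stream is than the raw one.
function compressionRatio(rawBps: number, encodedBps: number): number {
  return rawBps / encodedBps;
}

const rawHd = rawBitrateBps(1280, 720, 30); // 331,776,000 bps
const ratio = compressionRatio(rawHd, 650_000); // VP8 720p minimum from the bitrate table
console.log(`HD@30fps raw: ${(rawHd / 1e6).toFixed(1)} Mbps, compression ≈ ${Math.round(ratio)}:1`);
// HD@30fps raw: 331.8 Mbps, compression ≈ 510:1
```

Under that assumption, the encoder has to shrink the stream by a factor of roughly 500, which is why codec choice and settings have such a strong effect on quality.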

Network Bandwidth Profile API

The Network Bandwidth Profile API (aka BW Profile API) is a Twilio Video API specifically designed for optimizing bandwidth utilization in Video Rooms. This is a critical API for creating high-quality video applications.

Track Priority API

The Track Priority API allows developers to set the relative priority of Tracks in a video application. The Network Bandwidth Profile API uses Track priorities to assign bandwidth to tracks.

Dominant Speaker Detection API

The Dominant Speaker is the participant with the highest audio activity at a given time. Many videoconferencing applications enhance the Dominant Speaker (e.g. by rendering their video larger in the central area of the UI). Twilio's Dominant Speaker Detection API notifies developers when the Dominant Speaker in a Room changes. Refer to our Detecting the Dominant Speaker developer guide for further guidance.

Network Quality API

The Network Quality API is a Video API specifically designed for monitoring network quality in Rooms. Refer to the Using the Network Quality API developer guide for further information.


Minimum Bandwidth Recommendations

Video Bitrates

The bandwidth requirement of video streams will depend on the codec, resolution and frame rate. The following table describes the minimum bandwidth required for various codecs and resolutions. In all cases the frame rate is assumed to be 30 fps.

Video Codec | Resolution (width x height) | Bitrate (kbps)
VP8 | 176x144 | 150
VP8 | 640x480 | 400
VP8 | 1280x720 | 650
VP8 | 1920x1080 [1] | 1,200
VP8 Simulcast | 176x144 | 150
VP8 Simulcast | 640x480 | 550
VP8 Simulcast | 1280x720 | 1,400
VP8 Simulcast | 1920x1080 | 3,000
H.264 [2] | 176x144 | 125
H.264 [2] | 640x480 | 400
H.264 [2] | 1280x720 | 600

Screen Share Bitrates

Screen share typically uses a frame rate of 5 fps. The following table describes the minimum bandwidth required for various codecs and resolutions. In all cases the frame rate is assumed to be 5 fps.

Video Codec | Resolution (width x height) | Bitrate (kbps)
VP8 | 1280x720 | 85
VP8 | 1920x1080 [1] | 175
VP8 Simulcast | 1280x720 | 700
VP8 Simulcast | 1920x1080 | 1,800
H.264 [2] | 1280x720 | 90

[1]: Note that on some devices, frame dimensions may differ slightly due to limitations of some hardware video encoders requiring the dimensions to be a multiple of 16. For example, 960x540 may actually be 960x528.

[2]: Note that each device or browser has a different H.264 codec implementation and as such there will be some variance to the bitrates presented above.

Audio Bitrates

The default bitrate for the Opus codec is 32 kbps.
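The tables above can be combined into a back-of-the-envelope estimate of the minimum upstream bandwidth a publisher needs. The sketch below simply adds one table entry per published video track plus the 32 kbps Opus default per audio track. It assumes that each table entry is what one published track requires (for VP8 Simulcast, all layers together); the helper and its types are hypothetical, and real bitrates will vary, especially with H.264 (see note [2]).

```typescript
type Codec = "VP8" | "VP8 Simulcast" | "H.264";
type Source = "camera" | "screen";

// Minimum bitrates in kbps, copied from the two tables above
// (camera entries assume 30 fps, screen-share entries assume 5 fps).
const MIN_KBPS: Record<Source, Record<Codec, Partial<Record<string, number>>>> = {
  camera: {
    "VP8": { "176x144": 150, "640x480": 400, "1280x720": 650, "1920x1080": 1200 },
    "VP8 Simulcast": { "176x144": 150, "640x480": 550, "1280x720": 1400, "1920x1080": 3000 },
    "H.264": { "176x144": 125, "640x480": 400, "1280x720": 600 },
  },
  screen: {
    "VP8": { "1280x720": 85, "1920x1080": 175 },
    "VP8 Simulcast": { "1280x720": 700, "1920x1080": 1800 },
    "H.264": { "1280x720": 90 },
  },
};

const OPUS_DEFAULT_KBPS = 32; // default Opus bitrate per audio track

interface PublishedVideo {
  source: Source;
  codec: Codec;
  resolution: string; // "widthxheight", as in the tables
}

// ASSUMPTION: each table entry is the need of one published track
// (all simulcast layers included), so the total is a plain sum.
function minUplinkKbps(videos: PublishedVideo[], audioTracks: number): number {
  let total = audioTracks * OPUS_DEFAULT_KBPS;
  for (const v of videos) {
    const kbps = MIN_KBPS[v.source][v.codec][v.resolution];
    if (kbps === undefined) {
      throw new Error(`No table entry for ${v.source} ${v.codec} ${v.resolution}`);
    }
    total += kbps;
  }
  return total;
}

// 720p simulcast camera + 1080p simulcast screen share + one microphone.
console.log(minUplinkKbps([
  { source: "camera", codec: "VP8 Simulcast", resolution: "1280x720" },
  { source: "screen", codec: "VP8 Simulcast", resolution: "1920x1080" },
], 1)); // 3232
```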


Before going deep into the technical details, it is worth reviewing some general, common-sense recommendations that may be useful in your design process.

Subscribe only to what end users need

Encoding, transmitting, and rendering video tracks is expensive. This is very noticeable in multiparty applications with a large number of participants. For example, in a Room with 20 participants, it is generally a bad idea to have every participant render 20 high-resolution video tracks: that can contribute to network congestion and will overload client CPU resources, making the quality of experience unacceptable. Instead, well-designed videoconferencing services limit the rendered video tracks to the ones that are really required. For example, in an e-learning application, there is little value in having every student render the video of all the other students all the time. It is more reasonable to render a student's video only in special situations, such as when that student is asking a question. In such cases, developers must use the Network Bandwidth Profile API, which dynamically adjusts to the dominant speaker and to the rendered size of the participants displayed on screen. In addition, the Network Bandwidth Profile API can automatically switch off video tracks that are not visible on screen.
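As an illustration of rendering only what is needed, the hypothetical helper below picks the remote participants whose video a client should render in the e-learning example: the pinned participants (such as the teacher) plus the current dominant speaker, capped at a maximum number of tiles. The function and its parameters are illustrative and not part of any Twilio SDK; in a real application the dominant speaker would come from the Dominant Speaker Detection API.

```typescript
// Hypothetical helper: decide which remote participants' video to render.
// Only pinned participants and the current dominant speaker are rendered,
// capped at maxTiles; everyone else stays off screen.
function selectRenderedParticipants(
  participants: string[],         // identities currently in the Room
  pinned: string[],               // always shown, e.g. the teacher
  dominantSpeaker: string | null, // latest dominant speaker, if any
  maxTiles: number,
): string[] {
  const rendered: string[] = [];
  const candidates = dominantSpeaker === null ? pinned : [...pinned, dominantSpeaker];
  for (const id of candidates) {
    if (rendered.length >= maxTiles) break;
    // Skip identities that left the Room and avoid rendering anyone twice.
    if (participants.indexOf(id) !== -1 && rendered.indexOf(id) === -1) {
      rendered.push(id);
    }
  }
  return rendered;
}

// E-learning example: a teacher and 20 students; student "s7" asks a question.
const students: string[] = [];
for (let i = 1; i <= 20; i++) students.push(`s${i}`);
console.log(selectRenderedParticipants(["teacher", ...students], ["teacher"], "s7", 4));
// renders only "teacher" and "s7"
```

Tracks that are left out are simply not rendered, which leaves the Network Bandwidth Profile API free to switch them off when it is configured to do so.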

Make it simple for end users to mute

Your application should provide mute controls so that end users can disable their video or audio whenever they wish. This avoids unnecessary traffic and background noise.
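A mute button usually just toggles the enabled state of the local tracks. The sketch below assumes a minimal track shape with an isEnabled flag and an enable(enabled) method, modeled loosely on the local tracks of the twilio-video JavaScript SDK (check the SDK reference for the exact API); FakeTrack is a hypothetical stand-in used for demonstration only.

```typescript
// Minimal track shape assumed for this sketch; twilio-video local tracks expose a
// similar isEnabled flag and enable(enabled) method (verify against the SDK reference).
interface MutableTrack {
  readonly kind: "audio" | "video";
  readonly isEnabled: boolean;
  enable(enabled: boolean): unknown;
}

// Mute all local tracks of one kind if any of them is enabled, otherwise unmute them.
// Returns true when the tracks end up muted.
function toggleMute(tracks: MutableTrack[], kind: "audio" | "video"): boolean {
  const targets = tracks.filter(t => t.kind === kind);
  const mute = targets.some(t => t.isEnabled);
  targets.forEach(t => t.enable(!mute));
  return mute;
}

// Hypothetical stand-in for an SDK track, used only to demonstrate the toggle.
class FakeTrack implements MutableTrack {
  readonly kind: "audio" | "video";
  isEnabled = true;

  constructor(kind: "audio" | "video") {
    this.kind = kind;
  }

  enable(enabled: boolean): this {
    this.isEnabled = enabled;
    return this;
  }
}

const mic = new FakeTrack("audio");
console.log(toggleMute([mic], "audio"), mic.isEnabled); // true false
```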

Use VP8 Simulcast in multiparty Rooms

Participants in multiparty Rooms should prefer VP8 Simulcast over other video codecs. The larger the number of participants in a Room, the more important Simulcast is for providing the best possible quality of experience.
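A minimal sketch of codec selection, assuming the preferredVideoCodecs connect option of the twilio-video JavaScript SDK, which accepts entries such as { codec: 'VP8', simulcast: true }. The settings type is defined locally rather than imported, and using plain VP8 for 1-to-1 Rooms is an assumption of this sketch, not a recommendation from this guide.

```typescript
// Shape assumed to match entries of the twilio-video preferredVideoCodecs option.
interface VideoCodecSettings {
  codec: "VP8" | "H264";
  simulcast?: boolean;
}

// Multiparty Rooms (more than two participants) prefer VP8 Simulcast.
// Plain VP8 for 1-to-1 Rooms is an assumption of this sketch.
function preferredVideoCodecsFor(maxParticipants: number): VideoCodecSettings[] {
  if (maxParticipants > 2) {
    return [{ codec: "VP8", simulcast: true }];
  }
  return [{ codec: "VP8" }];
}

console.log(JSON.stringify(preferredVideoCodecsFor(8))); // [{"codec":"VP8","simulcast":true}]
```

The returned array would be passed as the codec preference when connecting; as the name suggests, these are preferences, so the actual codec still depends on what each participant supports.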

Use a reasonable resolution and frame-rate

Frame-rate and resolution are the two main capture constraints that affect video fidelity. When the video source is a camera showing people or moving objects, typically the perceptual quality is better at higher frame-rate. However, for screen-sharing, the resolution is typically more relevant. You should try to set resolution and frame-rate to the minimum value required by your use-case. Over-dimensioning resolution and frame-rate will have a negative impact on the CPU and network consumption and may increase latency. In addition, remember that the resolution and frame-rate you specify as capture constraints are just hints for the client video engine. The actual resolution and frame-rate may decrease if CPU overuse is detected or if the network capacity is not enough for the required traffic.
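The helper below sketches capture-constraint presets that follow this advice: cameras get a moderate resolution with a higher frame-rate, and screens get a higher resolution at the 5 fps typically used for screen sharing. The preset values and the function name are hypothetical starting points, to be lowered to what your use-case actually needs; the width, height, and frameRate fields mirror the standard W3C MediaTrackConstraints properties.

```typescript
// Subset of the W3C MediaTrackConstraints fields used for video capture.
interface CaptureConstraints {
  width: number;
  height: number;
  frameRate: number;
}

type VideoSource = "camera" | "screen";

// Hypothetical presets: cameras trade resolution for frame-rate (motion),
// screens trade frame-rate for resolution (legible text).
const CAPTURE_PRESETS: Record<VideoSource, CaptureConstraints> = {
  camera: { width: 640, height: 480, frameRate: 24 },
  screen: { width: 1920, height: 1080, frameRate: 5 },
};

function captureConstraintsFor(
  source: VideoSource,
  overrides: Partial<CaptureConstraints> = {},
): CaptureConstraints {
  // Later spreads win, so overrides replace preset values field by field.
  return { ...CAPTURE_PRESETS[source], ...overrides };
}

console.log(captureConstraintsFor("camera"));
console.log(captureConstraintsFor("screen", { frameRate: 10 }));
```

Remember that these values are only hints: the client video engine may still deliver a lower resolution or frame-rate under CPU or network pressure.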

Consider the render dimensions

When setting video capture constraints for publishers, you should also consider the render size on the subscribers' side. If you know that a given video track will only be rendered at thumbnail size by all subscribers, it does not make sense to capture it at high resolution on the publisher.
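One way to apply this is to capture at the smallest standard resolution from the Resolution table that still covers the largest size at which any subscriber renders the track. The sketch assumes your application already knows those render sizes (in device pixels); how it collects them is application-specific, and the helper is hypothetical.

```typescript
interface Resolution {
  name: string;
  width: number;
  height: number;
}

// Standard resolutions from the Resolution table, smallest first.
const STANDARD_RESOLUTIONS: Resolution[] = [
  { name: "QCIF", width: 176, height: 144 },
  { name: "VGA", width: 640, height: 480 },
  { name: "qHD", width: 960, height: 540 },
  { name: "HD", width: 1280, height: 720 },
  { name: "FullHD", width: 1920, height: 1080 },
];

// Pick the smallest standard resolution covering the largest render size
// across all subscribers; fall back to the largest one if nothing fits.
function captureResolutionFor(renderSizes: { width: number; height: number }[]): Resolution {
  const maxW = Math.max(0, ...renderSizes.map(s => s.width));
  const maxH = Math.max(0, ...renderSizes.map(s => s.height));
  const fitting = STANDARD_RESOLUTIONS.filter(r => r.width >= maxW && r.height >= maxH);
  return fitting.length > 0 ? fitting[0] : STANDARD_RESOLUTIONS[STANDARD_RESOLUTIONS.length - 1];
}

// Two subscribers: one shows a small tile, the other a medium one.
console.log(captureResolutionFor([{ width: 320, height: 180 }, { width: 800, height: 450 }]).name); // qHD
```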

Do not share resources

High-resolution video and audio consume significant CPU and bandwidth resources. If those resources are shared with other applications, the quality of experience will decrease. For the best possible experience, recommend that your end users close any applications that may take CPU or bandwidth away from your video service while it is running.

Use the best connectivity you can find

Network connectivity is the most critical aspect affecting communication quality. Restricted bandwidth, high latency, and packet loss can severely degrade your end users' experience. Hence, you should recommend that end users use the best network access available to them: wired connectivity is commonly better than a wireless connection. Among wireless connections, corporate or cellular connectivity is typically better than public, open, shared Wi-Fi networks.

Using maxVideoBitrate or maxAudioBitrate

Both parameters control the maximum upstream bandwidth of a Participant.

  • maxVideoBitrate specifies the maximum video bitrate a Participant can publish to the Room. By default, no value is set and maxVideoBitrate is unlimited. In that case, the bitrate is limited only by the Twilio client SDK, using an algorithm that considers the available bandwidth and CPU resources. In general, we recommend trusting that algorithm and not setting maxVideoBitrate. However, on devices with restricted CPU or battery life, we recommend setting maxVideoBitrate to a value between 500000 and 2000000 bps per track. Note that if a Participant is publishing N video tracks, each video track is limited to maxVideoBitrate/N.
  • maxAudioBitrate specifies the maximum audio bitrate published by a Participant. It only takes effect when using Opus (i.e. it has no effect on PCM codecs). By default it is unset and Opus uses its default settings, consuming between 20 kbps and 40 kbps. Twilio's recommendation is to keep the default. However, when the audio is human speech, you may restrict maxAudioBitrate to 16 kbps to save bandwidth without significant quality degradation. Do not restrict maxAudioBitrate if you intend to transmit music or any other type of audio beyond human speech.
Parameter | Recommendation | When to change it
maxVideoBitrate | Keep default (unset) | On mobile platforms, set it between 500000 and 2000000 bps per video track
maxAudioBitrate | Keep default (unset) | In speech-only communications, you may set it, but not below 16000 bps per audio track
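The sketch below turns the recommendations table into code. It assumes the parameter names maxVideoBitrate and maxAudioBitrate (in bps) as accepted by the twilio-video JavaScript SDK's connect options and LocalParticipant.setParameters(); verify them against the SDK reference. The 1 Mbps default per track, the helper name, and its options are hypothetical. Because each of N published video tracks is limited to maxVideoBitrate/N, the per-track target is multiplied by N.

```typescript
// Parameter names mirror twilio-video's encoding parameters (assumed); values in bps.
interface EncodingParams {
  maxVideoBitrate?: number;
  maxAudioBitrate?: number;
}

const MIN_TRACK_BPS = 500_000;       // lower end of the recommended per-track range
const MAX_TRACK_BPS = 2_000_000;     // upper end of the recommended per-track range
const SPEECH_AUDIO_BPS = 16_000;     // floor for speech-only audio
const DEFAULT_TRACK_BPS = 1_000_000; // hypothetical default per-track target

function encodingParams(opts: {
  constrainedDevice: boolean; // restricted CPU or battery life (e.g. mobile)
  speechOnly: boolean;        // audio is human speech only and you opt to save bandwidth
  videoTracks: number;        // number of video tracks this Participant publishes
  targetPerTrackBps?: number; // desired per-track ceiling before clamping
}): EncodingParams {
  const params: EncodingParams = {}; // empty object = keep SDK defaults (unset)
  if (opts.constrainedDevice && opts.videoTracks > 0) {
    const target = opts.targetPerTrackBps ?? DEFAULT_TRACK_BPS;
    const perTrack = Math.min(MAX_TRACK_BPS, Math.max(MIN_TRACK_BPS, target));
    // Each of the N tracks gets maxVideoBitrate / N, so scale the per-track value by N.
    params.maxVideoBitrate = perTrack * opts.videoTracks;
  }
  if (opts.speechOnly) {
    params.maxAudioBitrate = SPEECH_AUDIO_BPS;
  }
  return params;
}

console.log(encodingParams({ constrainedDevice: true, speechOnly: true, videoTracks: 2 }));
// maxVideoBitrate: 2000000 (2 x 1 Mbps), maxAudioBitrate: 16000
```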

Use GLL

On the Internet, latency and packet loss depend on geolocation. When the connection between a sender and a receiver spans the globe, latency and jitter are increased by the distance between the parties. Packet loss is also more likely, due to the number of routers in the connection path. Due to this, the Twilio infrastructure that serves your rooms should be as close as possible to your clients. Otherwise, quality may be affected:

  • Connectivity time may increase.
  • Media latency and packet loss may increase making the fidelity drop.

To minimize these problems, Twilio makes it possible to specify the signaling and media regions for your Rooms. However, determining the closest region for a participant is not always trivial. For this reason, we recommend that developers use GLL (Global Low Latency). When GLL is specified, Twilio automatically chooses the region that minimizes latency. See our Video Regions and Global Low Latency documentation for further insight.

Measure

Quality should be understood as a process. Try to measure both your end users' perception and the many factors that may affect it, including CPU consumption and network connectivity metrics. Twilio's Network Quality API can help with the latter. With that information, try to understand your end users' pain points and design a strategy to minimize them. Periodically repeating the measure-analyze-implement cycle is the best way to ensure you are offering the best possible quality of experience to your users.
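As a starting point for the measure step, the hypothetical helper below summarizes a series of network quality levels, such as those reported by the Network Quality API on its 0 to 5 scale (higher is better), into a mean and the fraction of samples at or below a "poor" threshold. The threshold of 2 is an arbitrary assumption; pick one that matches what your users actually perceive as poor.

```typescript
interface QualitySummary {
  samples: number;
  mean: number;         // average level (NaN when there are no samples)
  fractionPoor: number; // share of samples at or below the poor threshold
}

// Hypothetical helper: summarize network quality levels (0 = worst, 5 = best).
function summarizeNetworkQuality(levels: number[], poorThreshold = 2): QualitySummary {
  if (levels.length === 0) {
    return { samples: 0, mean: NaN, fractionPoor: 0 };
  }
  const sum = levels.reduce((acc, level) => acc + level, 0);
  const poor = levels.filter(level => level <= poorThreshold).length;
  return { samples: levels.length, mean: sum / levels.length, fractionPoor: poor / levels.length };
}

console.log(summarizeNetworkQuality([5, 4, 2, 1, 5]));
// mean 3.4, fractionPoor 0.4
```

Tracking these summaries per user or per session over time makes it easier to tell whether a change you implemented actually reduced the share of poor-quality periods.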


Additional tools for enhancing quality

Room quality strongly depends on how the bandwidth is managed. To optimize quality, you must make sure that your video tracks are appropriately prioritized and that bandwidth is allocated in alignment with your use-case needs. This is done using the Track Priority API and the Network Bandwidth Profile API.

Track Priority API: Recommendations

Track priorities are used to determine the importance of tracks. They are used to allocate bandwidth and to decide which tracks should be switched off in case of congestion. Track priorities are use-case dependent and setting them correctly is essential for having optimal quality. The following general guidelines may be helpful for that objective:

Audio track priorities

  • From the perspective of the Network Bandwidth Profile API, audio tracks are always a higher priority than video tracks. Hence, you may think of audio as being in a special more important category.
  • Setting the priority of an audio track has no effect on your application.

Video track priorities

  • If one participant or video track is more important than the others, set it to high priority so that, in case of network congestion, it will be the last video track to be switched off.
  • Typically there should be only one video track with priority high. When screen-sharing, the screen should be the high priority track. If screen-share is absent, and Dominant Speaker Detection is activated, the dominant speaker video may be the high priority track.
  • You may need to dynamically adapt video track priorities. For example, dominantSpeakerPriority may need to go from high to low when a screen-share is activated.
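The guidelines above can be expressed as a small priority-assignment function. The sketch gives "high" to at most one video track (the screen share if there is one, otherwise the dominant speaker's camera) and "standard" to the rest. The values "low", "standard", and "high" are the Track priority values used by Twilio Video; the track shape and the function are hypothetical. How the result is applied depends on the SDK; twilio-video JS, for instance, offers LocalTrackPublication.setPriority() for published tracks and the dominantSpeakerPriority Bandwidth Profile option for the dominant speaker (check the SDK reference for your version).

```typescript
type TrackPriority = "low" | "standard" | "high";

interface VideoTrackInfo {
  sid: string;            // track identifier
  participant: string;    // identity of the publishing participant
  isScreenShare: boolean;
}

// Assign at most one "high" priority: the screen share if present,
// otherwise the dominant speaker's camera. Every other track is "standard".
function assignVideoPriorities(
  tracks: VideoTrackInfo[],
  dominantSpeaker: string | null,
): Record<string, TrackPriority> {
  const priorities: Record<string, TrackPriority> = {};
  for (const track of tracks) {
    priorities[track.sid] = "standard";
  }
  const screenShares = tracks.filter(t => t.isScreenShare);
  const speakerCameras = tracks.filter(t => !t.isScreenShare && t.participant === dominantSpeaker);
  const main = screenShares.length > 0 ? screenShares[0] : speakerCameras[0];
  if (main !== undefined) {
    priorities[main.sid] = "high";
  }
  return priorities;
}

console.log(assignVideoPriorities(
  [
    { sid: "cam-alice", participant: "alice", isScreenShare: false },
    { sid: "screen-bob", participant: "bob", isScreenShare: true },
  ],
  "alice",
)); // screen-bob is "high", cam-alice is "standard"
```

Re-running the function whenever a screen share starts or stops, or the dominant speaker changes, gives the dynamic behavior described in the last guideline.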

Network Bandwidth Profile API: Selecting the mode

Determining the Bandwidth Profile Mode

Bandwidth Profiles have three modes: collaboration, presentation, and grid. You can determine the mode that best fits your use-case with the following decision diagram:

Decision diagram for Network Bandwidth Profile mode selection.

Is it a multiparty service?

  • If your application is only used for 1-to-1 communications (i.e. there are never more than 2 Participants in the Room) answer NO. Otherwise, answer YES.

Is there a main video track?

  • If your application UI renders all video tracks at the same display size, answer NO. If your application has one (or several) video tracks that are enhanced in the UI (e.g. dominant speaker, screen-share, etc.) and take up more display area, answer YES.

Can I use VP8 Simulcast?

  • If a significant fraction of your application's end users cannot use VP8 Simulcast (e.g. because you have decided to use H.264, or because it's not supported), answer NO. Otherwise, answer YES.

Is the main track quality critical?

  • If you prefer the main video track quality to be preserved by all means, even at the cost of completely switching off other less relevant tracks when bandwidth is low (e.g. the screen-share in a presentation), answer YES. Otherwise, answer NO.
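The decision diagram image is not reproduced in this text. The function below reconstructs its logic from the grid, collaboration, and presentation mode descriptions that follow, so treat it as an approximation rather than the official diagram; the type and function names are hypothetical.

```typescript
type BandwidthProfileMode = "grid" | "collaboration" | "presentation";

// One field per question in the decision diagram.
interface ModeAnswers {
  multiparty: boolean;               // Is it a multiparty service?
  hasMainTrack: boolean;             // Is there a main video track?
  canUseSimulcast: boolean;          // Can I use VP8 Simulcast?
  mainTrackQualityCritical: boolean; // Is the main track quality critical?
}

function selectMode(answers: ModeAnswers): BandwidthProfileMode {
  // 1-to-1, equal-size layouts, and Rooms without Simulcast all use grid.
  if (!answers.multiparty || !answers.hasMainTrack || !answers.canUseSimulcast) {
    return "grid";
  }
  // Protect the main track at all costs only when its quality is critical.
  return answers.mainTrackQualityCritical ? "presentation" : "collaboration";
}

console.log(selectMode({
  multiparty: true,
  hasMainTrack: true,
  canUseSimulcast: true,
  mainTrackQualityCritical: false,
})); // collaboration
```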

Developing Applications with grid mode

Applications use grid mode for one of the following reasons:

  • The application is 1-to-1.
  • The application is for multiparty communications but the UI layout does not enhance any video tracks over others (i.e. all tracks are rendered with the same size).
  • It's not possible to use Simulcast. Note that in large Rooms (i.e. Rooms with 5 or more participants), not using Simulcast typically degrades video quality significantly, even in grid mode.
Typical GUI layout used for grid mode. Videos are displayed in a matrix where all video tracks have equal relevance.

Developing Applications with collaboration mode

Applications using collaboration mode typically share the following properties:

  • Interactions are multiparty (i.e. a large number of participants communicate)
  • The UI layout is designed to enhance one main video track (e.g. dominant speaker).
  • The rest of the video tracks are displayed in thumbnail size.
  • Keeping all tracks visible is more important than having higher quality in the main track.
Applications using collaboration mode typically enhance the dominant speaker and represent the rest of participants in thumbnail size.

Developing Applications with presentation mode

Applications using presentation mode typically share the following properties:

  • Interactions are one-to-many (i.e. one participant presents to a large audience).
  • The UI layout is designed to enhance one main video track (e.g. the presenter screen-share).
  • The rest of the video tracks may or may not be displayed as they are not so relevant.
  • Presenter quality is critical and more relevant than keeping viewers' tracks on.
Applications using presentation mode typically have a screen-share track whose quality must be maximized by all means. They may additionally display the presenter's webcam or other participants' webcams, but with lower priority.
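Once the mode is chosen, it is passed to the SDK when connecting. The sketch below builds a plain object shaped like a subset of the twilio-video JavaScript SDK connect options (bandwidthProfile.video.mode, bandwidthProfile.video.dominantSpeakerPriority, dominantSpeaker, and preferredVideoCodecs). It is defined locally instead of imported, so check the names against the SDK typings before use. Giving the dominant speaker "high" priority only in collaboration mode is an assumption of this sketch, based on the mode descriptions above.

```typescript
type BandwidthProfileMode = "grid" | "collaboration" | "presentation";
type TrackPriority = "low" | "standard" | "high";

// Local stand-ins for a subset of the twilio-video connect options (assumed shape).
interface VideoBandwidthProfileSketch {
  mode: BandwidthProfileMode;
  dominantSpeakerPriority?: TrackPriority;
}

interface ConnectOptionsSketch {
  bandwidthProfile: { video: VideoBandwidthProfileSketch };
  dominantSpeaker: boolean;
  preferredVideoCodecs: { codec: "VP8"; simulcast: boolean }[];
}

function connectOptionsFor(
  mode: BandwidthProfileMode,
  enableDominantSpeaker: boolean,
  canUseSimulcast: boolean,
): ConnectOptionsSketch {
  const video: VideoBandwidthProfileSketch = { mode };
  // Assumption: in collaboration mode the dominant speaker is the main track.
  if (mode === "collaboration" && enableDominantSpeaker) {
    video.dominantSpeakerPriority = "high";
  }
  return {
    bandwidthProfile: { video },
    dominantSpeaker: enableDominantSpeaker,
    preferredVideoCodecs: [{ codec: "VP8", simulcast: canUseSimulcast }],
  };
}

console.log(JSON.stringify(connectOptionsFor("collaboration", true, true)));
```

In presentation mode, the high priority would instead be set by the presenter on the screen-share track, as described in the Track Priority recommendations.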

Copyright © 2024 Twilio Inc.