How to Play Audio Files in a Twilio Video Call
A common need in video calling applications is to allow a user to play an audio file for the other participants. This could be used to add background music to a call, to share a recorded conversation, or just to make calls more fun with sound effects or a rickroll.
In this short tutorial you will learn how a participant in a Twilio Video call can publish a secondary audio track and play back an audio file on it.
Brief introduction to the MediaStream API
The JavaScript Twilio Video library uses existing APIs in the browser to obtain access to the camera and microphone. More specifically, it uses the MediaDevices.getUserMedia() function to access MediaStream objects for these devices, which expose the raw video and audio tracks that are then published to the video call.
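To make this concrete, here is a minimal sketch of that step: requesting microphone access with getUserMedia() and taking the raw audio track from the returned MediaStream. The Twilio Video library does this for you, so the code is only illustrative. The helper name getMicrophoneTrack is hypothetical, and the mediaDevices object is a parameter (defaulting to navigator.mediaDevices) only so the logic can be tried outside a browser.

```javascript
// Illustrative sketch: request microphone access and return the raw audio
// track from the resulting MediaStream. getMicrophoneTrack is a hypothetical
// helper name; mediaDevices defaults to the browser's navigator.mediaDevices.
async function getMicrophoneTrack(mediaDevices = globalThis.navigator?.mediaDevices) {
  if (!mediaDevices || typeof mediaDevices.getUserMedia !== "function") {
    throw new Error("getUserMedia() is not available in this environment");
  }
  // Ask only for audio; the browser shows its permission prompt here.
  const stream = await mediaDevices.getUserMedia({ audio: true });
  // A MediaStream can hold several tracks; take the first audio one.
  const [track] = stream.getAudioTracks();
  if (!track) {
    throw new Error("The MediaStream has no audio tracks");
  }
  return track;
}
```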
Interestingly, the camera and the microphone are not the only sources of MediaStream objects. The MediaDevices.getDisplayMedia() function prompts the user to select a display, window or browser tab to share, and returns a MediaStream object with a video track of the selected screen element, allowing the user to share their screen on the call (there is a tutorial available for this if you are interested).
Another, less well-known option is the HTMLMediaElement.captureStream() function, which returns a MediaStream object associated with a <video> or <audio> element. To play an audio file on a video call, the browser application of the originating participant must get the MediaStream from an <audio> element and publish its audio track to the call as a local audio track. Once the track is published, any audio played on this element will be received by the other participants in the call.
The following sections describe how to use HTMLMediaElement.captureStream() to play audio files on a video call. Near the end of the article you can find a link to a complete example that you can try.
Adding an audio element
The audio file will need to play in an <audio> element on the page of the originating participant. This element can be created dynamically with JavaScript when needed, but since it can be completely invisible, it is also convenient to include it in the page from the start.
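An invisible audio element can be added to the page with a plain <audio> tag that has an id attribute and no controls attribute. Browsers do not render audio elements that lack the controls attribute.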
The id attribute is not required, but it makes it easier to locate this element from JavaScript later on. Using vanilla JavaScript, the element can be retrieved with document.getElementById().
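A minimal sketch of looking up the element, assuming a hypothetical id of bgaudio, and creating it on the fly when it is not already in the page (the dynamic option mentioned above). The helper name getAudioElement is hypothetical, and the document is passed as a parameter only so the function can be tried with a stand-in object.

```javascript
// Minimal sketch: find the hidden <audio> element by id, or create it on
// demand. "bgaudio" is a hypothetical id; use the one from your own HTML.
// doc defaults to the browser's document and is a parameter only so the
// helper can be tried with a stand-in object.
function getAudioElement(id = "bgaudio", doc = globalThis.document) {
  let audioEl = doc.getElementById(id);
  if (!audioEl) {
    // Created without the "controls" attribute, so the browser does not render it.
    audioEl = doc.createElement("audio");
    audioEl.id = id;
    // Assumes the script runs after <body> has been parsed.
    doc.body.appendChild(audioEl);
  }
  return audioEl;
}
```

Calling getAudioElement() a second time returns the same element, so it is safe to call whenever a new file is about to be played.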
Loading an audio file
When the participant decides to play audio on the call, the application must determine which audio file to play. If the audio file is known in advance, it can be provided as the src attribute when the <audio> element is defined.
In many cases the application should allow the user to select an audio file while the call is taking place. This can be done in the browser via drag and drop, or with a file input element. In both cases, the selected file can be retrieved with JavaScript as a File object.
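As a minimal sketch, the following helpers take the first audio file from either kind of event: drops expose the files in event.dataTransfer.files, and file inputs in event.target.files. The names getAudioFileFromEvent and enableAudioDrop are hypothetical. Calling preventDefault() on dragover and drop is needed so the browser accepts the drop instead of opening the file itself.

```javascript
// Minimal sketch: pick the first audio file from a drop event or from a
// file input's change event. Both expose a FileList.
function getAudioFileFromEvent(event) {
  const files = event.dataTransfer?.files ?? event.target?.files ?? [];
  for (const file of Array.from(files)) {
    // File.type holds the MIME type reported by the browser, e.g. "audio/mpeg".
    if (file.type.startsWith("audio/")) {
      return file;
    }
  }
  return null;
}

// Wire drag and drop on any element (for example, the local video container).
function enableAudioDrop(target, onAudioFile) {
  // Without preventDefault() on dragover, the browser does not allow the drop.
  target.addEventListener("dragover", (event) => event.preventDefault());
  target.addEventListener("drop", (event) => {
    // Stop the browser from navigating to the dropped file.
    event.preventDefault();
    const file = getAudioFileFromEvent(event);
    if (file) {
      onAudioFile(file);
    }
  });
}
```

For a file input, the same helper can be called from a change event listener on the <input type="file"> element.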
This File object needs to be converted to a URL that can be assigned to the src attribute of the audio element. The URL.createObjectURL() function performs this conversion.
Loading the audio into the audio element happens asynchronously. The canplay event fires when the audio element is ready to play the file.
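A minimal sketch that combines both steps, assuming the caller passes either a File or a plain URL string: it sets src and returns a Promise that resolves on canplay (or rejects on error). The helper name loadAudio is hypothetical; it resolves with the URL it used so that an object URL can be released with URL.revokeObjectURL() during cleanup, and urlApi is a parameter only for testing.

```javascript
// Minimal sketch: point the <audio> element at a File (via an object URL)
// or at a plain URL string, and resolve once "canplay" fires.
// urlApi defaults to the global URL and is a parameter only for testing.
function loadAudio(audioEl, source, urlApi = URL) {
  return new Promise((resolve, reject) => {
    // A File (or any Blob) needs an object URL; strings are used as-is.
    const isObjectUrl = typeof source !== "string";
    const src = isObjectUrl ? urlApi.createObjectURL(source) : source;

    const onCanPlay = () => {
      cleanup();
      resolve(src);
    };
    const onError = () => {
      cleanup();
      // Release the object URL right away if the file could not be loaded.
      if (isObjectUrl) urlApi.revokeObjectURL(src);
      reject(new Error(`Could not load audio from ${src}`));
    };
    function cleanup() {
      audioEl.removeEventListener("canplay", onCanPlay);
      audioEl.removeEventListener("error", onError);
    }

    // Register the listeners before setting src so no event is missed.
    audioEl.addEventListener("canplay", onCanPlay);
    audioEl.addEventListener("error", onError);
    audioEl.src = src;
  });
}
```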
Publishing the audio track to the video room
The audio element is now ready to play the audio file, so the next step is to publish a new audio track to the video room.
The captureStream() method of the audio element returns a MediaStream instance, which includes all the media tracks that are available. The audio tracks are provided by the getAudioTracks() method. Since we are interested in a single audio track, we can take the first track and ignore any extra ones.
The Twilio Video library uses the LocalAudioTrack class to represent audio from the local participant. The constructor of this class accepts a standard browser audio track (a MediaStreamTrack), such as the one stored in audioStream above.
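Putting these steps together, a minimal sketch could look like the following. It assumes LocalAudioTrack is passed in, for example Twilio.Video.LocalAudioTrack when the library is loaded from a script tag; the helper name and the track name background-audio are hypothetical. The fallback to mozCaptureStream() is there because Firefox has historically exposed only the prefixed method.

```javascript
// Minimal sketch: capture the <audio> element's output and wrap its first
// audio track in a Twilio LocalAudioTrack. LocalAudioTrack is passed in so
// the helper does not depend on how the library was loaded.
let bgAudioTrack = null; // kept globally so it can be cleaned up later

function createBackgroundAudioTrack(audioEl, LocalAudioTrack) {
  // Firefox has historically exposed this method only as mozCaptureStream().
  const capture = audioEl.captureStream ?? audioEl.mozCaptureStream;
  if (typeof capture !== "function") {
    throw new Error("This browser cannot capture a stream from a media element");
  }
  const stream = capture.call(audioEl);
  // Only one audio track is needed; any extra ones are ignored.
  const [mediaStreamTrack] = stream.getAudioTracks();
  if (!mediaStreamTrack) {
    throw new Error("The captured stream has no audio track");
  }
  // "background-audio" is a hypothetical track name seen by other participants.
  bgAudioTrack = new LocalAudioTrack(mediaStreamTrack, { name: "background-audio" });
  return bgAudioTrack;
}
```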
The bgAudioTrack variable is defined globally because this track will need to be accessed later, when audio playback ends, to clean everything up.
Now the LocalAudioTrack can be published to the video room with the publishTrack() method of the room's local participant.
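A minimal sketch of the publish step, assuming room is the Room object returned by Twilio.Video.connect(). publishTrack() returns a Promise that resolves to a LocalTrackPublication; the helper name publishBackgroundAudio is hypothetical.

```javascript
// Minimal sketch: publish the background audio track through the room's
// local participant. publishTrack() resolves to a LocalTrackPublication
// once the track has been published.
async function publishBackgroundAudio(room, track) {
  const publication = await room.localParticipant.publishTrack(track);
  console.log(`Published background audio track ${publication.trackSid}`);
  return publication;
}
```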
Playing the audio
The audio track is now published, and all participants are ready to receive audio on it. The last step to share this audio is to tell the audio element to start playing. If you are using a visible audio element, the user can start playback manually, but in the case of an invisible audio element, you can call its play() method.
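A minimal sketch, with a hypothetical helper name: play() returns a Promise, which can reject (typically with a NotAllowedError) when the browser's autoplay policy blocks playback, so the sketch catches that case instead of leaving the rejection unhandled.

```javascript
// Minimal sketch: start playback on the hidden element and report whether
// it actually started. play() returns a Promise that rejects if the
// browser blocks playback, for example under its autoplay policy.
async function startPlayback(audioEl) {
  try {
    await audioEl.play();
    return true;
  } catch (err) {
    // NotAllowedError is the usual rejection when autoplay is blocked.
    console.warn(`Playback did not start: ${err.name}: ${err.message}`);
    return false;
  }
}
```

If startPlayback() resolves to false, the application could ask the user to click a button and call it again from that click handler.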
At this point the audio will start playing for the local participant, and will also be streamed to the other participants as a secondary audio track from this participant. The participant sharing this audio file will still be able to speak into their microphone, as these are two independent audio tracks.
Cleanup
The application needs to decide when to stop sharing audio. One option is to offer a UI element for the user to stop playback, or it may rely on the controls offered in a visible audio element. Another option is to wait for the audio playback to end. This really depends on the application, but whenever the application decides to stop sharing this audio, the audio track that was published to the call must be unpublished.
One way to do this is to perform the cleanup operations in a handler for the audio element’s ended event.
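A minimal sketch of such a handler, assuming the caller supplies the room, the published LocalAudioTrack and, when a dropped file was loaded, the object URL created for it. The helper name cleanupOnEnded is hypothetical. The { once: true } listener option makes the handler run a single time.

```javascript
// Minimal sketch: when playback ends, unpublish and stop the background
// track, and release the object URL if one was created for a dropped file.
// urlApi defaults to the global URL and is a parameter only for testing.
function cleanupOnEnded(audioEl, room, track, objectUrl = null, urlApi = URL) {
  audioEl.addEventListener("ended", () => {
    // Remote participants are notified that the track was unpublished.
    room.localParticipant.unpublishTrack(track);
    // Stop the underlying media track; it cannot be restarted afterwards.
    track.stop();
    if (objectUrl) {
      urlApi.revokeObjectURL(objectUrl);
    }
  }, { once: true });
}
```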
When the track is unpublished, all the participants will remove it.
A working example
Are you interested in trying this out with a fully working application? I have implemented the techniques discussed in this article in the project I developed for my serverless video tutorial.
To try it out you need the following:
- A Node.js version that is compatible with Twilio Functions. Currently (March 2022), versions 12 and 14 are supported (14 is recommended, since 12 is about to drop out of maintenance). You can download a Node.js installer from the Node.js website.
- A free or paid Twilio account. If you are new to Twilio, get your free account now!
- The Twilio CLI. You can find installation instructions on the Twilio CLI Quickstart page of the documentation.
- The Twilio Serverless Toolkit. This installs as a plugin to the Twilio CLI. Find installation instructions in the documentation.
Clone the project’s repository with the following commands:
Note that the audio file playback support is in the bgaudio branch of the repository.
Create a file named .env in the project directory with the following contents:
If you don’t know what to set these three variables to, see the original tutorial for detailed instructions.
Run the project locally with the following command, and navigate to the application in your browser at http://localhost:3000/index.html.
The command below deploys the application to the Twilio Serverless platform:
Note that for this command to work, your Twilio CLI must be authenticated in advance. You can authenticate with the twilio login command, or by setting the TWILIO_ACCOUNT_SID and TWILIO_AUTH_TOKEN environment variables.
The deploy command will print the URLs of all the assets and functions. You can use the URL of the index.html file, or navigate to the domain alone, without a file path.
Once the application is running, connect to the video room in two or more browsers, and then drag and drop an audio file on the local video on any of the browsers to play the audio on the call.
Next steps
I hope this article gives you some ideas on how to work with audio files in your video calling application.
If you are wondering if there is a way to apply the same technique to video, the answer is yes! Users can share a secondary video track as well as audio. The code examples shown in this article can be adapted to work with a video element, which may expose secondary video and audio tracks to share media played in a local video element. Let me know what the results are if you attempt this.
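As a starting point, here is a minimal sketch of the capture side for a video element, assuming LocalVideoTrack and LocalAudioTrack are passed in (for example from Twilio.Video). The helper and track names are hypothetical, and the audio track is optional because some video files have no sound. Each returned track could then be published to the room in the same way as the audio track.

```javascript
// Minimal sketch: capture a <video> element's output and wrap its tracks
// for publishing. The Twilio track classes are passed in; the track names
// are hypothetical.
function createVideoElementTracks(videoEl, { LocalVideoTrack, LocalAudioTrack }) {
  // Firefox has historically exposed this method only as mozCaptureStream().
  const capture = videoEl.captureStream ?? videoEl.mozCaptureStream;
  if (typeof capture !== "function") {
    throw new Error("This browser cannot capture a stream from a media element");
  }
  const stream = capture.call(videoEl);
  const tracks = [];
  const [videoTrack] = stream.getVideoTracks();
  if (videoTrack) {
    tracks.push(new LocalVideoTrack(videoTrack, { name: "shared-video" }));
  }
  // A video file may have no audio at all, so the audio track is optional.
  const [audioTrack] = stream.getAudioTracks();
  if (audioTrack) {
    tracks.push(new LocalAudioTrack(audioTrack, { name: "shared-video-audio" }));
  }
  return tracks;
}
```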
I can’t wait to see what you build with Twilio Video!
Miguel Grinberg is a Principal Software Engineer for Technical Content at Twilio. Reach out to him at mgrinberg [at] twilio [dot] com if you have a cool project you’d like to share on this blog!