Adding Dominant Speaker Detection for Twilio Programmable Video with TypeScript
Time to read: 6 minutes
In this article, you’ll learn to use TypeScript and Twilio Programmable Video to build a video chat application with a dominant speaker display. You’ll start from an existing base project that uses the Twilio Programmable Video JavaScript SDK (for front-end video) and the Twilio Node Helper Library (for back-end authentication), and retrofit it to support dominant speaker detection.
This article continues my last one, Add Muting and Unmuting Capability to your Twilio Programmable Video App with TypeScript, and builds on the “adding-mute-unmute” branch of this GitHub Repository. To see the final code, visit the “adding-dominant-speaker-detection” branch.
Twilio Programmable Video is a suite of tools for building real-time video apps that scale as you grow, from free 1:1 chats with WebRTC to larger group rooms with many participants. You can sign up for a free Twilio account to get started using Programmable Video.
TypeScript is an extension of pure JavaScript (a “superset”, if you will) that adds static typing to the language. It enforces type safety, makes code easier to reason about, and permits the implementation of classic patterns in a more “traditional” manner. Because TypeScript is a superset, all valid JavaScript is also syntactically valid TypeScript, and TypeScript compiles down to JavaScript.
Parcel is a blazing-fast, zero-configuration web application bundler that supports hot module replacement and bundles and transforms your assets. You’ll use it in this article to work with TypeScript on the client without having to configure transpilation or bundling yourself.
Requirements
- Node.js - Consider using a tool like nvm to manage Node.js versions.
- A Twilio Account for Programmable Video. If you are new to Twilio, you can create a free account. If you sign up using this link, we’ll both get $10 in free Twilio credit when you upgrade your account.
Project Configuration
Download the project files and install dependencies
Begin by cloning the “adding-mute-unmute” branch of the accompanying GitHub Repository with the command below:
Then, install dependencies in both the client and the server project:
Configure Environment Variables
The server directory contains a small Express application that manages identity and authentication for users joining rooms by generating access tokens. Before the server will function correctly, you’ll need to specify three environment variables corresponding to your Twilio Account SID, your Twilio API Key, and your Twilio API Key Secret. You can see how the Twilio Node Helper Library uses them within your Express application to generate access tokens in the highlighted lines here:
This function can be found in server/src/api/controller.ts. See my article Get Started with Twilio Programmable Video Authentication and Identity using TypeScript or the relevant section of the documentation to learn more about Access Tokens.
If you are not already there from the prior step, navigate into the server folder and create a new folder within it named env. Inside that folder, create a file named dev.env. The commands below will perform these steps:
Add the following variables to dev.env.
You can find your Account SID on the Twilio Console, and you can create your API Key and API Secret here. Add these keys in their respective locations, overwriting the [Your Key] placeholder in its entirety each time.
Note that on the API dashboard of the Console, your API key will be referred to as the API SID. Also, be sure to take note of your API Key Secret before navigating away from the page - you won’t be able to access it again.
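If you’d like the server to fail fast when a value is missing or still contains the placeholder, a startup check along these lines can help. This is a minimal sketch rather than code from the project: the names TWILIO_ACCOUNT_SID, TWILIO_API_KEY, and TWILIO_API_SECRET are assumptions (match them to the keys in your dev.env), and readTwilioCredentials() is a hypothetical helper that takes the environment as a plain object so it runs without Node typings.

```typescript
// Hypothetical variable names; match them to the keys in your env/dev.env file.
const REQUIRED_KEYS = ["TWILIO_ACCOUNT_SID", "TWILIO_API_KEY", "TWILIO_API_SECRET"] as const;
type RequiredKey = (typeof REQUIRED_KEYS)[number];

// Placeholder text from the setup instructions that must be fully replaced.
const PLACEHOLDER = "[Your Key]";

interface TwilioCredentials {
  accountSid: string;
  apiKey: string;
  apiSecret: string;
}

type Env = Record<string, string | undefined>;

// Reads the three credentials, throwing a single error that names every key
// that is missing, empty, or still contains the placeholder.
function readTwilioCredentials(env: Env): TwilioCredentials {
  // Trims stray whitespace that often sneaks into hand-edited env files.
  const read = (key: RequiredKey): string => (env[key] ?? "").trim();

  const problems = REQUIRED_KEYS.filter((key) => {
    const value = read(key);
    return value === "" || value.includes(PLACEHOLDER);
  });
  if (problems.length > 0) {
    throw new Error(`Missing or placeholder environment variables: ${problems.join(", ")}`);
  }

  return {
    accountSid: read("TWILIO_ACCOUNT_SID"),
    apiKey: read("TWILIO_API_KEY"),
    apiSecret: read("TWILIO_API_SECRET"),
  };
}
```

On the server, you would call it once at startup, for example readTwilioCredentials(process.env), so a misconfigured dev.env surfaces immediately instead of as a failed token request later.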
With this, you’ve completed all necessary configuration for the project, and can move on to implementing dominant speaker detection.
Update the Client
You’ll alert participants of who the current dominant speaker is by displaying a highlighted outline/bounding box around their video stream. To accomplish this, add the following CSS class, highlighted below, to index.html:
I chose green and 3 pixels, but you can pick any styles you want.
Next, move to the video.ts file in the src folder, find the attachTrack() function (which is about halfway down the file, near line 270), and replace it as follows:
Originally, this function would only attach the track to the DOM within your remoteMediaContainer. With the update, it now creates a wrapper <div/>, within which both of the participant’s tracks (audio and video) are stored, and sets the ID of the <div/> to the participant’s identity. This will allow you to more easily grab a reference to an individual participant’s container for styling, etc.
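To see the wrapper idea in isolation, here is a DOM-free sketch of the get-or-create logic. It is an illustration, not the repository’s attachTrack(): ElementLike, FakeElement, getOrCreateParticipantWrapper(), and attachTrackElement() are hypothetical names. In the browser you would work with real elements from document.createElement("div") and the media element returned by twilio-video’s track.attach().

```typescript
// Minimal structural stand-in for the DOM pieces used here (hypothetical).
interface ElementLike {
  id: string;
  readonly children: ElementLike[];
  appendChild(child: ElementLike): void;
}

// Returns the wrapper whose id equals the participant's identity,
// creating it and appending it to the container on first use.
function getOrCreateParticipantWrapper(
  container: ElementLike,
  identity: string,
  createElement: () => ElementLike,
): ElementLike {
  const existing = container.children.find((child) => child.id === identity);
  if (existing !== undefined) {
    return existing;
  }
  const wrapper = createElement();
  wrapper.id = identity;
  container.appendChild(wrapper);
  return wrapper;
}

// Places a track's media element inside its participant's wrapper, so the
// audio and video for one participant share a single container.
function attachTrackElement(
  container: ElementLike,
  identity: string,
  trackElement: ElementLike,
  createElement: () => ElementLike,
): void {
  getOrCreateParticipantWrapper(container, identity, createElement).appendChild(trackElement);
}

// In-memory implementation for trying the logic outside the browser.
class FakeElement implements ElementLike {
  id = "";
  readonly children: ElementLike[] = [];

  appendChild(child: ElementLike): void {
    this.children.push(child);
  }
}
```

Keying the wrapper on identity is what makes a later lookup by participant.identity possible; it relies on each participant’s identity being unique within the room.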
You can now update the onTrackSubscribed() function (which is located on line 150) to pass the identity of the participant through to the attachTrack() call:
You can do the same in the attachAttachableTracksForRemoteParticipant() function:
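Both call sites follow the same pattern: find the tracks that can be rendered and forward the owning participant’s identity along with each one. The sketch below models that with minimal stand-in types loosely based on twilio-video 2.x, where participant.tracks is a Map of track publications whose track is null until it has been subscribed to. ParticipantLike, forEachAttachableTrack(), and the attach callback are hypothetical, so check the SDK’s own types before relying on the exact shapes.

```typescript
// Stand-in types (hypothetical), loosely modeled on twilio-video 2.x.
type TrackKind = "audio" | "video" | "data";

interface TrackLike {
  kind: TrackKind;
  name: string;
}

interface PublicationLike {
  // null until the local participant has subscribed to the track
  track: TrackLike | null;
}

interface ParticipantLike {
  identity: string;
  tracks: Map<string, PublicationLike>;
}

// Only audio and video tracks can be attached to the page; data tracks cannot.
function isAttachable(track: TrackLike): boolean {
  return track.kind === "audio" || track.kind === "video";
}

// Calls attach() once per subscribed, attachable track, passing the
// participant's identity so the track lands in that participant's wrapper.
function forEachAttachableTrack(
  participant: ParticipantLike,
  attach: (track: TrackLike, identity: string) => void,
): void {
  participant.tracks.forEach((publication) => {
    const track = publication.track;
    if (track !== null && isAttachable(track)) {
      attach(track, participant.identity);
    }
  });
}
```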
After that, create a new function called onDominantSpeakerChanged() underneath manageTracksForRemoteParticipant() but above attachAttachableTracksForRemoteParticipant(), as shown:
This function will first remove the .dominant-speaker class from any participants that have it, to “reset” the state of the application, and will then apply the .dominant-speaker class to the active speaker. You’re able to query the DOM based on participant.identity thanks to the updates you made to the attachTrack() function earlier.
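The reset-then-highlight logic can be sketched without a browser as follows. This is a hypothetical, DOM-free version rather than the repository’s function: HighlightableLike stands in for the per-participant wrapper elements, and only the dominant-speaker class name comes from this article. The dominantIdentity parameter accepts null because the twilio-video documentation describes the dominantSpeakerChanged payload as null when there is no dominant speaker (worth confirming against the SDK version you use).

```typescript
// The CSS class added to index.html earlier in this article.
const DOMINANT_SPEAKER_CLASS = "dominant-speaker";

// Minimal stand-ins (hypothetical) for the participant wrapper <div>s.
interface ClassListLike {
  add(token: string): void;
  remove(token: string): void;
  contains(token: string): boolean;
}

interface HighlightableLike {
  id: string;
  classList: ClassListLike;
}

// Clears the highlight from every wrapper, then highlights the wrapper whose
// id matches the dominant speaker's identity, if there is one.
function highlightDominantSpeaker(
  wrappers: HighlightableLike[],
  dominantIdentity: string | null,
): void {
  for (const wrapper of wrappers) {
    wrapper.classList.remove(DOMINANT_SPEAKER_CLASS);
  }
  if (dominantIdentity === null) {
    return; // nobody is dominant: leave every wrapper un-highlighted
  }
  wrappers.find((wrapper) => wrapper.id === dominantIdentity)?.classList.add(DOMINANT_SPEAKER_CLASS);
}

// Set-backed classList for trying the logic outside the browser.
function makeWrapper(id: string): HighlightableLike {
  const tokens = new Set<string>();
  return {
    id,
    classList: {
      add: (token) => { tokens.add(token); },
      remove: (token) => { tokens.delete(token); },
      contains: (token) => tokens.has(token),
    },
  };
}
```

In the browser, the same two steps would map onto document.querySelectorAll(".dominant-speaker") to clear the class and document.getElementById(participant.identity) to set it.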
The second statement of onDominantSpeakerChanged() uses some potentially peculiar-looking syntax: the ?. operator. This is a JavaScript feature known as Optional Chaining, and it lets you read the values of nested properties through references that may be null or undefined.
You can’t guarantee that document.getElementById() will return a reference to a DOM element with a classList property: its return type is HTMLElement | null, because as far as the TypeScript compiler is concerned, there may be no element with that ID. The Optional Chaining syntax here short-circuits the entire expression if the reference returned from getElementById() is nullish (null or undefined), which saves you from trying to access properties on a null value.
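Here is a small, self-contained illustration of that short-circuiting. findElement() is a hypothetical stand-in for document.getElementById(), which returns null when no element has the given ID; the Map-based registry exists only so the example runs outside a browser.

```typescript
interface ClassListLike {
  add(token: string): void;
}

interface ElementLike {
  classList: ClassListLike;
}

// Hypothetical registry standing in for the DOM, plus a log of added classes.
const registry = new Map<string, ElementLike>();
const addedTokens: string[] = [];

registry.set("alice", {
  classList: {
    add: (token: string): void => {
      addedTokens.push(`alice:${token}`);
    },
  },
});

// Mirrors getElementById(): returns null rather than throwing for an unknown ID.
function findElement(id: string): ElementLike | null {
  return registry.get(id) ?? null;
}

function highlightWithOptionalChaining(id: string): void {
  // If findElement() returns null, evaluation stops at `?.`: neither
  // .classList nor .add() runs, and no TypeError is thrown.
  findElement(id)?.classList.add("dominant-speaker");
}

function highlightWithExplicitCheck(id: string): void {
  // The equivalent long-hand version of the line above.
  const element = findElement(id);
  if (element !== null) {
    element.classList.add("dominant-speaker");
  }
}
```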
While you’re here, ensure the Participant type has been imported from twilio-video in the import statement at the top of the file, if your editor/IDE doesn’t import it automatically.
With the event handler complete, you now need to wire it up. The Twilio Programmable Video JavaScript SDK emits a dominantSpeakerChanged event, described here, which you can listen for. As its payload, it provides a reference to the participant who has become the dominant speaker.
In the onJoinClick() function towards the top of the video.ts file, pass a dominantSpeaker: true flag to the connect() function and wire up the event handler on the room object it resolves to, as shown below:
For more insight into options that can be passed into the connect() function, view the ConnectOptions reference. For more insight into the Twilio Client Library in general, visit the SDK documentation.
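As a rough sketch of that wiring, the code below builds a connect options object with dominantSpeaker enabled and registers a listener for dominantSpeakerChanged. To keep it runnable without the SDK, ConnectOptionsSubset, RoomLike, buildConnectOptions(), and wireDominantSpeakerHandler() are hypothetical local stand-ins; in the real app you would import connect (and its types) from twilio-video instead. Twilio’s documentation also describes Dominant Speaker detection as a Group Rooms feature, so if no events arrive, it’s worth confirming which room type you are using.

```typescript
// Hypothetical subset of twilio-video's ConnectOptions, for illustration only.
interface ConnectOptionsSubset {
  name: string;
  audio: boolean;
  video: boolean;
  dominantSpeaker: boolean;
}

interface SpeakerLike {
  identity: string;
}

// Minimal stand-in for the part of the Room's event interface used here.
interface RoomLike {
  on(event: "dominantSpeakerChanged", listener: (speaker: SpeakerLike | null) => void): unknown;
}

function buildConnectOptions(roomName: string): ConnectOptionsSubset {
  return {
    name: roomName,
    audio: true,
    video: true,
    // Opt in to dominant speaker detection; it is not enabled unless requested.
    dominantSpeaker: true,
  };
}

// Registers the handler so it runs each time the dominant speaker changes.
function wireDominantSpeakerHandler(
  room: RoomLike,
  onDominantSpeakerChanged: (speaker: SpeakerLike | null) => void,
): void {
  room.on("dominantSpeakerChanged", onDominantSpeakerChanged);
}

// In the browser app, the calls would look roughly like:
//   const room = await connect(token, buildConnectOptions(roomName));
//   wireDominantSpeakerHandler(room, onDominantSpeakerChanged);
```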
With this, your project is complete!
Running the Application
Before you can demo the project, you’ll need to start the Express Application from inside the server folder with:
As mentioned earlier, this server is used for generating tokens for participants. In order to test the dominant speaker detection across different machines, you can utilize ngrok to temporarily tunnel your localhost service to the public Internet with a public URL.
Make a note of the port displayed after running npm start above; it’ll likely be 3000.
In another terminal window, run npx ngrok http -host-header=rewrite 3000. This command will temporarily install ngrok and tunnel HTTP connections between localhost:3000 and a public URL. The -host-header=rewrite flag is there to solve some reported CORS issues, and you should see output like what’s shown below:
Copy the last HTTPS link, and paste it into client/token-repository.ts as follows:
In doing so, you’ll allow your client to access the Express application from another machine. Be sure to append the /create-token route to the end of the URL so you reach the correct endpoint (the token-generating function you saw earlier).
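Because it’s easy to paste the ngrok URL with or without a trailing slash, or to forget the route, a small helper can normalize it. This is a hypothetical sketch: buildTokenEndpoint() and the example ngrok hostname are made up, only the /create-token route comes from this article, and the way token-repository.ts actually stores the URL may differ.

```typescript
// Route served by the Express application, as described in this article.
const CREATE_TOKEN_ROUTE = "/create-token";

// Normalizes a public base URL (e.g. from ngrok) into the token endpoint,
// tolerating surrounding whitespace, trailing slashes, and an already-appended route.
function buildTokenEndpoint(publicBaseUrl: string): string {
  const base = publicBaseUrl.trim().replace(/\/+$/, "");
  if (!base.startsWith("https://")) {
    throw new Error(`Expected an HTTPS URL, got "${publicBaseUrl}"`);
  }
  return base.endsWith(CREATE_TOKEN_ROUTE) ? base : `${base}${CREATE_TOKEN_ROUTE}`;
}
```

The HTTPS check reflects the advice above to use the HTTPS link; relax it if you point the client at http://localhost during local development.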
Next, you can open another terminal window and build the client application by navigating into the client folder and running Parcel:
Your client should start running on port 1234, or any other port of Parcel’s choosing. In order to make the client accessible from other devices too, open one more terminal window and tunnel through ngrok once more:
Make note of the HTTPS URL, and try visiting it from other devices (granting camera and microphone access if prompted), or share it with other participants. You should see dominant speaker detection kick in automatically for each participant who speaks or who has an elevated level of ambient noise.
Conclusion
In this article, you learned how to perform dominant speaker detection via the Twilio Client Library with TypeScript for Twilio Programmable Video. To view this project’s source code, visit the “adding-dominant-speaker-detection” branch at its GitHub Repository.
Jamie is an 18-year-old software developer located in Texas. He has particular interests in enterprise architecture (DDD/CQRS/ES), writing elegant and testable code, and Physics and Mathematics. He is currently working on a startup in the business automation and tech education space, and when not behind a computer, he enjoys reading and learning.
- Twitter: https://twitter.com/eithermonad
- Personal Site: https://jamiecorkhill.com/
- GitHub: https://github.com/JamieCorkhill
- LinkedIn: https://www.linkedin.com/in/jamie-corkhill-aaab76153/