Getting Started with Twilio Video
Hey there! This blog post covers an older, pre-release version of the Twilio Video SDK, so the code below likely won’t work anymore. For the very latest, check out the Twilio Video quickstart in the language of your choice.
If a picture is worth a thousand words, what is a video worth? A million? More? That’s the question we’ll look for developers like you to answer with Twilio Video, which we’ve released in a limited beta at Signal. Twilio Video makes it easy for you to connect your users by capturing every wave, groan, and belly laugh in a high-quality, peer-to-peer video conversation built on top of standard technologies like WebRTC.
But Twilio Video is about much more than providing the server-side infrastructure you’d need and drastically reducing the amount of code it takes to write video applications with WebRTC. Twilio’s new SDKs enable cross-platform video conversations between web and native mobile clients (iOS and Android targeted initially), and cross-platform use of APIs like WebRTC’s data channel, which allows you to send arbitrary data between clients (think chat/IP messaging or screen sharing). You will also have server-side control over conversations with a rich REST API, allowing you to intelligently manage the client-side experience from your back-end code when necessary.
Twilio’s mission is to empower developers to change the way the world communicates forever, and we feel this technology is a critical step along that path. We are excited beyond words to see what you’ll build with Twilio Video.
In this tutorial, we’ll show you how to get started with Twilio Video in desktop web browsers that support WebRTC (recent Chrome and Firefox builds should do the trick). We’ll show you the server-side code you’ll need to write to power video (spoiler alert, it’s not much) and many of the JavaScript APIs you’ll have available to build a communication experience in the browser. Sound like a plan? Then let’s get cracking!
What You’ll Need
- Twilio Account and Access to the Video Beta Program (All Signal Attendees!)
- Node.js installed on your system for the required back-end components (we’re using Node.js in this example, but Twilio’s helper libraries are also available in C#, Java, Ruby, PHP, and Python)
- A WebRTC-enabled browser – the latest Chrome and Firefox releases should do nicely
What You’ll Build
- A basic video calling application with two parties in the browser
- A backend to handle generating capability tokens using your Twilio account
Let’s get to it!
Building the Video Calling Interface
Most of the fireworks for Twilio Video will be launched in the browser, so let’s start here! We’ll begin by creating a single HTML page that will become our video chat application. This page will be served by your web application as-is, and with a little bit of JavaScript and CSS will form 100% of the UI for this example.
In the static asset directory of our application, we have a file called “index.html”. We need two chunks of UI to power our application – one to allow the user to enter a name (which will allow other users to video call them using that name), and another chunk to either accept or initiate an outbound video call.
Here’s the markup we need to allow the user to enter their name.
And here’s the markup for the actual video call UI, which is initially hidden.
Twilio’s JavaScript SDK will handle inserting HTML5 video elements into the divs we’re targeting – #me for your local video feed, and #you for the remote participant.
All together, the markup (plus a tiny bit of CSS) for the UI looks like this.
Now, we’re ready to write the JavaScript code that will power the video conversation.
Starting the Conversation
The next component you’ll need is the Twilio JavaScript SDK. This file must be loaded from a CDN managed by Twilio. The WebRTC APIs that our tools are built on are evolving rapidly, and loading the JS SDK from our CDN ensures you always have code that is compatible with the latest revisions.
We include the JS SDK using a script tag beneath our UI markup but above the </body> tag, like so:
Next, we include a version of jQuery from a CDN to make our event handling and DOM manipulation a little easier:
Next, we’ll create a script tag that will contain some actual code to drive our UI. The first thing we need to do is let the user specify the name other users can reach them at. This name is the unique address for an object we call an “Endpoint” in the API. An Endpoint is a person or entity that can be involved in a “Conversation”, which is a shared communication channel between multiple Endpoints. An Endpoint could be a browser-based client like the one we’re building, an old-school telephone on the PSTN, or an iPad application running Twilio’s iOS SDK.
To allow the user to specify their unique name, we provided a text field and a button in a basic form-like interface in our markup. We begin by attaching an event handler to the button click like so.
When the button is clicked, we need to create an Endpoint with the name the user entered. To do that, we’ll use the “Twilio.Endpoint” constructor along with a secure access token, which allows our browser-based client to communicate with Twilio. The token has to be generated on the server rather than in the browser, so we’ll fetch it via an Ajax request. We’ll check out that server code in a bit, but for now just understand that it generates a one-time-use token, for a client with our unique name, that allows our browser to communicate with Twilio.
Here’s the initialization code all together.
At the end of the click handler, we pass the Endpoint we created to an “init” function that will set up the actual video calling UI. Let’s see how that works next.
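As a rough sketch of that sequence (fetch a token, construct the Endpoint, hand it to “init”), the snippet below passes the browser-only pieces in as parameters so the flow can run outside a browser. The names `start`, `deps`, `fetchToken`, and `FakeEndpoint` are hypothetical stand-ins; in the page itself the token would come from the Ajax request and the constructor would be “Twilio.Endpoint”.

```javascript
// Hypothetical helper mirroring the click handler's three steps. The
// browser-only pieces are passed in via `deps` so the flow runs anywhere.
function start(name, deps) {
  return deps.fetchToken(name).then(function (token) {
    const endpoint = new deps.Endpoint(token); // Twilio.Endpoint in the browser
    deps.init(endpoint);                       // set up the calling UI
    return endpoint;
  });
}

// Stand-ins for the Ajax call, the SDK constructor, and init (assumptions).
const initialized = [];
const started = start('alice', {
  fetchToken: function (name) { return Promise.resolve('token-for-' + name); },
  Endpoint: function FakeEndpoint(token) { this.token = token; },
  init: function (endpoint) { initialized.push(endpoint); }
});
```

Keeping the token fetch behind a function makes it easy to swap the Ajax call for a stub while you work on the UI.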
Reach Out And Video Call Someone
After creating our endpoint, we need to configure it to accept incoming video calls, as well as initiate outbound calls to any other user we choose. That process begins in the “init” function we used a moment ago – let’s look at the key steps of that function.
The first order of business is to register an event listener for incoming calls via the “invite” event.
This event handler is passed an “Invitation” object, which in this case we will immediately accept (you could reject it as well). The “accept” function executes an async process to connect your browser to another client, and will notify you when that process is done using a promise. To that promise we pass a function called “showConversation”, which runs once the conversation is established and handles rendering the conversation video feeds in our UI. We’ll check out how that works in just a bit.
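A minimal sketch of that accept-then-render flow might look like the following. To keep it runnable without the SDK, Node’s EventEmitter stands in for the endpoint and the invitation is a hand-written stub; `wireInvites` is a hypothetical helper name. Only the “invite” event and the promise-returning “accept” call come from the description above.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper: accept every incoming invitation on `endpoint`
// and pass the resulting conversation to `showConversation`.
function wireInvites(endpoint, showConversation) {
  endpoint.on('invite', function (invitation) {
    // accept() is asynchronous and reports completion through a promise.
    invitation.accept().then(showConversation);
  });
}

// Stand-in objects so the sketch runs without the Twilio SDK (assumptions).
const fakeEndpoint = new EventEmitter();
const accepted = [];
wireInvites(fakeEndpoint, function (conversation) {
  accepted.push(conversation);
});

// Simulate an incoming call from another client.
fakeEndpoint.emit('invite', {
  accept: function () {
    return Promise.resolve({ from: 'alice' });
  }
});
```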
The next thing we need to do to initialize our calling UI is attach event handlers to handle the user initiating a call themselves.
When the #call button is clicked, we create a new Conversation between our endpoint and another endpoint with the name the user entered into the #other-name input box. Just like accepting a call, we specify “showConversation” as the callback to run once our browser is connected to the person we are trying to call.
Finally, we need to tell our endpoint to start listening for inbound calls:
“listen” returns a promise as well, but for brevity we are omitting a callback function to handle it. Everything will work on the first try always, right? Right?
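If you’d rather not bet on that first try, a hedged sketch of handling both outcomes of the “listen” promise could look like this. The endpoints here are hand-written stubs, and `startListening` and the status strings are hypothetical names chosen for illustration.

```javascript
// Hypothetical helper: start listening and report success or failure.
function startListening(endpoint, report) {
  return endpoint.listen().then(
    function () { report('ready for calls'); },
    function (err) { report('could not listen: ' + err.message); }
  );
}

// Stub endpoints standing in for the real SDK object (assumptions).
const workingEndpoint = {
  listen: function () { return Promise.resolve(); }
};
const brokenEndpoint = {
  listen: function () { return Promise.reject(new Error('token expired')); }
};

// Exercise the success path first, then the failure path.
const messages = [];
function record(message) { messages.push(message); }
const done = startListening(workingEndpoint, record).then(function () {
  return startListening(brokenEndpoint, record);
});
```

In a real page, the `report` callback is where you might enable the call button or show an error to the user.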
All together, the init function looks like this:
Now our app is all set to both make and receive calls. The next thing we need to do is actually show the video feeds associated with the call, which happens in the “showConversation” function we’ll look at next.
Video Killed the… Um… I Can’t Think of a Good “Radio Star” Pun
The “showConversation” function has two jobs – attach the local media stream (the input from your own webcam) and the remote stream (the video feed coming from the other person) to elements in the UI. This is done using functions called “attach”, each of which takes a selector string for an element and appends a video tag with the feed to that element.
Here’s what that looks like:
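A sketch of that shape, under stated assumptions: the property and event names `localMedia`, `participantConnected`, and `participant.media` are guesses for illustration, and the media objects are stubs. The text above confirms only that “attach” takes a selector string.

```javascript
const { EventEmitter } = require('events');

// Sketch only: `localMedia`, `participantConnected`, and `participant.media`
// are assumed names; only attach(selector) is described in the post.
function showConversation(conversation) {
  // Local webcam feed goes into the #me container.
  conversation.localMedia.attach('#me');
  // Remote feed goes into #you once the other party joins.
  conversation.on('participantConnected', function (participant) {
    participant.media.attach('#you');
  });
}

// Stubs so the sketch runs without a browser or the SDK.
const attached = [];
function fakeMedia(label) {
  return {
    attach: function (selector) { attached.push(label + ' -> ' + selector); }
  };
}
const conversation = new EventEmitter();
conversation.localMedia = fakeMedia('local');
showConversation(conversation);
conversation.emit('participantConnected', { media: fakeMedia('remote') });
```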
Now, all together, our front end code (in a single HTML file) looks like this:
That’s it for the front end – but remember that token we had to fetch from our server via Ajax? Well, it’s not going to generate itself, so let’s hop into our server code to see what we need to do to generate this token.
Our Express Webapp
In this example, our back end application is a simple Node.js web application using the popular Express web framework. Our usage of it is fairly minimal – we create an HTTP server and use Express to handle incoming requests to it. We also use the built-in Express middleware for serving static assets (HTML, CSS, JavaScript) from the “public” folder of our app. That will handle sending “index.html” to the browser when a user visits the root URL of the app.
We define only a single route, which the browser requests via Ajax. This route generates the access token we’ll need to allow our browser-based code to talk to Twilio. It also initializes our application by using the Twilio REST API to generate a secure keypair which we use to sign the access token we send to the browser.
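To make the route concrete, here is a minimal sketch of a handler with the `(request, response)` shape Express expects. The `tokenRoute` factory, the `name` query parameter, and the error message are assumptions for illustration; `generateToken` is passed in so the handler can be exercised with fake request and response objects instead of real Twilio credentials.

```javascript
// Hypothetical factory for the "/token" route handler. `generateToken` is
// injected so the handler can run without real credentials.
function tokenRoute(generateToken) {
  return function (request, response) {
    const name = request.query.name; // assumed query parameter
    if (!name) {
      response.status(400).send({ error: 'name is required' });
      return;
    }
    response.send({ token: generateToken(name) });
  };
}

// With Express this would be mounted as:
//   app.get('/token', tokenRoute(generateToken));

// Minimal fake response mimicking Express's chainable status() and send().
function fakeResponse() {
  return {
    statusCode: 200,
    body: null,
    status: function (code) { this.statusCode = code; return this; },
    send: function (body) { this.body = body; return this; }
  };
}

const ok = fakeResponse();
tokenRoute(function (n) { return 'token-for-' + n; })({ query: { name: 'alice' } }, ok);

const bad = fakeResponse();
tokenRoute(function () { return 'unused'; })({ query: {} }, bad);
```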
Here’s our server code all in one shot:
Most of the token generation logic is found in the “token.js” module, which exports two functions. The “initialize” function fetches the keys that we use to sign our token. The “generateToken” function generates the secure string token we send to the browser.
Our call to “initialize” happens only once on startup, after which the module-level SIGNING_KEY_SID and SIGNING_KEY_SECRET variables are populated. We’ll need these values to mint our token. We won’t dive into this code right now – eventually you will be able to create and save these values in the account portal, which is probably going to be easier than using the REST API to create them.
Where we will spend some time is in the code that generates the access token we send to the browser. Ultimately, this code will return a JSON Web Token (JWT), serialized as a string that we’ll include with our response on the “/token” route. Let’s take a look at the code we need to write to make this happen.
First, we’ll need to create a new access token, which is a helper object that will help us build our JWT. To this constructor, you’ll pass in the signing key SID from our “initialize” function (SIGNING_KEY_SID) and your Twilio Account SID (found on your dashboard).
Next, we’ll need to configure the token we generate to have a unique Endpoint name, and have permission to both accept and send conversation invites:
We also need to grant our browser-based client the ability to create NAT traversal tokens to assist in connecting browsers peer-to-peer:
Finally, we sign and generate a string representation of the token:
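To show what signing a JWT involves under the hood, here is a generic sketch using only Node’s built-in crypto module. It is not the Twilio helper library’s API. It assumes HMAC-SHA256 (HS256) signing, and the claim names, the `endpoint` claim, and all values are placeholders rather than the real token layout.

```javascript
const crypto = require('crypto');

// Encode a string or Buffer as unpadded base64url, as the JWT format requires.
function base64url(input) {
  return Buffer.from(input)
    .toString('base64')
    .replace(/\+/g, '-')
    .replace(/\//g, '_')
    .replace(/=+$/, '');
}

// Build "header.payload.signature", signing with HMAC-SHA256 (HS256).
function signJwt(payload, secret) {
  const header = { alg: 'HS256', typ: 'JWT' };
  const signingInput =
    base64url(JSON.stringify(header)) + '.' + base64url(JSON.stringify(payload));
  const signature = crypto
    .createHmac('sha256', secret)
    .update(signingInput)
    .digest();
  return signingInput + '.' + base64url(signature);
}

// Placeholder values only; the claim names and grant layout are assumptions.
const token = signJwt(
  {
    iss: 'SKxxxxxxxx',   // stand-in for SIGNING_KEY_SID
    sub: 'ACxxxxxxxx',   // stand-in for the Account SID
    endpoint: 'alice',   // hypothetical claim for the Endpoint name
    exp: 1893456000      // fixed expiry timestamp for illustration
  },
  'not-a-real-secret'    // stand-in for SIGNING_KEY_SECRET
);
```

The result is three base64url segments joined by dots. Anyone can decode the first two, so the secret must stay on the server, which is why the token is minted there rather than in the browser.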
All together, the generateToken function looks like this:
And that’s it! Now you’re ready to start making and receiving video calls in the browser.
Wrapping Up
Video is the first step on a longer journey to open up scalable IP communications to every developer in every application. Using video, you’ll be able to connect your users in rich conversations where more emotion and meaning are sent over the wire than what could be accomplished in a voice-only call.
In a short time, you’ll be able to create cross-platform interactions of this kind, connecting iOS, Android, and web apps seamlessly using the same infrastructure. We can’t wait to see what you build! Please hit us up at help@twilio.com with any questions, and we’d love to help you out.