Getting Started with Twilio Video

May 19, 2015

Twilio Video

Hey there! This blog post covers an older, pre-release version of the Twilio Video SDK, so the code below likely won’t work anymore. For the very latest, check out the Twilio Video quickstart in the language of your choice.

If a picture is worth a thousand words, what is a video worth? A million? More? That’s the question we’ll look for developers like you to answer with Twilio Video, which we’ve released in a limited beta at Signal. Twilio Video makes it easy for you to connect your users by capturing every wave, groan, and belly laugh in a high-quality, peer-to-peer video conversation built on top of standard technologies like WebRTC.

But Twilio Video is about much more than providing the server-side infrastructure you’d need and drastically reducing the amount of code it takes to write video applications with WebRTC. Twilio’s new SDKs enable cross-platform video conversations between web and native mobile clients (iOS and Android targeted initially), and cross-platform use of APIs like WebRTC’s data channel, which allows you to send arbitrary data between clients (think chat/IP messaging or screen sharing). You will also have server-side control over conversations with a rich REST API, allowing you to intelligently manage the client-side experience from your back-end code when necessary.

Twilio’s mission is to empower developers to change the way the world communicates forever, and we feel this technology is a critical step along that path. We are excited beyond words to see what you’ll build with Twilio Video.

In this tutorial, we’ll show you how to get started with Twilio Video in desktop web browsers that support WebRTC (recent Chrome and Firefox builds should do the trick). We’ll show you the server-side code you’ll need to write to power video (spoiler alert, it’s not much) and many of the JavaScript APIs you’ll have available to build a communication experience in the browser. Sound like a plan? Then let’s get cracking!

What You’ll Need

  • Twilio Account and Access to the Video Beta Program (All Signal Attendees!)
  • Node.js installed on your system for the required back-end components (we’re using Node.js in this example, but Twilio’s helper libraries are available in C#, Java, Ruby, PHP, and Python also)
  • A WebRTC enabled browser – the latest Chrome and Firefox releases should do nicely

What You’ll Build

  • A basic video calling application with two parties in the browser
  • A backend to handle generating capability tokens using your Twilio account

Let’s get to it!

Building the Video Calling Interface

Most of the fireworks for Twilio Video will be launched in the browser, so let’s start here! We’ll begin by creating a single HTML page that will become our video chat application. This page will be served by your web application as-is, and with a little bit of JavaScript and CSS will form 100% of the UI for this example.

In the static asset directory of our application, we have a file called “index.html”. We need two chunks of UI to power our application – one to allow the user to enter a name (which will allow other users to video call them using that name), and another chunk to either accept or initiate an outbound video call.

Here’s the markup we need to allow the user to enter their name.

<div id="startDiv">
    <p>Enter Your Name Here:</p>
    <input id="my-name" 
        placeholder="your name"/>
    <button id="start">Let's Do This!</button>
</div>

And here’s the markup for the actual video call UI, which is initially hidden.

<div id="callDiv" style="display:none;">
    <div id="me"></div>
    <div id="you"></div>
    <div>
        <input id="other-name" placeholder="other person's name"/>
        <button id="call">Video Call This Person</button>
    </div>
</div>

Twilio’s JavaScript SDK will handle inserting HTML5 video elements into the divs we’re targeting – #me for your local video feed, and #you for the remote participant.

All together, the markup (plus a tiny bit of CSS) for the UI looks like this.

<!DOCTYPE html>
<html>
<head>
    <title>Getting Started with Twilio Video</title>
    <style type="text/css">
    #me, #you { display:inline-block; }
    </style>
</head>
<body>
    <h1>Getting Started with Twilio Video</h1>

    <!-- Begin by specifying a name for your endpoint -->
    <div id="startDiv">
        <p>Enter Your Name Here:</p>
        <input id="my-name" 
            placeholder="your name"/>
        <button id="start">Let's Do This!</button>
    </div>

    <!-- Here's the call UI -->
    <div id="callDiv" style="display:none;">
        <div id="me"></div>
        <div id="you"></div>
        <div>
            <input id="other-name" placeholder="other person's name"/>
            <button id="call">Video Call This Person</button>
        </div>
    </div>

</body>
</html>

Now, we’re ready to write the JavaScript code that will power the video conversation.

Starting the Conversation

The next component you’ll need is the Twilio JavaScript SDK. This file must be loaded from a CDN managed by Twilio. The WebRTC APIs that our tools are built on are evolving rapidly, and loading the JS SDK from our CDN ensures you always have code that is compatible with the latest revisions.

We include the JS SDK using a script tag beneath our UI markup but above the </body> tag, like so:

<script src="//media.twiliocdn.com/sdk/conversations/v0.7/js/releases/0.7.1.b1-7238b35/twilio-conversations-loader.min.js"></script>

Next, we include a version of jQuery from a CDN to make our event handling and DOM manipulation a little easier:

<script src="//ajax.googleapis.com/ajax/libs/jquery/2.1.4/jquery.min.js"></script>

Next, we’ll create a script tag that will contain some actual code to drive our UI. The first thing we need to do is allow the user to specify what name they will be reachable at by other users. This name is the unique address for an object we call an “Endpoint” in the API. An Endpoint is a person or entity that can be involved in a “Conversation”, which is a shared communication channel between multiple Endpoints. An Endpoint could be a browser based client like the one we’re building, an old-school telephone on the PSTN, or an iPad application running Twilio’s iOS SDK.

To allow the user to specify their unique name, we provided a text field and a button in a basic form-like interface in our markup. We begin by attaching an event handler to the button click like so.

// Initialize endpoint
$('#start').on('click', function() {
    // this is the code we’ll see next
});

When the button is clicked, we need to create an Endpoint with the name the user entered. To do that, we’ll use the “Twilio.Endpoint” constructor along with a secure access token that allows our browser-based client to communicate with Twilio. The token has to be generated on the server rather than in the browser, so we’ll fetch it via an Ajax request. We’ll check out that server code in a bit, but for now just understand that it generates a one-time use token, for a client with our unique name, that allows our browser to communicate with Twilio.

Here’s the initialization code all together.

// Initialize endpoint
$('#start').on('click', function() {
    // First, grab the SAT token from the server
    $.getJSON('/token', {
        name: $('#my-name').val()
    }, function(data) {
        console.log('Token response:');
        console.log(data);

        // Create the endpoint, and then initialize the main calling app
        var endpoint = new Twilio.Endpoint(data.token);
        $('#startDiv').hide();
        $('#callDiv').show();
        init(endpoint);
    });
});

At the end of the click handler, we pass the Endpoint we created to an “init” function that will set up the actual video calling UI. Let’s see how that works next.

Reach Out And Video Call Someone

After creating our endpoint, we need to configure it to accept incoming video calls, as well as initiate outbound calls to any other user we choose. That process begins in the “init” function we used a moment ago – let’s look at the key steps of that function.

The first order of business is to register an event listener for incoming calls via the “invite” event.

endpoint.on('invite', function(invitation) {
    invitation.accept().then(showConversation);
});

This event handler is passed an “Invitation” object, which in this case we will immediately accept (you could reject it as well). The “accept” function kicks off an async process to connect your browser to another client, and returns a promise that resolves when that process is done. We pass that promise a function called “showConversation”, which is called once the conversation is established and handles rendering the conversation’s video feeds in our UI. We’ll check out how that works in just a bit.

The next thing we need to do to initialize our calling UI is attach event handlers to handle the user initiating a call themselves.

// Start a conversation
$('#call').on('click', function() {
    endpoint.createConversation($('#other-name').val())
        .then(showConversation);
});

When the #call button is clicked, we create a new Conversation between our endpoint and another endpoint with the name the user entered into the #other-name input box. Just like accepting a call, we specify “showConversation” as the callback to run once our browser is connected to the person we are trying to call.

Finally, we need to tell our endpoint to start listening for inbound calls:

endpoint.listen();

“listen” returns a promise as well, but for brevity we are omitting a callback function to handle it. Everything will work on the first try always, right? Right?
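If you’d rather not trust in luck, here’s a minimal sketch of handling that promise. The “fakeEndpoint” stub and the “startListening” and “setStatus” names are our own inventions for illustration (they are not part of the SDK); the only thing we assume about the real endpoint is what’s stated above, that “listen” returns a promise.

```javascript
// Stand-in endpoint (an assumption for illustration, not an SDK object):
// listen() returns a promise that resolves or rejects on demand.
function fakeEndpoint(shouldFail) {
    return {
        listen: function() {
            return shouldFail
                ? Promise.reject(new Error('registration failed'))
                : Promise.resolve();
        }
    };
}

// Start listening and report the outcome through a status callback.
function startListening(endpoint, setStatus) {
    return endpoint.listen().then(function() {
        setStatus('Ready for incoming calls');
    }, function(error) {
        setStatus('Could not listen: ' + error.message);
    });
}
```

In the init function, a call like startListening(endpoint, console.log) could stand in for the bare endpoint.listen() call, with the status callback swapped for something that updates the page.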

All together, the init function looks like this:

// Initialize video calling app with my endpoint
function init(endpoint) {
    console.log('Endpoint Created:');
    console.log(endpoint);

    // Automatically accept any incoming calls
    endpoint.on('invite', function(invitation) {
        invitation.accept().then(showConversation);
    });

    // Start an outbound conversation
    $('#call').on('click', function() {
        endpoint.createConversation($('#other-name').val())
            .then(showConversation);
    });

    // Listen for incoming calls
    endpoint.listen();
}

Now our app is all set to both make and receive calls. The next thing we need to do is actually show the video feeds associated with the call. That happens in the “showConversation” function, which we’ll look at next.

Video Killed the… Um… I Can’t Think of a Good “Radio Star” Pun

The “showConversation” function has two jobs – attach the local media stream (the input from your own webcam) and the remote stream (the video feed coming from the other person) to elements in the UI. This is done using functions called “attach”, each of which takes a selector string for an element and appends a video tag with the feed to that element.

Here’s what that looks like:

// Show a conversation (inbound or outbound)
function showConversation(conversation) {
    // Attach to DOM
    conversation.localMedia.attach('#me');

    // Listen for participants
    conversation.on('participantConnected', function(participant) {
        participant.media.attach('#you');
    });
}
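To see the order in which things happen without a webcam or a second browser, you can run the function above against stand-in objects in Node.js. This is only a sketch: the “fakeMedia” helper and the EventEmitter-based conversation are assumptions we made up for the demo, not SDK classes. It shows that local video is attached right away, while remote video waits for the “participantConnected” event.

```javascript
var EventEmitter = require('events').EventEmitter;

// Same function as in the tutorial.
function showConversation(conversation) {
    conversation.localMedia.attach('#me');
    conversation.on('participantConnected', function(participant) {
        participant.media.attach('#you');
    });
}

// Stand-in media objects (hypothetical) that record attach() calls.
var attached = [];
function fakeMedia(label) {
    return {
        attach: function(selector) {
            attached.push(label + ' -> ' + selector);
        }
    };
}

// Stand-in conversation: an event emitter with a localMedia property.
var conversation = new EventEmitter();
conversation.localMedia = fakeMedia('local');

showConversation(conversation);
// Only the local feed has been attached at this point.

// Simulate the other person joining the conversation.
conversation.emit('participantConnected', { media: fakeMedia('remote') });
// attached now holds 'local -> #me' followed by 'remote -> #you'
console.log(attached);
```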

Now, all together, our front end code (in a single HTML file) looks like this:

<!DOCTYPE html>
<html>
<head>
    <title>Getting Started with Twilio Video</title>
    <style type="text/css">
    #me, #you { display:inline-block; }
    </style>
</head>
<body>
    <h1>Getting Started with Twilio Video</h1>

    <!-- Begin by specifying a name for your endpoint -->
    <div id="startDiv">
        <p>Enter Your Name Here:</p>
        <input id="my-name" 
            placeholder="your name"/>
        <button id="start">Let's Do This!</button>
    </div>

    <!-- Here's the call UI -->
    <div id="callDiv" style="display:none;">
        <div id="me"></div>
        <div id="you"></div>
        <div>
            <input id="other-name" placeholder="other person's name"/>
            <button id="call">Video Call This Person</button>
        </div>
    </div>

    <!-- Release the JavaScripts -->
    <script src="//ajax.googleapis.com/ajax/libs/jquery/2.1.4/jquery.min.js"></script>
    <script src="//media.twiliocdn.com/sdk/conversations/v0.7/js/releases/0.7.1.b1-7238b35/twilio-conversations-loader.min.js"></script>
    <script>
    // Initialize endpoint
    $('#start').on('click', function() {
        // First, grab the SAT token from the server
        $.getJSON('/token', {
            name: $('#my-name').val()
        }, function(data) {
            console.log('Token response:');
            console.log(data);

            // Create the endpoint, and then initialize the main calling app
            var endpoint = new Twilio.Endpoint(data.token);
            $('#startDiv').hide();
            $('#callDiv').show();
            init(endpoint);
        });
    });

    // Initialize video calling app with my endpoint
    function init(endpoint) {
        console.log('Endpoint Created:');
        console.log(endpoint);

        // Automatically accept any incoming calls
        endpoint.on('invite', function(invitation) {
            invitation.accept().then(showConversation);
        });

        // Start an outbound conversation
        $('#call').on('click', function() {
            endpoint.createConversation($('#other-name').val())
                .then(showConversation);
        });

        // Listen for incoming calls
        endpoint.listen();
    }

    // Show a conversation (inbound or outbound)
    function showConversation(conversation) {
        // Attach to DOM
        conversation.localMedia.attach('#me');

        // Listen for participants
        conversation.on('participantConnected', function(participant) {
            participant.media.attach('#you');
        });
    }
    </script>
</body>
</html>

That’s it for the front end – but remember that token we had to fetch from our server via Ajax? Well, it’s not going to generate itself, so let’s hop into our server code to see what we need to do to generate this token.

Our Express Webapp

In this example, our back end application is a simple Node.js web application using the popular Express web framework. Our usage of it is fairly minimal – we create an HTTP server and use Express to handle incoming requests to it. We also use the built-in Express middleware for serving static assets (HTML, CSS, JavaScript) from the “public” folder of our app. That will handle sending “index.html” to the browser when a user visits the root URL of the app.

We define only a single route, which will be requested via Ajax from the browser. This route generates the access token we’ll need to allow our browser-based code to talk to Twilio. On startup, the server also initializes our application by using the Twilio REST API to generate a secure keypair, which we use to sign the access token we send to the browser.

Here’s our server code all in one shot:

var http = require('http');
var path = require('path');
var express = require('express');
var token = require('./token');

// Create Express app and HTTP Server, serve up static HTML/CSS/etc from the
// public directory
var app = express();
app.use(express.static(path.join(__dirname, 'public')));
var server = http.createServer(app);

// Generate a JWT token to use with the video SDK
app.get('/token', function(request, response) {
    // Generate a token for the name requested, with both "listen" and "invite"
    // permissions (the default set of permissions)
    response.send({
        token: token.generateToken(request.query.name)
    });
});

// Initialize the app
token.initialize(function(err) {
    // If there was an error during init, log it and fail
    if (err) return console.error(err);

    // Otherwise start up the app server
    var port = process.env.PORT || 3000;
    server.listen(port, function() {
        console.log('Express server now listening on *:' + port);
    });
});
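The route above passes “request.query.name” straight through to “generateToken”, even if the browser sent no name at all. Here’s a hedged sketch of a slightly more defensive handler. The “makeTokenHandler” name and the error message are our own choices, and the token generator is passed in as an argument so the sketch runs without the Twilio library.

```javascript
// Build an Express-style handler for GET /token.
// generateToken is injected so this sketch has no Twilio dependency.
function makeTokenHandler(generateToken) {
    return function(request, response) {
        var name = String(request.query.name || '').trim();
        if (!name) {
            // No usable name: refuse rather than mint a token for ''
            return response.status(400).send({ error: 'name is required' });
        }
        response.send({ token: generateToken(name) });
    };
}

// Wiring it into the app from the tutorial would look like:
// app.get('/token', makeTokenHandler(token.generateToken));
```

Because the handler only touches “request.query”, “response.status” and “response.send”, you can exercise it with plain objects before you ever start the server.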

Most of the token generation logic is found in the “token.js” module, which exports two functions. The “initialize” function fetches the keys that we use to sign our token. The “generateToken” function generates the secure string token we send to the browser.

Our call to “initialize” happens only once on startup, after which the module-level SIGNING_KEY_SID and SIGNING_KEY_SECRET variables are populated. We’ll need these values to mint our token. We won’t dive into this code right now – eventually you will be able to create and save these values in the account portal, which is probably going to be easier than using the REST API to create them.
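While we’re skipping the real “initialize” code, the populate-once pattern it follows can be sketched like this. Treat it as an illustration only: “createKey” is a hypothetical function standing in for the REST API call that creates a signing key, and unlike the real “initialize” (which takes just a callback) this version receives it as an argument so the sketch can run on its own.

```javascript
// Module-level values, filled in once by initialize().
var SIGNING_KEY_SID = null;
var SIGNING_KEY_SECRET = null;

// createKey is hypothetical: it calls back with (err, { sid, secret }).
function initialize(createKey, callback) {
    // Already initialized, so skip the extra API call
    if (SIGNING_KEY_SID) return callback(null);

    createKey(function(err, key) {
        if (err) return callback(err);
        SIGNING_KEY_SID = key.sid;
        SIGNING_KEY_SECRET = key.secret;
        callback(null);
    });
}
```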

Where we will spend some time is in the code that generates the access token we send to the browser. Ultimately, this code will return a JSON Web Token (JWT), serialized as a string that we’ll include with our response on the “/token” route. Let’s take a look at the code we need to write to make this happen.

First, we’ll need to create a new access token, which is a helper object that will help us build our JWT. To this constructor, you’ll pass in the signing key SID from our “initialize” function (SIGNING_KEY_SID) and your Twilio Account SID (found on your dashboard).

var token = new twilio.AccessToken(
    // Sid for the signing key we generated on init
    SIGNING_KEY_SID,
    // your regular account SID
    process.env.TWILIO_ACCOUNT_SID
);

Next, we’ll need to configure the token we generate to have a unique Endpoint name, and have permission to both accept and send conversation invites:

token.addEndpointGrant(name);

We also need to grant our browser-based client the ability to create NAT traversal tokens to assist in connecting browsers peer-to-peer:

var resUrl = 'https://api.twilio.com/2010-04-01/Accounts/%s/Tokens.json';
var grantUrl = util.format(resUrl, process.env.TWILIO_ACCOUNT_SID);
token.addGrant(grantUrl);
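If you haven’t used “util.format” before, it simply substitutes the account SID for the “%s” placeholder. A quick sketch, using a made-up placeholder value rather than a real account SID:

```javascript
var util = require('util');

var resUrl = 'https://api.twilio.com/2010-04-01/Accounts/%s/Tokens.json';

// 'ACXXXXXXXX' is a placeholder for illustration, not a real account SID
var grantUrl = util.format(resUrl, 'ACXXXXXXXX');

console.log(grantUrl);
// prints https://api.twilio.com/2010-04-01/Accounts/ACXXXXXXXX/Tokens.json
```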

Finally, we sign the token and generate a string representation of it:

return token.toJwt(SIGNING_KEY_SECRET);

All together, the generateToken function looks like this:

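// Module-level requires that token.js needs for the code below ("twilio" is
// assumed to be the Twilio helper library for Node.js)
var util = require('util');
var twilio = require('twilio');

// Helper function to generate an access token to enable a client to use Twilio
// Video in the browser. Grants limited permissions to use
// Twilio back end services for NAT traversal and general "endpoint" services
// like listening for inbound calls and initiating outbound calls.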
exports.generateToken = function(name) {
    var token = new twilio.AccessToken(
        // Sid for the signing key we generated on init
        SIGNING_KEY_SID,
        // your regular account SID
        process.env.TWILIO_ACCOUNT_SID
    );

    // Add the capabilities for conversation endpoints to this token, including
    // its unique name. We'll use the default permission set of "listen" and
    // "invite"
    token.addEndpointGrant(name);

    // Authorize the client to use Twilio's NAT traversal service - for that,
    // it will need access to the "Tokens" resource
    var resUrl = 'https://api.twilio.com/2010-04-01/Accounts/%s/Tokens.json';
    var grantUrl = util.format(resUrl, process.env.TWILIO_ACCOUNT_SID);
    token.addGrant(grantUrl);

    // Generate a JWT token string that will be sent to the browser and used
    // by the Conversations SDK
    return token.toJwt(SIGNING_KEY_SECRET);
};
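If you’re curious what kind of string comes back from “toJwt”, a JWT is three base64url-encoded segments (header, payload, signature) joined by dots. The sketch below builds and checks one with Node’s built-in crypto module, assuming the HS256 (HMAC-SHA256) algorithm. It is not the helper library’s implementation, and the “signJwt” and “verifyJwt” names are made up for illustration.

```javascript
var crypto = require('crypto');

// Encode a Buffer or string as base64url (URL-safe alphabet, no padding).
function base64url(input) {
    return Buffer.from(input).toString('base64')
        .replace(/=+$/, '').replace(/\+/g, '-').replace(/\//g, '_');
}

// Build a compact JWT signed with HMAC-SHA256.
function signJwt(payload, secret) {
    var header = { alg: 'HS256', typ: 'JWT' };
    var signingInput = base64url(JSON.stringify(header)) + '.' +
        base64url(JSON.stringify(payload));
    var signature = crypto.createHmac('sha256', secret)
        .update(signingInput).digest();
    return signingInput + '.' + base64url(signature);
}

// Recompute the signature and decode the payload; returns null on mismatch.
// (A production verifier would compare signatures in constant time.)
function verifyJwt(jwt, secret) {
    var parts = jwt.split('.');
    if (parts.length !== 3) return null;
    var expected = base64url(crypto.createHmac('sha256', secret)
        .update(parts[0] + '.' + parts[1]).digest());
    if (expected !== parts[2]) return null;
    return JSON.parse(Buffer.from(parts[1], 'base64').toString('utf8'));
}
```

Calling signJwt({ iss: 'SKdemo', sub: 'ACdemo' }, 's3cret') (illustrative claim values) returns a three-part string, and verifyJwt hands the payload back only when it is given the same secret. That is why the signing key secret has to stay on the server.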

And that’s it! Now you’re ready to start making and receiving video calls in the browser.

Wrapping Up

Video is the first step on a longer journey to open up scalable IP communications to every developer in every application. Using video, you’ll be able to connect your users in rich conversations that carry more emotion and meaning over the wire than a voice-only call could.

In a short time, you’ll be able to create cross-platform interactions of this kind, connecting iOS, Android, and web apps seamlessly using the same infrastructure. We can’t wait to see what you build! Please hit us up at help@twilio.com with any questions; we’d love to help you out.