Build a Serverless Video Chat Application with JavaScript and Twilio Programmable Video

March 02, 2022
Written by Miguel Grinberg
Reviewed by Mia Adjei (Twilion)


In this article you are going to learn how to build a basic video conferencing solution using Twilio Video. The system is going to run on desktop and mobile web browsers, so participants will not need to download or install any software on their computers to join a call.

The application will be coded in JavaScript, with some HTML and CSS. It will be deployed to the Twilio Serverless platform, so that you can use it to connect with family, friends or colleagues anywhere in the world without having to worry about complicated hosting solutions.

Do you want to see what you’ll build before you start? Here is a screenshot.

Project screenshot

By following this article, you can learn how to create this project in small incremental steps. If you prefer to download the complete project instead of building it step-by-step, you can find the code in this GitHub repository: https://github.com/miguelgrinberg/twilio-serverless-video.

Tutorial requirements

To build the project you will need:

  • A Node.js version that is compatible with Twilio Functions. Currently (March 2022), versions 12 and 14 are supported (14 is recommended, since 12 is about to drop out of maintenance). You can download a Node.js installer from the Node.js website.
  • A free or paid Twilio account. If you are new to Twilio get your free account now! This link will give you $10 when you upgrade.
  • The Twilio CLI. You can find installation instructions on the Twilio CLI Quickstart page of the documentation.
  • The Twilio Serverless Toolkit. This installs as a plugin to the Twilio CLI. Find installation instructions in the documentation.
  • A web browser that is compatible with the Twilio Programmable Video JavaScript library (see below for a list of them). Note that this requirement also applies to the users you intend to invite to use this application once built.

Supported web browsers

Since the core video and audio functionality of this project is provided by the Twilio Video service, you have to use a web browser that is supported by this service. Here is the current list of supported browsers:

  • Android: Chrome and Firefox
  • iOS: Safari
  • Linux: Chrome and Firefox
  • MacOS: Chrome, Firefox, Safari and Edge
  • Windows: Chrome, Firefox and Edge

Check the Twilio Video documentation for the latest supported web browser list and detailed version information.

Create a Project Directory

In order to use the Twilio CLI to create a serverless project, you have to configure your Twilio credentials. The most convenient way to do this is to enter them as environment variables in your terminal session. Open a terminal or command prompt window and set two environment variables with your Twilio Account SID and Auth Token values:

export TWILIO_ACCOUNT_SID=XXXXX
export TWILIO_AUTH_TOKEN=XXXXX

You can find the values of these two variables for your account in the Twilio Console.

The above export commands are for UNIX operating systems. If you are following this tutorial on a Windows computer, they will not work. You can learn how to set environment variables in Windows using the Control Panel or directly within PowerShell or Command Prompt.

Make sure the Twilio CLI and the Serverless Toolkit are both installed (see the Tutorial requirements section above for installation instructions). Then choose a parent directory for your project and create a new serverless project with the following commands:

twilio serverless:init twilio-serverless-video --empty
cd twilio-serverless-video

Configure Twilio credentials

Some of the operations that are going to be carried out by this project require the use of a Twilio API Key, in addition to the Account SID used with the Twilio CLI above. In this section all these values will be configured into the project.

In the same terminal session you used above, enter the following command to create a new API key in your Twilio account:

twilio api:core:keys:create --friendly-name=twilio-serverless-video -o=json

The output of this command is going to look like the following:

  {
    "dateCreated": "2022-02-26T22:56:43.000Z"

The two important values are the sid and secret properties of the API Key, which you are going to save in the configuration file of the project. This is a file called .env in the twilio-serverless-video directory that was created above.

Open .env in your favorite text editor or IDE and edit it so that it has the following contents:

ACCOUNT_SID=XXXXX
API_KEY_SID=XXXXX
API_KEY_SECRET=XXXXX

Make sure you replace the XXXXX placeholders with all the correct values for the three variables and then save and close the file.
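A placeholder left behind in .env tends to surface only later, as a failed token request. The sketch below is one way to catch it early. It assumes the simple KEY=VALUE format shown above, and the helper names (parseEnvText, findMissingSettings) are hypothetical, not part of the Twilio CLI or the Serverless Toolkit.

```javascript
// Hypothetical helper, not part of the Twilio CLI or Serverless Toolkit.
const REQUIRED_SETTINGS = ['ACCOUNT_SID', 'API_KEY_SID', 'API_KEY_SECRET'];

// Parse simple KEY=VALUE lines, skipping blank lines and # comments.
const parseEnvText = (text) => {
  const settings = {};
  for (const line of text.split(/\r?\n/)) {
    const trimmed = line.trim();
    if (!trimmed || trimmed.startsWith('#')) continue;
    const separator = trimmed.indexOf('=');
    if (separator === -1) continue;
    const name = trimmed.slice(0, separator).trim();
    settings[name] = trimmed.slice(separator + 1).trim();
  }
  return settings;
};

// Return the names of required settings that are missing or still placeholders.
const findMissingSettings = (text) => {
  const settings = parseEnvText(text);
  return REQUIRED_SETTINGS.filter(
    (name) => !settings[name] || settings[name] === 'XXXXX'
  );
};

// Possible use from the project directory (Node.js):
// console.log(findMissingSettings(require('fs').readFileSync('.env', 'utf8')));
```

An empty array means all three variables have been filled in.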

Run the development web server

The project at this point is empty, but regardless, you can start a development web server on your computer and have it running while you continue with the tutorial. You will use this server as a convenience to quickly test each step, without having to deploy the code to the Twilio servers. I recommend that you start the server now and leave it running in the background while you continue working on the rest of this tutorial. Use the following command to start the server:

npm start

When the server is up and running you should see the following output in your terminal:

┌─────────────────────────────────┐
│                                 │
│   Twilio functions available:   │
│   ⚠ No functions found          │
│                                 │
│   Twilio assets available:      │
│   ⚠ No assets found             │
│                                 │
└─────────────────────────────────┘

The list of available functions and assets will automatically update as you start adding components in the following sections. Note the functions and assets directories in your project, where these will be added.

Application page layout

Our page design is going to be very simple. We’ll include a title, a web form where the user can enter their name and join or leave video calls, and then the content area, where the video streams for all the participants will be shown. For now we’ll add a placeholder video for ourselves.

Here is how the page will look:

Page layout

To create this page, we need a combination of HTML and CSS. Below you can see the index.html file. Add this file in the assets subdirectory in your project. The Twilio serverless platform will serve any files added in this place as static assets of the project.

<!doctype html>
<html>
  <head>
    <link rel="stylesheet" type="text/css" href="styles.css">
  </head>
  <body>
    <h1>Twilio Serverless Video Calling</h1>
    <form>
      Name: <input type="text" id="username">
      <button id="join_leave">Join call</button>
    </form>
    <p id="count"></p>
    <div id="container" class="container">
      <div id="local" class="participant">
        <div></div>
        <div>Me</div>
      </div>
      <!-- more participants will be added dynamically here -->
    </div>

    <script src="https://sdk.twilio.com/js/video/releases/2.20.1/twilio-video.min.js"></script>
    <script src="app.js"></script>
  </body>
</html>

The <head> section of this file references a styles.css file, which we will define in a moment. The <body> section of the page defines the following elements:

  • An <h1> title.
  • A <form> element with a name field and a submit button.
  • A <p> element where we’ll show the connection status and participant count.
  • A container <div> with one participant identified with the name local, where we’ll show our own video feed. More participants will be added dynamically as they join the video call.
  • Each participant’s <div> contains an empty <div> where the video will be inserted dynamically, and a second <div> with the participant’s name.
  • Links to two JavaScript files that we’ll need: the official release of the twilio-video.js library and an app.js file with the application logic, which we will write soon.

The contents of the assets/styles.css file are below:

.container {
    margin-top: 20px;
    width: 100%;
    display: flex;
    flex-wrap: wrap;
}
.participant {
    margin-bottom: 5px;
    margin-right: 5px;
}
.participant div {
    text-align: center;
}
.participant div:first-child {
    width: 240px;
    height: 180px;
    background-color: #ccc;
    border: 1px solid black;
}
.participant video {
    width: 100%;
    height: 100%;
}

These CSS definitions are all dedicated to the layout of the “container” <div> element, which is structured as a flexbox, so that participants are automatically added to the right and wrapped to the next row as needed, according to the size of the browser window.

The .participant div:first-child definition applies to the first child element of the participant <div> elements. Here we are constraining the size of the video to 240x180 pixels. We also have a darker background and a black border, just so that we can see a placeholder for the video window. The background color is also going to be useful as letterboxing when the dimensions of the video do not exactly match our aspect ratio. Feel free to adjust these options to your liking.
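To make the letterboxing concrete, here is a small arithmetic sketch. It assumes the browser scales the video to the largest size that fits inside the 240x180 box while preserving its aspect ratio. The fitInside() function is only an illustration and is not needed by the application.

```javascript
// Compute the size a video occupies when scaled to fit inside a box
// while preserving its aspect ratio.
const fitInside = (videoWidth, videoHeight, boxWidth, boxHeight) => {
  const scale = Math.min(boxWidth / videoWidth, boxHeight / videoHeight);
  const width = videoWidth * scale;
  const height = videoHeight * scale;
  return {
    width,
    height,
    // Space left over on each side, where the #ccc background shows through.
    barX: (boxWidth - width) / 2,
    barY: (boxHeight - height) / 2,
  };
};

// A 16:9 camera feed leaves a bar above and below the picture.
console.log(fitInside(1280, 720, 240, 180));
// { width: 240, height: 135, barX: 0, barY: 22.5 }

// A 4:3 feed matches the box exactly, so no background is visible.
console.log(fitInside(640, 480, 240, 180));
// { width: 240, height: 180, barX: 0, barY: 0 }
```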

With the HTML and CSS files in place, the server should be able to respond to your web browser and show the basic page layout you’ve seen above. While the server is running, open your web browser and type http://localhost:3000/index.html in the address bar to see the first version of the application running.

Displaying your own video feed

If you looked in the browser’s network log, you likely noticed that the browser tried to load the app.js file that we reference at the bottom of index.html and that this failed with a 404 error because we don’t have that file in our project yet. We are now going to write our first function in this file to add our own video feed to the page.

Create the assets/app.js file and add the following code to it:

const addLocalVideo = async () => {
  const track = await Twilio.Video.createLocalVideoTrack();
  const video = document.getElementById('local').firstElementChild;
  video.appendChild(track.attach());
};

addLocalVideo();

The addLocalVideo() function uses the Twilio Video JavaScript library to create a local video track. The createLocalVideoTrack() function from the library is asynchronous and returns a promise, so it can be used with the await keyword.

The return value is a LocalVideoTrack object. We use its attach() method to add the <video> element representing this track as a child of the first <div> inside the local participant. In case this is confusing, let’s review the structure of the local participant from the index.html file:

<div id="local" class="participant">
  <div></div>
  <div>Me</div>
</div>

You can see here that the local element has two <div> elements as children. The first is empty, and this is the element to which we are attaching the video. The second <div> is for the label that appears below the video.

Refresh the page in the browser and you should have your video displayed. Note that most browsers will ask for your permission before enabling the camera.
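If permission is denied or no camera is present, the promise returned by addLocalVideo() rejects and the placeholder stays empty. The sketch below shows one way to turn that failure into a readable message. It assumes the library passes along the error raised by the browser, whose names (NotAllowedError, NotFoundError, NotReadableError) come from the getUserMedia() standard. The describeCameraError() helper and its wording are made up for this example.

```javascript
// Map the error names that browsers use for camera failures to friendly text.
// Hypothetical helper: the names are standard, the messages are invented.
const describeCameraError = (error) => {
  switch (error && error.name) {
    case 'NotAllowedError':
      return 'Camera access was denied. Allow it in the browser and reload.';
    case 'NotFoundError':
      return 'No camera was found on this device.';
    case 'NotReadableError':
      return 'The camera could not be read. Another application may be using it.';
    default:
      return 'The camera could not be started.';
  }
};

// Possible use in app.js, replacing the bare addLocalVideo() call:
// addLocalVideo().catch((error) => alert(describeCameraError(error)));
```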

Local video track

Generating an access token for a participant

Twilio takes security very seriously. Before users can join a video call, the application must verify that the user is allowed, and in that case generate an access token for them. For security reasons, tokens must be generated by a server, using the secrets stored in the .env file. You will write a Twilio serverless function to accomplish this task.

For this project, the client only submits a username to the server, but in a real-world application, this is the place where the application would authenticate the user wanting to join the call against a user database. The connection request in such an application would likely include a password, authentication cookie, or some other form of identification in addition to the user’s name. An access token to the video chat room would only be generated after the user requesting access to the video call is properly authenticated.

The token generation will happen in a serverless function. When using the Serverless Toolkit, each JavaScript module added in the functions directory is associated with a URL based on its filename. When a request is sent to this URL, the exported function in the module is called. We will write this function in a file called get_token.js, so to run this function, the client will send a request to /get_token.

Write the following JavaScript code in a new file named get_token.js, located in the functions subdirectory of your project.

const twilio = require('twilio');

exports.handler = async function(context, event, callback) {
  const accessToken = new twilio.jwt.AccessToken(
    context.ACCOUNT_SID, context.API_KEY_SID, context.API_KEY_SECRET
  );
  accessToken.identity = event.username;
  const videoGrant = new twilio.jwt.AccessToken.VideoGrant({
    room: 'My Room'
  });
  accessToken.addGrant(videoGrant);
  return callback(null, {
    token: accessToken.toJwt(),
  });
}

The variables stored in the .env file earlier are provided as properties of the context object, which is passed as an argument to the function. Any arguments sent in the body of the request as a JSON payload are made available as properties of the event argument, also passed to the function. To return a response to the client, the function must call the function passed to the callback argument.

The token is generated using the AccessToken helper class from the Twilio Node Helper library. We attach a video grant for a video room called “My Room”. A more complex application can work with more than one video room and decide which room or rooms this user can enter.

Passing null as the first argument to the callback indicates that there were no errors. The object passed as the second argument is returned to the client as a JSON payload in the response.
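The callback contract can be tried out without Twilio at all. The sketch below uses a stand-in handler with the same (context, event, callback) signature as get_token.js. The token it returns is a fake string, and the username check and the invoke() helper are additions for illustration that are not part of the tutorial's function.

```javascript
// Stand-in with the same (context, event, callback) signature as get_token.js.
// The token string is fake; the real function uses the AccessToken helper.
const handler = (context, event, callback) => {
  const username =
    typeof event.username === 'string' ? event.username.trim() : '';
  if (!username) {
    // A non-null first argument reports an error instead of a payload.
    return callback(new Error('username is required'));
  }
  return callback(null, { token: `fake-token-for-${username}` });
};

// Call the handler the way a runtime would, collecting what it reports.
const invoke = (event) =>
  new Promise((resolve) => {
    handler({}, event, (error, payload) => resolve({ error, payload }));
  });

invoke({ username: 'susan' }).then((result) => console.log(result.payload));
invoke({}).then((result) => console.log(result.error.message));
```

Rejecting an empty username on the server complements the check in the browser, since nothing forces a client to go through the form.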

Handling the connection form

Next we are going to implement the handling of the connection form on the web page. The participant will enter their name in the form and then click the “Join call” button. Once the connection is established, the same button will be used to disconnect from the call.

To manage the form button, we have to attach a handler for the click event. The updated app.js file is shown below.

const usernameInput = document.getElementById('username');
const button = document.getElementById('join_leave');
const container = document.getElementById('container');
const count = document.getElementById('count');
let connected = false;
let room;

const addLocalVideo = async () => {
  const track = await Twilio.Video.createLocalVideoTrack();
  const video = document.getElementById('local').firstElementChild;
  video.appendChild(track.attach());
};

const connectButtonHandler = async (event) => {
  event.preventDefault();
  if (!connected) {
    const username = usernameInput.value;
    if (!username) {
      alert('Enter your name before connecting');
      return;
    }
    button.disabled = true;
    button.innerHTML = 'Connecting...';
    try {
      await connect(username);
      button.innerHTML = 'Leave call';
      button.disabled = false;
    }
    catch {
      alert('Connection failed. Is the backend running?');
      button.innerHTML = 'Join call';
      button.disabled = false;    
    }
  }
  else {
    disconnect();
    button.innerHTML = 'Join call';
    connected = false;
  }
};

addLocalVideo();
button.addEventListener('click', connectButtonHandler);

You can see that we now have a few global variables declared at the top. Four of them are for convenient access to elements on the page, such as the name entry field, the submit button in our web form and so on. The connected boolean tracks the state of the connection, mainly to help decide if a button click needs to connect or disconnect. The room variable will hold the video chat room object once we have it.

At the very bottom of the script, we attach the connectButtonHandler() function to the click event on the form button. The function is somewhat long, but it mostly deals with validating that the user entered a name and updating how the button looks as the state of the connection changes. If you filter out the form management, you can see that the actual connection and disconnection are handled by two auxiliary functions connect() and disconnect() that we are going to write in the following sections.

Connecting to a video chat room

We now reach the most important (and also most complex!) part of our application. To connect a user to the video chat room, the JavaScript application running in the web browser must perform two operations in sequence. First, the client needs to contact the web server and request an access token for the user, and then once the token is received, the client has to call the twilio-video.js library with this token to make the connection.

Add the connect() function below to app.js, right after the connectButtonHandler() function.

const connect = async (username) => {
  const response = await fetch('/get_token', {
    method: 'POST',
    headers: {'Content-Type': 'application/json'},
    body: JSON.stringify({'username': username}),
  });
  const data = await response.json();
  room = await Twilio.Video.connect(data.token);
  room.participants.forEach(participantConnected);
  room.on('participantConnected', participantConnected);
  room.on('participantDisconnected', participantDisconnected);
  connected = true;
  updateParticipantCount();
};

The connection logic has two steps as indicated above. First we use the browser’s fetch() function to send a request to the /get_token route, which maps to the serverless function we created in the previous section. The JSON payload in this request includes the username field, entered by the user in the form.

Upon return from this call, the data constant holds the response payload, decoded into an object. The only property in this object is token, containing the access token that can be used to connect to the video room as a participant. The token is passed to the Twilio.Video.connect() function from the twilio-video.js library to establish this connection.
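One detail worth knowing is that fetch() only rejects on network failures, so an HTTP error response from /get_token still reaches response.json(). The sketch below isolates the token request and adds explicit checks. The requestToken() name and the fetchImpl parameter are hypothetical; the parameter exists only so that the function can be exercised with a fake fetch outside the browser.

```javascript
// Request an access token and fail loudly on HTTP errors.
// fetchImpl is injectable so the function can be tried outside a browser.
const requestToken = async (username, fetchImpl = fetch) => {
  const response = await fetchImpl('/get_token', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ username }),
  });
  if (!response.ok) {
    throw new Error(`Token request failed with status ${response.status}`);
  }
  const data = await response.json();
  if (!data.token) {
    throw new Error('Token missing from response');
  }
  return data.token;
};

// Possible use inside connect():
// room = await Twilio.Video.connect(await requestToken(username));
```

Because connectButtonHandler() already wraps connect() in a try/catch, an error thrown here would end up in the existing "Connection failed" alert.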

The return value from the connection function is a room object, representing the video room that we just joined. Recall that room is a global variable, so this room will be available throughout the video call.

The room.participants property is a Map containing the participants already in the call. For each of these we have to add a <div> section that shows their video, audio and name. The logic to add a participant is encapsulated in a participantConnected() function that we will write soon, so for now, we loop over the participants and call this function on each of them.

We also want any participants that join in the future to be handled in the same way, so we set up a handler for the participantConnected event pointing to the same function. The participantDisconnected event is also important, as we’d want to remove any participants that leave the call from the page, so we set up a handler for this event as well.
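The "existing participants first, then events" pattern can be tried in isolation. The sketch below runs under Node.js and uses an EventEmitter with a Map of participants as a stand-in for the room. It only imitates the parts of the room object that connect() relies on, and the shown array is a placeholder for the page.

```javascript
const { EventEmitter } = require('events');

// Stand-in for the room object: an event emitter with a Map of participants.
// This only imitates the parts of the real room that connect() relies on.
const fakeRoom = new EventEmitter();
fakeRoom.participants = new Map([['PA1', { sid: 'PA1', identity: 'alice' }]]);

// Placeholder for the page: the names currently displayed.
const shown = [];
const participantConnected = (participant) => shown.push(participant.identity);
const participantDisconnected = (participant) =>
  shown.splice(shown.indexOf(participant.identity), 1);

// Same three lines as in connect(): current participants first, then events.
// Map.forEach passes the value first, so the handler receives the participant.
fakeRoom.participants.forEach(participantConnected);
fakeRoom.on('participantConnected', participantConnected);
fakeRoom.on('participantDisconnected', participantDisconnected);

fakeRoom.emit('participantConnected', { sid: 'PA2', identity: 'bob' });
fakeRoom.emit('participantDisconnected', { sid: 'PA1', identity: 'alice' });
console.log(shown); // [ 'bob' ]
```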

At this point we are fully connected, so we can indicate that by updating the connected boolean variable. The final action we take is to update the <p> element that shows the connection status with the updated participant count. This is done in another auxiliary function, because we’ll need to do this in several places. The function updates the text of the element based on the size of the room.participants map. Add the implementation of this function right after connect() in app.js.

const updateParticipantCount = () => {
  if (!connected) {
    count.innerHTML = 'Disconnected.';
  }
  else {
    count.innerHTML = (room.participants.size + 1) + ' participants online.';
  }
};

Note that room.participants includes every participant except ourselves, so the total number of people in a call is always one more than the size of the map.
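The counting rule is easy to check on its own. The sketch below is a hypothetical variation, participantCountText(), that takes the state as arguments instead of reading the globals and also handles the singular case when you are alone in the call.

```javascript
// Build the status text from the connection state and the remote participants.
// remoteParticipants is anything with a size property, such as a Map.
const participantCountText = (connected, remoteParticipants) => {
  if (!connected) {
    return 'Disconnected.';
  }
  const total = remoteParticipants.size + 1; // remote participants plus ourselves
  return total === 1 ? '1 participant online.' : `${total} participants online.`;
};

console.log(participantCountText(true, new Map()));              // 1 participant online.
console.log(participantCountText(true, new Map([['PA1', {}]]))); // 2 participants online.
console.log(participantCountText(false, new Map()));             // Disconnected.
```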

Connecting and disconnecting participants

You saw in the previous section that when a participant joins the call, we call the participantConnected() handler to add them to the page. This function needs to create a new <div> inside the container element, following the same structure that we used for the local element that shows our own video stream.

Below you can see the implementation of the participantConnected() function along with the participantDisconnected() counterpart and a few more auxiliary functions. Add all these functions after updateParticipantCount() in app.js.

const participantConnected = (participant) => {
  const participantDiv = document.createElement('div');
  participantDiv.setAttribute('id', participant.sid);
  participantDiv.setAttribute('class', 'participant');

  const tracksDiv = document.createElement('div');
  participantDiv.appendChild(tracksDiv);

  const labelDiv = document.createElement('div');
  labelDiv.innerHTML = participant.identity;
  participantDiv.appendChild(labelDiv);

  container.appendChild(participantDiv);

  participant.tracks.forEach(publication => {
    if (publication.isSubscribed) {
      trackSubscribed(tracksDiv, publication.track);
    }
  });
  participant.on('trackSubscribed', track => trackSubscribed(tracksDiv, track));
  participant.on('trackUnsubscribed', trackUnsubscribed);
  updateParticipantCount();
};

const participantDisconnected = (participant) => {
  document.getElementById(participant.sid).remove();
  updateParticipantCount();
};

const trackSubscribed = (div, track) => {
  div.appendChild(track.attach());
};

const trackUnsubscribed = (track) => {
  track.detach().forEach(element => element.remove());
};

The participantConnected() function receives a Participant object from the Twilio Video JavaScript library. The two important properties of this object are participant.sid and participant.identity, which are a unique user identifier and name respectively. The identity attribute is what the user typed in the username field when they joined the call.

The HTML structure for a remote participant is similar to the one we used for the local video. The big difference is that we now need to create this structure dynamically using the browser’s DOM API. This is the markup that we need to create for each participant:

<div id="{ participant.sid }" class="participant">
    <div></div>  <!-- the video and audio tracks will be attached to this div -->
    <div>{ participant.identity }</div>
</div>

At the start of the participantConnected() function, you can see that we create a participantDiv, to which we add tracksDiv and labelDiv as children. We finally add the participantDiv as a child of container, which is the top-level <div> element where we have all the participants of the call including ourselves.
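The DOM construction can also be pulled out into a function of its own, as in the sketch below. The buildParticipantElement() name and the doc parameter are hypothetical. The sketch makes one deliberate change: it sets the label with textContent rather than innerHTML, so that any markup a participant types into the name field is shown as plain text instead of being parsed as HTML.

```javascript
// Build the <div> structure for one remote participant.
// doc is passed in (normally the global document) so the function is testable.
const buildParticipantElement = (doc, participant) => {
  const participantDiv = doc.createElement('div');
  participantDiv.setAttribute('id', participant.sid);
  participantDiv.setAttribute('class', 'participant');

  // First child: the video and audio tracks are attached here later.
  const tracksDiv = doc.createElement('div');
  participantDiv.appendChild(tracksDiv);

  // Second child: the label. textContent treats the identity as plain text,
  // so markup in a name is displayed literally instead of being parsed.
  const labelDiv = doc.createElement('div');
  labelDiv.textContent = participant.identity;
  participantDiv.appendChild(labelDiv);

  return { participantDiv, tracksDiv };
};

// Possible use in participantConnected():
// const { participantDiv, tracksDiv } = buildParticipantElement(document, participant);
// container.appendChild(participantDiv);
```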

The second part of the function deals with attaching the video and audio tracks from the participant to the tracksDiv element we just created. We loop through all the track publications of the participant, and following the basic usage shown in the Twilio Video library’s documentation, we attach those that are marked as subscribed. The actual track attachment is handled in a trackSubscribed() auxiliary function that is defined right below.

In more advanced usages of this library, a participant can dynamically add or remove tracks during a call (for example if they were to turn off their video temporarily, mute their audio, or start sharing their screen). Because we want to respond to all those track changes dynamically, we also create event handlers for the trackSubscribed and trackUnsubscribed events, which use the attach() and detach() methods of the track object to add and remove the HTML video and audio elements on the page.

Disconnecting from the video room

The counterpart of the connect() function is disconnect(), which has to restore the state of the page to how it was before connecting. This is a lot simpler, as it mostly involves removing all the children of the container element except the first one, which is our local video stream. Add the disconnect() function after the trackUnsubscribed() function in app.js.

const disconnect = () => {
  room.disconnect();
  while (container.lastChild.id !== 'local') {
    container.removeChild(container.lastChild);
  }
  button.innerHTML = 'Join call';
  connected = false;
  updateParticipantCount();
};

As you can see here, we remove the children of the container element one at a time, starting from the last one, until we reach the <div> with the id local, which is the one that we created statically in the index.html page. We also take the opportunity to update our connected global variable, change the text of the connect button and refresh the <p> element with the participant count to show a “Disconnected” message.
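The loop above relies on the local <div> coming before every remote participant in the container. As an alternative sketch, the hypothetical removeRemoteParticipants() function below filters by id instead, so it does not depend on the order of the children. It assumes the container exposes a children collection whose items have an id and a remove() method, as DOM elements do.

```javascript
// Remove every child element of the container except the local participant.
const removeRemoteParticipants = (container, localId = 'local') => {
  // Copy first: removing elements while iterating a live collection skips items.
  for (const child of Array.from(container.children)) {
    if (child.id !== localId) {
      child.remove();
    }
  }
};

// Possible use in disconnect(), in place of the while loop:
// removeRemoteParticipants(container);
```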

Deploying your video chat server

If you started your development web server in the early stages of this tutorial, make sure it is still running. If it isn’t running anymore, start it one more time with the following command:

npm start

With the server running, you can connect from the same computer by entering http://localhost:3000/index.html in the address bar of your web browser. You can connect multiple times from different tabs or different browser windows to see how participants are added and removed from the call dynamically.

But of course, you very likely want to connect from a second computer, a smartphone, or maybe even invite some friends to join your video call. This requires one more step, because the server is only running internally on your computer and is not accessible from the Internet just yet.

We are now ready to deploy this application to the Twilio Serverless platform, on which it will be assigned a public URL that is accessible from anywhere in the world. From your terminal, run the following command to deploy the application:

twilio serverless:deploy

After a few seconds, the application will be deployed, and you will have a list of URLs that were assigned to the function and the static file assets created earlier. Here is an example:

Functions:
   https://twilio-serverless-video-1234-dev.twil.io/get_token
Assets:
   https://twilio-serverless-video-1234-dev.twil.io/app.js
   https://twilio-serverless-video-1234-dev.twil.io/index.html
   https://twilio-serverless-video-1234-dev.twil.io/styles.css

Locate the URL that ends with index.html and open it in your web browser to access the fully deployed application. You can access this URL from your smartphone, or give it to a friend to connect and chat with you.

Adding a default index page

For most websites, you can just connect to the website’s domain, without having to explicitly include index.html in the URL. For applications deployed on the Twilio Serverless platform, this does not work by default, but there is a trick to enable a default index page.

Create a subdirectory called assets inside the existing assets directory:

mkdir assets/assets

Then make a copy of the index.html file inside this new subdirectory:

cp assets/index.html assets/assets/index.html

Now you can redeploy your application to the Twilio Serverless platform:

twilio serverless:deploy

If you connect to the domain assigned to your application (i.e. https://twilio-serverless-video-1234-dev.twil.io in the example above) the index page will be loaded.

If you make changes to the index.html page, remember to update both copies of it in the project.

Conclusion

I hope this was a fun and interesting tutorial. If you decide to use it as a base to build your own video chat project, you should know that the Twilio Video API has many more features that we haven’t explored.

I can’t wait to see what you build with Twilio Video!

Miguel Grinberg is a Principal Software Engineer for Technical Content at Twilio. Reach out to him at mgrinberg [at] twilio [dot] com if you have a cool project you’d like to share on this blog!