Mask or no mask? With Twilio Video, machine learning, and JavaScript
 
 As the number of positive COVID-19 cases rises everywhere, mask-wearing is coming back in vogue. Read on to learn how to build an app to detect whether or not someone is wearing a mask in a Twilio Video call with ml5.js.
 
 What is ml5.js?
ml5.js is a JavaScript library that lets developers use machine learning (ML) algorithms and models in the browser. It's built on top of TensorFlow.js, which handles most of the low-level ML tasks. With ml5.js you can do things like:
- use pre-trained models to detect human poses, generate text, style an image with another image, compose music, and detect pitches or common English-language word relationships
- and more, including image recognition!
Image recognition includes two popular tasks: classification and regression. This post uses ml5.js to explore the classification side of image recognition: given an input image (in this case, someone wearing or not wearing a mask), the machine assigns the image to a category (mask or no mask). This isn't limited to mask-wearing: you could train the model to detect other things as well, like whether someone is wearing a hat or holding a banana.
This project uses the pre-trained model MobileNet to recognize the content of images, along with Feature Extractor, which uses the last layer of a neural network to map the image content to the new classes/categories (i.e., a person wearing a mask or not).
With Feature Extractor, developers don't need to worry much about how the model should be trained or how the hyperparameters should be adjusted. This is Transfer Learning, which ml5 makes easy for us.
Setup
To build the ml5.js app detecting mask usage in a Twilio Programmable Video application, we will need:
- A Twilio account - sign up for a free one here and receive an extra $10 if you upgrade through this link
- Twilio Account SID, which can be found in your Twilio Console
- A Twilio API Key SID and API Key Secret: generate them here
- The Twilio CLI
Before continuing, you'll need a working Twilio Video app. To get one, download this repo and follow the README instructions.
Make the webpage to add training data to the model
 
To be trained, the model needs to see what someone wearing a mask looks like and what someone not wearing one looks like. We could pass it existing images of people with and without masks, but instead we will use images from our computer's webcam.
Make a file in the assets folder in your Twilio Video app called train.html and paste in the following code:
This code first imports the ml5.js library (version 0.6.1 for now). Then, in the <body>, it adds an h2 heading with the text "Are you wearing a mask?", a result span displaying "yes" or "no" to answer that question, and a confidence span showing how confident the model is in "yes, there is a mask" or "no, there is not a mask."
The video element is used both to supply new training data and to predict whether or not a mask is being worn.
The buttons with IDs noMaskButton and maskButton add new image data to the model, the train button trains the model, and the predict button starts running the model on the video feed to predict whether a mask is detected.
If you like the results of the model, you can save the model to the assets folder by clicking the button that says Save model to Assets folders.
Next, let’s add JavaScript to connect the DOM elements. Create a new file assets/train.js and add the following code to declare variables and access the DOM elements:
This code defines the video element source as the computer video camera and makes a featureExtractor object from the MobileNet model. The code calls the classification() method on the featureExtractor object, setting the input source of the classifier object as the video element. This means that whatever appears on the camera acts as the input to classifier.
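As a rough sketch of the setup described above (not the post's exact code), the wiring might look like the following. The helper names startWebcam and createClassifier are hypothetical, and ml5, the video element, and navigator.mediaDevices are passed in as parameters so the snippet does not assume any particular element IDs.

```javascript
// Sketch only: helper names are hypothetical, not taken from the original train.js.

// Point the <video> element at the computer's camera.
// `mediaDevices` is expected to be navigator.mediaDevices in the browser.
// Assumption: the <video> element has the `autoplay` attribute set.
function startWebcam(video, mediaDevices) {
  return mediaDevices.getUserMedia({ video: true }).then((stream) => {
    video.srcObject = stream;
    return video;
  });
}

// Build a MobileNet feature extractor and a classifier that reads from `video`.
// `ml5` is the global object provided by the ml5.js script tag.
function createClassifier(ml5, video, onReady) {
  const featureExtractor = ml5.featureExtractor('MobileNet', () => {
    console.log('MobileNet loaded');
  });
  const classifier = featureExtractor.classification(video, onReady);
  return { featureExtractor, classifier };
}
```

In the browser, these could be chained as startWebcam(videoElement, navigator.mediaDevices).then((video) => createClassifier(ml5, video, onReady)), where videoElement and onReady are whatever your page defines.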
After adding your images, click the button that says Train. This button trains the model with the images added above. Once training begins, the DOM displays the lossValue in the loss span. The lower that value is, the greater the accuracy. Eventually, it decreases closer and closer to zero and the training process is finished when lossValue becomes null.
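The add-images-then-train flow can be sketched as follows. This is an illustration under assumptions: the function names are hypothetical, the labels 'yes' and 'no' mirror the result text described earlier, and classifier is the object returned by featureExtractor.classification().

```javascript
// Sketch only: function names are hypothetical.

// Add the current video frame to the training set under a label.
function addExample(classifier, wearingMask) {
  classifier.addImage(wearingMask ? 'yes' : 'no');
}

// Start training and show progress in `lossEl` (for example, the loss span).
// ml5 calls the callback with the current loss repeatedly, then with null
// once training has finished.
function trainClassifier(classifier, lossEl, onDone) {
  classifier.train((lossValue) => {
    if (lossValue == null) {
      if (onDone) onDone();
    } else {
      lossEl.textContent = String(lossValue);
    }
  });
}
```

A click handler on maskButton could then call addExample(classifier, true), one on noMaskButton could call addExample(classifier, false), and the train button could call trainClassifier(classifier, lossSpan), where lossSpan is whichever element displays the loss.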
 
Once training is complete, click the button that says See the model in action. Test out your new model by taking your mask on and off in front of your webcam. The model returns a yes or no label along with a confidence level that reflects how sure the model is of that label. The closer the number is to 1, the more sure it is.
The classify() method is called over and over in the background, so the model is constantly predicting whether someone is wearing a mask or not.
If the model is not very accurate, try adding more images to the model. Otherwise, you can save the model by clicking the save button, which calls featureExtractor.save() to save the model.
Be sure to save it to the assets folder (which the Twilio Serverless Toolkit automatically generates) so the model can be accessed by others, including our Twilio Video app (ready-made from this blog post on building a Twilio video app quickly with JavaScript and the Twilio CLI).
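A minimal sketch of that prediction loop is shown below. In ml5.js, predictions come from the classifier's classify() method, which takes an error-first callback. The helper name makeResultHandler is hypothetical, and resultEl and confidenceEl stand in for the result and confidence spans.

```javascript
// Sketch only: `makeResultHandler` is a hypothetical helper name.

// Returns an error-first callback for classifier.classify() that writes the
// most confident label into the page, then asks for the next prediction.
function makeResultHandler(classifier, resultEl, confidenceEl) {
  return function gotResults(err, results) {
    if (err) {
      console.error(err);
      return;
    }
    if (results && results.length > 0) {
      // Pick the most confident entry without assuming the array is sorted.
      const top = results.reduce((a, b) => (b.confidence > a.confidence ? b : a));
      resultEl.textContent = top.label;
      confidenceEl.textContent = top.confidence.toFixed(2);
    }
    // Classify the next frame so the prediction keeps updating.
    classifier.classify(gotResults);
  };
}
```

The predict button's click handler could start the loop with classifier.classify(makeResultHandler(classifier, resultSpan, confidenceSpan)).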
Detect Mask-Usage in a Twilio Video App
Our model has been built, so now we have to use it! Replace the contents of assets/video.html with the following code, which imports ml5 and adds a new h2, some spans to reflect the "no" and "yes" mask labels and confidence levels, and a button to detect mask-wearing.
You’ll also need to edit the assets/index.js file.
In assets/index.js, edit line 4 to say const ROOM_NAME = 'mask';. Then beneath the video variable, add the following variables, which you should recognize from train.js:
Once someone joins a Twilio Video room, we load the model with:
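A minimal sketch of that loading step, assuming the classifier was created with featureExtractor.classification(video) as in train.js. The helper name loadSavedModel and the default path 'model.json' are assumptions; point the path at wherever you saved the model inside the assets folder.

```javascript
// Sketch only: `loadSavedModel` and the default path are hypothetical.

// Load the custom model saved from the training page into the classifier.
// `onLoaded` runs once the model is ready to make predictions.
function loadSavedModel(classifier, onLoaded, modelPath = 'model.json') {
  classifier.load(modelPath, onLoaded);
}
```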
Look for the following two lines at the bottom of the joinRoomButton click handler that say:
Beneath these lines, still inside the click handler, add the following code (which should also look pretty familiar from train.js):
Save your file and head back to your browser. Visit your https://YOUR-TWILIO-DOMAIN/video.html page. From there, you can detect mask usage in a Twilio Video application with the model you trained on the train.html page!
 
The complete code can be found on GitHub, along with two mask-detection models I trained that you could use.
What's Next for Twilio Video and Machine Learning?
Twilio's Serverless Toolkit makes it possible to deploy web apps quickly, including video chat applications. You can train an ml5.js model to detect other things, like whether you are wearing a hat or holding a banana. I tried training a model to detect whether a mask was being worn correctly or was showing my nose, and the detection was not as accurate. It most likely needed a lot more training data.
Let me know online what you're building with Serverless or Video, and check out related posts like Pose Detection with TensorFlow and Twilio Video.
- Twitter: @lizziepika
- GitHub: elizabethsiegle
- Email: lsiegle@twilio.com
- Livestreams: twitch.tv/lizziepikachu