Detect Objects in a Serverless Twilio Video App with TensorFlow.js
Object detection is a computer vision technique for locating instances of objects in media such as images or videos. This machine learning (ML) method can be applied to many areas of computer vision, like image retrieval, security, surveillance, automated vehicle systems and machine inspection. Read on to learn how to detect objects in a Twilio Programmable Video application using TensorFlow.js.
Setup
To build a Twilio Programmable Video application, we will need:
- A Twilio account - sign up for a free one here and receive an extra $10 if you upgrade through this link
- Your Twilio Account SID: find it in your account console here
- API Key SID and API Key Secret: generate them here
- The Twilio CLI
- The Twilio Serverless Toolkit
Download this GitHub repo, then create a file named .env in the top-level directory with the following contents, replacing the XXXXX placeholders with the values that apply to your account and API Key:
If you'd like to better understand Twilio Programmable Video in JavaScript, follow this post to get set up with a starter Twilio Video app.
In assets/video.html on lines 20-22, import TensorFlow.js and the coco-ssd model to detect "objects defined in the COCO dataset, which is a large-scale object detection, segmentation, and captioning dataset." It can detect 80 classes of objects. (SSD stands for Single Shot MultiBox Detection, much as YOLO stands for You Only Look Once.) Read more about the model here on Google CodeLabs.
Then, in the same file, add a canvas element with inline styling above the video tag within the room-controls div.
The complete assets/video.html file looks like this:
Now it's time to write some TensorFlow.js code!
Object Detection with TensorFlow.js
Now we will detect objects in our video feed. Let's make an estimate function that loads the coco-ssd ML model and estimates which objects appear in the feed and where they are located.
We will call the model's detect method on the video feed from the Twilio Video application. It returns a promise that resolves to an array of predictions about what the objects are. The results look something like this:
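As a rough illustration of that shape: each prediction from coco-ssd is an object with a bbox array, a class name, and a score. The values below are invented for the example, and filterByScore with its 0.5 default is a hypothetical helper, not part of the library or of this app.

```javascript
// Illustrative predictions in the shape that coco-ssd's detect() resolves to.
// The numbers are made up for this example.
const samplePredictions = [
  { bbox: [24, 10, 320, 410], class: "person", score: 0.84 },
  { bbox: [300, 220, 90, 60], class: "cup", score: 0.31 },
];

// Hypothetical helper: keep only predictions at or above a confidence threshold.
function filterByScore(predictions, minScore = 0.5) {
  return predictions.filter((prediction) => prediction.score >= minScore);
}

const confident = filterByScore(samplePredictions);
// Only the "person" prediction clears the 0.5 threshold.
console.log(confident.map((prediction) => prediction.class));
```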
In assets/index.js, beneath const video = document.getElementById("video");, make an estimate function to load the model, get the predictions, and pass those predictions to another function we will soon make called renderPredictions. The renderPredictions function will display the predictions along with a bounding box on the video canvas. We also call requestAnimationFrame to smooth out the rendering of the predictions.
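One way to structure such a loop is sketched below. It is not the code from the repo: startDetectionLoop, its parameters, and the returned stop function are hypothetical names. The detector argument is assumed to be anything with a detect(video) method that returns a promise of predictions, such as the model that cocoSsd.load() resolves to, and schedule stands in for requestAnimationFrame.

```javascript
// Sketch of a detection loop (names are hypothetical, not from the repo).
// detector: object with detect(input) returning a promise of predictions.
// onPredictions: callback for each result, e.g. renderPredictions.
// schedule: queues the next pass, e.g. (cb) => requestAnimationFrame(cb).
function startDetectionLoop(detector, video, onPredictions, schedule) {
  let running = true;

  async function tick() {
    if (!running) return;
    const predictions = await detector.detect(video);
    if (!running) return; // stop() was called while detect() was pending
    onPredictions(predictions);
    schedule(tick);
  }

  schedule(tick);

  // Calling the returned function ends the loop after the current pass.
  return function stop() {
    running = false;
  };
}

// Browser usage, assuming the coco-ssd script has defined the cocoSsd global:
// cocoSsd.load().then((model) => {
//   startDetectionLoop(model, video, renderPredictions,
//     (cb) => requestAnimationFrame(cb));
// });
```

Passing the scheduler in as an argument keeps the loop independent of the browser, so it can be exercised with a fake detector and setTimeout outside of a page.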
Display Predictions on the Video Canvas
We have detected objects from the video feed, including the coordinates of the objects detected. Now let's display a bounding box around them and write the object and confidence score on top of the bounding box.
We grab the canvas element, set its width and height, and make the ctx variable for the canvas element's context (which is where the drawing will be rendered). We call clearRect on the context to erase the previous drawing by making the pixels in that rectangle transparent. We set the font used to display the predictions and then loop through all the predictions. The first element in the bbox array is the x coordinate, the second element is the y coordinate, the third is the width, and the fourth is the height. With those variables, we draw a bounding box and customize the lines that will draw it.
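The clearing and box-drawing steps might look like the sketch below. The function name drawBoundingBoxes, the color, and the line width are illustrative choices, not taken from the repo; clearRect, strokeRect, strokeStyle, and lineWidth are standard members of the canvas 2D context.

```javascript
// Sketch: draw one rectangle per prediction on a 2D canvas context.
// drawBoundingBoxes and its styling are illustrative, not from the repo.
function drawBoundingBoxes(ctx, canvasWidth, canvasHeight, predictions) {
  // Erase whatever the previous frame drew.
  ctx.clearRect(0, 0, canvasWidth, canvasHeight);

  // Style the outline of each box.
  ctx.strokeStyle = "#00FFFF";
  ctx.lineWidth = 2;

  for (const prediction of predictions) {
    // bbox is [x, y, width, height].
    const [x, y, width, height] = prediction.bbox;
    ctx.strokeRect(x, y, width, height);
  }
}

// Browser usage, assuming a <canvas id="canvas"> element exists on the page:
// const canvas = document.getElementById("canvas");
// drawBoundingBoxes(canvas.getContext("2d"), canvas.width, canvas.height, predictions);
```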
We make the strToShow variable to display the prediction class (object detected) and the prediction confidence score.
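A minimal sketch of building and drawing that label follows. The helper names formatLabel and drawLabel, the percentage format, and the font are assumptions made for illustration; the app's own string may be formatted differently.

```javascript
// Hypothetical helper: build the text shown for one prediction.
function formatLabel(prediction) {
  const percent = Math.round(prediction.score * 100);
  return `${prediction.class} ${percent}%`;
}

// Draw the label at the top-left corner of the prediction's bounding box.
function drawLabel(ctx, prediction) {
  const [x, y] = prediction.bbox;
  ctx.font = "16px sans-serif";
  ctx.textBaseline = "top";
  ctx.fillStyle = "#00FFFF";
  ctx.fillText(formatLabel(prediction), x, y);
}

console.log(formatLabel({ bbox: [24, 10, 320, 410], class: "person", score: 0.84 }));
// logs "person 84%"
```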
All we need to do now is call the estimate function. Do this when the user connects to the room by adding estimate(video); above joinRoomButton.disabled = true;.
The complete assets/index.js code should look like this:
Tada! To deploy and test the app, run twilio serverless:deploy in the root directory and grab the URL ending in /video.html. Open it in a web browser, click Join room, share the link with your friends, and start performing object detection.
You can find the complete code on GitHub here.
What's Next after Detecting Objects with TensorFlow and Twilio Programmable Video
Performing object detection in a video app with TensorFlow.js is just the beginning. You can use this as a stepping stone to build collaborative games, detect mask usage like in this ML5.js app, put a mask on faces, and more. I can't wait to see what you build, so let me know online what you're building!
Twitter: @lizziepika
GitHub: elizabethsiegle
email: lsiegle@twilio.com