Integrating OpenCV Object Detection with Twilio Programmable Video
Time to read: 5 minutes
Video conferencing doesn’t have to be as basic as just conveying packets of data between users. Using machine learning, we can interpret what those packets of data represent in the real world, and manipulate them in a way to create a more human-centered experience.
Today we’ll learn how to use OpenCV to do some simple object-detection with Twilio’s Programmable Video. This will allow you to add object detection to your video streams and open the pathway to many more image processing techniques using OpenCV!
Let’s get started.
Prerequisites
Before we can build our OpenCV integration, you’ll first need a few things.
- A Twilio Account – you can sign up for a free trial here
- Node 8.16.0+
- npm, the Node Package Manager
Integrate OpenCV and Twilio
First off, let’s clone Twilio’s Quickstart Video application. Open up a console and run:
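Assuming you're using Twilio's JavaScript video quickstart repository (double-check the URL in case it has moved), that looks like:

```bash
git clone https://github.com/twilio/video-quickstart-js.git
cd video-quickstart-js
```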
Great! Now we need to initialize our Twilio application variables.
Let's start by copying the .env.template file into our own .env file.
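On macOS or Linux that's:

```bash
cp .env.template .env
```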
Now we need to initialize three variables in our .env file:
- TWILIO_ACCOUNT_SID: Your primary Twilio account identifier - find this in the console here.
- TWILIO_API_KEY: Used to authenticate - generate one here.
- TWILIO_API_SECRET: Used to authenticate - just like the above, you'll get one here.
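When you're done, your .env file should look something like this (with your own credentials in place of the placeholders):

```
TWILIO_ACCOUNT_SID=ACXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
TWILIO_API_KEY=SKXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
TWILIO_API_SECRET=your_api_secret_here
```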
Now let’s install our dependencies.
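From the root of the project:

```bash
npm install
```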
We should be all set now to run our base application. Let’s start the app!
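The quickstart includes a start script that builds the app and serves it locally (by default at http://localhost:3000):

```bash
npm start
```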
Initializing and Installing OpenCV
Now that we have our quickstart app working, we need to install OpenCV.
To do this you will first need to find the latest release here. Download the file from the link https://docs.opencv.org/{VERSION_NUMBER}/opencv.js, substituting the latest release version. For example, at the time of this writing the latest release is 4.5.1, so I will download https://docs.opencv.org/4.5.1/opencv.js and save it in a file called opencv.js. Copy this file to the /quickstart/public directory.
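For example, from the project root (adjust the version number as needed):

```bash
curl -o quickstart/public/opencv.js https://docs.opencv.org/4.5.1/opencv.js
```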
We're going to base our tutorial on OpenCV's Meanshift walk-through, found here.
The next step is to add this package to one of our webpage sources. Open up quickstart/public/index.html and add this line before the closing body tag of the page:
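Assuming you saved the file as opencv.js directly in quickstart/public, that line is simply:

```html
<script src="opencv.js"></script>
```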
And just like that, we have OpenCV installed in our application.
How OpenCV Works
Before we get into the code, it's important to understand how OpenCV works. OpenCV provides us with functions to read an image, manipulate it somehow, and then draw it back. In most cases you will bind a <video /> element to the library, read however many frames per second you want, and draw the results back onto a canvas element.
So, what you might do is read a frame from a video such as the one below, run facial recognition on it using Haar feature-based cascade classifiers, and then redraw the same frame with boxes highlighting the woman's facial features. If your video runs at 30 frames per second, you need to do this 30 times a second on your canvas.
(Image from OpenCV documentation)
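In OpenCV.js terms, that read-process-draw loop looks roughly like the minimal sketch below (the videoInput and canvasOutput element ids are placeholders, not part of the quickstart):

```javascript
// Minimal OpenCV.js frame loop: read a frame, (optionally) process it, draw it back.
const video = document.getElementById('videoInput');   // a playing <video> element
const cap = new cv.VideoCapture(video);
const frame = new cv.Mat(video.height, video.width, cv.CV_8UC4);
const FPS = 30;

function processFrame() {
  const begin = Date.now();
  cap.read(frame);                    // grab the current frame from the video element
  // ...image processing would happen here...
  cv.imshow('canvasOutput', frame);   // draw the (processed) frame onto a canvas
  // schedule the next frame so we stay close to the target frame rate
  setTimeout(processFrame, Math.max(0, 1000 / FPS - (Date.now() - begin)));
}

setTimeout(processFrame, 0);
```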
In this tutorial we won't be doing facial recognition; we'll demonstrate the concept with simpler object detection instead. It will still give you the means to expand your implementation later.
Coding Object Detection
We're finally ready to code our meanshift object-detection filter. First, plop this function into your quickstart/src/joinroom.js file. The algorithm comes from OpenCV's Meanshift tutorial, found here. Here it is for you to paste:
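The version below is a sketch adapted from that Meanshift walk-through. The initOpenCV(video) name and the openCVInterval variable match what we use later in this post, but the canvasOutput canvas id, the initial tracking window, and the HSV mask thresholds are assumptions you'll likely want to tune for your own video.

```javascript
// Runs meanshift tracking on a playing <video> element and draws the result
// onto the canvas with id "canvasOutput" (an assumed id; see the HTML below).
function initOpenCV(video) {
  // The <video> element needs explicit width/height attributes (set in the HTML below).
  const cap = new cv.VideoCapture(video);

  // Take the first frame of the video.
  const frame = new cv.Mat(video.height, video.width, cv.CV_8UC4);
  cap.read(frame);

  // Hard-code the initial location of the tracking window (tune for your scene).
  let trackWindow = new cv.Rect(150, 60, 63, 125);

  // Set up the Region of Interest (ROI) histogram for tracking.
  const roi = frame.roi(trackWindow);
  const hsvRoi = new cv.Mat();
  cv.cvtColor(roi, hsvRoi, cv.COLOR_RGBA2RGB);
  cv.cvtColor(hsvRoi, hsvRoi, cv.COLOR_RGB2HSV);
  const mask = new cv.Mat();
  const lowScalar = new cv.Scalar(30, 30, 0);
  const highScalar = new cv.Scalar(180, 180, 180);
  const low = new cv.Mat(hsvRoi.rows, hsvRoi.cols, hsvRoi.type(), lowScalar);
  const high = new cv.Mat(hsvRoi.rows, hsvRoi.cols, hsvRoi.type(), highScalar);
  cv.inRange(hsvRoi, low, high, mask);
  const roiHist = new cv.Mat();
  const hsvRoiVec = new cv.MatVector();
  hsvRoiVec.push_back(hsvRoi);
  cv.calcHist(hsvRoiVec, [0], mask, roiHist, [180], [0, 180]);
  cv.normalize(roiHist, roiHist, 0, 255, cv.NORM_MINMAX);

  // Free the mats we no longer need.
  roi.delete(); hsvRoi.delete(); mask.delete();
  low.delete(); high.delete(); hsvRoiVec.delete();

  // Termination criteria: either 10 iterations or a shift of at least 1 pt.
  const termCrit = new cv.TermCriteria(cv.TERM_CRITERIA_EPS | cv.TERM_CRITERIA_COUNT, 10, 1);

  const hsv = new cv.Mat(video.height, video.width, cv.CV_8UC3);
  const hsvVec = new cv.MatVector();
  hsvVec.push_back(hsv);
  const dst = new cv.Mat();
  const FPS = 30;

  function processVideo() {
    try {
      const begin = Date.now();

      // Grab the next frame and back-project it against the ROI histogram.
      cap.read(frame);
      cv.cvtColor(frame, hsv, cv.COLOR_RGBA2RGB);
      cv.cvtColor(hsv, hsv, cv.COLOR_RGB2HSV);
      cv.calcBackProject(hsvVec, [0], roiHist, dst, [0, 180], 1);

      // Apply meanshift to find the new location of the tracked object.
      [, trackWindow] = cv.meanShift(dst, trackWindow, termCrit);

      // Draw a red rectangle around the tracked position and show it on the canvas.
      const { x, y, width, height } = trackWindow;
      cv.rectangle(frame, new cv.Point(x, y), new cv.Point(x + width, y + height),
                   [255, 0, 0, 255], 2);
      cv.imshow('canvasOutput', frame);

      // Schedule the next frame, saving the handle so we can cancel it later.
      const delay = 1000 / FPS - (Date.now() - begin);
      openCVInterval = setTimeout(processVideo, delay);
    } catch (err) {
      console.error(err);
    }
  }

  // Kick off the processing loop.
  openCVInterval = setTimeout(processVideo, 0);
}
```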
In the above block of code, here’s what’s happening:
- Set up our OpenCV instance with our Twilio video stream as input.
- Take the first frame of the video.
- Create our Region of Interest Histogram, other scalar matrices, and so on.
Now we enter a loop that runs 30 times every second. Each time we enter the loop:
- Take a frame from the video.
- Use OpenCV’s meanshift algorithm to calculate the position of the moving object.
- Draw a rectangle around said position.
- Output to canvas object.
In this function, you can work on the algorithm and tweak it to match your own use case. There are tons of examples and algorithms on the internet that you can mostly copy and paste right into your code.
Scheduling Frame Processing
Notice that since OpenCV works on a frame-by-frame basis, we schedule the next frame using setTimeout() when we're done with the current one. To be able to short-circuit the processing, we save the handle returned by setTimeout() to openCVInterval so we can clear it later. Go back through the function above to see where it is assigned.
Now we need to declare this variable at the top of the quickstart/src/joinroom.js file.
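A single top-level declaration is enough; the name matches the one used inside initOpenCV:

```javascript
let openCVInterval = null;
```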
At the end of the setActiveParticipant function we will add these lines of code to short-circuit any previous invocation of initOpenCV and kick off a new processing loop for the new participant's video.
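A sketch of those lines follows. Exactly how you grab the active participant's <video> element depends on the quickstart's markup, so treat the selector below as an assumption:

```javascript
// Stop any processing loop left over from the previous active participant.
clearTimeout(openCVInterval);

// Give the new participant's video a few seconds to render, then start tracking it.
setTimeout(() => {
  const video = document.querySelector('div.participant.main video'); // assumed selector
  if (video) {
    initOpenCV(video);
  }
}, 5000);
```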
The timeout of 5 seconds is overkill, but some delay is required. There's a slight lag between when the participantConnected event fires, which lets our application know that a new participant has joined, and when their video is actually rendered on screen.
The idea is that we wait for the video to render on screen before we start processing it; otherwise OpenCV throws errors because it sees an empty video element. In a real application you might trigger the OpenCV processing with a button instead, in which case this delay wouldn't be necessary.
UI Styling And Final Touches
In your quickstart/public/index.html file, find the active participant's video container in the DOM. We are going to change it so that it also holds a canvas for OpenCV's output:
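A sketch of the updated markup (the surrounding structure in your copy of the quickstart may differ slightly; the video-container and canvasOutput ids are our own choices and must match the id used in initOpenCV and in the CSS below):

```html
<div id="active-participant">
  <div class="participant main" id="video-container">
    <video autoplay playsinline muted width="640" height="480"></video>
    <canvas id="canvasOutput" width="640" height="480"></canvas>
  </div>
</div>
```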
We did two important things here: we created our canvas element, and we gave the canvas and the video container matching width and height.
Finally, add these styles to the quickstart/public/index.css file:
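A sketch of styles that stack the canvas directly on top of the video (adjust the selectors to match your markup):

```css
/* Make the container the positioning context for the overlay canvas. */
#video-container {
  position: relative;
}

/* Keep the video and the OpenCV canvas the same size. */
#video-container video,
#video-container canvas {
  width: 100%;
  height: 100%;
}

/* Stack the canvas directly on top of the video. */
#canvasOutput {
  position: absolute;
  top: 0;
  left: 0;
}
```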
Run and Test OpenCV
Great work! You're now ready to check that everything is working. Run the app using:
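```bash
npm start
```

Then open the app in your browser (by default at http://localhost:3000) and join a room.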
Now when you join a room you should see a moving red rectangle around an object you put in frame! Here’s a demo:
Integrating OpenCV with Twilio Programmable Video
There you go: now you have some basic object detection in your Programmable Video app! You'll be able to use OpenCV to understand more, programmatically, about what a video stream is depicting, track moving objects, recognize facial expressions, and more. You'll definitely be able to build cool stuff around that concept.
Now that you have OpenCV and Twilio working together, check out our Video blog posts for more ideas on how to develop your app. If you already know what you’re building, our Programmable Video docs have what you need.
We can’t wait to see what you build.
Muhammad Nasir is a Software Developer. He's currently working with Webrtc.ventures. He can be reached at muhammad [at] webrtc.ventures.