How to Build an On-Demand Translation Service Using Twilio MMS, CamFind and Microsoft Translator
Picture the scene: you’re out on a romantic evening with your partner and you find yourself staring at a beautiful sunset. You suddenly think you could gain some brownie points by describing the scene in French, but you don’t know how to speak French! If only you could send a picture message to a handy on-demand translator service, you’d know that the mots d’amour would be “un magnifique coucher de soleil”!
Thankfully, using an MMS-enabled Twilio number and a few APIs (CamFind and Microsoft Translator) it is pretty easy to build this type of translation service. After I show you how to build the basic translator you’ll be well equipped to whip out the romantic French phrases, or phrases in any language for that matter!
What you will need
- A Twilio account — you can sign up for one here for free.
- An MMS-enabled Twilio phone number (available in the United States and Canada)
- Ruby
- Twilio Ruby library
- Sinatra
- A Mashape account (for CamFind)
- A Microsoft Translator account
How it works
The initial version of the app will support translating an image into five different languages: Spanish, French, Italian, German and Klingon. The user will send an MMS message to our Twilio number containing a photo and a language to translate the description of that image into. We will also allow the user to send in the word “list” to receive a list of supported languages.
As soon as we receive a valid MMS message from the user we will send back a response indicating that we are performing analysis on the image and a translation will be coming soon. The image URL from the MMS message is sent off to the CamFind API for analysis. The results of the CamFind API call are sent to the Microsoft Translator API to be translated to the user’s requested language.
Once we have all of the pieces in place a text message is sent back to the user stating the best guess for what is in the image and how to say that in the target language.
Try it now by sending a picture to:
(202) 800-1180
Make sure to include text with your picture to let the translator know which language to translate your picture into.
The full project code is available if you want to follow along: GitHub
Setting up the project
To get started on this project you will need Ruby and RubyGems installed. If you are on a Mac, this should already be the case. For Windows users, I would recommend checking out Ruby Installer. For Linux users who might need a refresher on package management, here’s a guide for using apt-get in Ubuntu.
Now that we have that prerequisite squared away, open up a terminal window and create a new folder to hold our app:
Change into this newly created directory so we can install some Ruby Gems:
Next we’ll create the file that will hold our application:
We will build the translation buddy using Sinatra, which is a lightweight Ruby web framework. We’ll also use the Twilio Ruby gem to interact with the Twilio APIs, just to make things a little easier to work with. I’m using the Unirest gem to make REST API calls to the CamFind API, but you are free to make those calls another way if you have a preference (e.g. Rest Client).
Now that our project structure is set up, let’s translate some pictures!
Building the translation service
Our server code will only require one endpoint to handle incoming MMS messages so this project is a great fit for a lightweight framework like Sinatra. Keeping it simple on the server-side allows us to focus on the logic needed to translate our images.
Let’s get started by setting up the Sinatra server and the various dependencies we’ll need:
The endpoint we set up at /translate will be called by Twilio when a message is received by our Twilio number. Before the request we’ll set up some variables in a before filter that we can use throughout the app:
Here we are storing the text the user sends with the message. This will either be the language the user wishes to translate to or the “list” keyword. We also store their phone number and the URL of the picture they sent. The first thing we need to check is whether or not the user sent a language. If the user doesn’t send a language we will default the translator to French:
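As a rough sketch of that parsing step, here is the same logic as a plain Ruby helper that can be tried outside Sinatra. `Body`, `From` and `MediaUrl0` are the parameter names Twilio sends with an incoming MMS. The helper name `parse_incoming` and the `DEFAULT_LANGUAGE` constant are hypothetical. In the app, this logic would live in the before filter and set instance variables rather than return a hash.

```ruby
# Hypothetical constant: the language used when the user sends only a picture.
DEFAULT_LANGUAGE = 'french'

# Hypothetical helper mirroring the before filter described above.
# `params` is the hash of form fields Twilio posts to the webhook.
def parse_incoming(params)
  body = params['Body'].to_s.strip.downcase
  {
    language: body.empty? ? DEFAULT_LANGUAGE : body, # fall back to French
    from: params['From'],                            # the sender's phone number
    image_url: params['MediaUrl0']                   # nil when no picture was attached
  }
end
```

Normalizing the text with `strip` and `downcase` is a design choice of this sketch, so that “ Spanish ” and “spanish” are treated the same way.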
Next, we’ll check if the text the user sent is the keyword “list”. In this case we want to return a text message to the user indicating valid languages for translation:
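One way to sketch the “list” check is shown here with hand-built TwiML so that it runs without any gems. In the real app you would more likely use the TwiML builder from the twilio-ruby gem. The names `SUPPORTED_NAMES`, `twiml_message` and `list_reply` are hypothetical.

```ruby
require 'cgi'

# Assumed to match the five languages listed earlier in the post.
SUPPORTED_NAMES = %w[Spanish French Italian German Klingon].freeze

# Hand-built TwiML: a <Response> containing one <Message> tells Twilio
# to reply to the sender with that text.
def twiml_message(text)
  '<?xml version="1.0" encoding="UTF-8"?>' \
    "<Response><Message>#{CGI.escapeHTML(text)}</Message></Response>"
end

# Returns the TwiML reply for the "list" keyword, or nil for any other text.
def list_reply(language)
  return nil unless language == 'list'

  twiml_message("Supported languages: #{SUPPORTED_NAMES.join(', ')}")
end
```

Inside the /translate endpoint, a non-nil result would be returned straight away as the response body.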
Now that we’ve determined whether or not there is text in the incoming message we also need to check whether or not there is a picture. If there isn’t, we won’t have anything to translate so we’ll alert the user with a response text message:
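The picture check can be sketched as a small helper. The name `missing_picture_reply` and the wording of the message are hypothetical. The image URL is assumed to come from Twilio's `MediaUrl0` parameter, which is absent when no media was attached.

```ruby
# Hypothetical helper: returns a warning when the message carried no picture,
# or nil when there is an image URL to work with.
def missing_picture_reply(image_url)
  return nil unless image_url.to_s.strip.empty?

  'Please send a picture along with your message so there is something to translate.'
end
```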
The next thing we have to do is cross reference the language sent in by the user with a list of languages that we support. Add the following method to the top of app.rb right below the require statements:
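Here is a minimal sketch of such a method, assuming a simple hash lookup. The `SUPPORTED_LANGUAGES` constant is hypothetical, and the original project may structure the method differently. The two-letter codes and `tlh` for Klingon are the language codes Microsoft Translator uses for these languages.

```ruby
# Hypothetical lookup table: language name => Microsoft Translator code.
SUPPORTED_LANGUAGES = {
  'spanish' => 'es',
  'french'  => 'fr',
  'italian' => 'it',
  'german'  => 'de',
  'klingon' => 'tlh'
}.freeze

# Returns the language code for a supported language, or nil otherwise.
def check_language(language)
  SUPPORTED_LANGUAGES[language.to_s.strip.downcase]
end
```

Returning nil for anything outside the table keeps the unsupported-language check in the endpoint to a single nil test.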
The check_language method compares the target language against our supported languages. If it is a match we return the shorthand format Microsoft Translator will be expecting in the translation process. Now we need to call this method from our /translate endpoint with the requested language:
If @language_format is nil, the language the user requested is not supported. Let’s inform the user of this and let them know what is supported:
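A sketch of that reply could look like the following. The helper name `unsupported_language_reply` and the message wording are hypothetical. `language_format` stands for the value returned by check_language.

```ruby
# Hypothetical helper: builds the reply for an unsupported language.
# Returns nil when the language is supported, so processing can continue.
def unsupported_language_reply(language_format, requested, supported_names)
  return nil unless language_format.nil?

  "Sorry, #{requested} isn't supported yet. Try one of: #{supported_names.join(', ')}."
end
```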
One last thing to do before we do the heavy lifting of translating the image is to let the user know this might take a little bit of time:
Since our image analysis and translation process will take about a minute we want to make sure we do this work after the user has been informed the process has started. The Sinatra gem sinatra-run-later allows us to run code after our /translate endpoint has returned. Add the following code to the top of the /translate endpoint:
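To illustrate the idea of replying first and working afterwards, here is a plain-Ruby stand-in that uses a background thread. This is a substitute for illustration only, not the sinatra-run-later API, and the helper name `reply_then_process` is hypothetical.

```ruby
# Stand-in for deferred work: start the slow job on a background thread
# and hand the reply back immediately.
def reply_then_process(reply, &slow_work)
  worker = Thread.new(&slow_work) # analysis and translation run off the request path
  [reply, worker]                 # the caller returns `reply` to Twilio right away
end

reply, worker = reply_then_process('Working on your translation...') do
  sleep 0.1 # placeholder for the CamFind and Translator calls
  :done
end

puts reply
worker.join # only needed in this demo so the script waits for the thread
```

In a web server the join would be dropped, since the point is that the response should not wait for the slow work.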
Image analysis with the CamFind API takes a little bit of time. Some computer vision analysis is done on the image and then, more often than not, a person looks at the image and describes it. When we make the request to the API we will be given a token we can use to request the image analysis details at a later time. We’ll store that token and then wait a minute to allow the analysis process to happen:
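The request step might be sketched as below, using Ruby's built-in Net::HTTP instead of Unirest so that it runs without extra gems. The endpoint URL, the `X-Mashape-Key` header and the form field names are assumptions based on how CamFind was exposed through Mashape, so check them against the current CamFind documentation. The helper names are hypothetical.

```ruby
require 'net/http'
require 'uri'
require 'json'

# Assumed endpoint for submitting an image to CamFind through Mashape.
CAMFIND_REQUESTS_URL = 'https://camfind.p.mashape.com/image_requests'

# Builds (but does not send) the POST request for an image analysis.
def build_image_request(image_url, mashape_key)
  request = Net::HTTP::Post.new(URI(CAMFIND_REQUESTS_URL))
  request['X-Mashape-Key'] = mashape_key # assumed auth header
  request.set_form_data(
    'image_request[remote_image_url]' => image_url, # assumed field name
    'image_request[locale]' => 'en_US'              # assumed field name
  )
  request
end

# Pulls the token out of the JSON body returned for the request above.
# The "token" key is taken from the description in this post.
def extract_token(response_body)
  JSON.parse(response_body)['token']
end
```

Sending the request would go through `Net::HTTP.start` with `use_ssl: true`, followed by something like `sleep 60` before asking for the results.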
At this point we can request the results from the CamFind API:
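Once the response comes back, extracting the description could look like this sketch. The `status` and `name` keys and the `completed` value are assumptions about the shape of CamFind's JSON response, and the helper name `description_from` is hypothetical.

```ruby
require 'json'

# Returns the English description once CamFind reports the analysis as
# completed, or nil if the result is not ready yet.
def description_from(response_body)
  result = JSON.parse(response_body)
  result['status'] == 'completed' ? result['name'] : nil
end
```

A nil result is a signal to wait and ask again rather than to send an empty translation.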
The description variable will have CamFind’s description of the picture in English. This is exactly what we need to pass to the Microsoft Translator API to help the user say it in another language. Let’s use the bing_translator gem to make a request to Microsoft Translator with our English text and the target language the user specified:
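The translation call can be sketched with the translator passed in as an argument, which lets the example run without credentials. The `translate(text, from:, to:)` call shape follows the bing_translator README as I understand it, so treat it as an assumption and check your gem version. `translate_description` and `FakeTranslator` are hypothetical names.

```ruby
# Translates CamFind's English description into the requested language.
# `translator` is any object that responds to translate(text, from:, to:).
def translate_description(translator, description, language_format)
  translator.translate(description, from: 'en', to: language_format)
end

# Hypothetical stand-in so the sketch runs without a Microsoft account.
class FakeTranslator
  def translate(text, from:, to:)
    "[#{from}->#{to}] #{text}"
  end
end

puts translate_description(FakeTranslator.new, 'sunset over the ocean', 'fr')
```

In the app, the fake would be replaced by a BingTranslator instance created with your Microsoft Translator credentials.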
We now have the final translated version of what the user requested. We can now use the twilio-ruby gem to make a REST API call to send a text message back to the user with the translated text. For comparison purposes we’ll make sure to let them know what CamFind thought the image was in English as well:
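A sketch of this last step follows. The message wording and the helper names `result_text` and `send_result` are hypothetical. The `client.messages.create(from:, to:, body:)` call matches recent versions of twilio-ruby, while older versions of the gem exposed messages under `client.account`, so adjust this to the version you have installed.

```ruby
# Builds the final text: CamFind's English guess plus its translation.
def result_text(description, translation, language)
  "I think that is: #{description}. In #{language.capitalize}: #{translation}"
end

# `client` is expected to behave like Twilio::REST::Client from twilio-ruby.
# `from` is your Twilio number and `to` is the user's phone number.
def send_result(client, from:, to:, body:)
  client.messages.create(from: from, to: to, body: body)
end
```

Keeping the text building separate from the sending makes the wording easy to adjust without touching the API call.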
That’s it! Now you can translate any picture into a description in another language just by sending an MMS. You should deploy your server code somewhere publicly accessible so that Twilio will be able to contact your server. I recommend Heroku for this, and this guide will show you how to deploy your Sinatra app to Heroku. You can view the full project on GitHub.
Hooking up our app to Twilio
To get our translation service working we need to connect our app to Twilio so that incoming texts will be routed to our app logic. Log in to Twilio and head over to the numbers portal. Click on the number you wish to use for the translator and configure the Messaging URL to point at your newly deployed Sinatra server:
Your on-demand translation buddy is good to go! Send in a picture and a language to translate it to and pretty soon you’ll be describing the world around you in multiple languages.
Next steps
I think the idea of having an on-demand translation buddy in your pocket is an awesome thing, but I’m even more stoked to see how you extend it. Here are some ideas for extending what we’ve built here:
- Create a language learning flash card game based on your translation results
- Make a voice call using Twilio to read out the results instead of receiving a text message
So far I think we have just scratched the surface of what is possible with Twilio MMS. I think the best use cases will be the ones you amazing developers will create. Please don’t hesitate to share what you build with me. You can email me at brent@twilio.com or hit me up on Twitter @brentschooley.