How to Use Twilio Speech Recognition

June 22, 2017

Twilio Speech Recognition is a powerful addition to voice applications powered by the TwiML <Gather> verb. Instead of just taking DTMF tones as input you can use the full expressiveness of spoken language in a variety of languages.

To have some fun with this feature and show its usefulness in interactive voice response (IVR) applications, we’ll build a hotline that returns facts about cats, numbers, and Chuck Norris. If you learn best from video or just want to see this in action, the full tutorial is also available on the Twilio YouTube channel.

 

The code for the application is available in this repo on GitHub.

Hello, How Can I Help You?

We’ll use Twilio Functions to build our application. If you’re new to Twilio Functions you can follow this video tutorial to learn how it works. Since Twilio Functions runs inside the Twilio Runtime there are no prerequisites for this project other than signing up for a free Twilio account and getting a voice-enabled Twilio phone number.

The call flow for our application looks like this:


When the user calls, they’ll be presented with a menu provided by the TwiML returned by /facts. The user will speak a command and the resulting words will be processed by /fact-command. This Function will redirect to the appropriate Function for the type of fact requested by the caller. Once at /cat-facts, /number-facts, or /chuck-facts, the caller can either receive more of that type of fact or head back to the main menu.

Let’s create the /facts Function to get this started. Head to the Functions section inside the Twilio Runtime and create a new Function using the Blank template:

Set the Function name to “Twilio Facts” and the path to /facts. Then add the following code to the code section at the bottom:
exports.handler = function(context, event, callback) {
  const twiml = new Twilio.twiml.VoiceResponse();

  twiml.gather({
    input: 'speech',
    timeout: 3,
    hints: 'cat, numbers, chuck norris',
    action: '/fact-command'
  }).say('Welcome to the Twilio Facts hotline. Please say cat for cat facts, number for trivia about numbers, or chuck norris for a random chunk of chuck norris knowledge.');

  callback(null, twiml);
};

 

Line 5 specifies that this <Gather> tag uses the speech input method only. If you also want to accept DTMF tones you can specify dtmf speech, but we’ll use speech only for now. Line 7 provides a comma-separated set of hints that tell the speech recognition engine which words we’re expecting to receive as input. These hints improve the accuracy of the application. We’re done with /facts, so save this Function before moving on.
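To make the difference between the two input modes concrete, here is a minimal sketch of a helper that builds the options object for the main menu. The name menuGatherOptions and the choice of numDigits: 1 are assumptions for illustration and are not part of the tutorial code.

```javascript
// Hypothetical helper: builds the options passed to twiml.gather()
// for the main menu, with or without DTMF support.
function menuGatherOptions(acceptDtmf) {
  const options = {
    input: acceptDtmf ? 'dtmf speech' : 'speech',
    timeout: 3,
    hints: 'cat, numbers, chuck norris',
    action: '/fact-command'
  };
  if (acceptDtmf) {
    // Assumption: a one-key menu, so the gather ends after a single digit.
    options.numDigits = 1;
  }
  return options;
}

console.log(menuGatherOptions(true).input); // prints "dtmf speech"
```

Inside the Function you could then call twiml.gather(menuGatherOptions(false)) to get the same speech-only behavior as the code above.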

Create another new function using the Blank template. Name it “Fact Command” and give it a path of /fact-command. In the code section we’ll add some code that just tests the speech recognition to see if things are working so far:

exports.handler = function(context, event, callback) {
  const twiml = new Twilio.twiml.VoiceResponse();

  const command = event.SpeechResult.toLowerCase();

  twiml.say(`You said ${command}. I'll give you a ${command} fact.`);

  callback(null, twiml);
};

The line that assigns command reads SpeechResult from the event parameter and converts it to lowercase, which makes comparisons easier in later steps. Head to the setup page for your Twilio phone number and set the incoming call to use the “Twilio Facts” Function:


Call your phone number and say “cat”, “number”, or “Chuck Norris” at the prompt to make sure everything is working so far.
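One thing to watch for while testing: if /fact-command is ever requested without a SpeechResult (for example, when you open the URL directly), calling toLowerCase() on it will throw. A defensive version of the lowercase step might look like the sketch below. The helper name normalizeSpeech is hypothetical, and stripping punctuation is an assumption about what the recognizer may return, not documented behavior.

```javascript
// Hypothetical helper: returns the caller's words in lowercase,
// or an empty string when no speech result is present.
function normalizeSpeech(event) {
  if (!event || typeof event.SpeechResult !== 'string') {
    return '';
  }
  return event.SpeechResult
    .toLowerCase()
    .replace(/[.,!?]/g, '') // assumption: drop punctuation the recognizer may add
    .trim();
}

console.log(normalizeSpeech({ SpeechResult: ' Chuck Norris. ' })); // prints "chuck norris"
```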

What Kind of Fact Do You Want?

With the basics of the app working, update /fact-command to redirect the call to the appropriate fact Function based on what the caller says. Replace the code in /fact-command with the following code:

exports.handler = function(context, event, callback) {
  const twiml = new Twilio.twiml.VoiceResponse();
  let command;

  if (event.SpeechResult) { command = event.SpeechResult.toLowerCase(); }
  
  if(command) {
    if(command.includes("cat")) {
      twiml.say('Fetching your cat fact.');
      twiml.redirect('/cat-facts');
    } else if (command.includes("number")) {
      twiml.say('Fetching your number fact.');
      twiml.redirect('/number-facts');
    } else if (command.includes("chuck norris")) {
      twiml.say('Fetching your chuck norris fact.');
      twiml.redirect('/chuck-facts');
    } else {
      twiml.say(`Sorry but I do not recognize ${command} as a valid command. Try again.`);
      twiml.redirect('/facts');
    }   
  }

  callback(null, twiml);
};

The if/else chain redirects the call flow based on the input received. If the caller says something unexpected, we redirect them back to the main menu.
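The same routing decision can also be written as a lookup table, which keeps the keywords and paths in one place. This is only a sketch: ROUTES and routeCommand are hypothetical names, and the paths are the ones used in this tutorial.

```javascript
// Hypothetical routing table: spoken keyword -> Function path.
const ROUTES = [
  { keyword: 'cat', path: '/cat-facts' },
  { keyword: 'number', path: '/number-facts' },
  { keyword: 'chuck norris', path: '/chuck-facts' }
];

// Returns the path for the first keyword found in the command,
// or the main menu path when nothing matches.
function routeCommand(command) {
  if (!command) {
    return '/facts';
  }
  const match = ROUTES.find(route => command.includes(route.keyword));
  return match ? match.path : '/facts';
}

console.log(routeCommand('tell me a cat fact')); // prints "/cat-facts"
```

Because includes() matches substrings, a word such as “location” also contains “cat”. For a three-item menu that is probably acceptable, but a larger menu may need stricter matching.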

Implement the /cat-facts Function next. Cat facts will be retrieved from https://catfact.ninja/fact using the got library (did you know that blue-eyed, pure white cats are frequently deaf?). Create a Function called “Cat Facts” with the path /cat-facts and add the following code in the code section:

exports.handler = function(context, event, callback) {
  const got = require('got');
  const twiml = new Twilio.twiml.VoiceResponse();
  let command;

  if (event.SpeechResult) { command = event.SpeechResult.toLowerCase(); }
  
  if(command && command.includes('menu')) {
    twiml.redirect('/facts');
    callback(null, twiml);
    return;
  }

  got('https://catfact.ninja/fact').then(response => {
    const catFact = JSON.parse(response.body);
    twiml.gather({
      input: 'speech',
      hints: 'menu',
      timeout: 3
    }).say(`Here's your cat fact: ${catFact.fact} ... Say more cat facts for more cat facts or menu for main menu.`);
    callback(null, twiml);
  }).catch(err => {
    twiml.say('There was an error fetching your fact. Going back to main menu.');
    twiml.redirect('/facts');
    callback(null, twiml);
  });
};

Our application first checks to see if the caller said “menu” after receiving a cat fact and if so the app redirects them back to the menu. Otherwise, a new cat fact is fetched and presented to the caller. The caller is redirected to the main menu if an error occurs in fetching a cat fact. This ensures that we don’t hang up on our caller if catfact.ninja happens to go down for some reason.
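The three outcomes described above (back to the menu, a new fact, or an error fallback) can be sketched without any Twilio or network dependency by passing the fetch step in as a function. Everything here is illustrative: nextCatStep, the fetchFact parameter, and the shape of the returned object are assumptions, not part of the tutorial code.

```javascript
// Hypothetical helper: decides what the cat facts Function should do next.
// fetchFact is any function that returns a Promise resolving to the fact text.
async function nextCatStep(command, fetchFact) {
  // The caller asked for the main menu.
  if (command && command.includes('menu')) {
    return { action: 'redirect', to: '/facts' };
  }
  try {
    const fact = await fetchFact();
    return { action: 'say', text: `Here's your cat fact: ${fact}` };
  } catch (err) {
    // The fact source failed, so send the caller to the menu instead of hanging up.
    return { action: 'redirect', to: '/facts', error: err.message };
  }
}

// Usage with a stand-in fetcher that always fails.
nextCatStep('more cat facts', async () => { throw new Error('service down'); })
  .then(step => console.log(step.action, step.to)); // prints "redirect /facts"
```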

The number facts and Chuck Norris facts Functions are almost identical so you can implement them with the following code. Here’s the /number-facts code:

exports.handler = function(context, event, callback) {
  const got = require('got');
  const twiml = new Twilio.twiml.VoiceResponse();

  let command;

  if (event.SpeechResult) { command = event.SpeechResult.toLowerCase(); }
  
  if(command && command.includes('menu')) {
    twiml.redirect('/facts');
    callback(null, twiml);
    return;
  }

  got('http://numbersapi.com/random/trivia').then(response => {
    const numberFact = response.body;
    twiml.gather({
      input: 'speech',
      hints: 'number, numbers, more number facts, more numbers facts',
      timeout: 3
    }).say(`Here's your number fact: ${numberFact} ... Say more number facts for more number trivia or menu for main menu.`);
    callback(null, twiml);
  }).catch(err => {
    twiml.say('There was an error fetching your fact. Going back to main menu.');
    twiml.redirect('/facts');
    callback(null, twiml);
  });
};

Here’s the /chuck-facts code:

exports.handler = function(context, event, callback) {
  const got = require('got');
  const twiml = new Twilio.twiml.VoiceResponse();
  let command;

  if (event.SpeechResult) { command = event.SpeechResult.toLowerCase(); }
  
  if(command && command.includes('menu')) {
    twiml.redirect('/facts');
    callback(null, twiml);
    return;
  }

  got('http://api.icndb.com/jokes/random').then(response => {
    const chuckFact = JSON.parse(response.body);
    console.log(chuckFact);
    twiml.gather({
      input: 'speech',
      hints: 'chuck norris, more chuck facts, more chuck norris facts',
      timeout: 3
    }).say(`Here's your Chuck Norris fact: ${chuckFact.value.joke} ... Say chuck norris for more chunks of Chuck Norris knowledge or menu for main menu.`);
    callback(null, twiml);
  }).catch(err => {
    twiml.say('There was an error fetching your fact. Going back to main menu.');
    twiml.redirect('/facts');
    callback(null, twiml);
  });
};
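The main difference between the three fact Functions is how the fact text is pulled out of the response body: the cat and Chuck Norris services return JSON, while the numbers service returns plain text. A small sketch of that difference follows. EXTRACTORS and extractFact are hypothetical names, and the sample bodies in the usage lines are made up to match the fields the code above reads.

```javascript
// Hypothetical lookup of how each fact type is read from a response body.
const EXTRACTORS = {
  cat: body => JSON.parse(body).fact,          // JSON with a "fact" field
  number: body => body,                        // plain text
  chuck: body => JSON.parse(body).value.joke   // JSON with "value.joke"
};

function extractFact(kind, body) {
  const extract = EXTRACTORS[kind];
  if (!extract) {
    throw new Error(`Unknown fact type: ${kind}`);
  }
  return extract(body);
}

// Made-up sample bodies, shaped like the fields the tutorial code expects.
console.log(extractFact('cat', '{"fact":"Cats sleep a lot."}')); // prints "Cats sleep a lot."
console.log(extractFact('number', '42 is a sample number.'));    // prints "42 is a sample number."
```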

Once all of these functions are created and saved, call your Twilio Facts Hotline and test it out. You should be able to amaze your friends with random cat facts, some number trivia or at the very least have a Chuck Norris joke at the ready at any time.

What’s Next?

Now that you know how Twilio Speech Recognition works, consider trying the following:

  • Try using the partialResultCallback attribute to access partial speech results in real time
  • Build an ordering system that combines DTMF tones and speech
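
As a starting point for the second idea, a Function that handles a <Gather> with input set to dtmf speech needs to check which kind of input arrived. The sketch below assumes each request carries either Digits or SpeechResult; readInput is a hypothetical helper name.

```javascript
// Hypothetical helper: reports whether the caller pressed keys or spoke.
function readInput(event) {
  if (event.Digits) {
    return { type: 'dtmf', value: event.Digits };
  }
  if (event.SpeechResult) {
    return { type: 'speech', value: event.SpeechResult.toLowerCase() };
  }
  return { type: 'none', value: '' };
}

console.log(readInput({ Digits: '2' }).type);               // prints "dtmf"
console.log(readInput({ SpeechResult: 'Two Pizzas' }).value); // prints "two pizzas"
```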

Whatever you build, let me know about it at @brentschooley.