Making Phone Calls Within Slack Using IBM’s Watson and Twilio: Phonebot Is Born

June 22, 2015

If you Gchat me when I’m on Slack, you’ll get this face. Slack users want to keep it all in Slack, even conference calls. James Thomas built Phonebot, a bot that places calls from Slack, transcribes them and posts the transcript in Slack for you.

Here’s how it works. When you send @phonebot a call command, it starts a Twilio call with your team. Twilio records the call audio in five-second batches, and each batch is posted to Watson’s REST API. Watson converts the speech to text, and Phonebot posts the result in the Slack channel. Now you can even keep track of conference calls in Slack.

Here’s the rundown of the hack, originally posted by James Thomas on his blog.

Building Phonebot: Integrating Slack, Twilio and IBM Watson

Slack publishes numerous APIs for integrating custom services. These APIs provide everything from sending simple messages as Slackbot to creating a real-time messaging service.

Phonebot listens for messages that start with @phonebot and contain a user command, e.g. call or hangup. It posts new channel messages with the transcribed speech, along with status messages. Users can issue the following commands to control Phonebot:

@phonebot call PHONE_NUMBER <-- Dials the phone number
@phonebot say TEXT <-- Sends text as speech to the call 
@phonebot hangup <-- Ends the active call
@phonebot verbose {on|off} <-- Toggles verbose mode
@phonebot duration NUMBER <-- Sets the recording duration (seconds)
@phonebot help <-- Shows usage information for all commands
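As an illustration of how these commands could be recognised, here is a minimal JavaScript sketch. The function name `parseCommand`, the returned `{ command, args }` shape and the fallback to `help` for unknown commands are assumptions for this example, not details of the Phonebot source.

```javascript
// Hypothetical helper, not taken from the Phonebot source: turns a message
// such as "@phonebot call +15551234567" into { command, args }.
const COMMANDS = ['call', 'say', 'hangup', 'verbose', 'duration', 'help']

function parseCommand (text) {
  // "@phonebot" (optionally followed by ":"), whitespace, the command word, then the rest
  const match = /^@phonebot:?\s+(\S+)\s*([\s\S]*)$/i.exec(text.trim())
  if (!match) return null // not addressed to Phonebot, or no command given

  const command = match[1].toLowerCase()
  // Unknown commands fall back to showing the usage information
  if (COMMANDS.indexOf(command) === -1) return { command: 'help', args: '' }

  return { command: command, args: match[2].trim() }
}
```

For example, `parseCommand('@phonebot verbose on')` returns `{ command: 'verbose', args: 'on' }`, while text that does not start with `@phonebot` yields `null`.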

We use the Incoming Webhooks API to post new channel messages and the Outgoing Webhooks API to notify the application about custom channel commands.

Listening for custom commands

Once a new Outgoing Webhook is created, messages in the registered channels that begin with the “@phonebot” prefix are posted to the HTTP URL of the IBM Bluemix application that handles them.

We can create Outgoing Webhooks for every channel we want to register Phonebot in.
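Slack delivers each matching message to that URL as a form-encoded POST. The sketch below shows one way a Node.js application might read it. It assumes the field names of Slack's legacy Outgoing Webhook payload (`token`, `channel_name`, `user_name`, `text`), and the helper names `parseOutgoingWebhook` and `createWebhookServer` are hypothetical. Checking the token guards against requests that did not come from Slack.

```javascript
const http = require('http')

// Hypothetical helper: parse a Slack Outgoing Webhook POST body
// (application/x-www-form-urlencoded) and verify its token.
function parseOutgoingWebhook (body, expectedToken) {
  const params = new URLSearchParams(body)

  // Ignore requests that don't carry the token Slack issued for this webhook
  if (params.get('token') !== expectedToken) return null

  return {
    channel: params.get('channel_name'),
    user: params.get('user_name'),
    text: params.get('text') || ''
  }
}

// Hypothetical server wrapper: collects the request body, then hands valid
// messages to onMessage. Call .listen(port) on the result to start it.
function createWebhookServer (expectedToken, onMessage) {
  return http.createServer(function (req, res) {
    let body = ''
    req.setEncoding('utf8')
    req.on('data', function (chunk) { body += chunk })
    req.on('end', function () {
      const message = parseOutgoingWebhook(body, expectedToken)
      if (message) onMessage(message)
      res.end() // empty 200 reply: nothing is posted back via the webhook response
    })
  })
}
```

Replies to the channel then go out through the Incoming Webhooks described next, rather than through the HTTP response.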

For each registered channel, we need to allow Phonebot to post new messages.


Sending new channel messages

Incoming Webhooks provide an obfuscated HTTP URL that accepts unauthenticated HTTP requests to create new channel messages. Creating a new Incoming Webhook for each channel we listen to allows Phonebot to post responses.

Each Incoming Webhook URL is passed to the Phonebot application as configuration through environment variables.
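Posting to an Incoming Webhook is a single HTTPS POST with a JSON body containing a `text` field. A minimal sketch using only Node's built-in `https` module follows; the helper names are hypothetical, and treating any status other than 200 as a failure is an assumption about how strict the caller wants to be.

```javascript
const https = require('https')

// Build the JSON body Slack Incoming Webhooks accept: {"text": "..."}
function buildWebhookPayload (text) {
  return JSON.stringify({ text: text })
}

// Hypothetical helper: post one message to a channel's Incoming Webhook URL.
function postToChannel (webhookUrl, text, callback) {
  const body = buildWebhookPayload(text)
  let called = false
  const finish = function (err) {
    // Make sure the callback runs only once, even if an error follows a response
    if (called) return
    called = true
    callback(err)
  }

  const req = https.request(webhookUrl, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Content-Length': Buffer.byteLength(body)
    }
  }, function (res) {
    res.resume() // drain the response; only the status code matters here
    finish(res.statusCode === 200 ? null : new Error('Slack returned ' + res.statusCode))
  })

  req.on('error', finish)
  req.end(body)
}
```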


Making Phone Calls

Twilio provides “telephony-as-a-service”, allowing applications to make telephone calls using a REST API.

Twilio has been made available on the IBM Bluemix platform. Binding this service to your application will provide the authentication credentials to use with the Twilio client library.

When users issue the “call” command with a phone number, the channel bot listening to user commands emits a custom event.

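// `self` is the Phonebot application (captured as `var self = this`) and
// `channel` is the name of the channel this bot instance listens to.
// Inside an EventEmitter listener, `this` refers to the bot, not the app.
bot.on('call', function (number) {
  var phone = self.channels[channel].phone

  // Only one active call per channel
  if (phone.call_active()) {
    bot.post('The line is busy, you have to hang up first...!')
    return
  }

  // Twilio will request call instructions from this channel-specific route
  phone.call(number, self.base_url + '/' + channel)
})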

Within the “phone” object, the “call” method triggers the following code.

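var that = this // keep a reference to the phone object for the callback

this.client.makeCall({
  to: number,      // number to dial
  from: this.from, // Twilio number the call is placed from
  url: route       // Twilio requests instructions for the call from here
}, function (err, responseData) {
  if (err) {
    that.request_fail('Failed To Start Call: ' + number + ' (' + route + ') ', err)
    return
  }

  that.request_success('New Call Started: ' + number + ' (' + route + '): ' + responseData.sid, responseData)
})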
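The `phone.call_active()` check in the first snippet implies that the phone object remembers whether a call is in progress. One hypothetical way to track that is to store the call SID returned in `responseData`. The `Phone` class, its `started`/`failed` events and the `ended()` method below are illustrative assumptions; only the `makeCall(params, callback)` shape comes from the snippet above.

```javascript
const EventEmitter = require('events')

// Hypothetical minimal phone wrapper: tracks the active call around
// client.makeCall(params, callback). Hanging up via Twilio is left out.
class Phone extends EventEmitter {
  constructor (client, from) {
    super()
    this.client = client
    this.from = from
    this.sid = null // SID of the active call, if any
  }

  call_active () {
    return this.sid !== null
  }

  call (number, route) {
    const self = this
    this.client.makeCall({ to: number, from: this.from, url: route }, function (err, responseData) {
      if (err) {
        self.emit('failed', err)
        return
      }
      self.sid = responseData.sid // remember the call so new "call" commands are refused
      self.emit('started', responseData.sid)
    })
  }

  // Called when the call ends, e.g. after a Twilio status update
  ended () {
    this.sid = null
  }
}
```

Because the client is passed in, a stub object with a `makeCall` method is enough to exercise the state handling without contacting Twilio.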

The url parameter gives Twilio an HTTP URL to which it POSTs updated call status information. The HTTP responses from this location tell Twilio how to handle the ongoing call, e.g. play an audio message, press digits or record the phone line audio.

If the phone call connects successfully, we need the phone line audio stream in order to convert the speech into text. Unfortunately, Twilio does not support direct access to the real-time audio stream. However, it can record a batch of audio, e.g. five seconds, and we can then download the resulting file.

Therefore, we will tell Twilio to record a short section of audio and post the results back to our application. When this message is received, our response will contain the request to record another five seconds. This approach will provide a semi-realtime stream of phone call audio for processing.
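The record, post back, record again cycle can be sketched as a small handler. This assumes that when a `<Record>` finishes, Twilio requests the call URL again and includes the recording's location in a `RecordingUrl` form field, which is absent on the first request of the call. The `RecordingLoop` class is a hypothetical name, and its `recording` event mirrors the one used later in this post. The reply is deliberately minimal TwiML, without the queued speech handled in the following snippet.

```javascript
const EventEmitter = require('events')

// Sketch of the record-and-repeat loop. Class and method names are
// hypothetical; RecordingUrl is assumed to be sent once a <Record> finishes.
class RecordingLoop extends EventEmitter {
  constructor (duration) {
    super()
    this.duration = duration // seconds of audio per chunk, e.g. 5
  }

  // params: the form fields Twilio POSTed to the call's URL
  handle (params) {
    // A finished chunk: hand its URL to whoever schedules transcription
    if (params.RecordingUrl) this.emit('recording', params.RecordingUrl)

    // Either way, ask for the next chunk so audio keeps arriving
    return '<Response><Record maxLength="' + this.duration + '"/></Response>'
  }
}
```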

Here is the code that constructs the TwiML response requesting the next audio recording. Any channel messages queued for sending as speech are added to the outgoing response.

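// `twilio` is the Twilio Node.js helper library: var twilio = require('twilio')
var twiml = new twilio.TwimlResponse()

// Do we have text to send down the active call?
if (this.outgoing.length) {
  var user_speech = this.outgoing.join(' ')
  this.outgoing = []     // clear the queue once the text is spoken
  twiml.say(user_speech) // read the channel messages aloud on the call
}

// Record the next chunk of audio; Twilio posts the result back to us
twiml.record({playBeep: false, trim: 'do-not-trim', maxLength: this.defaults.duration, timeout: 60})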
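For readers not using the Twilio helper library, the following dependency-free sketch produces roughly the same TwiML by hand. The function names are hypothetical; the `<Say>` and `<Record>` attributes mirror the `twiml.say()` and `twiml.record()` calls above, and the text is XML-escaped because channel messages can contain characters such as `&` or `<`. As in the snippet above, the caller is expected to clear its outgoing queue after building the response.

```javascript
// Escape the five XML special characters so user text can't break the markup
function escapeXml (text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;')
}

// Hypothetical helper: speak queued messages, then record the next chunk
function buildRecordTwiml (outgoing, duration) {
  let xml = '<?xml version="1.0" encoding="UTF-8"?><Response>'

  // Speak any queued channel messages before recording the next chunk
  if (outgoing.length) {
    xml += '<Say>' + escapeXml(outgoing.join(' ')) + '</Say>'
  }

  // Same attributes as the twiml.record(...) call above
  xml += '<Record playBeep="false" trim="do-not-trim" maxLength="' + duration + '" timeout="60"/>'
  return xml + '</Response>'
}
```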


When we have the audio files containing the phone call audio, we can schedule these for translation with the IBM Watson Speech To Text service.

Translating Speech To Text

Using the IBM Watson Speech To Text service, we can transcribe phone calls simply by posting the audio file to the REST API. The client library wraps the actual API requests behind a simple JavaScript interface.

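var fs = require('fs')

var self = this // inside the callback, `this` is not the translation task
var params = {
  audio: fs.createReadStream(file_name),
  content_type: 'audio/l16; rate=16000' // raw 16-bit PCM sampled at 16 kHz
}

this.speech_to_text.recognize(params, function (err, res) {
  if (err) {
    self.error(err)
    return
  }

  // Use the top alternative of the result identified by result_index
  var result = res.results[res.result_index]
  if (result) {
    self.transcript = result.alternatives[0].transcript
    self.emit('available')
  } else {
    self.error('Missing speech recognition result.')
  }
})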
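The snippet above reads only `res.results[res.result_index]`. If the service splits a clip into several results, for example at a pause, the later ones are dropped. A variant that joins the top alternative of every result is sketched below. The response shape (`results`, `alternatives`, `transcript`) is the one the snippet already relies on, while the joining behaviour and the `extractTranscript` name are assumptions.

```javascript
// Hypothetical variant: join the top alternative of every result instead of
// reading only results[result_index]. Returns null when nothing was recognised.
function extractTranscript (res) {
  if (!res || !Array.isArray(res.results)) return null

  const parts = res.results
    .filter(function (result) { return result.alternatives && result.alternatives.length })
    .map(function (result) { return result.alternatives[0].transcript.trim() })
    .filter(function (text) { return text.length > 0 })

  return parts.length ? parts.join(' ') : null
}
```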


Because we had previously written code to convert audio files from the format Twilio creates to the format the Watson API needs, we were able to reuse the translate.js class between projects.

This module relies on the SoX library being installed in the native runtime, so we used a custom buildpack to provide it.
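As an illustration of the conversion step, the sketch below shells out to the `sox` command to produce headerless 16 kHz, 16-bit signed, mono PCM, matching the `audio/l16; rate=16000` content type used above. The helper names are hypothetical and this is not the translate.js implementation. Byte order is left at SoX's default, so check it against what the Speech To Text service expects.

```javascript
const { execFile } = require('child_process')

// Hypothetical helper: SoX arguments for raw 16 kHz, 16-bit signed, mono PCM.
// Byte order is left at SoX's default (an assumption to verify).
function soxArgs (inputFile, outputFile) {
  return [
    inputFile,
    '-t', 'raw',            // headerless output
    '-r', '16000',          // resample to 16 kHz
    '-b', '16',             // 16 bits per sample
    '-e', 'signed-integer', // linear PCM
    '-c', '1',              // mono
    outputFile
  ]
}

// Requires the sox binary on the PATH (hence the custom buildpack)
function convertForWatson (inputFile, outputFile, callback) {
  execFile('sox', soxArgs(inputFile, outputFile), function (err) {
    callback(err, outputFile)
  })
}
```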

Managing Translation Tasks

When a new Twilio message with audio recording details is received, we schedule a translation request. When this background task completes, the results are posted into the corresponding Slack channel.

If a translation request takes longer than expected, additional requests may be scheduled before the first has finished. We still want to post new channel messages in order, even if later requests finish translating first.

Using the async library, a single-worker queue is created to schedule the translation tasks.

Each time the phone object for a channel emits a ‘recording’ event, we start the translation request and push the task onto the channel’s queue.

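// `self` is the Phonebot application (captured as `var self = this`)
phone.on('recording', function (location) {
  if (phone.defaults.verbose) {
    self.channels[channel].bot.post(':speech_balloon: _waiting for translation_')
  }
  // Start translating immediately; the queue only controls posting order
  var req = translate(self.watson, location)
  req.start()
  self.channels[channel].queue.push(req)
})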

When a task reaches the front of the queue, the worker function is called to process the result.

If the translation task has already finished, we signal to the queue that this task is complete. Otherwise, we wait for the task to emit a completion event.

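var async = require('async')

// `self` is the Phonebot application (captured as `var self = this`)
var queue = async.queue(function (task, callback) {
  var done = function (message) {
    if (message) self.channels[channel].bot.post(':speech_balloon: ' + message)
    callback() // let the queue move on to the next task
    return true
  }

  var finished = function () {
    return done(task.transcript)
  }

  var failed = function () {
    return done(self.channels[channel].phone.defaults.verbose ? '_unable to recognise speech_' : '')
  }

  // The task may already have finished before reaching the front of the queue
  if (task.transcript && finished()) return
  if (task.failed && failed()) return

  // Otherwise wait for the task to report its result
  task.on('available', finished)
  task.on('failed', failed)
}, 1) // a single worker: results are posted strictly in order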
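The single-worker queue is what keeps transcripts in order even though every translation starts immediately. To make that guarantee explicit, here is a dependency-free sketch of the same idea without the async library. The `TranslationTask` and `OrderedResults` classes are hypothetical stand-ins, with `transcript`, `failed` and the 'available'/'failed' events modelled on the task object used above.

```javascript
const EventEmitter = require('events')

// Hypothetical stand-in for a translation task
class TranslationTask extends EventEmitter {
  constructor () {
    super()
    this.transcript = null
    this.failed = false
  }

  succeed (text) {
    this.transcript = text
    this.emit('available')
  }

  fail () {
    this.failed = true
    this.emit('failed')
  }
}

// Delivers results in push order, however the tasks finish
class OrderedResults {
  constructor (deliver) {
    this.deliver = deliver // called with each transcript, or null on failure
    this.pending = []
  }

  push (task) {
    const self = this
    this.pending.push(task)
    task.on('available', function () { self.flush() })
    task.on('failed', function () { self.flush() })
    this.flush() // the task may already be settled
  }

  // Deliver from the front of the list until an unfinished task is reached
  flush () {
    while (this.pending.length) {
      const head = this.pending[0]
      if (head.transcript === null && !head.failed) return
      this.pending.shift()
      this.deliver(head.failed ? null : head.transcript)
    }
  }
}
```

Delivering `null` for a failed task leaves it to the caller to decide whether to post anything, much like the verbose-mode check in the worker above.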

Deploying Phonebot

Now that we’ve finished the code, we can configure the application to deploy on the IBM Bluemix cloud platform.

Configuring Webhooks

Phonebot must be given the configured Incoming Webhook URLs so that it can send channel messages. Following the standard Platform-as-a-Service convention for passing configuration, we store the channel webhooks as environment variables.

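Using the CF CLI, we run the following command to create a user-provided service holding the webhook URLs. Once the service is bound to the application, Cloud Foundry exposes these values through the VCAP_SERVICES environment variable.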

$ cf cups slack_webhooks -p '{"channel_name":"incoming_webhook_url",...}'
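When the service is bound to the application, Cloud Foundry exposes its credentials inside the `VCAP_SERVICES` environment variable, with user-provided services listed under the `user-provided` key. A minimal sketch of reading the channel-to-URL map created above might look like this; the `loadWebhooks` name is hypothetical.

```javascript
// Hypothetical helper: find a user-provided service by name in VCAP_SERVICES
// and return its credentials (here, the channel name -> webhook URL map).
function loadWebhooks (vcapServicesJson, serviceName) {
  const services = JSON.parse(vcapServicesJson || '{}')
  const userProvided = services['user-provided'] || []
  const match = userProvided.find(function (service) { return service.name === serviceName })
  return match ? match.credentials : {}
}

// An empty object means the service is not bound (e.g. when running locally)
const webhooks = loadWebhooks(process.env.VCAP_SERVICES, 'slack_webhooks')
```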


Application Manifest

Application manifests configure deployment parameters for Cloud Foundry applications. Phonebot needs to be bound to the Twilio, IBM Watson and user-provided services, and the manifest also configures the runtime environment.

---
applications:
- name: phonebot
  memory: 256M
  command: node app.js
  buildpack: https://github.com/jthomas/nodejs-buildpack.git
  services:
  - twilio
  - speech_to_text
  - slack_webhooks
declared-services:
  twilio:
    label: Twilio
    plan: 'user-provided'
  slack_webhooks:
    label: slack_webhooks
    plan: 'user-provided'
  speech_to_text:
    label: speech_to_text
    plan: free


With this manifest in place, we can simply run the cf push command to deploy our application!

Using Phonebot

Phonebot will post the following message to each channel successfully registered on startup.

[Screenshot: the message Phonebot posts when it joins a channel]

Users can issue @phonebot COMMAND messages to control phone calls directly from the Slack channel.

For further information, follow the project on GitHub. Upcoming features are listed on the issues page. Please feel free to request new features, report bugs and leave feedback on GitHub.