Build a Wake Word Detection Assistant in PHP using Ratchet for WebSockets and Twilio SMS

October 24, 2019
Written by Marcus Battle


Due to their data transmission speed and low latency, WebSockets open up exciting possibilities for real-time apps. Many of your favorite apps such as Google Docs, Instagram, and Facebook are already using them to keep us in sync with each other using live data.

We’re going to dive deeper into the world of PHP WebSocket development by building a wake word and hot word detection assistant that communicates with us via SMS, similar to an Alexa.

To begin, we’ll start from the code in the introductory tutorial, “How to Create a WebSocket Server in PHP to Build a Real-Time Application”. We’ll modify the HTML to create a <textarea> that streams its value to our WebSocket server. Our Socket class will listen for a list of predefined keywords and, upon a match, send an SMS to the user.

If you haven’t already done so, download the repo here.

Tools Needed to Complete This Tutorial

To complete this tutorial we will need PHP with Composer, a Twilio account with an SMS-capable phone number, and ngrok.

Install the Twilio PHP SDK in the project

Twilio provides a near-seamless experience to incorporate voice, SMS, and video into your application. For this project, we’ll use the Twilio PHP SDK to connect to the Twilio SMS API and send SMS messages to our users later in the tutorial.

To install the SDK, run the following command:

$ composer require twilio/sdk

Next, we’ll need a way to securely store our Twilio account credentials. We’ll build on the work of the phpdotenv project by Vance Lucas. The dotenv package will create a mechanism to load environment variables from a .env file to getenv(), $_ENV and $_SERVER. The code in this tutorial uses the Dotenv::create() method from version 3 of the package, which later major versions replaced, so we’ll pin that version. Let’s install it by running the following command:

$ composer require "vlucas/phpdotenv:^3.6" && touch .env

Our Twilio installation is now almost complete. The last thing we’ll need to do is update the .env file we just created with our Twilio credentials.

Add the following variables as seen below.

TWILIO_ACCOUNT_SID=""
TWILIO_AUTH_TOKEN=""
TWILIO_PHONE_NUMBER="Phone number in E.164 format e.g. +13365555555"

Navigate to your Twilio account dashboard and copy your “Account SID” and “Auth Token” into the respective variable above. Then, navigate to the Twilio Phone Numbers section of the console and copy your incoming number into the TWILIO_PHONE_NUMBER variable. Be sure to paste it in E.164 format.
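Since both the Twilio number and the client's number need to be in E.164 format, a quick shape check can catch typos early. The following is a minimal sketch; is_e164() is a hypothetical helper, not part of the Twilio SDK, and it only checks the shape of the string (a plus sign followed by up to 15 digits, the first of which is not zero), not whether the number actually exists.

```php
<?php
// Hypothetical helper (not part of the Twilio SDK): checks the shape only.
function is_e164(string $number): bool
{
    // "+", then 2 to 15 digits, the first of which must not be zero
    return preg_match('/^\+[1-9]\d{1,14}\z/', $number) === 1;
}

var_dump(is_e164('+13365555555')); // bool(true)
var_dump(is_e164('336-555-5555')); // bool(false)
```

A check like this could run before the number is stored in .env or passed on to the SMS API.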

Twilio account credentials

Finally, open app.php and add the two Dotenv lines shown below, directly after the autoloader is required:

<?php

use Ratchet\Server\IoServer;
use Ratchet\Http\HttpServer;
use Ratchet\WebSocket\WsServer;
use MyApp\Socket;

require dirname( __FILE__ ) . '/vendor/autoload.php';

$dotenv = Dotenv\Dotenv::create(__DIR__);
$dotenv->load();

$server = IoServer::factory(
    new HttpServer(
        new WsServer(
            new Socket()
        )
    ),
    8080
);

$server->run();

This will load the .env file and expose the values of the variables we defined to getenv().
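Because the server only reads these values when it tries to send an SMS, a missing credential would otherwise surface late. Here is a minimal sketch of an early check that could run right after the .env file is loaded; missing_env_vars() is a hypothetical helper introduced for illustration, and the variable names are the ones we defined in .env above.

```php
<?php
// Hypothetical helper: returns the names of variables that are unset or empty.
function missing_env_vars(array $names): array
{
    $missing = [];
    foreach ($names as $name) {
        $value = getenv($name);
        if ($value === false || trim($value) === '') {
            $missing[] = $name;
        }
    }
    return $missing;
}

$required = ['TWILIO_ACCOUNT_SID', 'TWILIO_AUTH_TOKEN', 'TWILIO_PHONE_NUMBER'];
$missing  = missing_env_vars($required);

if ($missing !== []) {
    // Report the problem on the terminal where the server was started
    fwrite(STDERR, 'Missing environment variables: ' . implode(', ', $missing) . "\n");
}
```

Whether to stop the server at this point or just log a warning is a design choice; the sketch only reports.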

Add Wake Word Detection to the Socket Class

There are many philosophies and engines for wake-word or hot-word detection. We’ll set those aside today and focus on using WebSockets for real-time data processing.

Open app/socket.php and replace the contents of the file with the following code:

<?php
namespace MyApp;

use Ratchet\MessageComponentInterface;
use Ratchet\ConnectionInterface;

use Twilio\Rest\Client;

class Socket implements MessageComponentInterface {

    // Declared explicitly; dynamic properties are deprecated as of PHP 8.2
    protected $clients;
    protected $wake_word;
    protected $keywords;
    protected $said;

    public function __construct()
    {
        $this->clients   = new \SplObjectStorage;
        $this->wake_word = 'Hey Bot';
        $this->keywords = [
            'eat'   => 'I\'d be glad to get you something to eat!',
            'drink' => 'Are you sure you\'re thirsty?!'
        ];
        $this->said = []; // Store what keywords have been used
    }

    public function onOpen(ConnectionInterface $conn) {

        // Store the new connection in $this->clients
        $this->clients->attach($conn);

        echo "New connection! ({$conn->resourceId})\n";
    }

    public function onMessage(ConnectionInterface $from, $msg) {

        // Perform a case-insensitive search for the wake word inside of the message
        if ( false === stripos( $msg, $this->wake_word ) ) {
            
            // Reset the words said
            $this->said = [];

            return;
        }

        // Capture the phone number of the client
        $to = strstr( $msg, ':', true );
        
        // Remove any punctuation from the message and lowercase it,
        // so that "Eat" and "eat" match the same keyword
        $msg = preg_replace('/[^a-z0-9]+/i', ' ', strstr( $msg, ':' ) );
        $msg = strtolower( trim( $msg ) );

        // Parse the message received from the client
        $message_parts = explode( ' ', $msg );

        // Detect if a keyword has been shared
        foreach ( $message_parts as $word ) {
            
            if ( isset( $this->keywords[ $word ] ) && ! in_array( $word, $this->said ) ) {
                $this->send_sms( $to, $this->keywords[ $word ] );
                $this->said[] = $word;

                // Send a response back to the client
                $from->send( $this->keywords[ $word ] );
            }
        }

        // Send a response back to the client
        if ( 'hey bot' === strtolower( $msg ) ) {
            $from->send( 'Hi. Bot is listening...' );
        }
    }

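    public function onClose(ConnectionInterface $conn) {

        // Stop tracking the connection once it has closed
        $this->clients->detach($conn);
    }

    public function onError(ConnectionInterface $conn, \Exception $e) {

        echo "An error occurred: {$e->getMessage()}\n";
        $conn->close();
    }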

    public function send_sms( $to, $message ) {

        if ( empty( $to ) ) {
            return;
        }

        $account_sid = getenv('TWILIO_ACCOUNT_SID');
        $auth_token = getenv('TWILIO_AUTH_TOKEN');

        // A Twilio number you own with SMS capabilities
        $twilio_number = getenv('TWILIO_PHONE_NUMBER');

        $client = new Client( $account_sid, $auth_token );
        $client->messages->create(
            // Where to send a text message (your cell phone?)
            $to,
            [
                'from' => $twilio_number,
                'body' => $message
            ]
        );
    }
}

You’ll notice that the onMessage() method now processes our input in real time. Once it detects the wake word, it checks the rest of the message for keywords and, on a match, sends an SMS to the phone number supplied by the client.
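To see the matching logic on its own, without Ratchet or Twilio in the way, here is a minimal standalone sketch. detect_keywords() is a hypothetical function written for illustration; it mirrors the steps in onMessage() (wake word check, punctuation stripping, keyword lookup, and the "already said" list) and additionally lowercases the text before the lookup, which is an assumption of this sketch.

```php
<?php
// Hypothetical standalone version of the matching steps in onMessage().
// $said is passed by reference so repeated keystrokes don't re-trigger a keyword.
function detect_keywords(string $raw, string $wakeWord, array $keywords, array &$said): array
{
    // No wake word: forget what was said and report nothing
    if (stripos($raw, $wakeWord) === false) {
        $said = [];
        return [];
    }

    // Everything from the first ":" onward is the typed text
    $text = (string) strstr($raw, ':');
    $text = trim(preg_replace('/[^a-z0-9]+/i', ' ', $text));

    $replies = [];
    foreach (explode(' ', strtolower($text)) as $word) {
        if (isset($keywords[$word]) && !in_array($word, $said, true)) {
            $said[]    = $word;
            $replies[] = $keywords[$word];
        }
    }
    return $replies;
}

$keywords = ['eat' => 'I\'d be glad to get you something to eat!'];
$said     = [];

// The first call matches "eat"; the second sees it again and stays quiet
print_r(detect_keywords('+13365555555:Hey Bot, can I eat?', 'Hey Bot', $keywords, $said));
print_r(detect_keywords('+13365555555:Hey Bot, can I eat??', 'Hey Bot', $keywords, $said));
```

Because the client sends the whole message on every keystroke, the "already said" list is what keeps a single keyword from triggering an SMS on each subsequent key press.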

Now that our wake-word detection is in place, we can move forward to updating our client-side application to test things out.

Create the Client-side Application

Open the index.html file and replace its contents with the following code:

<html>
    <head>
        <style>
            input, button, textarea { padding: 10px; }
        </style>
    </head>
    <body>
        <p><input type="text" id="number" placeholder="Phone Number in E.164 format" autocomplete="off" /></p>
        <textarea id="message" onkeyup="transmitMessage()" cols="50" rows="4" placeholder="Type a message to the bot"></textarea>
        <p>Type these sample messages:</p>
        <ol>
            <li>"Hey Bot"</li>
            <li>"Hey Bot, can you bring me something to eat?"</li>
            <li>"Hey Bot, do you have anything to drink?"</li>
        </ol>
        <p id="response"></p>
        <script>
            // Create a new WebSocket
            var socket  = new WebSocket('ws://localhost:8080');

            // Define the variable to transmit to the WebSocket
            var message  = document.getElementById('message');
            var number   = document.getElementById('number');
            var response = document.getElementById('response');

            function transmitMessage() {
                socket.send( number.value + ':' + message.value );
            }

            socket.onmessage = function(e) {
                response.innerHTML = e.data;
            }
        </script>
    </body>
</html>

This form accepts the phone number of the client along with a message to send to our bot.

Every keystroke is transferred to the WebSocket for real-time processing. Whenever the wake word “Hey Bot” is transmitted, the interface will respond accordingly, followed by a response for each subsequent command.


Test the Wake Word Assistant

In your terminal, start the WebSocket server by running:

$ php app.php

In a separate terminal, run ngrok to expose the local HTTP server that serves index.html to the internet:

$ ngrok http 80

ngrok screen

Copy the HTTPS Forwarding address and paste it into your browser.

Follow the prompts in the interface and look for an SMS to be sent when you trigger the “eat” and “drink” keywords.

iMessage screen

Conclusion

Because we capture the phone number of each client, this tutorial could be extended to send SMS messages to each connected client in real-time.
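As a starting point for that extension, here is a minimal sketch of how phone numbers could be remembered per connection using the same SplObjectStorage type that holds our clients. The stdClass objects are stand-ins for Ratchet connections, and known_numbers() is a hypothetical helper; neither appears in the tutorial code.

```php
<?php
// Stand-ins for Ratchet connections; in the real server these would be
// ConnectionInterface objects (an assumption of this sketch).
$alice = new stdClass();
$bob   = new stdClass();

$clients = new SplObjectStorage();
$clients[$alice] = null; // connected, number not known yet
$clients[$bob]   = null;

// Remember a number once a client has sent one
$clients[$alice] = '+13365555555';

// Hypothetical helper: collect every distinct number we know about
function known_numbers(SplObjectStorage $clients): array
{
    $numbers = [];
    foreach ($clients as $conn) {
        $number = $clients[$conn]; // data stored alongside the connection
        if (is_string($number) && $number !== '') {
            $numbers[] = $number;
        }
    }
    return array_values(array_unique($numbers));
}

print_r(known_numbers($clients));
```

Each number returned could then be passed to send_sms() in a loop to notify every client that has shared one.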

Marcus Battle is the PHP Developer of Technical Content at Twilio. He is committed to prompting and rallying PHP developers across the world to build the future of communications. He can be reached on Twitter at @themarcusbattle and via email.