Convert Incoming Twilio Messages to Audio with Transformers.js

Time to read: 24 minutes

March 19, 2024

Written by

Carlos Mucuho

Contributor

Reviewed by

Dhruv Patel

Twilion

Convert Incoming Twilio Messages to Audio with Transformers.js TTS

In this tutorial, you will build a web app that not only retrieves and displays the latest messages sent to your Twilio phone number but also converts them to audio using Text-to-Speech (TTS).

To build this application, you will use the Twilio Programmable Messaging REST API to retrieve the most recent messages sent to your Twilio number. Then, you will display these messages in the web app. You will use Transformers.js TTS functionality to convert the messages to audio, the wavefile library to process the audio, and the AudioMotion-analyzer library to create and display the audio waveforms. You will use the IDB-Keyval library for audio storage and retrieval. You will also use Twilio Sync to share incoming messages detected on the server with the client app.

The Twilio Programmable Messaging REST API is a service that allows developers to send, receive, and retrieve SMS and MMS messages programmatically using Twilio's infrastructure.

Twilio Sync is a service that enables real-time data synchronization between connected devices and applications, allowing seamless collaboration and communication.

Transformers.js is a library that mirrors the functionality of Hugging Face’s transformers Python library, enabling you to run the same pre-trained models in the browser with a very similar API.

The wavefile library is a tool for working with WAV audio files in JavaScript or Node.js. It provides functions to create, manipulate, and read WAV files, allowing developers to handle audio data efficiently.

The AudioMotion-analyzer library is designed for analyzing audio in web applications. It facilitates tasks such as generating visualizations of audio waveforms and processing audio data using JavaScript.

The IDB-Keyval library simplifies the usage of the IndexedDB API for key-value storage in web applications. It provides a simple and convenient interface for storing and retrieving data in the IndexedDB database.

By the end of this tutorial, you will have a web app that looks similar to the following:

Tutorial Requirements

To follow this tutorial you will need the following:

A Twilio account - Sign up here
An ngrok account and the ngrok CLI
A basic understanding of JavaScript
A basic understanding of how to use Twilio SMS and Sync APIs to build a web app
Node.js v18+ installation
Git installation

Getting the boilerplate code

In this section, you will clone a repository containing the boilerplate code needed to build the application.

Open a terminal window and navigate to a suitable location for your project. Run the following commands to clone the repository containing the boilerplate code and navigate to the boilerplate directory:

git clone https://github.com/CSFM93/twilio-convert-messages-to-audio-starter.git
cd twilio-convert-messages-to-audio-starter

This boilerplate code includes an Express.js project that serves the client application.

This Node.js application comes with the following packages:

express: is a lightweight Node.js framework used to build web applications and APIs.
body-parser: is a middleware for Express.js that extracts the entire body portion of an incoming request stream and exposes it on req.body.
node-dev: is a development tool for Node.js that automatically restarts the server upon detecting changes in the source code.
dotenv: is a Node.js package that facilitates loading environment variables from a .env file into process.env.
twilio: is a package that allows you to interact with the Twilio API.
twilio-sync: is a package specifically designed to interact with the Twilio Sync API.
uuid: is a package for generating universally unique identifiers (UUIDs).

Use the following command to install the packages mentioned above:

npm install

Understand the public directory

The public directory contains a file named index.html and the following subdirectories: css, js, and audio.

The index.html file sets up the structure of the client application’s web page. It incorporates Bootstrap for styling and functionality, and the Twilio Sync JavaScript SDK for synchronizing data across multiple clients. The file defines elements for displaying messages and notifications, as well as an audio visualizer modal that shows the waveforms of the audio generated from TTS-converted messages.

The js directory contains an empty JS file named index.js, which is included in the index.html file. This file will contain the code responsible for managing the client application UI.

The css directory contains a file named style.css which will be used to stylize the web app.

The audio directory contains a bell notification audio file downloaded from mixkit.co. This file will serve as the notification sound triggered when the client application detects an incoming message for your Twilio number.

Go back to your terminal and run the following command to start the server application:

npm start

Open your preferred browser, navigate to the URL http://localhost:3000/, and you should see a page similar to this:

Web app starter page that comes with the boilerplate code

Make sure you terminate your server application before moving to the next section.

Collect and store your credentials

In this section, you are going to retrieve the Twilio credentials that will allow you to interact with the Twilio API. Additionally, you will buy a new Twilio Phone number with SMS capabilities if you don't have one.

The .env file included in the cloned repository is where you will store all credentials.

Twilio credentials

Open a new browser tab and log in to your Twilio Console. Once you are in the Console, copy the Account SID and Auth Token and store them in the .env file found in the root directory of the boilerplate code:

TWILIO_ACCOUNT_SID=<your twilio account SID>
TWILIO_AUTH_TOKEN=<your twilio account auth token>

The third and fourth required credentials are an API Key SID and API Key Secret, which can be obtained by following this guide. After obtaining both credentials, copy and store them in the .env file:

TWILIO_API_KEY_SID=<your twilio API key>
TWILIO_API_KEY_SECRET=<your twilio Secret key>

After generating an API key you will only be able to see the API Key Secret one time. Be sure to copy and paste it somewhere safe, or directly into your .env file right away.

The fifth credential required is a Sync Service SID. A Sync Service is the top-level scope of all other Sync resources (Maps, Lists, Documents).

Navigate to the Twilio Sync Service page, create a new Sync Service, and copy the Sync Service SID. This credential should also be stored in the .env file once obtained:

TWILIO_SYNC_SERVICE_SID=<Your Sync Service SID>

Navigate to the Buy a Number page in your Twilio Console, and purchase a number with SMS capabilities if you don’t own one. Copy your Twilio phone number and store it in the .env file as the value for TWILIO_NUMBER:

TWILIO_NUMBER=<your twilio phone number>

Most jurisdictions, including the United States, require you to fill out certain information before you can use a phone number. You won’t be able to complete this tutorial until your number is registered. Refer to our Getting Started with Phone Number Regulatory Compliance guide.
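
The server reads all of these values from process.env, so a missing entry would otherwise only surface later as a failed API call. The following is a minimal sketch of a startup check. The helper name findMissingEnvVars() is hypothetical and not part of the boilerplate code:

```javascript
// Names of the variables this tutorial stores in the .env file.
const REQUIRED_ENV_VARS = [
  'TWILIO_ACCOUNT_SID',
  'TWILIO_AUTH_TOKEN',
  'TWILIO_API_KEY_SID',
  'TWILIO_API_KEY_SECRET',
  'TWILIO_SYNC_SERVICE_SID',
  'TWILIO_NUMBER',
];

// Hypothetical helper: returns the names that are unset or blank in `env`.
function findMissingEnvVars(env) {
  return REQUIRED_ENV_VARS.filter((name) => {
    const value = env[name];
    return value === undefined || String(value).trim() === '';
  });
}

// Example with a partially filled environment object (placeholder values).
const missing = findMissingEnvVars({
  TWILIO_ACCOUNT_SID: 'ACxxxxxxxx',
  TWILIO_AUTH_TOKEN: 'placeholder-token',
  TWILIO_NUMBER: ' ',
});
console.log(missing);
```

In server.js, you could call findMissingEnvVars(process.env) after dotenv has loaded the file and stop the server early if the returned array is not empty.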

Retrieving and displaying messages received

In this section, you will write the code that will allow your application to retrieve and display the most recent messages sent to your Twilio phone number.

Open your server.js file located in your project root directory, and add the following code below the line where you declare the port constant on line 6:

const twilio = require('twilio');
const twilioClient = twilio(process.env.TWILIO_ACCOUNT_SID, process.env.TWILIO_AUTH_TOKEN);

The code first imports the twilio package for interacting with the Twilio API.

Next, it creates a Twilio client object by invoking the twilio() function and passing in the Twilio account SID and authentication token obtained from the environment variables. This client instance will be used to interact with Twilio's messaging services.

Add the following code below the line where you set the middleware to serve the static files (app.use(express.static('public'));):

async function getMessages() {
  let messages = await twilioClient.messages
    .list({
      to: process.env.TWILIO_NUMBER,
      limit: 100,
    });

  if (messages === undefined) {
    messages = [];
  }

  return messages;
}

The code defines an asynchronous function named getMessages() which is responsible for retrieving a list of messages received using the Twilio client.

The function calls the list() method on the twilioClient.messages object, specifying your Twilio phone number as the recipient and limiting the result to the 100 most recent messages, and stores the result in a variable named messages. It then checks whether messages is undefined and, if so, replaces it with an empty array. Lastly, the function returns the array of messages.
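
The Twilio Node.js helper library rejects the promise returned by list() when a request fails, so an undefined check on its own will not catch API errors. The following is a minimal sketch of a variant that falls back to an empty array in that case. The name getMessagesSafely() is hypothetical, and the client is passed in as a parameter so the function can be tried with a stub:

```javascript
// Hypothetical variant of getMessages(): `client` is any object exposing
// messages.list(), such as the twilioClient created earlier.
async function getMessagesSafely(client, toNumber) {
  try {
    const messages = await client.messages.list({ to: toNumber, limit: 100 });
    return messages ?? [];
  } catch (error) {
    console.error('Failed to retrieve messages:', error.message);
    return [];
  }
}

// Stub client that simulates a failed request, for illustration only.
const failingClient = {
  messages: {
    list: async () => {
      throw new Error('simulated network failure');
    },
  },
};

getMessagesSafely(failingClient, '+15550100000').then((messages) => {
  console.log(messages.length); // 0
});
```

With this shape, the /getMessages route would still respond with an empty list when Twilio is unreachable instead of leaving the request unanswered.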

Add the following code below the getMessages() function:

app.get('/getMessages', async (req, res) => {
  const messages = await getMessages();
  for (let i = 0; i < messages.length; i++) {
    messages[i].read = true;
  }

  res.send({
    messages,
  });
});

Here, the code creates a route that handles incoming HTTP GET requests at the path /getMessages. This route is designed to fetch messages sent to your Twilio number, mark them as read, and send the updated list as a response.

The route starts by calling the asynchronous function getMessages(), which retrieves a list of messages. Next, it uses a for loop to iterate over the messages, marking each one as read by setting its read property to true. Finally, the route responds to the client with an object containing the updated list of messages.

Navigate to the public directory, then to the js subdirectory, open the empty file named index.js, and add the following code to it:

const ulInboxElement = document.getElementById('ul-inbox');
let inboxMessages;

The code initializes ulInboxElement with a reference to the HTML element whose ID is ul-inbox, an unordered list used to display the messages sent to your Twilio number.

Following that, it declares inboxMessages without initialization; this variable will be used to store the list of displayed messages.

Add the following code below the inboxMessages variable:

async function getMessages() {
  const response = await fetch('/getMessages', {
    method: 'GET',
    headers: {
      Accept: 'application/json',
      'Content-Type': 'application/json',
    },
  });
  const { messages } = await response.json();
  return messages;
}

The code defines an asynchronous function named getMessages(). This function is responsible for fetching messages from the server by making an asynchronous HTTP GET request to the /getMessages endpoint.

Within the function, the fetch API is utilized to send a GET request to the specified endpoint. After obtaining the response, the code extracts the JSON content using response.json(). The extracted JSON object is destructured to retrieve the messages property, containing the list of messages. Finally, the function returns the obtained messages array.
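
Note that fetch() resolves even when the server answers with an error status, in which case response.json() may fail or return an unexpected shape. The following is a sketch of a variant that checks response.ok first. The name fetchMessages() and the fetchImpl parameter are hypothetical; the parameter exists only so the function can be exercised with a stub outside the browser:

```javascript
// Hypothetical variant of the client-side getMessages() function.
// `fetchImpl` defaults to the global fetch and can be replaced by a stub.
async function fetchMessages(fetchImpl = fetch) {
  const response = await fetchImpl('/getMessages', {
    method: 'GET',
    headers: { Accept: 'application/json' },
  });

  // Surface HTTP errors instead of trying to parse an error page as JSON.
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }

  // Fall back to an empty list if the payload has no messages property.
  const { messages = [] } = await response.json();
  return messages;
}
```

The caller can then wrap fetchMessages() in a try...catch block and decide whether to show an empty inbox or an error notice.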

Add the following code below the getMessages() function:

function truncate(str, max) {
  return str.length > max ? str.slice(0, max - 1) + '…' : str;
}

function formatDate(date) {
  const splitDate = date.split('T');
  const day = splitDate[0].replaceAll('-', '/');
  const splitTime = splitDate[1].split(':');
  const time = splitTime[0] + ':' + splitTime[1];
  const formatedDate = day + '  ' + time;
  return formatedDate;
}

The code defines two functions named truncate() and formatDate() which are responsible for truncating and formatting strings to display messages in a concise and readable manner.

The truncate() function takes str (a string) and max (the maximum length) as parameters. If the input string exceeds the maximum length, it shortens the string and appends an ellipsis (…).

The formatDate() function expects a date parameter, in the following string format 2024-01-05T21:34:18.000Z. It splits the date into its date and time components using 'T' as the separator. The date is formatted by replacing hyphens with slashes, creating a day format. The time is extracted and formatted as hours and minutes. Finally, the function returns the formatted date and time.
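
To make the behavior concrete, the sketch below repeats both functions (using slice(), which behaves the same here) and runs them on sample inputs chosen for illustration:

```javascript
function truncate(str, max) {
  return str.length > max ? str.slice(0, max - 1) + '…' : str;
}

function formatDate(date) {
  const splitDate = date.split('T');
  const day = splitDate[0].replaceAll('-', '/');
  const splitTime = splitDate[1].split(':');
  const time = splitTime[0] + ':' + splitTime[1];
  return day + '  ' + time;
}

// A string shorter than the limit is returned unchanged.
console.log(truncate('Hello', 30)); // Hello

// A longer string keeps max - 1 characters plus the ellipsis.
console.log(truncate('abcdefghij', 5)); // abcd…

// Date and time are separated by two spaces; seconds are dropped.
console.log(formatDate('2024-01-05T21:34:18.000Z')); // 2024/01/05  21:34
```

Because formatDate() works on the ISO string directly, the time shown is in UTC rather than in the viewer's local time zone.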

Add the following code below the formatDate() function:

async function populateMessages() {
  for (const [index, message] of inboxMessages.entries()) {
    const { from, body, dateCreated, read } = message;
    const showNewMessageBadge = read ? '' : '<span class="badge rounded-pill text-bg-danger">New</span>';
    const newMessageElement = document.createElement('li');
    newMessageElement.setAttribute('class', 'list-group-item d-flex justify-content-between');
    newMessageElement.innerHTML = `<div class="ms-2 me-auto">
    <div class="fw-bold from-div">${from} ${showNewMessageBadge}</div>
    <div class="row">
        ${truncate(body, 30)}
    </div>
</div>
<div>
    <span class="ps-2">${formatDate(dateCreated)}</span>
    <div class="d-flex justify-content-center">
      <button class="btn btn-sm btn-outline-danger rounded-circle btnPlayMessage py-1" id="btnPlayMessage-${index}">
          <i class="bi bi-play-fill"></i>
      </button>
</div>
</div>`;
    ulInboxElement.appendChild(newMessageElement);
  }
}

Here, the code introduces an asynchronous function named populateMessages(). This function is responsible for rendering and displaying messages within the user interface.

It iterates through the inboxMessages array, extracting relevant properties such as from, body, dateCreated, and read for each message.

Within the loop, the code dynamically generates HTML elements to represent each message. It creates a list item (<li>) for each message using document.createElement('li'). The structure of each list item includes information like the sender, a potential new message badge, a truncated message body, a timestamp, and a Play button.

The truncate() function is used to limit the displayed message body to 30 characters. The formatDate function is utilized to format the date and time of the message creation.

Finally, the dynamically created HTML elements are appended to the existing ulInboxElement.
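
Because populateMessages() inserts the from and body values into innerHTML, any markup contained in an incoming SMS would be interpreted by the browser. One common precaution is to escape the text before placing it in the template. The following is a minimal sketch; the escapeHtml() helper is hypothetical and not part of the boilerplate code:

```javascript
// Hypothetical helper: replaces the characters that have special meaning
// in HTML so the text is displayed literally instead of being parsed.
function escapeHtml(text) {
  return String(text)
    .replaceAll('&', '&amp;')
    .replaceAll('<', '&lt;')
    .replaceAll('>', '&gt;')
    .replaceAll('"', '&quot;')
    .replaceAll("'", '&#39;');
}

console.log(escapeHtml('<img src=x onerror=alert(1)>'));
// &lt;img src=x onerror=alert(1)&gt;
```

In the template, ${truncate(body, 30)} could then become ${escapeHtml(truncate(body, 30))}, with the same treatment applied to from.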

Add the following function below the populateMessages() function:

async function initialize() {
  inboxMessages = await getMessages();
  populateMessages();
}
initialize();

The code declares an asynchronous function named initialize(). This function is responsible for loading the initial data for the client application.

Within the function, the variable inboxMessages is assigned the result of an asynchronous call to getMessages(). The getMessages() function fetches messages sent to your Twilio number, and the result is stored in the inboxMessages variable.

Following the retrieval of the messages, the populateMessages() function is invoked. This function renders messages in the UI based on the fetched data.

The final line calls the initialize() function to trigger the initialization process, ensuring that when the application is started, the inbox messages are fetched and populated in the UI.

Go back to your terminal and run the following command to start the server application:

npm start

Open another terminal tab, and run the following ngrok command:

ngrok http 3000

The command above exposes your local web server, running on port 3000, to the internet using the ngrok service.

Copy the forwarding HTTPS URL provided, paste it into the browser tab you previously opened, navigate to the pasted URL, and you should see a page similar to the following:

Displaying the most recent messages sent to your Twilio number

The page should display the most recent messages sent to your Twilio number. If your number hasn’t received any messages make sure you send one, then refresh the page before proceeding to the next section.

Using TTS to convert messages to audio

In this section, you will use the Transformers.js library to convert messages sent to your Twilio number to audio using a TTS model. Next, you will use the AudioMotion-analyzer and wavefile libraries to process and generate waveforms for the audio. Additionally, you will use the IDB-Keyval library to store and retrieve the generated audio.

Given the resource-intensive nature of executing AI models in the browser, the code responsible for this task will be housed in a dedicated Web worker file.

In the public/js subdirectory, create a file named worker.js, and add the following code to it:

import { pipeline, env } from 'https://cdn.jsdelivr.net/npm/@xenova/transformers@2.13.4';
env.allowLocalModels = false;
let synthesizer;

The code imports the pipeline function and the env object from the Transformers.js library hosted on a CDN.

The env.allowLocalModels is set to false, to disallow the use of locally stored models and ensure the use of remote models instead.

A variable named synthesizer is declared to store the instance of the text-to-speech synthesizer. This variable will be used to interact with the Transformers.js library for processing text and generating speech.

Add the following code below the synthesizer variable:

async function getInstance(progress_callback = null) {
  synthesizer = await pipeline('text-to-speech', 'Xenova/mms-tts-eng', {
    quantized: false,
    progress_callback,
  });
}

This code defines an asynchronous function named getInstance(). It initializes the text-to-speech synthesizer and takes an optional progress_callback parameter to track initialization progress.

Within the function, it awaits the result of the pipeline() function from Transformers.js. This function sets up a text-to-speech pipeline using the Xenova/mms-tts-eng model designed for English synthesis. A pipeline is a unified interface for performing various natural language processing (NLP) tasks using pre-trained models.

The quantized option is set to false, indicating no quantization. Quantization reduces model precision for efficiency but may slightly impact accuracy.

The progress_callback function reports progress during initialization. Once the pipeline is initialized, the synthesizer variable holds the resulting pipeline for text-to-speech synthesis.
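
The function above creates a new pipeline every time it is called. If it could be invoked more than once (for example, by two requests arriving while the model is still downloading), caching the promise rather than the resolved pipeline keeps the model from loading twice. The following is a sketch of that idea; createLazyLoader() is a hypothetical helper, and a stand-in loader replaces the real pipeline() call so the example runs without downloading a model:

```javascript
// Hypothetical helper: wraps an async loader so it runs at most once,
// even when called again before the first call has resolved.
function createLazyLoader(loader) {
  let pending;
  return function getInstance() {
    if (pending === undefined) {
      pending = loader();
    }
    return pending;
  };
}

// Stand-in for pipeline('text-to-speech', ...); counts how often it runs.
let loadCount = 0;
const getSynthesizer = createLazyLoader(async () => {
  loadCount += 1;
  return (text) => `audio for: ${text}`;
});

// Two overlapping requests share a single load.
Promise.all([getSynthesizer(), getSynthesizer()]).then(([first, second]) => {
  console.log(loadCount, first === second); // 1 true
});
```

In the worker, the loader passed to createLazyLoader() would be the function that awaits pipeline() with the model name and options shown above.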

Add the following code below the getInstance() function:

async function initializePipeline() {
  await getInstance((data) => {
    self.postMessage(data);
  });
}

This code block defines an asynchronous function named initializePipeline(). This function initializes the text-to-speech synthesizer pipeline used within the worker.

Within the function, it awaits getInstance(), which sets up the text-to-speech synthesizer pipeline, and passes it a callback as the optional progress_callback parameter. Whenever progress information is received during initialization, the callback posts it back to the main thread using the self.postMessage() method. This allows the client application to monitor the progress of the text-to-speech synthesizer initialization.

Add the following code below the initializePipeline() function:

async function convertMessageToSpeech(message) {
  try {
    const result = await synthesizer(message);
    return result;
  } catch (error) {
    console.error('error', error);
    return undefined;
  }
}

The code defines an asynchronous function named convertMessageToSpeech(). This function is responsible for converting a given text message into speech using the text-to-speech synthesizer.

The function begins by attempting to execute the text-to-speech synthesis process using the synthesizer object. It awaits the result of this synthesis, which represents the generated speech corresponding to the input text message.

If the synthesis process is successful, the function returns the resulting speech. However, if an error occurs during the synthesis, the function catches the error, logs it to the console, and returns undefined to indicate that the conversion process encountered an issue.

Add the following code below the convertMessageToSpeech() function:

self.addEventListener('message', async (event) => {
  const message = event.data;
  console.log('message received by worker', message);

  if (message.action === 'convertMessageToSpeech') {
    if (synthesizer === undefined) {
      await initializePipeline();
    }

    const { body, sid } = message.args;
    const result = await convertMessageToSpeech(body);

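    // convertMessageToSpeech() returns undefined on failure, so only
    // attach the sid when a result exists.
    if (result !== undefined) {
      result.sid = sid;
    }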
    self.postMessage({
      status: result !== undefined ? 'complete' : 'error',
      task: 'convertMessageToSpeech',
      data: result,
    });
  }
});

The code above defines a message event listener within a web worker script. This event listener responds to messages received by the worker, particularly those related to converting text messages to speech.

The event listener starts by extracting the received message from the event data and logs it to the console.

It then checks if the received message's action property is set to 'convertMessageToSpeech'. This conditional statement ensures that the worker responds specifically to messages requesting the conversion of a text message to speech.

Within the conditional block, it verifies whether the text-to-speech synthesizer (synthesizer) is undefined. If it is, the worker calls the initializePipeline() function to set up the synthesizer. This initialization step ensures that the text-to-speech pipeline is ready for use.

Following the synthesizer initialization, the worker extracts the message body and sid from the received message's args property.

It proceeds to call the convertMessageToSpeech() function, passing the message body as an argument. This function asynchronously converts the text message to speech using the pre-initialized synthesizer.

The resulting speech, along with the message sid, is stored in the result variable.

Finally, the worker posts a message back to the main thread. This message includes information about the status of the conversion process ('complete' or 'error'), the task performed ('convertMessageToSpeech'), and the data (result or undefined).
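
The reply posted to the main thread has the same three fields whether the conversion succeeds or fails. The following sketch shows that shape in isolation. The buildWorkerReply() helper is hypothetical, and it attaches the sid only when a result exists, since convertMessageToSpeech() returns undefined on failure:

```javascript
// Hypothetical helper that builds the object passed to self.postMessage().
function buildWorkerReply(result, sid) {
  const succeeded = result !== undefined;
  return {
    status: succeeded ? 'complete' : 'error',
    task: 'convertMessageToSpeech',
    data: succeeded ? { ...result, sid } : undefined,
  };
}

// Illustrative values only: a real result carries the model's audio samples.
const okReply = buildWorkerReply(
  { audio: new Float32Array(4), sampling_rate: 16000 },
  'SM-example-sid',
);
const failedReply = buildWorkerReply(undefined, 'SM-example-sid');

console.log(okReply.status, okReply.data.sid); // complete SM-example-sid
console.log(failedReply.status, failedReply.data); // error undefined
```

Keeping the reply construction in one place makes it easier for the main thread to rely on status before reading data.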

In the public/js subdirectory, create a file named tts.js, and add the following code to it:

import AudioMotionAnalyzer from 'https://cdn.skypack.dev/audiomotion-analyzer?min'
import * as wavefile from 'https://cdn.jsdelivr.net/npm/wavefile@11.0.0/+esm'

const audioVisualizerModal = document.getElementById('audioVisualizerModal');
const modalBtnPlayAudio = document.getElementById('modalBtnPlayAudio');
const audioVisualizerContainer = document.getElementById('audioVisualizerContainer');
const audioElement = document.getElementById('audioVisualizer');
const bootStrapModal = new bootstrap.Modal(audioVisualizerModal);

let isLoading = false;
let clickedPlayButtonID = '';

The code begins by importing the AudioMotion-analyzer and wavefile libraries from a CDN. The AudioMotion-analyzer library is used for audio visualization. It helps analyze and visualize audio data in web applications. The wavefile library is used to handle and manipulate WAV audio files.

Several constants are then declared, each representing a different HTML element obtained by its ID. These elements include the audio visualizer modal, a play audio button inside the modal, the container for the audio visualizer, and the audio element itself.

A Bootstrap modal instance (bootStrapModal) is created, associated with the audio visualizer modal. This instance facilitates the manipulation of the modal's appearance and behavior.

Two global variables, isLoading and clickedPlayButtonID, are initialized. The isLoading variable tracks whether an audio file is currently being loaded, while clickedPlayButtonID stores the ID of the button that triggered the play action.

Add the following code below the clickedPlayButtonID variable:

const worker = new Worker(new URL("./worker.js", import.meta.url), {
  type: "module",
});

This code initializes a web worker by instantiating the Worker class. The worker script is specified using the URL constructor, where ./worker.js is the relative path to the worker script file, and import.meta.url provides the base URL of the current module.

The type "module" option indicates that the worker script should be treated as a module, allowing the use of modern ECMAScript module syntax in the worker file.

Add the following code to the bottom of your tts.js file:

export function handlePlayButton() {
    const btnClicked = document.getElementById(clickedPlayButtonID)
    isLoading = !isLoading
    if (isLoading) {
        btnClicked.disabled = true;
        btnClicked.innerHTML = `<span class="spinner-border spinner-border-sm" aria-hidden="true"></span>
    <span class="visually-hidden" role="status">Loading...</span>
  `
    } else {
        btnClicked.disabled = false;
        btnClicked.innerHTML = `<i class="bi bi-play-fill"></i>`
    }
}

The code declares a function named handlePlayButton(). This function is responsible for toggling the visual state of the Play buttons shown next to every message received.

The function begins by retrieving the HTML button element associated with the ID stored in the variable clickedPlayButtonID. It then toggles the isLoading variable, which tracks whether the audio generated for the message is currently loading.

If isLoading is true, indicating that the content is loading, the function disables the button and updates its inner HTML to display a loading spinner.

On the other hand, if isLoading is false, indicating that the content has finished loading, the function enables the button and updates its inner HTML to display a play icon.

Add the following code below the handlePlayButton() function:

export async function convertMessageToSpeech(message, btnID) {
    clickedPlayButtonID = btnID
    handlePlayButton()
    const { body, sid } = message

    let data = {
        action: 'convertMessageToSpeech',
        args: {
            sid,
            body
        },
    }
    worker.postMessage(data);
}

The code declares and exports an async function named convertMessageToSpeech(). This function is responsible for initiating the conversion of a given message's text content to speech. It takes two parameters, message and btnID.

The function begins by updating the global variable clickedPlayButtonID with the value of btnID. Following this, it calls the handlePlayButton() function, which dynamically adjusts the visual state of a Play button associated with the provided btnID.

Next, the function destructures the message parameter to extract the message body and sid. It then creates a data object containing an action of 'convertMessageToSpeech' and an args object with the extracted sid and body.

Finally, the function posts this data to the web worker using the worker.postMessage() method to initiate the text-to-speech conversion process.

Add the following code below the convertMessageToSpeech() function:

async function playAudio(audioData, samplingRate) {
  try {
    const wav = new wavefile.WaveFile();
    wav.fromScratch(1, samplingRate, '32f', audioData);

    const blob = new Blob([wav.toBuffer()], { type: 'audio/wav' });
    const audioURL = URL.createObjectURL(blob);
    audioElement.src = audioURL;

    bootStrapModal.show();
    audioElement.play();
    handlePlayButton()
    return true;
  } catch (error) {
    console.error('Error', error);
    handlePlayButton()
    return false;
  }
}

The code defines a function named playAudio(). This asynchronous function takes two parameters: audioData, which represents the audio content to be played, and samplingRate, which is the rate at which samples of the audio signal are taken per second.

Inside the function, a try-catch block is used to handle potential errors during the audio playback process. Within the try block, a new WaveFile instance is created, representing a WAV audio file. The audio content (audioData) and sampling rate (samplingRate) are used to initialize this WAV file.

Next, a Blob is created from the WAV file's buffer, with a specified MIME type of 'audio/wav'. The Blob is then converted into a URL using URL.createObjectURL, and this URL is assigned to the src attribute of the audioElement.

The Bootstrap modal (bootStrapModal) is displayed, indicating that audio playback is in progress. The audioElement is then instructed to play the audio using audioElement.play(). Additionally, the handlePlayButton() function is called to dynamically update the visual state of the Play button.

If an error occurs, the catch block logs it and calls handlePlayButton() to restore the Play button state. The function returns true if no error was thrown; otherwise, it returns false.
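
Since audioData holds one mono sample per element, the length of the clip follows directly from the sample count and the sampling rate. The following is a small sketch of that arithmetic. The helper name audioDurationSeconds() is hypothetical, and the 16,000 Hz rate is used purely as an example value:

```javascript
// Hypothetical helper: duration of a mono clip in seconds.
function audioDurationSeconds(audioData, samplingRate) {
  if (!(samplingRate > 0)) {
    throw new RangeError('samplingRate must be a positive number');
  }
  return audioData.length / samplingRate;
}

// 24,000 samples played back at 16,000 samples per second.
const samples = new Float32Array(24000);
console.log(audioDurationSeconds(samples, 16000)); // 1.5
```

A value like this could be shown next to the Play button or used to estimate how much storage a cached clip will need.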

Add the following code below the playAudio() function:

worker.onmessage = async function (event) {
  const message = event.data
  console.log('Message from worker:', message);

  if (message.task === 'convertMessageToSpeech') {
    if (message.status === 'complete') {
      let result = message.data
      await playAudio(result.audio, result.sampling_rate)
    } else {
      handlePlayButton()
      alert('failed to generate audio')
    }
  }
};

Here, the code implements the onmessage event handler for the worker object initialized earlier. This event handler is responsible for handling messages sent by the web worker to the main thread. It is triggered whenever the web worker posts a message.

Inside the event handler, the received message is stored in the message variable. The code then logs this message.

The code then checks whether the task specified in the message is 'convertMessageToSpeech'. If so, it examines the status property of the message. If the status is 'complete', it extracts the audio data and sampling rate from the message and calls the playAudio() function, passing both as arguments.

If the status is not 'complete', the handlePlayButton() function is called to update the Play button's visual state, and an alert is displayed, notifying the user that the audio generation process has failed.

Add the following code below the onmessage event handler:

function setupAudioVisualizer() {
  const audioMotion = new AudioMotionAnalyzer(
    audioVisualizerContainer,
    {
      source: audioElement,
      ansiBands: false,
      showScaleX: false,
      bgAlpha: 0,
      overlay: true,
      mode: 5,
      frequencyScale: "log",
      showPeaks: false,
      reflexRatio: 0.5,
      reflexAlpha: 1,
      reflexBright: 1,
      smoothing: 0.7,
      gradient: 'classic'
    }
  );
  audioMotion.registerGradient('classic', {
    colorStops: [
      { color: 'white' },
    ]
  });
}
setupAudioVisualizer();

In the code above, a function named setupAudioVisualizer() is declared. This function is responsible for initializing and configuring an audio visualizer using the AudioMotion-analyzer library.

Inside the function, a new instance of AudioMotionAnalyzer is created and assigned to the variable audioMotion. This instance is configured with various parameters, specifying the settings for the audio visualization. The settings include:

source: The audio element (audioElement) is set as the source for visualization.
ansiBands: Set to false to disable ANSI bands in the visualization.
showScaleX: Set to false to hide the X-axis scale.
bgAlpha: The background alpha value is set to 0 (fully transparent).
overlay : Set to true to enable overlay mode.
mode: The visualization mode is set to 5.
frequencyScale: The frequency scale is set to "log"
showPeaks: Set to false to hide audio peaks.
reflexRatio: The reflex ratio is set to 0.5.
reflexAlpha: The reflex alpha value is set to 1.
reflexBright: The reflex brightness is set to 1.
smoothing: The smoothing factor is set to 0.7.
gradient: The color gradient used for visualization is set to 'classic'.

Additionally, the function overrides an existing color gradient named 'classic' with a single color stop, where the color is set to 'white'.

Finally, the setupAudioVisualizer() function is called to initialize the audio visualizer with the specified configurations.

For comprehensive instructions on using the AudioMotion-analyzer library, please refer to its GitHub page.

At the bottom of the tts.js file, add a click event listener for the Play button located inside the modal:

modalBtnPlayAudio.addEventListener('click', () => {
    audioElement.play();
})

This event listener ensures that when the Play button is clicked, the audio element located inside the modal begins playing the audio.

Go to the /public/js/index.js file and add the following line of code to the top of the import section to import the convertMessageToSpeech() function:

import { convertMessageToSpeech } from './tts.js';

Now, go to the populateMessages() function located in the /public/js/index.js file, and add lines 6-13 below to the for...of loop, directly under the line where you append a newMessageElement to the ulInboxElement:

async function populateMessages() {
  for (const [index, message] of inboxMessages.entries()) {
    …
    ulInboxElement.appendChild(newMessageElement);

    document.getElementById(`btnPlayMessage-${index}`).addEventListener('click', () => {
      const btnID = `btnPlayMessage-${index}`;
      convertMessageToSpeech(message, btnID);
      inboxMessages[index].read = true;
      const fromDivs = ulInboxElement.getElementsByClassName('from-div');
      const targetFromDiv = fromDivs[index];
      targetFromDiv.innerHTML = `${from}`;
    });
  }
}

The code attaches a click event listener to a Play button (btnPlayMessage-${index}) located inside a newMessageElement.

When the button is clicked, it triggers the convertMessageToSpeech() function, passing the corresponding message and a unique identifier for the button. Additionally, it marks the message as read and updates the UI to reflect this change.

Restart the application by navigating back to your terminal and entering npm start (if it is already running, press CTRL + C first to end the process). Go back to the tab where the client application is open, refresh the tab, and then click the Play button on one of the displayed messages to convert the message to audio and play it using TTS.

Using Transformers.js TTS to convert messages to audio

The initial attempt to play a message may experience a delay of up to a minute or two as Transformers.js requires time to download and cache the TTS model. You can monitor the download progress by accessing your browser’s dev tools console. Subsequent playback attempts will be significantly quicker.
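If you want the progress entries in the console to be easier to read, a formatter along these lines could help. This is a rough sketch under stated assumptions: formatProgress() is a hypothetical helper, and the event shape (status, file, progress as a percentage) is an assumption about what the worker reports while the model downloads, so adjust it to whatever your worker actually logs.

```javascript
// Hypothetical helper: turn a download progress event into a short log line.
// Assumed event shape: { status, file, progress }, with progress in percent.
function formatProgress(event) {
  if (event.status === 'progress') {
    return `${event.file}: ${Math.round(event.progress)}%`;
  }
  return `${event.file ?? 'model'}: ${event.status}`;
}

// Made-up sample events.
console.log(formatProgress({ status: 'progress', file: 'model.onnx', progress: 41.6 })); // prints "model.onnx: 42%"
console.log(formatProgress({ status: 'done', file: 'model.onnx' })); // prints "model.onnx: done"
```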

Saving the generated audio using the IndexedDB API

You may have observed that each time you click the Play button to listen to a particular message, there is a delay as the worker is invoked to generate the audio. To circumvent this, you will use the IDB-Keyval library for storing and retrieving the previously generated audio, eliminating the need to call the worker every time.

In the public/js subdirectory create a file named idbHelper.js, and add the following code to it:

import { get, set } from 'https://cdn.jsdelivr.net/npm/idb-keyval@6/+esm';

The line above imports the get and set functions from the IDB-Keyval library, enabling key-value storage with IndexedDB. The get function retrieves the value for a key, and set stores a key-value pair in IndexedDB.

Add the following code below the import statement:

export async function getAudio(key) {
  return get(key)
    .then((val) => {
      console.log(val);
      return val;
    });
}

export async function saveAudio(key, val) {
  return set(key, val)
    .then(() => {
      console.log('data saved');
      return true;
    })
    .catch((err) => {
      console.error('It failed!', err);
      return false;
    });
}

In the code above, two asynchronous functions, getAudio() and saveAudio(), are defined. These functions manage IndexedDB interactions for audio data retrieval and storage.

The getAudio() function, taking a key parameter, uses the IDB-Keyval get function to retrieve the IndexedDB value associated with the specified key. The value is then logged, and the function returns it.

The saveAudio() function, with key and val parameters, uses the IDB-Keyval set function to store audio data with the provided key. Upon success, a success log appears, and true is returned. On storage error, an error log appears, and false is returned.

Go to the tts.js file located in the public/js subdirectory and add the following code below the line where you imported the wavefile library:

import { getAudio, saveAudio } from "./idbHelper.js";

In the line above, the code imports the getAudio and saveAudio functions from the idbHelper.js file.

Go to the worker onmessage event handler and add lines 7-14 before calling the playAudio() function:

worker.onmessage = async function (event) {
  …
  if (message.task === 'convertMessageToSpeech') {
    if (message.status === 'complete') {
      let result = message.data

      let data = {
        audio: result.audio,
        samplingRate: result.sampling_rate
      }
      const savedAudio = await saveAudio(result.sid, data)
      if (!savedAudio) {
        alert('failed to save audio')
      }

      await playAudio(result.audio, result.sampling_rate)
    } else {
      …
    }
  }
};

The added code saves the audio data in IndexedDB using the saveAudio() function, passing the message sid as the key and an object containing the audio data and sampling rate as the value. If saving fails, an alert notifies the user.
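Because every played message adds an entry to IndexedDB, it can be useful to know roughly how large each entry is. The sketch below is illustrative only: describeAudio() is a hypothetical helper, and it assumes the audio is a Float32Array of samples, which you should confirm against what your worker returns.

```javascript
// Hypothetical helper: report the duration and raw size of generated audio.
// Assumes `audio` is a typed array of samples (for example a Float32Array).
function describeAudio(audio, samplingRate) {
  return {
    seconds: audio.length / samplingRate,
    bytes: audio.byteLength,
  };
}

// One second of silence at 16 kHz, used as made-up sample data.
const sampleAudio = new Float32Array(16000);
const audioStats = describeAudio(sampleAudio, 16000);
console.log(audioStats); // prints { seconds: 1, bytes: 64000 }
```

Each Float32Array sample takes four bytes, so one second of audio at 16,000 samples per second occupies about 64 KB before any storage overhead.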

Go to the convertMessageToSpeech() function and replace the code that appears after the sid constant with the following:

export async function convertMessageToSpeech(message, btnID) {
  …
  const { body, sid } = message

  let result = await getAudio(sid)
  if (result !== undefined) {
    await playAudio(result.audio, result.samplingRate)
  } else {
    let data = {
      action: 'convertMessageToSpeech',
      args: {
        sid,
        body
      },
    }
    worker.postMessage(data);
  }
}

The code added calls the getAudio() function with the message sid as an argument to check if there is already audio data associated with the provided sid.

If audio data is found, the function proceeds to play the audio using the playAudio() function, passing the audio data and sampling rate from the retrieved result.

If no audio data is found, the function constructs a data object containing an action named 'convertMessageToSpeech' and associated arguments (sid and body). This object is then sent to a web worker using worker.postMessage() to initiate the process of converting the message to speech.
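The cache-first idea behind this change can be sketched in isolation. This is a simplified illustration, not the tutorial's implementation: getOrCreateAudio() and createMemoryStore() are hypothetical names, the store is an in-memory stand-in for IndexedDB, and generation happens inline rather than in a web worker.

```javascript
// Cache-first lookup: try the store, fall back to generating and saving.
async function getOrCreateAudio(sid, store, generate) {
  const cached = await store.get(sid);
  if (cached !== undefined) {
    return { value: cached, fromCache: true };
  }
  const fresh = await generate(sid);
  await store.set(sid, fresh);
  return { value: fresh, fromCache: false };
}

// In-memory stand-in for IndexedDB so the sketch runs anywhere.
function createMemoryStore() {
  const map = new Map();
  return {
    get: async (key) => map.get(key),
    set: async (key, val) => { map.set(key, val); },
  };
}

// Request the same (made-up) message twice and count how often audio is generated.
async function demo() {
  const store = createMemoryStore();
  let generateCalls = 0;
  const generate = async () => {
    generateCalls += 1;
    return { audio: new Float32Array(4), samplingRate: 16000 };
  };
  const first = await getOrCreateAudio('SM-demo', store, generate);
  const second = await getOrCreateAudio('SM-demo', store, generate);
  return { first: first.fromCache, second: second.fromCache, generateCalls };
}

demo().then((summary) => console.log(summary));
// logs { first: false, second: true, generateCalls: 1 }
```

The second request is served from the store, which is why repeated clicks on the same message no longer need the worker.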

Go back to the tab where the client application is open, refresh the tab, and then click the Play button on one of the displayed messages to convert the message to audio and play it using TTS. Close the modal, then click the same message's Play button again and observe how the audio loads faster because it is now retrieved from IndexedDB instead of being converted again.

Using IndexedDB for audio storage and retrieval

Handling incoming messages

In this section, first, you'll set up a server endpoint tasked with creating an access token. This token will authorize the client and server application to access the Twilio Sync API. Next, within the server, you'll instantiate a Twilio Sync client and set up an incoming message webhook. When this webhook is triggered, the code will use the Sync client instance to broadcast the incoming message to the client application, where it will be shown.

Go to your project root directory, open the server.js file, and add the following code to the bottom of the import section:

const { SyncClient } = require('twilio-sync');
const { AccessToken } = twilio.jwt;
const { SyncGrant } = AccessToken;
const { v4: uuidv4 } = require('uuid');
let twilioSyncClient;

This code prepares the server to work with Twilio Sync by importing necessary modules and initializing variables related to Twilio Sync.

First, the code imports the SyncClient class from the twilio-sync package.

Next, the code imports the AccessToken class and SyncGrant class from the twilio.jwt module. These classes will be used to generate access tokens for Twilio Sync.

The uuidv4() function is imported from the uuid package. This function will be used to generate a unique identity for every access token.

Lastly, a variable named twilioSyncClient is declared. This variable will be used to store an instance of the Twilio Sync client once it's created.

Add the following code below the getMessages() function:

function getAccessToken() {
  const token = new AccessToken(
    process.env.TWILIO_ACCOUNT_SID,
    process.env.TWILIO_API_KEY_SID,
    process.env.TWILIO_API_KEY_SECRET,
    { identity: uuidv4() },
  );

  const syncGrant = new SyncGrant({
    serviceSid: process.env.TWILIO_SYNC_SERVICE_SID,
  });

  token.addGrant(syncGrant);
  return token.toJwt();
}

The code above defines a function named getAccessToken() for generating an access token for Twilio Sync.

Within it, first, an AccessToken instance is created. The AccessToken constructor takes as parameters the Twilio account SID, the API key SID, the API key secret, and an options object whose identity property is set to a unique value generated by calling the uuidv4() function.

Next, a SyncGrant is formed using the SyncGrant class with the Sync service SID. This grant is added to the AccessToken via addGrant().

Finally, the function returns the JWT-formatted access token.
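While debugging, it can help to look inside a token to confirm its expiry and identity. The following is a minimal sketch for Node.js, not part of the tutorial's code: decodeJwtPayload() is a hypothetical helper that only decodes the payload and does not verify the signature, and the sample token is made up for the demo rather than a faithful copy of every field Twilio includes.

```javascript
// Hypothetical helper: decode (NOT verify) the payload segment of a JWT.
function decodeJwtPayload(jwt) {
  const segments = jwt.split('.');
  if (segments.length !== 3) {
    throw new Error('Expected a JWT with three dot-separated segments');
  }
  // Convert base64url to standard base64 before decoding.
  const base64 = segments[1].replace(/-/g, '+').replace(/_/g, '/');
  return JSON.parse(Buffer.from(base64, 'base64').toString('utf8'));
}

// Build a throwaway, unsigned sample token for the demo.
const encodeSegment = (obj) => Buffer.from(JSON.stringify(obj)).toString('base64url');
const sampleJwt = [
  encodeSegment({ alg: 'HS256', typ: 'JWT' }),
  encodeSegment({ exp: 1700000000, grants: { identity: 'demo-user' } }),
  'signature',
].join('.');

const samplePayload = decodeJwtPayload(sampleJwt);
console.log(samplePayload.grants.identity); // prints "demo-user"
console.log(samplePayload.exp); // prints 1700000000
```

Never treat a decoded payload as trusted input; decoding without signature verification is only suitable for inspection.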

Add the following code below the getAccessToken() function:

function initializeTwilioSyncClient() {
  const token = getAccessToken();
  twilioSyncClient = new SyncClient(token);
}

initializeTwilioSyncClient();

This code block defines a function named initializeTwilioSyncClient() which is responsible for setting up and initializing a Twilio Sync client within the server application.

The code initializes a Twilio Sync client by obtaining an access token, creating a SyncClient instance, and assigning the instance to the global variable twilioSyncClient.

Next, the initializeTwilioSyncClient() function is invoked, ensuring that the Twilio Sync client is set up and ready for use when the server starts.
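Access tokens are short-lived, so a long-running server will eventually need to hand the Sync client a fresh one. The sketch below shows one possible way to do that, assuming the Sync client emits a 'tokenAboutToExpire' event and exposes an updateToken() method as described in the Twilio Sync SDK documentation. keepTokenFresh() is a hypothetical helper, and the fake client exists only so the sketch runs without the SDK.

```javascript
// Hypothetical helper: swap in a fresh token shortly before the current one expires.
function keepTokenFresh(client, createToken) {
  client.on('tokenAboutToExpire', () => {
    client.updateToken(createToken());
  });
}

// Minimal stand-in exposing only the two members the sketch relies on.
const fakeSyncClient = {
  handlers: {},
  currentToken: 'token-1',
  on(eventName, handler) { this.handlers[eventName] = handler; },
  updateToken(token) { this.currentToken = token; },
};

keepTokenFresh(fakeSyncClient, () => 'token-2');
fakeSyncClient.handlers.tokenAboutToExpire(); // simulate the SDK firing the event
console.log(fakeSyncClient.currentToken); // prints "token-2"
```

In the server from this tutorial, the call would presumably look like keepTokenFresh(twilioSyncClient, getAccessToken), placed inside initializeTwilioSyncClient() after the client is created.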

Add the following code below the line where the initializeTwilioSyncClient() is invoked:

function updateDocument(data, action) {
  twilioSyncClient.document('inbox').then((doc) => {
    doc.update({ content: data, action });
  });
}

The code defines a function named updateDocument(), responsible for updating a document within the Twilio Sync service. This function is designed to be called when your Twilio phone number receives a new message that needs to be shared among connected Twilio Sync clients.

The function calls the document() method on the Twilio Sync client, specifying the document named 'inbox'. It then uses the returned document object to invoke the update() method, passing an object containing the new content (data) and an action related to the update.

Add the following code below the /getMessages route:

app.get('/getToken', async (req, res) => {
  const token = getAccessToken();
  res.send({
    token,
  });
});

This code establishes a route at '/getToken' to handle incoming HTTP GET requests. This route is designed to handle requests made to obtain a Twilio access token for establishing secure communication with the Twilio Sync service.

When a user accesses this endpoint, the getAccessToken() function is invoked to generate a new Twilio access token.

Next, the obtained access token is encapsulated in a JSON object with the key 'token' and then it is sent back to the client.

Add the following code below the /getToken route:

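app.post('/incomingMessage', async (req, res) => {
  let newMessage;
  const incomingMessage = req.body;
  const messages = await getMessages();

  // Find the retrieved message whose SID matches the incoming webhook payload.
  for (let i = 0; i < messages.length; i++) {
    if (messages[i].sid === incomingMessage.SmsMessageSid) {
      newMessage = messages[i];
      console.log('found match', newMessage);
      break;
    }
  }

  // Without this guard, a missing match would throw when setting newMessage.read.
  if (!newMessage) {
    return res.status(404).send({ success: false });
  }

  newMessage.read = false;
  updateDocument(newMessage, 'newMessage');
  res.send({
    success: true,
  });
});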

This code establishes a route at '/incomingMessage' to handle incoming HTTP POST requests. The route processes incoming SMS messages and shares the processed message with other Twilio Sync clients.

First, a variable newMessage is declared, without an initial value, and the incoming request body is stored in incomingMessage. The function calls getMessages(), retrieving the current messages from Twilio, and storing the list in messages.

Next, a loop iterates over messages, looking for the entry whose sid matches the SmsMessageSid property of the incoming request. Once found, that entry is assigned to newMessage. The retrieved message is used instead of the raw request body because the properties of the webhook payload differ from those of the message objects the client application expects.

Subsequently, the read property of the found message is set to false, indicating unread status. updateDocument() is then called with the message and the string 'newMessage' as arguments, updating the Twilio Sync 'inbox' document to broadcast the incoming message.

Finally, a response is sent with a JSON object { success: true }, signaling successful processing.
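The lookup step can also be expressed with Array.prototype.find(), which returns undefined when nothing matches. This is just an alternative sketch, not the tutorial's code: findMessageBySid() is a hypothetical helper, and the sample messages and SIDs are made up.

```javascript
// Hypothetical helper: return the message with the given SID, or undefined.
function findMessageBySid(messages, sid) {
  return messages.find((message) => message.sid === sid);
}

// Made-up sample data standing in for the result of getMessages().
const sampleMessages = [
  { sid: 'SM-demo-1', body: 'Hello' },
  { sid: 'SM-demo-2', body: 'Are you there?' },
];

console.log(findMessageBySid(sampleMessages, 'SM-demo-2').body); // prints "Are you there?"
console.log(findMessageBySid(sampleMessages, 'SM-missing')); // prints undefined
```

Because a miss yields undefined, the caller should check the result before reading or setting properties on it.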

Navigate to the public/js subdirectory, open the index.js file, and add the following code below the line where you declared the inboxMessages variable:

let twilioSyncClient;

Here you declared a variable named twilioSyncClient, which will be used to store the Twilio Sync client instance.

Add the following code below the populateMessages() function located around line 31:

function removeAllChildNodes(parent) {
  while (parent.firstChild) {
    parent.removeChild(parent.firstChild);
  }
}

function repopulateMessages() {
  removeAllChildNodes(ulInboxElement);
  populateMessages();
}

Here, the code declares two functions named removeAllChildNodes() and repopulateMessages().

The removeAllChildNodes() function takes a single parameter, parent, representing a parent HTML element. It clears all child nodes from this parent using a while loop that iterates while the parent has a first child. Within each iteration, it removes the first child node. After execution, the specified parent element becomes empty.

The repopulateMessages() function refreshes the displayed message list. It calls removeAllChildNodes() passing the ulInboxElement as an argument to clear it, then calls populateMessages() to refill it with updated messages.

Add the following code below the repopulateMessages() function:

function playNotificationSound() {
  const sound = new Audio();
  sound.src = '../audio/mixkit-bell-notification-933.wav';
  sound.load();
  sound.play();
}

function showNewMessageToast(message) {
  const toastMessageSender = newMessageToast.getElementsByClassName('toast-message-sender')[0];
  const toastMessageBody = newMessageToast.getElementsByClassName('toast-message-body')[0];

  toastMessageSender.textContent = message.from;
  toastMessageBody.textContent = message.body;

  const toastBootstrap = bootstrap.Toast.getOrCreateInstance(newMessageToast);
  toastBootstrap.show();
  playNotificationSound();
}

The code above declares two functions named playNotificationSound() and showNewMessageToast().

The playNotificationSound() function plays a notification sound when a new message arrives. To achieve this, it creates an Audio element, sets its source to the notification sound file in the audio directory, and then loads and plays it.

The showNewMessageToast() function manages toast notifications for new messages. Taking a message object, it extracts the sender and body information and uses it to update the toast message content. It then retrieves or creates a Bootstrap Toast instance and shows the toast. Additionally, it calls playNotificationSound() for an audible cue.

Add the following code below the showNewMessageToast() function:

async function getAccessToken() {
  const response = await fetch('/getToken', {
    method: 'GET',
    headers: {
      Accept: 'application/json',
      'Content-Type': 'application/json',
    },
  });
  const { token } = await response.json();
  return token;
}

The code declares an async function named getAccessToken() which is responsible for fetching the access token needed to create a Twilio Sync client.

Utilizing the Fetch API, it sends an HTTP GET request to '/getToken'. Upon response, the function extracts the token from the response, stores it in a variable named token, and then returns the obtained token.

Add the following code below the getAccessToken() function:

async function createTwilioSyncClient() {
  const token = await getAccessToken();

  twilioSyncClient = new Twilio.Sync.Client(token);
  twilioSyncClient.document('inbox')
    .then((document) => {
      document.on('updated', (event) => {
        if (event.data.action === 'newMessage') {
          const message = event.data.content;
          inboxMessages.splice(0, 0, message);
          repopulateMessages();
          showNewMessageToast(message);
        }
      });
    })
    .catch((error) => {
      console.error('Unexpected error', error);
    });
}

This code defines a function named createTwilioSyncClient(), which is responsible for connecting to Twilio Sync for real-time synchronization of incoming messages between the client and server.

Initially, it awaits the asynchronous operation of obtaining the access token through the getAccessToken() function. Once the token is acquired, a new Twilio Sync Client instance is created using it.

The function then accesses the 'inbox' document, attaching an event listener to its 'updated' event.

This listener checks for the action being a 'newMessage'. If true, it extracts the new message content, inserts it at the beginning of inboxMessages, updates the UI with repopulateMessages(), and displays a notification toast with showNewMessageToast().

In case of errors, an error message is logged to the console.
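The state change performed by the listener can be sketched as a pure function, separate from the Sync client and the DOM. This is only an illustration: applyInboxUpdate() is a hypothetical helper, and the data shape (action, content) is assumed from the updateDocument() call on the server. Unlike the tutorial's splice() call, it returns a new array instead of mutating the existing one.

```javascript
// Hypothetical helper: return the inbox after applying a Sync document update.
function applyInboxUpdate(inboxMessages, data) {
  if (data.action !== 'newMessage') {
    return inboxMessages; // Unknown actions leave the inbox untouched.
  }
  // Newest message goes first, mirroring splice(0, 0, message).
  return [data.content, ...inboxMessages];
}

// Made-up sample data.
const inboxBefore = [{ sid: 'SM-demo-1', body: 'Older message', read: true }];
const inboxAfter = applyInboxUpdate(inboxBefore, {
  action: 'newMessage',
  content: { sid: 'SM-demo-2', body: 'Newest message', read: false },
});
console.log(inboxAfter.map((message) => message.sid)); // prints [ 'SM-demo-2', 'SM-demo-1' ]
```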

Go to the initialize() function located at the bottom of the index.js and add the following code below the line where the `populateMessages()` function is invoked:

async function initialize() {
  inboxMessages = await getMessages();
  populateMessages();
  createTwilioSyncClient();
}

The code added invokes the createTwilioSyncClient() function to create a new Twilio Sync client.

Go back to the tab where the client application is open, copy the URL, and paste it into a text editor where you can edit it. Next, add the text /incomingMessage to the end of it so that your complete URL looks similar to the following:

https://xxxx-xxx-xxx-xxx-xxx.ngrok-free.app/incomingMessage

Switch to the tab where your Twilio Console is open and navigate to Active Numbers under Phone Numbers > Manage. Click on the number you are using in this tutorial. Scroll down to Messaging. Under A MESSAGE COMES IN, set the HTTP method to POST and enter the ngrok https URL you assembled earlier. Then, click Save to save the configuration.

Go back to the tab where the client application is open, and refresh the tab. Now send a message to your Twilio number using any method you prefer. Observe how a toast notification appears and the bell notification plays when your Twilio number receives the message.

Conclusion

In this tutorial, you've successfully built a web application that not only retrieves and showcases the latest messages from your Twilio phone number but also converts them into audio using Text-to-Speech (TTS). Throughout the tutorial, you used the Twilio Programmable Messaging REST API for seamless message retrieval, Transformers.js for TTS functionality, the wavefile library for audio processing, and the Audiomotion-analyzer for visualizing audio waveforms. Additionally, you implemented the IDB-Keyval library for efficient audio storage and retrieval and leveraged Twilio Sync to seamlessly share incoming messages between the server and the client application.
