Introducing 50+ additional Text-to-Speech voices with Amazon Polly Integration

August 06, 2018
Written by
Kris Gutta
Twilion


We are excited to announce that Twilio now supports Amazon Polly, adding more than 50 voices, 25 languages, and new APIs to give developers more control over synthesized speech output in their Programmable Voice applications. With Amazon Polly, Twilio developers now have control over the volume, pitch, rate, and pronunciation of the voices that interact with their users.

Text-to-Speech (TTS), also known as speech synthesis, is a process where text is rendered into audio using a human-sounding voice. TTS enables developers to create smarter interactive voice applications by generating speech dynamically, rather than playing static, pre-recorded media files. The Programmable Voice <Say> verb has long supported Text-to-Speech: you provide the text, and <Say> synthesizes speech in real time and plays the audio back to the call. For example, the following TwiML plays back “Hello World!” in US English.

<Response>
   <Say>Hello World!</Say>
</Response>
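
Because the speech is generated dynamically, the TwiML itself is usually produced by application code rather than written by hand. The following is a minimal sketch of that idea using only Python's standard library; the function name `build_say_twiml` is hypothetical, and a real application would return this string from whatever web framework handles the call webhook.

```python
import xml.etree.ElementTree as ET


def build_say_twiml(text: str) -> str:
    """Return a TwiML document that speaks `text` using the account's default voice."""
    response = ET.Element("Response")
    say = ET.SubElement(response, "Say")
    # ElementTree escapes &, < and > in text when it serializes the tree.
    say.text = text
    return ET.tostring(response, encoding="unicode")


print(build_say_twiml("Hello World!"))
```

Building the document through an XML library instead of string concatenation keeps the output well-formed even when the text contains characters such as `&`.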

Before today, <Say>’s built-in Basic TTS supported three voices, each with its own supported set of languages. But TTS quality has improved dramatically over recent years, and Amazon has been at the forefront of these improvements with products like Echo and Alexa and services like Amazon Polly. That’s why today we’re proud to offer built-in integration with Polly, bringing the best TTS capabilities to every developer building on Twilio.

To hear the difference between our Basic TTS and Amazon Polly, listen to the following TwiML rendered by each provider:

<Response>
   <Say>The coldest winter I ever spent was a summer in San Francisco!</Say>
</Response>

Control Text-to-Speech in TwiML

You control the TTS prompts played from your Programmable Voice application through TwiML’s <Say> tag and its language and voice attributes. The language attribute tells the TTS engine which locale it should use when interpreting the text you provide. The voice attribute selects both the TTS Provider and one of that provider’s available voices.

For example, to use Amazon Polly’s “Emma” voice in UK English, you would use the following TwiML:

<Response>
   <Say voice="Polly.Emma" language="en-GB">Thanks for calling!</Say>
</Response>

To say the same phrase with Polly’s Amy voice, just change the voice attribute:

<Response>
   <Say voice="Polly.Amy" language="en-GB">Thanks for calling!</Say>
</Response>

This level of control lets you dynamically select a voice based on information about the caller, such as their preferences or location. But if you don’t need that level of control, we’ve also made it easier to centrally manage the TTS behavior of your application with a new Text-to-Speech configuration page in the Twilio Console.
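
As a rough illustration of selecting a voice per caller, the sketch below maps a caller's locale to a Polly voice and emits the matching <Say> attributes. The voice names are taken from this post; the mapping, the fallback locale, and the function name `say_for_caller` are assumptions made for the example, and how you learn the caller's locale is left to your application.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from caller locale to Polly voice.
# The voice names appear in this post; the mapping itself is an assumption.
VOICE_BY_LOCALE = {
    "en-GB": "Polly.Emma",
    "fr-FR": "Polly.Celine",
}
FALLBACK_LOCALE = "en-GB"  # assumed default for callers with an unknown locale


def say_for_caller(text: str, caller_locale: str) -> str:
    """Build TwiML that speaks `text` in a voice chosen from the caller's locale."""
    locale = caller_locale if caller_locale in VOICE_BY_LOCALE else FALLBACK_LOCALE
    response = ET.Element("Response")
    say = ET.SubElement(
        response, "Say", {"voice": VOICE_BY_LOCALE[locale], "language": locale}
    )
    say.text = text
    return ET.tostring(response, encoding="unicode")


print(say_for_caller("Thanks for calling!", "en-GB"))
```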

Configure Text-to-Speech behavior in Twilio Console

Choosing a TTS Provider

With just a couple of clicks, you can change the default TTS Provider for your application from Twilio’s Basic to Amazon Polly.

This will reconfigure your account to use the specified Amazon Polly voice for each of the locales shown in the Console whenever the voice and locale are left unspecified in your TwiML code.

In addition to configuring Polly as your default TTS Provider, you can also:

  • Choose the default voice for your account
  • Listen to each voice and select the one you like most per locale

Modifying the default Voice and Locale

The Console also gives you a central place to control the voice used by your application. Let’s say you’re building an app exclusively for French-speaking users.

To change the default locale on your account, navigate to the Text-to-Speech page in the Console, click the edit link for DEFAULT VOICE, and select the locale of your choice.

Once you have set the default to French (fr-FR) with Polly’s Celine voice, return the following TwiML and Twilio will synthesize speech in French using that voice:

<Response>
   <Say>
      Félicitations à l'équipe de France pour avoir remporté la coupe du monde 2018!
   </Say>
</Response>

Because of the defaults you configured in the Console, this TwiML is functionally equivalent to the following:

<Response>
   <Say voice="Polly.Celine" language="fr-FR">
      Félicitations à l'équipe de France pour avoir remporté la coupe du monde 2018!
   </Say>
</Response>
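
One way to picture this equivalence is as a fallback: attributes written on <Say> win, and anything omitted is taken from the account default. The sketch below is only a simplified mental model of that behavior, not Twilio's implementation; `ACCOUNT_DEFAULT` and `resolve_say_settings` are hypothetical names, and the per-locale voice mapping in the Console is not modeled.

```python
from typing import NamedTuple, Optional


class TtsSettings(NamedTuple):
    voice: str
    language: str


# Hypothetical stand-in for the default configured in the Console.
ACCOUNT_DEFAULT = TtsSettings(voice="Polly.Celine", language="fr-FR")


def resolve_say_settings(
    voice: Optional[str] = None, language: Optional[str] = None
) -> TtsSettings:
    """Fill in attributes omitted from <Say> with the account default."""
    return TtsSettings(
        voice=voice or ACCOUNT_DEFAULT.voice,
        language=language or ACCOUNT_DEFAULT.language,
    )


# A bare <Say> resolves to the same settings as the explicit form.
print(resolve_say_settings() == resolve_say_settings("Polly.Celine", "fr-FR"))
```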

Listening to the voices

Amazon Polly has a vast selection of voices: for example, Polly comes with eight voices for US English (en-US) alone. Not sure which voice to use? Take them for a test drive in the Console. You can listen to each of the voices and then choose whichever you like best. For example, to modify the default for US English, click on English (US) (en-US) in the Locale Mapping table after setting your default provider to Amazon Polly.

 

Using voices with Twilio Studio

Polly voices also work out of the box with Studio, Twilio’s visual development environment for building customer communication applications. Anyone can use its drag-and-drop interface to design and customize messaging experiences.

Use SSML to make Text-to-Speech more lifelike

The Speech Synthesis Markup Language (SSML) standard defines a common markup for controlling the volume, pitch, rate, and pronunciation of synthesized speech. When you use Amazon Polly for your TTS, you can use SSML within your <Say> element to control the TTS experience. For example, you can use <prosody> inside <Say> to increase the speed of synthesized speech:

<Response>
 <Say>
   <prosody rate="110%">
     Speech Synthesis Markup Language (SSML) is a W3C specification
     that allows developers to use XML-based markup language for 
     assisting the generation of synthesized speech.
   </prosody>
 </Say>
</Response>

You can also use SSML to specify the pronunciation of a word. For example, the following TwiML uses the <phoneme> tag to pronounce San Francisco correctly:

<Response>
   <Say>
      <prosody rate="110%">
        The coldest winter I ever spent was a summer in 
        <phoneme alphabet="ipa" ph="sæn frənˈsɪskoʊ">San Francisco!</phoneme>
      </prosody>
   </Say>
</Response>
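
If you generate TwiML in code, the same nesting can be produced programmatically. The sketch below rebuilds the example above with Python's standard library; the function name `say_with_pronunciation` and its parameters are hypothetical, and it assumes your account is using Amazon Polly so that the SSML tags are honored.

```python
import xml.etree.ElementTree as ET


def say_with_pronunciation(
    lead_text: str, word: str, ipa: str, rate: str = "110%"
) -> str:
    """Build <Say> containing <prosody> with a trailing <phoneme>-annotated word."""
    response = ET.Element("Response")
    say = ET.SubElement(response, "Say")
    prosody = ET.SubElement(say, "prosody", {"rate": rate})
    prosody.text = lead_text  # text spoken before the annotated word
    phoneme = ET.SubElement(prosody, "phoneme", {"alphabet": "ipa", "ph": ipa})
    phoneme.text = word
    # encoding="unicode" returns a str and leaves the IPA characters unescaped.
    return ET.tostring(response, encoding="unicode")


print(
    say_with_pronunciation(
        "The coldest winter I ever spent was a summer in ",
        "San Francisco!",
        "sæn frənˈsɪskoʊ",
    )
)
```

Letting the library write the attributes also avoids a common copy-and-paste problem: typographic quotes around attribute values, which make the XML invalid.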

Learn more about the SSML supported by Amazon Polly.

Pricing

Amazon Polly for <Say> starts at $0.0008 for every 100 spoken characters. Volume discounts are available.
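
For a rough sense of scale, the starting price can be turned into a back-of-the-envelope estimate. This sketch assumes that every character of the prompt counts as a spoken character and that partial blocks of 100 characters are billed proportionally; neither detail is stated in this post, volume discounts are ignored, and `estimate_polly_cost` is a hypothetical helper.

```python
from decimal import Decimal

# Starting price quoted in this post, in USD per 100 spoken characters.
PRICE_PER_100_CHARS = Decimal("0.0008")


def estimate_polly_cost(text: str, calls: int = 1) -> Decimal:
    """Estimate the cost of speaking `text` once on each of `calls` calls.

    Assumes proportional billing and counts every character in `text`.
    """
    total_chars = len(text) * calls
    return PRICE_PER_100_CHARS * total_chars / 100


print(estimate_polly_cost("Thanks for calling!", calls=10_000))
```

Under these assumptions, the 19-character prompt “Thanks for calling!” played on 10,000 calls comes to about $1.52.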

What’s Next?

Twilio will continue to add more voices and SSML features so that you have the most advanced TTS technology at your disposal when building on Twilio.

We believe Amazon Polly, with its deep learning technology, offers more control and lets you provide better customer experiences over the phone. As a result, in the coming weeks we will be making Amazon Polly the default provider for all new Twilio accounts, with an option to revert to the Basic provider in the Twilio Console.

In addition, we will be adding support for the vocal tract SSML feature that comes with Amazon Polly voices, which allows you to control the timbre, pitch, and other qualities of the synthesized speech. In the meantime, you can learn more about the feature here.

If you have feedback or want to share your experience using Amazon Polly, don’t hesitate to reach out to us at support@twilio.com or by leaving comments.

We can’t wait to see what you build!