Real-Time Transcriptions, including the `<Transcription>` TwiML noun and API, use artificial intelligence or machine learning technologies. By enabling or using any of the features or functionalities within Programmable Voice that are identified as using artificial intelligence or machine learning technology, you acknowledge and agree that your use of these features or functionalities is subject to the terms of the Predictive and Generative AI/ML Features Addendum.
Real-Time Transcriptions is not PCI compliant or a HIPAA Eligible Service and should not be used in Voice Intelligence workflows that are subject to HIPAA or PCI.
Real-Time Transcription is currently available as a Public Beta product and information contained in this document is subject to change. This means that some of the features are not yet implemented and others may be changed before the product is declared as Generally Available. Public Beta products are not covered by a Twilio Service Level Agreement.
The `<Transcription>` TwiML noun allows you to transcribe live calls in near real-time. It is used in conjunction with `<Start>`. When Twilio executes the `<Start><Transcription>` instruction during a call, it forks the raw audio stream to a speech-to-text transcription engine that can provide streaming responses almost instantly.
This page covers the supported attributes of `<Transcription>` and provides sample code.
The `<Transcription>` TwiML noun is associated with Twilio's Real-Time Transcriptions product. It is not to be confused with Recording Transcriptions.
To process conversation utterances live in your application, use the `statusCallbackUrl` webhook.
Real-Time Transcription persistence and post-call language intelligence support come from integration with Voice Intelligence. To store your transcripts with Twilio or run Language Operators after the call, add the `intelligenceService` attribute when starting a Real-Time Transcription session.
Below is a basic example of `<Start><Transcription>`:
```xml
<Start>
  <Transcription statusCallbackUrl="https://example.com/your-callback-url"/>
</Start>
```
The table below lists the attributes that `<Transcription>` supports. Each attribute modifies the behavior of `<Transcription>`. All attributes are optional.
Attribute Name | Allowed Values | Default Value |
---|---|---|
name | Unique name for the Real-Time Transcription | none |
statusCallbackUrl | Valid relative or absolute URL | none |
languageCode | A BCP-47 language code that identifies the spoken language, such as en-US | en-US |
track | inbound_track, outbound_track, both_tracks | both_tracks |
inboundTrackLabel | An alphanumeric label to associate to the inbound track being transcribed | none |
outboundTrackLabel | An alphanumeric label to associate to the outbound track being transcribed | none |
transcriptionEngine | Name of speech-to-text transcription provider. Valid values are: google | google |
speechModel | (Google only) Any speechModel value | telephony |
profanityFilter | (Google only) true or false | true |
partialResults | (Google only) true or false | false |
hints | (Google only) Comma-separated list of expected phrases or keywords for recognition | none |
enableAutomaticPunctuation | (Google only) true or false | true |
intelligenceService | The Voice Intelligence Service SID for persisting transcripts and running Language Operators. | none |
The user-specified name of this Real-Time Transcription. This name can be used to stop the Real-Time Transcription.
```javascript
const VoiceResponse = require('twilio').twiml.VoiceResponse;

const response = new VoiceResponse();
const start = response.start();
start.transcription({statusCallbackUrl: 'https://example.com/your-callback-url', name: 'Contact center transcription'});

console.log(response.toString());
```
```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Start>
    <Transcription statusCallbackUrl="https://example.com/your-callback-url" name="Contact center transcription" />
  </Start>
</Response>
```
The `statusCallbackUrl` attribute is the relative or absolute URL of an endpoint. Twilio sends Real-Time Transcription status updates and the call's transcript data to this URL.
Twilio sends a `POST` request to this URL whenever one of the following occurs:
- `transcription-started` event.
- `transcription-content` event.
- `transcription-stopped` event. This event occurs when a Real-Time Transcription session is stopped via API or TwiML, or when the call ends.
- `transcription-error` event.
```javascript
const VoiceResponse = require('twilio').twiml.VoiceResponse;

const response = new VoiceResponse();
const start = response.start();
start.transcription({statusCallbackUrl: 'https://example.com/your-callback-url'});

console.log(response.toString());
```
```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Start>
    <Transcription statusCallbackUrl="https://example.com/your-callback-url"/>
  </Start>
</Response>
```
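The four event types can be routed with a single switch on the `TranscriptionEvent` property. The sketch below is illustrative only: it assumes the callback body arrives form-encoded (`application/x-www-form-urlencoded`), and the function names `parseCallbackBody` and `routeTranscriptionEvent` are hypothetical, not part of any Twilio library.

```javascript
// Assumption: the callback body is form-encoded; adjust the parser if your
// framework already hands you a parsed object.
function parseCallbackBody(rawBody) {
  // URLSearchParams decodes percent-escapes and treats "+" as a space.
  return Object.fromEntries(new URLSearchParams(rawBody));
}

// Hypothetical router: returns a short description of what was received.
// Replace each branch with your own application logic.
function routeTranscriptionEvent(params) {
  switch (params.TranscriptionEvent) {
    case 'transcription-started':
      return `started ${params.TranscriptionSid}`;
    case 'transcription-content':
      return `content ${params.SequenceId}`;
    case 'transcription-stopped':
      return `stopped ${params.TranscriptionSid}`;
    case 'transcription-error':
      return `error ${params.TranscriptionErrorCode}`;
    default:
      // Unknown or missing event type: ignore rather than fail.
      return 'ignored';
  }
}

const sampleBody =
  'TranscriptionEvent=transcription-started' +
  '&TranscriptionSid=GT20dfa03c8cf8aa8d0c4aeccde5558b66&SequenceId=1';
console.log(routeTranscriptionEvent(parseCallbackBody(sampleBody)));
```

Keeping the routing in a pure function makes it possible to test the handler without starting an HTTP server.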
When a Real-Time Transcription is started and a session is created, Twilio sends an HTTP `POST` request to your `statusCallbackUrl` for the `transcription-started` event. This event provides initial details about the transcription session.
These HTTP requests contain the properties listed below.
Property | Description | Example |
---|---|---|
AccountSid | Twilio Account SID | AC11b76cdc7d217e72a72be6422d46a7ca |
CallSid | Twilio Call SID | CA57af2620f427810cb4e430371e8d6e0f |
TranscriptionSid | Unique identifier for this Real-Time Transcription session | GT20dfa03c8cf8aa8d0c4aeccde5558b66 |
Timestamp | Time of the event, in UTC ISO 8601 format | 2023-10-19T22:33:22.611Z |
SequenceId | Integer sequence number of the event | 1 |
TranscriptionEvent | The event type | transcription-started |
ProviderConfiguration | JSON string of the transcription provider | {\"profanityFilter\":\"true\",\"speechModel\":\"telephony\",\"enableAutomaticPunctuation\":\"true\",\"hints\":\"Alice Johnson, Bob Martin, ACME Corp, XYZ Enterprises, product demo, sales inquiry, customer feedback\"} |
TranscriptionEngine | The name of the transcription engine | google |
Name | Friendly name of the Real-Time Transcription session | session1 |
Track | The track being transcribed: inbound_track, outbound_track, or both_tracks | inbound_track |
InboundTrackLabel | Label associated with the inbound track | customer |
OutboundTrackLabel | Label associated with the outbound track | agent |
PartialResults | Whether partial results are enabled (true or false) | true |
LanguageCode | The language code for the transcription | en-US |
Example of a `transcription-started` event payload:
```json
{
  "TranscriptionSid": "GT8fbf72a043b98407a3ce68331cd0030a",
  "Timestamp": "2024-06-25T18:45:12.135751Z",
  "AccountSid": "ACXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
  "ProviderConfiguration": "{\"profanityFilter\":\"true\",\"speechModel\":\"telephony\",\"enableAutomaticPunctuation\":\"true\",\"hints\":\"Alice Johnson, Bob Martin, ACME Corp, XYZ Enterprises, product demo, sales inquiry, customer feedback\"}",
  "Name": "Chris Transcription",
  "OutboundTrackLabel": "agent",
  "LanguageCode": "en-US",
  "PartialResults": "false",
  "InboundTrackLabel": "customer",
  "TranscriptionEvent": "transcription-started",
  "CallSid": "CAXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
  "TranscriptionEngine": "google",
  "Track": "both_tracks",
  "SequenceId": "1"
}
```
When an individual utterance (partial or final) of audio is transcribed, Twilio sends an HTTP `POST` request to your `statusCallbackUrl` for the `transcription-content` event. This event provides `TranscriptionData` results for the transcribed audio.
Whether the payload reports stability or confidence depends on `partialResults`. If `partialResults` is `true`, the event payload includes the `Stability` property and omits `confidence`. If `partialResults` is `false`, the payload includes `confidence` and omits `Stability`. Refer to Google's documentation for more details on each of these properties.
These HTTP requests contain the properties listed below.
Property | Description | Example |
---|---|---|
AccountSid | Twilio Account SID | AC11b76cdc7d217e72a72be6422d46a7ca |
CallSid | Twilio Call SID | CA57af2620f427810cb4e430371e8d6e0f |
TranscriptionSid | Unique identifier for this Real-Time Transcription session | GT20dfa03c8cf8aa8d0c4aeccde5558b66 |
Timestamp | Time of the event, in UTC ISO 8601 format | 2023-10-19T22:33:22.611Z |
SequenceId | Integer sequence number of the event | 2 |
TranscriptionEvent | The event type | transcription-content |
LanguageCode | A BCP-47 standard language code (e.g. "en-US") | en-US |
Track | The track being transcribed: inbound_track or outbound_track | inbound_track |
TranscriptionData | JSON string containing transcription content. Note that TranscriptionData.Confidence is a decimal number. | {\"Transcript\":\"to be or not to be\",\"Confidence\":0.96823084} |
Stability | String representing an estimate of the likelihood that Google will not change its guess about this partial result transcript. This property is only provided when partialResults is true. | Range between 0.0 (unstable) and 1.0 (stable). Example: 0.8 |
Final | Boolean value indicating whether this event contains the final utterance (or partial utterance) | false |
Example of a `transcription-content` event payload when `partialResults` is `false`:
```json
{
  "LanguageCode": "en-US",
  "TranscriptionSid": "GT8fbf72a043b98407a3ce68331cd0030a",
  "TranscriptionEvent": "transcription-content",
  "CallSid": "CAXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
  "TranscriptionData": "{\"transcript\":\"Hello, this is Sam from Horizon Financial Services. Just letting you know this call may be recorded for quality purposes. How can I assist you today?\",\"confidence\":0.9956335}",
  "Timestamp": "2024-06-25T18:45:21.454203Z",
  "Final": "true",
  "AccountSid": "ACXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
  "Track": "outbound_track",
  "SequenceId": "2"
}
```
Example of a `transcription-content` event payload when `partialResults` is `true`:
```json
{
  "LanguageCode": "en-US",
  "TranscriptionSid": "GT6ebb54a123f0c86b70605a4925836f69",
  "Stability": "0.9",
  "TranscriptionEvent": "transcription-content",
  "CallSid": "CAXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
  "TranscriptionData": "{\"transcript\":\"Hello, this is Sam from Horizon Financial Services. Just letting you know this call may be recorded for\"}",
  "Timestamp": "2024-06-25T16:30:21.600697Z",
  "Final": "false",
  "AccountSid": "ACXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
  "Track": "outbound_track",
  "SequenceId": "70"
}
```
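Because `TranscriptionData` is itself a JSON string, a handler has to parse it a second time before reading the transcript. The following is a minimal sketch under a few assumptions: the callback parameters have already been parsed into a plain object of strings, and the helper name `readTranscriptionContent` is hypothetical. It accepts both the capitalized keys shown in the property table and the lowercase keys shown in the sample payloads, since the two differ.

```javascript
// Hypothetical helper: normalizes one transcription-content event.
function readTranscriptionContent(params) {
  // TranscriptionData is a JSON string nested inside the callback parameters.
  const data = JSON.parse(params.TranscriptionData);

  // Accept either key casing, since the table and the samples disagree.
  const transcript = data.transcript ?? data.Transcript ?? '';
  const confidence = data.confidence ?? data.Confidence;

  return {
    track: params.Track,
    sequenceId: Number(params.SequenceId),
    // "Final" arrives as the string "true" or "false" in the sample payloads.
    isFinal: params.Final === 'true',
    transcript,
    // Confidence is reported for final results, Stability for partial ones.
    confidence: confidence === undefined ? null : Number(confidence),
    stability: params.Stability === undefined ? null : Number(params.Stability),
  };
}

const partialEvent = {
  Track: 'outbound_track',
  SequenceId: '70',
  Final: 'false',
  Stability: '0.9',
  TranscriptionData: '{"transcript":"Hello, this is Sam"}',
};
console.log(readTranscriptionContent(partialEvent));
```

An application that only wants settled text could skip any result where `isFinal` is `false`, or apply its own threshold to `stability`.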
When a Real-Time Transcription session is stopped or ends, Twilio sends an HTTP `POST` request to your `statusCallbackUrl` for the `transcription-stopped` event. This event provides final details about the transcription session.
These HTTP requests contain the properties listed below.
Property | Description | Example |
---|---|---|
AccountSid | Twilio Account SID | AC11b76cdc7d217e72a72be6422d46a7ca |
CallSid | Twilio Call SID | CA57af2620f427810cb4e430371e8d6e0f |
TranscriptionSid | Unique identifier for this Real-Time Transcription session | GT20dfa03c8cf8aa8d0c4aeccde5558b66 |
Timestamp | Time of the event, in UTC ISO 8601 format | 2023-10-19T22:33:22.611Z |
SequenceId | Integer sequence number of the event | 3 |
TranscriptionEvent | The event type | transcription-stopped |
An example of the `transcription-stopped` event payload:
```json
{
  "TranscriptionSid": "GT8fbf72a043b98407a3ce68331cd0030a",
  "TranscriptionEvent": "transcription-stopped",
  "CallSid": "CAXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
  "Timestamp": "2024-06-25T18:45:23.839266Z",
  "AccountSid": "ACXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
  "SequenceId": "3"
}
```
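Every event carries a `SequenceId`, which an application can use to put buffered events back in order once the session has stopped. The sketch below assumes that callbacks may be processed out of order by the receiving application (this page does not state a delivery-order guarantee) and uses a hypothetical helper name, `orderBySequence`. Note that the sample payloads deliver `SequenceId` as a string, so it must be compared numerically.

```javascript
// Hypothetical helper: returns a new array sorted by numeric SequenceId.
function orderBySequence(events) {
  // Copy first so the caller's buffer is left untouched.
  return [...events].sort(
    (a, b) => Number(a.SequenceId) - Number(b.SequenceId)
  );
}

const buffered = [
  { SequenceId: '10', TranscriptionEvent: 'transcription-stopped' },
  { SequenceId: '2', TranscriptionEvent: 'transcription-content' },
  { SequenceId: '1', TranscriptionEvent: 'transcription-started' },
];
console.log(orderBySequence(buffered).map((e) => e.SequenceId).join(','));
```

A plain string sort would place "10" before "2", which is why the comparison converts each value with `Number` first.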
When an error occurs during a Real-Time Transcription session, Twilio sends an HTTP `POST` request to your `statusCallbackUrl` for the `transcription-error` event.
Real-Time Transcription errors are documented in the Error and Warning Dictionary, with codes ranging from 32650 to 32655. Errors are also viewable in the Twilio Console.
These HTTP requests contain the properties listed below.
Property | Description | Example |
---|---|---|
AccountSid | Twilio Account SID | ACXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX |
CallSid | Twilio Call SID | CAXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX |
TranscriptionSid | Unique identifier for this Real-Time Transcription session | GT20dfa03c8cf8aa8d0c4aeccde5558b66 |
Timestamp | Time of the event, in UTC ISO 8601 format | 2023-10-19T22:33:22.611Z |
SequenceId | Integer sequence number of the event | 3 |
TranscriptionEvent | The event type | transcription-error |
TranscriptionErrorCode | Error code | 32655 |
TranscriptionError | Error description | Provider Unavailable |
Example of a `transcription-error` event payload:
```json
{
  "TranscriptionSid": "GT20dfa03c8cf8aa8d0c4aeccde5558b66",
  "Timestamp": "2023-10-19T22:33:22.611Z",
  "AccountSid": "ACXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
  "SequenceId": "3",
  "TranscriptionEvent": "transcription-error",
  "CallSid": "CAXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
  "TranscriptionErrorCode": "32655",
  "TranscriptionError": "Provider Unavailable"
}
```
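An error handler can read `TranscriptionErrorCode` and `TranscriptionError` and check whether the code falls in the 32650 to 32655 range mentioned above. This is a rough sketch; the helper name `describeTranscriptionError` and the shape of its return value are assumptions made for illustration.

```javascript
// Hypothetical helper: summarizes a transcription-error event.
function describeTranscriptionError(params) {
  // The sample payload delivers the code as a string, so convert it.
  const code = Number(params.TranscriptionErrorCode);
  const inDocumentedRange =
    Number.isInteger(code) && code >= 32650 && code <= 32655;

  return {
    code,
    message: params.TranscriptionError || 'Unknown error',
    inDocumentedRange,
  };
}

console.log(
  describeTranscriptionError({
    TranscriptionErrorCode: '32655',
    TranscriptionError: 'Provider Unavailable',
  })
);
```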
The `languageCode` attribute specifies the language in which the transcription should be performed. It accepts a BCP-47 standard language code, such as `en-US` for American English. Setting it helps the transcription engine correctly understand and process the spoken language.
The following TwiML example sets the `languageCode` attribute to `es-MX` so that the call is transcribed as Mexican Spanish. This is particularly useful for calls in languages other than English.
```javascript
const VoiceResponse = require('twilio').twiml.VoiceResponse;

const response = new VoiceResponse();
const start = response.start();
start.transcription({statusCallbackUrl: 'https://example.com/your-callback-url', languageCode: 'es-MX'});

console.log(response.toString());
```
```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Start>
    <Transcription statusCallbackUrl="https://example.com/your-callback-url" languageCode="es-MX" />
  </Start>
</Response>
```
The `track` attribute specifies which audio track should be transcribed. It can take one of the following values: `inbound_track`, `outbound_track`, or `both_tracks`. This attribute is useful for determining whether to transcribe the audio coming from the caller, the callee, or both.
The following TwiML example demonstrates how to specify the `track` attribute for a transcription.
```javascript
const VoiceResponse = require('twilio').twiml.VoiceResponse;

const response = new VoiceResponse();
const start = response.start();
start.transcription({statusCallbackUrl: 'https://example.com/your-callback-url', track: 'inbound_track'});

console.log(response.toString());
```
```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Start>
    <Transcription statusCallbackUrl="https://example.com/your-callback-url" track="inbound_track" />
  </Start>
</Response>
```
The `inboundTrackLabel` attribute allows you to associate an alphanumeric label with the inbound track being transcribed. This can be useful for identifying and differentiating the inbound audio stream in the transcription results. Using labels helps to clearly identify who is speaking, especially in multi-party conversations or call center scenarios.
Refer to the Track labels section below to understand the importance of using labels.
```javascript
const VoiceResponse = require('twilio').twiml.VoiceResponse;

const response = new VoiceResponse();
const start = response.start();
start.transcription({statusCallbackUrl: 'https://example.com/your-callback-url', inboundTrackLabel: 'agent', outboundTrackLabel: 'customer'});

console.log(response.toString());
```
```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Start>
    <Transcription statusCallbackUrl="https://example.com/your-callback-url" inboundTrackLabel="agent" outboundTrackLabel="customer" />
  </Start>
</Response>
```
In an inbound call scenario, the call is initiated by the customer and received by the agent or business person. Here, the inbound audio track (agent's speech) is labeled for clarity in the transcription results.
```xml
<Response>
  <Start>
    <Transcription track="inbound_track" inboundTrackLabel="agent" />
  </Start>
</Response>
```
In this example, the inbound audio track is labeled as "agent". This is useful for scenarios like customer support calls, where distinguishing the agent's responses from the customer's speech is crucial for understanding the interaction.
In an outbound call scenario, the call is initiated by the agent or business person and received by the customer. Here, the inbound audio track (customer's speech) is labeled for clarity in the transcription results.
```xml
<Response>
  <Start>
    <Transcription track="inbound_track" inboundTrackLabel="customer" />
  </Start>
</Response>
```
In this example, the inbound audio track is labeled as "customer". This is useful for scenarios like sales calls, where distinguishing the customer's speech in the transcription can help in analyzing customer feedback and engagement.
The `outboundTrackLabel` attribute allows you to associate an alphanumeric label with the outbound track being transcribed. This can be useful for identifying and differentiating the outbound audio stream in the transcription results. Using labels helps to clearly identify who is speaking, especially in multi-party conversations or call center scenarios.
Refer to the Track labels section below to understand the importance of using labels.
```javascript
const VoiceResponse = require('twilio').twiml.VoiceResponse;

const response = new VoiceResponse();
const start = response.start();
start.transcription({statusCallbackUrl: 'https://example.com/your-callback-url', inboundTrackLabel: 'agent', outboundTrackLabel: 'customer'});

console.log(response.toString());
```
```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Start>
    <Transcription statusCallbackUrl="https://example.com/your-callback-url" inboundTrackLabel="agent" outboundTrackLabel="customer" />
  </Start>
</Response>
```
In an inbound call scenario, the call is initiated by the customer and received by the agent or business person. Here, the outbound audio track (customer's speech) is labeled for clarity in the transcription results.
```xml
<Response>
  <Start>
    <Transcription track="outbound_track" outboundTrackLabel="customer" />
  </Start>
</Response>
```
In this example, the outbound audio track is labeled as "customer". This is useful for scenarios like customer support calls, where distinguishing the customer's speech from the agent's responses is crucial for understanding the interaction.
In an outbound call scenario, the call is initiated by the agent or business person and received by the customer. Here, the outbound audio track (agent's speech) is labeled for clarity in the transcription results.
```xml
<Response>
  <Start>
    <Transcription track="outbound_track" outboundTrackLabel="agent" />
  </Start>
</Response>
```
In this example, the outbound audio track is labeled as "agent". This is useful for scenarios like sales calls, where distinguishing the agent's speech in the transcription can help in analyzing the effectiveness of the sales pitch.
The `transcriptionEngine` attribute allows you to specify the name of the speech-to-text transcription provider to be used. This can be useful for leveraging specific features or optimizations provided by different transcription engines.
```javascript
const VoiceResponse = require('twilio').twiml.VoiceResponse;

const response = new VoiceResponse();
const start = response.start();
start.transcription({statusCallbackUrl: 'https://example.com/your-callback-url', transcriptionEngine: 'google'});

console.log(response.toString());
```
```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Start>
    <Transcription statusCallbackUrl="https://example.com/your-callback-url" transcriptionEngine="google" />
  </Start>
</Response>
```
The `speechModel` attribute allows you to specify which speech model to use for the transcription. It maps to the Transcription Model in Google's terminology. Different speech models can optimize for different use cases, such as phone calls, video, or enhanced models for higher accuracy.
Refer to the Google documentation to understand each speech model's specific capabilities and configurations.
The `telephony` speech model is optimized for phone call audio and can provide better accuracy for this type of audio.
The `long` speech model is optimized for long-form audio, such as lectures or extended conversations, and can provide better accuracy for lengthy audio.
```javascript
const VoiceResponse = require('twilio').twiml.VoiceResponse;

const response = new VoiceResponse();
const start = response.start();
start.transcription({statusCallbackUrl: 'https://example.com/your-callback-url', speechModel: 'telephony', transcriptionEngine: 'google'});

console.log(response.toString());
```
```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Start>
    <Transcription statusCallbackUrl="https://example.com/your-callback-url" speechModel="telephony" transcriptionEngine="google" />
  </Start>
</Response>
```
The `profanityFilter` attribute maps directly to `profanityFilter` in Google's RecognitionFeatures object. It allows you to enable or disable the filtering of profane words in the transcription. When enabled, the transcription engine will attempt to mask or omit any detected profanities in the transcription results.
By default, the transcription engine enables the `profanityFilter` for all calls. The `medical_conversation` speech model doesn't support `profanityFilter`. When using the `medical_conversation` speech model, set the `profanityFilter` attribute to `false`.
The example below disables the profanity filter, so detected profanities are not masked or omitted in the transcription output. To mask or omit profane language instead, leave `profanityFilter` at its default value of `true`.
```javascript
const VoiceResponse = require('twilio').twiml.VoiceResponse;

const response = new VoiceResponse();
const start = response.start();
start.transcription({statusCallbackUrl: 'https://example.com/your-callback-url', profanityFilter: false, transcriptionEngine: 'google'});

console.log(response.toString());
```
```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Start>
    <Transcription statusCallbackUrl="https://example.com/your-callback-url" profanityFilter="false" transcriptionEngine="google" />
  </Start>
</Response>
```
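The constraint that the `medical_conversation` speech model requires `profanityFilter` to be `false` can be enforced in application code before the attributes are turned into TwiML. The sketch below is one possible approach; `applyModelConstraints` is a hypothetical helper, not a Twilio API, and it only covers the single constraint described in this section.

```javascript
// Hypothetical helper: returns a copy of the attributes with
// profanityFilter forced to false for the medical_conversation model.
function applyModelConstraints(attrs) {
  const result = { ...attrs };
  if (result.speechModel === 'medical_conversation') {
    // This speech model does not support the profanity filter.
    result.profanityFilter = false;
  }
  return result;
}

const attrs = applyModelConstraints({
  statusCallbackUrl: 'https://example.com/your-callback-url',
  speechModel: 'medical_conversation',
});
console.log(attrs.profanityFilter);
```

The returned object could then be passed to `start.transcription(...)` in the same way as the attribute objects in the examples on this page.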
The `partialResults` attribute allows you to enable or disable the delivery of interim transcription results. In Google's terminology, it maps to a StreamingRecognitionResult where `is_final` is `false`. When enabled, the transcription engine will send partial (interim) results as the transcription progresses, providing more immediate feedback before the final result is available.
The example below demonstrates how to enable partial results for the transcription.
```javascript
const VoiceResponse = require('twilio').twiml.VoiceResponse;

const response = new VoiceResponse();
const start = response.start();
start.transcription({statusCallbackUrl: 'https://example.com/your-callback-url', partialResults: true, transcriptionEngine: 'google'});

console.log(response.toString());
```
```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Start>
    <Transcription statusCallbackUrl="https://example.com/your-callback-url" partialResults="true" transcriptionEngine="google" />
  </Start>
</Response>
```
The `hints` attribute contains a list of words or phrases that the transcription provider can expect to encounter during a Real-Time Transcription. Using the `hints` attribute can improve the transcription provider's recognition of words or phrases you expect from your callers.
You may provide up to 500 words or phrases in this list, separating each entry with a comma. Your hints may be up to 100 characters each, and you should separate each word in a phrase with a space, for example:
```javascript
const VoiceResponse = require('twilio').twiml.VoiceResponse;

const response = new VoiceResponse();
const start = response.start();
start.transcription({statusCallbackUrl: 'https://example.com/your-callback-url', hints: 'Alice Johnson, Bob Martin, ACME Corp, XYZ Enterprises, product demo, sales inquiry, customer feedback'});

console.log(response.toString());
```
```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Start>
    <Transcription statusCallbackUrl="https://example.com/your-callback-url" hints="Alice Johnson, Bob Martin, ACME Corp, XYZ Enterprises, product demo, sales inquiry, customer feedback" />
  </Start>
</Response>
```
The `hints` attribute also supports Google's class token list to improve recognition. You can pass a class token directly in the `hints` attribute, as shown in the example below.
```javascript
const VoiceResponse = require('twilio').twiml.VoiceResponse;

const response = new VoiceResponse();
const start = response.start();
start.transcription({statusCallbackUrl: 'https://example.com/your-callback-url', hints: '$OOV_CLASS_ALPHANUMERIC_SEQUENCE'});

console.log(response.toString());
```
```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Start>
    <Transcription statusCallbackUrl="https://example.com/your-callback-url" hints="$OOV_CLASS_ALPHANUMERIC_SEQUENCE" />
  </Start>
</Response>
```
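The limits on `hints` described above (at most 500 entries, each at most 100 characters) can be checked before the value is sent. The following sketch is a hedged illustration: `validateHints` is a hypothetical helper, and it counts characters with JavaScript's string `length` (UTF-16 code units), which is an assumption about how the limit is measured.

```javascript
const MAX_HINTS = 500;
const MAX_HINT_LENGTH = 100;

// Hypothetical helper: splits a comma-separated hints string and
// reports any entries that exceed the limits stated on this page.
function validateHints(hints) {
  const entries = hints
    .split(',')
    .map((entry) => entry.trim())
    .filter((entry) => entry.length > 0);

  const problems = [];
  if (entries.length > MAX_HINTS) {
    problems.push(`too many hints: ${entries.length} (limit ${MAX_HINTS})`);
  }
  for (const entry of entries) {
    if (entry.length > MAX_HINT_LENGTH) {
      problems.push(`hint exceeds ${MAX_HINT_LENGTH} characters: ${entry.slice(0, 20)}...`);
    }
  }
  return { entries, problems };
}

const check = validateHints('Alice Johnson, Bob Martin, ACME Corp');
console.log(check.entries.length, check.problems.length);
```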
The `enableAutomaticPunctuation` attribute allows you to enable or disable automatic punctuation in the transcription. It maps to Automatic Punctuation in Google's terminology. When enabled, the transcription engine will automatically insert punctuation marks such as periods, commas, and question marks, improving the readability of the transcribed text.
```javascript
const VoiceResponse = require('twilio').twiml.VoiceResponse;

const response = new VoiceResponse();
const start = response.start();
start.transcription({statusCallbackUrl: 'https://example.com/your-callback-url', enableAutomaticPunctuation: true, transcriptionEngine: 'google'});

console.log(response.toString());
```
```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Start>
    <Transcription statusCallbackUrl="https://example.com/your-callback-url" enableAutomaticPunctuation="true" transcriptionEngine="google" />
  </Start>
</Response>
```
The `intelligenceService` attribute allows you to opt in to sending your Real-Time Transcript to Twilio Voice Intelligence for integrated post-processing. By enabling storage and analysis of calls transcribed in real time, this feature helps you extract actionable insights from transcripts. This runs in parallel to `statusCallbackUrl`, which streams utterance-level data and other session lifecycle events to your app during the call.
When enabled, this feature performs the following functions:
To use this feature, you need to meet the following conditions.
- `intelligenceService` parameter to the Voice Intelligence Service SID.

Important Notes:
- `languageCode` of the Real-Time Transcription session must match the configured language of the Voice Intelligence Service.
- `intelligenceService` parameter without passing a `statusCallbackUrl` parameter.
```javascript
const VoiceResponse = require('twilio').twiml.VoiceResponse;

const response = new VoiceResponse();
const start = response.start();
start.transcription({statusCallbackUrl: 'https://example.com/your-callback-url', intelligenceService: 'GAaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'});

console.log(response.toString());
```
```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Start>
    <Transcription statusCallbackUrl="https://example.com/your-callback-url" intelligenceService="GAaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa" />
  </Start>
</Response>
```
Twilio's transcription service supports a variety of languages and models. The examples provided below are specific to Google Speech-to-Text. Depending on the language, certain attributes like `speechModel`, `profanityFilter`, and `enableAutomaticPunctuation` may have different levels of support. For the most up-to-date and comprehensive information, refer to the Google Speech-to-Text Supported Languages documentation.
These examples are accurate as of June 2024 and are subject to change. Always refer back to the Google Speech-to-Text Supported Languages page for the most current information.
This example demonstrates how to configure transcription for Chinese (Simplified, China) using the Chirp Model with support for automatic punctuation.
```xml
<Response>
  <Start>
    <Transcription
      transcriptionEngine="google"
      languageCode="cmn-Hans-CN"
      speechModel="chirp"
      enableAutomaticPunctuation="true" />
  </Start>
</Response>
```
In this example, the `profanityFilter` attribute, `hints` attribute, and other advanced features are not supported for this configuration.
This example demonstrates how to configure transcription for Spanish (Spain) using the telephony model with support for automatic punctuation and the profanity filter.
```xml
<Response>
  <Start>
    <Transcription
      transcriptionEngine="google"
      languageCode="es-ES"
      speechModel="telephony"
      profanityFilter="true"
      enableAutomaticPunctuation="true" />
  </Start>
</Response>
```
In this example, the telephony model supports automatic punctuation and profanity filter, but not model adaptation (e.g., `hints`).
This example demonstrates how to configure transcription for Hindi (India) using the short model with support for specific attributes.
```xml
<Response>
  <Start>
    <Transcription
      transcriptionEngine="google"
      languageCode="hi-IN"
      speechModel="short"
      enableAutomaticPunctuation="true"
      profanityFilter="true"
      hints="संपर्क, सेवा, समर्थन, ग्राहक"
      modelAdaptation="true" />
  </Start>
</Response>
```
In this example, the short model supports automatic punctuation, profanity filter, model adaptation, and hints.
This example demonstrates how to configure transcription for French (Canada) using the long model with support for specific attributes.
```xml
<Response>
  <Start>
    <Transcription
      transcriptionEngine="google"
      languageCode="fr-CA"
      speechModel="long"
      hints="service à la clientèle, rendez-vous, commande" />
  </Start>
</Response>
```
In this example, the long model supports model adaptation through hints, but does not support automatic punctuation, profanity filter, or spoken punctuation.
If specifying `inboundTrackLabel` or `outboundTrackLabel`, the call direction mapping table below can be used as a guide.
Track | Call Direction | Call Resource Mapping | TrackLabel |
---|---|---|---|
Inbound-track | Outbound | TO # | Label for "who is being called" in an outbound call from Twilio (e.g., inboundTrackLabel="customer"). |
Outbound-track | Outbound | FROM # | Label for "who is calling" in an outbound call from Twilio (e.g., outboundTrackLabel="agent"). |
Inbound-track | Inbound | FROM # | Label for "who is being called" in an inbound call to Twilio (e.g., inboundTrackLabel="agent"). |
Outbound-track | Inbound | TO # | Label for "who is calling" in an inbound call to Twilio (e.g., outboundTrackLabel="customer"). |
Note: A call that has an "outbound" direction is a call that is outbound from Twilio, i.e., from Twilio to a customer.
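The example labels in the table above can be captured in a small lookup so that both labels are chosen together from the call direction. This sketch simply mirrors the table's example values; the helper name `exampleTrackLabels` and the direction keys `inbound` and `outbound` are assumptions made for illustration, and your own labels may differ.

```javascript
// Example labels copied from the call direction mapping table above.
// The keys "inbound" and "outbound" are illustrative, not Twilio values.
const LABEL_EXAMPLES = {
  outbound: { inbound_track: 'customer', outbound_track: 'agent' },
  inbound: { inbound_track: 'agent', outbound_track: 'customer' },
};

// Hypothetical helper: returns label attributes for a call direction.
function exampleTrackLabels(callDirection) {
  const labels = LABEL_EXAMPLES[callDirection];
  if (!labels) {
    throw new Error(`Unknown call direction: ${callDirection}`);
  }
  return {
    inboundTrackLabel: labels.inbound_track,
    outboundTrackLabel: labels.outbound_track,
  };
}

console.log(exampleTrackLabels('outbound'));
```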
If you provided a `name` attribute when starting a Real-Time Transcription session, you can stop a Real-Time Transcription using TwiML or via API.
Given a Real-Time Transcription that was started with the following TwiML instructions:
```xml
<Response>
  <Start>
    <Transcription name="Contact center transcription" />
  </Start>
</Response>
```
You can stop the Real-Time Transcription with the following TwiML instructions:
```javascript
const VoiceResponse = require('twilio').twiml.VoiceResponse;

const response = new VoiceResponse();
const stop = response.stop();
stop.transcription({name: 'Contact center transcription'});

console.log(response.toString());
```
```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Stop>
    <Transcription name="Contact center transcription" />
  </Stop>
</Response>
```
If a `name` was not provided, you can stop an in-progress Real-Time Transcription via API using the SID of the Transcription. See the RealtimeTranscription resource API reference page for more information.
Real-Time Transcriptions, including the `<Transcription>` TwiML noun and API, use third-party artificial intelligence and machine learning technologies.
Twilio's AI Nutrition Facts provide an overview of the AI feature you're using, so you can better understand how the AI is working with your data. Real-Time Transcriptions AI qualities are outlined in the following Speech to Text Transcriptions - Programmable Voice Nutrition Facts label. For more information and the glossary regarding the AI Nutrition Facts Label, please refer to Twilio's AI Nutrition Facts page.
Voice Intelligence and Programmable Voice only use the default Base Model provided by the Model Vendor. The Base Model is not trained using customer data.
Base Model is not trained using any customer data.
Transcriptions are deleted by the customer using the Voice Intelligence API or when a customer account is deprovisioned.
The customer views output in the Voice Intelligence API or Transcript Viewer.
Compliance
The customer can listen to the input (recording) and view the output (transcript).
The customer is responsible for human review.
Learn more about this label at nutrition-facts.ai