FAQs: Voice transcription

Voice transcription – How much does Extended Voice Transcription Services or Native Voice Transcription cost?

Extended Voice Transcription Services (EVTS) and Native Voice Transcription are billed on a per-minute basis.

Regardless of the transcription engine used, the cost for Voice Transcription is the same and is billed in the currency of your contract.

Depending on your organization’s Voice Transcription offer, a fair use allocation will be included. The following table shows the latest pricing for both EVTS and Native Voice Transcription:

Price per minute, by contract currency:

USD     CAD     AUD     NZD     GBP     EUR     BRL     JPY     ZAR
0.0100  0.0110  0.0130  0.0140  0.0070  0.0080  0.0400  1.2000  0.1420

As noted above, your organization will have a fair use allocation for Native Voice Transcription, EVTS, or both, depending on the offer available to you.
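As a rough illustration of per-minute billing, the following Python sketch multiplies the minutes that exceed a fair use allocation by the per-minute rate from the table above. The function name and the allocation figure in the usage note are hypothetical, and the sketch assumes no rounding of partial minutes; your actual allocation and invoice depend on your offer and contract.

```python
from decimal import Decimal

# Per-minute rates copied from the pricing table above.
RATE_PER_MINUTE = {
    "USD": Decimal("0.0100"),
    "CAD": Decimal("0.0110"),
    "AUD": Decimal("0.0130"),
    "NZD": Decimal("0.0140"),
    "GBP": Decimal("0.0070"),
    "EUR": Decimal("0.0080"),
    "BRL": Decimal("0.0400"),
    "JPY": Decimal("1.2000"),
    "ZAR": Decimal("0.1420"),
}


def estimate_transcription_charge(minutes_used, fair_use_minutes, currency):
    """Estimate the charge for minutes beyond a fair use allocation.

    fair_use_minutes is a hypothetical input; this page does not state
    the real allocation values.
    """
    if minutes_used < 0 or fair_use_minutes < 0:
        raise ValueError("minutes must not be negative")
    billable_minutes = max(0, minutes_used - fair_use_minutes)
    return billable_minutes * RATE_PER_MINUTE[currency]
```

For example, `estimate_transcription_charge(1500, 1000, "USD")` bills 500 minutes at 0.0100 USD each. Usage at or below the allocation yields a charge of zero.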

Voice transcription offers

There are two Voice Transcription offers in Genesys Cloud:

  • Voice Transcription (Native and Extended) consists of one SKU (this is the default offer):
    • GC-170-NV-VOICETRANSCRIPTION
  • Voice Transcription (Legacy) consists of the following SKUs (this offer is not available for new contracts):
    • GC-170-NV-VTFAIRUSEO
    • GC-170-NV-EVTS

An organization can only subscribe to one of these offers at a time, not both.

Fair use allocation

  • Under the Voice Transcription (Native and Extended) offer (GC-170-NV-VOICETRANSCRIPTION), EVTS and Native Voice Transcription share a combined fair use allocation. This ensures consistent global support for a wide range of dialects and languages. See the fair use policy for details about current allocations.

  • Under the Voice Transcription (Legacy) offer (GC-170-NV-VTFAIRUSEO and GC-170-NV-EVTS), EVTS does not include a fair use allocation and is billed from the first minute of use. Native Voice Transcription does include a fair use allocation, as defined in the fair use policy.
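The difference between the two offers can be sketched in a few lines of Python. This is only a model of the rules stated above: the offer labels, the function name, and the allocation passed in are hypothetical, not Genesys Cloud identifiers or published values.

```python
def billable_minutes(offer, native_minutes, evts_minutes, fair_use_minutes):
    """Return the minutes billed beyond fair use under each offer.

    offer is a hypothetical label: "native_and_extended" or "legacy".
    """
    if min(native_minutes, evts_minutes, fair_use_minutes) < 0:
        raise ValueError("minutes must not be negative")

    if offer == "native_and_extended":
        # Native and EVTS minutes draw on one combined allocation.
        total_minutes = native_minutes + evts_minutes
        return max(0, total_minutes - fair_use_minutes)

    if offer == "legacy":
        # EVTS is billed from the first minute; the allocation
        # applies to Native Voice Transcription only.
        return evts_minutes + max(0, native_minutes - fair_use_minutes)

    raise ValueError(f"unknown offer: {offer!r}")
```

With 600 native minutes, 600 EVTS minutes, and a hypothetical allocation of 1,000 minutes, the combined offer bills 200 minutes, while the legacy offer bills all 600 EVTS minutes.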

Note
  • To enable Extended Voice Transcription Services (EVTS), administrators may need to activate it through AppFoundry or Integrations, or request assistance from their CSM.
  • When using EVTS, transcribed users are not billed for Genesys Cloud CX 1 WEM Add-on II, Genesys Cloud CX 2 WEM Add-on I, or the Speech and Text Analytics Add-on, provided that Topic Spotting is not enabled for those interactions.

Voice transcription – How does Extended Voice Transcription Services – Azure provide customer data security?

Extended Voice Transcription Services streams media outside of Genesys Cloud to a third party to generate voice transcripts. Currently, these services are provided by Microsoft through its Azure Speech-to-Text offering.

Note: Genesys Cloud is transitioning the Extended Voice Transcription Services engine from Microsoft Azure to AWS Transcribe. Impacted organizations will receive advance notice before any changes.

As part of this combined offering, Genesys ensures data security in the following ways:
  • Azure Speech-to-Text does not store any audio or transcription data at rest. All data in transit is encrypted.
  • The media sent to Azure services is processed only in Azure’s server memory, and the third party stores no data at rest.
  • Once transcribed, all transcripts are encrypted and safely stored within Genesys Cloud.
  • All media sent to a third party is encrypted using TLS.
  • Transcripts created by Extended Voice Transcription and recorded interactions are stored by Genesys Cloud using the same type of encryption.

Voice transcription – What is the difference between Genesys Cloud Voice Transcription and Extended Voice Transcription Services?

Both Genesys Cloud Voice Transcription and Extended Voice Transcription Services (EVTS) can transcribe voice interactions. They differ in the following ways:

  • EVTS extends Genesys Cloud’s own native transcription.
  • EVTS uses third-party transcription services and may have different performance attributes.
  • EVTS can provide access to additional dialects and languages.
  • EVTS uses a non-customizable transcription model. Customization is available only with Genesys Cloud Voice Transcription.
  • Customers without a Genesys Cloud CX 3 license are billed for the WEM Add-on, in addition to EVTS charges, when Topic Spotting is used.

Note: During call segments, WEM voice transcription may use transcripts generated by Google Dialogflow.

    Voice transcription – Are interaction transcripts encrypted when stored in the cloud?

    Interaction transcripts are encrypted and safely stored to protect them from unauthorized access. Transcripts are encrypted with AES 256-bit encryption using organization-specific encryption keys.

    An organization may choose to make transcripts searchable as part of the content search feature. In this case, transcript information is indexed in the search cluster using a Genesys Cloud-wide encryption key, not an organization-specific encryption key.

    Note: Transcripts are stored in this manner, and remain searchable, for 35 days only. Organizations can opt in or out of having searchable transcript information.

    Licensing and costs – Are there any additional costs when voice transcription is enabled?

    Voice transcription is a speech and text analytics feature. It is included with the Genesys Cloud CX 1 WEM Add-on II, the Genesys Cloud CX 2 WEM Add-on I, and the Genesys Cloud CX 3 license. A fair use policy allows customers to use an allocated number of transcribed audio minutes, per Genesys Cloud user, per month, without incurring additional costs.

    Voice transcription – Best practices when setting up voice transcription

    To set up voice transcription, follow these recommended steps:

    1. Determine whether your organization benefits from transcribing all agent interactions or only specific lines of business.
    2. Enable voice transcription.
    3. Determine the best way to identify specific agents. Should you target specific queues, or should you create an Architect flow action?
    4. Create a program and set it as the default program.
    5. Assign the default program to the queues and/or flows that should have transcription enabled.

    Voice transcription – Can I download a voice transcript?

    You can export transcripts from one or more interactions using the speech and text analytics API.

    You can also copy a transcript manually from the Interaction Details page by clicking the Copy Transcript option in the top right corner of the transcript.

    Licensing and costs – Can voice transcription usage be monitored?

    Currently, voice transcription usage cannot be monitored externally. Genesys does measure voice transcription usage internally and contacts customers who are nearing their usage limit.

    Because usage data is not available externally, Genesys exercises leniency when a customer exceeds the allotted transcription usage quota and gives the customer the time needed to adjust usage.

    Voice transcription – What is the accuracy of voice transcription and how do I increase it by including brand names, acronyms or internal terminology?

    Genesys Cloud’s voice transcription accuracy is comparable to that of other leading providers and hyperscalers. Several factors can influence accuracy, including audio quality, speaker accents, background noise, and the complexity of the language.

    You can also help the system recognize business- or domain-specific terms, such as brand names, acronyms, or internal terminology.

    Voice transcription – Is voice transcription supported using third parties such as Amazon, Google, or Microsoft?

    Genesys Cloud uses its own native transcription engine and includes Extended Voice Transcription Services (EVTS) as an alternative to native voice transcription. The underlying provider for EVTS can be either Microsoft Azure Speech-to-Text or AWS Transcribe.

    EVTS provides customers with additional language support beyond the Genesys Cloud native transcription engine, and a choice between the engines when transcribing voice interactions.

    For other voice transcription providers such as Google, you must integrate using existing AudioHook and Transcription connector capabilities.

    Voice transcription – What is the expected latency and level of accuracy for voice transcription?

    Within Genesys Cloud, audio is transcribed in near real time, within seconds. The full interaction transcript becomes available in the Interaction Details UI shortly after the call ends, usually within 15 seconds.

    • Expected latency: approximately 3–5 seconds with this toggle enabled, compared to 35–40 seconds without it.
    • There is no additional cost for customers who use this feature.

    Voice transcription – What makes Genesys voice transcription unique and better than third parties?

    The language model used within the Genesys voice transcription capability is trained on contact center conversations.

    Because Genesys voice transcription focuses on contact center conversations, it is well suited to transcribing your conversations. As a result, it consistently produces more accurate transcriptions of contact center conversations than general-purpose transcription engines.

    The speech-to-text transcription model adapts and expands when phrases are added as part of a topic. This tailors the recognition engine to find and highlight actionable areas in the transcription, which facilitates targeted data search and retrieval.

    Voice transcription – When I play back a recording the transcript time and the audio are not synchronized. What should I do?

    Synchronization mismatches between the interaction player and the transcript occur when there is clock drift. To minimize clock drift, enable Network Time Protocol (NTP).

    Voice transcription – Why are there so many ellipses (…) in my voice transcriptions instead of words?

    The transcription confidence filter determines how often ellipses (…) appear in transcriptions.

    To reduce the number of ellipses in the transcript, lower the strictness level.

    A lower strictness level produces more words and fewer ellipses, but you can expect more transcription errors because words with a lower transcription confidence are included.
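The trade-off can be pictured with a small Python sketch. It is only a conceptual model, not how Genesys Cloud implements the filter: it assumes each word carries a confidence score on a 0 to 100 scale, that words scoring below the filter value are hidden, and that consecutive hidden words collapse into a single ellipsis. The function name and the sample scores are hypothetical.

```python
ELLIPSIS = "…"


def apply_confidence_filter(words, threshold):
    """Render a transcript, hiding words whose confidence is below threshold.

    words is a list of (text, confidence) pairs; the confidence scale
    of 0 to 100 is an assumption made for this sketch.
    """
    rendered = []
    for text, confidence in words:
        if confidence >= threshold:
            rendered.append(text)
        elif not rendered or rendered[-1] != ELLIPSIS:
            # Collapse a run of hidden words into one ellipsis.
            rendered.append(ELLIPSIS)
    return " ".join(rendered)


# Hypothetical word-level scores for one utterance.
sample = [("please", 90), ("hold", 85), ("for", 40), ("Acme", 25), ("support", 70)]

strict = apply_confidence_filter(sample, 60)   # "please hold … support"
lenient = apply_confidence_filter(sample, 20)  # "please hold for Acme support"
```

At the stricter setting the uncertain words are hidden; at the lower setting they appear, including any that the engine may have misrecognized.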

    To lower the strictness level you must have administrator privileges. 

    1. Click Admin and select Quality > Speech and Text Analytics.
    2. Click Menu > Conversation Intelligence.
    3. Click Speech and Text Analytics.
    4. Lower the Transcript Confidence Filter to 20.

    If the transcripts still contain many ellipses, repeat these steps and lower the Transcript Confidence Filter further.