Estimating The Cost of Speech to Text and Natural Language Understanding Integration

Brian · February 22, 2021, 10:59pm

When using the IBM and Google estimators for Speech to Text (STT), the standard input is an estimated number of minutes. For the most accurate estimate, should this be based on handled calls (talk + hold times), or should we expand it to received calls and/or AWC, etc.?

In short, what data is sent to IBM or Google for transcription?

For the Natural Language Understanding (NLU), once the call is transcribed, does anyone know of an estimated character count per minute of talk time? Just trying to get an idea of how many NLU units to anticipate per call.

ivan.malyshkin · March 17, 2021, 12:15am

Hi Brian,
For transcription, only the talk time is actually sent. The exception is if you're using conversational IVR in the scenario; in that case it also includes the speech recognition in the IVR.

NLU is much more challenging to estimate: it is counted in NLU units (messages) multiplied by the number of features requested, such as bot, sentiment and others. It is still an open exercise for me. If anyone has ideas on how to calculate it, please let me know :))
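
A minimal sketch of the counting rule as described above (messages multiplied by features requested). The function name, the message count per call and the feature list are all made up for illustration; they are not taken from any IBM or Google API, and the real billing rules should be checked against the provider's pricing page.

```python
def estimate_nlu_units(calls, messages_per_call, features):
    """Rough unit count: calls x messages per call x features requested.

    All three inputs are assumptions you supply; nothing here queries a
    real service or reflects an official billing formula.
    """
    return calls * messages_per_call * len(features)


# Hypothetical numbers: 1,000 calls, 20 messages each, 2 features requested.
units = estimate_nlu_units(1_000, 20, ["sentiment", "bot"])
print(units)
```

The hard part remains the messages-per-call figure, which would have to come from your own transcripts.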

Derek · September 29, 2022, 5:32am

Hi Guys -

I see this is an old thread, but thought I could still be of some help.
I use 1000 characters per minute as a rough estimator. The average rate of speech in English is upward of 150 words per minute, with an average word length of 5 characters. That is 150 x 5 = 750 characters per minute, which leaves 250 characters to pad for spaces, punctuation, and hesitation markers (if present).

For the IBM NLU calculation, multiply the average length of the call in minutes by 1000. IBM bills based on a data unit, or NLU item. This is 10,000 characters at a cost of $0.003/NLU item. If I recall correctly, IBM does not separate the NLU item count between interactions, meaning they are cumulative over time. A 5-character word from a single interaction is not going to use up a data unit; other interactions get added to it until they make up a whole 10,000-character data unit. I don’t think Google does it this way.

NLU on IBM is pretty cheap compared to just about any of the other services (TTS, STT, Assistant).

The cost estimator for IBM is pretty straightforward. Out of the box, most people aren’t implementing a custom model instance or a custom classification to use IBM NLU for sentiment analysis.
An example estimate:
Call center volume is estimated at 600,000 calls a month with an average agent call duration of 5 minutes
600,000 x 5 = 3,000,000 minutes
3,000,000 x 1000 (characters per minute) = 3,000,000,000 characters
3,000,000,000 / 10,000 (NLU item) = 300,000 NLU items
300,000 x $0.003 (per NLU item) = $900.00 estimated cost per month, before any free-tier allowance

Also, your first 30,000 NLU items per month are included in the free tier for IBM, so you can always try it out to see whether it is something you might want to use on a paid plan. You can then see the kind of usage you are putting through it.
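
For anyone who wants to plug in their own volumes, here is a rough Python sketch of the estimate above. The constants are the figures quoted in this thread (1000 characters per minute, 10,000 characters per NLU item, $0.003 per item, 30,000 free items), not values pulled from current IBM pricing, so verify them before relying on the result. The function name and the `deduct_free_items` switch are my own; whether the free allowance is actually deducted on a paid plan is an assumption.

```python
import math

CHARS_PER_MINUTE = 1000        # rule of thumb from this thread
CHARS_PER_NLU_ITEM = 10_000    # data unit size quoted in this thread
PRICE_PER_NLU_ITEM = 0.003     # USD per item as quoted here; check current pricing
FREE_ITEMS_PER_MONTH = 30_000  # free allowance quoted here


def estimate_nlu_cost(calls_per_month, avg_minutes_per_call, deduct_free_items=False):
    """Estimate monthly NLU items and cost from call volume and duration."""
    minutes = calls_per_month * avg_minutes_per_call
    characters = minutes * CHARS_PER_MINUTE
    # Items accumulate across interactions, so round up once for the month.
    items = math.ceil(characters / CHARS_PER_NLU_ITEM)
    # Assumption: the free allowance simply reduces the billable item count.
    billable = max(items - FREE_ITEMS_PER_MONTH, 0) if deduct_free_items else items
    return {
        "minutes": minutes,
        "characters": characters,
        "nlu_items": items,
        "billable_items": billable,
        "cost_usd": round(billable * PRICE_PER_NLU_ITEM, 2),
    }


estimate = estimate_nlu_cost(600_000, 5)
print(estimate["nlu_items"], estimate["cost_usd"])
```

With the example inputs (600,000 calls at 5 minutes each) this gives 300,000 NLU items; passing `deduct_free_items=True` bills 270,000 of them instead.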
