When using the estimators from IBM and Google for Speech to Text (STT), the standard input is an estimated number of minutes. For the most accurate estimate, should this be based on handled calls (talk + hold time), or should we expand it to include received calls and/or AWC, etc.?
In short, what data is sent to IBM or Google for transcription?
For Natural Language Understanding (NLU), once the audio is transcribed, does anyone know of an estimated character count per minute of talk time? I'm just trying to get an idea of how many NLUs to anticipate per call.
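
To make the question concrete, here is a rough sketch of the arithmetic I have in mind. Every number in it is a placeholder assumption rather than a vendor figure: the speaking rate (`words_per_minute`), the average word length including the trailing space (`chars_per_word`), and the size of one billable text block (`chars_per_unit`) are guesses, and the function names are made up for illustration. Whether hold time should count toward STT minutes is exactly the open question, so it is left as a switch.

```python
import math

# All defaults below are placeholder assumptions, not vendor figures.


def estimate_stt_minutes(calls, talk_sec, hold_sec=0.0, include_hold=True):
    """Total audio minutes for a number of calls, from average seconds per call."""
    per_call_sec = talk_sec + (hold_sec if include_hold else 0.0)
    return calls * per_call_sec / 60.0


def estimate_chars_per_call(talk_sec, words_per_minute=150, chars_per_word=6):
    """Rough transcript length in characters for one call's talk time."""
    return talk_sec / 60.0 * words_per_minute * chars_per_word


def estimate_nlu_units_per_call(chars, chars_per_unit=10_000, features=1):
    """Billable text blocks per call, assuming each block is rounded up
    and counted once per analysis feature requested."""
    return math.ceil(chars / chars_per_unit) * features


if __name__ == "__main__":
    # Hypothetical volumes: 1,000 calls, 240 s talk and 60 s hold on average.
    print(estimate_stt_minutes(1000, 240, 60))
    print(estimate_stt_minutes(1000, 240, 60, include_hold=False))
    chars = estimate_chars_per_call(240)
    print(chars, estimate_nlu_units_per_call(chars, features=3))
```

With these placeholder inputs, four minutes of talk time works out to about 3,600 characters per call, so the result is very sensitive to the characters-per-minute figure I am asking about.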