What is Speech Recognition? (It’s Not What You Think!)

What is speech recognition?

What is speech recognition – simple topic, right?

Basically, yes! Speech recognition is pretty simple to understand. 

The simplest definition is: a software solution that can process human speech and respond appropriately. In an inbound call center it’s generally used to resolve queries and route calls. 

That’s not an earth-shattering revelation. So why do people get confused about what counts as speech recognition?

The short answer: there’s a lot of terminology in this field, and people use that terminology differently depending on their industry. 

Let’s clear up some of that confusion.

Speech recognition terminology

Each of these terms is part of the broad topic of ‘speech recognition’ – but not the whole thing. Think about it this way: software is one term that’s part of the topic ‘computing’… but it’s not the whole story. 

Conversational AI / IVR

IVR (interactive voice response) is the familiar menu-driven phone system. In the contact center setting, IVR solutions are what conversational, voice-based systems will replace or augment. Scrap ‘press 1 for sales’ in favor of conversational service. 

Natural Language Processing (NLP)

NLP turns human language, whether typed or transcribed from speech, into structured data that a computer can process. In a voice system, its job is to ‘hear’ and record what a person tells it.

Natural Language Understanding (NLU)

NLU is more sophisticated. It goes beyond what the speaker says and aims to comprehend what they mean – and therefore, what they might need. 

(Want more detail? Read ‘NLU and NLP – what’s the difference?’)

Speech-to-Text (STT)

This is the nuts and bolts of speech recognition. The system takes audio input and transcribes it to text in order to process and store it. 

Text-to-Speech (TTS)

Generally, a speech recognition system will also use speech as its output, i.e. you speak to it and it speaks back. TTS means the system isn’t reliant on pre-recorded audio. 

(Need to know more? Read ‘What’s TTS doing for contact centers?’)
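To see how these pieces fit together, here’s a rough sketch of a single call passing through STT, NLU, routing, and TTS. It’s a toy under loud assumptions: the intents, keywords, queue names, and function names are all made up for illustration, the ‘audio’ is just a string, and the keyword matching stands in for a real NLU model. No real speech engine is involved.

```python
from dataclasses import dataclass

# Hypothetical intents and the keywords that hint at them (illustrative only).
INTENT_KEYWORDS = {
    "billing": {"bill", "invoice", "charge", "payment"},
    "sales": {"buy", "upgrade", "purchase", "pricing"},
    "support": {"broken", "error", "help", "problem"},
}

# Hypothetical queue names a contact center might route to.
QUEUES = {
    "billing": "billing_team",
    "sales": "sales_team",
    "support": "tech_support",
}


@dataclass
class Turn:
    transcript: str  # what STT produced
    intent: str      # what NLU decided the caller needs
    queue: str       # where the call is routed
    reply: str       # what TTS would speak back


def speech_to_text(audio: str) -> str:
    """Stand-in for STT: the 'audio' here is already text, so just normalize it."""
    return audio.lower().strip()


def understand(transcript: str) -> str:
    """Toy NLU: choose the intent whose keywords overlap the transcript most."""
    for mark in "?.,!":
        transcript = transcript.replace(mark, " ")
    words = set(transcript.split())
    scores = {intent: len(words & kws) for intent, kws in INTENT_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


def text_to_speech(text: str) -> str:
    """Stand-in for TTS: a real system would synthesize audio from this text."""
    return text


def handle_call(audio: str) -> Turn:
    """Run one caller utterance through STT, NLU, routing, and TTS."""
    transcript = speech_to_text(audio)
    intent = understand(transcript)
    queue = QUEUES.get(intent, "general_agent")  # fall back to a human agent
    reply = text_to_speech(f"Okay, connecting you to {queue.replace('_', ' ')}.")
    return Turn(transcript, intent, queue, reply)


if __name__ == "__main__":
    print(handle_call("I have a problem with a charge on my bill"))
```

The point isn’t the keyword matching (real NLU is far more capable). It’s that each term above names one stage of the pipeline, which is why no single term covers the whole system.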

Voice recognition

Voice recognition usually refers to a system that responds *only* to a specific voice, often as a security feature. 

So you can probably see the issue! Speech recognition is a totally valid way to describe resources with these different features. But it’s also extremely broad when you consider how far and fast conversational systems are spreading. 

Think about it – Siri on your iPhone? That’s speech recognition! Conversational IVR in the contact center? That’s speech recognition! Voice-based search engines? Smart TVs? Your home Alexa? All speech recognition. But under the hood, they’re different systems. 


Is speech recognition a dated term?

One possible gripe with the term speech recognition: it might be a little old-fashioned.

This is something I’ve heard a few times, so here’s the thinking… 

Speech recognition systems have been shaping contact center services for a long time. Modern systems that use speech as their input / output are the product of decades of R&D in the field. Back in the 1950s, Bell Laboratories created the Audrey system, which could recognize digits spoken aloud. 

IBM followed that trick with Shoebox, which understood 16 words – nearly as good as a typical 2-year-old child! (I don’t know which 16 words they went with.)

There were major advancements in the 1980s followed by big pushes from Google’s voice search and Apple’s Siri in the new millennium. 

So what’s my point? Speech recognition was a target that technologists had – and pretty much nailed – years ago. 

The objectives for modern voice-based systems include:

  • Comprehension
  • Machine learning
  • Dynamic response
  • Data integration
  • Predictive analytics
  • Agent guidance

So yes, you can call these systems speech recognition. As a general term it’s fine, but it hardly covers what the underlying tech is for. 

Think about your smartphone for a similar example. The phone element isn’t really the point these days, right?

Most customers believe voice recognition in self-service will improve their experience

Why do businesses want voice-based tools?

Any tool that relies on speech recognition is far more complex than one that relies on, say, button pushing for input. So what’s the appeal for businesses?

It’s simple – they make life easier for customers. 

Speech is not how computers think, so it’s taken more than half a century to teach them.

But speech is how humans think; most of us pick it up within our first decade. 

The strong preference customers have for voice-based systems is obvious when you compare traditional and conversational IVR systems. Given the choice between navigating a maze of button prompts and simply stating their need, customers choose the latter. 

We’ve compiled some great use cases in ‘How Delta saves $5 million a year with conversational IVR’, but here are the highlights:

  • Call containment increased 5%
  • Misrouted calls dropped by 15%
  • Capture of caller intent reached 75%
  • AHT decreased 10%
  • Agent availability increased 25%

So it turns out that voice-based systems are in that rare Goldilocks zone. They’re something that customers badly want, and they save a lot of money. 

And that’s not something you see every day… 
