Twitter Developers Get Speech Recognition Boost From Nuance

This article is more than 8 years old.

Speech recognition is getting better all the time, but of course it is still not perfect. Watch the live, simultaneously produced captions on any television broadcast for the amusing misspellings and mistakes. As consumer bystanders we have had some idea of how speech technology might develop ever since we saw the lead movie character talking to the HAL 9000 sentient computer system in 2001: A Space Odyssey... and that was all the way back in 1968.

Where speech recognition still struggles is in handling what linguistics calls homophones. These are groups of words that share the same pronunciation but have different spellings and different core meanings.

Homophones are, for example: there, their and they're.
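The distinction matters because a recognizer hears one sound sequence but must choose among several spellings, and only the surrounding context (a language model) can break the tie. A minimal Python sketch illustrates the collision; the phoneme transcriptions are illustrative assumptions, not drawn from any real pronunciation lexicon:

```python
# Illustrative (made-up) phoneme transcriptions: three spellings, one sound.
PRONUNCIATIONS = {
    "there": "DH EH R",
    "their": "DH EH R",
    "they're": "DH EH R",
    "there's": "DH EH R Z",
}

def candidates(phonemes):
    """Return every spelling consistent with a heard phoneme string."""
    return sorted(w for w, p in PRONUNCIATIONS.items() if p == phonemes)

print(candidates("DH EH R"))  # → ['their', 'there', "they're"]
```

The acoustic signal alone cannot distinguish the three; a real recognizer scores each candidate against the words around it and picks the most probable spelling.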

Arguably one of the better-known players in this space is Boston-headquartered Nuance. The firm has been detailing its work to handle homophones across the 51 languages that its technology supports.

So what happens next in speech recognition? We know that, programmatically speaking, algorithmic advances have pushed speech recognition accuracy forward by an order of magnitude. We also know that NLU (natural language understanding) innovations (just look at IBM Watson) have progressed significantly.

The problem companies like Nuance face is that when we are simply 'home consumers', so to speak, we are typically quite forgiving of speech technology. But (and it's a big but) as speech now starts to enter fields such as healthcare, finance and perhaps even industries like aviation, more exacting application-specific use cases demand extremely high levels of accuracy.

Nuance Communications says it is working to address all these challenges and move us towards a more Star Trek-like experience in our general use of speech technology. The firm's latest work sees its Nuance SpeechKit SDK (software development kit) reach a point where it will soon be available as part of Twitter's own 'Fabric' developer platform. The intention here is for developers to be able to voice-enable (and text-to-speech enable) apps and services.

TECHNICAL NOTE: This is not speech recognition on Twitter itself, although that is possible with one extra step, by using Nuance's Dragon NaturallySpeaking technology and/or the degree of speech recognition that ships automatically with Apple OS X or indeed Windows. Rather, this is Nuance placing its speech recognition technology in the hands of more developers on the Twitter platform.

Why is this a good thing?

This is still a good thing for the international speech recognition community: there are hundreds of thousands of Twitter-focused software application developers around the world, so disseminating the core technology in this way should, in theory at least, make for a better speech technology world.

“Giving developers access to the best tools and services as part of one simple platform is what makes Twitter Fabric incredibly unique in a rapidly growing developer ecosystem. By integrating Nuance’s SpeechKit SDK, we’re giving developers the opportunity to easily design and create voice-enabled experiences that power a wide range of apps and services,” said Rich Paret, GM Developer Platform, Twitter.

Nuance's SpeechKit lets developers create specialized voice experiences for their apps, with the ability to build customized language models in more than 40 languages and choose from more than 80 distinct voices for text-to-speech. The firm claims that by integrating Nuance SpeechKit into Fabric, developers can have a fully voice-enabled interface in a matter of minutes.

“Fabric’s diverse developer community creates an exciting opportunity for us to bring voice and conversational experiences to a wide range of apps and services – apps for social networking, IoT, gaming, entertainment, music and much more,” said Mike Thompson, executive vice president and general manager, Nuance Mobile.  “In an era where the interface is shrinking while becoming more intelligent, Nuance’s voice technology instantly gives developers an opportunity to simplify and customize that experience with an incredibly natural, intuitive and powerful connection.”

Nuance’s portfolio of voice, touch and natural language understanding technology is used to create ‘humanized interactions’ with phones, tablets, computers, cars, TVs, apps and inside services from manufacturers and operators.

In five years' time you'll be able to have this article read out to you and post a response without using your keyboard. Okay, you can actually mostly do that now, but most of us don't; and that's exactly the point.
