Speech Recognition – A Game Changer

Summary
Speech recognition technology has come a long way, but accessibility laws have yet to catch up.

Over the years, speech recognition technology has been a boon to people with disabilities, especially in the sphere of accessibility. Speech recognition is the underlying technology that powers voice assistants: it enables computers to understand human speech and perform tasks based on it. Such technology is a game changer for people with disabilities, especially those with mobility, visual, and cognitive disabilities. The most common voice assistants driven by speech recognition technology are Apple Siri, Google Assistant, and Amazon Alexa (the assistant built into Amazon Echo devices). While voice control may be inaccessible to people with speech disabilities and to some people with hearing disabilities, speech recognition can still help people with hearing disabilities by transcribing the spoken word into text through speech-to-text technology.

History of Speech Recognition Technology

Let us examine the history of speech recognition technology. The first instance of what we call speech-to-text technology might be “Audrey,” a system designed by Bell Laboratories in the 1950s. Accurate speech-to-text has been pursued by many since 1962, when IBM premiered its “Shoebox,” which was able to recognize and differentiate between 16 words. Nine years later, in the early 1970s, the Department of Defense began to recognize the value of speech recognition technology. From there, the technology evolved, but systems remained limited by the processing power and memory available at the time. Accents were a major obstacle: products could not cope with unfamiliar accents or dialects, so the technology was not viable for travelers around the globe. Only recently, with advancing technology, have speech recognition products been localized and globalized.

Speech Recognition Technology Today

Recently, Google developed Live Caption and Live Transcribe as part of its drive towards accessible technology. Live Transcribe was developed to assist Deaf and hard of hearing individuals, in collaboration with Gallaudet University, a university in Washington, D.C., that was founded for Deaf people in 1864. In May 2019, Google introduced Live Caption, which was initially available only on Google’s Pixel smartphones; the company aimed to make the software more widely available in 2020. Apparently, Google, YouTube, Live Transcribe, and Live Caption share a database of possible utterances, including the accents and nuances of spoken language, and match incoming speech against this human-generated “speech” to create transcribed text almost instantaneously. A competing voice recognition database could very well be the one behind Apple’s Siri, a voice-controlled personal assistant that has been around for several years.

It is worth mentioning here that YouTube is owned by Google. YouTube reports that 1 billion of its videos are captioned and that it has boosted caption accuracy by about 50% since the inception of automated closed captioning in 2009. Together with its subsidiaries, Google probably has the largest voice recognition database in the world, from which it can create more accurate text for its auto-generated captions. Such a voice recognition database is necessary to differentiate and recognize words in spite of widely varying accents and speaking styles.

Twenty-First Century Communications and Video Accessibility Act

Now, it is worth examining what legislation governs the accessibility made possible by speech recognition technology. Foremost among such legislation may very well be the Twenty-First Century Communications and Video Accessibility Act (CVAA), which was signed into law by President Barack Obama on October 8, 2010. The CVAA is divided into two parts: 1) Title I – Communications Access; and 2) Title II – Video Programming. Together, these two titles regulate the two main spheres of communications and video accessibility.

For people with disabilities, particularly those with hearing disabilities, the main points of the CVAA are its reaffirmation of accessible communications services and equipment and its provisions for video programming. The CVAA restored the video description rules promulgated by the Federal Communications Commission (FCC) in 2000 and authorized some expansion of those obligations over the next 10+ years. The CVAA also requires video programming that is closed captioned on TV to be closed captioned when distributed on the Internet (this does not cover programs shown only on the Internet).

An interesting aspect of the CVAA is that it makes no mention of how American Sign Language (ASL) interpretation or Video Relay Service (VRS) should apply to telecommunications and video dissemination by the government or private entities, for example during emergency announcements on television and in internet video programming. ASL interpretation is an equal access accommodation covered by the Effective Communication requirement of the Americans with Disabilities Act (ADA).

Americans with Disabilities Act

Because the ADA was signed into law in 1990 and amended in 2008, it is unsurprising that it does not account for modern technology, most particularly the internet, which is now a staple of our lives. Of interest is Title IV of the ADA, which covers telecommunications. This title established nationwide telecommunications relay services to increase the accessibility of the telephone system in the United States, and it directed the Federal Communications Commission (FCC) to ensure that interstate and intrastate telecommunications relay services are available, to the extent possible and in the most efficient manner, to people with speech and hearing disabilities.

With ever-increasing demand for accessible technology from individuals with disabilities, it is possible that a future amendment to Title IV of the ADA will incorporate language covering accessible technologies.