Seven years ago, a modest research program aimed at developing the ability to capture, analyze, and recreate subvocal speech was initiated as part of NASA's Extension of the Human Senses program. The subvocal speech-recognition research, headed by Dr. Charles Jorgensen, was initially aimed at developing silent communication and speech augmentation in extremely noisy environments such as the space station. It soon became clear that the technology could have many other applications as well: it could enable bodyguards, security personnel, or Special Forces to communicate without detection during highly covert operations, and tank commanders to give orders even in noisy fighting conditions. The technology also has many civilian applications, enabling users to talk privately even in the company of others or in very noisy environments. Firefighters and other rescue personnel could use the technology in their daily routines (as this NASA video shows), as could people with vocal cord disorders. Finally, the technology could find its way into the gaming market as a way to send specific commands to team members in multiplayer games.
Subvocal speech is silent, or sub-auditory, speech, such as when a person silently reads or talks to himself. Even when reading or speaking to oneself, with or without actual lip or facial movement, biological signals arise. While using the NASA subvocal system, a person thinks of a phrase and talks to himself so quietly that it can't be heard; despite that, the tongue and vocal cords receive speech signals from the brain, which are detected and analyzed using small electrodes placed on the throat. Jorgensen created a neural net to analyze the electrical patterns recorded by the electrodes, and by 2004 he had reached a 99% recognition rate on a small number of words in addition to vowels and consonants. Jorgensen's goal is to interface his subvocal technology with existing speech-recognition systems, thus allowing full subvocal recognition. In addition, Jorgensen and his team are striving to improve the existing electrodes, transforming them into more advanced and comfortable sensors. Jorgensen foresees such advanced sensors embedded in clothing or in some sort of simple worn appliance, allowing the electrical signals to be picked up in a non-invasive, convenient, and comfortable way.
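To make that pipeline concrete, here is a minimal Python sketch of the general approach: windowed electrode samples are reduced to a few classic time-domain EMG features and fed to a small neural network that maps them to words. The sampling rate, window size, feature set, word list, and synthetic signals are all illustrative assumptions for the demo, not details of Jorgensen's actual system.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

def emg_features(window: np.ndarray) -> np.ndarray:
    """Classic time-domain EMG features computed over one analysis window."""
    mav = np.mean(np.abs(window))                  # mean absolute value
    rms = np.sqrt(np.mean(window ** 2))            # root mean square
    zc = np.sum(np.diff(np.sign(window)) != 0)     # zero-crossing count
    wl = np.sum(np.abs(np.diff(window)))           # waveform length
    return np.array([mav, rms, zc, wl])

# Synthetic stand-in for throat-electrode recordings: each "word" gets a
# different signal amplitude so the classes are separable for this demo.
WORDS = ["stop", "go", "left", "right"]
X, y = [], []
for label, amp in enumerate([0.5, 1.0, 1.5, 2.0]):
    for _ in range(40):                            # 40 repetitions per word
        window = amp * rng.standard_normal(200)    # one 100 ms window @ 2 kHz
        X.append(emg_features(window))
        y.append(label)

# A small feed-forward network, standing in for Jorgensen's neural net.
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
clf.fit(np.array(X), np.array(y))

test = emg_features(1.5 * rng.standard_normal(200))
print("recognized word:", WORDS[clf.predict([test])[0]])
```

A real system would of course use properly filtered multi-channel recordings, richer features or learned representations, and far more training data per word; the sketch only shows the signal-to-features-to-classifier structure the article describes.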
Q: When did the subvocal speech project begin and what was the initial motivation for it?
A: The Subvocal program at NASA started in 1999. It was part of a larger program called the Extension of the Human Senses. It was motivated by the communication problems that arise under pressurized breathing equipment and with alternate gas mixtures in space operations, and by high-noise environments such as extra-vehicular missions and space station operations.
Q: What is subvocal speech and how did you detect and translate it into normal speech?
A: Subvocal speech is the direct non-auditory interpretation of the nervous system signals sent to muscles of the vocal tract (e.g., electromyographic or EMG signals). It is measured by surface contact sensors and the electrical signals are transformed into patterns recognized by classifiers as word or word components.
Q: Is there a difference between “thinking in words” and subvocal speech? In other words, would you describe your device as a mind reading machine (even if a crude one at that)?
A: Yes, there is a difference. Subvocal speech requires some activation of the speech muscles. It is not in any way a mind reading machine. Subvocal speech requires active cooperation and intentional stimulation of the speech muscles. Hence it is voluntary and private.
Q: In 2004 your device was only able to recognize about ten words. What advancements have you made since then?
A: We are up to about 25 words and 38 vowels and consonants. We are communicating in real time in pressurized suits to live cell phones.
Q: What do you think could be the main application of such technology and how much computing power will be necessary to make it work effectively?
A: A silent cell phone, military operations, bioelectric device control, and handicapped communication. As for your second question, a small wearable PDA or PC will suffice; with custom chips it could be much smaller yet. Specialized sensors will also increase usability.
Q: How many sensors do you currently use in your tests, and how do you predict commercial applications of the technology will implement the sensors?
A: We currently use two sensors although, if needed, that number can easily be increased to detect specific speech articulator movements. It is just a cost question. We are also in late stage development of a non-contact capacitive sensor that would not require wires or the messy medical style Ag/AgCl sensors used in the lab.
Q: Does the software you developed need to learn the EMG signals of every user (like voice recognition software), and how long does it take to teach the software each word?
A: Right now it is user-specific; we are not at the level of speaker-independent recognition, much like the early stages of acoustic voice recognition. We can learn six to ten words in a morning, taking about one hour plus to acquire the signals and a half hour to train the recognition system.
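As a rough illustration of that workflow, the hypothetical enrollment loop below gathers labeled repetitions of each word and trains a fresh, user-specific model. The record_word() stub, feature function, and all parameters are assumptions for the sketch, not the team's actual tooling; in the lab, acquisition would take the hour-plus described above rather than generating synthetic windows.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(1)

def record_word(word: str, amp: float) -> np.ndarray:
    """Hypothetical acquisition stub: in practice this would prompt the user
    to subvocalize `word` and capture one EMG window from the electrodes."""
    return amp * rng.standard_normal(200)

def emg_features(w: np.ndarray) -> np.ndarray:
    """Two simple time-domain features per window, for illustration only."""
    return np.array([np.mean(np.abs(w)), np.sqrt(np.mean(w ** 2))])

def enroll_user(words: list[str], reps: int = 20) -> MLPClassifier:
    """Per-user enrollment: gather labeled repetitions, then train a model."""
    X, y = [], []
    for label, word in enumerate(words):
        amp = 0.5 * (label + 1)        # synthetic class separation for the demo
        for _ in range(reps):
            X.append(emg_features(record_word(word, amp)))
            y.append(label)
    clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
    clf.fit(np.array(X), np.array(y))  # the roughly half-hour training step
    return clf                         # a user-specific model, as in the lab

model = enroll_user(["stop", "go", "left", "right"])
```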
Q: Are you working on the composition of full sentences, and what are the current obstacles in the way of achieving this goal?
A: At this stage, only simple two- or three-word phrases. Our effort is small and is resource-constrained more than technically constrained. If we can recognize more vowels and consonants and connect quickly to existing speech recognition systems, the full-sentence issue becomes largely a solved problem.
Q: Have DARPA and the U.S. military been actively interested in the technology?
A: Yes, DARPA (the Defense Advanced Research Projects Agency) has recently begun a program using the technology after consulting with us. We are working on a couple of other military applications of interest to us through small-company subcontracts, but NASA is a civilian agency and that is not our main focus.
Q: Could you give some sort of time estimate for a commercial application of the technology?
A: It depends on private user communities and their interest. Gaming would probably go first, then more sophisticated users, as the technology is refined in the field. Two to four years is achievable. Much faster progress would be possible if major resources were devoted to technology applications, since we now have a pretty good handle on the scientific research questions and more of the hold-up is in engineering implementations.