Human-to-Machine Communication (HMC) enables machines to better understand speech for voice control and to use the human voice as an authentication factor for access control.
Voice control for HMC applications has been heralded as the next step toward a more natural user experience (UX) in today’s increasingly mobile and connected world. However, today’s Automatic Speech Recognition (ASR) solutions for HMC applications typically perform well only if words are spoken clearly and there is almost no background noise.
Machines cannot infer meaning as humans do when background noise periodically drowns out the speaker, and while speech-recognition software can be trained to understand accents and other speech patterns, it cannot be trained to ignore background noise. HMC solutions must be able to isolate the speaker’s voice from other voices in the background, as well as from other types of ambient noise. Acoustic microphones alone, however, do not provide enough directional acquisition capability to achieve this level of speaker isolation, even with multiple microphones and microphone arrays.
VocalZoom has innovated a better way, creating a new category of HMC sensor that augments the output of acoustic microphones. The output from the VocalZoom optical HMC sensor is associated exclusively with the speaker, resulting in highly accurate speech recognition. VocalZoom’s optical HMC sensor technology is poised to play a key role in significantly improving HMC performance in a wide variety of applications. The first and only solution of its kind, it adds an important new UX layer for next-generation system design, enabling systems to gather critical additional information exclusively about the user communicating with a device. This information comes from the optical data generated during speech as the facial skin vibrates around the mouth, lips, cheeks, neck and throat.
By focusing the small, low-power VocalZoom sensor on these areas, measuring the vibrations and converting the data to an audio signal, the speaker’s voice can be isolated. Using advanced interferometer technology, the sensor can deliver multiple types of functionality from this facial data and give voice-authentication and voice-control platforms a near-perfect reference signal with which to operate, regardless of noise levels.
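To make the reference-signal idea concrete, the sketch below shows one generic way such a speaker-only reference could be used to suppress background noise: build a time-frequency mask from the reference signal's spectral energy and apply it to the noisy microphone spectrum. This is a minimal, illustrative Wiener-style masking example in Python/NumPy under assumed signal shapes and frame sizes; it is not VocalZoom's actual processing pipeline, and the function name and parameters are hypothetical.

```python
import numpy as np

def isolate_speaker(mic, optical, frame=256, hop=128):
    """Illustrative speaker isolation using a speaker-only reference.

    mic     -- noisy acoustic microphone samples (1-D array)
    optical -- reference samples correlated only with the speaker,
               e.g. derived from optical vibration measurements
    Builds a Wiener-style spectral mask from the reference energy in
    each frame and applies it to the microphone spectrum, then
    reconstructs the signal by overlap-add.
    """
    win = np.hanning(frame)
    out = np.zeros(len(mic))
    norm = np.zeros(len(mic))
    eps = 1e-12
    for start in range(0, len(mic) - frame + 1, hop):
        m = np.fft.rfft(mic[start:start + frame] * win)
        r = np.fft.rfft(optical[start:start + frame] * win)
        # Mask ~ speech energy / (speech + estimated noise energy):
        # near 1 where the reference is strong, near 0 elsewhere.
        noise_est = np.maximum(np.abs(m) ** 2 - np.abs(r) ** 2, 0.0)
        mask = np.abs(r) ** 2 / (np.abs(r) ** 2 + noise_est + eps)
        seg = np.fft.irfft(m * mask, n=frame) * win
        out[start:start + frame] += seg
        norm[start:start + frame] += win ** 2
    return out / np.maximum(norm, eps)
```

Because the reference carries essentially no background-noise energy, frequency bins dominated by noise receive a mask near zero, while bins dominated by the speaker's voice pass through nearly unchanged.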
[Figure: microphone signal with background noise vs. the speaker’s voice isolated]