Friday, April 21, 2006

Large Vocabulary Continuous Speech Recognition

The company I work for, Conversay, has been working on a Large Vocabulary Continuous Speech Recognition (LVCSR) engine for embedded systems that is also speaker independent. We have chosen to call it ISSAC.

I tried ISSAC on a Pocket PC for the first time this week -- a Symbol MC50 to be exact. It ran at almost real-time speed! A little more work to get the speed up and we will have a 4MB system that runs well on ARM (and other) processors.
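For readers new to the term, "real-time speed" is usually expressed as a real-time factor: the time spent recognizing divided by the length of the audio. A value of 1.0 or less means the engine keeps up with the speaker. Here is a minimal sketch in Python; the function name and the timing numbers are made up for illustration and are not measurements from ISSAC.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by audio duration (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


# Hypothetical example: 5.0 s of speech that takes 5.5 s to recognize.
rtf = real_time_factor(processing_seconds=5.5, audio_seconds=5.0)
keeps_up = rtf <= 1.0  # False here: slightly slower than real time
```

By that measure, "almost real-time" would mean a factor a little above 1.0.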

Our current system, CASSI, is a continuous speech recognition engine, also speaker independent, designed for smaller tasks. CASSI is great for tasks with up to 1000 words (or more, depending on the structure of the task), but tradeoffs in its design make ISSAC a better choice for larger tasks.

Both use the same front end for signal processing. I've seen it work in more than 90 dB of noise with radios playing in the background. Actually, since we don't measure dB quite properly, I think we handle over 100 dB of noise. The people who have tested the system under those circumstances get headaches even with ear protection.
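To give a sense of what the gap between 90 dB and 100 dB means, here is a small Python sketch using the standard sound pressure level formula, 20 * log10(p / p_ref), with the usual 20 micropascal reference for sound in air. This is only an illustration of the decibel scale; it assumes the figures are sound pressure levels and says nothing about how our own measurements were taken. The function names are made up for the example.

```python
import math

P_REF = 20e-6  # reference pressure in pascals (20 micropascals, standard for air)


def spl_db(p_rms: float) -> float:
    """Sound pressure level in dB for an RMS pressure given in pascals."""
    return 20.0 * math.log10(p_rms / P_REF)


def pressure_from_spl(db: float) -> float:
    """RMS pressure in pascals for a sound pressure level given in dB."""
    return P_REF * 10.0 ** (db / 20.0)


# Going from 90 dB to 100 dB multiplies the sound pressure by about 3.16
# (the square root of 10) and the sound intensity by 10.
pressure_ratio = pressure_from_spl(100.0) / pressure_from_spl(90.0)
intensity_ratio = pressure_ratio ** 2
```

So a 10 dB difference is a tenfold change in noise intensity.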
