Friday, April 21, 2006

Lexicon For Embedded Systems

I've been developing a lexicon for a speech engine that runs in embedded systems.

Some of the constraints of this system (in order of importance):

  1. Machine independence.
  2. ROMability: The whole file will be loaded as-is into memory and used with no alterations.
  3. Quick lookups.
  4. Small storage and RAM requirements.
  5. Handling of case and other language-specific issues.
  6. Portable implementation

I've managed to meet all of these requirements with almost no space wasted on alignment padding or redundancy. I estimate that 100,000 words with pronunciations come out to around 700KB, or roughly 7 bytes per entry.
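
To illustrate how the first few constraints can fit together, here is a minimal sketch in C. It is not the actual lexicon format, which the post does not describe. The image layout, the names `lexicon_image`, `read_u16`, and `lexicon_lookup`, and the sample pronunciations are all assumptions made up for this example. The image is a constant byte array that can sit in ROM unmodified, multi-byte values are read one byte at a time so neither endianness nor alignment matters, and lookups are a binary search that folds ASCII case.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical image layout (an assumption for this sketch):
 *   bytes 0..1     entry count N, little-endian
 *   next 2*N bytes offsets from the image start, little-endian,
 *                  ordered by word
 *   entries        lowercase word, NUL, pronunciation, NUL
 * The pronunciations below are placeholders, not real lexicon data. */
static const uint8_t lexicon_image[] = {
    3, 0,                                       /* entry count     */
    8, 0,  19, 0,  30, 0,                       /* entry offsets   */
    'c','a','t',0, 'K',' ','A','E',' ','T',0,   /* entry at 8      */
    'd','o','g',0, 'D',' ','A','O',' ','G',0,   /* entry at 19     */
    't','h','e',0, 'D','H',' ','A','H',0        /* entry at 30     */
};

/* Read a 16-bit little-endian value byte by byte, so the result does
 * not depend on the host's byte order or alignment rules. */
static uint16_t read_u16(const uint8_t *p)
{
    return (uint16_t)(p[0] | ((uint16_t)p[1] << 8));
}

/* Fold ASCII upper case to lower case; other bytes pass through. */
static int fold(int c)
{
    return (c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c;
}

/* Compare a query against a stored lowercase key, ignoring ASCII case
 * in the query. Returns <0, 0, or >0 like strcmp. */
static int compare_key(const char *query, const uint8_t *key)
{
    for (;; ++query, ++key) {
        int a = fold((unsigned char)*query);
        int b = *key;
        if (a != b)
            return a < b ? -1 : 1;
        if (a == 0)
            return 0;
    }
}

/* Binary search over the offset table. Returns a pointer into the
 * image at the pronunciation string, or NULL if the word is absent.
 * Nothing is copied or modified, so the image can stay in ROM. */
const char *lexicon_lookup(const uint8_t *image, const char *word)
{
    size_t lo = 0, hi = read_u16(image);

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        const uint8_t *entry = image + read_u16(image + 2 + 2 * mid);
        int cmp = compare_key(word, entry);

        if (cmp == 0)  /* skip the word and its NUL terminator */
            return (const char *)entry + strlen((const char *)entry) + 1;
        if (cmp < 0)
            hi = mid;
        else
            lo = mid + 1;
    }
    return NULL;
}
```

With this toy image, `lexicon_lookup(lexicon_image, "Dog")` returns a pointer to the string `D AO G` inside the array, and an unknown word returns `NULL`. A real format at about 7 bytes per entry would need far more compact encoding than plain NUL-terminated strings, and 16-bit offsets cannot address a 700KB image, so treat this only as a picture of the read-in-place idea.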
