# Language Tools

* Language concepts

pocketsphinx handles several models for your language:
- an acoustic model: from sounds to phonemes (the -hmm option)
- a dictionary: from phonemes to words (the -dict option)
- a grammar: from possible words to expected sentences (the -lm option)
- a keyphrase: an exact sentence to search for (the -keyphrase option)
- keyword spotting: a set of words to search for (the -kws option)

Note: lm, keyphrase and kws are different search modes, i.e. different ways of describing what to recognize. The pocketsphinx application cannot use more than one of them at the same time, although this should be possible with the programming API:
http://cmusphinx.sourceforge.net/wiki/tutorialpocketsphinx#searches
http://cmusphinx.sourceforge.net/wiki/faq#qcan_pocketsphinx_reject_out-of-grammar_words_and_noises
http://sourceforge.net/p/cmusphinx/discussion/help/thread/df599784/
http://eprints.qut.edu.au/37254/1/Albert_Thambiratnam_Thesis.pdf

Other:
http://kaldi.sourceforge.net/about.html

* build_grammar.sh

It builds the set of sentences for our language into a file "corpus.txt".
Then it performs the required steps to get an ARPA lm-grammar "corpus.lm".
Finally it filters a reference dictionary (a French one in our case), keeping only the words of our sentences, and writes the result to "corpus.dic".
Note that the other generated files (".idngram", ".vocab") are not used afterwards.
It also builds "keyword.{txt,lm,dic}" for a single keyword.

Documentation:
http://cmusphinx.sourceforge.net/wiki/tutoriallm

* corpus2wav.sh []

It reads a text file. For each line (hopefully a sentence), it uses the voxygen.fr online TTS and records the result in a WAV file.
Use it on the corpus of sentences of your language, and also on a corpus of wrong sentences.
You will get a set of recordings for testing.

* Dependencies

- This requires the cmuclmtk tools:
  http://cmusphinx.sourceforge.net/wiki/cmuclmtkdevelopment
- The reference dictionary I use is for the French language: frenchWords62K.dic
  http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/French%20Language%20Model/
  Edit build_grammar.sh to change it.
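To make the "one search mode at a time" note from the language concepts concrete, here is a minimal sketch that builds a `pocketsphinx_continuous` command line with exactly one of `-lm`, `-keyphrase` or `-kws`. It assumes the `pocketsphinx_continuous` program of the releases described by the CMUSphinx wiki; the helper names, the acoustic model directory `fr` and the file names are hypothetical placeholders. The second helper formats a `-kws` list file, one keyphrase per line followed by its detection threshold between slashes.

```python
"""Hypothetical helpers: build a pocketsphinx_continuous argv for one search mode."""
from typing import Dict, List, Optional


def pocketsphinx_args(
    hmm: str,
    dict_path: str,
    wav: str,
    lm: Optional[str] = None,
    keyphrase: Optional[str] = None,
    kws: Optional[str] = None,
) -> List[str]:
    """Return the argv list; raise ValueError unless exactly one search is set."""
    searches = {"-lm": lm, "-keyphrase": keyphrase, "-kws": kws}
    chosen = [(flag, value) for flag, value in searches.items() if value is not None]
    if len(chosen) != 1:
        raise ValueError("pass exactly one of lm, keyphrase or kws")
    flag, value = chosen[0]
    return [
        "pocketsphinx_continuous",
        "-hmm", hmm,          # acoustic model directory
        "-dict", dict_path,   # pronunciation dictionary
        flag, value,          # the single search mode
        "-infile", wav,       # decode a recorded file instead of the microphone
    ]


def kws_file_lines(thresholds: Dict[str, str]) -> List[str]:
    """Format a -kws list: one keyphrase per line, followed by /threshold/."""
    return [f"{phrase} /{threshold}/" for phrase, threshold in thresholds.items()]


if __name__ == "__main__":
    # "fr" is a placeholder for wherever the French acoustic model is unpacked.
    print(" ".join(pocketsphinx_args("fr", "corpus.dic", "test.wav", lm="corpus.lm")))
    print("\n".join(kws_file_lines({"bonjour maison": "1e-20"})))
```

Raising an error instead of silently picking one search mirrors the limitation of the command-line application; with the programming API several named searches can be registered and switched between.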
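The language model step of build_grammar.sh can be pictured with the following sketch. It is not the script itself: it assumes the cmuclmtk binaries are on the PATH and uses the command sequence from the CMUSphinx language model tutorial, which also explains the ".vocab" and ".idngram" side files mentioned above. The function names are hypothetical.

```python
"""Sketch of the cmuclmtk pipeline turning corpus.txt into an ARPA corpus.lm."""
import subprocess
from typing import Iterable, List


def wrap_sentences(sentences: Iterable[str]) -> List[str]:
    """Add the <s> ... </s> sentence markers and drop blank lines."""
    return [f"<s> {s.strip()} </s>" for s in sentences if s.strip()]


def lm_commands(base: str = "corpus") -> List[str]:
    """Return, in order, the shell commands producing <base>.lm."""
    return [
        # word frequencies -> vocabulary file
        f"text2wfreq < {base}.txt | wfreq2vocab > {base}.vocab",
        # corpus -> id n-gram file, using that vocabulary
        f"text2idngram -vocab {base}.vocab -idngram {base}.idngram < {base}.txt",
        # id n-grams -> ARPA language model
        f"idngram2lm -vocab_type 0 -idngram {base}.idngram "
        f"-vocab {base}.vocab -arpa {base}.lm",
    ]


def build_lm(sentences: Iterable[str], base: str = "corpus") -> None:
    """Write <base>.txt, then run each command, stopping at the first failure."""
    with open(f"{base}.txt", "w", encoding="utf-8") as out:
        out.write("\n".join(wrap_sentences(sentences)) + "\n")
    for cmd in lm_commands(base):
        subprocess.run(cmd, shell=True, check=True)
```

Keeping the command list separate from `build_lm` makes it easy to print or inspect the steps without the toolkit installed.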
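The dictionary filtering done by build_grammar.sh (reference dictionary reduced to the words of our sentences) could look like this minimal sketch. It assumes a CMU-style dictionary with one "word PHONE PHONE ..." entry per line and alternate pronunciations written as "word(2) ..."; the function names are hypothetical. It also reports corpus words missing from the reference dictionary, since those cannot be recognized.

```python
"""Sketch: keep only the dictionary entries whose word occurs in the corpus."""
import re
from typing import Iterable, List, Set

VARIANT = re.compile(r"\(\d+\)$")  # the "(2)" suffix of alternate pronunciations


def corpus_vocabulary(sentences: Iterable[str]) -> Set[str]:
    """Collect the words of the corpus, ignoring <s> and </s> markers."""
    words: Set[str] = set()
    for line in sentences:
        words.update(w for w in line.split() if w not in ("<s>", "</s>"))
    return words


def filter_dictionary(dic_lines: Iterable[str], vocab: Set[str]) -> List[str]:
    """Keep every entry (including variants) whose headword is in vocab."""
    kept = []
    for line in dic_lines:
        fields = line.split()
        if fields and VARIANT.sub("", fields[0]) in vocab:
            kept.append(line.rstrip("\n"))
    return kept


def missing_words(kept: Iterable[str], vocab: Set[str]) -> Set[str]:
    """Words of the corpus that got no pronunciation at all."""
    found = {VARIANT.sub("", line.split()[0]) for line in kept}
    return vocab - found
```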
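The loop of corpus2wav.sh (one WAV file per sentence) can be sketched as below. The voxygen.fr request is deliberately not reproduced: `synthesize` is a hypothetical callable returning raw 16-bit mono PCM, and the 16 kHz sample rate is an assumption chosen because the usual pocketsphinx acoustic models expect 16 kHz audio. Returning each file together with its sentence keeps the reference text next to the recording for later testing.

```python
"""Sketch of the corpus2wav.sh loop with a caller-supplied (hypothetical) TTS."""
import tempfile
import wave
from pathlib import Path
from typing import Callable, List, Tuple

SAMPLE_RATE = 16000  # assumption: 16 kHz mono, as pocketsphinx models usually expect


def corpus_to_wav(corpus: Path, out_dir: Path,
                  synthesize: Callable[[str], bytes]) -> List[Tuple[Path, str]]:
    """Write one numbered WAV per non-empty line; return (file, sentence) pairs."""
    out_dir.mkdir(parents=True, exist_ok=True)
    lines = [l.strip() for l in corpus.read_text(encoding="utf-8").splitlines()]
    written = []
    for index, sentence in enumerate(s for s in lines if s):
        path = out_dir / f"{index:04d}.wav"
        with wave.open(str(path), "wb") as wav:
            wav.setnchannels(1)        # mono
            wav.setsampwidth(2)        # 16-bit samples
            wav.setframerate(SAMPLE_RATE)
            wav.writeframes(synthesize(sentence))
        written.append((path, sentence))
    return written


if __name__ == "__main__":
    # Demo with a fake TTS producing 10 ms of silence per sentence.
    work = Path(tempfile.mkdtemp())
    (work / "corpus.txt").write_text("allume la lumière\n", encoding="utf-8")
    print(corpus_to_wav(work / "corpus.txt", work / "wav", lambda s: b"\x00\x00" * 160))
```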