--------------------------------------------------------------------
slf2dfa - HTK-to-julian grammar converter

						ver.1.0 (2006.10.x)
--------------------------------------------------------------------

WHAT'S THIS
============

This toolkit converts an HTK recognition grammar into Julian format.
A word network (SLF) will be converted to DFA format, and the words
in the SLF are extracted from the dictionary to be used in Julian.
Furthermore, word categories are automatically detected and defined
to optimize performance in Julian.


HOW TO USE
===========

First, you must specify some paths in "SLF2DFA.sh".  Two programs,
"dfa_determinize" and "dfa_minimize", are needed to run this toolkit.
They are distributed with Julius-3.5.3 and later.
Please set the path variables in SLF2DFA.sh to point to these programs.


Then, prepare the HTK recognition grammar files you want to convert,
named with the following suffixes:

     "foobar.slf"    - word network SLF file
     "foobar.htkdic" - dictionary file

Then, invoke "SLF2DFA.sh" to generate the equivalent Julian network
file "foobar.dfa" and Julian dictionary file "foobar.dict" like this:

    % ./SLF2DFA.sh foobar

It will output these files:

     "foobar.dfa"  - converted DFA file
     "foobar.dict" - converted dictionary file
     "foobar.term" - input label index

Other outputs whose names start with "_" are just temporary files, so
you can remove them manually:

    % rm -f _*


EXAMPLES
=========

The directories "test1" and "test2" contain example grammars:

   test1 - 4 digit (one, two, three, four) recognition
   test2 - a sample voice dialing grammar, the same as in HTKBook tutorial

You can test the tools like this:

    % mkdir tmp
    % cp test2/ex.grammar test2/ex.htkdic tmp
    % cd tmp
    % HParse ex.grammar ex.slf
    % ../SLF2DFA.sh ex
    % (compare the results with the original ones in test2/)
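
The final comparison step can also be scripted.  The following is only
a minimal sketch, not part of the toolkit: the function name
"compare_outputs" is hypothetical, and it assumes that the reference
files in test2/ use the same names as the generated ones (ex.dfa,
ex.dict, ex.term).

```python
import filecmp
from pathlib import Path


def compare_outputs(generated_dir, reference_dir, base="ex",
                    suffixes=(".dfa", ".dict", ".term")):
    """Return {filename: status} for each expected output file.

    The status is "identical", "differs", or "missing" (when either
    the generated file or the reference file does not exist).
    """
    report = {}
    for suffix in suffixes:
        name = base + suffix
        new = Path(generated_dir) / name
        ref = Path(reference_dir) / name
        if not new.is_file() or not ref.is_file():
            report[name] = "missing"
        elif filecmp.cmp(new, ref, shallow=False):  # compare contents
            report[name] = "identical"
        else:
            report[name] = "differs"
    return report


if __name__ == "__main__":
    # Assumed layout: run from the toolkit top directory after the
    # commands above, so results are in tmp/ and references in test2/.
    for name, status in compare_outputs("tmp", "test2").items():
        print(f"{name}: {status}")
```

A "differs" result does not necessarily mean the conversion failed;
it only says the files are not byte-for-byte equal, so a manual diff is
still the next step.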


LIMITATIONS
============

Currently this toolkit focuses only on converting a word network
generated by "HParse".  Other uses, such as re-scoring a word lattice
generated by a recognizer, are not tested.


CONTACT
============

julius-info at lists.sourceforge.jp, or see below:
http://julius.sourceforge.jp/
http://julius.sourceforge.jp/en/


WHAT HAPPENS INSIDE
====================

Below are the differences between HTK and Julian for grammar handling.

 HTK (default):
  - left-to-right
  - Moore type (output is assigned to the nodes) 
  - allows NULL (no output) nodes
  - only needed words in dictionary are loaded at run time
  - does not use word categories

 Julian:
  - right-to-left (since 2nd pass is reversed)
  - Mealy type (output is assigned to the arcs)
  - does not allow NULL or non-deterministic arcs
  - all words in dictionary are loaded at run time, even if they do
    not appear in grammar
  - word constraint can be handled by its "category" for efficiency

Thus, the conversion needs several steps.
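
To make the Moore/Mealy and direction differences concrete, here is a
small sketch on a toy network.  It is an illustration only and does
not reproduce the toolkit's Perl scripts or the SLF/DFA file formats:
the data layout, the function names, the "init" state, and the toy
words are all assumptions made for this example.  Words are moved from
nodes onto the arcs entering them, then every arc is flipped and the
start and end states are swapped.

```python
def moore_to_mealy(words, arcs, starts, finals):
    """Move each node's word onto the arcs that enter that node."""
    init = "init"  # fresh entry state (hypothetical name)
    mealy = [(init, words[s], s) for s in sorted(starts)]
    mealy += [(a, words[b], b) for a, b in arcs]
    return mealy, {init}, set(finals)


def reverse(mealy, starts, finals):
    """Flip every arc and swap the start and end state sets."""
    flipped = [(dst, w, src) for src, w, dst in mealy]
    return flipped, set(finals), set(starts)


def sentences(mealy, starts, finals, max_len):
    """Enumerate accepted word sequences of 1..max_len words."""
    found = set()
    frontier = [(s, ()) for s in starts]
    for _ in range(max_len):
        nxt = []
        for state, seq in frontier:
            for src, w, dst in mealy:
                if src == state:
                    nxt.append((dst, seq + (w,)))
                    if dst in finals:
                        found.add(seq + (w,))
        frontier = nxt
    return found


# Toy Moore network: a word on each node, arcs between node IDs.
words = {0: "DIAL", 1: "PHONE", 2: "CALL", 3: "ONE", 4: "JOHN"}
arcs = [(0, 3), (3, 3), (1, 4), (2, 4)]

mealy, m_starts, m_finals = moore_to_mealy(words, arcs, {0, 1, 2}, {3, 4})
rev, r_starts, r_finals = reverse(mealy, m_starts, m_finals)

fwd = sentences(mealy, m_starts, m_finals, 3)
bwd = sentences(rev, r_starts, r_finals, 3)

# (state, word) pairs that lead to more than one state in the reversed net.
targets = {}
for src, w, dst in rev:
    targets.setdefault((src, w), set()).add(dst)
ambiguous = {key for key, dsts in targets.items() if len(dsts) > 1}
```

In this toy case "bwd" holds exactly the sentences of "fwd" read
right to left, and "ambiguous" is non-empty: the reversed network has
several arcs with the same word leaving one state, which is the kind
of non-determinism that Julian does not allow.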

Below are the conversion steps.  Steps 1 to 7 perform the actual
conversion, and Steps 8 and 9 optimize the generated grammar for
efficiency.

1) "remove_null.pl" removes "!NULL" nodes in the input SLF.

2) "moore2mealy.pl" converts SLF from Moore-type to Mealy-type.  This
   will generate many non-deterministic arcs. The output is still in
   SLF format.

3) "slf2dfa.pl" converts the syntax from SLF to DFA.

4) "dfarev.pl" reverses the network direction and swaps the start and
   end nodes.

5) "dfa_determinize" determinizes the arcs.

6) "dfa_minimize" minimizes the whole network.

7) "dict2dict.pl" converts the HTK dictionary to Julian format,
   extracting only words contained in the word network.

8) "dfa_make_category.pl" checks the transition patterns and detects
   which sets of words can be treated as a category.  It adds the
   detected patterns as new categories to the grammar, and tokenizes
   the corresponding transition arcs with the new categories.

9) "dfa_clean.pl" will parse the grammar files to remove unused terminal
   symbols and words, and re-number the terminal IDs properly.
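
The category detection in Step 8 can be pictured with the sketch
below.  It is not the algorithm of "dfa_make_category.pl": it assumes
that a "transition pattern" means the set of (source, destination)
state pairs on which a word appears, and it groups words that share
exactly the same set.  The function name, the arc layout, and the word
"PLEASE" are hypothetical.

```python
from collections import defaultdict


def make_categories(arcs):
    """Group words whose arcs connect exactly the same state pairs.

    arcs is a list of (src, word, dst).  Returns (categories, new_arcs)
    where categories maps a category ID to its sorted word list and
    new_arcs is the sorted list of (src, category_id, dst).
    """
    # Collect, for every word, the state pairs it connects.
    pattern = defaultdict(set)
    for src, word, dst in arcs:
        pattern[word].add((src, dst))

    # Words with an identical pattern fall into the same group.
    groups = defaultdict(list)
    for word, pairs in pattern.items():
        groups[frozenset(pairs)].append(word)

    # Number the groups in a stable order and relabel the arcs.
    ordered = sorted(groups.items(), key=lambda item: sorted(item[1]))
    categories = {}
    new_arcs = set()
    for cid, (pairs, members) in enumerate(ordered):
        categories[cid] = sorted(members)
        new_arcs.update((src, cid, dst) for src, dst in pairs)
    return categories, sorted(new_arcs)


# Four digit words sharing one transition, followed by another word.
word_arcs = [
    (0, "ONE", 1), (0, "TWO", 1), (0, "THREE", 1), (0, "FOUR", 1),
    (1, "PLEASE", 2),
]
categories, category_arcs = make_categories(word_arcs)
```

Here the four digit words collapse into one category, so five word
arcs become two category arcs.  This is the sense in which handling
word constraints by category is more efficient.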
