JULIAN(1)                                               JULIAN(1)



NAME
       Julian - open source grammar-based continuous speech
       recognition parser

SYNOPSIS
       julian [-C jconffile] [options ...]

DESCRIPTION
       Julian is a multi-purpose speech recognition parser based
       on finite state grammars.  It is a variant of Julius and
       is included in the Julius distribution.  It can perform
       almost real-time recognition of continuous speech with a
       vocabulary of over ten thousand words on most current PCs.

       A hand-written finite state grammar and triphone HMM
       acoustic models of any unit and size can be used.  The
       grammar format is specific to Julian, and tools for
       creating a recognition grammar are included in the
       distribution.  Standard formats are adopted for acoustic
       models.  Users can build their own recognition systems
       with Julian from their own grammars and acoustic models.

       Julian can perform recognition on audio files, live micro-
       phone input, network input and  feature  parameter  files.
       The maximum size of vocabulary is 65,535 words.

RECOGNITION MODELS
       Julian supports the following models.

       Acoustic Models
                 Same  as  Julius:  Sub-word  HMM  (Hidden Markov
                 Model) in HTK  format  are  supported.   Phoneme
                 models  (monophone),  context  dependent phoneme
                 models  (triphone),  tied-mixture  and  phonetic
                 tied-mixture  models  of  any  unit can be used.
                 When using context dependent  models,  interword
                 context is also handled.

       Language Model
                 For the task grammar, sentence structures are
                 written in a BNF style to a grammar file, using
                 word categories as terminal symbols.  A voca
                 file is also created, containing the
                 pronunciation (phoneme sequence) of every word
                 in each category.  These files are converted by
                 mkdfa.pl(1) to a deterministic finite automaton
                 file (.dfa) and a dictionary file (.dict).
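
                 As a rough illustration of what "a deterministic
                 finite automaton over word categories" means,
                 the Python sketch below checks whether a
                 sequence of categories is accepted.  The
                 category names and transitions are hypothetical,
                 and the table is not the on-disk .dfa format.

```python
# Hypothetical toy automaton: (state, word category) -> next state.
# The categories and states are invented for illustration only.
TRANSITIONS = {
    (0, "NS_B"): 1,
    (1, "FRUIT"): 2,
    (2, "PLEASE"): 3,
    (2, "NS_E"): 4,
    (3, "NS_E"): 4,
}
ACCEPTING = {4}


def accepts(categories, start=0):
    """Return True if the category sequence is accepted by the automaton."""
    state = start
    for category in categories:
        key = (state, category)
        if key not in TRANSITIONS:
            return False  # no transition: the grammar rejects this sequence
        state = TRANSITIONS[key]
    return state in ACCEPTING
```

                 Because the automaton is deterministic, each
                 (state, category) pair has at most one successor.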

SPEECH INPUT
       Same as Julius: speech waveform files (16-bit WAV without
       compression, RAW format, and many others when built with
       the libsndfile library) and feature parameter files (HTK
       format) can be used as speech input.  Live input from a
       microphone, a DatLink (NetAudio) system, or a TCP/IP
       network is also supported.

       Notice: Julian can only extract MFCC_E_D_N_Z features
       internally.  If you want to use HMMs based on another type
       of feature extraction, microphone input and speech
       waveform files cannot be used.  Use an external tool such
       as HCopy or wav2mfcc to create the appropriate feature
       parameter files.

SEARCH ALGORITHM
       Julian's recognition algorithm is based on a two-pass
       strategy.  The first pass performs a high-speed
       approximate search using weaker constraints than the
       given grammar: a left-to-right beam search that uses only
       the inter-category constraints extracted from the
       grammar.  The second pass searches the input again, using
       the original grammar rules and the intermediate results
       of the first pass, to obtain a high-precision result
       quickly.  In the second pass, the A* search theoretically
       guarantees the optimal solution.

       When using context dependent phones (triphones), interword
       contexts are taken into consideration.  For tied-mixture
       and phonetic tied-mixture models, high-speed acoustic
       likelihood calculation is possible using Gaussian pruning.

       For  more  details,  see  the related document or web site
       below.

OPTIONS
       The options below allow you to specify the models and set
       system parameters.  You can set these options on the
       command line, but it is recommended that you collect them
       in a "jconf settings file" and use the "-C" option to read
       it at run time.
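
       When Julian is launched from a script, the command line
       can be assembled as a list of arguments.  The following
       Python sketch only builds and prints the command; the file
       name "sample.jconf" is a placeholder, and the helper
       build_command is hypothetical.

```python
import shlex


def build_command(jconf_path, extra_options=()):
    """Build a julian argument list: the jconf file first, then overrides."""
    return ["julian", "-C", jconf_path, *extra_options]


# "sample.jconf" is a placeholder name, not a file shipped with Julian.
cmd = build_command("sample.jconf", ["-input", "mic", "-quiet"])
print(shlex.join(cmd))  # julian -C sample.jconf -input mic -quiet
```

       The resulting list can be passed to subprocess.run()
       unchanged.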

       Most are the same as Julius.
       Options only in Julian: -dfa, -penalty1, -penalty2,
       -looktrellis
       Options only in Julius: -nlr, -nrl, -d, -lmp, -lmp2,
       -transp, -silhead, -siltail, -spdur, -sepnum,
       -separatescore

       Below is an explanation of all the available options.

   Speech Input
       -input {rawfile|mfcfile|mic|adinnet|netaudio|stdin}
              Select the speech data input source.  'rawfile'
              reads a waveform file (the file name is specified
              after startup).  'mfcfile' reads a feature vector
              file extracted by the HTK HCopy tool.  'mic' means
              live microphone input, and 'adinnet' means
              receiving waveform data via a TCP/IP network from
              an adinnet client.  'stdin' means standard input.

              The supported waveform file formats depend on the
              compile-time configuration.  To see which formats
              are actually supported, see the help message shown
              by the "-help" option.  (For stdin input, only WAV
              (no compression) and RAW (16bit, BE) are
              supported.)
              (default: mfcfile)

       -filelist file
              (with -input rawfile|mfcfile) perform recognition
              on all files listed in the specified file.

       -adport portnum
              (with -input adinnet) adinnet port number (default:
              5530)

       -NA server:unit
              (with -input netaudio) set the server name and unit
              ID of the Datlink unit.

       -record directory
              Auto-save recognized speech data under the
              directory.  Each segmented input is recorded to
              its own file, named "YYYY.MMDD.HHMMSS.raw" after
              the system time at which the input began
              (YYYY=year, MMDD=month/day,
              HHMMSS=hour/minute/second).  The file format is
              RAW, 16bit, 16kHz, mono, big endian.
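
              The file name pattern above maps directly onto a
              strftime format.  A small Python sketch (the helper
              name record_filename is hypothetical) that
              reproduces the naming scheme:

```python
from datetime import datetime


def record_filename(start_time):
    """Format an input start time as YYYY.MMDD.HHMMSS.raw."""
    return start_time.strftime("%Y.%m%d.%H%M%S") + ".raw"


print(record_filename(datetime(2002, 9, 11, 14, 5, 30)))  # 2002.0911.140530.raw
```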

   Speech Detection
       -cutsilence

       -nocutsilence
              Force silence cutting (=speech  segment  detection)
              ON/OFF.  (default:  ON  for  mic/adinnet,  OFF  for
              files)

       -lv threslevel
              Amplitude threshold (0 - 32767).  When the
              amplitude rises above this threshold, it is taken
              as the beginning of a speech segment; when it drops
              below this level, it is taken as the end of the
              speech segment.  (default: 3000)

       -zc zerocrossnum
              Zero crossing threshold per second (default: 60)

       -headmargin msec
              Margin at the start of the speech segment  in  mil-
              liseconds. (default: 300)

       -tailmargin msec
              Margin  at  the  end  of the speech segment in mil-
              liseconds. (default: 400)
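
              To show how the level threshold and the two margins
              interact, here is a deliberately simplified Python
              sketch.  It ignores the zero crossing count and
              treats the first and last samples above the level
              as the segment edges, which is an assumption made
              for illustration and not Julian's actual detector.

```python
def detect_segment(samples, level=3000, rate=16000, head_ms=300, tail_ms=400):
    """Return (start, end) sample indices of the speech segment, or None.

    Simplified: uses only the amplitude level, then widens the segment
    by the head and tail margins (given in milliseconds).
    """
    above = [i for i, s in enumerate(samples) if abs(s) > level]
    if not above:
        return None  # nothing exceeded the threshold
    head = rate * head_ms // 1000  # margin before the segment, in samples
    tail = rate * tail_ms // 1000  # margin after the segment, in samples
    start = max(0, above[0] - head)
    end = min(len(samples), above[-1] + 1 + tail)
    return start, end
```

              At 16 kHz the default margins correspond to 4800
              samples before and 6400 samples after the detected
              speech.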

       -nostrip
              On some sound devices, invalid "0" samples may be
              recorded at the start and end of recording.  Julian
              removes them automatically by default.  This option
              inhibits the automatic removal.

   Acoustic Analysis
       -smpFreq frequency
              Sampling frequency (Hz).
              (default: 16000 Hz, i.e. a period of 625 x 100 ns)

       -smpPeriod period
              Sampling period, in units of 100 ns.
              (default: 625, i.e. 62.5 microseconds = 16 kHz)

       -fsize sample
              Analysis  window size (No. samples) (default: 400).

       -fshift sample
              Frame shift (No. samples) (default: 160).

       -delwin frame
              Delta window size (No. frames) (default: 2).
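
              The defaults above are related by simple
              arithmetic.  The Python sketch below assumes that
              the -smpPeriod value follows the HTK convention of
              100 ns units, which is the reading under which 625
              corresponds to 16 kHz; the helper names are
              hypothetical.

```python
UNITS_PER_SECOND = 10_000_000  # number of 100 ns units in one second


def period_to_hz(period_units):
    """Convert a sampling period in 100 ns units to a frequency in Hz."""
    return UNITS_PER_SECOND / period_units


def samples_to_ms(num_samples, hz):
    """Convert a length in samples to milliseconds."""
    return 1000.0 * num_samples / hz


hz = period_to_hz(625)         # 16000.0
window = samples_to_ms(400, hz)  # 25.0 ms analysis window (-fsize)
shift = samples_to_ms(160, hz)   # 10.0 ms frame shift (-fshift)
```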

       -hipass frequency
              High-pass filter cutoff frequency (Hz).
              (default: -1 = disabled)

       -lopass frequency
              Low-pass filter cutoff frequency (Hz).
              (default: -1 = disabled)

       -sscalc
              Perform spectral subtraction using the head silence
              of files.  Valid only for rawfile input.

       -sscalclen
              Specify  the length of head silence in milliseconds
              (default: 300)

       -ssload filename
              Perform spectral subtraction for speech input using
              pre-estimated  noise spectrum from file.  The noise
              spectrum data  should  be  computed  beforehand  by
              mkss.

       -ssalpha value
              Alpha coefficient of spectral subtraction.  A
              larger value subtracts the noise more strongly,
              but also increases distortion of the resulting
              signal.  (default: 2.0)

       -ssfloor value
              Flooring coefficient of spectral subtraction.
              Spectral parameters that fall below zero after
              subtraction are replaced by the source signal
              multiplied by this coefficient.  (default: 0.5)
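
              The roles of the two coefficients can be sketched
              in a few lines of Python.  This follows the
              descriptions of -ssalpha and -ssfloor given here
              and is not taken from Julian's source; the function
              name and the per-bin list representation are
              assumptions.

```python
def spectral_subtract(power, noise, alpha=2.0, floor=0.5):
    """Subtract alpha * noise from each spectral bin, with flooring.

    Bins that would drop to zero or below are replaced by
    floor * (original bin value).
    """
    result = []
    for p, n in zip(power, noise):
        reduced = p - alpha * n
        result.append(reduced if reduced > 0.0 else floor * p)
    return result


print(spectral_subtract([10.0, 3.0], [2.0, 2.0]))  # [6.0, 1.5]
```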

   Language Model (Finite State Grammar)
       -dfa dfa_filename
              Finite state automaton grammar file. (required)

       -penalty1 float
              Word   insertion   penalty   for  the  first  pass.
              (default: 0.0)

       -penalty2 float
              Word insertion penalty for the second pass.
              (default: 0.0)

   Word Dictionary
       -v dictionary_file
              Word dictionary file (required)

       -spmodel {WORD|WORD[OUTSYM]|#num}
              Name  of  short  pause  model  as  defined  in  the
              hmmdefs.  (default: "sp")

              Words that have this model as their pronunciation
              and are intended to match the short pauses between
              words are handled specially by Julian to deal with
              short pause insertion.  They can be specified as
              shown below.

                                       Example
           Word_name                     <s>
           Word_name[output_symbol]   <s>[silB]
           #Word_ID                      #14

            (Word_ID is the word position in the dictionary
             file starting from 0)

       -forcedict
              Disregard dictionary errors.  Word definitions with
              errors will be skipped on startup.

   Acoustic Model (HMM)
       -h hmmfilename
              HMM definition file to use. (required)

       -hlist HMMlistfilename
              HMMList file to use.  Required when using triphone
              based HMMs.  This file provides a mapping between
              the logical triphone names generated from the
              phonetic representation in the dictionary and the
              HMM definition names.

       -iwcd1 {max|avg}
              When  using a triphone model, select method to han-
              dle inter-word triphone context on  the  first  and
              last phone of a word in the first pass.

              max: use maximum likelihood of the same
                   context triphones (default)
              avg: use average likelihood of the same
                   context triphones
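
              The difference between the two methods can be
              sketched as follows.  The scores stand for the
              likelihood values of the triphones that share the
              same context; how Julian represents and gathers
              them internally is not described here, so the
              function is purely illustrative.

```python
def interword_score(candidate_scores, method="max"):
    """Combine the scores of same-context triphones at a word boundary."""
    if method == "max":
        return max(candidate_scores)
    if method == "avg":
        return sum(candidate_scores) / len(candidate_scores)
    raise ValueError("method must be 'max' or 'avg'")
```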

       -force_ccd / -no_ccd
              Normally Julian determines whether the specified
              hmmdefs is a context-dependent model from the model
              definition names, i.e., whether the model names
              contain the characters '+' and '-'.  If the
              automatic detection fails, you can specify the
              model type explicitly with these options, which
              override the automatic detection result.

       -notypecheck
              Disable   check   of   the  input  parameter  type.
              (default: enabled)

   Acoustic Computation
       Gaussian Pruning is enabled automatically when using a
       tied-mixture based acoustic model.  Gaussian Selection
       requires a monophone model converted by mkgshmm.

       -tmix K
              With  Gaussian Pruning, specify the number of Gaus-
              sians to compute per codebook. (default: 2)

       -gprune {safe|heuristic|beam|none}
              Set the Gaussian pruning technique to use.
              (default: safe (setup=standard) beam (setup=fast))

       -gshmm hmmdefs
              Specify monophone hmmdefs to use for Gaussian
              Mixture Selection (GMS).  The monophone model for
              GMS is generated from an ordinary monophone HMM
              model using mkgshmm.  This option is disabled by
              default (no GMS applied).

       -gsnum N
              When using GMS, specify the number of monophone
              states to select from all monophone states.
              (default: 24)

   Inter-word Short Pause Handling
       -iwsp  (Multi-path version only)  Enable  inter-word  con-
              text-free   short   pause  handling.   This  option
              appends a skippable short  pause  model  for  every
              word  end.  The added model will also be ignored in
              context   modeling.    The   model   specified   by
              "-spmodel" will be appended.

   Search Parameters (First Pass)
       -b beamwidth
              Beam  width  (Number  of HMM nodes).  As this value
              increases the precision  also  increases,  however,
              processing time and memory usage also increase.

              default value: acoustic model dependent
                400 (monophone)
                800 (triphone,PTM)
               1000 (triphone,PTM, setup=v2.1)

       -1pass Only  perform  the first pass search.  This mode is
              automatically set when no 3-gram language model has
              been specified (-nlr).

       -realtime

       -norealtime
              Explicitly specify whether real-time (pipeline)
              processing will be done in the first pass or not.
              For file input, the default is OFF (-norealtime);
              for microphone, adinnet and NetAudio input, the
              default is ON (-realtime).  This option affects how
              CMN (cepstral mean normalization) is performed:
              when OFF, CMN is calculated for each input
              independently; when ON, the previous 5 seconds of
              input are always used.  Also refer to -progout.
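
              For reference, CMN (cepstral mean normalization)
              subtracts the mean cepstral vector from every
              frame.  The Python sketch below shows the
              per-input case, where the mean is taken over the
              whole input; the list-of-lists frame layout is an
              assumption for illustration.

```python
def cmn(frames):
    """Subtract the per-dimension mean from every feature frame."""
    dims = len(frames[0])
    mean = [sum(f[d] for f in frames) / len(frames) for d in range(dims)]
    return [[f[d] - mean[d] for d in range(dims)] for f in frames]


print(cmn([[1.0, 4.0], [3.0, 8.0]]))  # [[-1.0, -2.0], [1.0, 2.0]]
```

              In real-time processing the whole input is not yet
              available when decoding starts, which is why a mean
              from earlier input has to be used instead.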

       -cmnsave filename
              Save the last CMN parameters computed during
              recognition to the specified file.  The parameters
              are saved each time an input is recognized, so the
              file always holds the latest CMN parameters.  If
              the output file already exists, it will be
              overwritten.

       -cmnload filename
              Load initial CMN parameters previously saved  in  a
              file  by "-cmnsave".  This option enables Julian to
              recognize the first utterance of a live  microphone
              input or adinnet input with CMN.

   Search Parameters (Second Pass)
       -b2 hyponum
              Beam width (number of hypotheses) in the second
              pass.  If the count of word expansions at a certain
              hypothesis length reaches this limit during the
              search, shorter hypotheses are not expanded
              further.  This prevents the search from falling
              into a breadth-first-like state in which it stalls
              at the same position, and so reduces search
              failures.  (default: 30)

       -n candidatenum
              The search continues until 'candidatenum' sentence
              hypotheses have been found.  The obtained sentence
              hypotheses are sorted by score, and the final
              result is displayed in that order (see also the
              "-output" option).

              The  possibility  that  the  optimum  hypothesis is
              found increases as this value is increased, but the
              processing time also becomes longer.

              The default value depends on the engine setup at
              compilation time:
                10  (standard)
                 1  (fast, v2.1)

       -output N
              The top N sentence hypotheses will be output at the
              end of the search.  Use with the "-n" option.
              (default: 1)

       -sb score
              Score envelope width for enveloped scoring.  When
              calculating the score of each generated hypothesis,
              its trellis expansion and Viterbi operation are
              pruned in the middle of the speech if the score on
              a frame falls below [current maximum score of the
              frame - width].  A small value reduces the
              computation cost of the second pass, but
              computation errors may occur.  (default: 80.0)
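
              The pruning condition itself is a single
              comparison.  A Python sketch of the test, with
              hypothetical names and no claim about how Julian
              tracks the per-frame maximum:

```python
def within_envelope(score, frame_max, width=80.0):
    """Return True if a score survives the envelope test for its frame."""
    return score >= frame_max - width
```

              With the default width of 80.0 and a frame maximum
              of -500.0, a score of -570.0 is kept and a score of
              -590.0 is pruned.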

       -s stack_size
              The maximum number of hypotheses that can be stored
              on the stack during the search.  A larger value may
              give more stable results, but increases the amount
              of memory required. (default: 500)

       -m overflow_pop_times
              Number of expanded hypotheses required to
              discontinue the search.  If the number of expanded
              hypotheses exceeds this threshold, the search is
              discontinued at that point.  The larger this value
              is, the longer the search will continue, but the
              processing time for search failures will also
              increase. (default: 2000)

       -lookuprange nframe
              When  performing  word  expansion, this option sets
              the number of frames before and after in  which  to
              determine  next word hypotheses.  This prevents the
              omission of short words but, with  a  large  value,
              the  number  of  expanded  hypotheses increases and
              system becomes slow. (default: 5)

       -looktrellis
              Expand only the trellis words instead  of  grammar-
              permitted  words.   This  option  makes second pass
              decoding faster, but may increase deletion error of
              short words. (default: disabled)

   Forced Alignment
       -walign
              Do Viterbi alignment per word unit from the
              recognition result.  The word boundary frames and
              the average acoustic scores per frame are
              calculated.

       -palign
              Do Viterbi alignment per phoneme (model) unit from
              the recognition result.  The phoneme boundary
              frames and the average acoustic scores per frame
              are calculated.

       -salign
              Do Viterbi alignment per HMM state from the
              recognition result.  The state boundary frames and
              the average acoustic scores per frame are
              calculated.

   Server Module Mode
       -module [port]
              Run Julian in "Server Module Mode".  After startup,
              Julian waits for a TCP/IP connection from a client.
              Once the connection is established, Julian
              communicates with the client to process incoming
              commands and to send recognition results, input
              trigger information and other system status.
              Multi-grammar mode is supported only in Server
              Module Mode.  The default port number is 10500.
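
              A client only needs an ordinary TCP socket to
              connect.  The Python sketch below connects to the
              default port and returns the first chunk of bytes
              the server sends.  The message format is not
              described in this page, so the data is returned
              uninterpreted; the function name and the defaults
              other than the port are assumptions.  If nothing
              arrives within the timeout, the call raises an
              exception.

```python
import socket


def read_first_chunk(host="localhost", port=10500, max_bytes=4096, timeout=5.0):
    """Connect to a module-mode server and return the first bytes received."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        return conn.recv(max_bytes)
```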

       -outcode [W][L][P][S][w][l][p][s]
              (Only for Server Module Mode) Select which symbols
              of the recognized words are sent to the client.
              Specify 'W' for the output symbol, 'L' for the
              grammar entry, 'P' for the phoneme sequence and 'S'
              for the score.  Capital letters apply to the second
              pass (final result), and small letters to the
              results of the first pass.  For example, to send
              only the output symbols and phoneme sequences of
              the final result to a client, specify
              "-outcode WP".

   Message Output
       -quiet Omit phoneme sequence and score,  only  output  the
              best word sequence hypothesis.

       -progout
              Enable progressive output of the partial results on
              the first pass at regular intervals.

       -proginterval msec
              Set the output time interval of "-progout" in
              milliseconds.

       -demo  Equivalent to "-progout -quiet"

   Others
       -debug (For  debug)  display  internal  status  and  debug
              information.

       -C jconffile
              Load the jconf file.  The options written in the
              file are expanded at that point.  This option can
              also be used within another jconf file.

       -check wchmm
              (For debug) turn on interactive check mode of  tree
              lexicon structure at startup.

       -check triphone
              (For debug) turn on interactive check mode of model
              mapping between Acoustic model, HMMList and dictio-
              nary at startup.

       -version
              Display version information and exit.

       -help  Display a brief description of all options.

EXAMPLES
       For  examples  of system usage, refer to the tutorial sec-
       tion in the Julian documents.

NOTICE
       Note about path names in jconf files: relative paths in  a
       jconf  file  are interpreted as relative to the jconf file
       itself, not to the current directory.
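
       The rule can be expressed as a short Python sketch.  The
       helper name resolve_jconf_path and the example paths are
       hypothetical; the logic simply joins a relative path onto
       the directory that contains the jconf file.

```python
import os.path


def resolve_jconf_path(jconf_file, path_in_jconf):
    """Resolve a path written in a jconf file the way the rule above states."""
    if os.path.isabs(path_in_jconf):
        return path_in_jconf  # absolute paths are used as they are
    base = os.path.dirname(os.path.abspath(jconf_file))
    return os.path.normpath(os.path.join(base, path_in_jconf))
```

       For example, "../model/hmmdefs" written in
       /opt/julian/conf/sample.jconf resolves to
       /opt/julian/model/hmmdefs, whatever the current directory
       is.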

SEE ALSO
       julius(1), mkbingram(1), mkss(1), jcontrol(1), adinrec(1),
       adintool(1), mkdfa(1), mkgshmm(1), wav2mfcc(1)

       http://julius.sourceforge.jp/  (main)
       http://sourceforge.jp/projects/julius/ (development site)

DIAGNOSTICS
       Julian normally returns exit status 0.  If an error
       occurs, Julian exits abnormally with exit status 1.  If an
       input file cannot be found or cannot be loaded for some
       reason, Julian skips processing for that file.

BUGS
       There  are  some  restrictions to the type and size of the
       models Julian can use.  For a detailed  explanation  refer
       to the Julius documentation.  For bug reports, inquiries
       and comments, please contact julius@kuis.kyoto-u.ac.jp or
       julius@is.aist-nara.ac.jp.

AUTHORS
       Rev.1.0 (1998/07/20)
              Designed by Tatsuya KAWAHARA and Akinobu LEE (Kyoto
              University)

       Rev.2.0 (1999/02/20)

       Rev.2.1 (1999/04/20)

       Rev.2.2 (1999/10/04)

       Rev.3.1 (2000/05/11)
              Development of above versions by Akinobu LEE (Kyoto
              University)

       Rev.3.2 (2001/08/15)

       Rev.3.3 (2002/09/11)
              Development  of above versions by Akinobu LEE (Nara
              Institute of Science and Technology)

THANKS TO
       From Rev.3.2 Julian is released in the  "Information  Pro-
       cessing Society, Continuous Speech Consortium".

       The  Windows  Microsoft  Speech API compatible version was
       developed by Takashi SUMIYOSHI (Kyoto University).



                              LOCAL                     JULIAN(1)
