JULIUS(1)                                               JULIUS(1)



NAME
       Julius - open source multi-purpose LVCSR engine

SYNOPSIS
       julius [-C jconffile] [options ...]

DESCRIPTION
       Julius is a high-performance, multi-purpose, open-source
       speech recognition engine that performs near real-time
       recognition of continuous speech with a 60k-word vocabu-
       lary on most current PCs.

       Word 3-gram language models and triphone HMM acoustic
       models of any unit and size can be used.  As standard
       formats are adopted for the models, users can supply
       their own language and acoustic models to build a recog-
       nition system of their own.

       Julius can perform recognition on audio files, live micro-
       phone  input,  network  input and feature parameter files.
       The maximum size of vocabulary is 65,535 words.

RECOGNITION MODELS
       Julius supports the following models.

       Acoustic Models
                  Sub-word HMMs (Hidden Markov Models) in HTK
                  format are supported.  Phoneme models (mono-
                  phone), context dependent phoneme models
                  (triphone), tied-mixture and phonetic tied-
                  mixture models of any unit can be used.  When
                  using context dependent models, interword
                  context is also handled.

       Language model
                  The system uses 2-gram and reverse 3-gram
                  language models.  The standard ARPA format is
                  supported.  In addition, a binary format
                  N-gram is also supported for efficiency.  A
                  binary N-gram can be converted from the ARPA
                  language models using the attached tool
                  mkbingram.

SPEECH INPUT
       Speech waveform files (16bit  WAV  (no  compression),  RAW
       format,  and  many  other if used with libsndfile library)
       and feature parameter files (HTK format) can  be  used  as
       speech  input.   Live  input  from  either a Microphone, a
       DatLink (NetAudio) system, or via tcpip  network  is  also
       supported.

       Notice: Julius can only extract MFCC_E_D_N_Z features
       internally.  If you want to use HMMs based on another
       type of feature extraction, then microphone input and
       speech waveform files cannot be used.  Use an external
       tool such as HCopy or wav2mfcc to create the appropriate
       feature parameter files.

SEARCH ALGORITHM
       The recognition algorithm of Julius is based on a two-
       pass strategy.  A word 2-gram and a reverse word 3-gram
       are used on the first and second passes respectively.
       The entire input is processed on the first pass, and a
       final search over the input is then performed using the
       result of the first pass as a "guidance".  Specifically,
       the recognition algorithm is based on a tree-trellis
       heuristic search combining left-to-right frame-synchro-
       nous beam search with right-to-left stack decoding
       search.

       When using context dependent phones (triphones), inter-
       word contexts are taken into consideration.  For tied-
       mixture and phonetic tied-mixture models, high-speed
       acoustic likelihood calculation is possible using
       Gaussian pruning.

       For more details, see the related  document  or  web  site
       below.

OPTIONS
       The options below allow you to specify the models and set
       system parameters.  You can set these options on the com-
       mand line; however, it is recommended that you collect
       them in a "jconf settings file" and read it at run time
       with the "-C" option.

       Below is an explanation of all the available options.

   Speech Input
       -input {rawfile|mfcfile|mic|adinnet|netaudio|stdin}
              Select speech data input source.  'rawfile' is from
              waveform file (file name should be specified  after
              startup).   'mfcfile'  is  a  feature  vector  file
              extracted by HTK  HCopy  tool.   'mic'  means  live
              microphone  input,  and  'adinnet'  means receiving
              waveform data via tcpip  network  from  an  adinnet
              client. 'stdin' means standard tty input.

               The supported waveform file formats vary with the
               compile-time configuration.  To see which formats
               are actually supported, see the help message
               printed by option "-help".  (For stdin input,
               only WAV (no compression) and RAW (16bit, big
               endian) are supported.)
               (default: mfcfile)

       -filelist file
              (with  -input  rawfile|mfcfile) perform recognition
              on all files contained within the target file.

       -adport portnum
              (with -input adinnet) adinnet port number (default:
              5530)

       -NA server:unit
              (with -input netaudio) set the server name and unit
              ID of the Datlink unit.

       -record directory
               Auto-save recognized speech data under the direc-
               tory.  Each detected speech segment is recorded
               to its own file, named "YYYY.MMDD.HHMMSS.raw"
               after the system time at which the input began
               (YYYY=year, MMDD=month/day, HHMMSS=hour/minute/
               second).  The file format is RAW, 16bit, 16kHz,
               mono, big endian.

   Speech Detection
       -cutsilence

       -nocutsilence
              Force silence cutting (=speech  segment  detection)
              ON/OFF.  (default:  ON  for  mic/adinnet,  OFF  for
              files)

       -lv threslevel
              Amplitude threshold (0 - 32767).  If the  amplitude
              passes  this  threshold  it is considered to be the
              beginning of a speech segment, if  it  drops  below
              this  level  then  it is the end of the speech seg-
              ment. (default: 3000)

       -zc zerocrossnum
               Zero-crossing count threshold per second.
               (default: 60)

       -headmargin msec
              Margin at the start of the speech segment  in  mil-
              liseconds. (default: 300)

       -tailmargin msec
              Margin  at  the  end  of the speech segment in mil-
              liseconds. (default: 400)

       -nostrip
               On some sound devices, invalid "0" samples may be
               recorded at the start and end of recording.
               Julius removes them automatically by default.
               This option inhibits the automatic removal.
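
       The amplitude and zero-crossing thresholds above can be
       illustrated with a toy sketch (plain Python, not the
       actual Julius detector, which also applies the head/tail
       margins and sample stripping described above):

```python
def detect_segment(samples, thresh=3000, zc_thresh=60, rate=16000):
    # Toy endpoint detection in the spirit of -lv and -zc: a
    # 100 ms window counts as speech when its peak amplitude
    # exceeds the level threshold AND its zero crossings per
    # second exceed the zero-crossing threshold.
    win = rate // 10                      # 100 ms of samples
    flags = []
    for i in range(0, len(samples) - win + 1, win):
        w = samples[i:i + win]
        peak = max(abs(s) for s in w)
        crossings = sum(1 for a, b in zip(w, w[1:])
                        if (a < 0) != (b < 0))
        zc_per_sec = crossings * rate / win
        flags.append(peak > thresh and zc_per_sec > zc_thresh)
    return flags
```

       A loud, rapidly oscillating window is flagged as speech;
       a silent window is not.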

   Acoustic Analysis
       -smpFreq frequency
               Sampling frequency (Hz).
               (default: 16000)

       -smpPeriod period
               Sampling period, in units of 100 nanoseconds as
               in HTK.
               (default: 625 = 16kHz)

       -fsize sample
              Analysis  window size (No. samples) (default: 400).

       -fshift sample
              Frame shift (No. samples) (default: 160).

       -delwin frame
              Delta window size (No. frames) (default: 2).

       -hipass frequency
              High-pass filter cutoff frequency (Hz).
              (default: -1 = disabled)

       -lopass frequency
              Low-pass filter cutoff frequency (Hz).
              (default: -1 = disabled)

       -sscalc
              Perform spectral subtraction using the head silence
              of files.  Valid only for rawfile input.

       -sscalclen msec
               Specify the length of the head silence in milli-
               seconds. (default: 300)

       -ssload filename
              Perform spectral subtraction for speech input using
              pre-estimated  noise spectrum from file.  The noise
              spectrum data  should  be  computed  beforehand  by
              mkss.
       -ssalpha value
               Alpha coefficient of spectral subtraction.  Noise
               is subtracted more strongly as this value gets
               larger, but distortion of the resulting signal
               also becomes more noticeable.  (default: 2.0)

       -ssfloor value
               Flooring coefficient of spectral subtraction.
               Spectral parameters that fall below zero after
               subtraction are replaced by the source spectrum
               multiplied by this coefficient. (default: 0.5)
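
       The behaviour these two options control can be sketched
       as follows (an illustrative Python sketch, not the actual
       Julius implementation):

```python
def spectral_subtract(spec, noise, alpha=2.0, floor=0.5):
    # For each spectral coefficient, subtract alpha times the
    # estimated noise spectrum (-ssalpha).  Where the result
    # would drop below zero, fall back to the source coefficient
    # scaled by the flooring value (-ssfloor).
    out = []
    for s, n in zip(spec, noise):
        d = s - alpha * n
        out.append(d if d >= 0.0 else floor * s)
    return out
```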

   Language Model (word N-gram)
       -nlr 2gram_filename
              2-gram  language  model  filename  in standard ARPA
              format.

       -nrl rev_3gram_filename
               Reverse 3-gram language model filename.  This is
               required for the second search pass.  If it is
               not specified, only the first pass will take
               place.

       -d bingram_filename
               Use a binary language model as built using mkbin-
               gram(1).  This is used in place of the "-nlr" and
               "-nrl" options above, and allows Julius to per-
               form rapid initialization.

       -lmp lm_weight lm_penalty

       -lmp2 lm_weight2 lm_penalty2
              Language model score  weights  and  word  insertion
              penalties  for  the first and second passes respec-
              tively.

              The hypothesis language scores are scaled as  shown
              below:

              lm_score1  =  lm_weight * 2-gram_score + lm_penalty
              lm_score2 = lm_weight2 * 3-gram_score + lm_penalty2

              The defaults are dependent on acoustic model:

                First-Pass | Second-Pass
               --------------------------
                5.0 -1.0   |  6.0  0.0 (monophone)
                8.0 -2.0   |  8.0 -2.0 (triphone,PTM)
                9.0  8.0   | 11.0 -2.0 (triphone,PTM, setup=v2.1)
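
       As an illustration, the scaling above can be written as a
       small Python sketch (the triphone/PTM default weights are
       used here; the raw N-gram log scores are hypothetical
       inputs):

```python
def lm_scores(ngram2_score, ngram3_score,
              lm_weight=8.0, lm_penalty=-2.0,
              lm_weight2=8.0, lm_penalty2=-2.0):
    # Scale raw N-gram log scores for the first and second
    # passes as controlled by -lmp / -lmp2.
    lm_score1 = lm_weight * ngram2_score + lm_penalty
    lm_score2 = lm_weight2 * ngram3_score + lm_penalty2
    return lm_score1, lm_score2
```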

       -transp float
              Additional insertion penalty for transparent words.
              (default: 0.0)

   Word Dictionary
       -v dictionary_file
              Word dictionary file (required)

       -silhead {WORD|WORD[OUTSYM]|#num}

       -siltail {WORD|WORD[OUTSYM]|#num}
               Sentence start and end silence words as defined
               in the dictionary.  (default: "<s>" / "</s>")

               These are dealt with specially during recognition
               as hypothesis start and end points (margins).
               They can be defined as shown below.

                                       Example
           Word_name                     <s>
           Word_name[output_symbol]   <s>[silB]
           #Word_ID                      #14

            (Word_ID is the word position in the dictionary
             file starting from 0)

       -forcedict
              Disregard dictionary errors.  Word definitions with
              errors will be skipped on startup.

   Acoustic Model (HMM)
       -h hmmfilename
              HMM definition file to use. (required)

       -hlist HMMlistfilename
               HMMList file to use.  Required when using tri-
               phone based HMMs.  This file provides a mapping
               between the logical triphone names generated from
               the phonetic representation in the dictionary and
               the HMM definition names.

       -iwcd1 {max|avg}
               When using a triphone model, select the method
               used to handle inter-word triphone context for
               the first and last phone of a word on the first
               pass.

              max: use maximum likelihood of the same
                   context triphones (default)
              avg: use average likelihood of the same
                   context triphones

       -force_ccd / -no_ccd
               Normally Julius determines whether the specified
               hmmdefs is a context-dependent model from the
               model definition names, i.e., whether the model
               names contain the characters '+' and '-'.  If the
               automatic detection fails, you can specify the
               model type explicitly with these options, which
               override the automatic detection result.

       -notypecheck
               Disable checking of the input parameter type.
               (default: enabled)

   Acoustic Computation
       Gaussian Pruning is automatically enabled when using a
       tied-mixture based acoustic model.  Gaussian Selection
       requires a monophone model converted by mkgshmm to acti-
       vate.

       -tmix K
              With  Gaussian Pruning, specify the number of Gaus-
              sians to compute per codebook. (default: 2)

       -gprune {safe|heuristic|beam|none}
              Set the Gaussian pruning technique to use.
              (default: safe (setup=standard) beam (setup=fast))

       -gshmm hmmdefs
               Specify monophone hmmdefs to use for Gaussian
               Mixture Selection.  The monophone model for GMS
               is generated from an ordinary monophone HMM model
               using mkgshmm.  This option is disabled by
               default (no GMS applied).

       -gsnum N
               When using GMS, specify the number of monophone
               states to select from all monophone states.
               (default: 24)

   Inter-word Short Pause Handling
       -iwspword
              Add a word entry to the dictionary that corresponds
              to  inter-word  short  pauses.   The content of the
              word entry can be specified by "-iwspentry".

       -iwspentry
              Specify the  word  entry  that  will  be  added  by
              "-iwspword".  (default: "<UNK> [sp] sp sp")

       -iwsp  (Multi-path version only) Enable inter-word
               context-free short pause handling.  This option
               appends a skippable short pause model to every
               word end.  The appended model is also ignored in
               context modeling.  The model to be appended can
               be specified with the "-spmodel" option.

       -spmodel
              Specify short-pause model name that will be used in
              "-iwsp". (default: "sp")

   Short-pause Segmentation
       Short-pause segmentation can be used for successive
       decoding of a long utterance.  It is enabled when com-
       piled with '--enable-sp-segment'.

       -spdur Set the short-pause duration threshold in number
               of frames.  If a short-pause word has the maximum
               likelihood for more successive frames than this
               value, the first pass is interrupted and the sec-
               ond pass starts. (default: 10)

   Search Parameters (First Pass)
       -b beamwidth
              Beam  width  (Number  of HMM nodes).  As this value
              increases the precision  also  increases,  however,
              processing time and memory usage also increase.

              default value: acoustic model dependent
                400 (monophone)
                800 (triphone,PTM)
               1000 (triphone,PTM, setup=v2.1)

       -sepnum N
              Number of high frequency words to be separated from
              the lexicon tree. (default: 150)

       -1pass Only perform the first pass search.  This mode is
               automatically set when no reverse 3-gram language
               model has been specified (-nrl).

       -realtime

       -norealtime
               Explicitly specify whether real-time (pipeline)
               processing is done on the first pass.  For file
               input the default is OFF (-norealtime); for
               microphone, adinnet and NetAudio input the
               default is ON (-realtime).  This option affects
               how CMN is performed: when OFF, CMN is calculated
               for each input independently; when ON, the previ-
               ous 5 seconds of input are always used.  See also
               -progout.

       -cmnsave filename
               Save the last CMN parameters computed during
               recognition to the specified file.  The parame-
               ters are written each time an input is recog-
               nized, so the file always holds the most recent
               CMN parameters.  If the output file already
               exists, it will be overwritten.

       -cmnload filename
               Load initial CMN parameters previously saved to a
               file by "-cmnsave".  This option enables Julius
               to recognize the first utterance of live micro-
               phone or adinnet input with CMN applied.

   Search Parameters (Second Pass)
       -b2 hyponum
               Beam width (number of hypotheses) for the second
               pass.  If the number of word expansions at a cer-
               tain hypothesis length reaches this limit during
               search, shorter hypotheses are not expanded fur-
               ther.  This prevents the search from falling into
               a breadth-first-like state stuck at the same
               position, and reduces search failures.
               (default: 30)

       -n candidatenum
               The search continues until 'candidatenum' sen-
               tence hypotheses have been found.  The obtained
               sentence hypotheses are sorted by score, and the
               final results are displayed in that order (see
               also the "-output" option).

               The probability that the optimum hypothesis is
               found increases with this value, but the process-
               ing time also becomes longer.

               The default value depends on the engine setup at
               compile time:
                 10  (standard)
                  1  (fast, v2.1)

       -output N
               The top N sentence hypotheses will be output at
               the end of the search.  Use with the "-n" option.
               (default: 1)

       -sb score
               Score envelope width for envelope scoring.  When
               calculating the score of each generated hypothe-
               sis, its trellis expansion and Viterbi operation
               are pruned mid-input if the score on a frame
               falls below (current maximum score on that frame
               - width).  A small value reduces the computation
               cost of the second pass, but may introduce compu-
               tation errors.  (default: 80.0)

       -s stack_size
               The maximum number of hypotheses that can be
               stored on the stack during the search.  A larger
               value may give more stable results, but increases
               the amount of memory required. (default: 500)
       -m overflow_pop_times
               Number of expanded hypotheses at which the search
               is discontinued.  If the number of expanded
               hypotheses becomes greater than this threshold,
               the search is discontinued at that point.  The
               larger this value, the longer the search may con-
               tinue, but processing time for search failures
               also increases. (default: 2000)

       -lookuprange nframe
               When performing word expansion, this option sets
               the number of frames before and after the word
               boundary within which the next word hypotheses
               are determined.  This prevents the omission of
               short words, but with a large value the number of
               expanded hypotheses increases and the system
               becomes slow. (default: 5)

   Forced Alignment
       -walign
               Perform Viterbi alignment per word unit on the
               recognition result.  The word boundary frames and
               the average acoustic scores per frame are calcu-
               lated.

       -palign
               Perform Viterbi alignment per phoneme (model)
               unit on the recognition result.  The phoneme
               boundary frames and the average acoustic scores
               per frame are calculated.

       -salign
               Perform Viterbi alignment per HMM state on the
               recognition result.  The state boundary frames
               and the average acoustic scores per frame are
               calculated.

   Server Module Mode
       -module [port]
               Run Julius in "Server Module Mode".  After
               startup, Julius waits for a tcp/ip connection
               from a client.  Once a connection is established,
               Julius starts communicating with the client,
               processing incoming commands and sending recogni-
               tion results, input trigger information and other
               system status to the client.  Multi-grammar mode
               is only supported in this Server Module Mode.
               The default port number is 10500.
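
       As an informal illustration of reading this output, each
       message sent by the server is assumed here to end with a
       line containing only ".", as handled by the jcontrol
       sample client; the sketch below (not part of Julius)
       splits a received text buffer on that terminator:

```python
def split_messages(buf):
    # Split a text buffer received from the module-mode socket
    # into complete messages, assuming each message is
    # terminated by a line containing only ".".  Returns the
    # complete messages plus any trailing partial data to keep
    # for the next read.
    msgs = []
    while "\n.\n" in buf:
        msg, buf = buf.split("\n.\n", 1)
        msgs.append(msg)
    return msgs, buf
```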

       -outcode [W][L][P][S][w][l][p][s]
               (Only for Server Module Mode) Select which
               attributes of recognized words are sent to the
               client.  Specify 'W' for output symbol, 'L' for
               grammar entry, 'P' for phoneme sequence and 'S'
               for score.  Capital letters refer to the second
               pass (final result), and lower-case letters to
               the results of the first pass.  For example, to
               send only the output symbols and phoneme
               sequences of the recognition result to a client,
               specify "-outcode WP".

   Message Output
       -separatescore
              Output the language and acoustic scores separately.

       -quiet Omit the phoneme sequence and score; output only
               the best word sequence hypothesis.
       -progout
              Enable progressive output of the partial results on
              the first pass at regular intervals.

       -proginterval msec
               Set the output time interval of "-progout" in
               milliseconds.

       -demo  Equivalent to "-progout -quiet"

   OTHERS
       -debug (For  debug)  display  internal  status  and  debug
              information.

       -C jconffile
               Load a jconf file.  The options written in the
               file are included and expanded at that point.
               This option can also be used within another jconf
               file.

       -check wchmm
              (For  debug) turn on interactive check mode of tree
              lexicon structure at startup.

       -check triphone
              (For debug) turn on interactive check mode of model
              mapping between Acoustic model, HMMList and dictio-
              nary at startup.

       -version
              Display version information and exit.

       -help  Display a brief description of all options.

EXAMPLES
       For examples of system usage, refer to the  tutorial  sec-
       tion in the Julius documents.
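
       As a minimal illustration, a jconf settings file might
       look like the following (all model file names below are
       hypothetical placeholders, not files shipped with
       Julius):

```
# sample.jconf -- all model file names are placeholders
-input mic
# acoustic model (HTK hmmdefs) and triphone mapping
-h hmmdefs
-hlist triphones.list
# forward 2-gram and reverse 3-gram in ARPA format
-nlr word.2gram.arpa
-nrl word.rev3gram.arpa
# word dictionary
-v words.dict
```

       Such a file would then be loaded at run time with
       "julius -C sample.jconf".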

NOTICE
       Note  about path names in jconf files: relative paths in a
       jconf file are interpreted as relative to the  jconf  file
       itself, not to the current directory.

SEE ALSO
       julian(1), mkbingram(1), mkss(1), jcontrol(1), adinrec(1),
       adintool(1), mkdfa(1), mkgshmm(1), wav2mfcc(1)

       http://julius.sourceforge.jp/  (main)
       http://sourceforge.jp/projects/julius/ (development site)

DIAGNOSTICS
       Julius normally returns exit status 0.  If an error
       occurs, Julius exits abnormally with exit status 1.  If
       an input file cannot be found or loaded for some reason,
       Julius skips processing of that file.

BUGS
       There are some restrictions to the type and  size  of  the
       models  Julius  can use.  For a detailed explanation refer
       to the Julius documentation.   For  bug-reports,  inquires
       and  comments  please contact julius@kuis.kyoto-u.ac.jp or
       julius@is.aist-nara.ac.jp.

AUTHORS
       Rev.1.0 (1998/02/20)
              Designed by Tatsuya KAWAHARA and Akinobu LEE (Kyoto
              University)

              Development by Akinobu LEE (Kyoto University)

       Rev.1.1 (1998/04/14)

       Rev.1.2 (1998/10/31)

       Rev.2.0 (1999/02/20)

       Rev.2.1 (1999/04/20)

       Rev.2.2 (1999/10/04)

       Rev.3.0 (2000/02/14)

       Rev.3.1 (2000/05/11)
              Development of above versions by Akinobu LEE (Kyoto
              University)

       Rev.3.2 (2001/08/15)

       Rev.3.3 (2002/09/11)
              Development of above versions by Akinobu LEE  (Nara
              Institute of Science and Technology)

THANKS TO
       Since Rev.3.2, Julius has been released by the
       "Information Processing Society, Continuous Speech
       Consortium".

       The Windows DLL version  was  developed  and  released  by
       Hideki BANNO (Nagoya University).

       The  Windows  Microsoft  Speech API compatible version was
       developed by Takashi SUMIYOSHI (Kyoto University).



                              LOCAL                     JULIUS(1)
