Qizx/open 4.1 API
public interface TextTokenizer
Pluggable text tokenizer compatible with standard full-text features. Analyzes text chunks to extract and normalize words.
To parse words, first initialize the tokenizer on a text chunk with
start(char[], int), then call nextToken() repeatedly until it
returns END.
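The start/nextToken protocol described above can be sketched as follows. This is a minimal illustration, not Qizx code: MiniTokenizer is a hypothetical stand-in that copies only three methods of TextTokenizer, the numeric values of END and WORD are invented (this page does not list them), and SimpleWordTokenizer is a toy implementation that treats runs of letters or digits as words and lower-cases them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the subset of TextTokenizer used in this sketch.
// The constant values are assumptions; the real values are not documented here.
interface MiniTokenizer {
    int END = 0;
    int WORD = 1;

    void start(char[] text, int length);
    int nextToken();
    char[] getTokenChars();
}

// Toy implementation: a word is a run of letters or digits, normalized to lower case.
class SimpleWordTokenizer implements MiniTokenizer {
    private char[] text = new char[0];
    private int length;
    private int pos;
    private int tokenStart;
    private int tokenEnd;

    public void start(char[] text, int length) {
        this.text = text;
        this.length = length;
        this.pos = 0;
        this.tokenStart = 0;
        this.tokenEnd = 0;
    }

    public int nextToken() {
        // Skip separators, then consume one run of word characters.
        while (pos < length && !Character.isLetterOrDigit(text[pos])) pos++;
        if (pos >= length) return END;
        tokenStart = pos;
        while (pos < length && Character.isLetterOrDigit(text[pos])) pos++;
        tokenEnd = pos;
        return WORD;
    }

    public char[] getTokenChars() {
        // Returns a fresh array holding the normalized (lower-cased) token.
        char[] out = new char[tokenEnd - tokenStart];
        for (int i = 0; i < out.length; i++) {
            out[i] = Character.toLowerCase(text[tokenStart + i]);
        }
        return out;
    }
}

class TokenizeLoopDemo {
    // Initializes the tokenizer on the input, then pulls tokens until END.
    static List<String> words(MiniTokenizer tokenizer, String input) {
        char[] chars = input.toCharArray();
        tokenizer.start(chars, chars.length);
        List<String> result = new ArrayList<>();
        for (int type = tokenizer.nextToken();
             type != MiniTokenizer.END;
             type = tokenizer.nextToken()) {
            if (type == MiniTokenizer.WORD) {
                result.add(new String(tokenizer.getTokenChars()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words(new SimpleWordTokenizer(), "Hello, full-text World!"));
    }
}
```

With this toy implementation, main prints [hello, full, text, world]. A real TextTokenizer implementation may split and normalize words differently.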
| Field Summary | |
|---|---|
| static int | END: Code returned by nextToken when the end of the text to tokenize is reached. |
| static int | PARAGRAPH: Code returned by nextToken when a paragraph boundary is recognized. |
| static int | SENTENCE: Code returned by nextToken when a sentence boundary is recognized. |
| static int | WORD: Code returned by nextToken when a word is recognized. |
| Method Summary | |
|---|---|
| void | copyTokenTo(char[] array, int start): Copies the current token into a character array. |
| void | defineSpecialChar(char ch): Defines a character to recognize when parsing of special characters is enabled. |
| int | getDigitMax(): Returns the maximum number of digits a word can contain. |
| char[] | getTokenChars(): Returns the current token as a new character array. |
| int | getTokenLength(): Returns the original length of the last word returned by nextToken. |
| int | getTokenOffset(): Returns the offset (in the source text chunk) of the last word returned by nextToken. |
| boolean | gotWildcard(): Returns true if wildcard characters have been recognized in the current token. |
| boolean | isAcceptingWildcards(): Returns true if wildcard characters are recognized. |
| boolean | isParsingSpecialChars(): Returns true if special characters are recognized. |
| int | nextToken(): Returns the type of the next token, or END if no more tokens can be found. |
| void | setAcceptingWildcards(boolean acceptingWildcards): If set to true, wildcard characters are recognized. |
| void | setDigitMax(int max): Sets the maximum number of digits a word can contain. |
| void | setParsingSpecialChars(boolean parsingSpecialChars): If set to true, special characters are recognized. |
| void | start(char[] text, int length): Starts the analysis of a new text chunk. |
| void | start(CharSequence text): Starts the analysis of a new text chunk. |
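The offset and length accessors summarized above make it possible to map a normalized token back to its original span in the source chunk, for example to highlight a match. The sketch below assumes a hypothetical stand-in interface, OffsetTokenizer, that copies five methods of TextTokenizer; the constant values and the LowerCaseTokenizer implementation are invented for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for the subset of TextTokenizer used in this sketch.
// The constant values are assumptions, not taken from Qizx.
interface OffsetTokenizer {
    int END = 0;
    int WORD = 1;

    void start(CharSequence text);
    int nextToken();
    int getTokenOffset();
    int getTokenLength();
    char[] getTokenChars();
}

// Toy implementation: a word is a run of letters, normalized to lower case.
class LowerCaseTokenizer implements OffsetTokenizer {
    private CharSequence text = "";
    private int pos;
    private int offset;
    private int length;

    public void start(CharSequence text) {
        this.text = text;
        this.pos = 0;
        this.offset = 0;
        this.length = 0;
    }

    public int nextToken() {
        while (pos < text.length() && !Character.isLetter(text.charAt(pos))) pos++;
        if (pos >= text.length()) return END;
        offset = pos;
        while (pos < text.length() && Character.isLetter(text.charAt(pos))) pos++;
        length = pos - offset;
        return WORD;
    }

    public int getTokenOffset() { return offset; }

    public int getTokenLength() { return length; }

    public char[] getTokenChars() {
        String original = text.subSequence(offset, offset + length).toString();
        return original.toLowerCase(Locale.ROOT).toCharArray();
    }
}

class TokenSpanDemo {
    // Formats each word as "normalized@offset+length=original".
    static List<String> spans(OffsetTokenizer tokenizer, String source) {
        List<String> out = new ArrayList<>();
        tokenizer.start(source);
        for (int type = tokenizer.nextToken();
             type != OffsetTokenizer.END;
             type = tokenizer.nextToken()) {
            if (type != OffsetTokenizer.WORD) continue;
            int off = tokenizer.getTokenOffset();
            int len = tokenizer.getTokenLength();
            // Offset and length refer to the source chunk, not the normalized token.
            String original = source.substring(off, off + len);
            out.add(new String(tokenizer.getTokenChars()) + "@" + off + "+" + len + "=" + original);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(spans(new LowerCaseTokenizer(), "Qizx Open"));
    }
}
```

With this toy implementation, main prints [qizx@0+4=Qizx, open@5+4=Open]: the token characters are normalized, while the offset and length still locate the original text.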
| Field Detail |
|---|
static final int END
static final int WORD
static final int SENTENCE
Not yet supported.
static final int PARAGRAPH
Not yet supported.
| Method Detail |
|---|
void start(char[] text,
int length)
text - characters to tokenize
length - number of characters in the text array
void start(CharSequence text)
text - fragment to tokenize
int nextToken()
int getTokenOffset()
int getTokenLength()
char[] getTokenChars()
void copyTokenTo(char[] array,
int start)
array - destination array. Must fit the size of the token.
start - offset in the destination array.
boolean isParsingSpecialChars()
See Also: defineSpecialChar(char)
void setParsingSpecialChars(boolean parsingSpecialChars)
See Also: defineSpecialChar(char)
int getDigitMax()
void setDigitMax(int max)
void defineSpecialChar(char ch)
boolean isAcceptingWildcards()
Wildcard character sequences are ".", ".?", ".*", ".+", and ".{n,m}".
void setAcceptingWildcards(boolean acceptingWildcards)
boolean gotWildcard()
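The wildcard sequences listed under isAcceptingWildcards() (".", ".?", ".*", ".+", ".{n,m}") are also valid java.util.regex syntax. The sketch below assumes they carry the same meaning as in Java regular expressions, which this page does not confirm. WildcardSketch and its methods are hypothetical helpers, not part of the Qizx API: one detects wildcard sequences in a term, roughly what gotWildcard() reports, and the other turns a term into a Pattern while quoting everything else literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Qizx: illustrates the documented wildcard sequences.
class WildcardSketch {
    // Matches ".{n,m}", ".?", ".*", ".+", or a bare ".".
    private static final Pattern WILDCARD =
        Pattern.compile("\\.(\\{\\d+,\\d+\\}|[?*+])?");

    // True if the term contains at least one wildcard sequence.
    static boolean containsWildcard(String term) {
        return WILDCARD.matcher(term).find();
    }

    // Builds a regex from the term: wildcard sequences are kept as-is (assumed to
    // have regex semantics), every other character is quoted so it matches literally.
    static Pattern toPattern(String term) {
        StringBuilder regex = new StringBuilder();
        Matcher m = WILDCARD.matcher(term);
        int last = 0;
        while (m.find()) {
            if (m.start() > last) {
                regex.append(Pattern.quote(term.substring(last, m.start())));
            }
            regex.append(m.group());
            last = m.end();
        }
        if (last < term.length()) {
            regex.append(Pattern.quote(term.substring(last)));
        }
        return Pattern.compile(regex.toString());
    }

    public static void main(String[] args) {
        System.out.println(containsWildcard("token.*"));
        System.out.println(toPattern("token.*").matcher("tokenizer").matches());
    }
}
```

Under the stated assumption, a term such as "token.*" would match "tokenizer", and "colo.?r" would match both "color" and "colour".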
© 2010 Axyana Software