Qizx/Open v0.3
java.lang.Object
  +--net.xfra.qizxopen.util.DefaultWordExtractor
A default word extractor suitable for European languages compatible with ISO-8859-1.
By default, a word starts with a letter and may contain letters and digits. Characters are folded to lowercase and, unless setKeepAccents(true) is called, accented letters are mapped to the corresponding unaccented letters (e.g. eacute maps to 'e'). Subclasses can change this behavior by overriding isWordStart, isWordPart and mapChar.
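The default rules described above can be sketched in a few lines. This is an illustrative stand-in, not the Qizx/Open source: the class name WordRulesSketch and its words helper are hypothetical, and accent removal through java.text.Normalizer is an assumption about one way to get the described mapping, not necessarily how the library does it.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in that mirrors the documented defaults.
// It is NOT the Qizx/Open implementation.
class WordRulesSketch {
    private final boolean keepAccents;

    WordRulesSketch(boolean keepAccents) {
        this.keepAccents = keepAccents;
    }

    // A word may begin with a letter only.
    boolean isWordStart(char c) {
        return Character.isLetter(c);
    }

    // Inside a word, letters and digits are accepted.
    boolean isWordPart(char c) {
        return Character.isLetterOrDigit(c);
    }

    // Fold to lowercase, then (unless accents are kept) drop the accent.
    // Assumption: NFD decomposition puts the base letter first.
    char mapChar(char c) {
        char lower = Character.toLowerCase(c);
        if (keepAccents) {
            return lower;
        }
        String decomposed = Normalizer.normalize(String.valueOf(lower), Normalizer.Form.NFD);
        return decomposed.charAt(0);
    }

    // Hypothetical helper: splits a string into normalized words.
    List<String> words(String text) {
        List<String> result = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            if (!isWordStart(text.charAt(i))) {
                i++;                       // skip anything that cannot start a word
                continue;
            }
            StringBuilder word = new StringBuilder();
            while (i < text.length() && isWordPart(text.charAt(i))) {
                word.append(mapChar(text.charAt(i)));
                i++;
            }
            result.add(word.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        // "\u00c9t\u00e9 2004: l'\u00c9cole" is "Été 2004: l'École" written with escapes.
        System.out.println(new WordRulesSketch(false).words("\u00c9t\u00e9 2004: l'\u00c9cole"));
    }
}
```

With accents folded, the sketch prints [ete, l, ecole]: "2004" is skipped because a word must start on a letter, and the apostrophe ends the word "l". Overriding isWordStart, isWordPart or mapChar in a subclass of this sketch changes the result in the same way the class description suggests for DefaultWordExtractor.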
Constructor Summary
DefaultWordExtractor()
Method Summary
char | charAt(int ahead) | Returns the character at the current position + ahead, or 0 if past the end.
boolean | isWordPart(char c) | Returns true if a word may contain this character.
boolean | isWordStart(char c) | Returns true if a word may begin with this character.
static void | main(java.lang.String[] args) |
char | mapChar(char c) | Normalizes a character belonging to a word.
char | nextChar() | Moves to the next character and returns it, or returns 0 if at the end.
char[] | nextWord() | Gets the next normalized word, or null if there are no more words.
void | setKeepAccents(boolean keep) |
void | start(char[] text, int length) | Starts the analysis of a new text chunk.
int | wordLength() | Returns the original length of the last word returned by nextWord.
int | wordOffset() | Returns the offset of the last word returned by nextWord.
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
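The calling sequence implied by the summary (start, then nextWord until it returns null, reading wordOffset and wordLength after each word) might look like the sketch below. ToyExtractor is a hypothetical stand-in that only borrows the method names and documented return conventions; it folds case but does not handle accents, and it is not the Qizx/Open class.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in using the method names from the summary.
// It is NOT the Qizx/Open class; it only folds case.
class ToyExtractor {
    private char[] text;
    private int length;
    private int pos;        // next position to scan
    private int wordOffset; // offset of the last word returned
    private int wordLength; // original length of the last word returned

    // Starts the analysis of a new text chunk.
    void start(char[] text, int length) {
        this.text = text;
        this.length = length;
        this.pos = 0;
        this.wordOffset = 0;
        this.wordLength = 0;
    }

    // Returns the next lowercased word, or null if no more words.
    char[] nextWord() {
        while (pos < length && !Character.isLetter(text[pos])) {
            pos++;
        }
        if (pos >= length) {
            return null;
        }
        wordOffset = pos;
        while (pos < length && Character.isLetterOrDigit(text[pos])) {
            pos++;
        }
        wordLength = pos - wordOffset;
        char[] word = new char[wordLength];
        for (int i = 0; i < wordLength; i++) {
            word[i] = Character.toLowerCase(text[wordOffset + i]);
        }
        return word;
    }

    int wordOffset() { return wordOffset; }

    int wordLength() { return wordLength; }
}

class ExtractionLoopDemo {
    // Returns one "normalized@offset+length=original" entry per word.
    static List<String> describe(String input) {
        char[] chars = input.toCharArray();
        ToyExtractor extractor = new ToyExtractor();
        extractor.start(chars, chars.length);
        List<String> out = new ArrayList<>();
        for (char[] w = extractor.nextWord(); w != null; w = extractor.nextWord()) {
            // Offset and length point back into the original, unnormalized text.
            String original = new String(chars, extractor.wordOffset(), extractor.wordLength());
            out.add(new String(w) + "@" + extractor.wordOffset()
                    + "+" + extractor.wordLength() + "=" + original);
        }
        return out;
    }

    public static void main(String[] args) {
        for (String line : describe("Hello XQuery 1.0 World")) {
            System.out.println(line);
        }
    }
}
```

The point of the loop is that nextWord returns the normalized form, while wordOffset and wordLength let the caller recover the original spelling from the input buffer, for example to highlight a match.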
Constructor Detail
public DefaultWordExtractor()

Method Detail
public void start(char[] text, int length)
    Specified by: start in interface WordExtractor

public boolean isWordStart(char c)
    Specified by: isWordStart in interface WordExtractor

public boolean isWordPart(char c)
    Specified by: isWordPart in interface WordExtractor

public char mapChar(char c)
    Specified by: mapChar in interface WordExtractor

public char[] nextWord()
    Specified by: nextWord in interface WordExtractor

public char charAt(int ahead)
    Specified by: charAt in interface WordExtractor

public char nextChar()
    Specified by: nextChar in interface WordExtractor

public int wordOffset()
    Specified by: wordOffset in interface WordExtractor

public int wordLength()
    Specified by: wordLength in interface WordExtractor

public void setKeepAccents(boolean keep)

public static void main(java.lang.String[] args)
Copyright Xavier FRANC 2003-2004