Interface Tokenizer

All Known Implementing Classes:
TokenizerImpl

public interface Tokenizer
Chops a string or text file into Token instances.
  • Method Details

    • setInputText

      void setInputText(String textToTokenize)
      Sets the text to be tokenized by this tokenizer.
      Parameters:
      textToTokenize - the text to tokenize
    • setInputReader

      void setInputReader(Reader reader)
      Sets the reader from which the text to tokenize is read.
      Parameters:
      reader - the input source
    • getNextToken

      Token getNextToken()
      Returns the next token.
      Returns:
      the next token if it exists; otherwise null
    • hasMoreTokens

      boolean hasMoreTokens()
      Returns true if there are more tokens, false otherwise.
      Returns:
      true if there are more tokens; otherwise false
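The iteration contract described by setInputText, hasMoreTokens and getNextToken can be sketched as a simple loop. This is a minimal illustration, not the behavior of TokenizerImpl: the Token stand-in with its getWord accessor, the SpaceTokenizer class and the collectWords helper are hypothetical names introduced only for this example.

```java
import java.util.ArrayList;
import java.util.List;

public class TokenLoopSketch {
    // Hypothetical stand-in for Token; only the word text is modeled.
    static class Token {
        private final String word;
        Token(String word) { this.word = word; }
        String getWord() { return word; }
    }

    // Subset of the Tokenizer interface documented on this page.
    interface Tokenizer {
        void setInputText(String textToTokenize);
        Token getNextToken();
        boolean hasMoreTokens();
    }

    // Toy implementation that splits on whitespace (an assumption, not TokenizerImpl).
    static class SpaceTokenizer implements Tokenizer {
        private String[] words = new String[0];
        private int next = 0;

        public void setInputText(String textToTokenize) {
            String trimmed = textToTokenize.trim();
            words = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
            next = 0;
        }

        public Token getNextToken() {
            // Mirrors the documented contract: null when no token is left.
            return hasMoreTokens() ? new Token(words[next++]) : null;
        }

        public boolean hasMoreTokens() {
            return next < words.length;
        }
    }

    // Drains the tokenizer with the hasMoreTokens/getNextToken pair.
    static List<String> collectWords(Tokenizer tokenizer, String text) {
        tokenizer.setInputText(text);
        List<String> result = new ArrayList<>();
        while (tokenizer.hasMoreTokens()) {
            Token token = tokenizer.getNextToken();
            if (token == null) {
                break; // defensive: getNextToken may return null
            }
            result.add(token.getWord());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(collectWords(new SpaceTokenizer(), "hello big world"));
    }
}
```

Checking for null after getNextToken is redundant in this toy implementation, but it keeps the loop safe for implementations in which hasMoreTokens and getNextToken can disagree at the end of input.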
    • hasErrors

      boolean hasErrors()
      Returns true if there were errors while reading tokens.
      Returns:
      true if there were errors; otherwise false
    • getErrorDescription

      String getErrorDescription()
      If hasErrors returns true, returns a description of the error encountered. Otherwise returns null.
      Returns:
      a description of the last error that occurred, or null if no error occurred
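One way a caller might combine hasErrors and getErrorDescription is to drain the tokenizer first and inspect the error state afterwards. The sketch below assumes that a read failure simply ends the token stream and is recorded for later inspection; the page does not specify this. ReaderTokenizer, the Token stand-in and the drain helper are hypothetical names used only for illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class TokenizerErrorSketch {
    // Hypothetical stand-in for Token; only the word text is modeled.
    static class Token {
        final String word;
        Token(String word) { this.word = word; }
    }

    // Subset of the Tokenizer interface documented on this page.
    interface Tokenizer {
        void setInputReader(Reader reader);
        Token getNextToken();
        boolean hasMoreTokens();
        boolean hasErrors();
        String getErrorDescription();
    }

    // Toy reader-backed implementation (an assumption, not TokenizerImpl).
    static class ReaderTokenizer implements Tokenizer {
        private Reader reader;
        private String pending; // next word, read ahead; null when none
        private String error;   // null while no error has occurred

        public void setInputReader(Reader reader) {
            this.reader = reader;
            this.error = null;
            advance();
        }

        // Reads ahead to the next whitespace-delimited word.
        private void advance() {
            pending = null;
            StringBuilder word = new StringBuilder();
            try {
                int c;
                while ((c = reader.read()) != -1) {
                    if (Character.isWhitespace(c)) {
                        if (word.length() > 0) break;
                    } else {
                        word.append((char) c);
                    }
                }
            } catch (IOException e) {
                error = e.getMessage(); // remember the failure, stop producing tokens
                return;
            }
            if (word.length() > 0) pending = word.toString();
        }

        public Token getNextToken() {
            if (pending == null) return null;
            Token token = new Token(pending);
            advance();
            return token;
        }

        public boolean hasMoreTokens() { return pending != null; }
        public boolean hasErrors() { return error != null; }
        public String getErrorDescription() { return error; }
    }

    // Consumes all tokens, then reports the error state.
    static String drain(Tokenizer tokenizer) {
        int count = 0;
        while (tokenizer.hasMoreTokens()) {
            if (tokenizer.getNextToken() != null) count++;
        }
        return tokenizer.hasErrors()
                ? "failed after " + count + " tokens: " + tokenizer.getErrorDescription()
                : "ok, " + count + " tokens";
    }

    // A reader whose every read fails, used to trigger the error path.
    static Reader brokenReader() {
        return new Reader() {
            @Override public int read(char[] buf, int off, int len) throws IOException {
                throw new IOException("disk unplugged");
            }
            @Override public void close() { }
        };
    }

    public static void main(String[] args) {
        Tokenizer tokenizer = new ReaderTokenizer();
        tokenizer.setInputReader(new StringReader("one two"));
        System.out.println(drain(tokenizer));
        tokenizer.setInputReader(brokenReader());
        System.out.println(drain(tokenizer));
    }
}
```

Checking hasErrors after the loop matters because an empty token stream alone does not distinguish end of input from a failed read.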
    • setWhitespaceSymbols

      void setWhitespaceSymbols(String symbols)
      Sets the whitespace symbols of this Tokenizer to the given symbols.
      Parameters:
      symbols - the whitespace symbols
    • setSingleCharSymbols

      void setSingleCharSymbols(String symbols)
      Sets the single character symbols of this Tokenizer to the given symbols.
      Parameters:
      symbols - the single character symbols
    • setPrepunctuationSymbols

      void setPrepunctuationSymbols(String symbols)
      Sets the prepunctuation symbols of this Tokenizer to the given symbols.
      Parameters:
      symbols - the prepunctuation symbols
    • setPostpunctuationSymbols

      void setPostpunctuationSymbols(String symbols)
      Sets the postpunctuation symbols of this Tokenizer to the given symbols.
      Parameters:
      symbols - the postpunctuation symbols
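This page does not define how each symbol class affects tokenization, so the following sketch states its own assumptions: whitespace symbols separate tokens, each single character symbol forms a token by itself, and prepunctuation and postpunctuation symbols are split off the start and end of a token. ToyTokenizer and its split method are hypothetical and exist only to show the four setters being configured together; the pre|word|post output format is likewise invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

public class SymbolConfigSketch {
    // Subset of the Tokenizer interface documented on this page.
    interface Tokenizer {
        void setWhitespaceSymbols(String symbols);
        void setSingleCharSymbols(String symbols);
        void setPrepunctuationSymbols(String symbols);
        void setPostpunctuationSymbols(String symbols);
    }

    // Toy implementation under the assumptions stated above (not TokenizerImpl).
    static class ToyTokenizer implements Tokenizer {
        private String whitespace = " ";
        private String singleChar = "";
        private String prepunct = "";
        private String postpunct = "";

        public void setWhitespaceSymbols(String symbols) { whitespace = symbols; }
        public void setSingleCharSymbols(String symbols) { singleChar = symbols; }
        public void setPrepunctuationSymbols(String symbols) { prepunct = symbols; }
        public void setPostpunctuationSymbols(String symbols) { postpunct = symbols; }

        // Hypothetical helper: returns each token formatted as "pre|word|post".
        List<String> split(String text) {
            List<String> out = new ArrayList<>();
            StringBuilder current = new StringBuilder();
            for (char c : text.toCharArray()) {
                if (whitespace.indexOf(c) >= 0) {
                    flush(current, out);            // whitespace ends the token
                } else if (singleChar.indexOf(c) >= 0) {
                    flush(current, out);            // symbol ends the token...
                    out.add("|" + c + "|");         // ...and is a token itself
                } else {
                    current.append(c);
                }
            }
            flush(current, out);
            return out;
        }

        // Splits leading and trailing punctuation off the buffered token.
        private void flush(StringBuilder current, List<String> out) {
            if (current.length() == 0) return;
            String s = current.toString();
            current.setLength(0);
            int start = 0;
            int end = s.length();
            while (start < end && prepunct.indexOf(s.charAt(start)) >= 0) start++;
            while (end > start && postpunct.indexOf(s.charAt(end - 1)) >= 0) end--;
            out.add(s.substring(0, start) + "|" + s.substring(start, end) + "|" + s.substring(end));
        }
    }

    public static void main(String[] args) {
        ToyTokenizer tokenizer = new ToyTokenizer();
        tokenizer.setWhitespaceSymbols(" \t\n");
        tokenizer.setSingleCharSymbols("=");
        tokenizer.setPrepunctuationSymbols("\"'(");
        tokenizer.setPostpunctuationSymbols("\"').,!?");
        System.out.println(tokenizer.split("He said \"hi there!\""));
        System.out.println(tokenizer.split("a=b"));
    }
}
```

Each setter takes a single String in which every character is one symbol, which is why the sketch tests membership with indexOf.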
    • isBreak

      boolean isBreak()
      Determines whether the current token should start a new sentence.
      Returns:
      true if a new sentence should be started; otherwise false
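A caller could use isBreak to group tokens into sentences. The sketch below assumes that "the current token" means the token most recently returned by getNextToken, which this page does not state explicitly. ToyTokenizer, its rule that a break follows a token ending in '.', '!' or '?', the Token stand-in and the groupSentences helper are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class SentenceBreakSketch {
    // Hypothetical stand-in for Token; only the word text is modeled.
    static class Token {
        private final String word;
        Token(String word) { this.word = word; }
        String getWord() { return word; }
    }

    // Subset of the Tokenizer interface documented on this page.
    interface Tokenizer {
        Token getNextToken();
        boolean hasMoreTokens();
        boolean isBreak();
    }

    // Toy implementation (not TokenizerImpl): a token starts a new sentence
    // when the token before it ends in '.', '!' or '?'.
    static class ToyTokenizer implements Tokenizer {
        private final String[] words;
        private int next = 0;

        ToyTokenizer(String text) {
            String trimmed = text.trim();
            words = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
        }

        public Token getNextToken() {
            return hasMoreTokens() ? new Token(words[next++]) : null;
        }

        public boolean hasMoreTokens() {
            return next < words.length;
        }

        public boolean isBreak() {
            // Current token is words[next - 1]; inspect the one before it.
            if (next < 2) return false;
            String previous = words[next - 2];
            char last = previous.charAt(previous.length() - 1);
            return last == '.' || last == '!' || last == '?';
        }
    }

    // Starts a new group whenever isBreak reports a sentence boundary.
    static List<List<String>> groupSentences(Tokenizer tokenizer) {
        List<List<String>> sentences = new ArrayList<>();
        List<String> current = new ArrayList<>();
        while (tokenizer.hasMoreTokens()) {
            Token token = tokenizer.getNextToken();
            if (token == null) break;
            if (tokenizer.isBreak() && !current.isEmpty()) {
                sentences.add(current);
                current = new ArrayList<>();
            }
            current.add(token.getWord());
        }
        if (!current.isEmpty()) sentences.add(current);
        return sentences;
    }

    public static void main(String[] args) {
        System.out.println(groupSentences(new ToyTokenizer("It rains. We stay in.")));
    }
}
```

Calling isBreak after getNextToken, and before adding the token to the current group, keeps the token that triggers the break at the start of the new sentence.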