Interface Parser<NT extends Enum<NT>>

Type Parameters:
NT - an Enum type with one value for every nonterminal in the grammar

public interface Parser<NT extends Enum<NT>>
A Parser is an immutable object that is able to take a sequence of characters and return a parse tree according to some grammar.

Parsers are constructed by calling compile() with a grammar, which might be stored in a string, in a file, or read from a stream.

Once constructed, a Parser object is used by calling parse() on a sequence of characters (represented as a string or file or stream). Its result is a ParseTree showing how that string matches the grammar.

The type parameter NT should be an Enum type with the same (case-insensitive) names as the nonterminals in the grammar. This allows nonterminals to be referred to by your Java code with static checking and type safety. For example, if your grammar is:

String sumGrammar = "expression ::= number '+' number ;  number ::= [0-9]+;";
then you should create a nonterminal enum like this:
enum SumGrammar { EXPRESSION, NUMBER };
and then use:
Parser<SumGrammar> parser = Parser.compile(sumGrammar, SumGrammar.EXPRESSION);
to compile it into a parser.
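
Because the enum has to cover every nonterminal, a mismatch between the grammar text and the enum is an easy mistake to make. The following is a minimal standalone sketch of a pre-check you could run before compiling. It does not use or reproduce ParserLib: the class NonterminalCheck and the method missingNonterminals are hypothetical names, and the regular expression only looks for identifiers directly followed by ::=, so a quoted terminal containing that text would confuse it.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of ParserLib.
class NonterminalCheck {
    enum SumGrammar { EXPRESSION, NUMBER }

    // An identifier (same shape as the nonterminal rule) followed by '::='.
    private static final Pattern DEFINITION =
            Pattern.compile("([a-zA-Z_][a-zA-Z_0-9]*)\\s*::=");

    /** Returns defined nonterminals with no enum constant of the same name, ignoring case. */
    static <NT extends Enum<NT>> Set<String> missingNonterminals(String grammar, Class<NT> enumType) {
        Set<String> enumNames = new HashSet<>();
        for (NT value : enumType.getEnumConstants()) {
            enumNames.add(value.name().toLowerCase(Locale.ROOT));
        }
        Set<String> missing = new TreeSet<>();
        Matcher m = DEFINITION.matcher(grammar);
        while (m.find()) {
            String name = m.group(1);
            if (!enumNames.contains(name.toLowerCase(Locale.ROOT))) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        String sumGrammar = "expression ::= number '+' number ;  number ::= [0-9]+;";
        System.out.println(missingNonterminals(sumGrammar, SumGrammar.class)); // prints []
    }
}
```

An empty result only says that every defined name has an enum counterpart; it says nothing about whether the grammar itself is well formed.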

Grammars themselves are written in the following syntax, described here in its own notation.

   @skip whitespaceAndComments {
     grammar ::= ( production | skipBlock )+
     production ::= nonterminal '::=' union ';'
     skipBlock ::= '@skip' nonterminal '{' production* '}'
     union ::= concatenation ('|' concatenation)*
     concatenation ::= repetition*
     repetition ::= unit repeatOperator?
     unit ::= nonterminal | terminal | '(' union ')'
   }
   nonterminal ::= [a-zA-Z_][a-zA-Z_0-9]*
   terminal ::= quotedString | characterSet | anyChar | characterClass
   quotedString ::= "'" ([^'\r\n\\] | '\\' . )* "'"   // e.g. 'hello', '\'',  '\r\n\t', ''
                  | '"' ([^"\r\n\\] | '\\' . )* '"'   // e.g. "world", "\"", "\r\n\t", ""
   characterSet ::= '[' ([^\]\r\n\\] | '\\' . )+ ']'   // e.g. [abc], [a-z], [^a-z], [\]], [\r\n\t]
   anyChar ::= '.'
   repeatOperator ::= [*+?] | '{' ( number | range | upperBound | lowerBound ) '}'
   number ::= [0-9]+
   range ::= number ',' number
   upperBound ::= ',' number
   lowerBound ::= number ','
   characterClass ::= '\\' [dsw]     // e.g. \d, \s, \w
   whitespaceAndComments ::= (whitespace | oneLineComment | blockComment)*
   whitespace ::= [ \t\r\n] 
   oneLineComment ::= '//' [^\r\n]* [\r\n]+ 
   blockComment ::= '/*' [^*]* '*' ([^/]* '*')* '/'
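
The repeatOperator rule above admits seven spellings: *, +, ?, {n}, {n,m}, {,m} and {n,}. As a rough illustration, the sketch below maps each spelling to a minimum and maximum repetition count. It is standalone Java, not ParserLib code: the class RepeatBounds and the method bounds are hypothetical, and reading {,m} as "zero to m" and {n,} as "n or more" is an assumption made by analogy with regular-expression quantifiers.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of ParserLib.
class RepeatBounds {
    static final int UNBOUNDED = Integer.MAX_VALUE;

    // '{' optional-number optional-comma optional-number '}'
    private static final Pattern BRACES = Pattern.compile("\\{(\\d*)(,?)(\\d*)\\}");

    /** Returns {min, max} for a repeat operator; max is UNBOUNDED when there is no upper limit. */
    static int[] bounds(String op) {
        switch (op) {
            case "*": return new int[] {0, UNBOUNDED};
            case "+": return new int[] {1, UNBOUNDED};
            case "?": return new int[] {0, 1};
            default:  break;
        }
        Matcher m = BRACES.matcher(op);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a repeat operator: " + op);
        }
        String low = m.group(1);
        boolean hasComma = !m.group(2).isEmpty();
        String high = m.group(3);
        if (!hasComma) {
            // {n}: exactly n repetitions; "{}" is rejected here.
            if (low.isEmpty() || !high.isEmpty()) {
                throw new IllegalArgumentException("not a repeat operator: " + op);
            }
            int n = Integer.parseInt(low);
            return new int[] {n, n};
        }
        if (low.isEmpty() && high.isEmpty()) {
            // "{,}" matches none of number, range, upperBound, lowerBound.
            throw new IllegalArgumentException("not a repeat operator: " + op);
        }
        int min = low.isEmpty() ? 0 : Integer.parseInt(low);            // {,m}: assumed lower bound 0
        int max = high.isEmpty() ? UNBOUNDED : Integer.parseInt(high);  // {n,}: assumed no upper bound
        return new int[] {min, max};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(bounds("{2,5}"))); // prints [2, 5]
        System.out.println(Arrays.toString(bounds("{,4}")));  // prints [0, 4]
    }
}
```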
 
Author:
6.005/6.031 course staff
  • Field Summary

    Fields
    Modifier and Type Field Description
    static String VERSION  
  • Method Summary

    Modifier and Type Method Description
    static <NT extends Enum<NT>> Parser<NT> compile(File f, NT rootNonterminal)
    Compile a Parser from a grammar stored in a file.
    static <NT extends Enum<NT>> Parser<NT> compile(InputStream in, NT rootNonterminal)
    Compile a Parser from a grammar represented as an InputStream.
    static <NT extends Enum<NT>> Parser<NT> compile(Reader in, NT rootNonterminal)
    Compile a Parser from a grammar represented as a Reader stream.
    static <NT extends Enum<NT>> Parser<NT> compile(String grammar, NT rootNonterminal)
    Compile a Parser from a grammar represented as a string.
    default ParseTree<NT> parse(File f)
    Parses a file based on the grammar internally represented by the parser.
    default ParseTree<NT> parse(InputStream stream)
    Parses a stream based on the grammar internally represented by the parser.
    ParseTree<NT> parse(Reader in)
    Parses a stream based on the grammar internally represented by the parser.
    ParseTree<NT> parse(String string)
    Parses a string based on the grammar internally represented by the parser.
  • Field Details

  • Method Details

    • compile

      static <NT extends Enum<NT>> Parser<NT> compile(String grammar, NT rootNonterminal) throws UnableToParseException
      Compile a Parser from a grammar represented as a string.
      Type Parameters:
      NT - an Enum type that contains one value for every nonterminal in the grammar.
      Parameters:
      grammar - the grammar to use
      rootNonterminal - the desired root nonterminal in the grammar
      Returns:
      a parser for the given grammar that will start parsing at rootNonterminal.
      Throws:
      UnableToParseException - if the grammar has a syntax error
    • compile

      static <NT extends Enum<NT>> Parser<NT> compile(Reader in, NT rootNonterminal) throws UnableToParseException, IOException
      Compile a Parser from a grammar represented as a Reader stream.
      Type Parameters:
      NT - an Enum type that contains one value for every nonterminal in the grammar.
      Parameters:
      in - contains the grammar
      rootNonterminal - the desired root nonterminal in the grammar
      Returns:
      a parser for the given grammar that will start parsing at rootNonterminal.
      Throws:
      UnableToParseException - if the grammar has a syntax error
      IOException - if the stream has an I/O error
    • compile

      static <NT extends Enum<NT>> Parser<NT> compile(File f, NT rootNonterminal) throws UnableToParseException, IOException
      Compile a Parser from a grammar stored in a file.
      Type Parameters:
      NT - an Enum type that contains one value for every nonterminal in the grammar.
      Parameters:
      f - file containing the grammar. Required to have UTF-8 encoding; if you need a different encoding, use compile(new FileReader(...),...) to choose the encoding yourself instead.
      rootNonterminal - the desired root nonterminal in the grammar
      Returns:
      a parser for the given grammar that will start parsing at rootNonterminal.
      Throws:
      UnableToParseException - if the grammar has a syntax error
      IOException - if the file is missing or has an I/O error
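
      The file overload assumes UTF-8, and the Reader overload is the route for anything else. One detail worth knowing: the single-argument new FileReader(file) uses the platform default charset rather than one you choose, so an explicit charset has to be passed some other way, for example Files.newBufferedReader(path, charset) or, since Java 11, new FileReader(file, charset). The sketch below uses only the JDK. The class name, the roundTrip helper and the sample grammar text are illustrative assumptions, and the hand-off to compile appears only as a comment.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of ParserLib.
class GrammarFileReaderSketch {
    /** Opens a grammar file with an explicitly chosen charset. */
    static Reader open(Path path, Charset charset) throws IOException {
        return Files.newBufferedReader(path, charset);
    }

    /** Writes text to a temporary file in the given charset and reads it back through open(). */
    static String roundTrip(String text, Charset charset) {
        try {
            Path file = Files.createTempFile("grammar", ".g");
            try {
                Files.write(file, text.getBytes(charset));
                StringBuilder sb = new StringBuilder();
                try (Reader reader = open(file, charset)) {
                    // A Reader like this one is what compile(Reader, NT) would receive.
                    char[] buf = new char[1024];
                    int n;
                    while ((n = reader.read(buf)) != -1) {
                        sb.append(buf, 0, n);
                    }
                }
                return sb.toString();
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String grammar = "word ::= 'caf\u00e9';";
        // The non-ASCII character survives because writer and reader agree on the charset.
        System.out.println(roundTrip(grammar, StandardCharsets.ISO_8859_1).equals(grammar)); // prints true
    }
}
```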
    • compile

      static <NT extends Enum<NT>> Parser<NT> compile(InputStream in, NT rootNonterminal) throws UnableToParseException, IOException
      Compile a Parser from a grammar represented as an InputStream.
      Type Parameters:
      NT - an Enum type that contains one value for every nonterminal in the grammar.
      Parameters:
      in - stream containing the grammar. Required to have UTF-8 encoding; if you need a different encoding, use compile(new InputStreamReader(...),...) to choose the encoding yourself instead.
      rootNonterminal - the desired root nonterminal in the grammar
      Returns:
      a parser for the given grammar that will start parsing at rootNonterminal.
      Throws:
      UnableToParseException - if the grammar has a syntax error
      IOException - if the stream has an I/O error
    • parse

      ParseTree<NT> parse(String string) throws UnableToParseException
      Parses a string based on the grammar internally represented by the parser.
      Parameters:
      string - string to parse
      Returns:
      ParseTree representing a successful parse of the string
      Throws:
      UnableToParseException - if string cannot be parsed, describing approximately where the parsing error occurred
    • parse

      ParseTree<NT> parse(Reader in) throws UnableToParseException, IOException
      Parses a stream based on the grammar internally represented by the parser.
      Parameters:
      in - stream from which to read the text to be parsed.
      Returns:
      ParseTree representing a successful parse of the content of the stream
      Throws:
      UnableToParseException - if the stream cannot be parsed, describing approximately where the parsing error occurred
      IOException - if the stream has an I/O error.
    • parse

      default ParseTree<NT> parse(File f) throws UnableToParseException, IOException
      Parses a file based on the grammar internally represented by the parser.
      Parameters:
      f - file containing the text to be parsed. Required to have UTF-8 encoding; if you need a different encoding, use parse(new FileReader(...)) to choose the encoding yourself instead.
      Returns:
      ParseTree representing a successful parse of the content of the file
      Throws:
      UnableToParseException - if the file cannot be parsed, describing approximately where the parsing error occurred
      IOException - if the file has an I/O error.
    • parse

      default ParseTree<NT> parse(InputStream stream) throws UnableToParseException, IOException
      Parses a stream based on the grammar internally represented by the parser.
      Parameters:
      stream - stream from which to read the text to be parsed. Required to have UTF-8 encoding; if you need a different encoding, use parse(new InputStreamReader(...)) to choose the encoding yourself instead.
      Returns:
      ParseTree representing a successful parse of the content of the stream
      Throws:
      UnableToParseException - if the stream cannot be parsed, describing approximately where the parsing error occurred
      IOException - if the stream has an I/O error.
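
The UTF-8 requirement on the InputStream overloads matters because a byte stream carries no charset of its own. The sketch below decodes the same bytes twice, once with the charset they were written in and once as UTF-8, to show why non-UTF-8 input should be wrapped in an InputStreamReader with an explicit charset and passed to the Reader overload. It uses only the JDK; StreamDecodingSketch and decode are hypothetical names, and the input "1+2" is just sample text.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of ParserLib.
class StreamDecodingSketch {
    /** Reads the whole stream as text using the given charset. */
    static String decode(InputStream stream, Charset charset) {
        try (Reader reader = new InputStreamReader(stream, charset)) {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = "1+2".getBytes(StandardCharsets.UTF_16LE);

        // Decoded with the charset it was written in, the text comes back intact.
        String right = decode(new ByteArrayInputStream(bytes), StandardCharsets.UTF_16LE);
        System.out.println(right.equals("1+2")); // prints true

        // Decoded as UTF-8, every character is followed by a NUL, so the text differs.
        String wrong = decode(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8);
        System.out.println(wrong.length()); // prints 6
    }
}
```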