Context free grammar in Java

BNF and context-free grammar

What is a grammar ? A grammar is a model of a language, or a language of a language. We can use finite grammar rules to generate infinite sentences of a language. The general method we describe a grammar is called BNF. For example, we describe a context-free grammar (CFG) using 4-tuple:

  1. A set of terminal symbols.
  2. A set of non-terminal symbols.
  3. A set of rules known as productions (or production rules) which can transform each non-terminal symbol to a sequence of non-terminal and/or terminal symbols.
  4. A start symbol or goal symbol, also called sentence in English.

A production example :

S -> aS;
S -> abc;

Maybe you know another grammar called context sensitive grammar. Then, what’s the differences between context free grammar and context sensitive grammar ? The most important difference is production rules : the left hand side of a context free grammar production is non-terminal symbol , but the left hand side of a context sensitive grammar contains not-termianl and terminal symbols.

More information can be found in Chomsky hierarchy. We can get the grammars’ relationship below.

Lexical grammar and syntactic grammar

Java is a programming language using context free grammar to define lexical grammar and syntactic grammar.

For example, the lexical grammar in Java contains the production:

BooleanLiteral :
        t r u e
        f a l s e

BooleanLiteral is a non-terminal symbol in the left hand side of the production, true or false is the right hand side sequence.

The syntactic grammar for the Java programming language has tokens defined by the lexical grammar as its terminal symbols.

For example , The syntactic production :

    if ( Expression ) Statement

IfThenStatement is a non-termianl symbol , which represents the token if which is a token defined by the lexical grammar, followed by a left parenthesis token, followed by an Expression, followed by a right parenthesis token, followed by a Statement.