OTHERS 四月 24, 2022

如何描述一种语言?

文章字数 11k 阅读约需 10 mins. 阅读次数

如何描述一种语言

当我们需要定义一种语言的规范或语法时,需要将这种语言通过一种形式描述出来,用普通的文字来描述,会非常的繁琐,那么有什么工具来帮助我们描述这种语言的规范呢?

BNF

BNF 就是一种描述语法的规范,是 Backus–Naur form 或 Backus normal form 的缩写,其形式非常简单:

 <symbol> ::= __expression__

在这里:

  • <symbol> 代表一个非终止符,通过 <> 来定义,如 <主语><rule>
  • __expression__ 由一个或多个符号组成,符号可以是一个终止符,也可以是一个非终止符;
  • ::= 代表一个非终止符的规则,左边的符号必须通过右边的表达式来替代,如 <rule1>::=__exp1__ <rule2><rule2>::=__exp2__,则 <rule1> 可以表示为 __exp1__ __exp2__
  • 多个符号可以通过 | 间隔,表示一个非终止符的可能值,如 <rule>::=__exp1__ | __exp2__
  • 从未出现在左边的符号代表终止符,意为可以终止替代操作;
  • 字面量可通过 "' 来定义。

所以 BNF 自身语法中所使用到的符号,只有 <>|::="',简单,但却可以描述复杂的语法规则。

关于 BNF 更详细的描述,可以参考 wikipedia,或其他相关资料。

EBNF

BNF 还有很多变种形式,EBNF 就是其中之一。

EBNF 是 Extended Backus–Naur form 的缩写,对 BNF 的符号和语法进行了简化和扩充。

较为混乱的是,EBNF 也有很多变种形式,国际标准化组织(International Organization for Standardization,简称 ISO)于 1996 年通过了一个 EBNF 的标准 ISO/IEC 14977,该标准中定义的 EBNF(下称 ISO EBNF)中可使用的符号如下:

Usage Notation
definition =
concatenation ,
termination ;
alternation |
optional [ … ]
repetition { … }
grouping ( … )
terminal string “ … “
terminal string ‘ … ‘
comment (* … *)
special sequence ? … ?
exception -

使用 ISO EBNF 描述整数的规则如下:

digit excluding zero = "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;
digit                = "0" | digit excluding zero ;
natural number = digit excluding zero, { digit } ;
integer = "0" | [ "-" ], natural number ;

遗憾的是,ISO EBNF 的出现,并没有统一 ENBF 的使用,即便在 ISO 自己的其他标准中,也并没有全部使用 ISO EBNF。

用 EBNF 定义的语法,同样也可以通过 BNF 来描述,只不过使用 BNF 描述时会繁琐一些,例如重复就不太容易通过 BNF 来描述,需要借助中间规则来实现。

关于 EBNF 更详细的描述,可以参考 wikipedia,或其他相关资料。

实例

让我们来看一些使用 BNF 或 EBNF 定义的语言实例。

BNF

BNF 自己的语法,可以使用 BNF 来描述:

 <syntax>         ::= <rule> | <rule> <syntax>
 <rule>           ::= <opt-whitespace> "<" <rule-name> ">" <opt-whitespace> "::=" <opt-whitespace> <expression> <line-end>
 <opt-whitespace> ::= " " <opt-whitespace> | ""
 <expression>     ::= <list> | <list> <opt-whitespace> "|" <opt-whitespace> <expression>
 <line-end>       ::= <opt-whitespace> <EOL> | <line-end> <line-end>
 <list>           ::= <term> | <term> <opt-whitespace> <list>
 <term>           ::= <literal> | "<" <rule-name> ">"
 <literal>        ::= '"' <text1> '"' | "'" <text2> "'"
 <text1>          ::= "" | <character1> <text1>
 <text2>          ::= '' | <character2> <text2>
 <character>      ::= <letter> | <digit> | <symbol>
 <letter>         ::= "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" | "J" | "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" | "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z" | "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" | "j" | "k" | "l" | "m" | "n" | "o" | "p" | "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" | "y" | "z"
 <digit>          ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
 <symbol>         ::=  "|" | " " | "!" | "#" | "$" | "%" | "&" | "(" | ")" | "*" | "+" | "," | "-" | "." | "/" | ":" | ";" | ">" | "=" | "<" | "?" | "@" | "[" | "\" | "]" | "^" | "_" | "`" | "{" | "}" | "~"
 <character1>     ::= <character> | "'"
 <character2>     ::= <character> | '"'
 <rule-name>      ::= <letter> | <rule-name> <rule-char>
 <rule-char>      ::= <letter> | <digit> | "-"

BNF 的语法也可以用图来表示:

BNF syntax diagram

ISO EBNF

使用 ISO EBNF 来描述自己:

letter = "A" | "B" | "C" | "D" | "E" | "F" | "G"
       | "H" | "I" | "J" | "K" | "L" | "M" | "N"
       | "O" | "P" | "Q" | "R" | "S" | "T" | "U"
       | "V" | "W" | "X" | "Y" | "Z" | "a" | "b"
       | "c" | "d" | "e" | "f" | "g" | "h" | "i"
       | "j" | "k" | "l" | "m" | "n" | "o" | "p"
       | "q" | "r" | "s" | "t" | "u" | "v" | "w"
       | "x" | "y" | "z" ;
digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;
symbol = "[" | "]" | "{" | "}" | "(" | ")" | "<" | ">"
       | "'" | '"' | "=" | "|" | "." | "," | ";" ;
character = letter | digit | symbol | "_" ;

identifier = letter , { letter | digit | "_" } ;
terminal = "'" , character , { character } , "'" 
         | '"' , character , { character } , '"' ;

lhs = identifier ;
rhs = identifier
     | terminal
     | "[" , rhs , "]"
     | "{" , rhs , "}"
     | "(" , rhs , ")"
     | rhs , "|" , rhs
     | rhs , "," , rhs ;

rule = lhs , "=" , rhs , ";" ;
grammar = { rule } ;

ISO EBNF 语法图:

ISO EBNF syntax diagram

MySQL SQL Statement

在 MySQL 的官方文档中,是通过 EBNF 来对 SQL 语句 进行描述的,比如 创建数据库 语句的语法:

CREATE {DATABASE | SCHEMA} [IF NOT EXISTS] db_name
    [create_option] ...

create_option: [DEFAULT] {
    CHARACTER SET [=] charset_name
  | COLLATE [=] collation_name
  | ENCRYPTION [=] {'Y' | 'N'}
}

更多语法实例可以查阅 官方文档

H2 Grammar

H2 Database 的 文档 中,包括 SQL 语句的 BNF 和铁轨图(railroad diagram)两个版本,比如 创建 SCHEMA

CREATE SCHEMA [ IF NOT EXISTS ]
{ name [ AUTHORIZATION ownerName ] | [ AUTHORIZATION ownerName ] }
[ WITH tableEngineParamName [,...] ]

railroad diagram

更多语法实例可以查阅 官方文档

0%