CBS-beta Notation
Contents
Languages
See the CBS of IMP for illustration of the following points.
-
Each file of the definition of a language
L
starts with the lineLanguage "L"
. Splitting a language definition across multiple files (within the same project) does not affect its well-formedness. -
Language definitions can be split into numbered sections. A section number is written
#n
, wheren
can be a series of numbers or (single) letters, separated by dots. Section numbers are hyperlinks in a table of contents, which is written[...]
, and in multi-line comments/*...*/
. -
Multi-line comments are written
/*...*/
, and may include funcon terms (delimited by back-ticks`
). End-of-line comments are written//...
.
Multi-line comments are displayed as running text in generated web pages, without the enclosing
/*...*/
.
Syntax
Syntax
introduces one or more grammar productions for the
abstract (context-free) syntax of the language, together with meta-variables
ranging over the associated sorts of ASTs.
- Nonterminals start with lowercase letters, and may include letters, digits,
and dashes
-
. - The start symbol of the grammar is named
start
. - Meta-variables start with uppercase letters, e.g.,
Pgm
. - The left and right sides of productions are separated by
::=
. - Terminal symbols are enclosed in single quotes
'...'
. - Alternatives are separated by
|
. All alternatives for the same nonterminal have to be specified together. - Regular expressions are written using postfix
?
for optional parts,*
for iteration,+
for non-empty iteration, and grouping parentheses(...)
. - Layout (including comments) is implicitly allowed everywhere in
Syntax
, but it can be excluded between two symbols by inserting an underscore_
, e.g.,num ::= '-'?_decimal
.
Lexis
introduces one or more grammar productions for the
lexical (regular or context-free) syntax of the language, together with
meta-variables ranging over the specified strings of characters.
- The range of characters from
'c1'
to'c2'
is specified by'c1'-'c2'
. For example,decimal
is defined as('0'-'9')+
. - Layout (including comments) is implicitly excluded everywhere in
Lexis
. Lexis
productions are otherwise specified as forSyntax
.
Parsers generated from grammars for abstract syntax are usually ambiguous.
Associativity, relative priority, and lexical disambiguation are currently
specified separately, using notation from SDF, embedded in multi-line comments
/*...*/
.
Semantics
Semantics
introduces a declaration of a translation function
from ASTs (with strings as leaves) to funcon terms.
- A translation function takes a single AST (or string) as argument,
enclosed in double brackets
[[...]]
. - The sort of the argument is specified by
_:...
. - The type of the resulting funcon term is specified after
:
. It is usually a computation type=>t
for some value typet
; it can also be a computation sequence type of the form(=>t)*
.
Rule
introduces an equation defining the translation function
on trees matching a specified pattern.
- The pattern in a rule usually corresponds to a single alternative of the argument sort.
- Meta-variables of the same sort in a rule are distinguished by subscripted digits, optionally followed by primes.
- The notation
\"...\"
produces a funcon string value from a lexical symbol...
. (In contrast,"..."
is always a literal string, not containing references to meta-variables.) - Desugaring rules for ASTs are written
[[...]] : s = [[...]]
, where the patterns on both sides match the nonterminal symbols
. Note that desugaring rules do not refer to particular translation functions.
All funcons (etc.) defined in Funcons-beta can be used in all language definitions. Funcons defined in a language definition file must have fresh names, and cannot be reused in other language definitions (except by copy-paste).
Funcons
Funcon names start with lowercase letters, and may include letters, digits,
and dashes -
.
Variables in funcon terms start with uppercase letters, and may be suffixed by
digits and/or primes. Variables that stand for sequences of indefinite length
are suffixed by *
, +
, or ?
. Variables whose values are not required may
be replaced by underscores _
.
A funcon term formed from a funcon f
and argument sequence s
is written in
prefix form: f s
. When f
has no arguments, it is written without
parentheses: f
; when it has 2 or more arguments t1
, …, tn
, they are
written in ordinary parentheses: f(t1,...,tn)
. Parentheses are optional when
there is a single argument term, so both f t
and f(t)
are allowed.
- N.B. Parentheses are also optional around composite argument terms:
f g t
is always grouped asf(g(t))
!
Funcons are not higher-order, so implicit grouping of f g t
as (f g)(t)
would here be completely useless, as it would never give a well-formed term.
Grouping in funcon terms treats funcon names as prefix operations, such as the
‘-
’ in ‘-sin(x)
’. (Readers accustomed to higher-order programming in Haskell
may find it helpful to imagine f g t
written as f$g t
.)
Some funcons can take argument sequences of varying lengths; funcons may also
compute sequences of values. An example of a funcon that does both is
left-to-right
, which is used to ensure that arguments are evaluated in the
specified order (the default is to allow interleaving). Composition with
left-to-right
provides sequential variants of all multi-argument funcons, e.g.:
integer-add left-to-right(t1,t2)
(which abbreviatesinteger-add(left-to-right(t1,t2))
)
The following special forms of funcon terms are allowed:
- Natural numbers in decimal notation
- Characters
'c'
(with the usual\
-escapes) - Strings
"c1...cn"
(with the usual\
-escapes) - Lists
[t1,...,tn]
, empty list[ ]
- Sets
{t1,...,tn}
, empty set{ }
- Maps
{k1|->t1,...,kn|->tn}
, empty mapmap( )
- Type union
t1|...|tn
- Type intersection
t1&...&tn
- Type complement
~t
- Sequence types
t*
,t+
,t?
,t^n
The funcon definitions in Funcons-beta are language-independent.
CBS specifications do not refer to the hierarchy of folders and files used for Funcons-beta.
Files in Funcons-beta are generally divided into unnumbered subsections.
The number of #
characters in a subsection heading indicates its level,
with the top level sections Computations and Values having a single #
.
Type definitions
In CBS, types are values. They can be given as arguments to funcons, and computed by funcons.
Type
introduces a declaration of a fresh funcon name for a type, and lists
any arguments. The type of values it computes is types
.
- A type definition
Type t... ~> t1
combines a type declaration with a rewrite of the type tot1
. - A type definition
Type t... <: t1
combines a type declaration with an assertion thatt...
is a subtype oft1
.
Datatype
introduces a definition of an algebraic datatype, written
t ::= f1(...) | ... | fn(...)
, which also declares the constructor funcons
fi(...):t
for its values.
- A datatype definition
Datatype t(...)
written without constructors allows separate declaration of constructors with different instantiations of the parameterised typet(...)
, corresponding to a GADT.
Meta-variables
introduces type constraints on variables used in rules.
These constraints are usually of the form v <: t
where v
is a variable,
requiring v
to be a subtype of t
.
Funcon definitions
Funcon
introduces a declaration of a fresh funcon name, together with
its signature.
- Funcon names start with lowercase letters, and may include letters, digits,
and dashes
-
. - The signature of a funcon taking no arguments and computing values of type
t
is written:=>t
. For example, the signature offail
is written:=>empty-type
. - The signature of a funcon taking one or more arguments and computing values
of type
t
is written(v1:ct1,...,vn:ctn):=>t
, where eachvi
is a variable, and eachcti
is either a type of valuesti
or a computation type=>ti
. (Computation types=>ti
resemble the types of call-by-name parameters in Scala.) For example, the signature ofif-true-else
is(_:booleans, _:=>T, _:=>T) : =>T
, whereT
is a type variable. - Arguments specified by
vi:ti
are implicitly pre-evaluated (possibly interleaved) whereas evaluation of arguments specified byvi:=>ti
is determined by explicit rules. - One of the argument types in a signature may be an indefinite sequence
type, formed using a suffixed
*
,+
, or?
, allowing use of the funcon with varying numbers of arguments. (A sequence argument is usually at the end, but could be anywhere, since the other arguments in an application determine where to match it.) - The lack of a defined result of a partial funcon is represented by the
empty sequence
( )
. The result type in the signature of a partial funcon is of the formt?
for some value typet
. - The signature of a value constructor is written
(v1:ct1,...,vn:ctn):t
, using a value result typet
instead of a computation result type=>t
.
Alias
introduces an equation n1 = n2
that declares a fresh name n1
and defines it to be equivalent to another name n2
. n1
is usually an
abbreviation for a longer n2
; its declaration as an alias prevents it from
being defined elsewhere to have a different interpretation.
Built-in
introduces a declaration of a fresh funcon name and its
signature, but does not provide a definition for it. Built-ins can be regarded
as parameters of a collection of funcon definitions. They are generally
reserved for types of values such as integers, sets, and maps, which are not
amenable to specification using operational rules.
Auxiliary
introduces a name that is not intended for direct use in
language definitions.
Rule
introduces a formula or inference rule defining the
operational behaviour of a funcon.
t1 ~> t2
is a rewrite fromt1
tot2
.t1 ---> t2
is simple transition fromt1
tot2
.e(v) |- t1 ---> t2
specifies dependence on a contextual entity namede
.<t1,s(v1)> ---> <t2,s(v2)>
specifies inspection of a mutable entity nameds
and its replacement of its valuev1
byv2
.t1 --i?(v*)-> t2
specifies a sequence of valuesv*
for an input entity namedi
.t1 --o!(v*)-> t2
specifies a sequence of valuesv*
for an output entity namedo
.t1 --c(v?)-> t2
specifies an optional valuev?
for a control entity namedc
.- An annotated variable
v:t
is restricted to value terms; un-annotated variables range over computation terms (including value terms), except that type variables declared byMeta-variables
are restricted to type terms. - A single defining rewrite rule for a funcon may be combined with its declaration.
- Arguments of funcons in rules may be patterns formed from variables and
value constructors (which are usually introduced by
Datatype
definitions).
Assert
introduces a formula that expresses the intention
that it should hold (either as a consequence of specified definitions, or
as a requirement for built-ins).
Entities
Entity
introduces a declaration of a fresh auxiliary entity name.
- Entities are implicitly propagated in transition rules when not mentioned.
- Rewrites are independent of entities.
- The declaration of an entity specifies how the entity is written when used, which determines how it is propagated when omitted in transition formulae.
e(v:t) |- _ ---> _
declares a contextual entitye
of typet
, e.g.,environment(_:environments) |- _ ---> _
. When a contextual entity is omitted in a rule, it is implicitly the same in the conclusion and any premises.<_,s(v1:t)> ---> <_,s(v2:t)>
declares a mutable entitys
of typet
, e.g.,< _ , store(_:stores) > ---> < _ , store(_:stores) >
. When a mutable entity is omitted in an axiom, it is implicitly propagated unchanged. When it is omitted in a rule with a single premise, its value before the transition in the premise is the same as before the transition in the conclusion, and similarly for its value after the transitions. Changes to mutable entities are threaded through sequences of transitions._ --i?(v*:t*)-> _
declares an input entityi
of typet*
, e.g.,_ -- standard-in?(_:values*) -> _
. When an input entity is omitted in an axiom, it is implicitly required to be the empty sequence. When it is omitted in a rule with a single premise, the sequence of values in the conclusion is implicitly the same as in the premise. The value sequences of an input entity are concatenated in sequences of transitions._ --o?(v*:t*)-> _
declares an output entityo
of typet*
, e.g.,_ -- standard-out!(_:values*) -> _
. When an output entity is omitted in a rule, the implicit requirements are analogous to those for input entities._ --c(v?:t?)-> _
declares a control entityc
of typet?
, e.g.,_ --abrupted(_:values?)-> _
. When a control entity is omitted in a rule, the implicit requirements are analogous to those for input entities, except that a transition with a non-empty control entity value cannot be followed by a further transition (before being handled).