Lexical conventions

A Metaslang text consists of a sequence of symbols, possibly interspersed with whitespace. The term whitespace refers to any non-empty sequence of spaces, tabs, newlines, and comments (described below). A symbol is presented in the text as a sequence of one or more "marks" (ASCII characters). Within a composite (multi-mark) symbol, as well as within a unit-identifier, no whitespace is allowed, but whitespace may be needed between two symbols if together they could be taken for one symbol; in particular, two simple-names that follow each other should be separated by whitespace. More precisely, whitespace is required between two adjacent symbols for each of the following combinations, in which "abc" stands for an arbitrary word-symbol, "<*>" stands for an arbitrary non-word-symbol, "?:!" stands for an arbitrary non-word-symbol starting with a ?-mark, and "123" stands for an arbitrary literal (see below for the definitions of the various classes of symbols):
    abc  abc
    abc  ?:!
    abc  123
    <*>  <*>
    123  abc
    123  123
    abc  _
      (  *
Apart from the last two cases, no whitespace is ever needed adjacent to a special-symbol.

Inside literals (constant-denotations) whitespace is also disallowed, except for "significant-whitespace" as described under String-literals.

Other than that, whitespace -- or the lack of it -- has no significance. Whitespace can be used to lay-out the text for readability, but as far as only the meaning is concerned, the following two presentations of the same spec are entirely equivalent:
    spec
      type Even
      op next : Even -> Even
      axiom nextEffect is
        fa(x : Even) ~(next x = x)
    endspec

    spec type   Even op   next : Even -> Even axiom nextEffect
    is fa(x : Even)~(next     x            = x)endspec

Symbols and Names

symbol ::= simple-name | literal | special-symbol

simple-name ::= word-symbol | non-word-symbol

word-symbol ::= word-start-mark { word-continue-mark }*

word-start-mark ::= letter

word-continue-mark ::=
    letter | decimal-digit | _ | ?

letter ::=
      A | B | C | D | E | F | G | H | I | J | K | L | M
    | N | O | P | Q | R | S | T | U | V | W | X | Y | Z
    | a | b | c | d | e | f | g | h | i | j | k | l | m
    | n | o | p | q | r | s | t | u | v | w | x | y | z

decimal-digit ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

non-word-symbol ::= non-word-mark { non-word-mark }*

non-word-mark ::=
      ` | ~ | ! | @ | $ | ^ | & | * | -
    | = | + | \ | | | : | < | > | / | ?

special-symbol ::= _ | ( | ) | [ | ] | { | } | ; | , | .
Sample simple-names:
    Date                    $$          ?!
    yymmdd2date             <*>           :=:
    well_ordered?           ~==           
For convenience, here are the 14 printing ASCII marks that, next to letters and decimal-digits, can not occur in a non-word-symbol:
      #    %    '    "    _    (    )
      [    ]    {    }    ;    ,    .
Restriction. As mentioned before, no whitespace is allowed in symbols: while anode is a single simple-name, both a node and an ode consist of two simple-names. Further, the case (lower or upper) of letters in simple-names is significant: grandparent, grandParent and grandpaRent are three different simple-names.

Restriction. In general, simple-names can be chosen freely. However, the following reserved symbols have a special meaning and must not be used for simple-names:
    as              endspec       infixr           relax
    axiom           ex            is               restrict
    case            fa            let              spec
    choose          false         morphism         then
    colimit         fn            obligations      theorem
    conjecture      from          of               translate
    def             generate      op               true
    diagram         if            project          type
    else            import        prove            where
    embed           in            qualifying
    embed?          infixl        quotient

    :         ::        |         =>        ||       &&
    <-        ->        +->       =         ~=       <<
They each count as a single symbol, and no whitespace is allowed inside any reserved symbol.

Non-word-symbols can be used to choose convenient simple-names for operators that, conventionally, are written with non-alphabetic marks.

Some Metaslang users follow the convention of using simple-names that start with a capital letter for unit-identifiers and type-names and for constructors, while word-symbols chosen for ops and field-names start with a lower-case letter. Both plain local-variables and local-type-variables are often chosen to be single lower-case letters: x, y, z, a, b, c, with the start of the alphabet preferred for local-type-variables. Op-names of predicates (that is, having some type T -> Boolean) often end with the mark ?. These are just conventions that users are free to follow or ignore, but in particular some convention distinguishing constructors from ops and local-variables is recommended.

Comments

comment ::= line-end-comment | block-comment

line-end-comment ::= % line-end-comment-body

line-end-comment-body ::=
    any-text-up-to-end-of-line

block-comment ::= (* block-comment-body *)

block-comment-body ::=
    any-text-including-newlines-and-nested-block-comments
Sample comments:
    % keys must be unique
    (* op yymmdd2Date : String -> Date *)
Metaslang allows two styles of comments. The %-style is light-weight, for adding comment on a line after the formal text (or taking a line on its own, but always confined to a single line). The (*...*)-style can be used for blocks of text, spanning several lines, or stay within a line. Any text remaining on the line after the closing *) is processed as formal text. Block-comments may be nested, so the pairs of brackets (* and *) must be balanced.

A block-comment can not contain a line-end-comment and vice versa: whichever starts first has "the right of way". For example, (* 100 % or more! *) is a block-comment with block-comment-body 100 % or more! . The % here is a mark like any other; it does not introduce a line-end-comment. Conversely, in the line-end-comment % op <*> stands for (*) the (* is part of the line-end-comment-body; it does not introduce a block-comment. Note also that % and (* have no special significance in literals (which must not contain whitespace, including comments): "100 % or more!" is a well-formed string-literal.