Link Search Menu Expand Document

Funcons-beta : Characters.cbs | PRETTY | PDF

Outline

Characters

Built-in Type
  characters <: values

Literal characters can be written 'C' where C is any visible character other than a single-quote or backslash character, which need to be escaped as '\'' and '\\'.

Alias
  chars = characters

Unicode character set

The set of Unicode characters and allocated points is open to extension. See https://en.wikipedia.org/wiki/Plane_(Unicode)

Built-in Datatype
  unicode-characters <: characters
Alias
  unicode-chars = unicode-characters
Built-in Type
  unicode-points <: bounded-integers(0, unsigned-bit-vector-maximum(21))
Built-in Funcon
  unicode-character(_:unicode-points) : unicode-characters
Alias
  unicode-char = unicode-character

The values in unicode-characters are the values of unicode-character(UP:unicode-points).

Funcon
  unicode-point(_:unicode-characters) : =>unicode-points
Alias
  unicode = unicode-point

Unicode basic multilingual plane

The set of Unicode BMP characters and allocated points is open to extension.

Built-in Datatype
  basic-multilingual-plane-characters <: unicode-characters
Alias
  bmp-chars = basic-multilingual-plane-characters
Built-in Type
  basic-multilingual-plane-points <:
    bounded-integers(0, unsigned-bit-vector-maximum(17))

The values in basic-multilingual-plane-characters are the values of unicode-character(BMPP:basic-multilingual-plane-points).

ISO Latin-1 character set

Built-in Datatype
  iso-latin-1-characters <: basic-multilingual-plane-characters
Alias
  latin-1-chars = iso-latin-1-characters
Type
  iso-latin-1-points ~> bounded-integers(0, unsigned-bit-vector-maximum(8))

The values in iso-latin-1-characters are the values of unicode-character(ILP:iso-latin-1-points).

ASCII character set

Built-in Type
  ascii-characters <: iso-latin-1-characters
Alias
  ascii-chars = ascii-characters
Type
  ascii-points ~> bounded-integers(0, unsigned-bit-vector-maximum(7))

The values in ascii-characters are the values of unicode-character(AP:ascii-points).

Funcon
  ascii-character(_:strings) : =>ascii-characters?
Alias
  ascii-char = ascii-character

ascii-character"C" takes a string. When it consists of a single ASCII character C it gives the character, otherwise ( ).

Rule
  ascii-character[C:ascii-characters] ~> C
Rule
  C : ~ ascii-characters
  ------------------------------------
  ascii-character[C:characters] ~> ( )
Rule
  length(C*) =/= 1
  --------------------------------------
  ascii-character[C*:characters*] ~> ( )

Character point encodings

See https://en.wikipedia.org/wiki/Character_encoding

Built-in Funcon
  utf-8(_:unicode-points) : =>(bytes, (bytes, (bytes, bytes?)? )? )
Built-in Funcon
  utf-16(_:unicode-points) : =>(bit-vectors(16), (bit-vectors(16))? )
Built-in Funcon
  utf-32(_:unicode-points) : =>bit-vectors(32)

Control characters

Funcon
  horizontal-tab : =>ascii-characters 
    ~> unicode-character(hexadecimal-natural"0009")
Funcon
  carriage-return : =>ascii-characters 
    ~> unicode-character(hexadecimal-natural"000d")
Funcon
  double-quote : =>ascii-characters 
    ~> unicode-character(hexadecimal-natural"0022")
Funcon
  single-quote : =>ascii-characters 
    ~> unicode-character(hexadecimal-natural"0027")