Funcons-beta : Characters.cbs | PRETTY | PDF
Outline
Characters
[
Type characters Alias chars
Datatype unicode-characters Alias unicode-chars
Type unicode-points
Funcon unicode-character Alias unicode-char
Funcon unicode-point Alias unicode
Type basic-multilingual-plane-characters Alias bmp-chars
Type basic-multilingual-plane-points
Type iso-latin-1-characters Alias latin-1-chars
Type iso-latin-1-points
Type ascii-characters Alias ascii-chars
Type ascii-points
Funcon ascii-character Alias ascii-char
Funcon utf-8
Funcon utf-16
Funcon utf-32
Funcon backspace
Funcon horizontal-tab
Funcon line-feed
Funcon form-feed
Funcon carriage-return
Funcon double-quote
Funcon single-quote
Funcon backslash
]
Built-in Type
characters <: values
Literal characters can be written 'C'
where C
is any visible character
other than a single-quote
or backslash
character, which need to be
escaped as '\''
and '\\'
.
Alias
chars = characters
Unicode character set
The set of Unicode characters and allocated points is open to extension. See https://en.wikipedia.org/wiki/Plane_(Unicode)
Built-in Datatype
unicode-characters <: characters
Alias
unicode-chars = unicode-characters
Built-in Type
unicode-points <: bounded-integers(0, unsigned-bit-vector-maximum(21))
Built-in Funcon
unicode-character(_:unicode-points) : unicode-characters
Alias
unicode-char = unicode-character
The values in unicode-characters
are the values of
unicode-character(UP:unicode-points)
.
Funcon
unicode-point(_:unicode-characters) : =>unicode-points
Alias
unicode = unicode-point
Rule
unicode-point(unicode-character(UP:unicode-points)) ~> UP
Unicode basic multilingual plane
The set of Unicode BMP characters and allocated points is open to extension.
Built-in Datatype
basic-multilingual-plane-characters <: unicode-characters
Alias
bmp-chars = basic-multilingual-plane-characters
Built-in Type
basic-multilingual-plane-points <:
bounded-integers(0, unsigned-bit-vector-maximum(17))
The values in basic-multilingual-plane-characters
are the values of
unicode-character(BMPP:basic-multilingual-plane-points)
.
ISO Latin-1 character set
Built-in Datatype
iso-latin-1-characters <: basic-multilingual-plane-characters
Alias
latin-1-chars = iso-latin-1-characters
Type
iso-latin-1-points ~> bounded-integers(0, unsigned-bit-vector-maximum(8))
The values in iso-latin-1-characters
are the values of
unicode-character(ILP:iso-latin-1-points)
.
ASCII character set
Built-in Type
ascii-characters <: iso-latin-1-characters
Alias
ascii-chars = ascii-characters
Type
ascii-points ~> bounded-integers(0, unsigned-bit-vector-maximum(7))
The values in ascii-characters
are the values of
unicode-character(AP:ascii-points)
.
Funcon
ascii-character(_:strings) : =>ascii-characters?
Alias
ascii-char = ascii-character
ascii-character"C"
takes a string. When it consists of a single ASCII
character C
it gives the character, otherwise ( )
.
Rule
ascii-character[C:ascii-characters] ~> C
Rule
C : ~ ascii-characters
------------------------------------
ascii-character[C:characters] ~> ( )
Rule
length(C*) =/= 1
--------------------------------------
ascii-character[C*:characters*] ~> ( )
Character point encodings
See https://en.wikipedia.org/wiki/Character_encoding
Built-in Funcon
utf-8(_:unicode-points) : =>(bytes, (bytes, (bytes, bytes?)? )? )
Built-in Funcon
utf-16(_:unicode-points) : =>(bit-vectors(16), (bit-vectors(16))? )
Built-in Funcon
utf-32(_:unicode-points) : =>bit-vectors(32)
Control characters
Funcon
backspace : =>ascii-characters
~> unicode-character(hexadecimal-natural"0008")
Funcon
horizontal-tab : =>ascii-characters
~> unicode-character(hexadecimal-natural"0009")
Funcon
line-feed : =>ascii-characters
~> unicode-character(hexadecimal-natural"000a")
Funcon
form-feed : =>ascii-characters
~> unicode-character(hexadecimal-natural"000c")
Funcon
carriage-return : =>ascii-characters
~> unicode-character(hexadecimal-natural"000d")
Funcon
double-quote : =>ascii-characters
~> unicode-character(hexadecimal-natural"0022")
Funcon
single-quote : =>ascii-characters
~> unicode-character(hexadecimal-natural"0027")
Funcon
backslash : =>ascii-characters
~> unicode-character(hexadecimal-natural"005c")