Characters
Overview
A character represents a single textual unit. Characters are written using the #\ prefix:
#\a
#\A
#\newline
#\space
Characters are distinct from strings. A string may contain many characters, while a character represents exactly one. For example:
--> #\a
#\a
--> "a"
"a"
Although these may look similar, the first is a character and the second is a string containing one character.
Characters are primarily used when working with textual data at a fine-grained level, such as:
- Parsing input character by character
- Implementing lexical analyzers
- Performing character classification (alphabetic, numeric, whitespace, etc.)
- Converting between character codes and textual representations
Characters in Cozenage are stored internally as 32-bit Unicode code points (UChar32). This means each character directly represents a full Unicode scalar value, making character comparisons, classification, and transformation straightforward and unambiguous. Unlike UTF-8 strings, which are variable-length byte sequences, a character object always represents exactly one Unicode code point.
This design allows characters such as λ, €, or Ω to behave identically to simple ASCII characters like A or ?.
Example:
--> #\λ
#\λ
--> (char? #\λ)
#true
Character comparison procedures allow you to test equality and ordering, and classification procedures determine properties such as whether a character is alphabetic, numeric, or whitespace.
Because characters are first-class objects, they can be stored in lists, vectors, sets, or maps, passed to procedures, and returned as values.
Understanding the distinction between characters and strings is important: characters represent atomic textual units, while strings represent ordered sequences of those units.
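As a rough illustration of character classification, the sketch below combines the predicates documented under Character Procedures. The helper name char-class is hypothetical, and the sketch assumes that Cozenage provides define, cond, and quoted symbols as in standard Scheme.

```scheme
;; Hypothetical helper: classify a single character using the
;; type predicates documented on this page.
(define (char-class c)
  (cond ((char-alphabetic? c) 'letter)
        ((char-numeric? c) 'digit)
        ((char-whitespace? c) 'whitespace)
        (else 'other)))

(char-class #\a)      ; evaluates to the symbol letter
(char-class #\7)      ; evaluates to the symbol digit
(char-class #\space)  ; evaluates to the symbol whitespace
(char-class #\?)      ; evaluates to the symbol other
```

A lexical analyzer would typically apply a helper like this to each character of its input in turn.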
Named Character Literals
The Cozenage parser supports a set of user-friendly named Unicode character literals. These are accepted during input using the #\name syntax. When printed, however, characters are displayed as their actual Unicode glyph, not their named form.
For example:
--> #\lambda
#\λ
This means the name is only recognized by the reader; internally and when printed, the true Unicode character is used.
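Because the named form and the glyph are read as the same character, comparing them should succeed. The short check below uses only char=? and char->integer, both documented on this page. The expected values follow from the code point listed for lambda (U+03BB, decimal 955).

```scheme
;; The reader turns #\lambda into the same character as #\λ.
(char=? #\lambda #\λ)      ; expected to be true
(char->integer #\lambda)   ; expected to be 955 (U+03BB)
```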
Below is the complete list of supported named character literals.
Supported Named Character Literals

| Name | Glyph | Code Point |
|---|---|---|
| Alpha | Α | U+0391 |
| Beta | Β | U+0392 |
| Delta | Δ | U+0394 |
| Gamma | Γ | U+0393 |
| Iota | Ι | U+0399 |
| Lambda | Λ | U+039B |
| Omega | Ω | U+03A9 |
| Omicron | Ο | U+039F |
| Phi | Φ | U+03A6 |
| Pi | Π | U+03A0 |
| Psi | Ψ | U+03A8 |
| Rho | Ρ | U+03A1 |
| Sigma | Σ | U+03A3 |
| Theta | Θ | U+0398 |
| Xi | Ξ | U+039E |
| alpha | α | U+03B1 |
| beta | β | U+03B2 |
| chi | χ | U+03C7 |
| copy | © | U+00A9 |
| curren | ¤ | U+00A4 |
| deg | ° | U+00B0 |
| delta | δ | U+03B4 |
| divide | ÷ | U+00F7 |
| epsilon | ε | U+03B5 |
| eta | η | U+03B7 |
| euro | € | U+20AC |
| gamma | γ | U+03B3 |
| iota | ι | U+03B9 |
| iquest | ¿ | U+00BF |
| kappa | κ | U+03BA |
| lambda | λ | U+03BB |
| micro | µ | U+00B5 |
| mu | μ | U+03BC |
| omega | ω | U+03C9 |
| para | ¶ | U+00B6 |
| phi | φ | U+03C6 |
| pi | π | U+03C0 |
| plusnm | ± | U+00B1 |
| pound | £ | U+00A3 |
| psi | ψ | U+03C8 |
| reg | ® | U+00AE |
| rho | ρ | U+03C1 |
| sect | § | U+00A7 |
| sigma | σ | U+03C3 |
| tau | τ | U+03C4 |
| theta | θ | U+03B8 |
| times | × | U+00D7 |
| xi | ξ | U+03BE |
| yen | ¥ | U+00A5 |
| zeta | ζ | U+03B6 |
Character Procedures
Char Type Predicate Procedures
char-alphabetic?
- (char-alphabetic? char)
Returns #true if char is an alphabetic character, and #false otherwise.
- Parameters:
char (char) – The character to test.
- Returns:
#true or #false.
- Return type:
boolean
Example:
--> (char-alphabetic? #\a)
#true
--> (char-alphabetic? #\Z)
#true
--> (char-alphabetic? #\7)
#false
char-numeric?
- (char-numeric? char)
Returns #true if char is a numeric digit (0-9), and #false otherwise.
- Parameters:
char (char) – The character to test.
- Returns:
#true or #false.
- Return type:
boolean
Example:
--> (char-numeric? #\5)
#true
--> (char-numeric? #\x)
#false
char-whitespace?
- (char-whitespace? char)
Returns #true if char is a whitespace character (like space, tab, or newline), and #false otherwise.
- Parameters:
char (char) – The character to test.
- Returns:
#true or #false.
- Return type:
boolean
Example:
--> (char-whitespace? #\space)
#true
--> (char-whitespace? #\newline)
#true
--> (char-whitespace? #\a)
#false
char-upper-case?
- (char-upper-case? char)
Returns #true if char is an uppercase letter, and #false otherwise.
- Parameters:
char (char) – The character to test.
- Returns:
#true or #false.
- Return type:
boolean
Example:
--> (char-upper-case? #\A)
#true
--> (char-upper-case? #\a)
#false
char-lower-case?
- (char-lower-case? char)
Returns #true if char is a lowercase letter, and #false otherwise.
- Parameters:
char (char) – The character to test.
- Returns:
#true or #false.
- Return type:
boolean
Example:
--> (char-lower-case? #\z)
#true
--> (char-lower-case? #\Z)
#false
Char to Numeric Value and Inverse Procedures
digit-value
- (digit-value char)
If char is a numeric digit, this procedure returns its integer value (0-9). If char is not a digit, it returns #false.
- Parameters:
char (char) – The character to convert.
- Returns:
An integer from 0-9 or #false.
- Return type:
integer or boolean
Example:
--> (digit-value #\7)
7
--> (digit-value #\a)
#false
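A common use of digit-value is turning a run of digit characters into a number. The following is a minimal sketch, assuming Cozenage supports named let along with list, null?, car, and cdr as in standard Scheme. The helper name digits->integer is hypothetical.

```scheme
;; Hypothetical helper: fold a list of digit characters into an integer.
;; Assumes every element is a digit, so digit-value never returns #false.
(define (digits->integer chars)
  (let loop ((rest chars) (acc 0))
    (if (null? rest)
        acc
        (loop (cdr rest)
              (+ (* acc 10) (digit-value (car rest)))))))

(digits->integer (list #\4 #\0 #\7))  ; evaluates to 407
```

A more defensive version would check the result of digit-value for #false before using it in arithmetic.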
char->integer
- (char->integer char)
Returns the Unicode scalar value of char as an exact integer. For Unicode characters, the result is in the range 0 to #xD7FF or #xE000 to #x10FFFF inclusive.
- Parameters:
char (character) – The character to convert.
- Returns:
The Unicode scalar value of char.
- Return type:
integer
Example:
--> (char->integer #\A)
65
--> (char->integer #\a)
97
--> (char->integer #\space)
32
--> (char->integer #\λ)
955
integer->char
- (integer->char n)
Returns the character whose Unicode scalar value is n. Raises an error if n is negative, greater than #x10FFFF, or falls in the surrogate range #xD800 to #xDFFF.
- Parameters:
n (integer) – A Unicode scalar value.
- Returns:
The character corresponding to Unicode scalar value n.
- Return type:
character
Example:
--> (integer->char 65)
#\A
--> (integer->char 97)
#\a
--> (integer->char 32)
#\space
--> (integer->char 955)
#\λ
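Because char->integer and integer->char are inverses on valid scalar values, they can be combined to do arithmetic on characters. The sketch below assumes define is available as in standard Scheme. The helper name char-successor is hypothetical.

```scheme
;; Hypothetical helper: the character whose scalar value is one greater.
;; integer->char raises an error if the result is not a valid scalar
;; value (for example, one past #xD7FF or one past #x10FFFF).
(define (char-successor c)
  (integer->char (+ 1 (char->integer c))))

(char-successor #\a)                               ; evaluates to #\b
(char=? #\λ (integer->char (char->integer #\λ)))   ; round trip, true
```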
Case Conversion Procedures
char-upcase
- (char-upcase char)
Returns the uppercase equivalent of char. If char is not a lowercase letter, it is returned unchanged.
- Parameters:
char (char) – The character to convert.
- Returns:
The uppercase version of the character.
- Return type:
char
Example:
--> (char-upcase #\a)
#\A
--> (char-upcase #\B)
#\B
char-downcase
- (char-downcase char)
Returns the lowercase equivalent of char. If char is not an uppercase letter, it is returned unchanged.
- Parameters:
char (char) – The character to convert.
- Returns:
The lowercase version of the character.
- Return type:
char
Example:
--> (char-downcase #\A)
#\a
--> (char-downcase #\b)
#\b
char-foldcase
- (char-foldcase char)
Applies Unicode case-folding to char. For most characters, this is the same as char-downcase, but it handles special cases for robust case-insensitive comparison.
- Parameters:
char (char) – The character to convert.
- Returns:
The folded-case version of the character.
- Return type:
char
Example:
--> (char-foldcase #\A)
#\a
--> (char-foldcase #\ς) ; Greek final sigma
#\σ
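The Greek sigma is a convenient case for seeing where char-foldcase and char-downcase differ. The comparison below is a sketch that assumes Cozenage follows the Unicode simple case mappings, under which Σ lowercases to σ while ς, being lowercase already, is left unchanged by char-downcase.

```scheme
;; Σ (capital sigma) and ς (final sigma) should both fold to σ,
;; so they compare equal after char-foldcase.
(char=? (char-foldcase #\Σ) (char-foldcase #\ς))   ; expected to be true

;; char-downcase maps Σ to σ but leaves ς unchanged, so the same
;; comparison is expected to be false.
(char=? (char-downcase #\Σ) (char-downcase #\ς))
```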
Case-Sensitive Comparison Procedures
char=?
- (char=? char1 char2 ...)
Returns #t if all arguments have the same Unicode scalar value, #f otherwise.
- Parameters:
char1, char2, ... (character) – Two or more characters to compare.
- Returns:
#t if all arguments are equal, #f otherwise.
- Return type:
boolean
Example:
--> (char=? #\a #\a)
#t
--> (char=? #\a #\b)
#f
--> (char=? #\a #\a #\a)
#t
char<?
- (char<? char1 char2 ...)
Returns #t if the Unicode scalar values of the arguments are monotonically increasing, #f otherwise.
- Parameters:
char1, char2, ... (character) – Two or more characters to compare.
- Returns:
#t if the arguments are monotonically increasing, #f otherwise.
- Return type:
boolean
Example:
--> (char<? #\a #\b)
#t
--> (char<? #\b #\a)
#f
--> (char<? #\a #\b #\c)
#t
char<=?
- (char<=? char1 char2 ...)
Returns #t if the Unicode scalar values of the arguments are monotonically non-decreasing, #f otherwise.
- Parameters:
char1, char2, ... (character) – Two or more characters to compare.
- Returns:
#t if the arguments are monotonically non-decreasing, #f otherwise.
- Return type:
boolean
Example:
--> (char<=? #\a #\b)
#t
--> (char<=? #\a #\a)
#t
--> (char<=? #\b #\a)
#f
--> (char<=? #\a #\a #\b)
#t
char>?
- (char>? char1 char2 ...)
Returns #t if the Unicode scalar values of the arguments are monotonically decreasing, #f otherwise.
- Parameters:
char1, char2, ... (character) – Two or more characters to compare.
- Returns:
#t if the arguments are monotonically decreasing, #f otherwise.
- Return type:
boolean
Example:
--> (char>? #\b #\a)
#t
--> (char>? #\a #\b)
#f
--> (char>? #\c #\b #\a)
#t
char>=?
- (char>=? char1 char2 ...)
Returns #t if the Unicode scalar values of the arguments are monotonically non-increasing, #f otherwise.
- Parameters:
char1, char2, ... (character) – Two or more characters to compare.
- Returns:
#t if the arguments are monotonically non-increasing, #f otherwise.
- Return type:
boolean
Example:
--> (char>=? #\b #\a)
#t
--> (char>=? #\a #\a)
#t
--> (char>=? #\a #\b)
#f
--> (char>=? #\c #\c #\b)
#t
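Since the comparison procedures accept more than two arguments, a single call can express a range check. The following is a minimal sketch, assuming define is available as in standard Scheme. The helper name ascii-lower? is hypothetical.

```scheme
;; Hypothetical helper: true when c lies in the ASCII range a-z.
;; One three-argument call checks the lower and upper bound together.
(define (ascii-lower? c)
  (char<=? #\a c #\z))

(ascii-lower? #\m)  ; true
(ascii-lower? #\M)  ; false: #\M (77) sorts before #\a (97)
(ascii-lower? #\λ)  ; false: #\λ (955) sorts after #\z (122)
```

Unlike char-lower-case?, this check is deliberately limited to ASCII, which can be useful when parsing formats that only allow unaccented Latin letters.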
Case-Insensitive Comparison Procedures
char-ci=?
- (char-ci=? char1 char2 ...)
Returns #t if all arguments are equal under case-folding, #f otherwise. Equivalent to applying char-foldcase to all arguments before comparing with char=?.
- Parameters:
char1, char2, ... (character) – Two or more characters to compare.
- Returns:
#t if all arguments are equal ignoring case, #f otherwise.
- Return type:
boolean
Example:
--> (char-ci=? #\a #\A)
#t
--> (char-ci=? #\a #\a #\A)
#t
--> (char-ci=? #\a #\b)
#f
char-ci<?
- (char-ci<? char1 char2 ...)
Returns #t if the case-folded Unicode scalar values of the arguments are monotonically increasing, #f otherwise. Equivalent to applying char-foldcase to all arguments before comparing with char<?.
- Parameters:
char1, char2, ... (character) – Two or more characters to compare.
- Returns:
#t if the arguments are monotonically increasing ignoring case, #f otherwise.
- Return type:
boolean
Example:
--> (char-ci<? #\a #\B)
#t
--> (char-ci<? #\A #\b)
#t
--> (char-ci<? #\b #\A)
#f
char-ci<=?
- (char-ci<=? char1 char2 ...)
Returns #t if the case-folded Unicode scalar values of the arguments are monotonically non-decreasing, #f otherwise. Equivalent to applying char-foldcase to all arguments before comparing with char<=?.
- Parameters:
char1, char2, ... (character) – Two or more characters to compare.
- Returns:
#t if the arguments are monotonically non-decreasing ignoring case, #f otherwise.
- Return type:
boolean
Example:
--> (char-ci<=? #\a #\B)
#t
--> (char-ci<=? #\A #\a)
#t
--> (char-ci<=? #\B #\a)
#f
char-ci>?
- (char-ci>? char1 char2 ...)
Returns #t if the case-folded Unicode scalar values of the arguments are monotonically decreasing, #f otherwise. Equivalent to applying char-foldcase to all arguments before comparing with char>?.
- Parameters:
char1, char2, ... (character) – Two or more characters to compare.
- Returns:
#t if the arguments are monotonically decreasing ignoring case, #f otherwise.
- Return type:
boolean
Example:
--> (char-ci>? #\B #\a)
#t
--> (char-ci>? #\A #\b)
#f
--> (char-ci>? #\c #\B #\a)
#t
char-ci>=?
- (char-ci>=? char1 char2 ...)
Returns #t if the case-folded Unicode scalar values of the arguments are monotonically non-increasing, #f otherwise. Equivalent to applying char-foldcase to all arguments before comparing with char>=?.
- Parameters:
char1, char2, ... (character) – Two or more characters to compare.
- Returns:
#t if the arguments are monotonically non-increasing ignoring case, #f otherwise.
- Return type:
boolean
Example:
--> (char-ci>=? #\B #\a)
#t
--> (char-ci>=? #\A #\a)
#t
--> (char-ci>=? #\a #\B)
#f
--> (char-ci>=? #\C #\b #\A)
#t
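A typical use of the case-insensitive procedures is accepting a single-letter answer in either case. The sketch below assumes define is available as in standard Scheme. The helper name yes-char? is hypothetical.

```scheme
;; Hypothetical helper: accept #\y or #\Y as a "yes" answer.
(define (yes-char? c)
  (char-ci=? c #\y))

(yes-char? #\Y)  ; true
(yes-char? #\n)  ; false

;; The same test written out with char-foldcase and char=?,
;; following the equivalence stated for char-ci=?.
(char=? (char-foldcase #\Y) (char-foldcase #\y))  ; true
```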