Characters

Overview

A character represents a single textual unit. Characters are written using the #\ prefix:

#\a
#\A
#\newline
#\space

Characters are distinct from strings. A string may contain many characters, while a character represents exactly one. For example:

--> #\a
#\a
--> "a"
"a"

Although these may look similar, the first is a character and the second is a string containing one character.

Characters are primarily used when working with textual data at a fine-grained level, such as:

  • Parsing input character by character

  • Implementing lexical analyzers

  • Performing character classification (alphabetic, numeric, whitespace, etc.)

  • Converting between character codes and textual representations

Characters in Cozenage are stored internally as 32-bit Unicode code points (UChar32). Each character object therefore holds one complete Unicode scalar value, so comparison, classification, and case transformation operate on whole characters rather than on bytes. A UTF-8 string, by contrast, is a variable-length byte sequence in which a single character may occupy one to four bytes.

This design allows characters such as λ, €, or Ω to behave identically to simple ASCII characters like A or ?.

Example:

--> #\λ
#\λ
--> (char? #\λ)
#true

Character comparison procedures allow you to test equality and ordering, and classification procedures determine properties such as whether a character is alphabetic, numeric, or whitespace.
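
As a rough illustration of how the classification procedures combine, the sketch below labels a character with a symbol. The name char-kind is hypothetical, and the sketch assumes Cozenage supports the standard Scheme forms define and cond along with quoted symbols; only the predicates themselves are documented in this section.

```scheme
; Hypothetical helper (not part of Cozenage): label a character by class.
; Assumes standard Scheme define, cond, and quoted symbols are available.
(define (char-kind c)
  (cond ((char-alphabetic? c) 'letter)
        ((char-numeric? c)    'digit)
        ((char-whitespace? c) 'whitespace)
        (else                 'other)))

(char-kind #\a)      ; letter
(char-kind #\7)      ; digit
(char-kind #\space)  ; whitespace
```

The order of the clauses matters only if the classes overlap; for the three characters shown they do not.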

Because characters are first-class objects, they can be stored in lists, vectors, sets, or maps, passed to procedures, and returned as values.

Understanding the distinction between characters and strings is important: characters represent atomic textual units, while strings represent ordered sequences of those units.

Named Character Literals

The Cozenage parser supports a set of user-friendly named Unicode character literals. These are accepted during input using the #\name syntax. When printed, however, characters are displayed as their actual Unicode glyph, not their named form.

For example:

--> #\lambda
#\λ

This means the name is only recognized by the reader; internally and when printed, the true Unicode character is used.
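
One way to confirm that a named literal and its glyph denote the same character is to compare them with the procedures documented later in this section. This is a small sketch using only char=? and char->integer; the expected results in the comments follow from the code points listed for Lambda and lambda.

```scheme
; The reader resolves the name to a code point before evaluation.
(char=? #\lambda #\λ)       ; #true: both denote U+03BB
(char->integer #\lambda)    ; 955, that is #x3BB
(char=? #\Lambda #\lambda)  ; #false: U+039B versus U+03BB
```

Note that names are case-sensitive: Lambda and lambda name different characters.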

Below is the complete list of supported named character literals.

Supported Named Character Literals

Name      Glyph  Code Point

Alpha     Α      U+0391
Beta      Β      U+0392
Delta     Δ      U+0394
Gamma     Γ      U+0393
Iota      Ι      U+0399
Lambda    Λ      U+039B
Omega     Ω      U+03A9
Omicron   Ο      U+039F
Phi       Φ      U+03A6
Pi        Π      U+03A0
Psi       Ψ      U+03A8
Rho       Ρ      U+03A1
Sigma     Σ      U+03A3
Theta     Θ      U+0398
Xi        Ξ      U+039E
alpha     α      U+03B1
beta      β      U+03B2
chi       χ      U+03C7
copy      ©      U+00A9
curren    ¤      U+00A4
deg       °      U+00B0
delta     δ      U+03B4
divide    ÷      U+00F7
epsilon   ε      U+03B5
eta       η      U+03B7
euro      €      U+20AC
gamma     γ      U+03B3
iota      ι      U+03B9
iquest    ¿      U+00BF
kappa     κ      U+03BA
lambda    λ      U+03BB
micro     µ      U+00B5
mu        μ      U+03BC
omega     ω      U+03C9
para      ¶      U+00B6
phi       φ      U+03C6
pi        π      U+03C0
plusnm    ±      U+00B1
pound     £      U+00A3
psi       ψ      U+03C8
reg       ®      U+00AE
rho       ρ      U+03C1
sect      §      U+00A7
sigma     σ      U+03C3
tau       τ      U+03C4
theta     θ      U+03B8
times     ×      U+00D7
xi        ξ      U+03BE
yen       ¥      U+00A5
zeta      ζ      U+03B6

Character Procedures

Char Type Predicate Procedures

char-alphabetic?

(char-alphabetic? char)

Returns #true if char is an alphabetic character, and #false otherwise.

Parameters:

char (char) – The character to test.

Returns:

#true or #false.

Return type:

boolean

Example:

--> (char-alphabetic? #\a)
  #true
--> (char-alphabetic? #\Z)
  #true
--> (char-alphabetic? #\7)
  #false

char-numeric?

(char-numeric? char)

Returns #true if char is a numeric digit (0-9), and #false otherwise.

Parameters:

char (char) – The character to test.

Returns:

#true or #false.

Return type:

boolean

Example:

--> (char-numeric? #\5)
  #true
--> (char-numeric? #\x)
  #false

char-whitespace?

(char-whitespace? char)

Returns #true if char is a whitespace character (like space, tab, or newline), and #false otherwise.

Parameters:

char (char) – The character to test.

Returns:

#true or #false.

Return type:

boolean

Example:

--> (char-whitespace? #\space)
  #true
--> (char-whitespace? #\newline)
  #true
--> (char-whitespace? #\a)
  #false

char-upper-case?

(char-upper-case? char)

Returns #true if char is an uppercase letter, and #false otherwise.

Parameters:

char (char) – The character to test.

Returns:

#true or #false.

Return type:

boolean

Example:

--> (char-upper-case? #\A)
  #true
--> (char-upper-case? #\a)
  #false

char-lower-case?

(char-lower-case? char)

Returns #true if char is a lowercase letter, and #false otherwise.

Parameters:

char (char) – The character to test.

Returns:

#true or #false.

Return type:

boolean

Example:

--> (char-lower-case? #\z)
  #true
--> (char-lower-case? #\Z)
  #false

Char to Numeric Value and Inverse Procedures

digit-value

(digit-value char)

If char is a numeric digit, this procedure returns its integer value (0-9). If char is not a digit, it returns #false.

Parameters:

char (char) – The character to convert.

Returns:

An integer from 0-9 or #false.

Return type:

integer or boolean

Example:

--> (digit-value #\7)
  7
--> (digit-value #\a)
  #false
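
Because digit-value returns either an integer or #false, it can drive a simple digit-by-digit conversion. The sketch below is one possible approach, not a Cozenage built-in: the name digits->integer and its accumulator argument are hypothetical, and it assumes the standard Scheme forms define, let, and if and the list procedures null?, car, cdr, and list.

```scheme
; Hypothetical helper: fold a list of digit characters into an integer.
; acc is the value accumulated so far; pass 0 on the first call.
; Returns #false as soon as a non-digit character is seen.
(define (digits->integer chars acc)
  (if (null? chars)
      acc
      (let ((d (digit-value (car chars))))
        ; Assumes standard Scheme truthiness: 0 counts as true,
        ; so this test fails only when digit-value returned #false.
        (if d
            (digits->integer (cdr chars) (+ (* acc 10) d))
            #false))))

(digits->integer (list #\4 #\0 #\7) 0)  ; 407
(digits->integer (list #\4 #\x) 0)      ; #false
```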

char->integer

(char->integer char)

Returns the Unicode scalar value of char as an exact integer. For Unicode characters, the result is in the range 0 to #xD7FF or #xE000 to #x10FFFF inclusive.

Parameters:

char (character) – The character to convert.

Returns:

The Unicode scalar value of char.

Return type:

integer

Example:

--> (char->integer #\A)
65
--> (char->integer #\a)
97
--> (char->integer #\space)
32
--> (char->integer #\λ)
955

integer->char

(integer->char n)

Returns the character whose Unicode scalar value is n. Raises an error if n is negative, greater than #x10FFFF, or falls in the surrogate range #xD800 to #xDFFF.

Parameters:

n (integer) – A Unicode scalar value.

Returns:

The character corresponding to Unicode scalar value n.

Return type:

character

Example:

--> (integer->char 65)
#\A
--> (integer->char 97)
#\a
--> (integer->char 32)
#\space
--> (integer->char 955)
#\λ
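
The two conversions are inverses of each other, which makes simple code point arithmetic possible. The following sketch assumes the standard define form; the name char-offset is hypothetical and not part of Cozenage.

```scheme
; Hypothetical helper: the character n code points after c.
; integer->char raises an error if the sum is negative, above #x10FFFF,
; or inside the surrogate range, so callers must stay within valid values.
(define (char-offset c n)
  (integer->char (+ (char->integer c) n)))

(char-offset #\a 1)                  ; #\b
(char-offset #\A 25)                 ; #\Z
(char->integer (integer->char 955))  ; 955 (round trip)
```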

Case Conversion Procedures

char-upcase

(char-upcase char)

Returns the uppercase equivalent of char. If char is not a lowercase letter, it is returned unchanged.

Parameters:

char (char) – The character to convert.

Returns:

The uppercase version of the character.

Return type:

char

Example:

--> (char-upcase #\a)
  #\A
--> (char-upcase #\B)
  #\B

char-downcase

(char-downcase char)

Returns the lowercase equivalent of char. If char is not an uppercase letter, it is returned unchanged.

Parameters:

char (char) – The character to convert.

Returns:

The lowercase version of the character.

Return type:

char

Example:

--> (char-downcase #\A)
  #\a
--> (char-downcase #\b)
  #\b
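
The case predicates and the two conversion procedures can be combined to swap the case of a character. This is a minimal sketch assuming the standard define and cond forms; char-swapcase is a hypothetical name, not a Cozenage procedure.

```scheme
; Hypothetical helper: invert the case of a letter, leave others alone.
(define (char-swapcase c)
  (cond ((char-upper-case? c) (char-downcase c))
        ((char-lower-case? c) (char-upcase c))
        (else c)))

(char-swapcase #\a)  ; #\A
(char-swapcase #\Q)  ; #\q
(char-swapcase #\7)  ; #\7
```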

char-foldcase

(char-foldcase char)

Applies Unicode case-folding to char. For most characters, this is the same as char-downcase, but it handles special cases for robust case-insensitive comparison.

Parameters:

char (char) – The character to convert.

Returns:

The folded-case version of the character.

Return type:

char

Example:

--> (char-foldcase #\A)
  #\a
--> (char-foldcase #\ς) ; Greek final sigma
  #\σ
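
To see why folding is preferred over char-downcase for case-insensitive matching, consider the two lowercase forms of sigma. The sketch below assumes the standard define form; same-letter? is a hypothetical name used only for illustration.

```scheme
; Hypothetical helper: compare two characters after case folding.
(define (same-letter? a b)
  (char=? (char-foldcase a) (char-foldcase b)))

(same-letter? #\σ #\ς)  ; #true: final sigma folds to σ
(same-letter? #\A #\a)  ; #true
(same-letter? #\a #\b)  ; #false
```

Both σ and ς are already lowercase, so char-downcase leaves them unchanged and they would still compare as different; folding maps them to the same character.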

Case-Sensitive Comparison Procedures

char=?

(char=? char1 char2 ...)

Returns #true if all arguments have the same Unicode scalar value, and #false otherwise.

Parameters:

char1, char2, ... (character) – Two or more characters to compare.

Returns:

#true if all arguments are equal, #false otherwise.

Return type:

boolean

Example:

--> (char=? #\a #\a)
#true
--> (char=? #\a #\b)
#false
--> (char=? #\a #\a #\a)
#true

char<?

(char<? char1 char2 ...)

Returns #true if the Unicode scalar values of the arguments are strictly increasing, and #false otherwise.

Parameters:

char1, char2, ... (character) – Two or more characters to compare.

Returns:

#true if the arguments are strictly increasing, #false otherwise.

Return type:

boolean

Example:

--> (char<? #\a #\b)
#true
--> (char<? #\b #\a)
#false
--> (char<? #\a #\b #\c)
#true

char<=?

(char<=? char1 char2 ...)

Returns #true if the Unicode scalar values of the arguments are non-decreasing, and #false otherwise.

Parameters:

char1, char2, ... (character) – Two or more characters to compare.

Returns:

#true if the arguments are non-decreasing, #false otherwise.

Return type:

boolean

Example:

--> (char<=? #\a #\b)
#true
--> (char<=? #\a #\a)
#true
--> (char<=? #\b #\a)
#false
--> (char<=? #\a #\a #\b)
#true

char>?

(char>? char1 char2 ...)

Returns #true if the Unicode scalar values of the arguments are strictly decreasing, and #false otherwise.

Parameters:

char1, char2, ... (character) – Two or more characters to compare.

Returns:

#true if the arguments are strictly decreasing, #false otherwise.

Return type:

boolean

Example:

--> (char>? #\b #\a)
#true
--> (char>? #\a #\b)
#false
--> (char>? #\c #\b #\a)
#true

char>=?

(char>=? char1 char2 ...)

Returns #true if the Unicode scalar values of the arguments are non-increasing, and #false otherwise.

Parameters:

char1, char2, ... (character) – Two or more characters to compare.

Returns:

#true if the arguments are non-increasing, #false otherwise.

Return type:

boolean

Example:

--> (char>=? #\b #\a)
#true
--> (char>=? #\a #\a)
#true
--> (char>=? #\a #\b)
#false
--> (char>=? #\c #\c #\b)
#true
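
Because the comparison procedures accept more than two arguments, a range test can be written as a single call. The sketch below assumes the standard define form; ascii-lower? is a hypothetical name chosen for illustration.

```scheme
; Hypothetical helper: is c between #\a and #\z inclusive?
; (char<=? x c y) holds only when x <= c and c <= y.
(define (ascii-lower? c)
  (char<=? #\a c #\z))

(ascii-lower? #\m)  ; #true:  97 <= 109 <= 122
(ascii-lower? #\M)  ; #false: 77 is below 97
```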

Case-Insensitive Comparison Procedures

char-ci=?

(char-ci=? char1 char2 ...)

Returns #true if all arguments are equal under case-folding, and #false otherwise. Equivalent to applying char-foldcase to all arguments before comparing with char=?.

Parameters:

char1, char2, ... (character) – Two or more characters to compare.

Returns:

#true if all arguments are equal ignoring case, #false otherwise.

Return type:

boolean

Example:

--> (char-ci=? #\a #\A)
#true
--> (char-ci=? #\a #\a #\A)
#true
--> (char-ci=? #\a #\b)
#false

char-ci<?

(char-ci<? char1 char2 ...)

Returns #true if the case-folded Unicode scalar values of the arguments are strictly increasing, and #false otherwise. Equivalent to applying char-foldcase to all arguments before comparing with char<?.

Parameters:

char1, char2, ... (character) – Two or more characters to compare.

Returns:

#true if the arguments are strictly increasing ignoring case, #false otherwise.

Return type:

boolean

Example:

--> (char-ci<? #\a #\B)
#true
--> (char-ci<? #\A #\b)
#true
--> (char-ci<? #\b #\A)
#false

char-ci<=?

(char-ci<=? char1 char2 ...)

Returns #true if the case-folded Unicode scalar values of the arguments are non-decreasing, and #false otherwise. Equivalent to applying char-foldcase to all arguments before comparing with char<=?.

Parameters:

char1, char2, ... (character) – Two or more characters to compare.

Returns:

#true if the arguments are non-decreasing ignoring case, #false otherwise.

Return type:

boolean

Example:

--> (char-ci<=? #\a #\B)
#true
--> (char-ci<=? #\A #\a)
#true
--> (char-ci<=? #\B #\a)
#false

char-ci>?

(char-ci>? char1 char2 ...)

Returns #true if the case-folded Unicode scalar values of the arguments are strictly decreasing, and #false otherwise. Equivalent to applying char-foldcase to all arguments before comparing with char>?.

Parameters:

char1, char2, ... (character) – Two or more characters to compare.

Returns:

#true if the arguments are strictly decreasing ignoring case, #false otherwise.

Return type:

boolean

Example:

--> (char-ci>? #\B #\a)
#true
--> (char-ci>? #\A #\b)
#false
--> (char-ci>? #\c #\B #\a)
#true

char-ci>=?

(char-ci>=? char1 char2 ...)

Returns #true if the case-folded Unicode scalar values of the arguments are non-increasing, and #false otherwise. Equivalent to applying char-foldcase to all arguments before comparing with char>=?.

Parameters:

char1, char2, ... (character) – Two or more characters to compare.

Returns:

#true if the arguments are non-increasing ignoring case, #false otherwise.

Return type:

boolean

Example:

--> (char-ci>=? #\B #\a)
#true
--> (char-ci>=? #\A #\a)
#true
--> (char-ci>=? #\a #\B)
#false
--> (char-ci>=? #\C #\b #\A)
#true
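
The case-insensitive procedures also accept several arguments, so a range test that ignores case fits in one call. This is a sketch assuming the standard define form; hex-letter? is a hypothetical name, not a Cozenage procedure.

```scheme
; Hypothetical helper: is c one of the hexadecimal letters a-f or A-F?
; All three arguments are case-folded before the comparison.
(define (hex-letter? c)
  (char-ci<=? #\a c #\f))

(hex-letter? #\C)  ; #true:  folds to c, between a and f
(hex-letter? #\e)  ; #true
(hex-letter? #\g)  ; #false: g sorts after f
```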