Software:C character classification

C character classification is a set of operations provided by a group of functions in the C standard library for the C programming language. These functions test whether a character belongs to a particular class of characters, such as alphabetic characters or control characters. Both single-byte and wide characters are supported.^[1]

History

Early C-language programmers working on the Unix operating system developed programming idioms for classifying characters into different types. For example, in the ASCII character set, the following expression is true when c is a letter:

('A' <= c && c <= 'Z') || ('a' <= c && c <= 'z')

Because such a test can be written in several equivalent ways, it became desirable to introduce short, standardized forms of these tests, which were placed in the system-wide header file ctype.h.
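As a rough illustration of the two styles, the sketch below puts the hand-written range test next to the standardized form from ctype.h. The helper names is_letter_by_range and is_letter_by_ctype are hypothetical and exist only for this example; the range test assumes an ASCII-compatible character set.

```c
#include <ctype.h>
#include <stdbool.h>

/* Hand-written idiom from the text. Assumes an ASCII-compatible
   character set, where each run of letters is contiguous. */
static bool is_letter_by_range(int c)
{
    return ('A' <= c && c <= 'Z') || ('a' <= c && c <= 'z');
}

/* The same question asked through ctype.h. The cast to unsigned char
   avoids passing a negative value, which isalpha does not accept
   (apart from EOF). */
static bool is_letter_by_ctype(char c)
{
    return isalpha((unsigned char)c) != 0;
}
```

In the default "C" locale both helpers should agree for every ASCII character; the ctype.h version additionally follows the locale when one is set.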

Implementation

Unlike the above example, the character classification routines are usually not written as comparison tests. In most C libraries, they are implemented as lookups into a static table, whether they are exposed as macros or as functions.

For example, an array of 256 eight-bit integers is created, with each integer treated as a set of bit flags in which each bit corresponds to a particular property of the character, e.g., isdigit or isalpha. If the lowest-order bit of each integer corresponds to the isdigit property, the code could be written as

#define isdigit(x) (TABLE[x] & 1)
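The table-based idea can be sketched more fully as follows. This is a minimal illustration, not the code of any particular C library: the names class_table, class_table_init, CLASS_DIGIT, CLASS_ALPHA, my_isdigit and my_isalpha are hypothetical, the letter ranges assume an ASCII-compatible character set, and the table is filled at run time for brevity, whereas a real library would more likely ship a precomputed constant table.

```c
#include <limits.h>

/* Hypothetical property flags, one bit per character class. */
enum { CLASS_DIGIT = 1 << 0, CLASS_ALPHA = 1 << 1 };

/* One entry per possible unsigned char value. */
static unsigned char class_table[UCHAR_MAX + 1];

/* Fill the table once before use (ASCII letter ranges assumed). */
static void class_table_init(void)
{
    for (int c = '0'; c <= '9'; c++) class_table[c] |= CLASS_DIGIT;
    for (int c = 'A'; c <= 'Z'; c++) class_table[c] |= CLASS_ALPHA;
    for (int c = 'a'; c <= 'z'; c++) class_table[c] |= CLASS_ALPHA;
}

/* Each macro evaluates its argument exactly once, as an array index.
   The cast keeps the index within the bounds of the table. */
#define my_isdigit(x) (class_table[(unsigned char)(x)] & CLASS_DIGIT)
#define my_isalpha(x) (class_table[(unsigned char)(x)] & CLASS_ALPHA)
```

Like the standard functions, these macros yield a nonzero value rather than exactly 1 when the property holds, so the result should be tested for truth rather than compared with 1. The sketch does not handle EOF, which the real ctype.h functions accept.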

Early versions of Linux used a potentially faulty method similar to the first code sample:

#define isdigit(x) ((x) >= '0' && (x) <= '9')

This can cause problems if the expression substituted for x has a side effect, because the macro expansion mentions x more than once. For example, with isdigit(x++) or isdigit(run_some_program()), it is not immediately evident that the argument may be evaluated twice. For this reason, the table-based approach is generally used.
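The double evaluation can be made visible with a small experiment. In the sketch below, RANGE_ISDIGIT, fn_isdigit and the two advance_count helpers are hypothetical names used only for illustration; none of them is the real isdigit. Each helper tests one character through an incrementing pointer and reports how far the pointer moved.

```c
/* Comparison-style macro, as in the text: x appears twice. */
#define RANGE_ISDIGIT(x) ((x) >= '0' && (x) <= '9')

/* The same test as a function: the argument is evaluated once. */
static int fn_isdigit(int c) { return c >= '0' && c <= '9'; }

/* Number of positions the pointer advances when one character is
   tested through the macro. The string must hold at least two
   characters, because the macro may read two of them. */
static int advance_count_macro(const char *s)
{
    const char *p = s;
    (void)RANGE_ISDIGIT(*p++);   /* expands to two uses of *p++ */
    return (int)(p - s);
}

/* The same measurement using the function form. */
static int advance_count_function(const char *s)
{
    const char *p = s;
    (void)fn_isdigit(*p++);
    return (int)(p - s);
}
```

For the input "5x" the macro version advances the pointer by two positions, while the function version advances it by one. Because && short-circuits, the macro advances only once when the first comparison fails, which makes the behavior depend on the data as well.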

Overview of functions

The functions that operate on single-byte characters are declared in the ctype.h header file (cctype in C++). The functions that operate on wide characters are declared in the wctype.h header file (cwctype in C++).

The classification is evaluated according to the current locale (its LC_CTYPE category).
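The locale dependence can be explored with a sketch like the one below. The helper count_alpha_in is a hypothetical name; it selects a locale with setlocale, counts the bytes that isalpha accepts, and then returns to the "C" locale rather than restoring whatever was active before, which is a simplification made for brevity.

```c
#include <ctype.h>
#include <locale.h>
#include <stddef.h>

/* Counts the bytes of s that isalpha accepts under the named locale.
   Returns -1 if that locale is not available on the system. */
static int count_alpha_in(const char *locale_name, const char *s)
{
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;

    int n = 0;
    for (; *s != '\0'; s++)
        if (isalpha((unsigned char)*s))
            n++;

    setlocale(LC_CTYPE, "C");   /* simplification: always go back to "C" */
    return n;
}
```

In the "C" locale only the basic upper- and lowercase letters count as alphabetic, so a byte such as 0xE9 is rejected there. Under a single-byte locale that defines that byte as a letter the count may differ, but which locale names exist is system-specific.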

Byte character	Wide character	Description
`isalnum`	`iswalnum`	checks whether the operand is alphanumeric
`isalpha`	`iswalpha`	checks whether the operand is alphabetic
`islower`	`iswlower`	checks whether the operand is lowercase
`isupper`	`iswupper`	checks whether the operand is uppercase
`isdigit`	`iswdigit`	checks whether the operand is a digit
`isxdigit`	`iswxdigit`	checks whether the operand is a hexadecimal digit
`iscntrl`	`iswcntrl`	checks whether the operand is a control character
`isgraph`	`iswgraph`	checks whether the operand is a graphical character
`isspace`	`iswspace`	checks whether the operand is a whitespace character
`isblank`	`iswblank`	checks whether the operand is a blank character (space or horizontal tab in the "C" locale)
`isprint`	`iswprint`	checks whether the operand is a printable character
`ispunct`	`iswpunct`	checks whether the operand is punctuation
`tolower`	`towlower`	converts the operand to lowercase
`toupper`	`towupper`	converts the operand to uppercase
N/A	`iswctype`	checks whether the operand falls into a specific class
N/A	`towctrans`	converts the operand using a specific mapping
N/A	`wctype`	returns a wide character class to be used with `iswctype`
N/A	`wctrans`	returns a transformation mapping to be used with `towctrans`
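The last four entries work in pairs: wctype and wctrans look up a class or mapping by name, and iswctype and towctrans apply it. A minimal sketch, using the hypothetical wrapper names is_wide_alpha and to_wide_upper, might look like this. The names "alpha" and "toupper" are among those the C standard requires every locale to support.

```c
#include <wchar.h>
#include <wctype.h>

/* Tests membership in the "alpha" class via the generic interface. */
static int is_wide_alpha(wint_t wc)
{
    wctype_t alpha = wctype("alpha");   /* zero if the name is unknown */
    return alpha != 0 && iswctype(wc, alpha) != 0;
}

/* Applies the "toupper" mapping via the generic interface; returns
   the character unchanged if the mapping name is unknown. */
static wint_t to_wide_upper(wint_t wc)
{
    wctrans_t upper = wctrans("toupper");   /* zero if the name is unknown */
    return upper != 0 ? towctrans(wc, upper) : wc;
}
```

With the standard class names these wrappers should behave like iswalpha and towupper. The generic interface is mainly useful when the class name is only known at run time or is a locale-specific extension.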

References

[1] ISO/IEC 9899:1999 specification. p. 193, § 7.4. http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf.

Original source: https://en.wikipedia.org/wiki/C character classification.