stringprep.h: GNU Libidn API Reference Manual

stringprep.h — Stringprep-related functions

Functions

#define	stringprep_nameprep()
#define	stringprep_nameprep_no_unassigned()
#define	stringprep_plain()
#define	stringprep_kerberos5()
#define	stringprep_xmpp_nodeprep()
#define	stringprep_xmpp_resourceprep()
#define	stringprep_iscsi()
int	stringprep_4i ()
int	stringprep_4zi ()
int	stringprep ()
int	stringprep_profile ()
const char *	stringprep_strerror ()
const char *	stringprep_check_version ()
int	stringprep_unichar_to_utf8 ()
uint32_t	stringprep_utf8_to_unichar ()
uint32_t *	stringprep_utf8_to_ucs4 ()
char *	stringprep_ucs4_to_utf8 ()
char *	stringprep_utf8_nfkc_normalize ()
uint32_t *	stringprep_ucs4_nfkc_normalize ()
const char *	stringprep_locale_charset ()
char *	stringprep_convert ()
char *	stringprep_locale_to_utf8 ()
char *	stringprep_utf8_to_locale ()

Types and Values

#define	IDNAPI
#define	STRINGPREP_VERSION
enum	Stringprep_rc
enum	Stringprep_profile_flags
enum	Stringprep_profile_steps
#define	STRINGPREP_MAX_MAP_CHARS
struct	Stringprep_table_element
struct	Stringprep_table
typedef	Stringprep_profile
struct	Stringprep_profiles

Description

Stringprep-related functions.

Functions

stringprep_nameprep()

#define             stringprep_nameprep(in, maxlen)

Prepare the input UTF-8 string according to the nameprep profile. The AllowUnassigned flag is true; use stringprep_nameprep_no_unassigned() if you want AllowUnassigned to be false. Returns 0 iff successful, or an error code.

Parameters

in	input/output array with string to prepare.
maxlen	maximum length of input/output array.

stringprep_nameprep_no_unassigned()

#define             stringprep_nameprep_no_unassigned(in, maxlen)

Prepare the input UTF-8 string according to the nameprep profile. The AllowUnassigned flag is false; use stringprep_nameprep() if you want AllowUnassigned to be true. Returns 0 iff successful, or an error code.

Parameters

in	input/output array with string to prepare.
maxlen	maximum length of input/output array.

stringprep_plain()

#define             stringprep_plain(in, maxlen)

Prepare the input UTF-8 string according to the draft SASL ANONYMOUS profile. Returns 0 iff successful, or an error code.

Parameters

in	input/output array with string to prepare.
maxlen	maximum length of input/output array.

stringprep_kerberos5()

#define             stringprep_kerberos5(in, maxlen)

Prepare the input UTF-8 string according to the draft Kerberos 5 node identifier profile. Returns 0 iff successful, or an error code.

Parameters

in	input/output array with string to prepare.
maxlen	maximum length of input/output array.

stringprep_xmpp_nodeprep()

#define             stringprep_xmpp_nodeprep(in, maxlen)

Prepare the input UTF-8 string according to the draft XMPP node identifier profile. Returns 0 iff successful, or an error code.

Parameters

in	input/output array with string to prepare.
maxlen	maximum length of input/output array.

stringprep_xmpp_resourceprep()

#define             stringprep_xmpp_resourceprep(in, maxlen)

Prepare the input UTF-8 string according to the draft XMPP resource identifier profile. Returns 0 iff successful, or an error code.

Parameters

in	input/output array with string to prepare.
maxlen	maximum length of input/output array.

stringprep_iscsi()

#define             stringprep_iscsi(in, maxlen)

Prepare the input UTF-8 string according to the draft iSCSI stringprep profile. Returns 0 iff successful, or an error code.

Parameters

in	input/output array with string to prepare.
maxlen	maximum length of input/output array.

stringprep_4i ()

int
stringprep_4i (uint32_t *ucs4,
               size_t *len,
               size_t maxucs4len,
               Stringprep_profile_flags flags,
               const Stringprep_profile *profile);

Prepare the input UCS-4 string according to the stringprep profile, and write back the result to the input string.

The input is not required to be zero terminated (ucs4[len] = 0). The output will not be zero terminated unless ucs4[len] = 0. See stringprep_4zi() instead if your input is zero terminated or if you want the output to be.

Since the stringprep operation can expand the string, maxucs4len indicates how large the buffer holding the string is. This function will not read or write to code points outside that size.

The flags are one of the Stringprep_profile_flags values, or 0.

The profile contains the Stringprep_profile instructions to perform. Your application can define new profiles, possibly re-using the generic stringprep tables that will always be part of the library, or use one of the currently supported profiles.

Parameters

ucs4	input/output array with string to prepare.
len	on input, length of input array with Unicode code points; on exit, length of output array with Unicode code points.
maxucs4len	maximum length of input/output array.
flags	a Stringprep_profile_flags value, or 0.
profile	pointer to Stringprep_profile to use.

Returns

Returns STRINGPREP_OK iff successful, or a Stringprep_rc error code.
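
As a rough illustration of the len / maxucs4len contract described above, the following self-contained sketch replaces one code point in place with up to four code points and refuses to write past the end of the buffer. The helper sketch_replace_at() and the constant SKETCH_MAX_MAP_CHARS are hypothetical names used only for this example. They are not part of libidn, and the sketch does not claim to show how the library implements its mapping step.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_MAP_CHARS 4	/* assumption: mirrors STRINGPREP_MAX_MAP_CHARS */

/* Hypothetical helper, not a libidn function.  Replace buf[pos] with the
   maplen code points in map and update *len.  Returns 0 on success, or -1
   if the arguments are out of range or the result would not fit in maxlen
   elements; on failure the buffer and *len are left untouched.  */
int
sketch_replace_at (uint32_t *buf, size_t *len, size_t maxlen,
		   size_t pos, const uint32_t *map, size_t maplen)
{
  assert (buf != NULL && len != NULL && map != NULL);

  if (pos >= *len || maplen > SKETCH_MAX_MAP_CHARS)
    return -1;
  if (*len - 1 + maplen > maxlen)
    return -1;			/* buffer too small for the expansion */

  /* Shift the tail to make room, then copy the replacement in.  */
  memmove (&buf[pos + maplen], &buf[pos + 1],
	   (*len - pos - 1) * sizeof buf[0]);
  memcpy (&buf[pos], map, maplen * sizeof buf[0]);
  *len = *len - 1 + maplen;
  return 0;
}
```

A caller tracks the current length separately from the buffer capacity, in the same way that len and maxucs4len are kept apart in stringprep_4i().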

stringprep_4zi ()

int
stringprep_4zi (uint32_t *ucs4,
                size_t maxucs4len,
                Stringprep_profile_flags flags,
                const Stringprep_profile *profile);

Prepare the input zero terminated UCS-4 string according to the stringprep profile, and write back the result to the input string.

Since the stringprep operation can expand the string, maxucs4len indicates how large the buffer holding the string is. This function will not read or write to code points outside that size.

The flags are one of Stringprep_profile_flags values, or 0.

Parameters

ucs4	input/output array with zero terminated string to prepare.
maxucs4len	maximum length of input/output array.
flags	a Stringprep_profile_flags value, or 0.
profile	pointer to Stringprep_profile to use.

Returns

Returns STRINGPREP_OK iff successful, or a Stringprep_rc error code.

stringprep ()

int
stringprep (char *in,
            size_t maxlen,
            Stringprep_profile_flags flags,
            const Stringprep_profile *profile);

Prepare the input zero terminated UTF-8 string according to the stringprep profile, and write back the result to the input string.

Note that you must convert strings entered in the system's locale into UTF-8 before using this function; see stringprep_locale_to_utf8().

Since the stringprep operation can expand the string, maxlen indicates how large the buffer holding the string is. This function will not read or write to characters outside that size.

The flags are one of Stringprep_profile_flags values, or 0.

Parameters

in	input/output array with string to prepare.
maxlen	maximum length of input/output array.
flags	a Stringprep_profile_flags value, or 0.
profile	pointer to Stringprep_profile to use.

Returns

Returns STRINGPREP_OK iff successful, or an error code.

stringprep_profile ()

int
stringprep_profile (const char *in,
                    char **out,
                    const char *profile,
                    Stringprep_profile_flags flags);

Prepare the input zero terminated UTF-8 string according to the stringprep profile, and return the result in a newly allocated variable.

Note that you must convert strings entered in the system's locale into UTF-8 before using this function; see stringprep_locale_to_utf8().

The output out variable must be deallocated by the caller.

The flags are one of Stringprep_profile_flags values, or 0.

The profile specifies the name of the stringprep profile to use. It must be one of the internally supported stringprep profiles.

Parameters

in	input array with UTF-8 string to prepare.
out	output variable with pointer to newly allocated string.
profile	name of stringprep profile to use.
flags	a Stringprep_profile_flags value, or 0.

Returns

Returns STRINGPREP_OK iff successful, or an error code.

stringprep_strerror ()

const char *
stringprep_strerror (Stringprep_rc rc);

Convert a return code integer to a text string. This string can be used to output a diagnostic message to the user.

STRINGPREP_OK: Successful operation. This value is guaranteed to always be zero; the remaining ones are only guaranteed to hold non-zero values, for logical comparison purposes.
STRINGPREP_CONTAINS_UNASSIGNED: String contains unassigned Unicode code points, which is forbidden by the profile.
STRINGPREP_CONTAINS_PROHIBITED: String contains code points prohibited by the profile.
STRINGPREP_BIDI_BOTH_L_AND_RAL: String contains code points with conflicting bidirectional category.
STRINGPREP_BIDI_LEADTRAIL_NOT_RAL: Leading and trailing character in string not of proper bidirectional category.
STRINGPREP_BIDI_CONTAINS_PROHIBITED: Contains prohibited code points detected by bidirectional code.
STRINGPREP_TOO_SMALL_BUFFER: Buffer handed to function was too small. This usually indicates a problem in the calling application.
STRINGPREP_PROFILE_ERROR: The stringprep profile was inconsistent. This usually indicates an internal error in the library.
STRINGPREP_FLAG_ERROR: The supplied flag conflicted with profile. This usually indicates a problem in the calling application.
STRINGPREP_UNKNOWN_PROFILE: The supplied profile name was not known to the library.
STRINGPREP_ICONV_ERROR: Character encoding conversion error.
STRINGPREP_NFKC_FAILED: The Unicode NFKC operation failed. This usually indicates an internal error in the library.
STRINGPREP_MALLOC_ERROR: The malloc() call ran out of memory. This is usually a fatal error.

Parameters

rc	a Stringprep_rc return code.

Returns

Returns a pointer to a statically allocated string containing a description of the error with the return code rc.

stringprep_check_version ()

const char *
stringprep_check_version (const char *req_version);

Check that the version of the library is at minimum the requested one and return the version string; return NULL if the condition is not satisfied. If NULL is passed to this function, no check is done and the version string is simply returned.

See STRINGPREP_VERSION for a suitable req_version string.

Parameters

req_version	Required version number, or NULL.

Returns

Version string of run-time library, or NULL if the run-time library does not meet the required version number.

stringprep_unichar_to_utf8 ()

int
stringprep_unichar_to_utf8 (uint32_t c,
                            char *outbuf);

Converts a single character to UTF-8.

Parameters

c	an ISO10646 character code
outbuf	output buffer, must have at least 6 bytes of space. If `NULL`, the length will be computed and returned and nothing will be written to `outbuf`.

Returns

number of bytes written.
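
The requirement for 6 bytes of space and the `NULL` length-only mode can be illustrated with a standalone encoder. This is a minimal sketch, assuming the original UTF-8 scheme that encodes values up to 31 bits in at most 6 bytes. The name sketch_unichar_to_utf8() is hypothetical and the code is not taken from libidn.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in with the same shape as the documented function.
   Encodes c as UTF-8 into outbuf (if non-NULL) and returns the number of
   bytes the encoding needs.  */
int
sketch_unichar_to_utf8 (uint32_t c, char *outbuf)
{
  int len, first, i;

  assert (c <= 0x7FFFFFFF);	/* the 6-byte scheme covers 31 bits */

  if (c < 0x80)
    { first = 0x00; len = 1; }
  else if (c < 0x800)
    { first = 0xC0; len = 2; }
  else if (c < 0x10000)
    { first = 0xE0; len = 3; }
  else if (c < 0x200000)
    { first = 0xF0; len = 4; }
  else if (c < 0x4000000)
    { first = 0xF8; len = 5; }
  else
    { first = 0xFC; len = 6; }

  if (outbuf != NULL)
    {
      /* Fill continuation bytes from the end, 6 payload bits each.  */
      for (i = len - 1; i > 0; i--)
	{
	  outbuf[i] = (char) ((c & 0x3F) | 0x80);
	  c >>= 6;
	}
      outbuf[0] = (char) (c | (uint32_t) first);
    }
  return len;
}
```

Calling the sketch with a `NULL` buffer first is one way to size an allocation before encoding.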

stringprep_utf8_to_unichar ()

uint32_t
stringprep_utf8_to_unichar (const char *p);

Converts a sequence of bytes encoded as UTF-8 to a Unicode character. If p does not point to a valid UTF-8 encoded character, results are undefined.

Parameters

p	a pointer to a Unicode character encoded as UTF-8

Returns

the resulting character.
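
For comparison, a defensive decoder for a single character might look like the following. This is only a sketch under stated assumptions: the name sketch_utf8_to_unichar() is hypothetical, it handles sequences of 1 to 4 bytes only, and returning (uint32_t) -1 for malformed input is a choice made for this example. The documented function leaves that case undefined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoder, not a libidn function.  Decodes the UTF-8
   sequence starting at p and returns the code point, or (uint32_t) -1
   if the lead byte or a continuation byte is malformed.  */
uint32_t
sketch_utf8_to_unichar (const char *p)
{
  unsigned char lead;
  uint32_t c;
  int extra, i;

  assert (p != NULL);
  lead = (unsigned char) p[0];

  if (lead < 0x80)
    return lead;		/* plain ASCII */
  else if ((lead & 0xE0) == 0xC0)
    { c = lead & 0x1F; extra = 1; }
  else if ((lead & 0xF0) == 0xE0)
    { c = lead & 0x0F; extra = 2; }
  else if ((lead & 0xF8) == 0xF0)
    { c = lead & 0x07; extra = 3; }
  else
    return (uint32_t) -1;	/* stray continuation or unsupported lead */

  for (i = 1; i <= extra; i++)
    {
      unsigned char cont = (unsigned char) p[i];
      if ((cont & 0xC0) != 0x80)
	return (uint32_t) -1;	/* also stops at an early terminator */
      c = (c << 6) | (cont & 0x3F);
    }
  return c;
}
```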

stringprep_utf8_to_ucs4 ()

uint32_t *
stringprep_utf8_to_ucs4 (const char *str,
                         ssize_t len,
                         size_t *items_written);

Convert a string from UTF-8 to a 32-bit fixed width representation as UCS-4. The function now performs error checking to verify that the input is valid UTF-8 (before it was documented to not do error checking).

Parameters

str	a UTF-8 encoded string
len	the maximum length of `str` to use. If `len` < 0, then the string is nul-terminated.
items_written	location to store the number of characters in the result, or `NULL`.

Returns

a pointer to a newly allocated UCS-4 string. This value must be deallocated by the caller.

stringprep_ucs4_to_utf8 ()

char *
stringprep_ucs4_to_utf8 (const uint32_t *str,
                         ssize_t len,
                         size_t *items_read,
                         size_t *items_written);

Convert a string from a 32-bit fixed width representation as UCS-4 to UTF-8. The result will be terminated with a 0 byte.

Parameters

str	a UCS-4 encoded string
len	the maximum length of `str` to use. If `len` < 0, then the string is terminated with a 0 character.
items_read	location to store number of characters read, or `NULL`.
items_written	location to store number of bytes written, or `NULL`. The value stored here does not include the trailing 0 byte.

Returns

a pointer to a newly allocated UTF-8 string. This value must be deallocated by the caller. If an error occurs, NULL will be returned.

stringprep_utf8_nfkc_normalize ()

char *
stringprep_utf8_nfkc_normalize (const char *str,
                                ssize_t len);

Converts a string into canonical form, standardizing such issues as whether a character with an accent is represented as a base character and combining accent or as a single precomposed character.

The normalization mode is NFKC (ALL COMPOSE). It standardizes differences that do not affect the text content, such as the above-mentioned accent representation. It standardizes the "compatibility" characters in Unicode, such as SUPERSCRIPT THREE to the standard forms (in this case DIGIT THREE). Formatting information may be lost but for most text operations such characters should be considered the same. It returns a result with composed forms rather than a maximally decomposed form.

Parameters

str	a UTF-8 encoded string.
len	length of `str`, in bytes, or -1 if `str` is nul-terminated.

Returns

a newly allocated string, that is the NFKC normalized form of str.

stringprep_ucs4_nfkc_normalize ()

uint32_t *
stringprep_ucs4_nfkc_normalize (const uint32_t *str,
                                ssize_t len);

Converts a UCS-4 string into canonical form; see stringprep_utf8_nfkc_normalize() for more information.

Parameters

str	a Unicode string.
len	length of `str` array, or -1 if `str` is nul-terminated.

Returns

a newly allocated Unicode string, that is the NFKC normalized form of str.

stringprep_locale_charset ()

const char *
stringprep_locale_charset (void);

Find out the current locale charset. The function respects the CHARSET environment variable, but typically uses nl_langinfo(CODESET) when it is supported. It falls back on "ASCII" if CHARSET isn't set and nl_langinfo isn't supported or doesn't return anything.

Note that this function returns the application's locale's preferred charset (or the thread's locale's preferred charset, if your system supports thread-specific locales). It does not return what the system may be using. Thus, if you receive data from external sources you cannot in general use this function to guess what charset it is encoded in. Use stringprep_convert() to convert from the external representation into the charset returned by this function, to have data in the locale encoding.

Returns

Returns the character set used by the current locale. It will never return NULL, but uses "ASCII" as a fallback.
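
The lookup order described above (CHARSET first, then nl_langinfo(CODESET), then "ASCII") can be sketched on a POSIX system as follows. The function name sketch_locale_charset() is hypothetical, and the exact precedence inside the library may differ between releases, so treat this as an illustration of the documented behaviour and not as the library's code.

```c
#include <assert.h>
#include <langinfo.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical re-statement of the documented fallback chain.  */
const char *
sketch_locale_charset (void)
{
  /* 1. An explicit CHARSET environment variable wins.  */
  const char *charset = getenv ("CHARSET");

  /* 2. Otherwise ask the C library about the current locale.  */
  if (charset == NULL || *charset == '\0')
    charset = nl_langinfo (CODESET);

  /* 3. Last resort, so callers never see NULL or an empty name.  */
  if (charset == NULL || *charset == '\0')
    charset = "ASCII";

  assert (charset != NULL && *charset != '\0');
  return charset;
}
```

Note that nl_langinfo() reports the "C" locale's codeset unless the application has called setlocale() beforehand.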

stringprep_convert ()

char *
stringprep_convert (const char *str,
                    const char *to_codeset,
                    const char *from_codeset);

Convert the string from one character set to another using the system's iconv() function.

Parameters

str	input zero-terminated string.
to_codeset	name of destination character set.
from_codeset	name of origin character set, as used by `str`.

Returns

Returns newly allocated zero-terminated string which is str transcoded into to_codeset.
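
To make the iconv() dependency concrete, here is a minimal sketch of a conversion helper with the same argument order. The name sketch_convert() and the buffer-growth strategy are assumptions made for this example and are not taken from the library. The sketch also assumes a platform whose iconv() takes a `char **` input argument (as glibc does), ignores the final flush that stateful encodings need, and only writes a single 0 byte as terminator.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not a libidn function.  Returns a newly allocated
   string, or NULL on any failure.  */
char *
sketch_convert (const char *str, const char *to_codeset,
		const char *from_codeset)
{
  iconv_t cd;
  char *in, *out, *outp;
  size_t inleft, outsize, outleft;

  assert (str != NULL);

  cd = iconv_open (to_codeset, from_codeset);
  if (cd == (iconv_t) -1)
    return NULL;		/* unknown or unsupported codeset pair */

  in = (char *) str;		/* iconv() does not modify the input bytes */
  inleft = strlen (str);
  outsize = inleft * 4 + 16;	/* initial guess, grown on E2BIG */
  out = malloc (outsize + 1);	/* +1 keeps room for the terminator */
  if (out == NULL)
    {
      iconv_close (cd);
      return NULL;
    }
  outp = out;
  outleft = outsize;

  while (iconv (cd, &in, &inleft, &outp, &outleft) == (size_t) -1)
    {
      size_t used = (size_t) (outp - out);
      char *tmp;

      if (errno != E2BIG
	  || (tmp = realloc (out, outsize * 2 + 1)) == NULL)
	{
	  free (out);		/* invalid input or out of memory */
	  iconv_close (cd);
	  return NULL;
	}
      out = tmp;
      outsize *= 2;
      outp = out + used;
      outleft = outsize - used;
    }

  *outp = '\0';
  iconv_close (cd);
  return out;
}
```

As with the documented function, the caller owns the returned string and releases it with free().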

stringprep_locale_to_utf8 ()

char *
stringprep_locale_to_utf8 (const char *str);

Convert string encoded in the locale's character set into UTF-8 by using stringprep_convert().

Parameters

str	input zero terminated string.

Returns

Returns newly allocated zero-terminated string which is str transcoded into UTF-8.

stringprep_utf8_to_locale ()

char *
stringprep_utf8_to_locale (const char *str);

Convert string encoded in UTF-8 into the locale's character set by using stringprep_convert().

Parameters

str	input zero terminated string.

Returns

Returns newly allocated zero-terminated string which is str transcoded into the locale's character set.

Types and Values

IDNAPI

#define             IDNAPI

Symbol holding shared library API visibility decorator.

This is used internally by the library header file and should never be used or modified by the application.

https://www.gnu.org/software/gnulib/manual/html_node/Exported-Symbols-of-Shared-Libraries.html

STRINGPREP_VERSION

# define STRINGPREP_VERSION "1.43"

String defined via CPP denoting the header file version number. Used together with stringprep_check_version() to verify header file and run-time library consistency.

enum Stringprep_rc

Enumerated return codes of the stringprep() and stringprep_profile() functions (and macros using those functions). The value 0 is guaranteed to always correspond to success.

Members

STRINGPREP_OK	Successful operation. This value is guaranteed to always be zero; the remaining ones are only guaranteed to hold non-zero values, for logical comparison purposes.
STRINGPREP_CONTAINS_UNASSIGNED	String contains unassigned Unicode code points, which is forbidden by the profile.
STRINGPREP_CONTAINS_PROHIBITED	String contains code points prohibited by the profile.
STRINGPREP_BIDI_BOTH_L_AND_RAL	String contains code points with conflicting bidirectional category.
STRINGPREP_BIDI_LEADTRAIL_NOT_RAL	Leading and trailing character in string not of proper bidirectional category.
STRINGPREP_BIDI_CONTAINS_PROHIBITED	Contains prohibited code points detected by bidirectional code.
STRINGPREP_TOO_SMALL_BUFFER	Buffer handed to function was too small. This usually indicates a problem in the calling application.
STRINGPREP_PROFILE_ERROR	The stringprep profile was inconsistent. This usually indicates an internal error in the library.
STRINGPREP_FLAG_ERROR	The supplied flag conflicted with profile. This usually indicates a problem in the calling application.
STRINGPREP_UNKNOWN_PROFILE	The supplied profile name was not known to the library.
STRINGPREP_ICONV_ERROR	Character encoding conversion error.
STRINGPREP_NFKC_FAILED	The Unicode NFKC operation failed. This usually indicates an internal error in the library.
STRINGPREP_MALLOC_ERROR	The `malloc()` call ran out of memory. This is usually a fatal error.

enum Stringprep_profile_flags

Stringprep profile flags.

Members

STRINGPREP_NO_NFKC	Disable the NFKC normalization, as well as selecting the non-NFKC case folding tables. Usually the profile specifies BIDI and NFKC settings, and applications should not override it unless in special situations.
STRINGPREP_NO_BIDI	Disable the BIDI step. Usually the profile specifies BIDI and NFKC settings, and applications should not override it unless in special situations.
STRINGPREP_NO_UNASSIGNED	Make the library return with an error if the string contains unassigned characters according to the profile.

enum Stringprep_profile_steps

Various steps in the stringprep algorithm. You really want to study the source code to understand this one. Only useful if you want to add another profile.

Members

STRINGPREP_NFKC	The NFKC step.
STRINGPREP_BIDI	The BIDI step.
STRINGPREP_MAP_TABLE	The MAP step.
STRINGPREP_UNASSIGNED_TABLE	The Unassigned step.
STRINGPREP_PROHIBIT_TABLE	The Prohibited step.
STRINGPREP_BIDI_PROHIBIT_TABLE	The BIDI-Prohibited step.
STRINGPREP_BIDI_RAL_TABLE	The BIDI-RAL step.
STRINGPREP_BIDI_L_TABLE	The BIDI-L step.

STRINGPREP_MAX_MAP_CHARS

# define STRINGPREP_MAX_MAP_CHARS 4

Maximum number of code points that can replace a single code point, during stringprep mapping.

struct Stringprep_table_element

struct Stringprep_table_element {
    uint32_t start;
    uint32_t end;
    uint32_t map[STRINGPREP_MAX_MAP_CHARS];
};

Stringprep profile table element.

Members

uint32_t `start`;	starting codepoint.
uint32_t `end`;	ending codepoint, 0 if only one character.
uint32_t `map`[STRINGPREP_MAX_MAP_CHARS];	codepoints to map `start` into, NULL if end is not 0.
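
The start / end convention above (an `end` of 0 means the element covers only `start`) can be illustrated with a small linear lookup. This is a sketch under assumptions: struct sketch_table_element and sketch_find_element() are hypothetical names that mirror the documented layout without including the real header, and the array is assumed to end with an all-zero element.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_MAP_CHARS 4	/* assumption: mirrors STRINGPREP_MAX_MAP_CHARS */

/* Hypothetical mirror of the documented element layout.  */
struct sketch_table_element
{
  uint32_t start;		/* first code point covered */
  uint32_t end;			/* last code point, or 0 for "start only" */
  uint32_t map[SKETCH_MAX_MAP_CHARS];
};

/* Return the index of the first element covering cp, or -1 if no element
   does.  The scan stops at the all-zero terminator element.  */
ptrdiff_t
sketch_find_element (const struct sketch_table_element *table, uint32_t cp)
{
  ptrdiff_t i;

  assert (table != NULL);

  for (i = 0; table[i].start != 0 || table[i].end != 0; i++)
    {
      uint32_t last = table[i].end ? table[i].end : table[i].start;

      if (cp >= table[i].start && cp <= last)
	return i;
    }
  return -1;
}
```

A linear scan is the simplest choice for a sketch. When the number of elements is known in advance, a binary search over a sorted table is an obvious alternative.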

struct Stringprep_table

struct Stringprep_table {
    Stringprep_profile_steps operation;
    Stringprep_profile_flags flags;
    const Stringprep_table_element *table;
    size_t table_size;
};

Stringprep profile table.

Members

Stringprep_profile_steps `operation`;	a Stringprep_profile_steps value
Stringprep_profile_flags `flags`;	a Stringprep_profile_flags value
const Stringprep_table_element *`table`;	zero-terminated array of `Stringprep_table_element` elements.
size_t `table_size`;	size of `table`, to speed up searching.

Stringprep_profile

  typedef struct Stringprep_table Stringprep_profile;

Stringprep profile table.

struct Stringprep_profiles

struct Stringprep_profiles {
    const char *name;
    const Stringprep_profile *tables;
};

Element structure pairing a stringprep profile name with its tables.

Members

const char *`name`;	name of stringprep profile.
const Stringprep_profile *`tables`;	zero-terminated array of `Stringprep_profile` elements.