String handling

Utility functions to operate on strings.

Some of them are portable versions of, and/or replacements for, useful GNU and POSIX extensions.

Summary
String handling - Utility functions to operate on strings.
Functions
nacore_char_utf8_encode() - Encodes a Unicode code point into a UTF-8 character.
nacore_char_utf8_decode() - Decodes the Unicode code point associated with a UTF-8 character.
nacore_char_utf16_encode() - Encodes a Unicode code point into a UTF-16 character.
nacore_char_utf16_decode() - Decodes the Unicode code point associated with a UTF-16 character.
nacore_string_utf8_to_utf16_len() - Calculates the number of bytes needed to store the UTF-16 representation of a UTF-8 encoded string, excluding the terminating null character.
nacore_string_utf8_to_utf16() - Converts a UTF-8 encoded string to UTF-16 into a previously allocated buffer, including the terminating null character.
nacore_string_utf8_to_utf16_a() - Converts a UTF-8 encoded string to UTF-16, allocating the output string.
nacore_string_utf16_to_utf8_len() - Calculates the number of bytes needed to store the UTF-8 representation of a UTF-16 encoded string, excluding the terminating null character.
nacore_string_utf16_to_utf8() - Converts a UTF-16 encoded string to UTF-8 into a previously allocated buffer, including the terminating null character.
nacore_string_utf16_to_utf8_a() - Converts a UTF-16 encoded string to UTF-8, allocating the output string.
nacore_string_get_size() - Returns the number of bytes making up a string including the terminating null character.
nacore_strnlen() - Gets the number of bytes in a string, not including the terminating null character, up to a certain length.
nacore_strdup() - Analog of strcpy() that allocates a string large enough to hold the output including the terminating null character.
nacore_asprintf() - Analog of sprintf() that allocates a string large enough to hold the output including the terminating null character.
nacore_asprintf_nl() - Locale-independent analog of sprintf() that allocates a string large enough to hold the output including the terminating null character.
nacore_vasprintf() - Analog of vsprintf() that allocates a string large enough to hold the output including the terminating null character.
nacore_vasprintf_nl() - Locale-independent analog of vsprintf() that allocates a string large enough to hold the output including the terminating null character.
nacore_string_split() - Creates an auto-allocating list of strings by splitting the given string on boundaries formed by the given separator string.

Functions

nacore_char_utf8_encode()

_NACORE_DEF size_t nacore_char_utf8_encode(char *utf8c,
uint32_t cp)

Encodes a Unicode code point into a UTF-8 character.

If utf8c is NULL, the function only calculates the length in bytes of the UTF-8 character corresponding to the given code point.

Parameters

utf8c - Pointer to a buffer large enough to contain the resulting UTF-8 character (worst case: 4 bytes wide) or NULL.
cp - Code point.

Returns

Length in bytes of the resulting UTF-8 character or 0 if cp is outside of the valid value range (0 to 0x10ffff, except 0xd800 to 0xdfff, 0xfeff and 0xfffe).
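
The contract described above can be illustrated with a small stand-in. This is a minimal sketch, not the library's implementation: sketch_utf8_encode() and sketch_utf8_encode_usage() are hypothetical names, and the rejected values are simply copied from the range stated in the Returns paragraph.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in (NOT the library's code) that follows the documented
 * contract: returns the encoded length in bytes, or 0 for a rejected code
 * point; writes nothing when utf8c is NULL. */
static size_t sketch_utf8_encode(char *utf8c, uint32_t cp)
{
    size_t len;

    /* Rejected values, as listed in the documentation. */
    if (cp > 0x10ffff || (cp >= 0xd800 && cp <= 0xdfff)
        || cp == 0xfeff || cp == 0xfffe)
        return 0;

    len = (cp < 0x80) ? 1 : (cp < 0x800) ? 2 : (cp < 0x10000) ? 3 : 4;
    if (utf8c == NULL)
        return len; /* length-only mode */

    switch (len) {
    case 1:
        utf8c[0] = (char)cp;
        break;
    case 2:
        utf8c[0] = (char)(0xc0 | (cp >> 6));
        utf8c[1] = (char)(0x80 | (cp & 0x3f));
        break;
    case 3:
        utf8c[0] = (char)(0xe0 | (cp >> 12));
        utf8c[1] = (char)(0x80 | ((cp >> 6) & 0x3f));
        utf8c[2] = (char)(0x80 | (cp & 0x3f));
        break;
    default:
        utf8c[0] = (char)(0xf0 | (cp >> 18));
        utf8c[1] = (char)(0x80 | ((cp >> 12) & 0x3f));
        utf8c[2] = (char)(0x80 | ((cp >> 6) & 0x3f));
        utf8c[3] = (char)(0x80 | (cp & 0x3f));
        break;
    }
    return len;
}

/* Usage: query the length first, then encode into a buffer that fits. */
static void sketch_utf8_encode_usage(void)
{
    char buf[4];
    size_t len = sketch_utf8_encode(NULL, 0x20ac); /* EURO SIGN */

    assert(len == 3);
    assert(sketch_utf8_encode(buf, 0x20ac) == len);
}
```

Calling with NULL first is useful when the caller wants to size a buffer exactly instead of always reserving the 4-byte worst case.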

nacore_char_utf8_decode()

_NACORE_DEF size_t nacore_char_utf8_decode(const char *utf8c,
uint32_t *cp)

Decodes the Unicode code point associated with a UTF-8 character.

If cp is not NULL and the encoding is valid, the code point is stored into *cp.

Parameters

utf8c - Pointer to the buffer containing the UTF-8 character to be decoded.
cp - Pointer to the memory location where the code point value is to be stored, or NULL.

Returns

Length in bytes of the decoded character or 0 if the encoding is not valid, in which case nothing is written to *cp.
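
As an illustration of the decode contract, here is a minimal sketch under stated assumptions. sketch_utf8_decode() is a hypothetical stand-in, and "valid" here means standard UTF-8 well-formedness only (no overlong forms, no surrogates, nothing above 0x10ffff); the library may apply further restrictions that this page does not spell out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in (NOT the library's code): returns the length in
 * bytes of the decoded character, or 0 if the encoding is not valid, in
 * which case *cp is left untouched. */
static size_t sketch_utf8_decode(const char *utf8c, uint32_t *cp)
{
    const unsigned char *p = (const unsigned char *)utf8c;
    uint32_t v, min;
    size_t len, i;

    assert(utf8c != NULL);

    if (p[0] < 0x80) {
        v = p[0]; len = 1; min = 0;
    } else if ((p[0] & 0xe0) == 0xc0) {
        v = p[0] & 0x1f; len = 2; min = 0x80;
    } else if ((p[0] & 0xf0) == 0xe0) {
        v = p[0] & 0x0f; len = 3; min = 0x800;
    } else if ((p[0] & 0xf8) == 0xf0) {
        v = p[0] & 0x07; len = 4; min = 0x10000;
    } else {
        return 0; /* stray continuation byte or invalid lead byte */
    }

    for (i = 1; i < len; i++) {
        /* A null byte is not a continuation byte, so a truncated
         * sequence in a null-terminated string stops here safely. */
        if ((p[i] & 0xc0) != 0x80)
            return 0;
        v = (v << 6) | (p[i] & 0x3f);
    }

    /* Reject overlong forms, surrogates and out-of-range values. */
    if (v < min || v > 0x10ffff || (v >= 0xd800 && v <= 0xdfff))
        return 0;

    if (cp != NULL)
        *cp = v;
    return len;
}
```

Because a return value of 0 leaves *cp untouched, a caller can walk a string character by character and treat 0 as the signal to stop or to resynchronize.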

nacore_char_utf16_encode()

_NACORE_DEF size_t nacore_char_utf16_encode(uint16_t *utf16c,
uint32_t cp)

Encodes a Unicode code point into a UTF-16 character.

If utf16c is NULL, the function only calculates the length in bytes of the UTF-16 character corresponding to the given code point.

Parameters

utf16c - Pointer to a buffer large enough to contain the resulting UTF-16 character (worst case: 4 bytes wide) or NULL.
cp - Code point.

Returns

Length in bytes of the resulting UTF-16 character or 0 if cp is outside of the valid value range (0 to 0x10ffff, except 0xd800 to 0xdfff, 0xfeff and 0xfffe).
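
The following is a minimal sketch of the same contract for UTF-16, showing where the 2-byte and 4-byte results come from. sketch_utf16_encode() and sketch_utf16_encode_usage() are hypothetical names, not library functions, and the rejected values are taken from the range stated above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in (NOT the library's code): returns the encoded
 * length in BYTES (2 or 4), or 0 for a rejected code point; writes
 * nothing when utf16c is NULL. */
static size_t sketch_utf16_encode(uint16_t *utf16c, uint32_t cp)
{
    /* Rejected values, as listed in the documentation. */
    if (cp > 0x10ffff || (cp >= 0xd800 && cp <= 0xdfff)
        || cp == 0xfeff || cp == 0xfffe)
        return 0;

    if (cp < 0x10000) {
        /* Basic Multilingual Plane: one 16-bit unit. */
        if (utf16c != NULL)
            utf16c[0] = (uint16_t)cp;
        return 2;
    }

    /* Supplementary planes: a high/low surrogate pair. */
    if (utf16c != NULL) {
        cp -= 0x10000;
        utf16c[0] = (uint16_t)(0xd800 | (cp >> 10));
        utf16c[1] = (uint16_t)(0xdc00 | (cp & 0x3ff));
    }
    return 4;
}

/* Usage: U+1F600 needs a surrogate pair, hence 4 bytes. */
static void sketch_utf16_encode_usage(void)
{
    uint16_t buf[2];

    assert(sketch_utf16_encode(buf, 0x1f600) == 4);
    assert(buf[0] == 0xd83d && buf[1] == 0xde00);
}
```

Note that the length is counted in bytes, not in 16-bit units, which matches the wording of the Returns paragraph.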

nacore_char_utf16_decode()

_NACORE_DEF size_t nacore_char_utf16_decode(const uint16_t *utf16c,
uint32_t *cp)

Decodes the Unicode code point associated with a UTF-16 character.

If cp is not NULL and the encoding is valid, the code point is stored into *cp.

Parameters

utf16c - Pointer to the buffer containing the UTF-16 character to be decoded.
cp - Pointer to the memory location where the code point value is to be stored, or NULL.

Returns

Length in bytes of the decoded character or 0 if the encoding is not valid, in which case nothing is written to *cp.
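
A minimal sketch of surrogate-pair decoding under the same contract follows. sketch_utf16_decode() is a hypothetical stand-in; it only checks surrogate pairing, and whether the library rejects further values is not stated on this page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in (NOT the library's code): returns the length in
 * BYTES (2 or 4) of the decoded character, or 0 if the surrogates are not
 * correctly paired, in which case *cp is left untouched. */
static size_t sketch_utf16_decode(const uint16_t *utf16c, uint32_t *cp)
{
    uint32_t v, lo;
    size_t len = 2;

    assert(utf16c != NULL);
    v = utf16c[0];

    if (v >= 0xdc00 && v <= 0xdfff)
        return 0; /* low surrogate without a preceding high surrogate */

    if (v >= 0xd800 && v <= 0xdbff) {
        /* High surrogate: the next unit must be a low surrogate. A null
         * terminator fails this test, so the read stays in bounds for
         * null-terminated input. */
        lo = utf16c[1];
        if (lo < 0xdc00 || lo > 0xdfff)
            return 0;
        v = 0x10000 + ((v - 0xd800) << 10) + (lo - 0xdc00);
        len = 4;
    }

    if (cp != NULL)
        *cp = v;
    return len;
}
```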

nacore_string_utf8_to_utf16_len()

_NACORE_DEF size_t nacore_string_utf8_to_utf16_len(const char *str_utf8)

Calculates the number of bytes needed to store the UTF-16 representation of a UTF-8 encoded string, excluding the terminating null character.

The function tries to skip badly encoded data, but this is only a best-effort guess, so avoid relying on this behavior.

Parameters

str_utf8 - UTF-8 encoded string.

Returns

Number of bytes needed to store the UTF-16 representation of str_utf8, excluding the terminating null character.

nacore_string_utf8_to_utf16()

_NACORE_DEF void nacore_string_utf8_to_utf16(uint16_t *buf,
const char *str_utf8)

Converts a UTF-8 encoded string to UTF-16 into a previously allocated buffer, including the terminating null character.

The function tries to skip badly encoded data, but this is only a best-effort guess, so avoid relying on this behavior.

Parameters

buf - Output buffer.
str_utf8 - UTF-8 encoded string to be converted.

nacore_string_utf8_to_utf16_a()

_NACORE_DEF uint16_t * nacore_string_utf8_to_utf16_a(const char *str_utf8)

Converts a UTF-8 encoded string to UTF-16, allocating the output string.

The function tries to skip badly encoded data, but this is only a best-effort guess, so avoid relying on this behavior.

Parameters

str_utf8 - UTF-8 encoded string to be converted.

Returns

A malloc()-allocated, UTF-16 encoded string or NULL if there was not enough memory. The caller is responsible for free()ing the string.

nacore_string_utf16_to_utf8_len()

_NACORE_DEF size_t nacore_string_utf16_to_utf8_len(const uint16_t *str_utf16)

Calculates the number of bytes needed to store the UTF-8 representation of a UTF-16 encoded string, excluding the terminating null character.

The function tries to skip badly encoded data, but this is only a best-effort guess, so avoid relying on this behavior.

Parameters

str_utf16 - UTF-16 encoded string.

Returns

Number of bytes needed to store the UTF-8 representation of str_utf16, excluding the terminating null character.

nacore_string_utf16_to_utf8()

_NACORE_DEF void nacore_string_utf16_to_utf8(char *buf,
const uint16_t *str_utf16)

Converts a UTF-16 encoded string to UTF-8 into a previously allocated buffer, including the terminating null character.

The function tries to skip badly encoded data, but this is only a best-effort guess, so avoid relying on this behavior.

Parameters

buf - Output buffer.
str_utf16 - UTF-16 encoded string to be converted.

nacore_string_utf16_to_utf8_a()

_NACORE_DEF char * nacore_string_utf16_to_utf8_a(const uint16_t *str_utf16)

Converts a UTF-16 encoded string to UTF-8, allocating the output string.

The function tries to skip badly encoded data, but this is only a best-effort guess, so avoid relying on this behavior.

Parameters

str_utf16 - UTF-16 encoded string to be converted.

Returns

A malloc()-allocated, UTF-8 encoded string or NULL if there was not enough memory. The caller is responsible for free()ing the string.

nacore_string_get_size()

_NACORE_DEF size_t nacore_string_get_size(const char *s,
void *unused)

Returns the number of bytes making up a string including the terminating null character.

Can be safely cast to the nacore_get_size_cb type.

Parameters

s - The string.
unused - Unused, set to NULL.

Returns

Number of bytes.

nacore_strnlen()

_NACORE_DEF size_t nacore_strnlen(const char *s,
size_t maxlen)

Gets the number of bytes in a string, not including the terminating null character, up to a certain length.

In doing this, the function looks only at the first maxlen bytes at s and never beyond s + maxlen.

Parameters

s - String to be examined.
maxlen - Maximum number of bytes.

Returns

strlen(s) if that is less than maxlen, or maxlen if there is no null character among the first maxlen characters pointed to by s.
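
The bounded scan described above can be sketched with the standard memchr() function. sketch_strnlen() is a hypothetical stand-in written for illustration, not the library's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in (NOT the library's code): length of s in bytes,
 * capped at maxlen, without ever reading at or beyond s + maxlen. */
static size_t sketch_strnlen(const char *s, size_t maxlen)
{
    const char *end;

    assert(s != NULL);

    /* memchr() examines at most maxlen bytes and stops at the first
     * null character it finds. */
    end = memchr(s, '\0', maxlen);
    return (end != NULL) ? (size_t)(end - s) : maxlen;
}
```

This is the property that makes the function usable on buffers that may lack a terminator: the result is always at most maxlen, so it can be used directly as a copy length.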

nacore_strdup()

_NACORE_DEF char * nacore_strdup(const char *s,
void *unused)

Analog of strcpy() that allocates a string large enough to hold the output including the terminating null character.

The allocated storage should be free()d when it is no longer needed.

Can be safely cast to the nacore_to_string_cb type.

Parameters

s - The string to copy.
unused - Unused, set to NULL.

Returns

A malloc()-allocated string copy or NULL if there was not enough memory.
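
A minimal sketch of the allocate-and-copy behavior, including the unused second parameter that makes the signature fit a callback slot, might look as follows. sketch_strdup() is a hypothetical stand-in, not the library's implementation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in (NOT the library's code): returns a
 * malloc()-allocated copy of s, or NULL if there is not enough memory. */
static char *sketch_strdup(const char *s, void *unused)
{
    size_t size;
    char *copy;

    assert(s != NULL);
    (void)unused; /* present only so the signature matches a callback type */

    size = strlen(s) + 1; /* bytes including the terminating null character */
    copy = malloc(size);
    if (copy != NULL)
        memcpy(copy, s, size);
    return copy;
}
```

The caller owns the returned storage and releases it with free(), as stated above.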

nacore_asprintf()

_NACORE_DEF NACORE_FORMAT_PRINTF(2, 3) int nacore_asprintf(char **strp, const char *fmt, ...)

Analog of sprintf() that allocates a string large enough to hold the output including the terminating null character.

If strp is not NULL, *strp is set to point to the malloc()-allocated string, which should be free()d when it is no longer needed.

The function is affected by locale settings.

It supports all C99 conversion specifications, except the %lc and %ls kinds.

Length and precision modifiers for %s conversion specifications indicate a number of bytes.

This function can be considered thread-safe as long as no other thread changes locale settings of the calling thread while it is running.

Parameters

strp - Memory location where to put a pointer to the allocated string, or NULL.
fmt - printf()-like format string.
... - printf()-like extra arguments.

Returns

Length of the output string in bytes excluding the terminating null character.  If there was not enough memory and strp is not NULL, *strp is set to NULL.
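
One common way to obtain this behavior is a measure-then-format pass with the C99 vsnprintf() function. The sketch below assumes that technique for illustration only; sketch_asprintf() is a hypothetical stand-in, and nothing on this page says how the library itself computes the length or which conversions its own formatter handles.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in (NOT the library's code): formats into a freshly
 * malloc()-allocated string and returns the output length in bytes,
 * excluding the terminating null character. */
static int sketch_asprintf(char **strp, const char *fmt, ...)
{
    va_list ap;
    int len;
    char *buf;

    assert(fmt != NULL);

    /* First pass: measure the output without writing anything. */
    va_start(ap, fmt);
    len = vsnprintf(NULL, 0, fmt, ap);
    va_end(ap);

    if (strp == NULL)
        return len; /* length-only mode */
    if (len < 0) {
        *strp = NULL;
        return len;
    }

    /* Second pass: format into a buffer of exactly the right size. */
    buf = malloc((size_t)len + 1);
    if (buf != NULL) {
        va_start(ap, fmt);
        vsnprintf(buf, (size_t)len + 1, fmt, ap);
        va_end(ap);
    }
    *strp = buf; /* NULL here signals that memory ran out */
    return len;
}
```

Since the return value is the length even when allocation fails, a caller should test *strp against NULL rather than rely on the return value to detect an out-of-memory condition.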

nacore_asprintf_nl()

_NACORE_DEF NACORE_FORMAT_PRINTF(2, 3) int nacore_asprintf_nl(char **strp, const char *fmt, ...)

Analog of sprintf() that allocates a string large enough to hold the output including the terminating null character.

If strp is not NULL, *strp is set to point to the malloc()-allocated string, which should be free()d when it is no longer needed.

The function is not affected by locale settings (i.e., it acts as if the “C” locale is used).

It supports all C99 conversion specifications, except the %lc and %ls kinds.

Length and precision modifiers for %s conversion specifications indicate a number of bytes.

Parameters

strp - Memory location where to put a pointer to the allocated string, or NULL.
fmt - printf()-like format string.
... - printf()-like extra arguments.

Returns

Length of the output string in bytes excluding the terminating null character.  If there was not enough memory and strp is not NULL, *strp is set to NULL.

nacore_vasprintf()

_NACORE_DEF NACORE_FORMAT_VPRINTF(2) int nacore_vasprintf(char **strp, const char *fmt, va_list ap)

Analog of vsprintf() that allocates a string large enough to hold the output including the terminating null character.

If strp is not NULL, *strp is set to point to the malloc()-allocated string, which should be free()d when it is no longer needed.

The function is affected by locale settings.

It supports all C99 conversion specifications, except the %lc and %ls kinds.

Length and precision modifiers for %s conversion specifications indicate a number of bytes.

The function does not call va_end() on ap, so the value of ap is undefined after the call.

This function can be considered thread-safe as long as no other thread changes locale settings of the calling thread while it is running.

Parameters

strp - Memory location where to put a pointer to the allocated string, or NULL.
fmt - vprintf()-like format string.
ap - vprintf()-like va_list.

Returns

Length of the output string in bytes excluding the terminating null character.  If there was not enough memory and strp is not NULL, *strp is set to NULL.
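
The va_list ownership rule above matters most when writing a variadic wrapper. The sketch below shows the calling pattern: the wrapper starts the list, hands it to a v-style function, and calls va_end() itself. Both sketch_vasprintf() and sketch_format() are hypothetical names, and the stand-in is built on the standard vsnprintf(), which is an assumption made for illustration rather than a description of the library.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in (NOT the library's code) with the documented
 * contract: it consumes ap but never calls va_end() on it. */
static int sketch_vasprintf(char **strp, const char *fmt, va_list ap)
{
    va_list measure;
    int len;
    char *buf;

    assert(fmt != NULL);

    /* Measure on a copy so that ap is still usable for the real pass. */
    va_copy(measure, ap);
    len = vsnprintf(NULL, 0, fmt, measure);
    va_end(measure); /* the copy is ours, so we end it */

    if (strp == NULL)
        return len;
    if (len < 0) {
        *strp = NULL;
        return len;
    }

    buf = malloc((size_t)len + 1);
    if (buf != NULL)
        vsnprintf(buf, (size_t)len + 1, fmt, ap); /* ap is now indeterminate */
    *strp = buf;
    return len;
}

/* Caller side: a variadic wrapper that owns the va_list. */
static char *sketch_format(const char *fmt, ...)
{
    va_list ap;
    char *s = NULL;

    va_start(ap, fmt);
    sketch_vasprintf(&s, fmt, ap);
    va_end(ap); /* required here, because the callee does not do it */
    return s;   /* NULL if memory ran out */
}
```

After the call the wrapper must not read further arguments from ap; if it needs them again, it should take its own va_copy() before the call.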

nacore_vasprintf_nl()

_NACORE_DEF NACORE_FORMAT_VPRINTF(2) int nacore_vasprintf_nl(char **strp, const char *fmt, va_list ap)

Analog of vsprintf() that allocates a string large enough to hold the output including the terminating null character.

If strp is not NULL, *strp is set to point to the malloc()-allocated string, which should be free()d when it is no longer needed.

The function is not affected by locale settings (i.e., it acts as if the “C” locale is used).

It supports all C99 conversion specifications, except the %lc and %ls kinds.

Length and precision modifiers for %s conversion specifications indicate a number of bytes.

The function does not call va_end() on ap, so the value of ap is undefined after the call.

Parameters

strp - Memory location where to put a pointer to the allocated string, or NULL.
fmt - vprintf()-like format string.
ap - vprintf()-like va_list.

Returns

Length of the output string in bytes excluding the terminating null character.  If there was not enough memory and strp is not NULL, *strp is set to NULL.

nacore_string_split()

_NACORE_DEF nacore_list nacore_string_split(const char *s,
const char *sep,
nacore_filter_cb filter_cb,
void *filter_opaque)

Creates an auto-allocating list of strings by splitting the given string on boundaries formed by the given separator string.

The list will use nacore_string_get_size() as data size callback.

If filter_cb is not NULL, it is called with filter_opaque for each substring; substrings for which filter_cb returns 0 are filtered out and not included in the resulting list.

Parameters

s - Input string.
sep - Separator string.
filter_cb - Value filtering callback, or NULL.
filter_opaque - Extra opaque data to be passed to filter_cb, or NULL.

Returns

Auto-allocating list of strings or NULL if there was not enough memory.
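
The split-and-filter logic can be sketched without the list type. In the example below everything is hypothetical: sketch_split() returns a NULL-terminated array instead of a nacore_list, sketch_filter_cb is an assumed callback shape (the real nacore_filter_cb signature is not shown on this page), and keeping empty substrings when no filter is given is an assumption.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumed callback shape: return 0 to drop the substring. */
typedef int (*sketch_filter_cb)(const char *value, void *opaque);

/* Hypothetical stand-in (NOT the library's code): splits s on sep into a
 * NULL-terminated, malloc()-allocated array of malloc()-allocated strings.
 * Returns NULL if there is not enough memory. */
static char **sketch_split(const char *s, const char *sep,
                           sketch_filter_cb filter_cb, void *filter_opaque)
{
    size_t seplen, n = 0, cap = 4;
    char **out;

    assert(s != NULL && sep != NULL);
    seplen = strlen(sep);
    out = malloc(cap * sizeof(*out));
    if (out == NULL)
        return NULL;

    for (;;) {
        /* An empty separator never matches, so s is returned whole. */
        const char *end = (seplen > 0) ? strstr(s, sep) : NULL;
        size_t len = (end != NULL) ? (size_t)(end - s) : strlen(s);
        char *piece = malloc(len + 1);

        if (piece == NULL)
            goto fail;
        memcpy(piece, s, len);
        piece[len] = '\0';

        if (filter_cb != NULL && filter_cb(piece, filter_opaque) == 0) {
            free(piece); /* filtered out */
        } else {
            if (n + 1 == cap) { /* keep one slot for the NULL terminator */
                char **tmp = realloc(out, 2 * cap * sizeof(*out));
                if (tmp == NULL) {
                    free(piece);
                    goto fail;
                }
                out = tmp;
                cap *= 2;
            }
            out[n++] = piece;
        }

        if (end == NULL)
            break;
        s = end + seplen;
    }
    out[n] = NULL;
    return out;

fail:
    while (n > 0)
        free(out[--n]);
    free(out);
    return NULL;
}

/* Example filter: keep only non-empty substrings. */
static int sketch_non_empty(const char *value, void *opaque)
{
    (void)opaque;
    return value[0] != '\0';
}
```

With the sketch_non_empty filter, splitting "a,,b" on "," yields "a" and "b"; with a NULL filter the empty substring between the two separators is kept as well.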

Referenced types

typedef size_t (*nacore_get_size_cb)(const void *value, void *opaque)

A function that returns the size of some value.

typedef char * (*nacore_to_string_cb)(const void *value, void *opaque)

A function that returns a textual description of some value.