1. Introduction
2. ALib String Classes
3. String Construction and Type Conversion
4. Non-Zero-Terminated String Detection
5. String Assembly
6. Other Aspects Of ALib String Types
7. Strings And Character Widths
- 7.1 String Literals
- 7.2 Platform-Independent Conversion
8. String Utility Classes
9. String Formatting

1. Introduction

C++ developers need little motivation to use a 3rd-party string library, because the language itself does not offer powerful built-in types that allow convenient character string processing.

The situation is even a little worse, because in C++:

- There is more than one character type defined.
- The language keywords for narrow and wide characters do not specify a distinct size but are platform- and compiler-dependent.
- C++ string literals are zero-terminated arrays (which for good reasons can be considered to be a legacy design mistake).
- C++ standard library type std::string always allocates memory and copies assigned data. "Lightweight" string class std::string_view was only introduced with C++17, too late for today's libraries.
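
The points listed above can be observed with standard C++ alone. The following is a minimal sketch, not part of ALib; namespace `sketch` and all names in it are chosen for illustration only.

```cpp
#include <cstddef>
#include <string_view>

namespace sketch {

// A string literal is an array that includes the terminating zero, so its
// size is one larger than the number of visible characters.
constexpr std::size_t kLiteralArraySize = sizeof("Hello");

// The width of wchar_t is implementation-defined. It is typically 2 bytes
// on Windows and 4 bytes on GNU/Linux.
constexpr std::size_t kWideCharSize = sizeof(wchar_t);

// std::string_view (C++17) refers to existing character data without
// allocating memory or copying the data.
constexpr std::string_view kView = "Hello";

}  // namespace sketch
```

Here, `kLiteralArraySize` is 6 while `kView.size()` is 5: the terminating zero is part of the array, but not part of the string.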

Because of this, every general-purpose C++ library tends to invent its own character string type, and while ALib is no exception, this constitutes a problem in itself. It is a true dilemma: C++ developers need to rely on some external string library, but each new string library aggravates the problem by adding complexity to this very basic and fundamental domain.

Yes, a C++ developer lives in a string hell! And therefore, a main design goal of module ALib Strings is to mitigate the problems.

1.1 Library Design Goals

The design goals of module ALib Strings are:

1. Mitigate the "C++ string problem":

: The most important design goal behind module ALib Strings is to have the string classes introduced be as open and interchangeable as possible with just any other string type, be it legacy zero-terminated arrays, the standard library types or just any 3rd-party type that "smells like a string".
Instead of aiming to offer the next prominent string type to C++ developers, ALib strings should try to "hide" themselves. For example, if interface functions of an API accept string objects as arguments and likewise return string data, then in both directions arbitrary types of string objects should be passable, preferably without the need for explicit conversion code.
This goal is in perfect alignment with the general design principle of the ALib library to be "least intrusive".

2. Mitigate the "C++ character width problem":

: The width of characters used with ALib strings should be transparent. Character widths should need to be chosen explicitly only within special code sections. As one result, user code should be free of preprocessor directives (code selection) while still compiling on any platform, whatever character width that platform defaults to.

3. Abandon the use of zero-terminated strings:

: Zero-terminated "c-strings" are often insecure and inefficient. Their use is to be reduced to the required minimum, for example in the context of system API calls. At the same time, their usage should be transparent, safe and convenient.

4. Use of Unicode and UTF-encoding

: All string data should be in Unicode using UTF-8, UTF-16 or UTF-32 encoding. Conversion of strings to a different width should be transparent and rather implicit than explicit.

5. Low and high level string features

: Of course, ALib strings should be "complete" in the sense that they offer all features that a modern character string API usually offers. This final goal was reached with the evolution of the library. Furthermore, in addition to what is provided here, other modules of ALib C++ Library even extend such functionality, especially module ALib BaseCamp.

1.2 Module ALib Characters

The primary goals listed in the previous section are reachable best with the use of "template meta programming". Within this C++ programming paradigm, it is possible to define and use information about C++ types, which generally is called "type traits". With such traits, templated code can be selectively compiled depending on the template types involved.

In earlier versions of this library, type traits that defined the use of built-in and 3rd-party string classes had been introduced along with ALib string types. However, it turned out that there is very good reason to "generalize" and extract the type traits into a separate module which is completely independent of string processing.

Instead of looking on character strings, the traits are rather about "character arrays". The difference lies in the angle of perspective: With ALib, character strings are a higher-level concept than character arrays. Strings may be constructed from character arrays, may export their data as character arrays and interpret or manipulate the array data. Hence, the arrays are seen as the foundational data structure that is used by strings.

With this conceptual distinction, it became possible to separate the definition of type traits to separate module ALib Characters. While this module ALib Strings builds on ALib Characters, there is no dependency in the other direction: module ALib Characters does not "know" about module ALib Strings.

For a thorough understanding of all aspects, reading the Programmer's Manual Of ALib Module Characters before the manual you are currently reading is of course helpful. But for a normal, straightforward use of the string classes, this is not needed. Therefore, the advice for the reader is to continue reading this manual about strings, and to start investigating module ALib Characters only when noted in later chapters.

This brief summary of what module ALib Characters offers may suffice for the time being:

Character Types:
- As the width of wide character type wchar_t is compiler-dependent, type wchar is introduced. This may be equivalent to wchar_t but may also be one of char16_t or char32_t. With type wchar, the responsibility for defining what a wide character is, is taken from the compiler and given to ALib (and its platform defaults and compilation options).
- The "other" wide character type, which has a different width than wchar, is aliased by type xchar.
- Finally, just for completion, an alias type for char is given with nchar.
- Together, this makes three new character types that denote the width: nchar, wchar and xchar.
Logical Types:
- Usually a programmer should not be bothered with choosing character width. Therefore, type character is given as the first "logical" character type. This one is the most important and frequently used.
  Type character either corresponds to type nchar or wchar, again depending on platform and ALib compilation options.
- To explicitly address the non-standard character type, type complementChar is given. This type is equivalent to type nchar if type character is equivalent to type wchar - and vice versa.
- To finalize the set of ALib character types, a logical type name for the "strange", non-standard wide type is given with strangeChar. This is of 2-byte size when type wchar is of 4-byte size and vice versa.
- Together, this makes three logical character types that do not provide any information about the actual width: character, complementChar and strangeChar.
  Similar to the fact that explicit type nchar is always equivalent to built-in type char, the logical type strangeChar is always equivalent to explicit type xchar.
Character Array Type Traits:
The traits, if given for a custom type T, answer the following questions:
- Does type T contain, implement or otherwise represent character array data and does it provide access to that data?
- Should such access be granted implicitly, only explicitly or even only with mutable objects of type T?
- May values of type T be constructed from character arrays?
- Is such construction implicitly allowed or only when explicitly expressed?
In addition, similar traits are available that answer the very same questions in respect to zero-terminated character arrays.
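
The relationship between the explicit and the logical character types described above can be sketched with plain type aliases. This is a simplified illustration and not ALib's actual definition: the choice of wchar_t for `wchar` and the switch `kCharactersAreWide` are assumptions that stand in for ALib's platform defaults and compilation options.

```cpp
#include <type_traits>

namespace sketch {

// Explicit types: these names denote a width.
using nchar = char;
using wchar = wchar_t;  // assumption: the wide type follows the compiler
// The "other" wide type: 2 bytes wide if wchar is wider, else 4 bytes wide.
using xchar = std::conditional_t<sizeof(wchar) == 2, char32_t, char16_t>;

// Hypothetical compile-time switch selecting the default character width.
constexpr bool kCharactersAreWide = false;

// Logical types: these names do not reveal the actual width.
using character      = std::conditional_t<kCharactersAreWide, wchar, nchar>;
using complementChar = std::conditional_t<kCharactersAreWide, nchar, wchar>;
using strangeChar    = xchar;

static_assert(!std::is_same_v<character, complementChar>,
              "character and complementChar always differ");

}  // namespace sketch
```

User code that is written against `character` compiles unchanged when the switch is flipped, which is the purpose of the logical types.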

1.3 UTF Encoding

The use of UTF encoding was named a "design goal" in section 1.1 Library Design Goals. In fact, this is much more: It is a mandatory constraint that the software process that invokes code of this ALib Module uses UTF in general. The module has to rely on that fact, because unfortunately most of today's operating systems and system class libraries (that ALib builds on) use a global (process-wide) approach for setting configuration parameters of provided character conversion functions like wcsnrtombs or mbsnrtowcs. On GNU/Linux, such settings are made with function setlocale.

Attention: Therefore, it has to be ensured that the process's locale settings are "UTF-8" compatible. If not, ALib may not function properly. With debug-builds, runtime warnings or assertions might be raised. In release compilations, mixing strings of different character width might lead to undefined behavior.

This should not be seen as a huge restriction, because there are no good reasons for any modern software to use any character encoding other than UTF. However, environment variables (or ALib variable LOCALE in a configuration source) have to be set to a UTF-8 encoding.
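
How a process might adopt the environment's locale and check it for UTF-8 compatibility can be sketched with the standard function `std::setlocale`. The helper names below are hypothetical, and the check on the locale name is only a heuristic, because the format of locale names is platform-specific. This sketch does not show how ALib itself evaluates its LOCALE variable.

```cpp
#include <clocale>
#include <string>

namespace sketch {

// Heuristic (hypothetical helper): true if a locale name such as
// "en_US.UTF-8" appears to name a UTF-8 codeset.
inline bool NamesUtf8(const std::string& localeName) {
    return localeName.find("UTF-8") != std::string::npos
        || localeName.find("utf8")  != std::string::npos;
}

// Adopts the locale configured in the environment (LANG, LC_ALL, ...) and
// reports whether the resulting character-type locale looks UTF-8 compatible.
inline bool AdoptEnvironmentLocale() {
    if (std::setlocale(LC_ALL, "") == nullptr)
        return false;  // the environment names a locale that is not available
    const char* ctype = std::setlocale(LC_CTYPE, nullptr);  // query only
    return ctype != nullptr && NamesUtf8(ctype);
}

}  // namespace sketch
```

A program would typically call a function like `AdoptEnvironmentLocale()` once, early during bootstrap and before any conversion between character widths takes place.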

2. ALib String Classes

This module provides five different string classes:

String
CString
AString
Substring
LocalString<TCapacity>

A string object's underlying character type is defined using a template parameter named TChar. The different string classes are located in namespace alib::strings and their type names include a prefix letter 'T'. As a result, the templated classes are TString<TChar>, TCString<TChar>, TAString<TChar>, TSubstring<TChar> and TLocalString<TChar,N>.

As it is described in the documentation of outer namespace alib, it is common practice for any ALib Module to define "alias types" of all important classes in that namespace. For each of the string classes, four alias types are defined which are using character types character, nchar, wchar and xchar. With the latter three explicit character types, the alias names replace the prefix letter 'T' by letters 'N', 'W' and 'X'.

As a result, the following table lists all alias names in namespace alib:

String Type/Character Type	character	nchar	wchar	xchar
TString<TChar>	String	NString	WString	XString
TCString<TChar>	CString	NCString	WCString	XCString
TAString<TChar>	AString	NAString	WAString	XAString
TSubstring<TChar>	Substring	NSubstring	WSubstring	XSubstring
TLocalString<TChar,N>	LocalString<N>	NLocalString<N>	WLocalString<N>	XLocalString<N>

Within this manual, most of the time, the simple names like String, CString or AString are used, even when the corresponding templated class is meant. Likewise, if the names are linked, then the link target resolves the template type and not the simple alias. For example, this link: AString, links to class TAString.

The following subsections of this chapter introduce the main string types. This is done without going into the details of each type's functionality but rather by explaining the principal differences of the types.

2.1 Class String

The (only!) advantage of zero-terminated arrays, is that all that is needed to determine a string is a pointer to the start of the array. Otherwise, along with that pointer, the length of the string has to be given.

These two values, the pointer to the first character and the length of the string, are the only two field members of class String. It could be said, that the main purpose of this class is to provide a pair of the two values, which comprise a non-zero-terminated string and hence the type should be considered a "lightweight pointer to constant string data".

The terms "lightweight", "pointer" and "constant data" imply that class String is a simple C++ POD type, with all benefits like having defaulted copy and move constructors, no destructor and of course no virtual functions.

For example, if a String instance is created and deleted on the stack like this:

    {
        String s= "Hello";
    }

this is the same effort for the compiler (CPU) as creating a character pointer and a simple integral value:

    {
        const char* cp    = "Hello";
        integer     length= 5;
    }

It is important to understand that creating, deleting and copying objects (values) of type String is equivalent to doing the same with objects of type std::pair<const char*, integer>. Likewise, string values can simply be overwritten:

    String s= "Hello";
    s       = "World";
    s       = String( s.Buffer() + s.Length() / 4, s.Length() - s.Length() / 2 );

The last sample shows that reducing the length of the represented string by cutting portions from either the front or the end of the string would be allowed operations: the resulting object still represents valid string data. However, none of the interface functions of the class changes the pointer to the character array or the string's length. Such modifications are only implemented with derived types introduced later. This way, class String does not only represent a buffer of constant character data, but also the pointer to the buffer and the defined length are constant themselves.
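
The "pair of pointer and length" nature of such a type can be sketched in a few lines of standard C++. Class `LightString` and function `MiddleOf` are hypothetical stand-ins written for this illustration; they are not the implementation of class String.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>

namespace sketch {

// Hypothetical lightweight string: nothing but a pointer and a length.
class LightString {
public:
    constexpr LightString() = default;
    constexpr LightString(const char* buffer, std::ptrdiff_t length)
        : buffer_(buffer), length_(length) {}
    // Accepts zero-terminated arrays. The length is determined once, here.
    LightString(const char* cstr)
        : buffer_(cstr),
          length_(cstr ? static_cast<std::ptrdiff_t>(std::strlen(cstr)) : 0) {}

    constexpr const char*    Buffer() const { return buffer_; }
    constexpr std::ptrdiff_t Length() const { return length_; }

private:
    const char*    buffer_ = nullptr;
    std::ptrdiff_t length_ = 0;
};

// Copying and destroying such values is as cheap as with a plain pair.
static_assert(std::is_trivially_copyable_v<LightString>);
static_assert(std::is_trivially_destructible_v<LightString>);

// Same arithmetic as in the sample above: a new value that represents a
// middle portion of the same, unmodified character data.
inline LightString MiddleOf(LightString s) {
    return LightString(s.Buffer() + s.Length() / 4,
                       s.Length() - s.Length() / 2);
}

}  // namespace sketch
```

For the string "World", `MiddleOf` yields a value of length 3 that points to "orl" inside the original array. No character data is copied or modified.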

2.2 Class CString

A first type derived from class String is class CString. The name of the class means "C language string": objects of this class represent zero-terminated character arrays.

In all other aspects, this class is the same as its parent.

With this class derived, it becomes obvious why parent class String must not allow operations that reduce the length of the string: The resulting shortened string would not be zero-terminated. In other words: If class String allowed operations that shortened the represented string, then class CString could not be derived from it.

Class AString introduced in the next section, imposes a similar rationale why operations that cut portions from the front of a String are likewise not allowed.
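
The loss of zero-termination can be demonstrated with std::string_view, which is used here only as a stand-in for a lightweight string type that permits shortening. The two function names are hypothetical.

```cpp
#include <string_view>

namespace sketch {

// True if the character that follows the viewed range is the terminating
// zero. This may only be called while that position lies inside the
// underlying array, which holds for views into a string literal.
inline bool EndsAtTerminator(std::string_view v) {
    return v.data()[v.size()] == '\0';
}

inline bool ShortenedViewIsTerminated() {
    std::string_view v = "Hello";  // views the literal, the terminator follows
    v.remove_suffix(2);            // now represents "Hel"
    return EndsAtTerminator(v);    // the next character is 'l', not '\0'
}

}  // namespace sketch
```

Passing the shortened view's data pointer to an API that expects a zero-terminated array would make that API read "Hello" instead of "Hel".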

2.3 Class AString

A second type derived from class String is class AString. The prefix character "A" here simply stands for "ALib". The class implements a "heavy weight" string type, namely one that does not only "represent a string" but actively allocates memory for the string data and manages that resource internally. If a String object gets "assigned" to an object of this type, then the string data is entirely copied into the character array buffer that class AString manages.

Consequently, this class provides a huge set of interface functions that allow modifying the contents of the array. If content is inserted that exceeds the capacity of the internal buffer, a larger buffer is allocated which can store the concatenated string data.

Cutting data from the end of the string is performed in constant time ("O(1)") as only the value of inherited field Length needs to be decreased. Cutting data from the start of the string is "linear" effort ("O(N)"): The remaining portion of the string is copied to the start of the buffer and the string's length is adjusted.

Note: To finalize a thought from the previous section: The latter now explains why parent class String does not offer an interface that decreases the represented string's length by cutting pieces from the start.
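
The different costs of the two cut operations can be sketched with a small owning buffer. Class `OwningBuffer` and its method names are hypothetical and chosen for this illustration; the sketch assumes, as described above, that the string data always starts at the front of the allocated buffer.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

namespace sketch {

// Hypothetical owning buffer: the data always starts at the buffer's front.
class OwningBuffer {
public:
    explicit OwningBuffer(const std::string& src)
        : storage_(src.begin(), src.end()), length_(src.size()) {}

    // O(1): only the length changes.
    void DeleteEnd(std::size_t n) {
        length_ -= (n < length_ ? n : length_);
    }

    // O(N): the remaining characters are moved to the front of the buffer.
    void DeleteStart(std::size_t n) {
        if (n > length_) n = length_;
        if (n == 0) return;
        const auto first = storage_.begin() + static_cast<std::ptrdiff_t>(n);
        const auto last  = storage_.begin() + static_cast<std::ptrdiff_t>(length_);
        std::copy(first, last, storage_.begin());
        length_ -= n;
    }

    std::string Str() const { return std::string(storage_.data(), length_); }

private:
    std::vector<char> storage_;
    std::size_t       length_;
};

}  // namespace sketch
```

A type that additionally stored an offset to the start of the data could cut from the front in constant time, at the price of more complex memory management. The sketch follows the simpler scheme.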

2.4 Class Substring

Finally, a third type derived from class String is implemented with class Substring.

It has in all respects the same properties as its base class String: in particular, it has the same lightweight nature, it "represents" strings rather than "implementing" them, and the string data represented is constant.

The only difference is that this type allows shortening the represented string and such shortening can be done from both ends in constant time ("O(1)"): Same as with class AString, "removing" data from the end is just about changing inherited field Length. Removing data from the front also decreases the length and in parallel increases the pointer to the start of the array.

An important use case of this type is to "parse" data from a string. Here, an object of this type is created from an "input string" of just any other type and then the string is shortened in a loop. Inside the loop, alternating operations of either parsing numbers, tokens or other values, or recognizing and removing delimiters and whitespace, are performed until the string is empty. The majority of the interface methods offered are therefore named with the prefix "Consume". This indicates that not only some "parsing" takes place, but also that the corresponding characters are removed from the substring.
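
The parsing loop described above can be sketched with std::string_view, whose method `remove_prefix` shortens the view from the front in constant time. The functions `ConsumeChars`, `ConsumeInt` and `ParseIntList` are hypothetical and only borrow the "Consume" naming idea; they are not the interface of class Substring.

```cpp
#include <charconv>
#include <cstddef>
#include <string_view>
#include <system_error>
#include <vector>

namespace sketch {

// Removes all leading characters that are contained in `set`.
inline void ConsumeChars(std::string_view& s, std::string_view set) {
    const auto pos = s.find_first_not_of(set);
    s.remove_prefix(pos == std::string_view::npos ? s.size() : pos);
}

// Parses a decimal integer at the front and removes the parsed characters.
inline bool ConsumeInt(std::string_view& s, int& value) {
    const auto result = std::from_chars(s.data(), s.data() + s.size(), value);
    if (result.ec != std::errc())
        return false;  // nothing was consumed
    s.remove_prefix(static_cast<std::size_t>(result.ptr - s.data()));
    return true;
}

// Alternates between removing delimiters and parsing values until the
// input is empty or something other than a number is found.
inline std::vector<int> ParseIntList(std::string_view input) {
    std::vector<int> values;
    for (;;) {
        ConsumeChars(input, " ,");
        if (input.empty()) break;
        int v = 0;
        if (!ConsumeInt(input, v)) break;
        values.push_back(v);
    }
    return values;
}

}  // namespace sketch
```

The input data itself is never modified or copied. Only the view's start pointer and length change while the loop proceeds.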

2.5 Summary Of String Types

In the previous four sections, base class String and three derived types CString, AString and Substring have been introduced.

Note: A fifth type, class LocalString, is introduced only in later chapter 5.4 Class LocalString.

With that introduction, it was explained why the base class is limited in respect to changing the string: Simply put, derived type CString disallows cutting substrings from the back, because this would result in a non-zero-terminated string, and derived type AString disallows cutting substrings from the front, because its simple implementation of the memory management forces the class to copy the remaining string data to the front of the allocated buffer.

All these explanations have been given to make the design rationale of the family of ALib string classes completely transparent and understood. In other, higher-level programming languages this all would be an unnecessary complication of things. Even in C++, a design that is simpler to understand and to use would be possible, for example by using abstract classes with virtual functions.

The design given here aims to leverage the speed and efficiency of the C++ language. Once the differences between the string classes are understood, choosing the right type becomes a very clear, unambiguous and straightforward task in any programming situation.

To recap:

Class String:
- Lightweight constant pointer to an array of constant characters of constant length.
- Used to represent and copy string values, especially as method arguments and return values.
- This class is conceptually comparable to type std::string_view, which was introduced to the C++ standard library with C++17.
Class CString:
- Same as String but represents zero-terminated character strings.
Class AString:
- String buffer class that allows creating new strings and modifying them in arbitrary ways.
- Used to assemble complex strings for user messages, externalization of data, etc.
- This class is conceptually comparable to type std::string of the standard C++ library.
Class Substring:
- Same as String but the pointer to the start of the array may be increased and the length of the string may be decreased. The represented string data is still constant.
- Primarily used to interpret string data, aka to "parse" strings.

3. String Construction and Type Conversion

3.1 Construction

3.1.1 String Construction

This module makes use of the "character array traits" defined with dependency module ALib Characters. For the use of the string classes, a developer does not need to know all details of these traits and it is sufficient to understand what is said in the introductory chapter 1.2 Module ALib Characters of this manual.

The following table lists the constructors of class String. All constructors are inline and mostly compile to the shortest code possible, which only copies the right values to fields buffer and length.

No	Parameter(s)	Description
1	None	Default constructor, sets field buffer to `nullptr` and length to `0`.
2	`nullptr` (C++ keyword)	Sets field buffer to `nullptr` and length to `0`.
3	`const TChar*`, `integer`	Sets fields buffer and length to the given values.
4	`const T&` with `T_CharArray<T>::Access == AccessType::Implicit`	Sets field buffer to the result of `T_CharArray<T>::Buffer(src)` and field length to the result of `T_CharArray<T>::Length(src)`.
5	`const T&` with `T_CharArray<T>::Access == AccessType::ExplicitOnly`	Same as 4), but defined using keyword `explicit`.
6	`T&` with `T_CharArray<T>::Access == AccessType::MutableOnly`	Same as 4), but using keyword `explicit` and a mutable parameter.

Constructors 4, 5 and 6 are selected by the compiler in the case that an object of template type T is given and an according specialization of type trait struct T_CharArray exists. Each of these constructors implements one of the three elements of enumeration AccessType that classify the possible access to the character array data given with type T.

This set of constructors allows very intuitive and convenient construction of ALib strings from 3rd-party string types. Especially the case of implicit construction is interesting: If a method argument is declared as a constant reference type, the C++ compiler will perform one "implicit conversion" if a different type is passed for such an argument.
As a sample, we have function foo defined as:

void foo(const String& string )
{
    (void) string; // ... do something with the string
}

With this, an invocation passing just any string type (that allows implicit access) is possible:

// Passing a C++ string literal
foo( A_CHAR("/usr/bin") );
 
// Passing a std::string
std::basic_string<character> stdString( A_CHAR("/usr/bin") );
foo( stdString );
 
// Passing an AString
alib::AString aString( A_CHAR("/usr/bin") );
foo( aString );
 
// Passing a Substring
alib::Substring subString= aString.Substring(0, 4);
foo( subString );

Note: Considering that the type traits can be defined for arbitrary 3rd-party string types, implicit string construction is a remarkable achievement of this approach! What is demonstrated in the sample above constitutes a tool for developers to unify different string types defined in different libraries. Without this solution, a developer would either need to provide different overloaded versions of a method (each accepting a different string type), or she would have to place explicit string argument conversion code at each invocation of a method! Therefore, this feature is in perfect alignment with the primary design goal of this module, which is to "Mitigate the C++ string problem", as well as with one of the overall design goals of ALib, namely to be least intrusive.
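
The mechanism behind constructors 4 and 5 of the table above can be sketched in standard C++17. All names below (`CharArrayTraits`, `Access`, `View`, `LegacyText`, `LengthOf`) are hypothetical and simplified; they imitate the idea of T_CharArray and AccessType but are not ALib's declarations. The mutable-only case is omitted for brevity.

```cpp
#include <cstddef>
#include <string>
#include <type_traits>

namespace sketch {

enum class Access { None, Implicit, ExplicitOnly };

// Primary template: by default, a type is not a character array.
template <typename T>
struct CharArrayTraits {
    static constexpr Access kAccess = Access::None;
};

// std::string: allow implicit construction of views.
template <>
struct CharArrayTraits<std::string> {
    static constexpr Access kAccess = Access::Implicit;
    static const char* Buffer(const std::string& s) { return s.data(); }
    static std::size_t Length(const std::string& s) { return s.size(); }
};

// A hypothetical 3rd-party type for which only explicit access is wanted.
struct LegacyText { const char* chars; std::size_t count; };

template <>
struct CharArrayTraits<LegacyText> {
    static constexpr Access kAccess = Access::ExplicitOnly;
    static const char* Buffer(const LegacyText& t) { return t.chars; }
    static std::size_t Length(const LegacyText& t) { return t.count; }
};

class View {
public:
    View() = default;

    // Selected only if the traits allow implicit access.
    template <typename T,
              std::enable_if_t<CharArrayTraits<T>::kAccess == Access::Implicit, int> = 0>
    View(const T& src)
        : buffer_(CharArrayTraits<T>::Buffer(src)),
          length_(CharArrayTraits<T>::Length(src)) {}

    // Selected only if the traits demand explicit access.
    template <typename T,
              std::enable_if_t<CharArrayTraits<T>::kAccess == Access::ExplicitOnly, int> = 0>
    explicit View(const T& src)
        : buffer_(CharArrayTraits<T>::Buffer(src)),
          length_(CharArrayTraits<T>::Length(src)) {}

    const char* Buffer() const { return buffer_; }
    std::size_t Length() const { return length_; }

private:
    const char* buffer_ = nullptr;
    std::size_t length_ = 0;
};

// An interface function that accepts any implicitly accessible string type.
inline std::size_t LengthOf(const View& v) { return v.Length(); }

}  // namespace sketch
```

With this, `LengthOf(stdString)` compiles for a std::string argument, whereas a `LegacyText` value has to be wrapped as `LengthOf(View(legacy))`. A type without a traits specialization is rejected at compile time.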

3.1.2 CString Construction

The exact same set of constructors that are listed in the table of the previous section for class String, are implemented with class CString. The only difference is that constructors 4 to 6 are testing for a specialization of struct T_ZTCharArray instead of T_CharArray.

Therefore, all that was explained in respect to construction of type String from templated types that represent character arrays, is equivalently true for the construction of type CString from types that represent zero-terminated character arrays!

3.1.3 AString Construction

In contrast to String and CString, type AString does not allow implicit construction. Apart from the move-constructor, all constructors are explicit. This design decision was made because of the heavy-weight nature of the class.

Apart from the need to be explicit, construction of the class is even more flexible than the construction of the lightweight string types: Type traits functor T_Append allows creating string representations for objects of custom types. In addition to the character array types that base class String accepts, these types are accepted by a templated constructor of the class as well. All details about this template struct are given with chapter 5. String Assembly.

See also: Paragraph Copy/Move Constructor and Assignment of this class's reference documentation, which provides some rationale for the explicit nature of AString construction.

3.1.4 Substring Construction

Class Substring simply inherits all constructors of its base class String and therefore, all that had been written in previous chapter 3.1.1 String Construction, is true for this class. This includes that the type const alib::Substring& may be used as method arguments to accept any type of string of fitting character size, without explicit conversion.

Note: The rationale for this design decision is as follows: As explained before, class Substring specializes class String by adding features that remove characters from the start and the end of the string. If class String did not play the role of being the base class for types CString and AString, these features could be implemented with class String itself and class Substring would not be even needed. In this respect, class Substring is not a specialization of String but more a "continuation". With that in mind, it makes a lot of sense that all parental constructors are exposed and usable.

3.2 Casting Strings To Other Types

The previous chapter talked about how the different ALib string types are constructed. This chapter now discusses the opposite: the string types implement C++ cast operators that allow constructing values of arbitrary string types from them.

Again, the cast is performed using the type traits defined with dependency module ALib Characters. This time, the value of field Construction of the specializations of T_CharArray and T_ZTCharArray, respectively, is tested. Possible values are given with enumeration ConstructionType. With that, casting string types to a specific custom type is either not allowed, implicitly allowed or allowed only if explicitly performed.

3.2.1 Casting From String And Substring

Class String implements an implicit cast operator to values of template type T if a specialization of T_CharArray exists that defines field Construction to be ConstructionType::Implicit. Likewise, an explicit operator is available if ConstructionType::ExplicitOnly is given.
Of course, the construction of the casted object is performed by invoking T_CharArray::Construct, passing the string's fields buffer and length.

With the same rationale as given in 3.1.4 Substring Construction, class Substring behaves 100% the same as parent class String in respect to casting options.
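
The opposite direction, casting to external types, can be sketched with templated conversion operators that are enabled by a traits field. Again, all names (`ConstructTraits`, `Construction`, `View`) are hypothetical simplifications written for this manual, not ALib's declarations. The choice of std::string_view as an implicit target and std::vector<char> as an explicit-only target follows the summary given in section 3.3 Built-In Conversions.

```cpp
#include <cstddef>
#include <string_view>
#include <type_traits>
#include <vector>

namespace sketch {

enum class Construction { None, Implicit, ExplicitOnly };

// Primary template: by default, a type cannot be constructed from a view.
template <typename T>
struct ConstructTraits {
    static constexpr Construction kConstruction = Construction::None;
};

// Lightweight target type: implicit casts are harmless.
template <>
struct ConstructTraits<std::string_view> {
    static constexpr Construction kConstruction = Construction::Implicit;
    static std::string_view Construct(const char* b, std::size_t n) {
        return std::string_view(b, n);
    }
};

// Allocating target type: the cast has to be spelled out.
template <>
struct ConstructTraits<std::vector<char>> {
    static constexpr Construction kConstruction = Construction::ExplicitOnly;
    static std::vector<char> Construct(const char* b, std::size_t n) {
        return std::vector<char>(b, b + n);
    }
};

class View {
public:
    View(const char* buffer, std::size_t length)
        : buffer_(buffer), length_(length) {}

    // Implicit cast operator, available only if the traits say so.
    template <typename T,
              std::enable_if_t<ConstructTraits<T>::kConstruction == Construction::Implicit, int> = 0>
    operator T() const { return ConstructTraits<T>::Construct(buffer_, length_); }

    // Explicit cast operator for types that demand explicit construction.
    template <typename T,
              std::enable_if_t<ConstructTraits<T>::kConstruction == Construction::ExplicitOnly, int> = 0>
    explicit operator T() const { return ConstructTraits<T>::Construct(buffer_, length_); }

private:
    const char* buffer_;
    std::size_t length_;
};

}  // namespace sketch
```

With these definitions, `std::string_view sv = view;` compiles, while a std::vector<char> is only created with `static_cast<std::vector<char>>(view)`.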

3.2.2 Casting From CString

Class CString implements the very same cast operators as class String, with the only difference that TMP struct T_ZTCharArray is used instead of TMP struct T_CharArray.

Note: With the built-in specialization of T_ZTCharArray for C++ type const char* that defines implicit casts, objects of type CString can be passed to "old school" interface methods that expect a zero-terminated character array as an argument, without an explicit cast.

3.2.3 Casting From AString

Class AString implements each of the cast methods that are provided with class String and CString. This is due to the fact that the class always reserves space in the allocated buffer for a terminating character. This way, the preparation for casting to an arbitrary zero-terminated array type is performed in constant time, as no string data has to be moved to a newly allocated buffer.

The four cast methods make this class the most flexible of the ALib string types in respect to implicitly or explicitly creating external character array types.
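
Why a reserved slot makes termination a constant-time operation can be sketched as follows. Class `TerminableBuffer` is a hypothetical, fixed-capacity simplification that does not grow; it only illustrates the "capacity plus one" idea and is not the implementation of class AString.

```cpp
#include <cstddef>
#include <cstring>
#include <memory>

namespace sketch {

// Hypothetical buffer that always keeps one spare character for '\0'.
class TerminableBuffer {
public:
    explicit TerminableBuffer(std::size_t capacity)
        : data_(new char[capacity + 1]),  // +1: room for the terminator
          capacity_(capacity) {}

    // Appends as many characters as fit and returns the number appended.
    std::size_t Append(const char* src, std::size_t n) {
        if (n > capacity_ - length_) n = capacity_ - length_;
        std::memcpy(data_.get() + length_, src, n);
        length_ += n;
        return n;
    }

    // O(1): writes the terminator into the reserved slot. No reallocation
    // and no copying of string data is needed.
    const char* Terminate() {
        data_[length_] = '\0';
        return data_.get();
    }

    std::size_t Length() const { return length_; }

private:
    std::unique_ptr<char[]> data_;
    std::size_t             capacity_;
    std::size_t             length_ = 0;
};

}  // namespace sketch
```

Even when the buffer is filled to its capacity, the terminator still fits, so `Terminate()` never has to allocate.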

3.2.4 Suppressing Casts

Casts, especially implicit ones, in some situations may impose ambiguities, which lead to compilation failures. To mitigate such, the implementations of the implicit casts of all three classes String, CString and AString are conditionally selected by the compiler using TMP struct T_SuppressAutoCast.

ALib specializes this struct to prevent the casting of AString objects to types String and CString, which the type traits T_CharArray and T_ZTCharArray would, of course, indicate to be allowed. Such casts would be ambiguous with respect to the implicit construction that is also allowed.

Custom specializations should only be needed in similar situations, where a custom string type allows auto-casts based on the type traits provided by ALib.

3.3 Built-In Conversions

This module ALib Strings is not "responsible" for defining the built-in conversion rules for C++ and 3rd-party types, because in fact these rules are already defined with the specializations of the TMP structs T_CharArray and T_ZTCharArray given in dependency module ALib Characters.

While these specializations are described in the corresponding Programmer's Manual section 4. Built-In Character Array Traits of that module, only a summary of the rules from the perspective of the ALib string classes is given here.

Fixed-length Character Arrays:

Implicit construction of String objects.
Implicit construction of CString objects (because of string literals being fixed-length arrays).
No casts from any ALib type.

const TChar*:

Implicit construction of String and CString because constant character pointers are considered zero-terminated (design decision along C++ language standards).
Explicit casts from String objects: a programmer that converts a String to this type needs to be sure that either the originating string is zero-terminated or the converted pointer is not expected to point to a zero-terminated string.
Implicit casts from CString and AString.

TChar*:
In general this library considers mutable character pointers a "dubious" type and unlike their constant counterparts, arrays pointed to by this type are not considered zero-terminated. Therefore all conversion functions are explicit.

std::string_view:

Implicit construction of String objects.
Explicit construction of CString objects.
Implicit cast from String and CString objects because of the lightweight nature of the type.

std::string:

Implicit construction of String objects.
Implicit construction of CString objects, because accessing the internal buffer automatically zero-terminates it.
Implicit cast from String and CString although heap-memory allocation and the copying of string data is involved. The rationale for this decision lies in technical reasons, as explained here.

std::vector<TChar>:

Implicit construction of String objects.
Explicit construction of CString objects.
Explicit cast from String and CString objects because of the heap-memory allocation and data copy involved.

QStringView:

Implicit construction of String objects.
Explicit construction of CString objects.
Implicit casts from String.
Implicit casts from CString.

QString:

Implicit construction of String objects.
Implicit construction of CString objects.
Explicit casts from String.
Explicit casts from CString.

QLatin1String:

Implicit construction of String objects.
Explicit construction of CString objects.
Implicit casts from String.
Implicit casts from CString.

QByteArray:

Implicit construction of String objects.
Explicit construction of CString objects.
Explicit casts from String.
Explicit casts from CString.

QVector<uint>:

Implicit construction of String objects.
Explicit construction of CString objects.
Explicit casts from String.
Explicit casts from CString.

3.4 Adopting 3rd-Party String Types

In the previous sections a quite remarkable and unique feature of this module, namely the possibility of (implicit) conversions of arbitrary C++ string types to and from ALib string types, has been described. These features contribute fundamentally to a major design goal of this module, by relieving a programmer from the burden to convert string types when mixing libraries that expect different strings.

Note: This is true at least for the case that the string types that become mixed are based on the same character type. In later chapters of this manual, further tool types are introduced, which in addition mitigate the problem of necessary string conversions if different character widths are involved.

With the previous descriptions it has been mentioned that the documentation of dependency module ALib Characters is not required to be read if ALib string types are to be just used.

To adopt custom string types to become "compatible" with ALib strings, all that has to be done is to specialize type-traits struct T_CharArray and, in the case that a type represents zero-terminated strings, also struct T_ZTCharArray. While this is done with only a few lines of code, it is still advised to read the Programmer's Manual of module ALib Characters, if not from the beginning, then at least chapter 4. Character Arrays. Together with the information provided in the previous sections of this manual, the complete picture should be given and the adoption of own types should be a straightforward task.

In addition, header files

"alib/compatibility/chararray_std.hpp" and
"alib/compatibility/chararray_qt.hpp"

can serve as templates for the adoption of own string types.

Note: The approach taken here is suitable only for types that are something very close to a string type. While the concept might be "misused" to implement a sort of "ToString()" function for custom types, this is not recommended. For the latter, the suitable mechanism is provided with "appending objects to type AString", which is described in chapter 5. String Assembly.
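As a rough illustration of the adoption mechanism, the following standard-C++ sketch mimics the idea of a character-array trait. All names in it (CharArrayTraitsSketch, ViewSketch, LegacyName, CountChars) are hypothetical and not part of ALib. The real interface of T_CharArray is described in the manual of module ALib Characters and may well look different.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <type_traits>

// Hypothetical stand-in for a character-array trait (NOT ALib's T_CharArray).
// The primary template marks a type as "not a string"; specializations opt in.
template<typename T>
struct CharArrayTraitsSketch { static constexpr bool Adopted = false; };

// A 3rd-party type that "smells like a string" but is unknown to the library.
struct LegacyName
{
    char        text[16];
    std::size_t used;
};

// Adoption: a few lines that tell the library where buffer and length are found.
template<>
struct CharArrayTraitsSketch<LegacyName>
{
    static constexpr bool Adopted = true;
    static const char*  Buffer( const LegacyName& src ) { return src.text; }
    static std::size_t  Length( const LegacyName& src ) { return src.used; }
};

// A lightweight view that constructs implicitly from any adopted type.
struct ViewSketch
{
    std::string_view chars;

    template<typename T,
             typename = std::enable_if_t<CharArrayTraitsSketch<T>::Adopted>>
    ViewSketch( const T& src )
    : chars( CharArrayTraitsSketch<T>::Buffer( src ),
             CharArrayTraitsSketch<T>::Length( src ) ) {}
};

// An API that only knows the view type, yet accepts the adopted type directly.
inline std::size_t CountChars( ViewSketch s ) { return s.chars.size(); }
```

With the specialization in place, a LegacyName object can be passed to CountChars without any explicit conversion at the call site.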

4. Non-Zero-Terminated String Detection

4.1 Ambiguities With Overloaded Functions

Implicit string construction as discussed in the previous chapter allows creating method interfaces that accept "arbitrary" custom string types. It was explained that type traits T_CharArray and T_ZTCharArray might be specialized for custom types and with that string classes String and CString might be created implicitly from objects of those.

With these two types given, it is not possible to create an API interface that clearly separates between custom types that are zero-terminated and those that are not. This problem is best explained with a sample.

Imagine a namespace function called IsDirectory that should accept a constant directory path string and should return true if the argument represents an existing directory in the filesystem and false if not. The function declaration would be like this:

        bool  IsDirectory(const String& path);

Now, many actual implementations of the function (for example on the GNU/Linux operating system) would need to pass a zero-terminated string to a corresponding operating system call. To create that, the accepted string argument has to be copied to a buffer that can be terminated. This effort is redundant if a user invoked the function like this:

        auto result= IsDirectory( "/usr/bin" );

because the string literal given is already zero-terminated. To avoid this, an overloaded function definition could fetch zero-terminated strings and pass those without the copy and termination overhead:

        bool  IsDirectory(const CString& path);

But with these two functions in place, the compiler complains about an ambiguity as soon as zero-terminated string types are passed. The reason is simply that the normal string type String can be implicitly constructed from zero-terminated string types as well.

4.2 Class StringNZT

As a way out of the ambiguity described in the previous section, class StringNZT is given with the library. The "NZT" suffix stands for "non-zero-terminated". The type extends class String and all it does is to deny implicit construction by objects of types that would likewise construct type CString.

With that, the two overloaded namespace functions:

    bool  IsDirectory( const StringNZT&  path );
    bool  IsDirectory( const CString&    path );

are not ambiguous. The first function's implementation would usually copy and terminate the given non-terminated string, for example by just creating an AString object from the given non-zero-terminated string. Then it would invoke the second method passing the AString, which becomes zero-terminated on the fly when converted to CString.
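The overload pair can be imitated in plain standard C++ to see why the ambiguity disappears. The following is only a sketch under stated assumptions: ZViewSketch, NztViewSketch, IsZeroTerminatedSketch and Measure are hypothetical names, std::strlen stands in for an operating system call, and the way ALib actually implements StringNZT is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <string_view>
#include <type_traits>

// Hypothetical trait naming the source types known to be zero-terminated.
template<typename T> struct IsZeroTerminatedSketch              : std::false_type {};
template<>           struct IsZeroTerminatedSketch<const char*> : std::true_type  {};
template<>           struct IsZeroTerminatedSketch<char*>       : std::true_type  {};
template<>           struct IsZeroTerminatedSketch<std::string> : std::true_type  {};

// Stand-in for CString: constructible only from zero-terminated sources.
struct ZViewSketch
{
    const char* chars;
    ZViewSketch( const char*        src ) : chars( src         ) {}
    ZViewSketch( const std::string& src ) : chars( src.c_str() ) {}
};

// Stand-in for StringNZT: refuses every source that ZViewSketch would accept.
struct NztViewSketch
{
    std::string_view chars;

    template<typename T,
             typename = std::enable_if_t<
                    std::is_convertible_v<const T&, std::string_view>
                && !IsZeroTerminatedSketch<std::decay_t<const T&>>::value >>
    NztViewSketch( const T& src ) : chars( src ) {}
};

struct MeasureResult { std::size_t length; bool copied; };

// Overload for zero-terminated input: passes the buffer on without a copy.
inline MeasureResult Measure( const ZViewSketch& path )
{
    return { std::strlen( path.chars ), false };   // strlen stands in for an OS call
}

// Overload for everything else: copy, terminate, then delegate.
inline MeasureResult Measure( const NztViewSketch& path )
{
    std::string   terminated( path.chars );        // the copy adds the terminator
    MeasureResult result= Measure( ZViewSketch( terminated ) );
    result.copied= true;
    return result;
}
```

A string literal or a std::string selects the first overload, while a std::string_view, which carries no termination guarantee, selects the second one.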

4.3 Summary

The following bullets summarize and refine what was sampled in this chapter:

Class StringNZT can be constructed from types with corresponding specialization of TMP struct T_CharArray, but only if complementary struct T_ZTCharArray is not specialized in parallel.
This is in contrast to its base class String which constructs if either of the type traits is given.
Therefore, offering StringNZT and alternatively CString in two overloaded functions avoids ambiguities and allows explicit treatment of zero-terminated and non-zero-terminated strings.
The use of class StringNZT should be limited to this and similar use cases.
Consequently, the existence of an interface method using type StringNZT for an argument type indicates the existence of an overloaded alternative using CString.

Finally it should be mentioned that the use of zero-terminated strings is not recommended. ALib itself does that only in very specific situations. An example is class Path. The class interfaces with the operating system that expects zero-terminated strings, like it was sampled in the previous section.

5. String Assembly

Often, software needs to assemble strings, may it be for human-readable text, for data serialization or for the implementation of communication protocols. For that, a string type is needed that manages a data buffer and provides interface methods that allow the concatenation of data to existing strings. Furthermore, typical methods like searching and replacing substrings, letter case conversion, etc. have to be offered.

As already introduced, for this purpose class AString is provided with this module. Therefore, this chapter dedicated to the topic of string assembly is mostly a chapter about class AString.

5.1 Appending Custom Types

In the previous chapters of this manual it was explained how the lightweight ALib string types String, CString and Substring are constructable using values of C++ types which are equipped with "character array traits". Those traits are nothing else but meta-information about these types which is provided by corresponding specializations of templated structs T_CharArray and T_ZTCharArray. The character array type traits are introduced with module ALib Characters.

Some high level object-oriented programming languages offer a root class which provides a common interface for just any derived type and such interface may contain a method that creates a string representation from an instance. For example, the JAVA language defines class Object which provides method toString() for such purpose.

The two concepts (ALib character array traits and the Object.toString() method of Java) are fundamentally different: Character array traits are meant to be given for types whose main purpose is to represent or implement character arrays, while the toString() method may be implemented for just any type.

Class AString, which is designed to support the assembly of strings, offers a feature that corresponds much more closely to the toString() concept. Again, type traits are used, this time not for accessing (existing) character array data, but for appending a string representation of any object to an AString.

5.1.2 Type Traits Functor T_Append

Type traits "functor" T_Append<TAppendable,TChar,TAllocator> by default is empty. To allow the creation of a string representation of objects of a custom type TAppendable, a specialization of the struct has to be defined that implements method T_Append::operator(), which receives the destination string (a TAString<TChar>&) and the object to append.

Besides specifying the type that is adopted with template type TAppendable, the character type TChar of the destination AString object may be given with a specialization. If omitted, it defaults to type character.

As the name of functor T_Append suggests, the implementation of the operator usually appends a string representation of the object given with parameter src to the AString given with parameter target. Nevertheless, an implementation is free to modify the given AString in any way. For example, built-in type Format::Escape searches and replaces "escape-characters" when "appended" to an AString!

5.1.3 Method AString::Append (And Aliases)

Once type-traits functor T_Append<TAppendable,TChar,TAllocator> is specialized for a type TAppendable, objects of that type may be appended to objects of TAString<TChar,TAllocator>. This can be done using the following methods:

AString(const TAppendable&) (A constructor taking the appendable type)
Append(const TAppendable&)
_(const TAppendable&) (A method named solely "_". Provided for compatibility with JAVA and C# versions of ALib.)
operator<<(const TAppendable&)

Methods Append and '_', as well as operator '<<', each return a reference to the AString that they were invoked on. This allows concatenated calls, like in:

    AString aString;
    aString << "The result is: " << 42;
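The interplay of an append functor and the chaining methods can be sketched with standard C++ only. BufferSketch and AppendSketch below are hypothetical stand-ins for AString and T_Append. They ignore the character type and allocator parameters and are not meant to reflect the library's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the T_Append functor. The primary template is
// declared but not defined, so appending an unsupported type fails to compile.
template<typename TAppendable> struct AppendSketch;

// Hypothetical stand-in for AString.
class BufferSketch
{
    std::string data;
  public:
    template<typename T> BufferSketch& Append( const T& src )
    {
        AppendSketch<T>()( *this, src );  // dispatch to the specialization
        return *this;                     // returning *this enables chaining
    }
    template<typename T> BufferSketch& operator<<( const T& src ) { return Append( src ); }

    std::string&       Raw()       { return data; }
    const std::string& Str() const { return data; }
};

// Built-in support: character literals and int.
template<std::size_t N> struct AppendSketch<char[N]>
{
    void operator()( BufferSketch& target, const char (&src)[N] ) { target.Raw()+= src; }
};
template<> struct AppendSketch<int>
{
    void operator()( BufferSketch& target, const int& src ) { target.Raw()+= std::to_string( src ); }
};

// A custom type becomes appendable through one more specialization.
struct Point { int x; int y; };
template<> struct AppendSketch<Point>
{
    void operator()( BufferSketch& target, const Point& src )
    {
        target << "(" << src.x << ", " << src.y << ")";
    }
};
```

Because Append and operator<< return a reference to the buffer, an expression such as `buffer << "The result is: " << 42` works exactly as in the sample above.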

5.1.4 Built-In Appendable Types

The specializations of functor T_Append that come with the ALib library can be grouped into four areas:

1. Fundamental C++ Types:
Specializations for all fundamental C++ types like int, double, etc. are provided. No special header file has to be included for this. The specialization is available with the inclusion of header file alib/strings/astring.hpp.

2. Class Format And Its Inner Types:
Class Format is provided which allows formatting numbers. In addition, the class has a list of inner types that implement some specific simple format operations. These inner types are: Tab, Field, Escape, Bin, Hex and Oct.

Class Format as well as its inner types are "lightweight" and are supposed to be created locally with the invocation of the append-methods. As a quick example, the use of Format::Field is showcased:

AString centered;
centered << '*' << Format::Field( "Hello", 15, lang::Alignment::Center ) << '*';
cout << centered << endl;

The code above produces the following output:

*     Hello     *

Class Format is included implicitly with the inclusion of header file alib/strings/astring.hpp.
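What a centered field does to its content can be reproduced with a small standard-C++ helper. CenterField is a hypothetical function, not an ALib API, and the handling of an odd number of padding characters is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Pads 'text' to 'width' characters, centered. Assumption of this sketch:
// an odd surplus puts the extra pad character on the right side.
inline std::string CenterField( std::string_view text, std::size_t width, char pad= ' ' )
{
    if( text.size() >= width )
        return std::string( text );              // nothing to pad
    const std::size_t surplus= width - text.size();
    const std::size_t left   = surplus / 2;
    std::string result( left, pad );
    result.append( text );
    result.append( surplus - left, pad );
    return result;
}
```

For "Hello" and a width of 15, this yields five pad characters on each side, which matches the output shown above.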

3. Other ALib Types:
For various types found in other ALib Modules, specializations of T_Append are provided.
All elements of important enum types are appendable, as soon as

    #include "alib/enums/serialization.hpp"

is stated in the compilation unit. For more information, see section 4.3.1 Serialization/Deserialization of the Programmer's Manual of module ALib Enums.

4. 3rd-Party Types:
In source folder alib/compatibility some special header files are provided that contain specializations of T_Append for types of the C++ standard library (namespace std) as well as for types of 3rd-party libraries.

Note: While the C++ language demands to implement specializations of templated structs within the namespace that the original struct was defined in, the reference documentation "fakes" these specializations into the (otherwise non-existent!) inner namespace alib::strings::APPENDABLES. Other ALib modules do the same documentation trick, and hence all specializations of T_Append (of all four areas described above) can be found with the reference documentation of that namespace (and inner namespaces).

5.1.5 Sample Implementation

The following code snippet demonstrates how to implement the specialization of functor T_Append for internal ALib class DateTime to print out a formatted date:

#include "alib/strings/astring.hpp"
#include "alib/time/datetime.hpp"
#include "alib/lang/system/calendar.hpp"
 
namespace alib::strings {
 
    template<> struct T_Append<alib::time::DateTime, character, lang::HeapAllocator>
    {
        void operator()( AString& target, const alib::time::DateTime& appendable )
        {
            alib::CalendarDateTime calendarTime;
            calendarTime.Set( appendable, lang::Timezone::UTC );
            calendarTime.Format( A_CHAR("yyyy-MM-dd HH:mm"), target );
        }
    };
}

With this definition included, a code unit might now append DateTime objects to strings:

AString sample;
sample << "Execution Time: " <<  DateTime();
cout << sample << endl;

The output would be for example:

Execution Time: 2024-12-15 10:41

The following macros are provided to simplify the specialization of T_Append and make the code more readable:

5.2 Construction Of AStrings

Class AString hides all parent constructors and offers re-implementations that instead copy the data that is passed. Consequently, as this copying is not considered a lightweight operation, all constructors are explicit. By the same token, the assignment operator is not applicable with initializations either.
The following code will not compile:

AString as= "This will not compile";

Instead, explicit construction has to be chosen, as shown here:

AString as("This will compile");
 
// or alternatively, default-construct first and assign afterwards
AString as2;
as2= "This will compile";

As already noted in chapter 5.1 Appending Custom Types, with templated constructor AString(const TAppendable&), class AString accepts any type of object that a specialization of functor T_Append exists for. This makes construction very flexible.

Copy constructor, move constructor and move assignment are well defined, which allows AString objects to be used (as efficiently as possible) as value types in containers of the standard library, for example as in std::vector<AString>.

5.3 Buffer Management

As mentioned before, class AString provides logic to manage its own buffer. During the assembly of strings, the buffer "automatically" grows as needed. If a certain minimum size can be foreseen as a result of a string assembly, the necessary buffer size might be reserved by invoking method SetBuffer(integer) before performing the assembly operations. This avoids the automatic growth process, which may take place in several steps, each of which may involve copying the current buffer to a new memory location.

Once grown, the allocated buffer size is never reduced, unless method SetBuffer(integer) is explicitly invoked providing a smaller size than currently allocated.
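The effect of reserving a buffer up front can be observed with std::string, whose reserve() method plays a comparable role to SetBuffer(integer). The following sketch uses the standard library only and says nothing about the growth strategy of AString. CountBufferMoves is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Appends 'count' single characters and counts how often the character
// buffer moved to a new address, that is, how often the string reallocated.
inline std::size_t CountBufferMoves( std::size_t reserveFirst, std::size_t count )
{
    std::string text;
    if( reserveFirst > 0 )
        text.reserve( reserveFirst );            // comparable in spirit to SetBuffer(integer)

    std::size_t    moves  = 0;
    std::uintptr_t current= reinterpret_cast<std::uintptr_t>( text.data() );
    for( std::size_t i= 0; i < count; ++i )
    {
        text.push_back( 'x' );
        const std::uintptr_t now= reinterpret_cast<std::uintptr_t>( text.data() );
        if( now != current )
        {
            ++moves;                             // buffer was copied to a new location
            current= now;
        }
    }
    return moves;
}
```

With a sufficient reservation the count is zero. Without it, the number of moves depends on the growth policy of the standard library in use.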

Besides this internal, automatic memory allocation, the class can also work on external buffers. For this, overloaded method TAString::SetBuffer allows providing such external memory. The life-cycle of an external buffer is not bound to the life-cycle of the AString object itself. At the moment that the size of an external buffer is not sufficient to allow a requested extension of the managed string, the class replaces the external buffer by a larger, self-managed one.

For details on using external buffers, see the reference documentation of overloaded method TAString::SetBuffer. Class LocalString, which is discussed in the next section, makes use of this feature and provides the possibility to have local (stack based) allocations of strings.

5.4 Class LocalString

Template class LocalString<TChar, TCapacity>, derived from class AString, uses an internal character array of a length specified by template parameter TCapacity to store the string data. During construction, the memory address of this character array member is passed to method TAString::SetBuffer. The huge benefit of using the class lies in performance: The performance impact of heap allocations is often underestimated by software developers. Therefore, for local string operations with foreseeable maximum string buffer sizes, class LocalString should be considered as a faster alternative to class AString.

5.4.1 Exceeding the Buffer Capacity

Although the internal buffer size is fixed at compile-time and hence cannot be expanded, a user of the class must not fear 'buffer overflows'. If the internal buffer capacity is exceeded, a new buffer from the free memory (aka 'heap') will be allocated.

With debug-builds of ALib, parent class AString provides a warning mechanism that allows the easy detection of such (probably unwanted) replacements of the local buffer. There are two scenarios how this mechanism might be used during development:

If the buffer should never be replaced, the capacity of a LocalString has to be increased step-by-step (during the software development/testing cycle) whenever the warning is issued. This has to be repeated until the member-buffer is large enough and the warning is no longer raised.
If it is OK that the buffer is replaced "every once in a while", because special situations with higher capacity requirements may well occur but are still rather seldom, then the warning should be switched off for the specific instance. By switching the warning off, a developer places the information in the code that the internal buffer size might be too small on some occasions. Having this explicit information helps to understand the intentions of the software developer.

If the latter case applies, then the warning can be disabled using inherited method DbgDisableBufferReplacementWarning. This inline method is empty in release-compilations and this way optimized out by the compiler.
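The replacement of a member buffer by heap memory can be sketched as follows. LocalBufferSketch is a hypothetical, heavily simplified stand-in for LocalString: it handles narrow characters only, disables copying, and uses a growth factor that is merely an assumption. The replacement counter marks the place where a debug build could raise the warning described above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string_view>
#include <utility>

// Hypothetical stand-in for LocalString: starts in a member array and moves to
// the heap once the array is too small. Copying is disabled to keep it short.
template<std::size_t TCapacity>
class LocalBufferSketch
{
    char                    local[TCapacity];
    std::unique_ptr<char[]> heap;
    char*                   buffer      = local;
    std::size_t             capacity    = TCapacity;
    std::size_t             length      = 0;
    int                     replacements= 0;

  public:
    LocalBufferSketch()                                      = default;
    LocalBufferSketch( const LocalBufferSketch& )            = delete;
    LocalBufferSketch& operator=( const LocalBufferSketch& ) = delete;

    LocalBufferSketch& Append( std::string_view src )
    {
        if( src.empty() )
            return *this;
        if( length + src.size() > capacity )
        {
            // Grow generously; a debug build could issue a warning here.
            const std::size_t newCapacity= (length + src.size()) * 2;
            std::unique_ptr<char[]> newHeap( new char[newCapacity] );
            std::memcpy( newHeap.get(), buffer, length );
            heap    = std::move( newHeap );      // frees a previous heap buffer, if any
            buffer  = heap.get();
            capacity= newCapacity;
            ++replacements;
        }
        std::memcpy( buffer + length, src.data(), src.size() );
        length+= src.size();
        return *this;
    }

    std::string_view View()         const { return std::string_view( buffer, length ); }
    bool             UsesLocal()    const { return buffer == local; }
    int              Replacements() const { return replacements; }
};
```

As long as the content fits, no heap allocation takes place. Exceeding the capacity is safe but costs exactly the allocation that the class was chosen to avoid.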

5.4.2 Implicit construction

While class AString (as noted above) does not provide implicit construction, class LocalString re-implements the common constructors of AString and exposes them as implicit. The rationale here is that although the data is copied (which might not be a very lightweight task), still the performance impact is far less compared to constructing an AString that uses a heap-allocated buffer. The design decision behind that takes into account that a LocalString copies an argument to its local buffer without the explicit exposure of this operation.

The following method, as a sample, takes three different ALib string types as parameters:

void TakeStrings( const String& s1, const AString& s2, const String64 s3 );

The following code will not compile:

TakeStrings( A_CHAR("Str1"), A_CHAR("Str2"), A_CHAR("Str3") ); // Error, AString not implicitly constructable

Class AString has to be explicitly created, the others don't:

TakeStrings( A_CHAR("Str1"), AString(A_CHAR("Str2")), A_CHAR("Str3") ); // OK, AString explicit, String and LocalString implicit

In addition, besides having implicit construction, the default assign operator is defined as well with LocalString. This allows using objects of this type as class members that are initialized within the class declaration as shown here:

class MyClass
{
    LocalString<20> name=  A_CHAR("(none)");
};

Such members are not allowed to be initialized in the declaration if their type is AString.
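The difference between explicit and implicit construction is a plain C++ language matter and can be checked at compile time. In the following sketch, HeapTextSketch and LocalTextSketch are hypothetical stand-ins for AString and LocalString. Only the constructor qualifiers matter here, and the storage strategy is deliberately ignored.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Stand-in for AString: copying into a heap buffer is costly, so construction is explicit.
class HeapTextSketch
{
    std::string data;
  public:
    HeapTextSketch() = default;
    explicit HeapTextSketch( const char* src ) : data( src ) {}
    HeapTextSketch& operator=( const char* src ) { data= src; return *this; }
    const std::string& Str() const { return data; }
};

// Stand-in for LocalString: the same constructor, but exposed as implicit.
class LocalTextSketch
{
    std::string data;   // a real local string would use a member array instead
  public:
    LocalTextSketch( const char* src ) : data( src ) {}
    const std::string& Str() const { return data; }
};

// Direct initialization works for both types ...
static_assert(  std::is_constructible_v<HeapTextSketch , const char*> );
static_assert(  std::is_constructible_v<LocalTextSketch, const char*> );
// ... but only the implicit type supports "T x= value" and implicit argument passing.
static_assert( !std::is_convertible_v<const char*, HeapTextSketch > );
static_assert(  std::is_convertible_v<const char*, LocalTextSketch> );

struct MyClassSketch
{
    LocalTextSketch name= "(none)";     // in-class initializer using '='
};
```

The two negative and positive is_convertible checks correspond to the compile error and the accepted call in the TakeStrings sample.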

5.4.3 No Move Constructor

Class LocalString provides no move constructor and thus is very inefficient in scenarios where objects of the class could rather be moved than copied. Consequently, such situations are to be avoided. The use of LocalString should instead be deliberate, and objects of the type should not be subject to copy and move operations.

5.4.4 Aliases For Frequently Used Sizes

Within namespace alib, some convenient alias type definitions are available that define local strings of frequently used sizes:

String8, String16, String32, String64, String128, String256, String512, String1K, String2K, String4K,
NString8, NString16, NString32, NString64, NString128, NString256, NString512, NString1K, NString2K, NString4K, and
WString8, WString16, WString32, WString64, WString128, WString256, WString512, WString1K, WString2K, WString4K.

6. Other Aspects Of ALib String Types

6.1 Nulled Strings

6.1.1 Nulled Vs. Empty Strings

An important aspect of the family of string types provided by this module and library is the concept of "nullable" strings. An object of base class String is nulled when constructed:

with keyword nullptr, or
with a likewise nulled object of character array type.

Note: The default constructor is defaulted and leaves a String's members undefined!

An existing string can be set to nulled state, by assigning keyword nullptr or another nulled object of character array type.
Precisely, a string is nulled, when the internal pointer to the character array evaluates to nullptr.

The concept of nullable strings differs from the concept of having empty strings. The latter refers to string objects of zero length.

While nulled strings are always also empty (hence have a length of zero), the reverse does not hold: empty strings are not necessarily nulled. An empty string that is not nulled does not equal an empty string that is nulled.

Inline methods IsNull, IsNotNull, IsEmpty and IsNotEmpty of base class String test string objects for being nulled or empty.

The following code runs fine (with no assertion):

String nulled(nullptr);       // constructs a nulled string
String empty( A_CHAR("") );   // constructs an empty but not nulled string
 
assert(  nulled.IsNull()    );
assert(  nulled.IsEmpty()   );
assert(  empty.IsNotNull()  );
assert(  empty.IsEmpty()    );
 
assert(  nulled != empty    );

Especially the last line of this code is important to understand: a nulled string is different from an empty string.

6.1.2 Nulled AStrings

The concept of having nulled strings is equally available with derived string type AString: An object of type AString is nulled when no internal buffer is allocated and likewise no external buffer is set.

If default constructed, constructed with zero size, with keyword nullptr or any other nulled string, no buffer is created. Consequently, it makes a difference if an AString is constructed using AString() or AString("").

Note: This is a difference to standard C++ class std::string, which always allocates a buffer and thus does not support a nulled state.

The allocated buffer of a non-nulled AString can be disposed by invoking SetBuffer(0) or by invoking SetNull on the instance.

To make this more clear, note the following sample code which does not throw an assertion:

// Default constructor does not allocate a buffer, yet. The instance is "nulled".
AString aString;
assert(  aString.IsNull()                 );
assert(  aString == NULL_STRING          );
assert(  aString.IsEmpty()                );
assert(  aString != EMPTY_STRING         );
 
// Append an empty string. This allocates a buffer. Now the AString is not nulled any more.
aString << "";
assert(  aString.IsNotNull()              );
assert(  aString != NULL_STRING          );
assert(  aString.IsEmpty()                );
assert(  aString == EMPTY_STRING         );
 
// Append something.  Now the AString is not nulled and not empty.
aString << "ABC";
assert(  aString.IsNotNull()              );
assert(  aString != NULL_STRING          );
assert(  aString.IsNotEmpty()             );
assert(  aString != EMPTY_STRING         );
 
// Clear the contents
aString.Reset();
assert(  aString.IsNotNull()              );
assert(  aString != NULL_STRING          );
assert(  aString.IsEmpty()                );
assert(  aString == EMPTY_STRING         );
 
// Set nulled: disposes the allocated buffer. A seldom use case!
aString.SetNull();
assert(  aString.IsNull()                 );
assert(  aString == NULL_STRING          );
assert(  aString.IsEmpty()                );
assert(  aString != EMPTY_STRING         );

Note: Unlike lightweight type String which allows the assignment of nullptr to set the string to nulled state, class AString does not support any assignment operator, but the C++ copy assignment.
To remove an existing buffer from an AString and this way to set a non-nulled instance back to nulled state, is a rare and unusual use case. The code above is rather provided for demonstration and completeness.

6.1.3 Pros And Cons

What was said in the previous two sections might not need any further explanation and experienced programmers might skip to the next chapter. However, because of the fact that many string types of other libraries behave differently, some further notes should be given:

The fact that string objects can be nulled allows "transporting" a piece of information along with the string that can be used in APIs. For example, if a method should receive a string object according to a key-property, a nulled result may indicate that no data existed for the given key. This is in contrast to returning an empty string, which indicates that data was found, but that the result just is an empty string. If ALib string types were not nullable and in this sample empty strings should be allowed as a valid answer, a second return value had to be defined for the API function that indicates if a string existed for a given key-property. Such API design paradigm is used frequently across various ALib Modules.
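The key-lookup scenario can be sketched in standard C++ with a minimal nullable view. NullableViewSketch and Lookup are hypothetical names. The sketch only demonstrates how a single return value distinguishes "no such key" from "key exists with an empty value".

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-in for a nullable lightweight string.
struct NullableViewSketch
{
    const char* buffer;
    std::size_t length;
    bool IsNull()  const { return buffer == nullptr; }
    bool IsEmpty() const { return length == 0; }
};

// Returns a nulled view if the key is unknown, otherwise the stored value,
// which may legitimately be empty.
inline NullableViewSketch Lookup( const std::map<std::string, std::string>& store,
                                  const std::string&                        key )
{
    auto it= store.find( key );
    if( it == store.end() )
        return { nullptr, 0 };                          // "no such key"
    return { it->second.data(), it->second.size() };    // data() is never nullptr
}
```

A caller tests IsNull() to detect a missing key and IsEmpty() to detect an empty value, without any second return value.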

On the other hand, when string values are used as input data, some caution has to be taken to ensure that method invocations on a given input string are even allowed. Some methods may produce undefined behavior when invoked on nulled string objects.

To maximize code performance, explicit tests for nulled strings should be avoided if not necessary, which sometimes can be an obligation to the programmer that uses the string types. More on this topic is given in the next section.

6.2 "Non-Checking" Methods Of ALib String Classes

Several of the methods found in the different string classes of ALib are templated with a template parameter named TCheck. This template parameter defaults to the tag-type CHK, which hides the whole concept in "normal" code. Consider the following snippet:

void parse( NSubstring line )
{
    constexpr NString startToken= "<start>";
    integer idx= line.IndexOf( startToken );
    if( idx >= 0 )
    {
        line.ConsumeChars( idx + startToken.Length() );
        //...

Two string methods are used in this code sample: TString::IndexOf and Substring::ConsumeChars. Both methods support templated parameter TCheck! The following code provides the parameter in its default value, and hence for the compiler is equivalent to the previous snippet:

void parse( NSubstring line )
{
    constexpr NString startToken= "<start>";
    integer idx= line.IndexOf<CHK>( startToken );            // <-- Explicit invocation performing checks
    if( idx >= 0 )
    {
        line.ConsumeChars<CHK>( idx + startToken.Length() ); // <-- Explicit invocation performing checks
        //...

The exact impact of the value of template parameter TCheck is documented with each function that supports it. In general, with CHK, the string object that a method is invoked on is checked, for example, for not being nulled. Furthermore, the parameters given are checked, for example, for not being nulled, for being in valid ranges, and so on.

In the sampled case of method TString::IndexOf, the documentation tells us that

parameter needle must not be empty.
Parameter startIdx must be in the range of 0 and the string's length minus the needle's length.

The latter cannot be guaranteed for the sample's method argument line and therefore the check has to be performed. As a side effect, this check implicitly tests for a given nulled string, because in the case that the given string is shorter than the token "<start>", the method returns -1. This way, no user code for checking the input argument is needed in this sample code.

The implementation of method ConsumeChars by default checks if the string is long enough to cut the given number of characters from the front. In other words, it tests whether parameter regionLength is in the range of zero and the length of the string. Obviously, this check is redundant in this sample. The method is invoked only if method IndexOf had found the token "<start>" in the string!
To avoid the redundant check, for this invocation the non-checking version of method ConsumeChars may be used by providing NC for the template parameter:

    if( idx >= 0 )
    {
        line.ConsumeChars<NC>( idx + startToken.Length() ); // Non-checking invocation
        //...

The obvious goal of using non-checking method versions lies in avoiding redundant code, hence to reduce code size and improve execution performance. As a majority of string methods are inlined, the C++ compiler often is able to detect and remove redundant checks on its own. In these cases, the use of the non-checking version of a method has no effect in optimized release compilations.
However, there are many occasions where the compiler is lacking information on the state of variables that a programmer might know about and then, non-checking versions might have a huge impact when used in loops and other critical code sections. Also, in the sample above, it is very doubtful that any of today's C++ compilers "knows" what it needs to know to optimize the redundant checks out.

So, what is that "something" that we phrased as "a programmer knows" and a "compiler does not know" above? In computer science, such information is referred to as "invariants". Usually, invariants are used to prove the correctness of algorithms. Invariants are expressions on variables that always evaluate true when program execution hits a specific line of code.
In the sample above, the relevant invariant that allows us to use the non-checking version of method ConsumeChars, could be phrased as:

    The length of string "line" is at least "idx" plus the length of token "<start>".

Now, by using the non-checking version and appending "<NC>" to the method invocation, not only do we help the compiler to create shorter and faster code, we also put information about the invariant into the code. And this is a benefit that should not be under-estimated! By just looking at this single code line:

        myString.ConsumeChars<NC>( 5 );

a reader understands that string myString is at least 5 characters long. This is valuable information that a reader would otherwise find out only by inspecting the context of the code line, which sometimes may become a quite complex task. From here, one could easily conclude that after this code line, an invariant for variable myString would be

        myString may be empty but is not nulled

To conclude this chapter, it has to be mentioned that in debug-compilations of the library, the non-checking versions of the code still implement checks! Exactly these conditions that are documented to be checked in the regular method versions are checked. If the check fails, debug assertions are raised by the non-checking method versions. This approach and the concept of invariants go along very well: If an invariant is false, the algorithm is considered wrong, and the code asserts.
In release compilations, invoking non-checking method versions with a breach of a corresponding invariant leads to undefined behavior (probably a process crash).
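The checking and non-checking flavors of a method, including the debug-only assertion, can be sketched with a tag-type template parameter in standard C++. CheckedSketch, UncheckedSketch, ConsumeCharsSketch and AfterStartToken are hypothetical names. That the checked version clamps a too-large count is an assumption of this sketch and not a statement about ALib's ConsumeChars.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <type_traits>

struct CheckedSketch   {};   // hypothetical tag playing the role of CHK
struct UncheckedSketch {};   // hypothetical tag playing the role of NC

// Removes 'count' characters from the front of 'text'.
// Checked:   a too-large count is clamped (an assumption of this sketch).
// Unchecked: the caller guarantees the invariant count <= text.size(); it is
//            only asserted, and the assertion vanishes when NDEBUG is defined.
template<typename TCheck= CheckedSketch>
std::size_t ConsumeCharsSketch( std::string_view& text, std::size_t count )
{
    if constexpr( std::is_same_v<TCheck, CheckedSketch> )
    {
        if( count > text.size() )
            count= text.size();
    }
    else
    {
        assert( count <= text.size() && "invariant violated" );
    }
    text.remove_prefix( count );
    return count;
}

// Mirrors the parse() sample: after a successful search the invariant holds,
// so the unchecked version may be used.
inline std::string_view AfterStartToken( std::string_view line )
{
    constexpr std::string_view startToken= "<start>";
    const std::size_t idx= line.find( startToken );
    if( idx == std::string_view::npos )
        return {};
    ConsumeCharsSketch<UncheckedSketch>( line, idx + startToken.size() );
    return line;
}
```

Since the branch is selected with `if constexpr`, the unchecked instantiation contains no range test at all in release builds, while debug builds still verify the invariant.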

Note: In the C# and Java versions of ALib, where such template methods are not available, still some non-checking methods are provided, but less than in the C++ implementation. In these languages, some methods exist twice with the same method base name and the non-checking version named with suffix "_NC".

6.3 String Constants

6.3.1 NULL_STRING

With the inclusion of the header file alib/strings/string.hpp, the following constexpr variables are defined in namespace alib:

Each simply represents a nulled, respectively an empty string. The rationale for the provision of the nulled versions is purely to increase the readability of the source code. The following lines of code are equivalent in all respects:

    String  myString;
    String  myString= nullptr;
    String  myString= NULL_STRING;

With variable EMPTY_STRING and its siblings things are a little more complicated: Here the right C++ string literal has to be chosen. This is achieved with the template type TT_StringConstants and its specializations for character types nchar, wchar, and xchar. If a user of this library writes entities that are templated on the character type, then the use of this helper-struct is advised.

6.3.2 CString Constants

With the inclusion of the header file alib/strings/cstring.hpp, templated helper-struct TT_CStringConstants is defined, which provides static constexpr methods for a few frequently used string constants.

While the methods can be explicitly accessed by providing the templated character type, in addition, for each of the six character types a corresponding variable is given in namespace alib. For example, for member method TT_CStringConstants<TChar>::DefaultWhitespaces, corresponding variables

are defined.

Same as with helper-struct TT_StringConstants introduced in the previous section, if a user of the library writes entities that are templated on the character type, the use of helper-struct TT_CStringConstants is advised.
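Why such helper-structs are needed in code that is templated on the character type can be shown with a standard-C++ sketch. WhitespaceSketch and TrimStartSketch are hypothetical names. The chosen set of whitespace characters is an assumption and not necessarily the set that DefaultWhitespaces provides.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper in the spirit of TT_CStringConstants: one specialization
// per character type selects the literal with the matching prefix.
template<typename TChar> struct WhitespaceSketch;
template<> struct WhitespaceSketch<char>
{ static constexpr std::basic_string_view<char>     Get() { return  " \n\r\t"; } };
template<> struct WhitespaceSketch<wchar_t>
{ static constexpr std::basic_string_view<wchar_t>  Get() { return L" \n\r\t"; } };
template<> struct WhitespaceSketch<char16_t>
{ static constexpr std::basic_string_view<char16_t> Get() { return u" \n\r\t"; } };

// A function templated on the character type uses the helper instead of a literal,
// because a literal would fix the character width.
template<typename TChar>
std::basic_string_view<TChar> TrimStartSketch( std::basic_string_view<TChar> text )
{
    const std::size_t first= text.find_first_not_of( WhitespaceSketch<TChar>::Get() );
    if( first == std::basic_string_view<TChar>::npos )
        return {};                               // only whitespace, or empty input
    text.remove_prefix( first );
    return text;
}
```

The same function body now compiles for narrow, wide and 16-bit characters, with the right literal chosen by the specialization.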

6.4 Debugging Strings

In some situations additional debug checking is helpful when working with ALib strings. Among such situations are:

Development of the library module itself.
Development of types derived from ALib string types.
Specializing template struct T_CharArray or functor T_Append to add support for user-defined string types or append operations to class AString.
External manipulation of AString buffer retrieved with method AString::VBuffer.
Provision of external data buffers to class AString.

In these and similar situations, it may be helpful to define preprocessor symbol ALIB_DEBUG_STRINGS. This symbol enables internal consistency checks with almost any method invoked on string types. By default this feature is disabled, as it consumes quite a lot of run-time performance. When string debugging is enabled, macro ALIB_STRING_DBG_CHK can be used to check the consistency of ALib string classes.

With string debugging, the string buffer allocated by class AString is extended by 32 characters: 16 characters at the front and 16 characters at the end. A "magic" number is written into this padding memory, so that accidental (illegal) write operations across the borders of the allocated space are detected.

Therefore, code that:

Uses method AString::SetBuffer to set an external buffer, and
transfers responsibility to ALib by setting parameter responsibility of that method,

has to allocate the buffer passed accordingly. This means, the buffer has to be 32 characters larger than specified and the starting address of the heap allocation has to be 16 characters before what parameter extBuffer points to.
Such external buffer allocation should therefore be conditionally implemented using code selection symbol ALIB_DEBUG_STRINGS.
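The padding scheme described above can be illustrated with a small standalone sketch. It does not use or reproduce ALib code: the names kPad, kMagic, AllocGuarded, GuardsIntact and FreeGuarded are hypothetical, and the concrete magic value is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical constants mirroring the 16 + 16 character layout.
constexpr std::size_t kPad   = 16;      // guard characters on each side
constexpr char        kMagic = '\x2A';  // assumed guard value

// Allocates 'capacity' usable characters plus both guard regions and
// returns the address of the usable part, kPad characters past the
// start of the heap allocation.
inline char* AllocGuarded(std::size_t capacity) {
    char* raw = static_cast<char*>(std::malloc(capacity + 2 * kPad));
    if (raw == nullptr) return nullptr;
    std::memset(raw, kMagic, kPad);                    // front guard
    std::memset(raw + kPad + capacity, kMagic, kPad);  // back guard
    return raw + kPad;
}

// Returns true if no guard character on either side was overwritten.
inline bool GuardsIntact(const char* buffer, std::size_t capacity) {
    const char* front = buffer - kPad;
    const char* back  = buffer + capacity;
    for (std::size_t i = 0; i < kPad; ++i)
        if (front[i] != kMagic || back[i] != kMagic) return false;
    return true;
}

// Frees the allocation, which starts kPad characters before 'buffer'.
inline void FreeGuarded(char* buffer) { std::free(buffer - kPad); }

inline void exampleUsage() {
    char* buf = AllocGuarded(8);
    assert(buf != nullptr && GuardsIntact(buf, 8));
    buf[8] = 'X';                    // one past the usable end: hits the back guard
    assert(!GuardsIntact(buf, 8));   // the overrun is detected
    FreeGuarded(buf);
}
```

In real code, the padding would only be compiled in when string debugging is enabled, for example by letting the pad size be 0 otherwise.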

Further details of the built-in debug mechanisms are not documented. Please refer to the source code of the ALib string classes, especially to the code locations that use selection symbol ALIB_DEBUG_STRINGS.

6.5 Signed String Length

The string types introduced with this module are using type integer to store the string's length. This is a signed type - in contrast to what the C++ standard library suggests by using type size_t for the length of type std::string!

There are very good reasons to consider this a wrong design decision. Negative string lengths are impossible, and thus this is an artificial, unnecessary restriction: ALib strings cannot be longer than half of the virtually addressable memory (on standard hardware).

Honestly, the main argument for accepting this restriction is to avoid a lot of clutter code when it comes to the subtraction of string length values. ALib compiles with almost all reasonable compiler warnings enabled. If the length were unsigned, many static casts for converting between signed and unsigned integral values would be needed to avoid warnings. This would not only be true in the library code itself, but in all code that uses the strings and that applies a similarly restrictive warning policy during compilation.
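The subtraction problem can be shown with plain standard types. In this sketch, std::ptrdiff_t merely stands in for a signed length type and the function names are hypothetical:

```cpp
#include <cassert>
#include <cstddef>

// With signed lengths, a difference is simply a (possibly negative) number.
inline std::ptrdiff_t LengthDiffSigned(std::ptrdiff_t a, std::ptrdiff_t b) {
    return a - b;
}

// With unsigned lengths, 'a - b' wraps around when b > a, so casts
// (or an explicit comparison) are needed before subtracting.
inline std::ptrdiff_t LengthDiffUnsigned(std::size_t a, std::size_t b) {
    return static_cast<std::ptrdiff_t>(a) - static_cast<std::ptrdiff_t>(b);
}

inline void exampleUsage() {
    std::size_t shortLen = 3;
    std::size_t longLen  = 5;
    assert(shortLen - longLen > 0u);                      // wrapped: a huge positive value
    assert(LengthDiffSigned(3, 5) == -2);                 // signed: as expected
    assert(LengthDiffUnsigned(shortLen, longLen) == -2);  // correct only thanks to the casts
}
```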

However, besides this confession of a certain level of laziness, there is also a true benefit in this decision: Types derived from class String may use the otherwise unused sign bit to encode a binary piece of information. As a sample, ALib class AString already leverages this option: Whether the currently used buffer is of external or internal allocation is determined by storing a positive or negative value in the likewise signed field capacity. This way, no additional boolean value is needed, which of course reduces the memory footprint of the class.
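A minimal sketch of this sign-bit technique follows. Class BufferInfo and its members are hypothetical, and which sign denotes which kind of allocation is an assumption made for this illustration only:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical type that stores a capacity and an ownership flag in one
// signed field. Assumption: a negative value marks an external buffer.
class BufferInfo {
    std::ptrdiff_t capacity_;  // magnitude: capacity, sign: ownership flag

public:
    BufferInfo(std::ptrdiff_t capacity, bool external)
        : capacity_(external ? -capacity : capacity) {}

    // The capacity is the absolute value of the stored field.
    std::ptrdiff_t Capacity() const {
        return capacity_ < 0 ? -capacity_ : capacity_;
    }

    // The flag is recovered from the sign alone.
    bool IsExternal() const { return capacity_ < 0; }
};

inline void exampleUsage() {
    BufferInfo internal(64, false);
    BufferInfo external(64, true);
    assert(internal.Capacity() == 64 && !internal.IsExternal());
    assert(external.Capacity() == 64 &&  external.IsExternal());
    // No separate boolean member is needed.
    static_assert(sizeof(BufferInfo) == sizeof(std::ptrdiff_t), "flag costs no extra space");
}
```

Note that zero has no sign, so in this simple sketch a capacity of 0 cannot carry the flag.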

7. Strings And Character Widths

7.1 String Literals

As elaborated in the introductory chapter 1.2 Module ALib Characters, in respect to character type definitions and character array traits, this module completely relies on module ALib Characters. While all string classes are templated, the character types that are used by the template instantiations are all defined in this underlying module.

The alias types for each string class (defined in namespace alib) enumerate all possible types by adding a prefix character or word, for example NString, WString or ComplementString. The aliases without any prefix, like String, Substring or AString, use the width of the generic and "agnostic" type character.

Now, when using string literals, the following code is not platform-agnostic:

    String myString= "Hello World";

While it might compile on some platforms or with the right compiler symbols for ALib in place, a compilation error is generated in the case that type character is a wide type. Therefore, all non-narrow string literals need to be given by using a corresponding macro. The set of macros is also provided with the underlying module ALib Characters. The "agnostic" macro needed in the sample above is simply A_CHAR:

    String myString= A_CHAR( "Hello World" );

As long as only strings of standard width are used, all that is needed to know is that each and every C++ string literal needs to be enclosed in this macro.
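How such a literal macro can work in principle is sketched below with standard preprocessor token pasting. The names MY_WIDE_CHARS, my_char and MY_CHAR are hypothetical and only mimic the idea; the actual definition of A_CHAR is found in module ALib Characters.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical width switch; in a real build this would come from the
// compiler command line or a configuration header.
#define MY_WIDE_CHARS 1

#if MY_WIDE_CHARS
    using my_char = wchar_t;
    #define MY_CHAR(literal) L##literal   // pastes the prefix: "x" becomes L"x"
#else
    using my_char = char;
    #define MY_CHAR(literal) literal      // narrow literals stay untouched
#endif

// Counts the characters up to the terminating zero, for either width.
inline std::size_t Length(const my_char* text) {
    std::size_t n = 0;
    while (text[n] != 0) ++n;
    return n;
}

inline void exampleUsage() {
    // Compiles for both settings of MY_WIDE_CHARS.
    const my_char* greeting = MY_CHAR("Hello World");
    assert(Length(greeting) == 11);
}
```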

Further macros that define string literals of specific width are given with

See also

For more information on the macros, please consult chapter 3.3 Character And String Literals of the Programmer's Manual of module ALib Characters.
For information about how to change the default character width, see complete chapter 3. Type Selection And Character Literals.
To really grasp all aspects of ALib characters and strings, of course both Programmer's Manuals should be reviewed. This might be a good point in time to do this.

7.2 Platform-Independent Conversion

Sometimes a code unit expects a string of a defined width and has to handle strings of logical types, or vice versa. This is the case, for example, if an interface method accepts the standard string type, while internally narrow strings are used.

In such situations, the straightforward approach could be to use code selector symbol ALIB_CHARACTERS_WIDE and provide two different code versions.

To avoid this, the following macros are provided:

In principle, the macros define a new identifier. In the case that a conversion is needed, this identifier uses a local string to which the source string is appended, while in the case that the character widths are equal, a simple reference to the given type is created. The latter will be optimized out by a C++ compiler and thus, no performance penalty occurs.
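The principle can be sketched with standard library strings. Everything in this example is hypothetical (WidthAdapter, MY_TO_NARROW) and merely illustrates the "reference if equal, local copy if different" idea; the character conversion used is naive and only valid for ASCII content.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Different character types: hold a locally converted copy.
template<typename TDst, typename TSrc>
struct WidthAdapter {
    std::basic_string<TDst> converted;
    explicit WidthAdapter(const std::basic_string<TSrc>& src) {
        for (TSrc c : src)
            converted.push_back(static_cast<TDst>(c));  // naive, ASCII only
    }
    const std::basic_string<TDst>& Get() const { return converted; }
};

// Equal character types: hold just a reference, no copy is made.
template<typename TChar>
struct WidthAdapter<TChar, TChar> {
    const std::basic_string<TChar>& ref;
    explicit WidthAdapter(const std::basic_string<TChar>& src) : ref(src) {}
    const std::basic_string<TChar>& Get() const { return ref; }
};

// Defines identifier 'dest' as a narrow string view of 'src', which has
// to be a std::string or std::wstring variable.
#define MY_TO_NARROW(src, dest)                                              \
    WidthAdapter<char, typename std::decay_t<decltype(src)>::value_type>     \
        dest##_adapter(src);                                                 \
    const std::string& dest = dest##_adapter.Get()

inline void exampleUsage() {
    std::wstring wide = L"abc";
    MY_TO_NARROW(wide, narrow);   // conversion needed: local copy
    assert(narrow == "abc");

    std::string same = "xyz";
    MY_TO_NARROW(same, alias);    // same width: plain reference
    assert(&alias == &same);
}
```

The calling code is identical in both cases, which is what removes the need for two code versions.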

For details, consult the reference documentation of the macros.

8. String Utility Classes

This user manual concentrates on the general and fundamental aspects of the string types provided by this module.

There is a whole list of utility types available with this module that are not covered by this manual. Instead, for those types an adequate and complete introduction and description is provided with the reference documentation of each. The types implement, for example, token parsing as well as wildcard and regular-expression matching. To separate the fundamental string types from the utility classes, a dedicated inner namespace "util" is defined where these classes are grouped.

To explore the functionality and tools offered in the area of string handling, please consult the class list provided in the reference documentation of inner namespace alib::strings::util.

9. String Formatting

Almost any standard library of a modern programming language provides functionality that allows formatting a list of variadic arguments along the lines of a format string that follows a certain "placeholder syntax". The most prominent sample is the good old printf function of the standard C library.

ALib offers mechanics to define and process variadic argument lists in a type-safe fashion with its module ALib Boxing. Now, to keep module ALib Strings independent of module ALib Boxing, formatting features as described above have been placed in a separate module, namely ALib BaseCamp. With that, a powerful implementation of formatting tools is provided. These even support different standards of a format string's placeholder syntax, namely printf and Java style as well as Python style.
