ALib C++ Library
Library Version: 2312 R0
Documentation generated by doxygen
ALib Module Text - Programmer's Manual

1. Introduction

This ALib Module provides string formatting facilities by implementing an approach that is common to many programming languages and libraries. This approach offers an interface that includes the use of a "format string" containing placeholders. Besides this format string, a list of data values can be given, used to fill the placeholders.

Probably one of the best-known examples of such an interface is the printf function of the C language. A variation of this interface is found in almost every high-level, general-purpose programming language.

Of course, this module leverages module ALib Strings for all general string functions needed. Equally important is the use of module ALib Boxing, which brings type-safe variadic argument lists. Its feature of having "virtual functions" on boxed arguments allows custom formatting syntax for placeholders of custom argument types.

While it is possible to implement a formatter providing a custom placeholder syntax, two very prominent ones are built in with the following formatters:

  • FormatterJavaStyle
    Implements the syntax provided with the formatter included with the core class libraries of the Java programming language. This syntax is an extension of the good old printf format string style.
  • FormatterPythonStyle
    Implements the syntax provided with the formatter included with the core class libraries of the Python programming language. This syntax is very powerful and flexible with respect to the provision of syntax extensions for custom types.
    Over time, this formatting syntax became the preferred syntax within ALib itself, and we have even extended it in some respects compared to the original definition.

Another piece of good news is that, in its very basics, Python Style is similar to .NET formatting. This way, some "familiar basic syntax" is available for everybody who has used formatting in one of the languages C, C++, C#, Java or Python, or in languages that mimic one of these styles.

2. Using The Formatters

By leveraging module ALib Boxing, which implies the use of variadic template arguments, the invocation of the final format method is as simple as possible. The following sample shows a simple format action with each of the two built-in formatters:

AString target;
FormatterJavaStyle() .Format( target, "The result is %s!\n", 6 * 7 );
FormatterPythonStyle().Format( target, "The result is {}!\n", 6 * 7 );
cout << target;

This produces the following result:

The result is 42!
The result is 42!

Values of or pointers to any type that is "boxable" may be passed as an argument to method Formatter::Format. The specific implementation of the formatter will match the "placeholder type" with the given argument type and format the argument according to the placeholder attributes.
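The principle of filling placeholders from a variadic argument list can be illustrated without ALib. The following is a minimal sketch only: function formatBraces and helper toText are hypothetical names, the arguments are converted with operator<< instead of being boxed, and only the plain placeholder "{}" is supported.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of ALib): converts one value to text.
template <typename T>
std::string toText(const T& value)
{
    std::ostringstream oss;
    oss << value;
    return oss.str();
}

// Hypothetical helper (not part of ALib): replaces each "{}" in 'fmt' with
// the next argument. Surplus placeholders are copied to the result unchanged.
template <typename... TArgs>
std::string formatBraces(const std::string& fmt, const TArgs&... args)
{
    // Stringify all arguments up front, in the order given.
    std::vector<std::string> values{ toText(args)... };

    std::string result;
    std::size_t next = 0;   // index of the next unused argument
    for (std::size_t i = 0; i < fmt.size(); ++i)
    {
        const bool isPlaceholder =    fmt[i] == '{'
                                   && i + 1 < fmt.size()
                                   && fmt[i + 1] == '}';
        if (isPlaceholder && next < values.size())
        {
            result += values[next++];
            ++i;            // skip the closing brace
        }
        else
            result += fmt[i];
    }
    return result;
}
```

With this sketch, formatBraces( "The result is {}!\n", 6 * 7 ) yields the same line as the sample above. What it cannot do is match a placeholder type against the argument type, which is what boxing adds.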

2.1 Concatenated Formatters

In the sample above, two different formatters are created and each is used "properly", namely with its own syntax.
To increase flexibility, the formatters of this ALib Module provide two features:

  • Formatters can be concatenated.
  • Formatters detect format strings and, on failure, pass processing to the concatenated formatter.

With that information, the following code can be written:

AString target;
// create two formatters and concatenate them
FormatterJavaStyle formatter;
formatter.Next.reset( new FormatterPythonStyle() );
// both format string syntax versions may now be used with the first formatter.
formatter.Format( target, "%s style\n", "Java" );
formatter.Format( target, "{} style\n", "Python" );
cout << target;
Note
While the first formatter is a simple local object (stack allocated), the second formatter is created on the heap (keyword new) and then stored in field Next of the first formatter. This field is of type SPFormatter, which is an alias for std::shared_ptr<Formatter>, hence a C++ standard "smart pointer" that deletes its contained object automatically with the deletion of the last referrer. Later chapters explain why managing ALib formatter instances in shared pointers is the preferred method.

The short sample code correctly produces the following output:

Java style
Python style

However, the placeholder syntax must not be mixed within one format string. Let's try:

formatter.Format( target, "---%s---{}---", "Java", "Python" );

The output is:

---Java---{}---Python

This is obviously not what we wanted, but then it also did not produce an exception and it even included the second argument, "Python" in the output. While exceptions are discussed in a later chapter only, the reason that no exception is thrown here is simply explained: The first formatter in the chain, which we defined as type FormatterJavaStyle, identified the format string by reading "%s". It then "consumes" this string along with as many subsequent arguments as placeholders are found in the format string. This number is just one, as the placeholder "{}" is not recognized by this formatter.

The intermediate result consequently is "---Java---{}---", while the argument "Python" remains unprocessed. The next section explains what happens with this remaining argument.

2.2 Concatenating Format Operations

We just continue with the sample of the previous section: The unprocessed argument "Python" is not dropped, as it would have been with most implementations of similar format functions in other libraries and programming languages. Instead, with ALib Text, the formatting process starts all over again using the remaining argument as the format string.

Now, as it is not a format string (it does not contain placeholders in any known syntax) it is just appended to the target string "as is".

Note
For users who are familiar with modules ALib Boxing and ALib Strings: The words "appending as is", here means, that the remaining argument is appended to the target string in a type-specific way. Because all arguments are of the same type, namely Box, this in turn means that box-function FAppend is invoked on the box, which just performs the type-dependent string conversion.

In fact, for this last operation, none of the two formatters became active. The trick here is that the abstract base class, Formatter already implements method Format. This implementation loops over all arguments. It checks if the current first argument is a string recognized as a format string by any of the chained formatters. If it is not, this argument is just appended to the target string and the loop continues with the next argument in the list.
If a format string is identified, control is passed to the corresponding formatter that consumes as many further arguments as placeholders are found in that format string, and then passes control back to the main loop.
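The loop described above can be mimicked in a few lines of standard C++. This is a rough sketch under simplifying assumptions, not the ALib implementation: types Arg and MiniFormatter and function formatAll are hypothetical, an argument is either text or an integer, and a formatter "recognizes" a string merely by finding its single placeholder token in it.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a boxed argument: either text or an integer.
struct Arg
{
    bool        isText;
    std::string text;   // textual form of the value (also set for integers)
    Arg(const char* s) : isText(true),  text(s) {}
    Arg(int v)         : isText(false), text(std::to_string(v)) {}
};

// Hypothetical formatter, identified only by its placeholder token.
struct MiniFormatter
{
    std::string placeholder;   // "%s" (Java-like) or "{}" (Python-like)

    bool recognizes(const std::string& s) const
    { return s.find(placeholder) != std::string::npos; }

    // Expands 'fmt' into 'target', consuming arguments starting at args[next].
    // Returns the index of the first argument that was not consumed.
    std::size_t expand(const std::string& fmt, const std::vector<Arg>& args,
                       std::size_t next, std::string& target) const
    {
        std::size_t pos = 0;
        for (;;)
        {
            const std::size_t hit = fmt.find(placeholder, pos);
            if (hit == std::string::npos || next >= args.size())
                break;
            target.append(fmt, pos, hit - pos);
            target += args[next++].text;
            pos = hit + placeholder.size();
        }
        target.append(fmt, pos, std::string::npos);
        return next;
    }
};

// The main loop: each argument is either a format string recognized by one of
// the chained formatters, or a plain value that is appended "as is".
std::string formatAll(const std::vector<MiniFormatter>& chain,
                      const std::vector<Arg>& args)
{
    std::string target;
    std::size_t i = 0;
    while (i < args.size())
    {
        const MiniFormatter* match = nullptr;
        if (args[i].isText)
            for (const MiniFormatter& f : chain)
                if (f.recognizes(args[i].text)) { match = &f; break; }

        if (match)
            i = match->expand(args[i].text, args, i + 1, target);
        else
            target += args[i++].text;
    }
    return target;
}
```

With a chain of a "%s" formatter followed by a "{}" formatter, the arguments "---%s---{}---", "Java", "Python" produce "---Java---{}---Python" in this sketch as well: the first formatter consumes one argument, and the leftover argument is appended by the loop itself.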

Consequently, this approach allows invoking Format without even a format string:

formatter.Format( target, 1,2,3 );
123

which probably does not make any sense, because the same result could have been achieved much more efficiently by stating:

target << 1 << 2 << 3;

Even the following sample still might not make too much sense to a reader:

formatter.Format( target, "--- A: {} ---", 1, "--- B: {} ---", 2 );

because the usual way to come to the same result would be to have only one format string with two arguments, instead of two format strings with one argument each:

formatter.Format( target, "--- A: {} ------ B: {} ---", 1, 2 );

So, why is this loop implemented with its auto-detection and the option of having more than one format string? Some sound rationale for the loop is given in the next section.

2.3 Decoupled Format Argument Collection

Method Format collects the variadic template arguments to an internally allocated container of type Boxes. This container is then passed to the internal format loop.

Alternatively, the collection of format arguments in a container object may be performed "manually" by the user code. In this case, the formatter is invoked with one of the two overloaded methods Formatter::FormatArgs. Both methods must be invoked only after an explicit call to Formatter::Acquire. One of the two methods does not accept an external container and instead operates on the internally allocated instance which method Acquire returns. The second method allows passing an arbitrary external container instance, for example one of derived type Message.

With this knowledge it becomes obvious that the collection of formatting arguments can be "decoupled" from the invocation of the formatter. Note that the argument list may include zero, one or even multiple format strings, each of which is followed by its corresponding placeholder values:

AString target;
Boxes& results= formatter.Acquire(ALIB_CALLER_PRUNED);
results.Add( "The results are\n" );
// calculating speed
//...
//...
results.Add( " Speed: {} m/s\n", 42 );
// calculating mass
//...
//...
results.Add( " Mass: {} kg\n", 75.0 );
// calculating energy
//...
//...
results.Add( " Energy: {} Joule\n", 66150 );
try
{
    formatter.FormatArgs( target, results );
}
catch( Exception& e )
{
    e.Format( target );
}
formatter.Release();
cout << target << endl;
The results are
Speed: 42 m/s
Mass: 75.0 kg
Energy: 66150 Joule

A reader might think for herself if and when this might become useful. It should be noted that the interface of logging module ALox builds on the same mechanism. The arguments there are called "logables" and might be format strings or anything else. Therefore, also with ALox the collection of log entry data can be decoupled from the final creation of the log entry. This is especially useful for complex log-entries whose arguments are collected during the execution of an algorithm and for example are only logged in case of an exception or other unexpected conditions.

2.4 Default Formatters

In the previous samples, a local instance of a formatter (or two) has been created. For general purpose use, this module provides a global pair of (concatenated) formatters which are retrievable with static methods Formatter::GetDefault and Formatter::AcquireDefault.

The formatter returned is embedded in "smart pointer type" SPFormatter. During bootstrapping of the library, a formatter of type FormatterPythonStyle is created with a concatenated object of FormatterJavaStyle.

One obvious rationale for the provision of these default formatters is of course to save memory and processing resources by reusing the formatter instances in different parts of an application. However, in most cases it is probably more important that this way, the same default configuration is used with formatting operations. For example, if the decimal point character of floating point numbers should default to something other than the US/English standard '.', then such a setting could be made once with the bootstrap of the library, for all usages across a process.

2.5 Cloning Formatters

Formatter implementations may or may not provide default settings that for example influence a format operation that uses minimal placeholders that omit optional formatting flags. The built-in formatters do have such default settings.

If a software unit wishes to change some settings, the advised approach is as follows:

  • Retrieve the default formatter(s)
  • Create a clone of the default formatter(s) by invoking Formatter::Clone.
  • Change the default settings of the cloned formatter.
  • Use the cloned formatter.

With this procedure, any changes that an application applied to the default formatters (e.g. during bootstrap), will remain valid in the cloned formatters in addition to the "local changes", while the default formatters remain untouched.
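The effect of this procedure can be sketched with a strongly simplified stand-in. Type SketchFormatter, its field decimalPoint and function defaultFormatter are hypothetical and only illustrate the idea of "clone, then modify the clone". They do not reflect the interface of class Formatter.

```cpp
#include <memory>

// Hypothetical, simplified formatter holding two default settings.
struct SketchFormatter
{
    char decimalPoint       = '.';
    int  fractionalDigits   = 2;

    // Returns an independent copy that carries over all current settings.
    std::shared_ptr<SketchFormatter> clone() const
    { return std::make_shared<SketchFormatter>(*this); }
};

// Hypothetical stand-in for a process-wide default formatter instance.
std::shared_ptr<SketchFormatter> defaultFormatter()
{
    static std::shared_ptr<SketchFormatter> instance
        = std::make_shared<SketchFormatter>();
    return instance;
}

// A software unit that needs different settings modifies a clone only.
std::shared_ptr<SketchFormatter> makeLocalFormatter()
{
    std::shared_ptr<SketchFormatter> local = defaultFormatter()->clone();
    local->fractionalDigits = 5;   // the "local change"
    return local;
}
```

If the application sets decimalPoint to ',' on the default instance during bootstrap, a formatter created with makeLocalFormatter afterwards inherits the ',' and adds its own change, while the default instance keeps fractionalDigits at 2.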

For example, the built-in formatters provide fields DefaultNumberFormat and AlternativeNumberFormat to reflect some default behavior of their formatting syntax.
The attributes of these members might be modified to change those defaults. While this leads to a deviation from the formatting standard, it may be used instead of providing corresponding syntactic information within the placeholder field of each and every format string. Some modifications may not even be possible with the given format specification syntax.

2.6 Exceptions

The simple samples shown so far used correct format strings. In case of erroneous format strings, the built-in formatters will throw an ALib Exception defined with enumeration aworx::lib::text::Exceptions.

While in the case of "hard-coded" format strings such exceptions need not be caught, their evaluation (with debug builds) might be very helpful for identifying what is wrong. Of course, when format strings are not hard-coded but instead can be provided by the users of a software (for example in configuration files or command line parameters), a try/catch block around formatting invocations is mandatory, also in release compilations.

The following sample shows how an exception can be caught and its description may be written to the standard output:

#if ALIB_DEBUG
try
{
#endif
    AString target;
    aworx::Formatter::GetDefault()->Format(target, "Unknown syntax: {X}", "Test");
    cout << target;
#if ALIB_DEBUG
}
catch(Exception& e)
{
    cout << e.Format();
}
#endif

The output of running this code is:

E1: <format::MissingClosingBracket>
Closing bracket '}' of placeholder not found (or syntax error).
In: "Unknown syntax: {X}"
^

In most cases, a detailed text message, including a copy of the format string and a "caret" symbol '^' that hints to the parsing error in the string is given with the exception's description.
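How such a message with a caret hint might be assembled, and how user-provided format strings can be guarded, is sketched below with standard C++ exceptions. Functions checkBraces and describeProblem are hypothetical, and the check is much simpler than the parsers of the built-in formatters: it only detects a '{' that is never closed.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical checker (not ALib's parser): throws if a '{' has no matching
// '}'. The message repeats the format string and places a caret below the
// offending column.
void checkBraces(const std::string& fmt)
{
    for (std::size_t i = 0; i < fmt.size(); ++i)
    {
        if (fmt[i] != '{')
            continue;
        const std::size_t close = fmt.find('}', i + 1);
        if (close == std::string::npos)
            throw std::runtime_error(
                "Closing bracket '}' of placeholder not found.\n"
                "In: \"" + fmt + "\"\n" +
                std::string(5 + i, ' ') + "^");   // 5 == length of: In: "
        i = close;
    }
}

// Guards a format string that comes from an untrusted source: returns an
// empty string if it is well-formed, otherwise the error description.
std::string describeProblem(const std::string& fmt)
{
    try
    {
        checkBraces(fmt);
        return "";
    }
    catch (const std::exception& e)
    {
        return e.what();
    }
}
```

The try/catch block lives in describeProblem, so that callers which process configuration files or command line parameters receive a printable description instead of an uncaught exception.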

2.7 Escape Sequences In Format Strings

Escape sequences such as "\t", "\n" or "\\" may be given with either one or two backslashes. If the backslash itself is escaped, the formatters convert the sequence to the corresponding ASCII code.

Class FormatterPythonStyle recognizes double curly braces "{{" and "}}" and converts them to a single brace. Similar to this, class FormatterJavaStyle recognizes "%%" and converts it to a single percentage symbol.
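The conversion of doubled characters can be pictured as follows. This is a sketch only: the three functions are hypothetical and operate on a string that contains no placeholders, whereas the built-in formatters perform this conversion while parsing the format string.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: collapses each pair "cc" in 'text' into a single 'c'.
std::string collapseDoubled(const std::string& text, char c)
{
    std::string out;
    for (std::size_t i = 0; i < text.size(); ++i)
    {
        out += text[i];
        if (text[i] == c && i + 1 < text.size() && text[i + 1] == c)
            ++i;   // skip the second character of the pair
    }
    return out;
}

// "{{" becomes "{" and "}}" becomes "}".
std::string unescapePythonStyle(const std::string& text)
{ return collapseDoubled(collapseDoubled(text, '{'), '}'); }

// "%%" becomes "%".
std::string unescapeJavaStyle(const std::string& text)
{ return collapseDoubled(text, '%'); }
```

For example, unescapePythonStyle applied to "{{x}}" gives "{x}" and unescapeJavaStyle applied to "100%%" gives "100%".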

3. Formatting Custom Types

As we have seen, the use of module ALib Boxing allows the formatters of this module to accept any third-party data type as formatting arguments. Of course, the formatters are able to "convert" all C++ fundamental types to strings. But how about custom types?

The solution for custom conversion is given with the support of "box-functions", which implement a sort of "virtual function call" on boxed types.
There are two box-functions that the built-in formatters are using.

3.1 Box-Function FAppend

By default, the box-function that the built-in formatters use for converting arbitrary types to string values is FAppend. This function is one of the built-in functions of module ALib Boxing and thus is not specific to this module ALib Text.

Usually this function's implementation just unboxes the corresponding type and appends the object to the target string.
Let us look at an example. The following struct stores a temperature in Kelvin:

struct Kelvin
{
double value;
};

If an object of this class is used with a formatter without any further preparation, the default implementation of function FAppend is invoked, which writes the memory address of the given object. In debug-compilations, this implementation in addition writes the boxed type's name (platform dependent and implemented with class DbgTypeDemangler). This is shown in the following code and output snippet:

Kelvin temperature { 287.65 };
AString target;
Formatter::GetDefault()->Format(target, "The temperature is {}\n", temperature);
cout << target;
The temperature is Kelvin(Size: 8 bytes)

The first step to implement function FAppend for sample type Kelvin is to specialize functor T_Append for the type:

ALIB_STRINGS_APPENDABLE_TYPE_INLINE( Kelvin,
    NumberFormat nf;
    nf.FractionalPartWidth= 1;
    target << Format(src.value - 273.15, &nf) << " \u2103"; // Degree Celsius symbol (small circle + letter 'C')
)

With that in place, it is possible to apply an object of this type to an AString:

Kelvin temperature { 287.65 };
AString target;
target << temperature;
cout << target << endl;
14.5 ℃

Now, we can easily implement box-function FAppend, because for types that are already "appendable", this is done with just a simple macro that has to be placed in the bootstrap section of a software:

ALIB_BOXING_BOOTSTRAP_REGISTER_FAPPEND_FOR_APPENDABLE_TYPE( Kelvin )

With that in place, it is possible to append a boxed object of this type to an AString:

Kelvin temperature { 287.65 };
AString target;
Box temperatureBoxed= temperature;
target << temperatureBoxed;
cout << target << endl;
14.5 ℃

Because the formatters use the same technique with the boxed arguments they receive, our sample class can now already be used with formatters:

Kelvin temperature { 287.65 };
AString target;
Formatter::GetDefault()->Format(target, "The temperature is {}", temperature);
cout << target << endl;
The temperature is 14.5 ℃

To summarize this section, some bullet points should be given:

  • Independently from this module ALib Text and the formatters defined here, class AString provides a concept based on template meta programming that allows to append objects of arbitrary type to strings.
  • With the availability of module ALib Boxing, a box-function named FAppend is established that is invoked in the moment an instance of class Box is appended to an AString.
  • By defining a specialized version of this function for a custom type, boxed values of the custom type can be appended to an AString.
  • The formatter classes provided with this module, use this function with custom types.
  • Consequently, if a custom type has already been made compatible with both modules, ALib Strings and ALib Boxing, no special preparations have to be made to use the type with the formatter classes.
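The idea of a "virtual function call" on a boxed value can be imitated with a registry that maps a type to a function. The following sketch is an assumption-laden miniature and not how ALib Boxing is implemented: type Box here is a hypothetical non-owning wrapper, and appendFunctions, appendBoxed and registerKelvin are illustrative names.

```cpp
#include <cstdio>
#include <functional>
#include <map>
#include <string>
#include <typeindex>
#include <typeinfo>

struct Kelvin { double value; };

// Hypothetical miniature box: remembers the type and the address of a value.
// It does not own the value, which therefore must outlive the box.
struct Box
{
    std::type_index type;
    const void*     object;
    template <typename T>
    Box(const T& value) : type(typeid(T)), object(&value) {}
};

using AppendFunction = std::function<void(const Box&, std::string&)>;

// Registry of type-specific append functions.
std::map<std::type_index, AppendFunction>& appendFunctions()
{
    static std::map<std::type_index, AppendFunction> registry;
    return registry;
}

// Appends a boxed value: uses the registered function or a generic fallback.
void appendBoxed(const Box& box, std::string& target)
{
    auto it = appendFunctions().find(box.type);
    if (it != appendFunctions().end())
        it->second(box, target);
    else
        target += "<unknown type>";
}

// "Bootstrap" step: registers the append function for type Kelvin.
void registerKelvin()
{
    appendFunctions()[std::type_index(typeid(Kelvin))] =
        [](const Box& box, std::string& target)
        {
            const Kelvin& k = *static_cast<const Kelvin*>(box.object);
            char buffer[32];
            std::snprintf(buffer, sizeof buffer, "%.1f \u2103", k.value - 273.15);
            target += buffer;
        };
}
```

Before registerKelvin is called, appending a boxed Kelvin yields the fallback text. Afterwards, the value 287.65 is appended as "14.5 ℃". A formatter that only ever sees boxes can therefore support a new type without being changed itself.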

3.2 Box-Function FFormat

The previous section demonstrated how a custom type can be made "compatible" to ALib formatters found in this module ALib Text.

The approach using box-function FAppend is limited in that no attributes can be given within a format string to determine how a custom type is formatted. With the sampled temperature type "Kelvin", the output format was Celsius with one decimal digit. If we wanted to allow Fahrenheit as an alternative output, we would need to implement box-function FFormat, which was specifically created for this purpose and consequently is defined in this module.

Note
Type-specific format specification strings are allowed only with the Python-like syntax of format strings. The Java-like formatter does not provide a feature of "embedding" custom format specifications in the format string.

The function has three parameters: besides the box that it is invoked on and the target string to write to, it receives a string that provides type-specific information about how the contents are to be written. This format specification is fully type and implementation specific and has to be documented with the specific function's documentation.

We want to implement a format specification that starts with character 'C', 'F' or 'K' to specify Celsius, Fahrenheit or Kelvin, followed by an integral number that specifies the fractional digits of the output.

To do this, the following function declaration needs to go into a header file:

void FFormat_Kelvin( const Box& box, const String& formatSpec, AString& target );

Then, the implementation of the function has to be placed in a compilation unit. This might look like this:

void FFormat_Kelvin( const Box& box, const String& formatSpecGiven, AString& target )
{
    // set default format spec (in real code, this should be using a resourced default string)
    String formatSpec= formatSpecGiven.IsNotEmpty() ? formatSpecGiven
                                                    : A_CHAR("C2");
    // get value from boxed object
    double value= box.Unbox<Kelvin>().value;
    // get precision
    NumberFormat nf;
    Substring precisionString= formatSpec.Substring(1);
    if( precisionString.IsNotEmpty() )
    {
        int8_t precision;
        precisionString.ConsumeDec( precision );
        nf.FractionalPartWidth= precision;
    }
    else
        nf.FractionalPartWidth= 2;
    // convert unit (or don't)
    String unit= A_CHAR("\u212A");
    if( formatSpec.CharAtStart() == 'C' )
    {
        unit= A_CHAR("\u2103");
        value= value - 273.15;
    }
    else if( formatSpec.CharAtStart() == 'F' )
    {
        unit= A_CHAR("\u2109");
        value= value * 1.8 - 459.67;
    }
    // write value
    target << Format( value, &nf) << ' ' << unit;
}

Within the bootstrap section of the process, the function has to be registered with ALib Boxing:

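// This lock is usually NOT NEEDED!
// We do this here, because this sample code is run in the unit tests, when ALib is already
// bootstrapped.
// See note in reference documentation of function BootstrapRegister()
ALIB_LOCK_WITH( aworx::lib::monomem::GlobalAllocatorLock )
aworx::lib::boxing::BootstrapRegister<aworx::lib::text::FFormat, aworx::lib::boxing::TMappedTo<Kelvin>>( FFormat_Kelvin );

With that in place, we can use the custom format specification with our custom type: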

Kelvin temperature { 287.65 };
AString target;
Formatter::GetDefault()->Format(target, "The temperature is {:C2}\n", temperature);
Formatter::GetDefault()->Format(target, "The temperature is {:F0}\n", temperature);
Formatter::GetDefault()->Format(target, "The temperature is {:K5}\n", temperature);
cout << target;

The following output is produced.

The temperature is 14.50 ℃
The temperature is 58. ℉
The temperature is 287.65000 K
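The parsing of the format specification and the unit arithmetic can be checked independently of ALib. The following stand-alone sketch makes some assumptions: function formatKelvin is hypothetical, it uses std::snprintf instead of class NumberFormat, and it writes the plain letters C, F and K instead of the Unicode unit symbols.

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical stand-alone counterpart of the sample function: 'specGiven' is
// a unit letter ('C', 'F' or 'K'), optionally followed by the number of
// fractional digits. An empty specification defaults to "C2".
std::string formatKelvin(double kelvin, const std::string& specGiven)
{
    const std::string spec = specGiven.empty() ? "C2" : specGiven;

    // Fractional digits default to 2 if the number is omitted.
    const int precision = spec.size() > 1 ? std::atoi(spec.c_str() + 1) : 2;

    // Convert the unit (or don't).
    double value = kelvin;
    char   unit  = 'K';
    if (spec[0] == 'C')      { unit = 'C'; value = kelvin - 273.15;       }
    else if (spec[0] == 'F') { unit = 'F'; value = kelvin * 1.8 - 459.67; }

    char buffer[64];
    std::snprintf(buffer, sizeof buffer, "%.*f %c", precision, value, unit);
    return buffer;
}
```

For 287.65 Kelvin, the specifications "C2" and "K5" give "14.50 C" and "287.65000 K". With "F0", the value 58.1 is rounded to zero fractional digits. Unlike the output shown above, std::snprintf then writes "58" without a trailing decimal point.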

As a second sample we want to look at the internal implementation of formatting date and time values. ALib class CalendarDateTime provides (native) method Format to write time and date values in a human-readable and customizable way. This method also requires a format specification. Now, this helper class is used to implement FFormat for boxed arguments of type DateTime, which is given with function FFormat_DateTime. Due to the existence of the helper class, the implementation of the function is rather simple:

void FFormat_DateTime( const Box& box, const String& formatSpec, AString& target )
{
    system::CalendarDateTime tct( box.Unbox<DateTime>() );
    tct.Format( formatSpec.IsNotEmpty() ? formatSpec
                                        : lib::SYSTEM.GetResource("DFMT"),
                target );
}

4. Custom Formatters

To implement a custom formatter that uses a custom format string syntax, no detailed manual or step-by-step sample is given here. Instead just some hints as bullet points:

  • The typical use case for implementing a custom format string is to mimic an existing formatter of a different programming language or a different C++ library, to be able to reuse the format strings, which might be resourced and shared between different implementations of a software.
  • The built-in formatters both use "intermediate" class FormatterStdImpl as a base class. This class might be used for custom formatters as well, as it already implements a skeleton that has to be completed by implementing a set of specific abstract methods.
  • It is recommended to review (and copy) the sources of one of the given formatter implementations. While FormatterPythonStyle is by far the more powerful implementation, class FormatterJavaStyle might be less complicated to start with.
  • A thorough understanding of modules ALib Boxing and ALib Strings is a precondition for the implementation of a custom formatter.

5. Further Types Provided By This Module

Besides the formatter classes that have been discussed in this Programmer's Manual, module ALib Text provides some other types which make use of the formatters.

As of today, these types are

Please consult the classes' extensive reference documentation for more information about the features and use of these types.
