ALib C++ Framework
Library Version: 2605 R0
Documentation generated by doxygen
formatterpythonstyle.hpp
//==================================================================================================
/// \file
/// This header-file is part of module \alib_format of the \aliblong.
///
/// Copyright 2013-2026 A-Worx GmbH, Germany.
/// Published under #"mainpage_license".
//==================================================================================================

//==================================================================================================
/// Implements a #"format::Formatter" according to the formatting standards of the
/// \https{Python language,docs.python.org/3.5/library/string.html#format-string-syntax}.
///
/// \note
/// Inherited public fields of parent class #"%FormatterStdImpl" offer important means of
/// changing the formatting behavior of instances of this class. Therefore, do not forget
/// to consult the #"alib::format::FormatterStdImpl;parent class's documentation".
///
/// In general, the original \b Python specification is covered quite well. However, there are
/// some differences: some features are not possible (given that \b Python is a scripting
/// language), while some very helpful extensions are added.
/// Instead of repeating the complete documentation, please refer to the link above as the
/// foundation and then take note of the following list of differences, extensions and general
/// hints:
///
/// - <b>General Notes:</b><br>
///   \b Python defines a placeholder field as follows:
///
///          "{" [field_name] ["!" conversion] [":" format_spec] "}"
///
///   - This formatter is <b>less strict</b> with respect to the order of the format symbols.
///     For example, it allows <c>{:11.5,}</c>, whereas Python allows only <c>{:11,.5}</c>.
///
///   - As this class is derived from
///     #"alib::format::FormatterStdImpl;FormatterStdImpl", features of the parent are
///     available to this formatter as well. This is especially useful for setting default
///     number-formatting values: it allows modifying all number output without repeating the
///     settings in each placeholder of the format strings. Some options, for example, the
///     grouping characters used with hexadecimal numbers, cannot be changed at all with the
///     <b>Python Style</b> formatting options. The only way to change them is to modify the
///     properties of the formatter object before the format operation.
///
///   - Nested replacement fields within format specifications are (by the nature of the
///     implementation language) \b not supported.
///
/// <p>
/// - <b>Positional arguments and field name:</b>
///   - Due to the nature of the implementation language (<em>C++, no introspection</em>),
///     \b field_name can \b not be an identifier, an attribute name or an array element
///     index. It can only be a positional argument index, that is, a number that selects an
///     argument from the provided argument list.<br>
///     However, field names are often required in use cases that offer configurable format
///     strings to the "end user". Therefore, there are two alternatives to cope with this
///     limitation:
///     - In simple cases, it is possible to add all optionally needed data to the argument
///       list, document their index positions, and let the user choose the right value using
///       positional argument notation.
///     - A more elegant approach is to use class
///       #"alib::format::PropertyFormatter;PropertyFormatter",
///       which extends the format specification by custom identifiers that control the
///       placement of corresponding data in the format argument list. This class uses a
///       translation table that maps identifier strings to custom callback functions. This
///       way, much more than just simple field names is supported.
///
///   - When positional arguments are used in the placeholders of a format string, the Python
///     formatter implementation does not allow mixing <b>automatic field indexing</b> with
///     explicit indexing. This \alib implementation does allow it. The automatic index (used
///     when no positional argument is given for a placeholder) always starts at \c 0 and is
///     incremented each time automatic indexing is used. Occurrences of explicit indexing
///     have no influence on the automatic index.
///
/// <p>
/// - <b>Binary, Hexadecimal and Octal Numbers:</b>
///   - Binary, hexadecimal and octal output is <b>cut in size</b> (!) when a field width is
///     given that is smaller than the number of digits of the provided number argument.
///     \note This implies that a value written might not be equal to the value given.
///           This is not a bug but a design decision. The rationale is that, with this
///           behavior, there is no need to mask the arguments down to their lower digits before
///           passing them to the format invocation. In other words, the formatter "assumes"
///           that the given field width indicates that only a corresponding number of lower
///           digits is of interest.
///
///   - If no width is given and the argument contains a boxed pointer, then the full,
///     platform-dependent output width of pointer types is used.
///
///   - The number <b>grouping option</b> (<c>','</c>) can also be used with binary,
///     hexadecimal and octal output.
///     Different grouping separators are supported for nibbles, bytes, 16-bit and 32-bit words.
///     Changing the separator symbols is not possible with the format fields of the format
///     strings (if it were, this would become very incompatible with the Python standard).
///     Changes have to be made before the format operation by modifying the field
///     #"^FormatterPythonStyle::AlternativeNumberFormat", which is provided through parent
///     class #"%format::Formatter".
///
///   - The alternative form (\c '\#') adds prefixes as specified in members
///     - #"TNumberFormat::BinLiteralPrefix",
///     - #"TNumberFormat::HexLiteralPrefix", and
///     - #"TNumberFormat::OctLiteralPrefix".
///
///     For upper case formats, these are taken from the inherited field
///     #"^FormatterPythonStyle::DefaultNumberFormat", for lower case formats from
///     #"^FormatterPythonStyle::AlternativeNumberFormat".
///     However, in alignment with the \b Python specification, \b both default to lower case
///     literals \c "0b", \c "0o" and \c "0x". The user may change all of these defaults.
///
/// <p>
/// - <b>Floating point values:</b>
///   - If floating point values are provided without a type specification in the format
///     string, then all values of the inherited field
///     #"^FormatterPythonStyle::DefaultNumberFormat" are used to format the number.
///   - For lower case floating point format types (\c 'f' and \c 'e'), the values specified in
///     attributes #"%ExponentSeparator", #"%NANLiteral" and #"%INFLiteral" of the inherited field
///     #"^FormatterPythonStyle::AlternativeNumberFormat" are used.
///     For upper case types (\c 'F' and \c 'E'), the corresponding attributes in the
///     field #"^FormatterPythonStyle::DefaultNumberFormat" apply.
///   - Fixed point formats (types \c 'f' and \c 'F') do not support arbitrary lengths.
///     See class #"TNumberFormat;NumberFormat" for the limits.
///     Also, very large values and values close to zero may be converted to scientific format.
///     Finally, if flag #"NumberFormatFlags::ForceScientific" is set in field
///     #"TNumberFormat::Flags" of member #"DefaultNumberFormat", types
///     \c 'f' and \c 'F' behave like types \c 'e' and \c 'E'.
///   - When both a \p{width} and a \p{precision} are given, the \p{precision} determines the
///     number of fractional digits, even if the type is \b 'g' or \b 'G'. This differs from the
///     Python formatter, which, for types \b 'g' and \b 'G', interprets \p{precision} as the
///     number of significant digits.
///   - With \b Python, the "general format" types \c 'g' and \c 'G' limit the precision even
///     if no \p{precision} is specified (the default is six significant digits). This
///     implementation limits the precision by default only if the type is \c 'f' or \c 'F'.
///
/// <p>
/// - <b>String Conversion:</b><br>
///   If \e type \c 's' (or no \e type) is given in the \b format_spec of the replacement field,
///   a string representation of the given argument is used.
///   In \b Java and \b C#, such a representation is obtained by invoking
///   <c>Object.[t|T]oString()</c>. Consequently, to support string representations of custom
///   types in these languages, the corresponding <b>[t|T]oString()</b> methods of the type have
///   to be implemented.
///
///   In C++, the arguments are "boxed" into objects of type
///   #"alib::boxing::Box;Box". For the string representation, the formatter invokes
///   box-function #"FAppend". A default implementation exists which, for custom types,
///   appends the type name and the memory address of the object in hexadecimal
///   format. To support custom string representations (for custom types), this box-function
///   needs to be implemented for the type in question. Information and sample code on how to
///   do this is found in the documentation of \alib_boxing, chapter
///   #"alib_boxing_strings_fappend".
///
/// - <b>Hash-Value Output:</b><br>
///   In extension (and deviation) of the Python specification, format specification type \c 'h'
///   and its upper case version \c 'H' are implemented. The hash value of the argument object
///   is written in hexadecimal format. The options of these types are identical to those of
///   \c 'x' and \c 'X', respectively.
///
///   In the C++ language implementation of \alib, instead of hash values of objects, the
///   pointer found in method #"Box::Data" is printed. If class types are boxed using the
///   default boxing mechanics, this shows the memory address of the given instance.
///
/// - <b>Boolean output:</b><br>
///   In extension (and deviation) of the Python specification, format specification type \c 'B'
///   is implemented. The word \b "true" is written if the given value represents a boolean
///   \c true value, \b "false" otherwise.
///
///   In the C++ language implementation of \alib, the argument is evaluated to a boolean by
///   invoking box-function #"FIsTrue".
///
/// <p>
/// - <b>Custom Format Specifications:</b><br>
///   With \b Python formatting syntax, placeholders have the following syntax:
///
///          "{" [field_name] ["!" conversion] [":" format_spec] "}"
///
///   The part that follows the colon is called \b format_spec. \b Python passes this portion of
///   the placeholder to the built-in function \c format(). Each type may then interpret this
///   string in a type-specific way. However, most built-in \b Python types follow what they call
///   the
///   \https{"Format Specification Mini Language",docs.python.org/3.5/library/string.html#format-specification-mini-language}.
///
///   This implementation takes a very similar approach. The only difference is that the
///   "Format Specification Mini Language" is implemented for standard types right within this
///   class. But before processing \b format_spec, this class checks whether the argument type
///   assigned to the placeholder provides a custom implementation of box-function #"FFormat".
///   If so, this function is invoked and string \b format_spec is passed to it for custom
///   processing.
///
///   Information and sample code on how to adapt custom types to support this interface is
///   found in the Programmer's Manual of this module, in chapter
///   #"alib_format_custom_types_fformat".
///
///   For example, \alib class #"time::DateTime" supports custom formatting with box-function
///   #"FFormat_DateTime", which uses helper-class
///   #"util::CalendarDateTime" that provides a commonly used mini language
///   for #"CalendarDateTime::Format;formatting date and time values".
///
/// <p>
/// - <b>Conversions:</b><br>
///   In the \b Python placeholder syntax specification:
///
///          "{" [field_name] ["!" conversion] [":" format_spec] "}"
///
///   symbol \c '!', if used before the colon <c>':'</c>, defines
///   what is called the <b>conversion</b>. With \b Python, three options are given:
///   \c '!s' calls \c str() on the value, \c '!r' calls \c repr(), and \c '!a' calls
///   \c ascii(). These are of course not applicable to this formatter. As a replacement,
///   this class extends the original specification of the conversion introduced with \c '!'.
///   The following list shows the supported conversions. Their names are case-insensitive and
///   can be abbreviated to any leading portion, e.g., \c !Upper can be given as \c !UP or
///   just \c !u. In addition, multiple conversions can be given by concatenating them, each
///   one starting with its own character \c '!'.<br>
///   The conversions supported are:
///
///   - <b>!Upper</b><br>
///     Converts the contents of the field to upper case.
///
///   - <b>!Lower</b><br>
///     Converts the contents of the field to lower case.
///
///   - <b>!Quote[O[C]]</b><br>
///     Puts quote characters around the field.
///     Note that these characters do not count toward an optionally given field width but
///     are added to it.
///     An alias name for \b !Quote is \b !Str. As the alias can be abbreviated to \b !s,
///     this provides compatibility with the \b Python specification.
///
///     In extension to the Python syntax specification, one or two optional characters may be
///     given after the (optionally abbreviated) terms "Quote" or "Str", respectively.
///     If one character is given, it is used as both the opening and the closing character.
///     If two are given, the first is used as the opening character and the second as the
///     closing one.
///     For example, <b>{!Q'}</b> uses single quotes, and <b>{!Q[]}</b> uses square brackets.
///     Bracket types <b>'{'</b> and <b>'}'</b> cannot be used with this conversion.
///     To surround a placeholder's contents with these brackets, add <b>{{</b> and <b>}}</b>
///     around the placeholder, resulting in <b>{{{}}}</b>.
///
///   - <b>!ESC[<|>]</b><br>
///     By default, or if \c '<' is specified, certain characters are converted to escape
///     sequences.
///     If \c '>' is given, escape sequences are converted back to the (ASCII) characters they
///     represent. See #"TEscape;Escape" for details about the conversion
///     that is performed.<br>
///     An alias name for \b !ESC< is \b !a, which provides compatibility
///     with the \b Python specification.
///     \note If \b !ESC< is used in combination with \b !Quote, then \b !ESC< should be the
///           first conversion specifier. Otherwise, the inserted quotes might be escaped as well.
///
///   - <b>!Fill[Cc]</b><br>
///     Inserts as many fill characters as denoted by the integral argument.
///     By default, the fill character is space <c>' '</c>. It can be changed by appending
///     character \c 'C' followed by the desired character.
///
///   - <b>!Tab[Cc][NNN]</b><br>
///     Inserts fill characters to extend the length of the string to a multiple of a tab width.
///     By default, the fill character is space <c>' '</c>. It can be changed by appending
///     character \c 'C' followed by the desired character. The tab width defaults to \c 8.
///     It can be changed by appending an unsigned decimal number.
///
///   - <b>!ATab[[Cc][NNN]|Reset]</b><br>
///     Inserts an "automatic tabulator stop". These are tabulator positions that are stored
///     internally and are automatically extended whenever the actual contents exceed the
///     currently stored tab position. An arbitrary number of auto tab stop and field width
///     values (see <b>!AWidth</b> below) is maintained by the formatter.
///
///     With each new format operation of #"format::Formatter",
///     the first auto value is chosen, and with each use of \c !ATab or \c !AWidth, the next
///     value is used.<br>
///     However, the stored values are cleared whenever #"^.Format" is invoked on a
///     non-acquired formatter! This means that, to preserve the auto-positions across multiple
///     format invocations, a formatter has to be acquired explicitly before the format
///     operations and released afterwards.
///
///     Alternatively, the positions currently stored with the formatter can be reset by
///     providing argument \c Reset in the format string.
///
///     By default, the fill character is space <c>' '</c>. It can be changed by appending
///     character \c 'C' followed by the desired character. The optional number gives the
///     growth value by which the tab position is increased when it is exceeded. This value
///     defaults to \c 3.
///
///     Both the auto-tab and the auto-width conversions may be used to increase the readability
///     of multiple output lines. Of course, the output is only perfectly tabular if the values
///     that result in the biggest sizes are formatted first. If a perfectly tabular output is
///     desired, the data may be formatted twice: once into a temporary buffer, which is then
///     discarded, and a second time into the desired output #"%AString".
///
///   - <b>!AWidth[NNN|Reset]</b><br>
///     Increases the field width with repeated format invocations whenever a field value did
///     not fit into the currently stored width. Optional decimal number \b NNN is added as a
///     padding value. For more information, see <b>!ATab</b> above.
///
///   - <b>!Xtinguish</b><br>
///     Does not print anything. This is useful if format strings are externalized, e.g.,
///     defined in #"GetResourcePool;library resources". Modifications of such resources
///     might use this conversion to suppress the display of arguments (which usually are
///     hard-coded).
///
///   - <b>!Replace<search><replace></b><br>
///     Searches for string \p{search} and replaces it with \p{replace}. Both values have to be
///     given enclosed by characters \c '<' and \c '>'. In the special case that \p{search} is
///     empty (<c><></c>), string \p{replace} is inserted if the field argument is an empty
///     string.
///
///\I{##########################################################################################}
/// # Reference Documentation #
/// @throws <b>alib::format::FMTExceptions</b>
///   - #"FMTExceptions::ArgumentIndexOutOfBounds"
///   - #"FMTExceptions::IncompatibleTypeCode"
///   - #"FMTExceptions::MissingClosingBracket"
///   - #"FMTExceptions::MissingPrecisionValuePS"
///   - #"FMTExceptions::DuplicateTypeCode"
///   - #"FMTExceptions::UnknownTypeCode"
///   - #"FMTExceptions::ExclamationMarkExpected"
///   - #"FMTExceptions::UnknownConversionPS"
///   - #"FMTExceptions::PrecisionSpecificationWithInteger"
//==================================================================================================
    //################################################################################################
    // Protected fields
    //################################################################################################
  protected:
    /// Set of extended placeholder attributes, needed for this type of formatter in
    /// addition to parent's #"FormatterStdImpl::PlaceholderAttributes".
        /// The portion of the replacement field that represents the conversion specification.
        /// This specification is given at the beginning of the replacement field, starting with
        /// \c '!'.

        /// The position where the conversion was read. This is set to \c -1 in #"resetPlaceholder".
        int ConversionPos;

        /// The value read from the precision field. This is set to \c -1 in #"resetPlaceholder".
        int Precision;

        /// The position where the precision was read. This is set to \c -1 in #"resetPlaceholder".
        int PrecisionPos;

        /// The default precision if not given.
        /// This is set to \c 6 in #"resetPlaceholder", but may be changed depending on the type
        /// specification.
    };

    /// The extended placeholder attributes.
    PlaceholderAttributesPS placeholderPS;

    //################################################################################################
    // Public fields
    //################################################################################################
  public:
    /// Storage of sizes for the auto-tabulator feature <b>{!ATab}</b> and the auto field width
    /// feature <b>{!AWidth}</b>.

    /// The default instance of field #"Sizes". This might be replaced with an external object.
    AutoSizes SizesDefaultInstance;

    //################################################################################################
    // Constructor/Destructor
    //################################################################################################
  public:
    /// Constructs this formatter.
    /// Inherited field #"DefaultNumberFormat" is initialized to meet the formatting defaults of
    /// Python.

    /// Clones and returns a copy of this formatter.
    ///
    /// If the formatter attached to field
    /// #"Formatter::Next;*" is of type #"%FormatterStdImpl", then that
    /// formatter is copied as well.
    ///
    /// @returns An object of type #"%FormatterPythonStyle" with the same custom settings
    ///          as this one.
    ALIB_DLL virtual
    SPFormatter Clone() override;

    /// Resets the auto-sizes stored in field #"Sizes" and then invokes the parent implementation.
    /// @return An internally allocated container of boxes that may be used to collect
    ///         formatter arguments.
    virtual BoxesMA& Reset() override { Sizes->Reset(); return Formatter::Reset(); }

385 protected:
386 /// Sets the actual auto tab stop index to \c 0.
387 virtual void initializeFormat() override { Sizes->Restart(); }
388
389
390
391 /// Invokes parent implementation and then applies some changes to reflect what is defined as
392 /// default in the Python string format specification.
394 virtual void resetPlaceholder() override;
395
396 /// Searches for \c '{' which is not '{{'.
397 ///
398 /// @return The index found, -1 if not found.
400 virtual integer findPlaceholder() override;
401
402 /// Parses placeholder field in python notation. The portion \p{format_spec} is not
403 /// parsed but stored in member
404 /// #"PlaceholderAttributes;FormatSpec".
405 ///
406 /// @return \c true on success, \c false on errors.
408 virtual bool parsePlaceholder() override;
409
410 /// Parses the format specification for standard types as specified in
411 /// \https{"Format Specification Mini Language",docs.python.org/3.5/library/string.html#format-specification-mini-language}.
412 ///
413 /// @return \c true on success, \c false on errors.
415 virtual bool parseStdFormatSpec() override;
416
417 /// Implementation of abstract method
418 /// #"FormatterStdImpl::writeStringPortion;*".<br>
419 /// While writing, replaces \c "{{" with \c "{" and \c "}}" with \c "}" as well as
420 /// standard codes like \c "\\n", \c "\\r" or \c "\\t" with corresponding ascii codes.
421 ///
422 /// @param length The number of characters to write.
424 virtual void writeStringPortion( integer length ) override;
425
426 /// Processes "conversions" which are specified with \c '!'.
427 ///
428 /// @param startIdx The index of the start of the field written into
429 /// #"FormatterStdImpl::targetString". A value of \c -1 indicates
430 /// pre-phase.
431 /// @param target The target string, only if different from the field
432 /// #"FormatterStdImpl::targetString", which indicates intermediate phase.
433 /// @return \c false, if the placeholder should be skipped (nothing is written for it).
434 /// \c true otherwise.
436 virtual bool preAndPostProcess( integer startIdx,
437 AString* target ) override;
438
439
440 /// Makes some attribute adjustments and invokes standard implementation
441 /// @return \c true if OK, \c false if replacement should be aborted.
443 virtual bool checkStdFieldAgainstArgument() override;
444};
} // namespace [alib::format]

ALIB_EXPORT namespace alib {
/// Type alias in namespace #"%alib".
using FormatterPythonStyle = format::FormatterPythonStyle;
}
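The class documentation states that automatic and explicit field indexing may be mixed, and that explicit indices do not advance the automatic counter. A small standalone C++ sketch can model this rule. The function `resolveArgIndices` is hypothetical and not part of ALib. It only determines which argument each placeholder selects, and it ignores conversions and format specifications.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of ALib): returns the argument index each placeholder of
// `fmt` refers to. "{}" takes the next automatic index, "{N}" selects N explicitly and does
// not advance the automatic counter. Escaped "{{" is skipped; format specs are ignored.
std::vector<int> resolveArgIndices(const std::string& fmt)
{
    std::vector<int> indices;
    int autoIdx = 0;
    for (std::size_t i = 0; i < fmt.size(); ++i)
    {
        if (fmt[i] != '{')
            continue;
        if (i + 1 < fmt.size() && fmt[i + 1] == '{') { ++i; continue; }  // escaped brace

        std::size_t j = i + 1;
        int explicitIdx = -1;
        while (j < fmt.size() && std::isdigit(static_cast<unsigned char>(fmt[j])))
        {
            explicitIdx = (explicitIdx < 0 ? 0 : explicitIdx * 10) + (fmt[j] - '0');
            ++j;
        }
        indices.push_back(explicitIdx >= 0 ? explicitIdx : autoIdx++);

        while (j < fmt.size() && fmt[j] != '}')   // skip the rest of the placeholder
            ++j;
        i = j;
    }
    return indices;
}
```

With format string `"{} {0} {} {1} {}"`, the placeholders select arguments 0, 0, 1, 1 and 2. Python would reject this string because it mixes automatic and manual field numbering.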
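The "cut in size" rule for binary, hexadecimal and octal output can be illustrated with a standalone function. `hexCut` is a hypothetical name. The sketch shows only the digit-cutting rule and does not model ALib's padding, grouping or literal prefixes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not part of ALib): writes `value` as upper-case hexadecimal digits.
// If a positive `width` is smaller than the number of digits, only the lowest `width`
// digits are kept. Padding for larger widths is not modeled.
std::string hexCut(std::uint64_t value, int width)
{
    assert(width >= 0);
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    do {
        out.insert(out.begin(), digits[value & 0xF]);
        value >>= 4;
    } while (value != 0);

    const std::size_t w = static_cast<std::size_t>(width);
    if (w > 0 && out.size() > w)
        out.erase(0, out.size() - w);   // drop the upper digits
    return out;
}
```

For example, `hexCut(0x12345, 3)` yields `"345"`. Without the rule, a caller would get the same result only by masking the argument with `0xFFF` first.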
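The two meanings of precision contrasted in the floating point notes can be observed with the C standard library alone. Like Python's `'g'`, the `%g` conversion counts significant digits, whereas `%f` counts fractional digits. This sketch does not call ALib, and `significantDigits` and `fractionalDigits` are illustrative names.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Precision as the number of significant digits (C "%g", like Python's 'g').
std::string significantDigits(double value, int precision)
{
    assert(precision >= 0);
    char buf[64];
    std::snprintf(buf, sizeof(buf), "%.*g", precision, value);
    return buf;
}

// Precision as the number of fractional digits (C "%f"). According to the class notes, this
// is how FormatterPythonStyle treats precision once both width and precision are given.
std::string fractionalDigits(double value, int precision)
{
    assert(precision >= 0);
    char buf[64];
    std::snprintf(buf, sizeof(buf), "%.*f", precision, value);
    return buf;
}
```

With precision 3, the value `1234.5678` becomes `"1.23e+03"` under the significant-digits reading and `"1234.568"` under the fractional-digits reading.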
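Checking for a custom box-function `FFormat` before falling back to the built-in mini language amounts to a type-based dispatch. The following C++17 sketch models that idea with `std::any` and a lookup table keyed by `std::type_index`. It does not use ALib's boxing API, and `CustomFormatFn`, `customFormatters`, `formatArg`, `Date` and `registerDateFormatter` are all hypothetical names.

```cpp
#include <any>
#include <cassert>
#include <functional>
#include <string>
#include <typeindex>
#include <typeinfo>
#include <unordered_map>

// Hypothetical model (not ALib's boxing API): a table maps argument types to custom
// formatting functions that receive the raw format_spec string.
using CustomFormatFn = std::function<std::string(const std::any& arg, const std::string& spec)>;

std::unordered_map<std::type_index, CustomFormatFn>& customFormatters()
{
    static std::unordered_map<std::type_index, CustomFormatFn> table;
    return table;
}

// Uses a registered custom function if one exists for the argument's type; otherwise falls
// back to a stand-in for the built-in mini language (here: only plain ints are handled).
std::string formatArg(const std::any& arg, const std::string& spec)
{
    auto it = customFormatters().find(std::type_index(arg.type()));
    if (it != customFormatters().end())
        return it->second(arg, spec);
    if (const int* i = std::any_cast<int>(&arg))
        return std::to_string(*i);
    return "?";
}

// Example custom type with its own tiny mini language: "Y" prints the year only,
// anything else prints year-month-day.
struct Date { int year, month, day; };

void registerDateFormatter()
{
    customFormatters()[std::type_index(typeid(Date))] =
        [](const std::any& arg, const std::string& spec) {
            const Date& d = std::any_cast<const Date&>(arg);
            if (spec == "Y")
                return std::to_string(d.year);
            return std::to_string(d.year) + "-" + std::to_string(d.month) + "-"
                 + std::to_string(d.day);
        };
}
```

After `registerDateFormatter()` has run, a `Date` argument is formatted by its own function, while an `int` argument still takes the fallback path. In ALib itself, the corresponding hook is a box-function implemented for the boxed type.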
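Conversion names are case-insensitive, may be abbreviated, and may be concatenated, each one with its own `'!'`. Two small helpers can sketch these rules. `matchesAbbreviation` and `splitConversions` are hypothetical and not ALib's parser. In particular, they do not model how ALib resolves names that share a prefix, such as `ATab` and `AWidth`, or the alias `!a`.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: true if `given` is a non-empty, case-insensitive leading portion of
// `name`, so "UP" and "u" both match "Upper".
bool matchesAbbreviation(const std::string& given, const std::string& name)
{
    if (given.empty() || given.size() > name.size())
        return false;
    for (std::size_t i = 0; i < given.size(); ++i)
        if (std::tolower(static_cast<unsigned char>(given[i]))
            != std::tolower(static_cast<unsigned char>(name[i])))
            return false;
    return true;
}

// Hypothetical helper: splits a concatenated conversion part such as "!Up!Q" into {"Up", "Q"}.
// Simplification: conversion arguments that themselves contain '!' are not supported.
std::vector<std::string> splitConversions(const std::string& conv)
{
    std::vector<std::string> parts;
    std::size_t pos = conv.find('!');
    while (pos != std::string::npos)
    {
        const std::size_t next = conv.find('!', pos + 1);
        parts.push_back(conv.substr(pos + 1,
                        next == std::string::npos ? std::string::npos : next - pos - 1));
        pos = next;
    }
    return parts;
}
```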
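The `!Quote[O[C]]` description says that quote characters are added outside of any field width and may be given as one or two characters. `quoteField` is a hypothetical sketch of that rule, not ALib code. Two behaviors are assumptions: left alignment of the padded field, and double quotes as the default.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical sketch of !Quote (not ALib code): pads `field` on the right to `width`
// (left alignment assumed), then adds the quote characters outside of that width.
// `quotes` holds no character (double quotes assumed as default), one character (used on
// both sides), or two characters (opening, closing).
std::string quoteField(std::string field, std::size_t width, const std::string& quotes)
{
    assert(quotes.size() <= 2);
    if (field.size() < width)
        field.append(width - field.size(), ' ');
    const char open  = quotes.empty() ? '"' : quotes[0];
    const char close = quotes.size() == 2 ? quotes[1] : open;
    return open + field + close;
}
```

For example, `quoteField("ab", 4, "[]")` gives `"[ab  ]"`, which is six characters wide even though the field width is four.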
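The `!Fill` and `!Tab` conversions can be sketched as two small string operations. `fillConversion` and `tabConversion` are hypothetical names. Two points are assumptions: the length is measured from the start of the given string, and nothing is appended if the length already is a multiple of the tab width.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical sketch of !Fill: `count` copies of `fill` (default: space).
std::string fillConversion(std::size_t count, char fill = ' ')
{
    return std::string(count, fill);
}

// Hypothetical sketch of !Tab: appends `fill` characters until the length of `line` is a
// multiple of `tabWidth` (default 8). Nothing is appended if it already is a multiple.
void tabConversion(std::string& line, std::size_t tabWidth = 8, char fill = ' ')
{
    assert(tabWidth > 0);
    const std::size_t remainder = line.size() % tabWidth;
    if (remainder != 0)
        line.append(tabWidth - remainder, fill);
}
```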
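The bookkeeping behind `!ATab` and `!AWidth` can be modeled as a list of sizes with a cursor that is rewound at the start of each format operation. `AutoSizesSketch` is a hypothetical stand-in, not ALib's `strings::util::AutoSizes`. The growth rule, where an exceeded value becomes the requested size plus the growth value, is an assumption based on the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of the auto-tab/auto-width storage (not ALib's AutoSizes class).
// Each !ATab/!AWidth use consumes the next stored value; restart() rewinds the cursor at
// the start of a format operation, reset() forgets all stored values.
class AutoSizesSketch
{
    std::vector<std::size_t> values;
    std::size_t actualIndex = 0;

public:
    void restart() { actualIndex = 0; }
    void reset()   { values.clear(); actualIndex = 0; }

    // Returns the stored size for the next auto position. If `requested` exceeds it (or no
    // value is stored yet), the stored value grows to `requested + growth` (assumed rule).
    std::size_t next(std::size_t requested, std::size_t growth)
    {
        if (actualIndex == values.size())
            values.push_back(0);
        std::size_t& v = values[actualIndex++];
        if (requested > v)
            v = requested + growth;
        assert(v >= requested);
        return v;
    }
};
```

In terms of this model, `initializeFormat()` in the header above corresponds to `restart()` because it calls `Sizes->Restart()`. Likewise, `Reset()` corresponds to `reset()` because it calls `Sizes->Reset()`.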
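The `!Replace<search><replace>` conversion, including its special case for an empty search string, can be sketched as follows. `replaceConversion` is a hypothetical name. The original text does not state whether every occurrence or only the first is replaced; this sketch assumes every occurrence.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical sketch of !Replace (not ALib code): replaces every occurrence of `search` in
// `field` with `replacement` (assumption: all occurrences). If `search` is empty,
// `replacement` is used only when `field` itself is empty.
std::string replaceConversion(std::string field, const std::string& search,
                              const std::string& replacement)
{
    if (search.empty())
        return field.empty() ? replacement : field;
    std::size_t pos = 0;
    while ((pos = field.find(search, pos)) != std::string::npos)
    {
        field.replace(pos, search.size(), replacement);
        pos += replacement.size();   // continue after the inserted text
    }
    return field;
}
```

Advancing past the inserted text keeps the loop finite even when `replacement` contains `search`.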
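The scanning rules documented for `findPlaceholder()` and `writeStringPortion()` can be sketched on plain `std::string`. `findPlaceholderIndex` and `unescapePortion` are hypothetical helpers, not ALib's implementation. The second helper covers only the escape sequences named in the header: `\n`, `\r` and `\t`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical sketch: index of the first '{' at or after `from` that is not part of an
// escaped "{{", or -1 if there is none.
std::ptrdiff_t findPlaceholderIndex(const std::string& s, std::size_t from = 0)
{
    for (std::size_t i = from; i < s.size(); ++i)
    {
        if (s[i] != '{')
            continue;
        if (i + 1 < s.size() && s[i + 1] == '{') { ++i; continue; }  // skip "{{"
        return static_cast<std::ptrdiff_t>(i);
    }
    return -1;
}

// Hypothetical sketch: copies a literal portion, collapsing "{{" and "}}" to single braces
// and converting the two-character sequences "\n", "\r", "\t" to control characters.
std::string unescapePortion(const std::string& s)
{
    std::string out;
    for (std::size_t i = 0; i < s.size(); ++i)
    {
        const char c = s[i];
        const char n = i + 1 < s.size() ? s[i + 1] : '\0';
        if ((c == '{' && n == '{') || (c == '}' && n == '}'))
        {
            out += c;
            ++i;
        }
        else if (c == '\\' && (n == 'n' || n == 'r' || n == 't'))
        {
            out += (n == 'n' ? '\n' : (n == 'r' ? '\r' : '\t'));
            ++i;
        }
        else
            out += c;
    }
    return out;
}
```

In `"{{{0}"`, the first two braces form an escape and the third opens the placeholder, so `findPlaceholderIndex` returns 2.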