ALib C++ Library
Library Version: 2402 R1
Documentation generated by doxygen
formatterpythonstyle.hpp
1/** ************************************************************************************************
2 * \file
3 * This header file is part of sub-namespace #alib::lang::format of module \alib_basecamp of
4 * the \aliblong.
5 *
6 * \emoji :copyright: 2013-2024 A-Worx GmbH, Germany.
7 * Published under \ref mainpage_license "Boost Software License".
8 **************************************************************************************************/
9#ifndef HPP_ALIB_LANG_FORMAT_FORMATTER_PYTHONSTYLE
10#define HPP_ALIB_LANG_FORMAT_FORMATTER_PYTHONSTYLE 1
11
12#if !defined (HPP_ALIB_LANG_FORMAT_FORMATTER_STD)
14#endif
15
16#if !defined (HPP_ALIB_STRINGS_UTIL_AUTOSIZES)
18#endif
19
20
21namespace alib::lang::format {
22 /** ************************************************************************************************
23 * Implements a \alib{lang::format;Formatter} according to the
24 * \https{formatting standards of the Python language,docs.python.org/3.5/library/string.html#format-string-syntax}.
25 *
26 * \note
27 * Inherited, public fields of parent class \b FormatterStdImpl provide important possibilities
28 * for changing the formatting behavior of instances of this class. Therefore, do not forget
29 * to consult the \ref alib::lang::format::FormatterStdImpl "parent class's documentation".
30 *
31 * In general, the original \b Python specification is covered quite well. However, there are
32 * some differences: a few things are not possible (Python being a scripting language), while
33 * some very helpful extensions to that standard are offered. Instead of repeating
34 * a complete documentation, please refer to the
35 * \https{Python Documentation,docs.python.org/3.5/library/string.html#format-string-syntax}
36 * as the foundation and then take note of the following list of differences, extensions and
37 * general hints:
38 *
39 * - <b>General Notes:</b>
40 * \b Python defines a placeholder field as follows
41 *
42 * "{" [field_name] ["!" conversion] [":" format_spec] "}"
43 *
44 *
45 * - This formatter is <b>less strict</b> with respect to the order of the format symbols. E.g.,
46 * it allows <c>{:11.5,}</c> where Python allows only <c>{:11,.5}</c>.
47 *
48 * - With this class being derived from
49 * \ref alib::lang::format::FormatterStdImpl "FormatterStdImpl", features of the parent are
50 * available to this formatter as well. This is especially useful with respect to
51 * setting default values for number formatting. For example, this allows modifying all number
52 * output without explicitly repeating the settings in each placeholder of format strings. Other
53 * options, for example the grouping characters used with hexadecimal numbers, cannot be changed
54 * at all with the <b>Python Style</b> formatting options. The only way of doing so is modifying
55 * the properties of the formatter object prior to the format operation.
56 *
57 * - Nested replacements in format specification fields are (by nature of this implementation
58 * language) \b not supported.
59 *
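The placeholder grammar quoted above can be illustrated with a small standalone splitter.
This is only a sketch and not the parser used by this class: the names PlaceholderParts and
splitPlaceholder are hypothetical, and the code assumes that the first colon always starts
format_spec, which does not hold for conversions whose arguments contain a colon.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper type; not part of ALib.
struct PlaceholderParts
{
    std::string              fieldName;    // text before the first '!' or ':'
    std::vector<std::string> conversions;  // each conversion, without its leading '!'
    std::string              formatSpec;   // text after the first ':'
};

// Splits the text found between '{' and '}' of one placeholder.
// Simplification: the first ':' always starts format_spec.
PlaceholderParts splitPlaceholder( const std::string& inner )
{
    PlaceholderParts parts;

    // Everything after the first colon is the format specification.
    const std::size_t colon = inner.find( ':' );
    const std::string head  = inner.substr( 0, colon );
    if( colon != std::string::npos )
        parts.formatSpec = inner.substr( colon + 1 );

    // The head is the field name, followed by zero or more "!conversion" parts.
    std::size_t pos = head.find( '!' );
    parts.fieldName = head.substr( 0, pos );
    while( pos != std::string::npos )
    {
        const std::size_t next = head.find( '!', pos + 1 );
        const std::size_t len  = next == std::string::npos ? std::string::npos
                                                           : next - pos - 1;
        parts.conversions.push_back( head.substr( pos + 1, len ) );
        pos = next;
    }
    return parts;
}
```

For the placeholder contents 1!Upper!Q:>10 the sketch yields field name 1, the two conversions
Upper and Q, and the format specification >10.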
60 * <p>
61 * - <b>Positional arguments and field name:</b>
62 * - By the nature of the implementation language (<em>C++, no introspection</em>) of this class,
63 * \b field_name can \b not be the name of an identifier, an attribute name or an array element
64 * index. It can only be a positional argument index, hence a number that chooses a different
65 * index in the provided argument list.<br>
66 * However, the use of field names is often a requirement in use cases that offer configurable
67 * format string setup to the "end user". Therefore, there are two alternatives to cope
68 * with the limitation:
69 * - In simple cases, it is possible to just add all optionally needed data in the argument list,
70 * document their index position and let the user use positional argument notation to choose
71 * the right value from the list.
72 * - More elegant however, is the use of class
73 * \ref alib::lang::format::PropertyFormatter "PropertyFormatter"
74 * which extends the format specification by custom identifiers which control the placement
75 * of corresponding data in the format argument list. This class uses a translator table from
76 * identifier strings to custom callback functions. This way, much more than just simple
77 * field names are allowed.
78 *
79 * - When using positional arguments in format string placeholders, the Python formatter
80 * implementation does not allow switching from <b>automatic field indexing</b> to explicit
81 * indexing. This \b %ALib implementation does allow it. The automatic index (used when no
82 * positional argument is given for a placeholder) always starts with index \c 0 and is
83 * incremented each time automatic indexing is used. Occurrences of explicit indexing have no
84 * influence on the automatic indexing.
85 *
86 *
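The interplay of automatic and explicit indexing described above can be sketched as follows.
The function resolveArgumentIndices is a hypothetical helper written for illustration only; it
is not part of this class. A value of -1 stands for a placeholder without a positional index.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper, not ALib code. Input: for each placeholder in order of appearance,
// its explicit index, or -1 if none was given. Output: the argument index each one selects.
std::vector<int> resolveArgumentIndices( const std::vector<int>& explicitIndices )
{
    std::vector<int> result;
    int autoIndex = 0;                        // automatic indexing always starts at 0
    for( int idx : explicitIndices )
    {
        if( idx >= 0 )
            result.push_back( idx );          // explicit index: auto counter stays untouched
        else
            result.push_back( autoIndex++ );  // automatic index: use it, then increment
    }
    return result;
}
```

Under these rules, a format string with the placeholders {} {2} {} {0} {} selects the
arguments 0, 2, 1, 0 and 2.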
87 * <p>
88 * - <b>Binary, Hexadecimal and Octal Numbers:</b>
89 * - Binary, hexadecimal and octal output is <b>cut in size</b> (!) when a field width is given that
90 * is smaller than the resulting number of digits of the number arguments provided.
91 * \note This implies that a value written might not be equal to the value given.
92 * This is not a bug but a design decision. The rationale behind this is that with this
93 * behavior, there is no need to mask lower digits when passing the arguments to the
94 * format invocation. In other words, the formatter "assumes" that the given field width
95 * indicates that only a corresponding number of lower digits are of interest.
96 *
97 * - If no width is given and the argument contains a boxed pointer, then the platform-dependent
98 * full output width of pointer types is used.
99 *
100 * - The number <b>grouping option</b> (<c>','</c>) can also be used with binary, hexadecimal and octal
101 * output.
102 * The types support different grouping separators for nibbles, bytes, 16-bit and 32-bit words.
103 * Changing the separator symbols is not possible with the format fields of the format strings
104 * (if it were, this would become very incompatible with Python standards). Changes have to be made
105 * prior to the format operation by modifying field
106 * \alib{lang::format;FormatterStdImpl::AlternativeNumberFormat;FormatterStdImpl::AlternativeNumberFormat}
107 * which is provided through parent class \b %FormatterStdImpl.
108 *
109 * - Alternative form (\c '#') adds prefixes as specified in members
110 * - \alib{strings;TNumberFormat::BinLiteralPrefix;BinLiteralPrefix},
111 * - \alib{strings;TNumberFormat::HexLiteralPrefix;HexLiteralPrefix} and
112 * - \alib{strings;TNumberFormat::OctLiteralPrefix;OctLiteralPrefix}.
113 *
114 * For upper case formats, those are taken from field
115 * \alib{lang::format;FormatterStdImpl::DefaultNumberFormat;FormatterStdImpl::DefaultNumberFormat},
116 * for lower case formats from
117 * \alib{lang::format;FormatterStdImpl::AlternativeNumberFormat;FormatterStdImpl::AlternativeNumberFormat}.
118 * However, in alignment with the \b Python specification, \b both default to lower case
119 * literals \c "0b", \c "0o" and \c "0x". All defaults may be changed by the user.
120 *
121 *
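The cutting of hexadecimal output to a given field width can be sketched in a few lines.
The function toHexCut is a hypothetical stand-in and not ALib code. Padding shorter values with
leading zeros is an assumption of this sketch, as is the use of a width of 0 for "no width given".

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper, not ALib code. A width of 0 or less means "no width given".
std::string toHexCut( std::uint64_t value, int width )
{
    const char* digits = "0123456789abcdef";

    // Convert to hexadecimal, lowest nibble first, prepending each digit.
    std::string result;
    do
    {
        result.insert( result.begin(), digits[value & 0xF] );
        value >>= 4;
    }
    while( value != 0 );

    if( width > 0 )
    {
        const std::size_t w = static_cast<std::size_t>( width );
        if( result.size() > w )
            result = result.substr( result.size() - w );             // cut: keep the lower digits only
        else
            result = std::string( w - result.size(), '0' ) + result; // assumption: pad with zeros
    }
    return result;
}
```

With this behavior, the value 0x12345 written with a width of 2 yields 45, so the caller does
not need to mask the value with 0xFF before passing it.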
122 * <p>
123 * - <b>Floating point values:</b>
124 * - If floating point values are provided without a type specification in the format string, then
125 * all values of
126 * \alib{lang::format;FormatterStdImpl::DefaultNumberFormat;FormatterStdImpl::DefaultNumberFormat}
127 * are used to format the number.
128 * - For lower case floating point format types (\c 'f' and \c 'e'), the values specified in
129 * attributes \b %ExponentSeparator, \b %NANLiteral and \b %INFLiteral of object
130 * \alib{lang::format;FormatterStdImpl::AlternativeNumberFormat;FormatterStdImpl::AlternativeNumberFormat}
131 * are used. For upper case types (\c 'F' and \c 'E') the corresponding attributes in
132 * \alib{lang::format;FormatterStdImpl::DefaultNumberFormat;FormatterStdImpl::DefaultNumberFormat} apply.
133 * - Fixed point formats (\c 'f' and \c 'F' types) do not support arbitrary length.
134 * See class \alib{strings;TNumberFormat;NumberFormat} for the limits.
135 * Also, very high values and values close to zero may be converted to scientific format.
136 * Finally, if flag \alib{strings;NumberFormatFlags;ForceScientific} of field
137 * \alib{strings::NumberFormat;Flags} in member #DefaultNumberFormat is set, types
138 * \c 'f' and \c 'F' behave like types \c 'e' and \c 'E'.
139 * - When both a \p{width} and a \p{precision} are given, the \p{precision} determines the
140 * fractional part, even if the type is \b 'g' or \b 'G'. This differs from the
141 * Python formatter, which interprets \p{precision} as the total number of significant digits
142 * in case of types \b 'g' or \b 'G'.
143 * - The 'general format' type for floats, specified with \c 'g' or \c 'G', in the Python
144 * implementation limits the precision of the fractional part, even if \p{precision} is not
145 * further specified. This implementation limits the precision only if the type is \c 'f'
146 * or \c 'F'.
147 *
148 * <p>
149 * - <b>%String Conversion:</b><br>
150 * If \e type \c 's' (or no \e type) is given in the \b format_spec of the replacement field,
151 * a string representation of the given argument is used.
152 * In \b Java and \b C# such representation is received by invoking <c>Object.[t|T]oString()</c>.
153 * Consequently, to support string representations of custom types, in these languages
154 * the corresponding <b>[t|T]oString()</b> methods of the type have to be implemented.
155 *
156 * In C++ the arguments are "boxed" into objects of type
157 * \ref alib::boxing::Box "Box". For the string representation, the formatter invokes
158 * box-function \alib{boxing;FAppend}. A default implementation exists which
159 * for custom types appends the type name and the memory address of the object in hexadecimal
160 * format. To support custom string representations (for custom types), this box-function
161 * needs to be implemented for the type in question. Information and sample code on how to do this
162 * is found in the documentation of \alib_boxing , chapter
163 * \ref alib_boxing_strings_fappend "10.3 Box-Function FAppend".
164 *
165 * - <b>Hash-Value Output:</b><br>
166 * In extension (and deviation) of the Python specification, format specification type \c 'h' and
167 * its upper case version \c 'H' are implemented. The hash value of the argument object is
168 * written in hexadecimal format. Options of the type are identical to those of \c 'x',
169 * respectively \c 'X'.
170 *
171 * In the C++ language implementation of \alib, instead of hash-values of objects, the pointer
172 * found in method \alib{boxing;Box::Data} is printed. In case of boxed class-types, if
173 * default boxing mechanics are used with such class types, this will show the memory address of
174 * the given instance.
175 *
176 * - <b>Boolean output:</b><br>
177 * In extension (and deviation) of the Python specification, format specification type \c 'B'
178 * is implemented. The word \b "true" is written if the given value represents a boolean \c true
179 * value, \b "false" otherwise.
180 *
181 * In the C++ language implementation of \alib, the argument is evaluated to boolean by invoking
182 * box-function \alib{boxing;FIsTrue}.
183 *
184 * <p>
185 * - <b>%Custom %Format Specifications:</b><br>
186 * With \c Python formatting syntax, placeholders have the following syntax:
187 *
188 * "{" [field_name] ["!" conversion] [":" format_spec] "}"
189 *
190 * The part that follows the colon is called \b format_spec. \b Python passes this portion of the
191 * placeholder to a built-in function \c format(). Now, each type may interpret this string in a
192 * type specific way. But most built-in \b Python types do it along what they call the
193 * \https{"Format Specification Mini Language",docs.python.org/3.5/library/string.html#format-specification-mini-language}.
194 *
195 * With this implementation, the approach is very similar. The only difference is that the
196 * "Format Specification Mini Language" is implemented for standard types right within this class.
197 * But before processing \b format_spec, this class will check if the argument type assigned to
198 * the placeholder provides a custom implementation of box-function \alib{lang::format;FFormat}.
199 * If so, this function is invoked and string \b format_spec is passed for custom processing.
200 *
201 * Information and sample code on how to adapt custom types to support this interface is
202 * found in the Programmer's Manual of this module, with chapter
203 * \ref alib_basecamp_format_custom_types_fformat "4.3. Formatting Custom Types".
204 *
205 * For example, \alib class \alib{time;DateTime} supports custom formatting with box-function
206 * \alib{lang::system;FFormat_DateTime} which uses helper class \alib{lang::system;CalendarDateTime} that
207 * provides a very common specific mini language for
208 * \alib{lang::system::CalendarDateTime;Format;formatting date and time values}.
209 *
210 * <p>
211 * - <b>Conversions:</b><br>
212 * In the \b Python placeholder syntax specification:
213 *
214 * "{" [field_name] ["!" conversion] [":" format_spec] "}"
215 *
216 * symbol \c '!', if used prior to the colon <c>':'</c>, defines
217 * what is called the <b>conversion</b>. With \b Python, three options are given:
218 * \c '!s' which calls \c str() on the value, \c '!r' which calls \c repr() and \c '!a' which
219 * calls \c ascii(). This is of course not applicable to this formatter. As a replacement,
220 * this class extends the original specification of that conversion using \c '!'.
221 * The following provides a list of conversions supported. The names given can be abbreviated
222 * at any point and ignore letter case, e.g. \c !Upper can be \c !UP or just \c !u.
223 * In addition, multiple conversions can be given by concatenating them, each repeating
224 * character \c '!'.<br>
225 * The conversions supported are:
226 *
227 * - <b>!Upper</b><br>
228 * Converts the contents of the field to upper case.
229 *
230 * - <b>!Lower</b><br>
231 * Converts the contents of the field to lower case.
232 *
233 * - <b>!Quote[O[C]]</b><br>
234 * Puts quote characters around the field.
235 * Note that these characters do not count towards an optionally given field width but instead
236 * are added to it.
237 * An alias name for \b !Quote is given with \b !Str. As the alias can be abbreviated to \b !s,
238 * this provides compatibility with the \b Python specification.
239 *
240 * In extension to the Python syntax specification, one or two optional characters might be
241 * given after the (optionally abbreviated) terms "Quote" respectively "Str".
242 * If one character is given, this is used as the open and closing character. If two are given,
243 * the first is used as the open character, the second as the closing one.
244 * For example, <b>{!Q'}</b> uses single quotes and <b>{!Q[]}</b> uses square brackets.
245 * Bracket types <b>'{'</b> and <b>'}'</b> cannot be used with this conversion.
246 * To surround a placeholder's contents in this bracket type, add <b>{{</b> and <b>}}</b>
247 * around the placeholder, resulting in <b>{{{}}}</b>.
248 *
249 * - <b>!ESC[<|>]</b><br>
250 * In its default behavior or if \c '<' is specified, certain characters are converted to escape
251 * sequences.
252 * If \c '>' is given, escape sequences are converted to their (ascii) value.
253 * See \alib{strings;TFormat::Escape;Format::Escape} for details about the conversion
254 * that is performed.<br>
255 * An alias name for \b !ESC< is given with \b !a which provides compatibility
256 * with the \b Python specification.
257 * \note If \b !ESC< is used in combination with \b !Quote, then \b !ESC< should be the first
258 * conversion specifier. Otherwise, the quotes inserted might be escaped as well.
259 *
260 * - <b>!Fill[Cc]</b><br>
261 * Inserts as many characters as denoted by the integer type argument.
262 * By default the fill character is space <c>' '</c>. It can be changed with optional character
263 * 'C' plus the character wanted.
264 *
265 * - <b>!Tab[Cc][NNN]</b><br>
266 * Inserts fill characters to extend the length of the string to be a multiple of a tab width.
267 * By default the fill character is space <c>' '</c>. It can be changed with optional character
268 * 'C' plus the character wanted. The tab width defaults to \c 8. It can be changed by adding
269 * an unsigned decimal number.
270 *
271 * - <b>!ATab[[Cc][NNN]|Reset]</b><br>
272 * Inserts an "automatic tabulator stop". These are tabulator positions that are stored
273 * internally and are automatically extended in the moment the actual contents exceeds the
274 * currently stored tab-position. An arbitrary amount of auto tab stop and field width
275 * (see <b>!AWidth</b> below) values is maintained by the formatter.
276 *
277 * With each new invocation of \alib{lang::format;Formatter},
278 * the first auto value is chosen and with each use of \c !ATab or \c !AWidth, the next value is
279 * used.<br>
280 * However, the stored values are cleared whenever \b %Format is invoked on a non-acquired
281 * formatter! This means, to preserve the auto-positions across multiple format invocations,
282 * a formatter has to be acquired explicitly before the format operations and released
283 * afterwards.
284 *
285 * Alternatively to this, the positions currently stored with the formatter can be reset with
286 * providing argument \c Reset in the format string.
287 *
288 * By default, the fill character is space <c>' '</c>. It can be changed with optional character
289 * 'C' plus the character wanted. The optional number provided gives the growth value by which
290 * the tab will grow if its position is exceeded. This value defaults to \c 3.
291 *
292 * Both auto tab and auto width conversions may be used to increase readability of multiple
293 * output lines. Of course, the output is completely tabular only if those values that result
294 * in the biggest sizes are formatted first. If a perfect tabular output is desired, the data
295 * to be formatted may be processed twice: once to a temporary buffer which is discarded and then
296 * a second time to the desired output \b %AString.
297 *
298 * - <b>!AWidth[NNN|Reset]</b><br>
299 * Increases field width with repetitive invocations of format whenever a field value did not
300 * fit to the currently stored width. Optional decimal number \b NNN is added as a padding value.
301 * For more information, see <b>!ATab</b> above.
302 *
303 * - <b>!Xtinguish</b><br>
304 * Does not print anything. This is useful if format strings are externalized, e.g., defined
305 * in \alib{lang::Camp;GetResourcePool;library resources}. Modifications of such resources
306 * might use this conversion to suppress the display of arguments (which usually are
307 * hard-coded).
308 *
309 * - <b>!Replace<search><replace></b><br>
310 * Searches for string \p{search} and replaces it with \p{replace}. Both values have to be given
311 * enclosed by characters \c '<' and \c '>'. In the special case that \p{search} is empty
312 * (<c><></c>), string \p{replace} will be inserted if the field argument is an empty string.
313 *
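The growth behavior of automatic tabulator stops described with conversion !ATab can be
sketched as follows. Class AutoTabStops and its methods are hypothetical stand-ins and do not
reflect the interface of the storage used by this formatter. That a newly created stop is
stored without adding the growth value is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the stored auto-tab positions; not ALib code.
class AutoTabStops
{
    std::vector<int> stops;       // tab positions remembered across format invocations
    std::size_t      cursor = 0;  // index of the next stop to use within one invocation

  public:
    // Start of a format invocation: begin again with the first stored stop.
    void startLine()  { cursor = 0; }

    // Corresponds to argument "Reset": forget all stored positions.
    void reset()      { stops.clear(); cursor = 0; }

    // Returns the column of the next tab stop for content that currently ends at
    // 'contentEnd'. If the stored stop is exceeded, it moves to contentEnd + growth.
    int next( int contentEnd, int growth = 3 )
    {
        if( cursor == stops.size() )
            stops.push_back( contentEnd );        // assumption: first use stores the bare position
        else if( stops[cursor] < contentEnd )
            stops[cursor] = contentEnd + growth;  // grow beyond the required position
        return stops[cursor++];
    }
};
```

A first line whose content ends at column 5 creates a stop at 5. A later line ending at column 7
moves that stop to 10, and subsequent shorter lines are padded up to column 10.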
314 \I{################################################################################################}
315 * # Reference Documentation #
316 * @throws <b>alib::lang::format::FMTExceptions</b>
317 * - \alib{lang::format::FMTExceptions;ArgumentIndexOutOfBounds}
318 * - \alib{lang::format::FMTExceptions;IncompatibleTypeCode}
319 * - \alib{lang::format::FMTExceptions;MissingClosingBracket}
320 * - \alib{lang::format::FMTExceptions;MissingPrecisionValuePS}
321 * - \alib{lang::format::FMTExceptions;DuplicateTypeCode}
322 * - \alib{lang::format::FMTExceptions;UnknownTypeCode}
323 * - \alib{lang::format::FMTExceptions;ExclamationMarkExpected}
324 * - \alib{lang::format::FMTExceptions;UnknownConversionPS}
325 * - \alib{lang::format::FMTExceptions;PrecisionSpecificationWithInteger}
326 **************************************************************************************************/
328 {
329 // #############################################################################################
330 // Protected fields
331 // #############################################################################################
332 protected:
333 /**
334 * Set of extended placeholder attributes, needed for this type of formatter in
335 * addition to parent's \alib{lang::format::FormatterStdImpl;PlaceholderAttributes}.
336 */
338 {
339 /**
340 * The portion of the replacement field that represents the conversion specification.
341 * This specification is given at the beginning of the replacement field, starting with
342 * \c '!'.
343 */
345
346 /** The position where the conversion was read. This is set to \c -1 in #resetPlaceholder. */
348
349
350 /** The value read from the precision field. This is set to \c -1 in #resetPlaceholder. */
352
353 /** The position where the precision was read. This is set to \c -1 in #resetPlaceholder. */
355
356 /** The default precision if not given.
357 * This is set to \c 6 in #resetPlaceholder, but is changed when specific. */
359 };
360
361 /** The extended placeholder attributes. */
363
364 // #############################################################################################
365 // Public fields
366 // #############################################################################################
367 public:
368 /** Storage of sizes for auto-tabulator feature <b>{!ATab}</b> and auto field width feature
369 * <b>{!AWidth}</b> */
371
372 // #############################################################################################
373 // Constructor/Destructor
374 // #############################################################################################
375 public:
376 /** ****************************************************************************************
377 * Constructs this formatter.
378 * Inherited field #DefaultNumberFormat is initialized to meet the formatting defaults of
379 * Python.
380 ******************************************************************************************/
383
384 /** ****************************************************************************************
385 * Clones and returns a copy of this formatter.
386 *
387 * If the formatter attached to field
388 * \alib{lang::format;Formatter::Next} is of type \b %FormatterStdImpl, then that
389 * formatter is copied as well.
390 *
391 * @returns An object of type \b %FormatterPythonStyle with the same custom settings
392 * as this one.
393 ******************************************************************************************/
394 ALIB_API virtual
395 FormatterStdImpl* Clone() override;
396
397 // #############################################################################################
398 // Implementation of FormatterStdImpl interface
399 // #############################################################################################
400 protected:
401 /** ****************************************************************************************
402 * Sets the actual auto tab stop index to \c 0.
403 ******************************************************************************************/
405 virtual void initializeFormat() override;
406
407 /** ****************************************************************************************
408 * Resets #AutoSizes.
409 ******************************************************************************************/
411 virtual void reset() override;
412
413 /** ****************************************************************************************
414 * Invokes parent implementation and then applies some changes to reflect what is defined as
415 * default in the Python string format specification.
416 ******************************************************************************************/
418 virtual void resetPlaceholder() override;
419
420 /** ****************************************************************************************
421 * Searches for \c '{' which is not '{{'.
422 *
423 * @return The index found, -1 if not found.
424 ******************************************************************************************/
426 virtual integer findPlaceholder() override;
427
428 /** ****************************************************************************************
429 * Parses placeholder field in python notation. The portion \p{format_spec} is not
430 * parsed but stored in member
431 * \alib{lang::format::FormatterStdImpl::PlaceholderAttributes;FormatSpec}.
432 *
433 * @return \c true on success, \c false on errors.
434 ******************************************************************************************/
436 virtual bool parsePlaceholder() override;
437
438 /** ****************************************************************************************
439 * Parses the format specification for standard types as specified in
440 * \https{"Format Specification Mini Language",docs.python.org/3.5/library/string.html#format-specification-mini-language}.
441 *
442 * @return \c true on success, \c false on errors.
443 ******************************************************************************************/
445 virtual bool parseStdFormatSpec() override;
446
447 /** ****************************************************************************************
448 * Implementation of abstract method \alib{lang::format;FormatterStdImpl::writeStringPortion}.<br>
449 * While writing, replaces \c "{{" with \c "{" and \c "}}" with \c "}" as well as
450 * standard codes like \c "\\n", \c "\\r" or \c "\\t" with corresponding ascii codes.
451 *
452 * @param length The number of characters to write.
453 ******************************************************************************************/
455 virtual void writeStringPortion( integer length ) override;
456
457 /** ****************************************************************************************
458 * Processes "conversions" which are specified with \c '!'.
459 *
460 * @param startIdx The index of the start of the field written in #targetString.
461 * \c -1 indicates pre-phase.
462 * @param target The target string, only if different from field #targetString, which
463 * indicates intermediate phase.
464 * @return \c false, if the placeholder should be skipped (nothing is written for it).
465 * \c true otherwise.
466 ******************************************************************************************/
468 virtual bool preAndPostProcess( integer startIdx,
469 AString* target ) override;
470
471
472 /** ****************************************************************************************
473 * Makes some attribute adjustments and invokes the standard implementation.
474 * @return \c true if OK, \c false if replacement should be aborted.
475 ******************************************************************************************/
477 virtual bool checkStdFieldAgainstArgument() override;
478 };
479} // namespace [alib::lang::format]
480
481#endif // HPP_ALIB_LANG_FORMAT_FORMATTER_PYTHONSTYLE
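
The replacements documented for writeStringPortion (doubled curly braces and the escape codes
for newline, carriage return and tabulator) can be illustrated with a standalone function.
This is a minimal sketch operating on std::string. The name unescapePortion is hypothetical, and
the code is not the implementation found in the library, which writes to the target string of
the formatter.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not ALib code: returns a copy of 'src' with "{{" and "}}" reduced
// to single braces and the two-character sequences \n, \r and \t replaced by their
// ASCII control codes.
std::string unescapePortion( const std::string& src )
{
    std::string out;
    for( std::size_t i = 0; i < src.size(); ++i )
    {
        const char c    = src[i];
        const char next = ( i + 1 < src.size() ) ? src[i + 1] : '\0';

        if( ( c == '{' || c == '}' ) && next == c )
        {
            out += c;   // doubled brace: write it once
            ++i;        // and skip the second character
        }
        else if( c == '\\' && ( next == 'n' || next == 'r' || next == 't' ) )
        {
            out += ( next == 'n' ) ? '\n' : ( next == 'r' ) ? '\r' : '\t';
            ++i;
        }
        else
            out += c;   // any other character is copied unchanged
    }
    return out;
}
```

A lone brace is copied unchanged by this sketch, because locating real placeholders is the task
of findPlaceholder and not of the writing step.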