ALib C++ Library
Library Version: 2402 R1
token.hpp
/** ************************************************************************************************
 * \file
 * This header file is part of module \alib_strings of the \aliblong.
 *
 * \emoji :copyright: 2013-2024 A-Worx GmbH, Germany.
 * Published under \ref mainpage_license "Boost Software License".
 **************************************************************************************************/
#ifndef HPP_ALIB_STRINGS_UTIL_TOKEN
#define HPP_ALIB_STRINGS_UTIL_TOKEN 1

#if !defined(HPP_ALIB) && !defined(ALIB_DOX)
#   include "alib/alib.hpp"
#endif

#if !defined (HPP_ALIB_STRINGS_SUBSTRING)
#endif

#if !defined (HPP_ALIB_STRINGS_LOCALSTRING)
#endif

#if ALIB_CAMP && !defined(HPP_ALIB_LANG_CAMP)
#endif

#if ALIB_BOXING && !defined(HPP_ALIB_BOXING_BOXING)
#   include "alib/boxing/boxing.hpp"
#endif


namespace alib { namespace strings::util {


/** ************************************************************************************************
 * Tokens in the context of \alib_strings_nl are human-readable "words" or "symbols" that
 * represent a certain value or entity of a software. Tokens may be used with configuration files,
 * mathematical or general expressions, programming languages, communication protocols and so forth.
 *
 * This class contains attributes to describe a token, a method to parse the attributes from a
 * (resource) string and finally method #Match that matches a given string against the token
 * definition.
 *
 * ## %Token Format: ##
 * With the construction, respectively the \ref Define "definition" of a token, special formats are
 * detected. These formats are:
 * - <em>"snake_case"</em><br>
 * - <em>"kebab-case"</em><br>
 * - <em>"CamelCase"</em><br>
 *
 * \note
 *   Information about such case formats is given in this
 *   \https{Wikipedia article,en.wikipedia.org/wiki/Letter_case#Special_case_styles}.
 *
 * \note
 *   If the name indicates a mix of \e snake_case, \e kebab-case or \e CamelCase formats
 *   (e.g. \e "System_Property-ValueTable"), then snake_case supersedes both others and kebab-case
 *   supersedes CamelCase.
 *
 * The format detection is only performed when more than one minimum length is given. In this case,
 * the number of "segments" (e.g. "camel humps") has to match the number of length values.
 *
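 * The precedence described in the note above can be illustrated with a small standalone
 * sketch. It is not the library's \b detectFormat: the names \b FormatSketch and
 * \b detectFormatSketch are hypothetical, and the rule used here to recognize a camel hump
 * (an upper-case letter that follows a lower-case one) is an assumption.
 *
```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical stand-in for Token::Formats; the values mirror the enum in this header.
enum class FormatSketch { Normal = 0, SnakeCase = 2, KebabCase = 4, CamelCase = 8 };

// Assumed detection order: '_' wins over '-', which wins over camel humps.
FormatSketch detectFormatSketch(const std::string& name)
{
    if (name.find('_') != std::string::npos) return FormatSketch::SnakeCase;
    if (name.find('-') != std::string::npos) return FormatSketch::KebabCase;

    // Assumption: a hump starts at an upper-case letter that follows a lower-case one.
    for (std::size_t i = 1; i < name.size(); ++i)
        if (   std::islower(static_cast<unsigned char>(name[i - 1]))
            && std::isupper(static_cast<unsigned char>(name[i])))
            return FormatSketch::CamelCase;

    return FormatSketch::Normal;
}
```
 *
 * With this sketch, a mixed name such as "System_Property-ValueTable" is classified as
 * snake_case, because the underscore is checked first.
 *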
 *
 * ## Character Case Sensitivity: ##
 * Independent of the token format (normal or snake_case, kebab-case, CamelCase), character case
 * sensitivity can be chosen. With \e CamelCase and case-sensitive parsing, the first character of
 * the first hump may be defined lower or upper case (called "lowerCamelCase" vs. "UpperCamelCase").
 *
 * If none of the special formats is detected, the tokens can optionally be abbreviated by just
 * providing a minimum number of starting characters as specified by the then single entry
 * in #minLengths.
 * Otherwise, each segment of the token (e.g. "camel hump") can (again optionally) be shortened
 * on its own.
 * As an example, if for token <c>"SystemProperty"</c> the minimum lengths given are
 * \c 3 and \c 4, the minimum abbreviation is <c>"SysProp"</c>, while <c>"SystProper"</c> also
 * matches.<br>
 *
 *
 * ## Limitation To Seven Segments: ##
 * This class supports minimum length definitions for up to \c 7 "camel humps", respectively
 * segments. Should a name contain even more segments, those cannot be abbreviated.
 * Providing more than \c 7 values for minimum segment lengths with the definition string results
 * in a definition error (see below).
 *
 *
 * ## Special Treatment For CamelCase: ##
 * ### Omittable Last Camel Hump: ###
 * The minimum length values provided must be greater than \c 0, with one exception:
 * With \e CamelCase format and case-insensitive definition, the last "camel hump" may have a
 * minimum length of \c 0 and hence may be omitted when matched.
 * If so, the "normalized" version of the token, which can be received by
 * \alib{strings;T_Append;appending} an instance to an \alib{strings;TAString;AString}, will have
 * the last letter of the defined name converted to lower case.<br>
 * The rationale for this specific approach is to support the English plural case. This is best
 * explained with a sample. If a token was defined using definition string:
 *
 *      "MilliSecondS Ignore 1 1 0"
 *
 * then all of the following words match:
 *
 *      milliseconds
 *      MilliSecs
 *      millis
 *      MSec
 *      MSecs
 *      MSs
 *      ms
 *
 * In the case that the correctly spelled (normalized) token name is to be written, then with
 * the last character converted to lower case, the token becomes
 *
 *      MilliSeconds
 *
 * This is assured for example with the specialization of functor \alib{strings;T_Append} for
 * this type. Hence, when appending a \b Token to an \b AString, if omittable, the last character
 * of the token name is converted to lower case.
 *
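 * The normalization step described above amounts to lower-casing one character. The following
 * standalone sketch shows the idea with \c std::string; the function name
 * \b normalizedNameSketch and its boolean parameter are hypothetical and merely stand in for
 * what the \b T_Append specialization is documented to do.
 *
```cpp
#include <cassert>
#include <cctype>
#include <string>

// Returns the "normalized" spelling of a raw token name: if the last camel hump is
// omittable (minimum length 0), the final character is written in lower case.
std::string normalizedNameSketch(std::string rawName, bool lastHumpOmittable)
{
    if (lastHumpOmittable && !rawName.empty())
        rawName.back() = static_cast<char>(
            std::tolower(static_cast<unsigned char>(rawName.back())));
    return rawName;
}
```
 *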
 * ### Rollback: ###
 * \e CamelCase supports a simple "rollback" mechanism, which is needed for example for token
 *
 *      "SystemTemperature Ignore 1 1 0"
 *
 * and given match argument
 *
 *      system
 *
 * All six characters match the first hump, but then there are no characters left to
 * match the start of the second hump \c "Temperature". In this case, a loop of retries is
 * performed by rolling back characters from the back of the hump (\c 'm') and ending with the
 * first optional character of that hump (\c 'y'). The loop will be broken when
 * character \c 't' is found.
 *
 * However: This is not continued in the case that the term that was rolled back still does not
 * match. This means that certain (very unlikely!) tokens, with nested repeating character sequences
 * in camel humps, cannot be abbreviated to certain (unlikely wanted) lengths.
 *
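 * To make the interplay of per-hump minimum lengths and the rollback more tangible, here is a
 * standalone sketch of a case-insensitive CamelCase matcher. It deliberately uses full
 * recursive backtracking, which is a different and more permissive technique than the
 * limited rollback loop described above, so it is not the algorithm of #Match. The type
 * \b Hump and the functions \b matchFrom and \b matchCamelSketch are hypothetical, and the
 * humps are passed in already split.
 *
```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// One camel hump together with its minimum abbreviation length (hypothetical helper type).
struct Hump { std::string text; int minLength; };

static bool equalsIgnoreCase(char a, char b)
{
    return std::tolower(static_cast<unsigned char>(a))
        == std::tolower(static_cast<unsigned char>(b));
}

// Tries to match needle[pos..] against humps[idx..]. Each hump may contribute between
// minLength and all of its characters. Longer prefixes are tried first; falling back to
// shorter ones plays the role of the "rollback".
static bool matchFrom(const std::vector<Hump>& humps, std::size_t idx,
                      const std::string& needle, std::size_t pos)
{
    if (idx == humps.size())
        return pos == needle.size();

    const Hump& hump = humps[idx];

    // Length of the common prefix of this hump and the rest of the needle.
    int common = 0;
    while (   static_cast<std::size_t>(common) < hump.text.size()
           && pos + static_cast<std::size_t>(common) < needle.size()
           && equalsIgnoreCase(hump.text[static_cast<std::size_t>(common)],
                               needle[pos + static_cast<std::size_t>(common)]))
        ++common;

    for (int take = common; take >= hump.minLength; --take)
        if (matchFrom(humps, idx + 1, needle, pos + static_cast<std::size_t>(take)))
            return true;

    return false;
}

bool matchCamelSketch(const std::vector<Hump>& humps, const std::string& needle)
{
    return matchFrom(humps, 0, needle, 0);
}
```
 *
 * For the humps "System" and "Temperature" with minimum lengths 1 and 1, the needle "system"
 * first consumes all six characters for the first hump, fails, and then succeeds once the
 * first hump is cut back to "sys" so that "tem" can start the second hump.
 *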
 * ## Handling Definition Errors: ##
 *
 * The definition strings passed to method #Define are considered static (resourced) data.
 * In other words, this definition data should be compile-time defined and not be customizable
 * by end-users, but only by experts.
 * Therefore, thorough testing of the correctness of the definitions is available only in
 * debug-compilations of the library.
 *
 * The source code of static utility method #LoadResourcedTokens demonstrates how error
 * codes defined with enumeration #DbgDefinitionError can be handled in debug-compilations
 * by raising debug-assertions.
 **************************************************************************************************/
class Token
{
    public:
        /**
         * Format types detected with #detectFormat.
         */
        enum class Formats : int8_t
        {
            Normal      = 0,    ///< Normal, optionally abbreviated words.
            SnakeCase   = 2,    ///< snake_case using underscores.
            KebabCase   = 4,    ///< kebab-case using hyphens.
            CamelCase   = 8,    ///< UpperCamelCase or lowerCamelCase.
        };

    #if ALIB_DEBUG
        /**
         * Error codes which are written in field #format in the case that method #Define
         * suffers a parsing error.<br>
         * This enum, as well as the error detection, is only available in debug-compilations
         * of the library.
         */
        enum class DbgDefinitionError : int8_t
        {
            OK                              =  0, ///< All is fine.
            EmptyName                       = -1, ///< No token name found.
            ErrorReadingSensitivity         = -2, ///< Sensitivity value not found.
            ErrorReadingMinLengths          = -3, ///< Error parsing the list of minimum lengths.
            TooManyMinLengthsGiven          = -4, ///< A maximum of \c 7 minimum length values was exceeded.
            InconsistentMinLengths          = -5, ///< The number of given minimum length values is greater than \c 1
                                                  ///< but does not match the number of segments in the identifier.
            NoCaseSchemeFound               = -6, ///< More than one minimum length value was given but no
                                                  ///< segmentation scheme could be detected.
            MinLenExceedsSegmentLength      = -7, ///< A minimum length is specified to be higher than the length of
                                                  ///< the token name, respectively of the according segment.
            DefinitionStringNotConsumed     = -8, ///< The definition string was not completely consumed.
            ZeroMinLengthAndNotLastCamelHump= -9, ///< A minimum length of \c 0 was specified for a segment that is not
                                                  ///< a last camel case hump.
        };
    #endif
    protected:

        /** The token name. */
        String      name;

        /** Defines the "case type" as well as the letter case sensitivity of this token. */
        Formats     format;

        /**
         * The minimum abbreviation length per segment. If only one is given (second is \c -1), then
         * field #format indicates normal tokens.
         * Otherwise, the token is either snake_case, kebab-case or CamelCase.
         */
        int8_t      minLengths[7];


        /** Letter case sensitivity. This is combined with the format bits. */
        static constexpr Formats    ignoreCase = Formats(1);


    // #############################################################################################
    // Constructors
    // #############################################################################################
    public:
        /** Parameterless constructor. Creates an "undefined" token. */

        /** ****************************************************************************************
         * Constructor used with function names that do not contain snake_case, kebab-case or
         * CamelCase name scheme.
         * \note Of course, the name may follow such a scheme. With this constructor, it just will
         *       not be detected.
         * @param name         The function name.
         * @param sensitivity  The letter case sensitivity of reading the function name.
         * @param minLength    The minimum starting portion of the function name to read.
         ******************************************************************************************/
        Token(const String& name, lang::Case sensitivity, int8_t minLength);


        /** ****************************************************************************************
         * Constructor with at least two minimum length values, used to define tokens that follow
         * snake_case, kebab-case or CamelCase naming schemes.
         *
         * @param name         The function name.
         * @param sensitivity  The letter case sensitivity of reading the function name.
         * @param minLength1   The minimum starting portion of the first segment to read.
         * @param minLength2   The minimum starting portion of the second segment to read.
         * @param minLength3   The minimum starting portion of the third segment to read.
         *                     Defaults to \c -1 (segment not given).
         * @param minLength4   The minimum starting portion of the fourth segment to read.
         *                     Defaults to \c -1 (segment not given).
         * @param minLength5   The minimum starting portion of the fifth segment to read.
         *                     Defaults to \c -1 (segment not given).
         * @param minLength6   The minimum starting portion of the sixth segment to read.
         *                     Defaults to \c -1 (segment not given).
         * @param minLength7   The minimum starting portion of the seventh segment to read.
         *                     Defaults to \c -1 (segment not given).
         ******************************************************************************************/
        Token( const String& name, lang::Case sensitivity, int8_t minLength1, int8_t minLength2,
               int8_t minLength3= -1, int8_t minLength4= -1, int8_t minLength5= -1,
               int8_t minLength6= -1, int8_t minLength7= -1                              );

    #if ALIB_ENUMS
        /** ****************************************************************************************
         * Constructor using a (usually resourced) string to read the definitions.
         * Invokes #Define.
         *
         * @param definition  The input string.
         * @param separator   Separation character used to parse the input.
         *                    Defaults to <c>';'</c>.
         *
         * \par Module Dependencies
         *   This method is only available if module \alib_enums is included in the \alibdist.
         ******************************************************************************************/
        Token( const String& definition, character separator = ';' )
        {
            Define( definition, separator );
        }
    #endif

    // #############################################################################################
    // Interface
    // #############################################################################################
    public:
        #if ALIB_DEBUG
            /** ************************************************************************************
             * Tests if this token was well defined.
             *
             * \note
             *   This method is only available in debug-compilations.
             *   Definition strings are considered static data (preferably resourced).
             *   Therefore, in debug-compilations, this method should be invoked and with that,
             *   the consistency of the resources be tested. In the case of failure, a debug
             *   assertion should be raised.
             *
             * @return \alib{strings::util::Token;DbgDefinitionError::OK}, if this token is well
             *         defined, a different error code otherwise.
             **************************************************************************************/
        #endif

        /** ****************************************************************************************
         * Returns the "raw" name of the token as given with #Define, respectively with one
         * of the constructors.
         *
         * \note
         *   To receive the "normalized" name of this token, it can be
         *   \ref alib_strings_assembly_ttostring "appended" to an instance of type
         *   \alib{strings;TAString;AString}. The difference will be that in the case of
         *   \e CamelCase format with a last minimum segment size of \c 0, the last character of
         *   the name will be converted to lower case.
         *
         * @return This token's #name.
         ******************************************************************************************/
        const String&   GetRawName()                                                           const
        {
            ALIB_ASSERT_ERROR( int8_t(format) >= 0, "STRINGS/TOK"
                               "Error in token definition. Use DbgGetError in debug-compilations!" )
            return name;
        }

        /** ****************************************************************************************
         * Returns the format of this token.
         *
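         * The way the format bits and the case-sensitivity flag share one value can be
         * illustrated with the following standalone sketch. The names \b FormatsSketch,
         * \b pack, \b formatOf and \b ignoresCase are hypothetical; only the enum values and
         * the use of bit \c 0 for "ignore case" are taken from this header. Negative error
         * codes, which this method asserts against, are not covered.
         *
```cpp
#include <cassert>
#include <cstdint>

// Values mirror Token::Formats; bit 0 is reserved for the "ignore case" flag.
enum class FormatsSketch : std::int8_t { Normal = 0, SnakeCase = 2, KebabCase = 4, CamelCase = 8 };
constexpr int ignoreCaseBit = 1;

// Combines a format with the case-sensitivity flag into one byte.
constexpr std::int8_t pack(FormatsSketch format, bool ignoreCase)
{
    return static_cast<std::int8_t>(static_cast<int>(format) | (ignoreCase ? ignoreCaseBit : 0));
}

// Same masking idea as in GetFormat(): clear bit 0, keep the format bits.
constexpr FormatsSketch formatOf(std::int8_t packed)
{
    return static_cast<FormatsSketch>(packed & ~ignoreCaseBit);
}

// Same test as in Sensitivity(): inspect bit 0 only.
constexpr bool ignoresCase(std::int8_t packed)
{
    return (packed & ignoreCaseBit) == 1;
}
```
         *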
         * \note Same as methods #Sensitivity and #GetMinLength, this method is usually not
         *       of interest to standard API usage.
         *       These three informational methods are rather provided to support the unit tests.
         * @return This token's format, used with method #Match.
         ******************************************************************************************/
        Formats         GetFormat()                                                            const
        {
            #if ALIB_BOXING
                ALIB_ASSERT_ERROR( int8_t(format) >= 0, "STRINGS/TOK"
                                   "Error {} in definition of token {!Q}. "
                                   "Use DbgGetError in debug-compilations!",
                                   NString256(int8_t(format)) << name )
            #else
                ALIB_ASSERT_ERROR( int8_t(format) >= 0, "STRINGS/TOK"
                                   "Error ", NString64(int8_t(format)),
                                   " in definition of token \"", NString128(name),
                                   "\". Use DbgGetError in debug-compilations!" )
            #endif
            return Formats( int8_t(format) & ~int8_t(ignoreCase) );
        }

        /** ****************************************************************************************
         * Returns the letter case sensitivity of this token.
         *
         * \note Same as methods #GetFormat and #GetMinLength, this method is usually not
         *       of interest to standard API usage.
         *       These three informational methods are rather provided to support the unit tests.
         * @return The letter case sensitivity used with method #Match.
         ******************************************************************************************/
        lang::Case      Sensitivity()                                                          const
        {
            return (int(format) & 1 ) == 1  ? lang::Case::Ignore
                                            : lang::Case::Sensitive;
        }

        /** ****************************************************************************************
         * Returns the minimum length to be read. In case that this token is not of
         * snake_case, kebab-case or CamelCase naming scheme, only \c 0 is allowed for parameter
         * \p{idx} and this defines the minimal abbreviation length. If one of the naming schemes
         * applies, parameter \p{idx} may be as high as the number of segments found in the
         * name (and a maximum of \c 6, as this class supports only up to seven segments).
         *
         * The first index that exceeds the number of segments, will return \c -1 for the length.
         * If even higher index values are requested, then the returned value is undefined.
         *
         * @param idx  The index of the minimum length to receive.
         *
         * \note Same as methods #GetFormat and #Sensitivity, this method is usually not
         *       of interest to standard API usage.
         *       These three informational methods are rather provided to support the unit tests.
         *
         * @return The minimum length of segment number \p{idx}.
         ******************************************************************************************/
        int8_t          GetMinLength( int idx )                                                const
        {
            return minLengths[idx];
        }

    #if ALIB_ENUMS
        /** ****************************************************************************************
         * Defines or redefines this token by parsing the attributes from the given sub-string.
         * This method is usually invoked by code that loads tokens and other data from
         * \alib{lang::resources;ResourcePool;resources} of \alib \alib{lang;Camp} objects.
         *
         * The expected format is defined as a list of the following values, separated by
         * the character given with parameter \p{separator}:
         * - The #name of the token. Even if letter case is ignored, this should contain
         *   the name in "normalized" format, as it may be used to generate human-readable output
         *   strings.
         * - Letter case sensitivity. This can be "Sensitive" or "Ignore"
         *   (respectively what is defined with resourced
         *   \ref alib_enums_records "ALib Enum Records" of type \alib{lang::Case}),
         *   can be abbreviated to just one character (i.e. <c>'s'</c> and
         *   <c>'i'</c>) and itself is not parsed taking letter case into account.
         * - The list of minimum lengths for each segment of the name. The number of values has
         *   to match the number of segments. A value of \c 0 specifies that no abbreviation
         *   is permitted and therefore is the same as specifying the exact length of the segment.
         *
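         * The field layout listed above can be sketched with standard C++ as follows. This is
         * a simplified illustration and not the implementation of this method: the type
         * \b Definition, the enum \b DefError (a subset named after #DbgDefinitionError) and
         * the functions \b parseSmallInt and \b parseDefinitionSketch are hypothetical, only
         * the first character of the sensitivity field is inspected, and no consistency check
         * against the name's segments is made.
         *
```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Reduced, hypothetical result type; the error names follow DbgDefinitionError.
enum class DefError { OK, EmptyName, ErrorReadingSensitivity,
                      ErrorReadingMinLengths, TooManyMinLengthsGiven };

struct Definition
{
    std::string      name;
    bool             ignoreCase = false;
    std::vector<int> minLengths;
    DefError         error      = DefError::OK;
};

// Parses a non-negative decimal number of at most three digits.
static bool parseSmallInt(const std::string& text, int& value)
{
    if (text.empty() || text.size() > 3) return false;
    value = 0;
    for (char ch : text)
    {
        if (ch < '0' || ch > '9') return false;
        value = value * 10 + (ch - '0');
    }
    return true;
}

Definition parseDefinitionSketch(const std::string& definition, char separator = ';')
{
    Definition result;

    // Split the definition into its fields.
    std::vector<std::string> fields;
    std::istringstream       stream(definition);
    for (std::string field; std::getline(stream, field, separator); )
        fields.push_back(field);

    // Field 0: the token name.
    if (fields.empty() || fields[0].empty())
    {
        result.error = DefError::EmptyName;
        return result;
    }
    result.name = fields[0];

    // Field 1: sensitivity. Only the first character is inspected, ignoring its case.
    const char first = (fields.size() > 1 && !fields[1].empty())
                     ? static_cast<char>(std::tolower(static_cast<unsigned char>(fields[1][0])))
                     : '\0';
    if (first != 'i' && first != 's')
    {
        result.error = DefError::ErrorReadingSensitivity;
        return result;
    }
    result.ignoreCase = (first == 'i');

    // Fields 2 and up: the minimum lengths, at most seven of them.
    for (std::size_t i = 2; i < fields.size(); ++i)
    {
        int value = 0;
        if (result.minLengths.size() == 7)
        {
            result.error = DefError::TooManyMinLengthsGiven;
            return result;
        }
        if (!parseSmallInt(fields[i], value))
        {
            result.error = DefError::ErrorReadingMinLengths;
            return result;
        }
        result.minLengths.push_back(value);
    }
    return result;
}
```
         *
         * Called with the string "MilliSecondS Ignore 1 1 0" and a space as separator, the
         * sketch yields the name "MilliSecondS", case-insensitive matching and the three
         * minimum lengths 1, 1 and 0.
         *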
         * @param definition  The input string.
         * @param separator   Separation character used to parse the input.
         *                    Defaults to <c>';'</c>.
         *
         * \par Module Dependencies
         *   This method is only available if module \alib_enums is included in the \alibdist.
         ******************************************************************************************/
        void            Define( const String& definition, character separator = ';' );
    #endif
        /** ****************************************************************************************
         * Matches a given string with this token. See this class's description for details.
         *
         * @param needle  The potentially abbreviated input string to match.
         * @return \c true if \p{needle} matches this token, \c false otherwise.
         ******************************************************************************************/
        bool            Match( const String& needle );

    #if ALIB_ENUMS && ALIB_CAMP
        #if defined(ALIB_DOX)
        /** ****************************************************************************************
         * Static utility function that defines a table of token objects from external resourced
         * strings.
         *
         * It is possible to provide the table lines in two ways:
         * - In one resource string: In this case, parameter \p{outerSeparator} has to specify
         *   the delimiter that separates the records.
         * - In an array of resource strings: If the resource string as given is not defined, this
         *   method appends an integral index starting with \c 0 to the resource name, parses
         *   a single record and increments the index.
         *   Parsing ends when a resource with a next higher index is not found.
         *
         * The second option is recommended for larger token sets. While the separation causes
         * some overhead in a resource backend, the external (!) management (translation,
         * manipulation, etc.) is most probably simplified with this approach.
         *
         * \note
         *   The length of the given table has to match the number of entries found in
         *   the resource pool. To assure this, with debug builds, parameter \p{dbgSizeVerifier}
         *   has to be provided (preferably by using macro \ref ALIB_DBG "ALIB_DBG(, N)").
         *
         * @param resourcePool      The resource pool to load the resource from.
         * @param resourceCategory  The resource category.
         * @param resourceName      The resource name.
         * @param target            The table to fill.
         * @param dbgSizeVerifier   This parameter has to be specified only in debug builds and
         *                          provides the expected size of the resourced table.
         *                          To be surrounded by macro #ALIB_DBG (not to be given in
         *                          release builds).
         * @param outerSeparator    The character that separates the entries.
         *                          Defaults to <c>','</c>.
         * @param innerSeparator    The character that separates the values of an entry.
         *                          Defaults to <c>' '</c> (space).
         *
         * \par Module Dependencies
         *   This method is only available if module \alib_enums as well as module \alib_basecamp
         *   is included in the \alibdist.
         ******************************************************************************************/
        ALIB_API static
        void LoadResourcedTokens( lang::resources::ResourcePool&  resourcePool,
                                  const NString&                  resourceCategory,
                                  const NString&                  resourceName,
                                  strings::util::Token*           target,
                                  int                             dbgSizeVerifier,
                                  character                       outerSeparator = ',',
                                  character                       innerSeparator = ' '     );
        #else
        ALIB_API static
        void LoadResourcedTokens( lang::resources::ResourcePool&  resourcePool,
                                  const NString&                  resourceCategory,
                                  const NString&                  resourceName,
                                  strings::util::Token*           target,
                        ALIB_DBG( int                             dbgSizeVerifier, )
                                  character                       outerSeparator = ',',
                                  character                       innerSeparator = ' '     );
        #endif
    #endif

    #if ALIB_CAMP
        #if defined(ALIB_DOX)
        /** ****************************************************************************************
         * Shortcut to #LoadResourcedTokens that accepts a module and uses its resource pool
         * and resource category.
         *
         * @param module            The \alibcamp to load the resource from.
         * @param resourceName      The resource name.
         * @param target            The table to fill.
         * @param dbgSizeVerifier   This parameter has to be specified only in debug-compilations
         *                          and provides the expected size of the resourced table.
         *                          To be surrounded by macro #ALIB_DBG (not to be given in
         *                          release builds).
         * @param outerSeparator    The character that separates the entries.
         *                          Defaults to <c>','</c>.
         * @param innerSeparator    The character that separates the values of an entry.
         *                          Defaults to <c>' '</c> (space).
         *
         * \par Module Dependencies
         *   This method is only available if module \alib_basecamp is included in the \alibdist.
         ******************************************************************************************/
        static inline
        void LoadResourcedTokens( lang::Camp&                     module,
                                  const NString&                  resourceName,
                                  strings::util::Token*           target,
                                  int                             dbgSizeVerifier,
                                  character                       outerSeparator = ',',
                                  character                       innerSeparator = ' '     );
        #else
        static
        void LoadResourcedTokens( lang::Camp&                     module,
                                  const NString&                  resourceName,
                                  strings::util::Token*           target,
                        ALIB_DBG( int                             dbgSizeVerifier, )
                                  character                       outerSeparator = ',',
                                  character                       innerSeparator = ' '     )
        {
            LoadResourcedTokens( module.GetResourcePool(), module.ResourceCategory, resourceName,
                                 target, ALIB_DBG(dbgSizeVerifier,) outerSeparator, innerSeparator );
        }
        #endif

    #endif


    protected:
        /** ****************************************************************************************
         * Detects snake_case, kebab-case or CamelCase.
         ******************************************************************************************/
        void            detectFormat();

}; // class Token

} // namespace alib[::strings::util]

/// Type alias in namespace \b alib.

} // namespace [alib]

#if ALIB_BOXING
#endif

namespace alib {  namespace strings {
#if defined(ALIB_DOX)
namespace APPENDABLES {
#endif
    /** ********************************************************************************************
     * Specialization of functor \alib{strings;T_Append} for type \alib{strings::util;Token}.
     **********************************************************************************************/
    template<> struct T_Append<strings::util::Token, alib::character>
    {
        /**
         * Appends the result of \alib{strings::util;Token::GetRawName} to the \p{target}.<br>
         * If the token is defined \e CamelCase and the last minimum segment length
         * given is \c 0, then the last character written is converted to lower case.
         *
         * \see Documentation of class \alib{strings::util;Token}, section
         *      <b>Omittable Last Camel Hump</b>, for more information about why the character
         *      conversion to lower case might be performed.
         *
         * @param target  The \b AString that method \b Append was invoked on.
         * @param src     The \b Token to append.
         */
        ALIB_API void operator()( AString& target, const strings::util::Token& src );
    };
#if defined(ALIB_DOX)
}
#endif
}}

#endif // HPP_ALIB_STRINGS_UTIL_TOKEN