ALib C++ Framework
Library Version: 2605 R0
token.hpp
1//==================================================================================================
2/// \file
3/// This header-file is part of module \alib_strings of the \aliblong.
4///
5/// Copyright 2013-2026 A-Worx GmbH, Germany.
6/// Published under #"mainpage_license".
7//==================================================================================================
8ALIB_EXPORT namespace alib { namespace strings::util {
9
10
11//==================================================================================================
12/// Tokens in the context of \alib_strings_nl, are human-readable "words" or "symbols" that
13/// represent a certain value or entity of software. Tokens may be used with configuration files,
14/// mathematical or general expressions, programming languages, communication protocols and so forth.
15///
16/// This class contains attributes to describe a token, a method to parse the attributes from a
17/// (resource) string and finally method #".Match" that matches a given string against the token
18/// definition.
19///
20/// ## Token Format: ##
21/// With the construction, respectively the #"Token::Define;definition" of a token, special formats
22/// are detected. These formats are:
23/// - <em>"snake_case"</em><br>
24/// - <em>"kebab-case"</em><br>
25/// - <em>"CamelCase"</em><br>
26///
27/// \note
28/// Information about such case formats is given in this
29/// \https{Wikipedia article,en.wikipedia.org/wiki/Letter_case#Special_case_styles}.
30///
31/// \note
32/// If the name indicates a mix of \e snake_case, \e kebab-case or \e CamelCase formats
33/// (e.g., \e "System_Property-ValueTable"), then snake_case supersedes both others and kebab-case
34/// supersedes CamelCase.
35///
36/// The format detection is only performed when more than one minimum length is given. In this case,
37/// the number of "segments" (e.g., "camel humps") has to match the number of length values.
38///
39///
40/// ## Character Case Sensitivity: ##
41/// Independent of the token format (normal or snake_case, kebab-case, CamelCase), character case
42/// sensitivity can be chosen. With \e CamelCase and case-sensitive parsing, the first character of
43/// the first hump may be defined lower or upper case (called "lowerCamelCase" vs. "UpperCamelCase").
44///
45/// If none of the special formats is detected, the tokens can optionally be abbreviated by just
46/// providing a minimum amount of starting characters as specified by the then single entry
47/// in #"minLengths".
48/// Otherwise, each segment of the token (e.g., "camel hump") can (again optionally) be shortened
49/// on its own.
50/// As an example, if for token <c>"SystemProperty"</c> the minimum lengths given are
51/// \c 3 and \c 4, the minimum abbreviation is <c>"SysProp"</c>, while <c>"SystProper"</c> also
52/// matches.<br>
53///
54///
55/// ## Limitation To Seven Segments: ##
56/// This class supports minimum length definitions for up to \c 7 "camel humps", respectively
57/// segments. Should a name contain even more segments, those cannot be abbreviated.
58/// Providing more than \c 7 values for minimum segment lengths with the definition string results
59/// in a definition error (see below).
60///
61///
62/// ## Special Treatment For CamelCase: ##
63/// ### Omitable Last Camel Hump: ####
64/// The minimum length values provided must be greater than \c 0, with one exception:
65/// With \e CamelCase format and case-insensitive definition, the last "camel hump" may have a
66/// minimum length of \c 0 and hence may be omitted when matched.
67/// If so, the "normalized" version of the token, which can be received by
68/// #"AppendableTraits;appending" an instance to an #"^AString", will have
69/// the last letter of the defined name converted to lower case.<br>
70/// The rationale for this specific approach is to support the English plural case. This can be best
71/// explained in a sample. If a token was defined using definition string:
72///
73/// "MilliSecondS Ignore 1 1 0"
74///
75/// then all of the following words match:
76///
77/// milliseconds
78/// MilliSecs
79/// millis
80/// MSec
81/// MSecs
82/// MSs
83/// ms
84///
85/// If the correctly spelled (normalized) token name is to be written, the last character is
86/// converted to lower case and the token becomes
87///
88/// MilliSeconds
89///
90/// This is performed with method #"GetExportName" (which is also used by the specialization of
91/// functor #"AppendableTraits" for this type).
92/// Hence, when appending a #"%Token" to an #"%AString", if omitable, the last character
93/// of the token name is converted to lower case.
94///
95/// If the above is not suitable, or for any other reasons a different "normalized" name is wanted
96/// when writing the token, then method #Define offers a further mechanism to explicitly define
97/// any custom string to be written.
98///
99/// ### Rollback: ####
100/// \e CamelCase supports a simple "rollback" mechanism, which is needed, for example, for token
101///
102/// "SystemTemperature Ignore 1 1"
103///
104/// and given match argument
105///
106/// system
107///
108/// All six characters are matching the first hump, but then there are no characters left to
109/// match the start of the second hump \c "Temperature". In this case, a loop of retries is
110/// performed by rolling back characters from the back of the hump (\c 'm') and ending with the
111/// first optional character of that hump (\c 'y'). The loop will be broken when
112/// character \c 't' is found.
113///
114/// However: This is not continued in the case that the term that was rolled back does not match,
115/// yet. This means, that certain (very unlikely!) tokens, with nested repeating character sequences
116/// in camel humps, cannot be abbreviated to certain (unlikely wanted) lengths.
117///
118/// ## Handling Definition Errors: ###
119///
120/// The definition strings passed to method #Define are considered static (resourced) data.
121/// In other words, this definition data should be compile-time defined and not be customizable
122/// by end-users, but only by experts.
123/// Therefore, testing the correctness of the definitions is available only in
124/// debug-compilations of the library.
125///
126/// The source code of overloaded utility namespace function
127/// #"LoadResourcedTokens(resources::ResourcePool&)"
128/// demonstrates how error codes, defined with enumeration #".DbgDefinitionError", can be handled in
129/// debug-compilations by raising debug-assertions.
130//==================================================================================================
131class Token {
132 public:
133 /// Format types detected with #".detectFormat".
134 enum class Formats : int8_t {
135 Normal = 0, ///< Normal, optionally abbreviated words.
136 SnakeCase = 2, ///< snake_case using underscores.
137 KebabCase = 4, ///< kebab-case using hyphens.
138 CamelCase = 8, ///< UpperCamelCase or lowerCamelCase.
139 };
140
141 #if ALIB_DEBUG
142 /// Error codes which are written in field #".format" in the case that method #Define
143 /// suffers a parsing error.<br>
144 /// This enum, as well as the error detection, is only available in debug-compilations
145 /// of the library.
146 enum class DbgDefinitionError : int8_t {
147 OK = 0, ///< All is fine.
148 EmptyName = - 1, ///< No token name found.
149 ErrorReadingSensitivity = - 2, ///< Sensitivity value not found.
150 ErrorReadingMinLengths = - 3, ///< Error parsing the list of minimum lengths.
151 TooManyMinLengthsGiven = - 4, ///< A maximum of \c 7 minimum length values was exceeded.
152 InconsistentMinLengths = - 5, ///< The number of given minimum length values is greater than \c 1
153 ///< but does not match the number of segments in the identifier.
154 NoCaseSchemeFound = - 6, ///< More than one minimum length value was given but no
155 ///< segmentation scheme could be detected.
156 MinLenExceedsSegmentLength = - 7, ///< A minimum length is specified to be greater than the length of the
157 ///< token name, respectively of the corresponding segment.
158 DefinitionStringNotConsumed = - 8, ///< The definition string was not completely consumed.
159 ZeroMinLengthAndNotLastCamelHump= - 9, ///< A minimum length of \c 0 was specified for a segment that is not
160 ///< a last camel case hump.
161 };
162 #endif
163 protected:
164
165 /// The token's definition string part.
166 String definitionName;
167
168 /// The token's optional explicit export name.
169 String exportName;
170
171
172 /// Defines the "case type" as well as the letter case sensitivity of this token.
173 Formats format;
174
175 /// The minimum abbreviation length per segment. If only one is given (second is \c -1), then
176 /// the field #".format" indicates normal tokens.
177 /// Otherwise, the token is either snake_case, kebab-case or CamelCase.
178 int8_t minLengths[7] ={0,0,0,0,0,0,0};
179
180 /// Letter case sensitivity. This is combined with the format bits.
181 static constexpr Formats ignoreCase = Formats(1);
182
183 //################################################################################################
184 // Constructors
185 //################################################################################################
186 public:
187 /// Parameterless constructor. Creates an "undefined" token.
190
191 /// Constructor used with function names that do not contain snake_case, kebab-case or
192 /// CamelCase name scheme.
193 /// \note Of course, the name may follow such a scheme.
194 /// With this constructor, it just will not be detected.
195 /// @param name The function name.
196 /// @param sensitivity The letter case sensitivity of reading the function name.
197 /// @param minLength The minimum starting portion of the function name to read.
198 /// @param exportName An optional export name. If \b not given, the \p{name} is
199 /// used with method #".GetExportName".
201 Token(const String& name, lang::Case sensitivity, int8_t minLength,
202 const String& exportName= NULL_STRING );
203
204
205 /// Constructor with at least two minimum length values, used to define tokens that follow
206 /// snake_case, kebab-case or CamelCase naming schemes.
207 ///
208 /// @param name The function name.
209 /// @param sensitivity The letter case sensitivity of reading the function name.
210 /// @param minLength1 The minimum starting portion of the first segment to read.
211 /// @param minLength2 The minimum starting portion of the second segment to read.
212 /// @param minLength3 The minimum starting portion of the third segment to read.
213 /// Defaults to \c -1.
214 /// @param minLength4 The minimum starting portion of the fourth segment to read.
215 /// Defaults to \c -1.
216 /// @param minLength5 The minimum starting portion of the fifth segment to read.
217 /// Defaults to \c -1.
218 /// @param minLength6 The minimum starting portion of the sixth segment to read.
219 /// Defaults to \c -1.
220 /// @param minLength7 The minimum starting portion of the seventh segment to read.
221 /// Defaults to \c -1.
223 Token( const String& name, lang::Case sensitivity, int8_t minLength1, int8_t minLength2,
224 int8_t minLength3= -1, int8_t minLength4= -1, int8_t minLength5= -1,
225 int8_t minLength6= -1, int8_t minLength7= -1 );
226
227 #if ALIB_ENUMRECORDS
228 /// Constructor using a (usually resourced) string to read the definitions.
229 /// Invokes #Define.
230 ///
231 /// \par Availability
232 /// This method is available only if the module \alib_enumrecords is included in
233 /// the \alibbuild.
234 /// @param definitionSrc The input string.
235 /// @param separator Separation character used to parse the input.
236 /// Defaults to <c>';'</c>.
237 Token( const String& definitionSrc, character separator = ';' )
238 { Define( definitionSrc, separator ); }
239 #endif
240
241 //################################################################################################
242 // Interface
243 //################################################################################################
244 public:
245 #if ALIB_DEBUG
246 /// Tests if this token was well defined.
247 ///
248 /// \note
249 /// This method is only available in debug-compilations.
250 /// Definition strings are considered static data (preferably resourced).
251 /// Therefore, in debug-compilations, this method should be invoked and with that,
252 /// the consistency of the resources be tested. In the case of failure, a debug
253 /// assertion should be raised.
254 ///
255 /// @return #"DbgDefinitionError::OK", if this token is well
256 /// defined, a different error code otherwise.
262#endif
263
264 /// Returns the definition name used for parsing the token.
265 ///
266 /// \note
267 /// To receive the "normalized" name of this token, method #".GetExportName" can be used, or
268 /// a token can simply be #"alib_strings_assembly_ttostring;appended" to an instance of type
269 /// #"^AString".
270 ///
271 /// @return This token's #".definitionName".
272 const String& GetDefinitionName() const {
273 ALIB_ASSERT_ERROR( int8_t(format) >= 0, "STRINGS/TOK",
274 "Error {} in definition of token \"{}\". Use DbgGetError() in debug-compilations!",
275 int8_t(format), definitionName)
276 return definitionName;
277 }
278
279 /// If field #".exportName" is not \e nulled (hence explicitly given with resourced definition
280 /// string or with a constructor), this is appended.
281 ///
282 /// Otherwise appends the result of #"Token::GetDefinitionName" to
283 /// the \p{target}. If the token is defined \e CamelCase and the minimum length of the last
284 /// segment is defined \c 0, then the last character written is converted to lower case.
285 ///
286 /// As a result, in most cases it is \b not necessary to provide a specific #".exportName"
287 /// with the definition. Instead, this method should provide a reasonable output.
288 ///
289 /// \see Documentation section <b>Omitable Last Camel Hump</b> of this class's
290 /// #"Token;documentation", for more information about why the
291 /// character conversion to lower case might be performed.
292 ///
293 /// @param target The #"%AString" that method #"%Append(const TAppendable&)" was invoked on.
295 void GetExportName(AString& target) const;
296
297 /// Returns the format of this token.
298 ///
299 /// \note Same as methods #".Sensitivity" and #".GetMinLength", this method is usually not
300 /// of interest to standard API usage.
301 /// These three informational methods are rather provided to support the unit tests.
302 /// @return This token's format, used with method #".Match".
303 Formats GetFormat() const {
304 ALIB_ASSERT_ERROR( int8_t(format) >= 0, "STRINGS/TOK",
305 "Error {} in definition of token \"{}\". Use DbgGetError() in debug-compilations!",
306 int8_t(format), definitionName)
307 return Formats( int8_t(format) & ~int8_t(ignoreCase) );
308 }
309
310 /// Returns the letter case sensitivity of this token.
311 ///
312 /// \note Same as methods #".GetFormat" and #".GetMinLength", this method is usually not
313 /// of interest to standard API usage.
314 /// These three informational methods are rather provided to support the unit tests.
315 /// @return The letter case sensitivity used with method #".Match".
316 lang::Case Sensitivity() const
317 { return (int(format) & 1 ) == 1 ? lang::Case::Ignore : lang::Case::Sensitive; }
318
319 /// Returns the minimum length to be read. In case that this token is not of
320 /// snake_case, kebab-case or CamelCase naming scheme, only \c 0 is allowed for parameter
321 /// \p{idx} and this defines the minimal abbreviation length. If one of the naming schemes
322 /// applies, parameter \p{idx} may be as high as the number of segments found in the
323 /// name (and a maximum of \c 6, as this class supports only up to seven segments).
324 ///
325 /// The first index that exceeds the number of segments, will return \c -1 for the length.
326 /// If even higher index values are requested, then the returned value is undefined.
327 ///
328 /// @param idx The index of the minimum length to receive.
329 ///
330 /// \note Same as methods #".GetFormat" and #".Sensitivity", this method is usually not
331 /// of interest to standard API usage.
332 /// These three informational methods are rather provided to support the unit tests.
333 ///
334 /// @return The minimum length of segment number \p{idx}.
335 int8_t GetMinLength( int idx ) const {
336 ALIB_ASSERT_ERROR( idx >= 0 && idx <= 6 , "STRINGS/TOK", "Index {} out of range.", idx )
337
338 return (idx >= 0 && idx <= 6) ? minLengths[idx] : -1;
339 }
340
341 #if ALIB_ENUMRECORDS
342 /// Defines or redefines this token by parsing the attributes from the given substring.
343 /// This method is usually invoked by code that loads tokens and other data from
344 /// #"ResourcePool;resources" of \alib {lang;Camp} objects.
345 ///
346 /// The expected format is defined as a list of the following values, separated by
347 /// the character given with parameter \p{separator}:
348 /// - The #".definitionName" of the token. Even if the letter case is ignored, this should
349 /// contain the name in "normalized" format, as it may be used with #".GetExportName",
350 /// if no specific name to export is given.
351 /// - Letter case sensitivity. This can be "Sensitive" or "Ignore"
352 /// (respectively, what is defined with resourced
353 /// #"alib_enums_records;ALib Enum Records" of the type #"lang::Case;2").
354 /// The value can be abbreviated to just one character (i.e., <c>'s'</c> or
355 /// <c>'i'</c>) and is itself parsed ignoring letter case.
356 /// - Optionally, the standard export string, which is used with the method #".GetExportName" and
357 /// when appended to an #"%AString". Export names defined here must not start
358 /// with a digit, because a digit in this position of \p{definition} indicates that
359 /// no export name is given.
360 /// - The list of minimum lengths for each segment of the name. The number of values has
361 /// to match the number of segments. A value of \c 0 specifies that the segment must not
362 /// be abbreviated and therefore is the same as specifying the exact length of the segment.
363 ///
364 /// \note The given \p{definition} string has to survive the use of the token, which
365 /// is naturally true if the string resides in resources.
366 /// (String contents are not copied. Instead, this class later refers to substrings
367 /// of the given \p{definition}.)
368 ///
369 /// \par Availability
370 /// This method is available only if the module \alib_enumrecords is included in
371 /// the \alibbuild.
372 /// @param definition The input string.
373 /// @param separator Separation character used to parse the input.
374 /// Defaults to <c>';'</c>.
376 void Define( const String& definition, character separator = ';' );
377 #endif
378
379 /// Matches a given string with this token. See this class's description for details.
380 ///
381 /// @param needle The potentially abbreviated input string to match.
382 /// @return \c true if \p{needle} matches this token, \c false otherwise.
384 bool Match( const String& needle );
385
386 protected:
387 /// Detects snake_case, kebab-case or CamelCase.
389 void detectFormat();
390
391}; // class Token
392
393 } // namespace alib[::strings::util]
394
395 /// Type alias in namespace #"%alib".
396 using Token = strings::util::Token;
397
398} // namespace [alib]
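The class documentation above states that snake_case supersedes kebab-case, which supersedes CamelCase, and the listing shows the `ignoreCase` bit being combined with the `Formats` values in one `int8_t`. The following is a minimal sketch of both ideas that does not use ALib. The names `detectFormatSketch`, `pack`, `formatOf` and `ignoresCase` are hypothetical. The CamelCase test (an ASCII upper-case letter after the first character) is an assumption, and the real `detectFormat` in token.cpp may work differently. In particular, the real detection only runs when more than one minimum length is given.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical stand-ins using the same numeric values as Token::Formats and Token::ignoreCase.
enum class Formats : std::int8_t { Normal = 0, SnakeCase = 2, KebabCase = 4, CamelCase = 8 };
constexpr std::int8_t kIgnoreCase = 1;

// Documented precedence: snake_case beats kebab-case, which beats CamelCase.
// Assumption: "CamelCase" means an ASCII upper-case letter somewhere after the first character.
inline Formats detectFormatSketch(std::string_view name) {
    if (name.find('_') != std::string_view::npos) return Formats::SnakeCase;
    if (name.find('-') != std::string_view::npos) return Formats::KebabCase;
    for (std::size_t i = 1; i < name.size(); ++i)
        if (name[i] >= 'A' && name[i] <= 'Z') return Formats::CamelCase;
    return Formats::Normal;
}

// Stores the format bits and the case-sensitivity bit in one byte.
inline std::int8_t pack(Formats f, bool ignoreCase) {
    return std::int8_t(std::int8_t(f) | (ignoreCase ? kIgnoreCase : 0));
}

// Masks out the sensitivity bit, as GetFormat() does in the listing.
inline Formats formatOf(std::int8_t packed) { return Formats(packed & ~kIgnoreCase); }

// Tests the lowest bit, as Sensitivity() does in the listing.
inline bool ignoresCase(std::int8_t packed) { return (packed & kIgnoreCase) != 0; }
```

With these helpers, `detectFormatSketch("System_Property-ValueTable")` yields `Formats::SnakeCase`, because the underscore is checked first.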
399
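The documentation of `Define` describes the definition string as a separator-delimited list: name, sensitivity, an optional export name that must not start with a digit, and the minimum lengths. The sketch below parses such a string with the standard library only. `TokenDefinitionSketch` and `parseDefinitionSketch` are hypothetical names. As simplifications, the sketch does not trim whitespace, it looks only at the first character of the sensitivity field instead of using ALib enum records, and it reports just a single `ok` flag rather than the `DbgDefinitionError` codes.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical result type; not part of ALib.
struct TokenDefinitionSketch {
    std::string      name;
    bool             ignoreCase = false;
    std::string      exportName;   // stays empty if no export name was given
    std::vector<int> minLengths;
    bool             ok = false;   // true only if the whole definition was consumed
};

inline TokenDefinitionSketch parseDefinitionSketch(const std::string& definition,
                                                   char separator = ';') {
    TokenDefinitionSketch def;
    std::vector<std::string> fields;
    std::istringstream in(definition);
    for (std::string field; std::getline(in, field, separator); )
        fields.push_back(field);

    // Needed at least: name, sensitivity and one minimum length.
    if (fields.size() < 3 || fields[0].empty() || fields[1].empty()) return def;
    def.name = fields[0];

    // Simplification: only the first character ('s' or 'i', any letter case) is inspected.
    const char s = char(std::tolower(static_cast<unsigned char>(fields[1][0])));
    if (s != 's' && s != 'i') return def;
    def.ignoreCase = (s == 'i');

    // A third field that does not start with a digit is the export name.
    std::size_t next = 2;
    if (!fields[2].empty() && !std::isdigit(static_cast<unsigned char>(fields[2][0])))
        def.exportName = fields[next++];

    for (; next < fields.size(); ++next) {
        if (fields[next].empty() || !std::isdigit(static_cast<unsigned char>(fields[next][0])))
            return def;
        def.minLengths.push_back(std::stoi(fields[next]));
    }
    def.ok = !def.minLengths.empty() && def.minLengths.size() <= 7;
    return def;
}
```

For example, `parseDefinitionSketch("MilliSecondS;Ignore;1;1;0")` yields the name `MilliSecondS`, case-insensitive matching, no export name and the three minimum lengths 1, 1 and 0.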
400
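The abbreviation rules and the "rollback" described in the class documentation can be illustrated with a small, self-contained matcher for CamelCase tokens. This sketch does not reproduce the code of `Token::Match` in token.cpp. It uses plain recursive backtracking, which is an assumption and is more permissive than the limited rollback loop the documentation describes. The names `splitHumps`, `matchFrom` and `matchCamelSketch` are hypothetical, comparison is always case-insensitive, and only ASCII names are handled.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Splits an ASCII CamelCase name into humps; a new hump starts at each upper-case letter.
inline std::vector<std::string> splitHumps(std::string_view name) {
    std::vector<std::string> humps;
    for (char c : name) {
        if (humps.empty() || std::isupper(static_cast<unsigned char>(c)))
            humps.emplace_back();
        humps.back().push_back(c);
    }
    return humps;
}

inline bool sameIgnoringCase(char a, char b) {
    return std::tolower(static_cast<unsigned char>(a))
        == std::tolower(static_cast<unsigned char>(b));
}

// Matches needle[pos..] against humps[h..], consuming at least minLens[h] characters per hump.
// Assumes all minimum lengths are non-negative.
inline bool matchFrom(const std::vector<std::string>& humps, const std::vector<int>& minLens,
                      std::size_t h, std::string_view needle, std::size_t pos) {
    if (h == humps.size()) return pos == needle.size();

    // Longest prefix of the current hump found at the current needle position.
    std::size_t longest = 0;
    while (longest < humps[h].size() && pos + longest < needle.size()
           && sameIgnoringCase(humps[h][longest], needle[pos + longest]))
        ++longest;

    // "Roll back" from the longest prefix down to the minimum length of this hump.
    for (std::size_t len = longest + 1; len-- > std::size_t(minLens[h]); )
        if (matchFrom(humps, minLens, h + 1, needle, pos + len)) return true;
    return false;
}

// Returns true if needle is a valid abbreviation of the CamelCase name.
inline bool matchCamelSketch(std::string_view name, const std::vector<int>& minLens,
                             std::string_view needle) {
    const std::vector<std::string> humps = splitHumps(name);
    if (humps.empty() || humps.size() != minLens.size()) return false;
    return matchFrom(humps, minLens, 0, needle, 0);
}
```

With minimum lengths 3 and 4, `matchCamelSketch("SystemProperty", {3, 4}, "SysProp")` succeeds while `"SysPro"` is rejected. For `"SystemTemperature"` with lengths 1 and 1, the needle `"system"` matches only after the first hump is rolled back to `"sys"`.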
401namespace alib { namespace strings {
402#if DOXYGEN
403namespace APPENDABLES {
404#endif
405/// Specialization of functor #"AppendableTraits" for type #"util::Token".
406template<typename TAllocator>
407struct AppendableTraits<util::Token, character,TAllocator> {
408 /// Appends the result of #"Token::GetExportName" to the \p{target}.
409 /// @param target The #"%AString" that method #"%Append(const TAppendable&)" was invoked on.
410 /// @param src The #"%Token" to append.
411 inline void operator()( TAString<character,TAllocator>& target, const util::Token& src )
412 { src.GetExportName(target); }
413};
414#if DOXYGEN
415} // namespace alib::strings[::APPENDABLES]
416#endif
417}} // namespace [alib::strings]
418
419
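The `AppendableTraits` specialization above forwards to `GetExportName`, whose documented behavior is: write the explicit export name if one is given, otherwise write the definition name, lower-casing its last character when the token is CamelCase and the last minimum length is 0. The following sketch restates that rule with standard strings. `exportNameSketch` is a hypothetical helper, not ALib's implementation in token.cpp, and it treats an empty string as "no export name given" where ALib uses a nulled string.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <string_view>

// Returns the name to write for a token.
// Assumption: an empty explicitExportName means that no export name was defined.
inline std::string exportNameSketch(std::string_view definitionName,
                                    std::string_view explicitExportName,
                                    bool             camelCase,
                                    int              lastMinLength) {
    if (!explicitExportName.empty())
        return std::string(explicitExportName);

    std::string result(definitionName);
    // An omittable last camel hump (minimum length 0) gets its last letter lower-cased.
    if (camelCase && lastMinLength == 0 && !result.empty())
        result.back() = char(std::tolower(static_cast<unsigned char>(result.back())));
    return result;
}
```

For the token defined as `MilliSecondS` with minimum lengths 1, 1 and 0, the sketch returns `MilliSeconds`, matching the normalized name given in the class documentation.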