ALib C++ Library
Library Version: 2510 R0
Documentation generated by doxygen
token.inl
1//==================================================================================================
2/// \file
3/// This header-file is part of module \alib_strings of the \aliblong.
4///
5/// \emoji :copyright: 2013-2025 A-Worx GmbH, Germany.
6/// Published under \ref mainpage_license "Boost Software License".
7//==================================================================================================
8ALIB_EXPORT namespace alib { namespace strings::util {
9
10
11//==================================================================================================
12/// Tokens, in the context of \alib_strings_nl, are human-readable "words" or "symbols" that
13/// represent a certain value or entity of software. Tokens may be used with configuration files,
14/// mathematical or general expressions, programming languages, communication protocols and so forth.
15///
16/// This struct contains attributes to describe a token, a method to parse the attributes from a
17/// (resource) string and finally method #Match that matches a given string against the token
18/// definition.
19///
20/// ## %Token Format: ##
21/// With the construction, respectively the \ref Define "definition" of a token, special formats are
22/// detected. These formats are:
23/// - <em>"snake_case"</em><br>
24/// - <em>"kebab-case"</em><br>
25/// - <em>"CamelCase"</em><br>
26///
27/// \note
28/// Information about such case formats is given in this
29/// \https{Wikipedia article,en.wikipedia.org/wiki/Letter_case#Special_case_styles}.
30///
31/// \note
32/// If the name indicates a mix of \e snake_case, \e kebab-case or \e CamelCase formats
33/// (e.g., \e "System_Property-ValueTable"), then snake_case supersedes both others and kebab-case
34/// supersedes CamelCase.
35///
36/// The format detection is only performed when more than one minimum length is given. In this case,
37/// the number of "segments" (e.g., "camel humps") has to match the number of length values.
38///
39///
40/// ## Character Case Sensitivity: ##
41/// Independent of the token format (normal or snake_case, kebab-case, CamelCase), character case
42/// sensitivity can be chosen. With \e CamelCase and case-sensitive parsing, the first character of
43/// the first hump may be defined lower or upper case (called "lowerCamelCase" vs. "UpperCamelCase").
44///
45/// If none of the special formats is detected, the tokens can optionally be abbreviated by just
46/// providing a minimum amount of starting characters as specified by the then single entry
47/// in #minLengths.
48/// Otherwise, each segment of the token (e.g., "camel hump") can (again optionally) be shortened
49/// on its own.
50/// As an example, if for token <c>"SystemProperty"</c> the minimum lengths given are
51/// \c 3 and \c 4, the minimum abbreviation is <c>"SysProp"</c>, while <c>"SystProper"</c> also
52/// matches.<br>
53///
54///
55/// ## Limitation To Seven Segments: ##
56/// This class supports minimum length definitions for up to \c 7 "camel humps", respectively
57/// segments. Should a name contain even more segments, those cannot be abbreviated.
58/// Providing more than \c 7 values for minimum segment lengths with the definition string results
59/// in a definition error (see below).
60///
61///
62/// ## Special Treatment For CamelCase: ##
63/// ### Omitable Last Camel Hump: ####
64/// The minimum length values provided must be greater than \c 0, with one exception:
65/// With \e CamelCase format and case-insensitive definition, the last "camel hump" may have a
66/// minimum length of \c 0 and hence may be omitted when matched.
67/// If so, the "normalized" version of the token, which can be received by
68/// \alib{strings;AppendableTraits;appending} an instance to an \alib{strings;TAString;AString}, will have
69/// the last letter of the defined name converted to lower case.<br>
70/// The rationale for this specific approach is to support the English plural case. This can be best
71/// explained in a sample. If a token was defined using definition string:
72///
73/// "MilliSecondS Ignore 1 1 0"
74///
75/// then all of the following words match:
76///
77/// milliseconds
78/// MilliSecs
79/// millis
80/// MSec
81/// MSecs
82/// MSs
83/// ms
84///
85/// When the correctly spelled (normalized) token name is to be written, the last character is
86/// converted to lower case, and the token becomes
87///
88/// MilliSeconds
89///
90/// This is performed by method #GetExportName, which is also used by the specialization of
91/// functor \alib{strings;AppendableTraits} for this type.
92/// Hence, when appending a \b Token to an \b AString, if omitable, the last character
93/// of the token name is converted to lower case.
94///
95/// If the above is not suitable, or for any other reasons a different "normalized" name is wanted
96/// when writing the token, then method #Define offers a further mechanism to explicitly define
97/// any custom string to be written.
98///
99/// ### Rollback: ####
100/// \e CamelCase supports a simple "rollback" mechanism, which is needed for example for token
101///
102/// "SystemTemperature Ignore 1 1"
103///
104/// and given match argument
105///
106/// system
107///
108/// All six characters match the first hump, but then there are no characters left to
109/// match the start of the second hump \c "Temperature". In this case, a loop of retries is
110/// performed by rolling back characters from the back of the hump (\c 'm') and ending with the
111/// first optional character of that hump (\c 'y'). The loop will be broken when
112/// character \c 't' is found.
113///
114/// However, the rollback is not continued if the rolled-back term still does not match.
115/// This means that certain (very unlikely!) tokens with nested repeating character sequences
116/// in camel humps cannot be abbreviated to certain (unlikely wanted) lengths.
117///
118/// ## Handling Definition Errors: ###
119///
120/// The definition strings passed to method #Define are considered static (resourced) data.
121/// In other words, this definition data should be compile-time defined and not be customizable
122/// by end-users, but only by experts.
123/// Therefore, testing of the correctness of the definitions is available only in
124/// debug-compilations of the library.
125///
126/// The source code of utility namespace function \alib{strings;util::LoadResourcedTokens}
127/// demonstrates how error codes defined with enumeration #DbgDefinitionError can be handled in
128/// debug-compilations by raising debug-assertions.
129//==================================================================================================
130class Token
131{
132 public:
133 /// Format types detected with #detectFormat.
134 enum class Formats : int8_t
135 {
136 Normal = 0, ///< Normal, optionally abbreviated words.
137 SnakeCase = 2, ///< snake_case using underscores.
138 KebabCase = 4, ///< kebab-case using hyphens.
139 CamelCase = 8, ///< UpperCamelCase or lowerCamelCase.
140 };
141
142 #if ALIB_DEBUG
143 /// Error codes which are written in field #format in the case that method #Define
144 /// suffers a parsing error.<br>
145 /// This enum, as well as the error detection, is only available in debug-compilations
146 /// of the library.
147 enum class DbgDefinitionError : int8_t
148 {
149 OK = 0, ///< All is fine.
150 EmptyName = - 1, ///< No token name found.
151 ErrorReadingSensitivity = - 2, ///< Sensitivity value not found.
152 ErrorReadingMinLengths = - 3, ///< Error parsing the list of minimum lengths.
153 TooManyMinLengthsGiven = - 4, ///< A maximum of \c 7 minimum length values was exceeded.
154 InconsistentMinLengths = - 5, ///< The number of given minimum length values is greater than \c 1
155 ///< but does not match the number of segments in the identifier.
156 NoCaseSchemeFound = - 6, ///< More than one minimum length value was given but no
157 ///< segmentation scheme could be detected.
158 MinLenExceedsSegmentLength = - 7, ///< A minimum length is specified to be higher than the token
159 ///< name, respectively the according segment name.
160 DefinitionStringNotConsumed = - 8, ///< The definition string was not completely consumed.
161 ZeroMinLengthAndNotLastCamelHump= - 9, ///< A minimum length of \c 0 was specified for a segment that is not
162 ///< a last camel case hump.
163 };
164 #endif
165 protected:
166
167 /// The tokens' definition string part.
168 String definitionName;
169
170 /// The tokens' optional explicit export name.
171 String exportName;
172
173
174 /// Defines the "case type" as well as the letter case sensitivity of this token.
175 Formats format;
176
177 /// The minimum abbreviation length per segment. If only one is given (second is \c -1), then
178 /// the field #format indicates normal tokens.
179 /// Otherwise, the token is either snake_case, kebab-case or CamelCase.
180 int8_t minLengths[7] = {0,0,0,0,0,0,0};
181
182 /// Letter case sensitivity. This is combined with the format bits.
183 static constexpr Formats ignoreCase = Formats(1);
184
185 // ###############################################################################################
186 // Constructors
187 // ###############################################################################################
188 public:
189 /// Parameterless constructor. Creates an "undefined" token.
192
193 /// Constructor used with function names that do not contain snake_case, kebab-case or
194 /// CamelCase name scheme.
195 /// \note Of course, the name may follow such a scheme.
196 /// With this constructor, it just will not be detected.
197 /// @param name The function name.
198 /// @param sensitivity The letter case sensitivity of reading the function name.
199 /// @param minLength The minimum starting portion of the function name to read.
200 /// @param exportName An optional export name. If \b not given, the \p{name} is
201 /// used with method #GetExportName.
203 Token(const String& name, lang::Case sensitivity, int8_t minLength,
204 const String& exportName= NULL_STRING );
205
206
207 /// Constructor with at least two minimum length values, used to define tokens that follow
208 /// snake_case, kebab-case or CamelCase naming schemes.
209 ///
210 /// @param name The function name.
211 /// @param sensitivity The letter case sensitivity of reading the function name.
212 /// @param minLength1 The minimum starting portion of the first segment to read.
213 /// @param minLength2 The minimum starting portion of the second segment to read.
214 /// @param minLength3 The minimum starting portion of the third segment to read.
215 /// Defaults to \c -1.
216 /// @param minLength4 The minimum starting portion of the fourth segment to read.
217 /// Defaults to \c -1.
218 /// @param minLength5 The minimum starting portion of the fifth segment to read.
219 /// Defaults to \c -1.
220 /// @param minLength6 The minimum starting portion of the sixth segment to read.
221 /// Defaults to \c -1.
222 /// @param minLength7 The minimum starting portion of the seventh segment to read.
223 /// Defaults to \c -1.
225 Token( const String& name, lang::Case sensitivity, int8_t minLength1, int8_t minLength2,
226 int8_t minLength3= -1, int8_t minLength4= -1, int8_t minLength5= -1,
227 int8_t minLength6= -1, int8_t minLength7= -1 );
228
229 #if ALIB_ENUMRECORDS
230 /// Constructor using a (usually resourced) string to read the definitions.
231 /// Invokes #Define.
232 ///
233 /// \par Availability
234 /// This method is available only if the module \alib_enumrecords is included in
235 /// the \alibbuild.
236 /// @param definitionSrc The input string.
237 /// @param separator Separation character used to parse the input.
238 /// Defaults to <c>';'</c>.
239 Token( const String& definitionSrc, character separator = ';' )
240 { Define( definitionSrc, separator ); }
241 #endif
242
243 // ###############################################################################################
244 // Interface
245 // ###############################################################################################
246 public:
247 #if ALIB_DEBUG
248 /// Tests if this token was well defined.
249 ///
250 /// \note
251 /// This method is only available in debug-compilations.
252 /// Definition strings are considered static data (preferably resourced).
253 /// Therefore, in debug-compilations, this method should be invoked and with that,
254 /// the consistency of the resources be tested. In the case of failure, a debug
255 /// assertion should be raised.
256 ///
257 /// @return \alib{strings::util::Token;DbgDefinitionError::OK}, if this token is well
258 /// defined, a different error code otherwise.
264 #endif
265
266 /// Returns the definition name used for parsing the token.
267 ///
268 /// \note
269 /// To receive the "normalized" name of this token, method #GetExportName can be used, or
270 /// a token can simply be \ref alib_strings_assembly_ttostring "appended" to an
271 /// instance of type \alib{strings;TAString;AString}.
272 ///
273 /// @return This token's #definitionName.
274 const String& GetDefinitionName() const
275 {
276 ALIB_ASSERT_ERROR( int8_t(format) >= 0, "STRINGS/TOK",
277 "Error {} in definition of token \"{}\". Use DbgGetError() in debug-compilations!",
278 int8_t(format), definitionName)
279 return definitionName;
280 }
281
282 /// If field #exportName is not \e nulled (hence explicitly given with resourced definition
283 /// string or with a constructor), this is appended.
284 ///
285 /// Otherwise appends the result of \alib{strings::util;Token::GetDefinitionName} to
286 /// the \p{target}. If the token is defined \e CamelCase and the minimum length of the last
287 /// segment is defined \c 0, then the last character written is converted to lower case.
288 ///
289 /// As a result, in most cases it is \b not necessary to provide a specific #exportName
290 /// with the definition. Instead, this method should provide a reasonable output.
291 ///
292 /// \see Documentation section <b>Omitable Last Camel Hump</b> of this class's
293 /// \alib{strings::util;Token;documentation}, for more information about why the
294 /// character conversion to lower case might be performed.
295 ///
296 /// @param target The \b AString that method \b Append was invoked on.
297 ALIB_DLL
298 void GetExportName(AString& target) const;
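The export-name rule described above can be illustrated with a small standalone sketch. It does not use ALib types: `exportNameSketch` and its parameters are hypothetical names, plain `std::string` stands in for `String`/`AString`, and an empty string stands in for a "nulled" export name.

```cpp
#include <cctype>
#include <string>

// Hypothetical stand-in for the documented behavior of GetExportName:
//  - an explicitly given export name wins,
//  - otherwise the definition name is used, and its last character is lower-cased
//    when the token is CamelCase and the last hump has a minimum length of 0.
std::string exportNameSketch(const std::string& definitionName,
                             const std::string& explicitExportName,
                             bool isCamelCase,
                             int lastMinLength) {
    if (!explicitExportName.empty())
        return explicitExportName;

    std::string result = definitionName;
    if (isCamelCase && lastMinLength == 0 && !result.empty())
        result.back() = static_cast<char>(
            std::tolower(static_cast<unsigned char>(result.back())));
    return result;
}
```

Under these assumptions, the definition name `MilliSecondS` with a last minimum length of `0` is written as `MilliSeconds`, matching the example given in the class description.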
299
300 /// Returns the format of this token.
301 ///
302 /// \note Same as methods #Sensitivity and #GetMinLength, this method is usually not
303 /// of interest to standard API usage.
304 /// These three informational methods are rather provided to support the unit tests.
305 /// @return This token's format, used with method #Match.
306 Formats GetFormat() const
307 {
308 ALIB_ASSERT_ERROR( int8_t(format) >= 0, "STRINGS/TOK",
309 "Error {} in definition of token \"{}\". Use DbgGetError() in debug-compilations!",
310 int8_t(format), definitionName)
311 return Formats( int8_t(format) & ~int8_t(ignoreCase) );
312 }
313
314 /// Returns the letter case sensitivity of this token.
315 ///
316 /// \note Same as methods #GetFormat and #GetMinLength, this method is usually not
317 /// of interest to standard API usage.
318 /// These three informational methods are rather provided to support the unit tests.
319 /// @return The letter case sensitivity used with method #Match.
320 lang::Case Sensitivity() const
321 { return (int(format) & 1 ) == 1 ? lang::Case::Ignore : lang::Case::Sensitive; }
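The two accessors above read one byte that carries both the format and the sensitivity flag. The following standalone sketch shows that packing scheme in isolation. The enum values are copied from the listing; `pack`, `unpackFormat`, `unpackIgnoreCase` and `kIgnoreCaseBit` are hypothetical names introduced only for this illustration.

```cpp
#include <cstdint>

// Format values as in the listing: bit 0 is left free for the case-insensitivity flag.
enum class Formats : std::int8_t { Normal = 0, SnakeCase = 2, KebabCase = 4, CamelCase = 8 };

// Mirrors the role of 'ignoreCase = Formats(1)' in the listing.
constexpr std::int8_t kIgnoreCaseBit = 1;

// Hypothetical helper: combine a format and the sensitivity flag into one byte.
constexpr std::int8_t pack(Formats f, bool ignoreCase) {
    return static_cast<std::int8_t>(static_cast<std::int8_t>(f)
                                    | (ignoreCase ? kIgnoreCaseBit : 0));
}

// Same idea as GetFormat(): clear bit 0 to obtain the pure format.
constexpr Formats unpackFormat(std::int8_t packed) {
    return static_cast<Formats>(packed & ~kIgnoreCaseBit);
}

// Same idea as Sensitivity(): test bit 0.
constexpr bool unpackIgnoreCase(std::int8_t packed) {
    return (packed & kIgnoreCaseBit) == 1;
}
```

Because all valid combinations are non-negative, negative values of the same byte remain free for the debug error codes, which is what the assertions `int8_t(format) >= 0` in the listing rely on.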
322
323 /// Returns the minimum length to be read. In case that this token is not of
324 /// snake_case, kebab-case or CamelCase naming scheme, only \c 0 is allowed for parameter
325 /// \p{idx} and this defines the minimal abbreviation length. If one of the naming schemes
326 /// applies, parameter \p{idx} may be as high as the number of segments found in the
327 /// name (and a maximum of \c 6, as this class supports only up to seven segments).
328 ///
329 /// The first index that exceeds the number of segments, will return \c -1 for the length.
330 /// If even higher index values are requested, then the returned value is undefined.
331 ///
332 /// @param idx The index of the minimum length to receive.
333 ///
334 /// \note Same as methods #GetFormat and #Sensitivity, this method is usually not
335 /// of interest to standard API usage.
336 /// These three informational methods are rather provided to support the unit tests.
337 ///
338 /// @return The minimum length of segment number \p{idx}.
339 int8_t GetMinLength( int idx ) const
340 {
341 ALIB_ASSERT_ERROR( idx >= 0 && idx <= 6 , "STRINGS/TOK", "Index {} out of range.", idx )
342
343 return (idx >= 0 && idx <= 6) ? minLengths[idx] : -1;
344 }
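For tokens without a naming scheme, the single minimum length at index `0` defines how far the name may be abbreviated. A minimal standalone sketch of such a prefix match follows. `matchNormalSketch` is a hypothetical function, it uses `std::string` instead of ALib strings, and it assumes a minimum length of at least `1`.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical prefix matcher for a "normal" token:
// the needle must be a prefix of the name and at least minLength characters long.
bool matchNormalSketch(const std::string& name, int minLength,
                       const std::string& needle, bool ignoreCase) {
    if (needle.empty() || minLength < 1)
        return false;
    if (needle.size() < static_cast<std::size_t>(minLength) || needle.size() > name.size())
        return false;

    for (std::size_t i = 0; i < needle.size(); ++i) {
        unsigned char a = static_cast<unsigned char>(name[i]);
        unsigned char b = static_cast<unsigned char>(needle[i]);
        bool equal = ignoreCase ? std::tolower(a) == std::tolower(b) : a == b;
        if (!equal)
            return false;
    }
    return true;
}
```

With the name `Property` and a minimum length of `4`, this sketch accepts `Prop` and `Proper` but rejects `Pro` (too short) and `Propx` (not a prefix).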
345
346 #if ALIB_ENUMRECORDS
347 /// Defines or redefines this token by parsing the attributes from the given substring.
348 /// This method is usually invoked by code that loads tokens and other data from
349 /// \alib{resources;ResourcePool;resources} of \alib {lang;Camp} objects.
350 ///
351 /// The expected format is defined as a list of the following values, separated by
352 /// the character given with parameter \p{separator}:
353 /// - The #definitionName of the token. Even if the letter case is ignored, this should
354 /// contain the name in "normalized" format, as it may be used with #GetExportName,
355 /// if no specific name to export is given.
356 /// - Letter case sensitivity. This can be "Sensitive" or "Ignore"
357 /// (respectively, what is defined with resourced
358 /// \ref alib_enums_records "ALib Enum Records" of type \alib{lang::Case}),
359 /// can be abbreviated to just one character (i.e., <c>'s'</c> and
360 /// <c>'i'</c>) and itself is not parsed taking the letter-case into account.
361 /// - Optionally, the standard export string, which is used with the method #GetExportName and
362 /// when the token is appended to an \b AString. Export names defined here must not start
363 /// with a digit, because a digit in this position of \p{definition} indicates that
364 /// no export name is given.
365 /// - The list of minimum lengths, one for each segment of the name. The number of values has
366 /// to match the number of segments. A value of \c 0 specifies that no abbreviation
367 /// must be done and therefore is the same as specifying the exact length of the segment.
368 ///
369 /// \note The given \p{definition} string has to survive the use of the token, which
370 /// is naturally true if the string resides in resources.
371 /// (String contents are not copied. Instead, this class later refers to substrings
372 /// of the given \p{definition}.)
373 ///
374 /// \par Availability
375 /// This method is available only if the module \alib_enumrecords is included in
376 /// the \alibbuild.
377 /// @param definition The input string.
378 /// @param separator Separation character used to parse the input.
379 /// Defaults to <c>';'</c>.
380 ALIB_DLL
381 void Define( const String& definition, character separator = ';' );
382 #endif
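The field order of a definition string (name, sensitivity, optional export name, minimum lengths) can be sketched with a small standalone parser. This is not the implementation of `Define`: `ParsedDefinition` and `parseDefinitionSketch` are hypothetical names, whitespace is not trimmed, only the first letter of the sensitivity field is inspected, and the sketch chooses to require between one and seven minimum lengths.

```cpp
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

struct ParsedDefinition {
    std::string      name;
    bool             ignoreCase = false;
    std::string      exportName;      // stays empty if no export name was given
    std::vector<int> minLengths;
    bool             ok = false;      // false on any parse error
};

static bool isSmallNumber(const std::string& s) {
    if (s.empty() || s.size() > 3) return false;
    for (char c : s)
        if (!std::isdigit(static_cast<unsigned char>(c))) return false;
    return true;
}

ParsedDefinition parseDefinitionSketch(const std::string& def, char separator = ';') {
    ParsedDefinition r;
    std::vector<std::string> fields;
    std::istringstream in(def);
    for (std::string f; std::getline(in, f, separator); )
        fields.push_back(f);

    if (fields.size() < 2 || fields[0].empty() || fields[1].empty())
        return r;
    r.name = fields[0];

    // Sensitivity may be abbreviated to one letter and is read case-insensitively.
    char s = static_cast<char>(std::tolower(static_cast<unsigned char>(fields[1][0])));
    if (s != 's' && s != 'i')
        return r;
    r.ignoreCase = (s == 'i');

    // A third field that does not start with a digit is taken as the export name.
    std::size_t i = 2;
    if (i < fields.size() && !fields[i].empty()
        && !std::isdigit(static_cast<unsigned char>(fields[i][0])))
        r.exportName = fields[i++];

    for (; i < fields.size(); ++i) {
        if (!isSmallNumber(fields[i])) return r;
        r.minLengths.push_back(std::stoi(fields[i]));
    }
    if (r.minLengths.empty() || r.minLengths.size() > 7)
        return r;

    r.ok = true;
    return r;
}
```

With the default separator, the string `MilliSecondS;Ignore;1;1;0` yields the name `MilliSecondS`, case-insensitive matching, no export name and the minimum lengths `1, 1, 0`.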
383
384 /// Matches a given string with this token. See this class's description for details.
385 ///
386 /// @param needle The potentially abbreviated input string to match.
387 /// @return \c true if \p{needle} matches this token, \c false otherwise.
388 ALIB_DLL
389 bool Match( const String& needle );
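To make the per-hump abbreviation and the "rollback" idea from the class description concrete, here is a standalone sketch of a CamelCase matcher. It deliberately swaps in plain recursive backtracking instead of the retry loop described above, so it does not share the documented limitation and is not the library's algorithm. All names (`splitHumps`, `matchFrom`, `matchCamelSketch`) are hypothetical, and the special handling of the first character in case-sensitive mode is not modeled.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Split a CamelCase name into humps: a new hump starts at every upper-case letter.
std::vector<std::string> splitHumps(const std::string& name) {
    std::vector<std::string> humps;
    for (char c : name) {
        if (humps.empty() || std::isupper(static_cast<unsigned char>(c)))
            humps.emplace_back();
        humps.back() += c;
    }
    return humps;
}

static bool sameChar(char a, char b, bool ignoreCase) {
    if (!ignoreCase) return a == b;
    return std::tolower(static_cast<unsigned char>(a))
        == std::tolower(static_cast<unsigned char>(b));
}

// Match needle[pos..] against humps[idx..]. For each hump, the longest common prefix
// is tried first; on failure, characters are given back one by one ("rollback")
// down to the hump's minimum length.
static bool matchFrom(const std::vector<std::string>& humps, const std::vector<int>& minLengths,
                      const std::string& needle, bool ignoreCase,
                      std::size_t idx, std::size_t pos) {
    if (idx == humps.size())
        return pos == needle.size();

    const std::string& hump = humps[idx];
    int common = 0;
    while (static_cast<std::size_t>(common) < hump.size()
           && pos + common < needle.size()
           && sameChar(hump[common], needle[pos + common], ignoreCase))
        ++common;

    for (int len = common; len >= minLengths[idx]; --len)
        if (matchFrom(humps, minLengths, needle, ignoreCase, idx + 1, pos + len))
            return true;
    return false;
}

bool matchCamelSketch(const std::string& name, const std::vector<int>& minLengths,
                      const std::string& needle, bool ignoreCase) {
    std::vector<std::string> humps = splitHumps(name);
    if (humps.empty() || humps.size() != minLengths.size())
        return false;
    for (int m : minLengths)
        if (m < 0) return false;          // the sketch accepts only non-negative lengths
    return matchFrom(humps, minLengths, needle, ignoreCase, 0, 0);
}
```

In this sketch, `MilliSecondS` with minimum lengths `1, 1, 0` and case ignored accepts all seven spellings listed in the class description, and `system` matches `SystemTemperature` by giving back `tem` from the first hump so that the second hump can start at `t`.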
390
391 protected:
392 /// Detects snake_case, kebab-case or CamelCase.
393 ALIB_DLL
394 void detectFormat();
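The precedence stated in the class description (snake_case supersedes kebab-case, which supersedes CamelCase) can be sketched as follows. This is a standalone illustration, not the code of `detectFormat`: the names `Format`, `detectFormatSketch` and `countSegments` are hypothetical, and the exact criteria used here (any `'_'`, any `'-'`, any upper-case letter after the first character) are assumptions.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

enum class Format { Normal, SnakeCase, KebabCase, CamelCase };

// Assumed rule: an underscore anywhere selects snake_case, otherwise a hyphen selects
// kebab-case, otherwise an upper-case letter after the first character selects CamelCase.
Format detectFormatSketch(const std::string& name) {
    if (name.find('_') != std::string::npos) return Format::SnakeCase;
    if (name.find('-') != std::string::npos) return Format::KebabCase;
    for (std::size_t i = 1; i < name.size(); ++i)
        if (std::isupper(static_cast<unsigned char>(name[i])))
            return Format::CamelCase;
    return Format::Normal;
}

// Count the segments implied by the detected format. The result is what would have to
// equal the number of minimum length values given with the definition.
int countSegments(const std::string& name, Format f) {
    int segments = 1;
    for (std::size_t i = 1; i < name.size(); ++i) {
        char c = name[i];
        bool boundary =
               (f == Format::SnakeCase && c == '_')
            || (f == Format::KebabCase && c == '-')
            || (f == Format::CamelCase && std::isupper(static_cast<unsigned char>(c)));
        if (boundary) ++segments;
    }
    return segments;
}
```

Under these assumptions, a mixed name such as `System_Property-ValueTable` is treated as snake_case with two segments, while `MilliSecondS` is CamelCase with three humps.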
395
396}; // class Token
397
398} // namespace alib[::strings::util]
399
400/// Type alias in namespace \b alib.
401using Token = strings::util::Token;
402
403} // namespace [alib]
404
405
406namespace alib { namespace strings {
407#if DOXYGEN
408namespace APPENDABLES {
409#endif
410 /// Specialization of functor \alib{strings;AppendableTraits} for type \alib{strings::util;Token}.
411 template<typename TAllocator> struct AppendableTraits<strings::util::Token, alib::character,TAllocator>
412 {
413 /// Appends the result of \alib{strings::util;Token::GetExportName} to the \p{target}.
414 /// @param target The \b AString that method \b Append was invoked on.
415 /// @param src The \b Token to append.
416 void operator()( strings::TAString<character, TAllocator>& target, const strings::util::Token& src )
417 { src.GetExportName(target); }
418 };
419#if DOXYGEN
420} // namespace alib::strings[::APPENDABLES]
421#endif
422}} // namespace [alib::strings]
423
424
426
@ ErrorReadingSensitivity
Sensitivity value not found.
Definition token.inl:151
@ TooManyMinLengthsGiven
A maximum of 7 minimum length values was exceeded.
Definition token.inl:153
@ ErrorReadingMinLengths
Error parsing the list of minimum lengths.
Definition token.inl:152
@ DefinitionStringNotConsumed
The definition string was not completely consumed.
Definition token.inl:160
ALIB_DLL void detectFormat()
Detects snake_case, kebab-case or CamelCase.
Definition token.cpp:183
int8_t GetMinLength(int idx) const
Definition token.inl:339
String definitionName
The tokens' definition string part.
Definition token.inl:168
Token(const String &definitionSrc, character separator=';')
Definition token.inl:239
DbgDefinitionError DbgGetError()
Definition token.inl:259
const String & GetDefinitionName() const
Definition token.inl:274
lang::Case Sensitivity() const
Definition token.inl:320
Formats
Format types detected with detectFormat.
Definition token.inl:135
@ CamelCase
UpperCamelCase or lowerCamelCase.
Definition token.inl:139
@ SnakeCase
snake_case using underscores.
Definition token.inl:137
@ Normal
Normal, optionally abbreviated words.
Definition token.inl:136
@ KebabCase
kebab-case using hyphens.
Definition token.inl:138
Token()
Parameterless constructor. Creates an "undefined" token.
Definition token.inl:190
Formats format
Defines the "case type" as well as the letter case sensitivity of this token.
Definition token.inl:175
String exportName
The tokens' optional explicit export name.
Definition token.inl:171
static constexpr Formats ignoreCase
Letter case sensitivity. This is combined with the format bits.
Definition token.inl:183
Formats GetFormat() const
Definition token.inl:306
ALIB_DLL void Define(const String &definition, character separator=';')
Definition token.cpp:100
ALIB_DLL bool Match(const String &needle)
Definition token.cpp:316
ALIB_DLL void GetExportName(AString &target) const
Definition token.cpp:72
#define ALIB_DLL
Definition alib.inl:496
#define ALIB_ENUMS_MAKE_BITWISE(TEnum)
#define ALIB_EXPORT
Definition alib.inl:488
#define ALIB_ASSERT_ERROR(cond, domain,...)
Definition alib.inl:1049
#define ALIB_REL_DBG(releaseCode,...)
Definition alib.inl:838
Case
Denotes upper and lower case character treatment.
constexpr String NULL_STRING
A nulled string of the default character type.
Definition string.inl:2463
strings::util::Token Token
Type alias in namespace alib.
Definition token.inl:401
strings::TAString< character, lang::HeapAllocator > AString
Type alias in namespace alib.
strings::TString< character > String
Type alias in namespace alib.
Definition string.inl:2381
characters::character character
Type alias in namespace alib.
void operator()(strings::TAString< character, TAllocator > &target, const strings::util::Token &src)
Definition token.inl:416