ALib C++ Library
Library Version: 2511 R0
Documentation generated by doxygen
token.inl
//==================================================================================================
/// \file
/// This header-file is part of module \alib_strings of the \aliblong.
///
/// \emoji :copyright: 2013-2025 A-Worx GmbH, Germany.
/// Published under \ref mainpage_license "Boost Software License".
//==================================================================================================
ALIB_EXPORT namespace alib { namespace strings::util {

//==================================================================================================
/// Tokens in the context of \alib_strings_nl are human-readable "words" or "symbols" that
/// represent a certain value or entity of software. Tokens may be used with configuration files,
/// mathematical or general expressions, programming languages, communication protocols and so forth.
///
/// This class contains attributes to describe a token, a method to parse the attributes from a
/// (resource) string and finally method #Match, which matches a given string against the token
/// definition.
///
/// ## %Token Format: ##
/// With the construction, respectively the \ref Define "definition" of a token, special formats are
/// detected. These formats are:
/// - <em>"snake_case"</em><br>
/// - <em>"kebab-case"</em><br>
/// - <em>"CamelCase"</em><br>
///
/// \note
///   Information about such case formats is given in this
///   \https{Wikipedia article,en.wikipedia.org/wiki/Letter_case#Special_case_styles}.
///
/// \note
///   If the name indicates a mix of \e snake_case, \e kebab-case or \e CamelCase formats
///   (e.g., \e "System_Property-ValueTable"), then snake_case supersedes both others and kebab-case
///   supersedes CamelCase.
///
/// The format detection is only performed when more than one minimum length is given. In this case,
/// the number of "segments" (e.g., "camel humps") has to match the number of length values.
///
///
/// ## Character Case Sensitivity: ##
/// Independent of the token format (normal or snake_case, kebab-case, CamelCase), character case
/// sensitivity can be chosen. With \e CamelCase and case-sensitive parsing, the first character of
/// the first hump may be defined lower or upper case (called "lowerCamelCase" vs. "UpperCamelCase").
///
/// If none of the special formats is detected, the tokens can optionally be abbreviated by just
/// providing a minimum amount of starting characters as specified by the then single entry
/// in #minLengths.
/// Otherwise, each segment of the token (e.g., "camel hump") can (again optionally) be shortened
/// on its own.
/// As an example, if for token <c>"SystemProperty"</c> the minimum lengths given are
/// \c 3 and \c 4, the minimum abbreviation is <c>"SysProp"</c>, while <c>"SystProper"</c> also
/// matches.<br>
///
///
/// ## Limitation To Seven Segments: ##
/// This class supports minimum length definitions for up to \c 7 "camel humps", respectively
/// segments. Should a name contain even more segments, those cannot be abbreviated.
/// Providing more than \c 7 values for minimum segment lengths with the definition string results
/// in a definition error (see below).
///
///
/// ## Special Treatment For CamelCase: ##
/// ### Omittable Last Camel Hump: ###
/// The minimum length values provided must be greater than \c 0, with one exception:
/// With \e CamelCase format and case-insensitive definition, the last "camel hump" may have a
/// minimum length of \c 0 and hence may be omitted when matched.
/// If so, the "normalized" version of the token, which can be received by
/// \alib{strings;AppendableTraits;appending} an instance to an \alib{strings;TAString;AString}, will have
/// the last letter of the defined name converted to lower case.<br>
/// The rationale for this specific approach is to support the English plural case. This is best
/// explained with a sample. If a token was defined using definition string:
///
///     "MilliSecondS Ignore 1 1 0"
///
/// then all of the following words match:
///
///     milliseconds
///     MilliSecs
///     millis
///     MSec
///     MSecs
///     MSs
///     ms
///
/// When the correctly spelled (normalized) token name is to be written, the last character
/// is converted to lower case and the token becomes
///
///     MilliSeconds
///
/// This is performed with method #GetExportName (which is also used by the specialization of
/// functor \alib{strings;AppendableTraits} for this type).
/// Hence, when appending a \b Token to an \b AString, if omittable, the last character
/// of the token name is converted to lower case.
///
/// If the above is not suitable, or for any other reason a different "normalized" name is wanted
/// when writing the token, then method #Define offers a further mechanism to explicitly define
/// any custom string to be written.
///
/// ### Rollback: ###
/// \e CamelCase supports a simple "rollback" mechanism, which is needed, for example, for token
///
///     "SystemTemperature Ignore 1 1 0"
///
/// and given match argument
///
///     system
///
/// All six characters match the first hump, but then there are no characters left to
/// match the start of the second hump \c "Temperature". In this case, a loop of retries is
/// performed by rolling back characters from the back of the hump (\c 'm') and ending with the
/// first optional character of that hump (\c 'y'). The loop is exited as soon as
/// character \c 't' is found.
///
/// However, this is not continued in the case that the term that was rolled back still does not
/// match. This means that certain (very unlikely!) tokens with nested repeating character
/// sequences in camel humps cannot be abbreviated to certain (unlikely to be wanted) lengths.
///
/// ## Handling Definition Errors: ##
///
/// The definition strings passed to method #Define are considered static (resourced) data.
/// In other words, this definition data should be defined at compile-time and should not be
/// customizable by end-users, but only by experts.
/// Therefore, the correctness of the definitions is tested only in debug-compilations of the
/// library.
///
/// The source code of utility namespace function \alib{strings;util::LoadResourcedTokens}
/// demonstrates how error codes defined with enumeration #DbgDefinitionError can be handled in
/// debug-compilations by raising debug-assertions.
//==================================================================================================
class Token
{
  public:
    /// Format types detected with #detectFormat.
    enum class Formats : int8_t
    {
        Normal     = 0,  ///< Normal, optionally abbreviated words.
        SnakeCase  = 2,  ///< snake_case using underscores.
        KebabCase  = 4,  ///< kebab-case using hyphens.
        CamelCase  = 8,  ///< UpperCamelCase or lowerCamelCase.
    };

  #if ALIB_DEBUG
    /// Error codes which are written in field #format in the case that method #Define
    /// encounters a parsing error.<br>
    /// This enum, as well as the error detection, is only available in debug-compilations
    /// of the library.
    enum class DbgDefinitionError : int8_t
    {
        OK                               =  0, ///< All is fine.
        EmptyName                        = -1, ///< No token name found.
        ErrorReadingSensitivity          = -2, ///< Sensitivity value not found.
        ErrorReadingMinLengths           = -3, ///< Error parsing the list of minimum lengths.
        TooManyMinLengthsGiven           = -4, ///< A maximum of \c 7 minimum length values was exceeded.
        InconsistentMinLengths           = -5, ///< The number of given minimum length values is greater than \c 1
                                               ///< but does not match the number of segments in the identifier.
        NoCaseSchemeFound                = -6, ///< More than one minimum length value was given but no
                                               ///< segmentation scheme could be detected.
        MinLenExceedsSegmentLength       = -7, ///< A minimum length exceeds the length of the token
                                               ///< name, respectively of the corresponding segment.
        DefinitionStringNotConsumed      = -8, ///< The definition string was not completely consumed.
        ZeroMinLengthAndNotLastCamelHump = -9, ///< A minimum length of \c 0 was specified for a segment that is not
                                               ///< a last camel case hump.
    };
  #endif
  protected:

    /// The token's definition string part.
    String      definitionName;

    /// The token's optional explicit export name.
    String      exportName;


    /// Defines the "case type" as well as the letter case sensitivity of this token.
    Formats     format;

    /// The minimum abbreviation length per segment. If only one is given (second is \c -1), then
    /// the field #format indicates normal tokens.
    /// Otherwise, the token is either snake_case, kebab-case or CamelCase.
    int8_t      minLengths[7] ={0,0,0,0,0,0,0};

    /// Letter case sensitivity. This is combined with the format bits.
    static constexpr Formats ignoreCase = Formats(1);

  //################################################################################################
  // Constructors
  //################################################################################################
  public:
    /// Parameterless constructor. Creates an "undefined" token.
    Token();

    /// Constructor used with function names that do not follow a snake_case, kebab-case or
    /// CamelCase naming scheme.
    /// \note Of course, the name may follow such a scheme.
    ///       With this constructor, it just will not be detected.
    /// @param name        The function name.
    /// @param sensitivity The letter case sensitivity of reading the function name.
    /// @param minLength   The minimum starting portion of the function name to read.
    /// @param exportName  An optional export name. If \b not given, the \p{name} is
    ///                    used with method #GetExportName.
    Token(const String& name, lang::Case sensitivity, int8_t minLength,
          const String& exportName= NULL_STRING );


    /// Constructor with at least two minimum length values, used to define tokens that follow
    /// snake_case, kebab-case or CamelCase naming schemes.
    ///
    /// @param name        The function name.
    /// @param sensitivity The letter case sensitivity of reading the function name.
    /// @param minLength1  The minimum starting portion of the first segment to read.
    /// @param minLength2  The minimum starting portion of the second segment to read.
    /// @param minLength3  The minimum starting portion of the third segment to read.
    ///                    Defaults to \c -1 (not given).
    /// @param minLength4  The minimum starting portion of the fourth segment to read.
    ///                    Defaults to \c -1 (not given).
    /// @param minLength5  The minimum starting portion of the fifth segment to read.
    ///                    Defaults to \c -1 (not given).
    /// @param minLength6  The minimum starting portion of the sixth segment to read.
    ///                    Defaults to \c -1 (not given).
    /// @param minLength7  The minimum starting portion of the seventh segment to read.
    ///                    Defaults to \c -1 (not given).
    Token( const String& name, lang::Case sensitivity, int8_t minLength1, int8_t minLength2,
           int8_t minLength3= -1, int8_t minLength4= -1, int8_t minLength5= -1,
           int8_t minLength6= -1, int8_t minLength7= -1 );

  #if ALIB_ENUMRECORDS
    /// Constructor using a (usually resourced) string to read the definitions.
    /// Invokes #Define.
    ///
    /// \par Availability
    ///   This method is available only if the module \alib_enumrecords is included in
    ///   the \alibbuild.
    /// @param definitionSrc The input string.
    /// @param separator     Separation character used to parse the input.
    ///                      Defaults to <c>';'</c>.
    Token( const String& definitionSrc, character separator = ';' )
    { Define( definitionSrc, separator ); }
  #endif

  //################################################################################################
  // Interface
  //################################################################################################
  public:
  #if ALIB_DEBUG
    /// Tests if this token was well defined.
    ///
    /// \note
    ///   This method is only available in debug-compilations.
    ///   Definition strings are considered static data (preferably resourced).
    ///   Therefore, in debug-compilations, this method should be invoked to test
    ///   the consistency of the resources. In the case of failure, a debug
    ///   assertion should be raised.
    ///
    /// @return \alib{strings::util::Token;DbgDefinitionError::OK}, if this token is well
    ///         defined, a different error code otherwise.
    DbgDefinitionError DbgGetError();
  #endif

    /// Returns the definition name used for parsing the token.
    ///
    /// \note
    ///   To receive the "normalized" name of this token, method #GetExportName can be used, or
    ///   a token can simply be \ref alib_strings_assembly_ttostring "appended" to an
    ///   instance of type \alib{strings;TAString;AString}.
    ///
    /// @return This token's #definitionName.
    const String& GetDefinitionName() const {
        ALIB_ASSERT_ERROR( int8_t(format) >= 0, "STRINGS/TOK",
          "Error {} in definition of token \"{}\". Use DbgGetError() in debug-compilations!",
          int8_t(format), definitionName)
        return definitionName;
    }

    /// If field #exportName is not \e nulled (hence explicitly given with resourced definition
    /// string or with a constructor), this is appended.
    ///
    /// Otherwise appends the result of \alib{strings::util;Token::GetDefinitionName} to
    /// the \p{target}. If the token is defined \e CamelCase and the minimum length of the last
    /// segment is defined \c 0, then the last character written is converted to lower case.
    ///
    /// As a result, in most cases it is \b not necessary to provide a specific #exportName
    /// with the definition. Instead, this method should provide a reasonable output.
    ///
    /// \see Documentation section <b>Omittable Last Camel Hump</b> of this class's
    ///      \alib{strings::util;Token;documentation}, for more information about why the
    ///      character conversion to lower case might be performed.
    ///
    /// @param target The \b AString that method \b Append was invoked on.
    ALIB_DLL
    void GetExportName(AString& target) const;

    /// Returns the format of this token.
    ///
    /// \note Same as methods #Sensitivity and #GetMinLength, this method is usually not
    ///       of interest to standard API usage.
    ///       These three informational methods are rather provided to support the unit tests.
    /// @return This token's format, used with method #Match.
    Formats GetFormat() const {
        ALIB_ASSERT_ERROR( int8_t(format) >= 0, "STRINGS/TOK",
          "Error {} in definition of token \"{}\". Use DbgGetError() in debug-compilations!",
          int8_t(format), definitionName)
        return Formats( int8_t(format) & ~int8_t(ignoreCase) );
    }

    /// Returns the letter case sensitivity of this token.
    ///
    /// \note Same as methods #GetFormat and #GetMinLength, this method is usually not
    ///       of interest to standard API usage.
    ///       These three informational methods are rather provided to support the unit tests.
    /// @return The letter case sensitivity used with method #Match.
    lang::Case Sensitivity() const
    { return (int(format) & 1 ) == 1 ? lang::Case::Ignore : lang::Case::Sensitive; }

    /// Returns the minimum length to be read. If this token does not follow a
    /// snake_case, kebab-case or CamelCase naming scheme, only \c 0 is allowed for parameter
    /// \p{idx} and this defines the minimal abbreviation length. If one of the naming schemes
    /// applies, parameter \p{idx} may be as high as the number of segments found in the
    /// name (and a maximum of \c 6, as this class supports only up to seven segments).
    ///
    /// The first index that exceeds the number of segments will return \c -1 for the length.
    /// If even higher index values are requested, then the returned value is undefined.
    ///
    /// @param idx The index of the minimum length to receive.
    ///
    /// \note Same as methods #GetFormat and #Sensitivity, this method is usually not
    ///       of interest to standard API usage.
    ///       These three informational methods are rather provided to support the unit tests.
    ///
    /// @return The minimum length of segment number \p{idx}.
    int8_t GetMinLength( int idx ) const {
        ALIB_ASSERT_ERROR( idx >= 0 && idx <= 6 , "STRINGS/TOK", "Index {} out of range.", idx )

        return (idx >= 0 && idx <= 6) ? minLengths[idx] : -1;
    }

  #if ALIB_ENUMRECORDS
    /// Defines or redefines this token by parsing the attributes from the given substring.
    /// This method is usually invoked by code that loads tokens and other data from
    /// \alib{resources;ResourcePool;resources} of \alib{lang;Camp} objects.
    ///
    /// The expected format is defined as a list of the following values, separated by
    /// the character given with parameter \p{separator}:
    /// - The #definitionName of the token. Even if the letter case is ignored, this should
    ///   contain the name in "normalized" format, as it may be used with #GetExportName,
    ///   if no specific name to export is given.
    /// - Letter case sensitivity. This can be "Sensitive" or "Ignore"
    ///   (respectively, what is defined with resourced
    ///   \ref alib_enums_records "ALib Enum Records" of type \alib{lang::Case}).
    ///   The value can be abbreviated to just one character (i.e., <c>'s'</c> and
    ///   <c>'i'</c>) and is itself parsed ignoring the letter case.
    /// - Optionally, the export string to be used with method #GetExportName and
    ///   when appending to an \b AString. Export names defined here must not start
    ///   with a digit, because a digit in this position of \p{definition} indicates that
    ///   no export name is given.
    /// - The list of minimum lengths for each segment of the name. The number of values has
    ///   to match the number of segments. A value of \c 0 specifies that the segment must
    ///   not be abbreviated and therefore is the same as specifying the exact length of the segment.
    ///
    /// \note The given \p{definition} string has to remain valid for as long as the token is
    ///       used, which is naturally true if the string resides in resources.
    ///       (String contents are not copied. Instead, this class later refers to substrings
    ///       of the given \p{definition}.)
    ///
    /// \par Availability
    ///   This method is available only if the module \alib_enumrecords is included in
    ///   the \alibbuild.
    /// @param definition The input string.
    /// @param separator  Separation character used to parse the input.
    ///                   Defaults to <c>';'</c>.
    ALIB_DLL
    void Define( const String& definition, character separator = ';' );
  #endif

    /// Matches a given string with this token. See this class's description for details.
    ///
    /// @param needle The potentially abbreviated input string to match.
    /// @return \c true if \p{needle} matches this token, \c false otherwise.
    ALIB_DLL
    bool Match( const String& needle );

  protected:
    /// Detects snake_case, kebab-case or CamelCase.
    ALIB_DLL
    void detectFormat();

}; // class Token

} // namespace alib[::strings::util]

/// Type alias in namespace \b alib.
using Token = strings::util::Token;

} // namespace [alib]


namespace alib { namespace strings {
#if DOXYGEN
namespace APPENDABLES {
#endif
/// Specialization of functor \alib{strings;AppendableTraits} for type \alib{strings::util;Token}.
template<typename TAllocator> struct AppendableTraits<strings::util::Token, alib::character,TAllocator>
{
    /// Appends the result of \alib{strings::util;Token::GetExportName} to the \p{target}.
    /// @param target The \b AString that method \b Append was invoked on.
    /// @param src    The \b Token to append.
    void operator()( strings::TAString<character, TAllocator>& target, const strings::util::Token& src )
    { src.GetExportName(target); }
};
#if DOXYGEN
} // namespace alib::strings[::APPENDABLES]
#endif
}} // namespace [alib::strings]
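
The segment-wise abbreviation rule and the rollback described in the class documentation can be illustrated with a small standalone sketch. It does not use ALib and is not the library's implementation: `splitHumps`, `equalsIgnoreCase`, `matchFrom` and `matchesCamelCase` are hypothetical names, matching is always case-insensitive, a hump is assumed to start at every upper-case letter, and the sketch backtracks exhaustively, whereas the documentation describes only a simple rollback.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits a CamelCase name into humps. A new hump is assumed to
// start at the first character and at every upper-case letter.
std::vector<std::string> splitHumps(const std::string& name) {
    std::vector<std::string> humps;
    for (char c : name) {
        if (humps.empty() || std::isupper(static_cast<unsigned char>(c)))
            humps.emplace_back();
        humps.back() += c;
    }
    return humps;
}

// Case-insensitive comparison of two characters.
bool equalsIgnoreCase(char a, char b) {
    return std::tolower(static_cast<unsigned char>(a))
        == std::tolower(static_cast<unsigned char>(b));
}

// Tries to match needle[pos..] against humps[hump..]. Each hump consumes a prefix of at
// least minLengths[hump] characters; a minimum of 0 allows the hump to be omitted.
bool matchFrom(const std::vector<std::string>& humps, const std::vector<int>& minLengths,
               std::size_t hump, const std::string& needle, std::size_t pos) {
    if (hump == humps.size())
        return pos == needle.size();      // all humps handled: needle must be consumed

    // Longest prefix of the current hump found at the current needle position.
    const std::string& h = humps[hump];
    std::size_t maxLen = 0;
    while (   maxLen < h.size()
           && pos + maxLen < needle.size()
           && equalsIgnoreCase(h[maxLen], needle[pos + maxLen]))
        ++maxLen;

    // Try the longest prefix first, then roll back one character at a time.
    const std::size_t minLen = static_cast<std::size_t>(minLengths[hump]);
    std::size_t len = maxLen;
    while (len >= minLen) {
        if (matchFrom(humps, minLengths, hump + 1, needle, pos + len))
            return true;
        if (len == 0)
            break;
        --len;
    }
    return false;
}

// Entry point of the sketch: one minimum length is expected per hump.
bool matchesCamelCase(const std::string& name, const std::vector<int>& minLengths,
                      const std::string& needle) {
    const std::vector<std::string> humps = splitHumps(name);
    assert(humps.size() == minLengths.size());
    return matchFrom(humps, minLengths, 0, needle, 0);
}
```

Under these assumptions, `matchesCamelCase("MilliSecondS", {1, 1, 0}, "millis")` returns true, and so does `matchesCamelCase("SystemTemperature", {1, 1}, "system")`, the latter only after the first hump has been rolled back to "sys". With minimum lengths `{3, 4}`, "SysProp" matches "SystemProperty" while "SyProp" does not.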
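
The precedence rule stated in the class documentation (snake_case supersedes kebab-case and CamelCase, kebab-case supersedes CamelCase, and detection only runs when more than one minimum length is given) can be sketched as follows. `detectFormatSketch` is a hypothetical stand-in, not the body of `Token::detectFormat`; in particular, treating any upper-case letter after the first character as the start of a camel hump is an assumption.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Mirrors the values of Token::Formats shown in the listing.
enum class Formats : signed char { Normal = 0, SnakeCase = 2, KebabCase = 4, CamelCase = 8 };

// Hypothetical stand-in for the format detection. numMinLengths is the number of
// minimum length values given with the definition.
Formats detectFormatSketch(const std::string& name, std::size_t numMinLengths) {
    assert(!name.empty());                    // an empty name is a definition error
    if (numMinLengths <= 1)
        return Formats::Normal;               // detection is skipped for a single length
    if (name.find('_') != std::string::npos)
        return Formats::SnakeCase;            // supersedes kebab-case and CamelCase
    if (name.find('-') != std::string::npos)
        return Formats::KebabCase;            // supersedes CamelCase
    for (std::size_t i = 1; i < name.size(); ++i)
        if (std::isupper(static_cast<unsigned char>(name[i])))
            return Formats::CamelCase;        // assumed: a hump starts at an upper-case letter
    return Formats::Normal;                   // no scheme found; the listing names the debug
                                              // error NoCaseSchemeFound for this situation
}
```

With this sketch, the mixed name "System_Property-ValueTable" is classified as snake_case, "Property-ValueTable" as kebab-case and "ValueTable" as CamelCase, provided that at least two minimum lengths are given.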
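
Field `format` stores two pieces of information: the format bits (0, 2, 4 or 8) and the `ignoreCase` bit (1). `GetFormat` masks the bit out and `Sensitivity` tests it, while negative values are reserved for the debug error codes. The sketch below reproduces that packing in isolation. `packFormat`, `formatOf`, `sensitivityOf` and the local `Case` enum are hypothetical names introduced for illustration only.

```cpp
#include <cstdint>

// Mirrors Token::Formats from the listing.
enum class Formats : std::int8_t { Normal = 0, SnakeCase = 2, KebabCase = 4, CamelCase = 8 };

// Stand-in for lang::Case (assumption: only these two values matter here).
enum class Case { Sensitive, Ignore };

// Corresponds to Token::ignoreCase, which is defined as Formats(1).
constexpr std::int8_t ignoreCaseBit = 1;

// Hypothetical helper: combines format and sensitivity into one byte.
constexpr std::int8_t packFormat(Formats format, Case sensitivity) {
    return static_cast<std::int8_t>( static_cast<std::int8_t>(format)
                                   | (sensitivity == Case::Ignore ? ignoreCaseBit : 0) );
}

// Same masking as in Token::GetFormat: clears the sensitivity bit.
constexpr Formats formatOf(std::int8_t packed) {
    return static_cast<Formats>( packed & ~ignoreCaseBit );
}

// Same test as in Token::Sensitivity: checks the lowest bit.
constexpr Case sensitivityOf(std::int8_t packed) {
    return (packed & 1) == 1 ? Case::Ignore : Case::Sensitive;
}

// CamelCase (8) combined with the ignore-case bit (1) yields 9.
static_assert(packFormat(Formats::CamelCase, Case::Ignore) == 9, "unexpected packing");
```

Because the format values are even, the lowest bit is free for the sensitivity flag, and a single signed byte can carry the format, the sensitivity, and (as negative values) a definition error.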