ALib C++ Library
Library Version: 2510 R0
Documentation generated by doxygen
alib::strings::util::Token Class Reference

Description:

In the context of ALib Strings, tokens are human-readable "words" or "symbols" that represent a certain value or entity of a software. Tokens may be used with configuration files, mathematical or general expressions, programming languages, communication protocols and so forth.

This class contains attributes that describe a token, a method that parses these attributes from a (resource) string, and method Match, which matches a given string against the token definition.

Token Format:

With the construction, respectively the definition of a token, special formats are detected. These formats are:

  • "snake_case"
  • "kebab-case"
  • "CamelCase"
Note
Information about such case formats is given in this Wikipedia article.
If the name indicates a mix of snake_case, kebab-case or CamelCase formats (e.g., "System_Property-ValueTable"), then snake_case supersedes both others and kebab-case supersedes CamelCase.

The format detection is only performed when more than one minimum length is given. In this case, the number of "segments" (e.g., "camel humps") has to match the number of length values.
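The precedence rule above can be illustrated with a small standalone sketch. It does not use ALib and is not the library's detectFormat implementation: the names Format, isUpper, detectFormatSketch and countSegments are hypothetical, and the rule that an upper-case letter after the first character starts a new camel hump is an assumption. The condition that detection only runs when more than one minimum length is given is left out for brevity.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical stand-in for Token::Formats (enumerator values copied from this page).
enum class Format { Normal = 0, SnakeCase = 2, KebabCase = 4, CamelCase = 8 };

inline bool isUpper(char c) { return std::isupper(static_cast<unsigned char>(c)) != 0; }

// snake_case supersedes kebab-case, which in turn supersedes CamelCase.
inline Format detectFormatSketch(const std::string& name) {
    assert(!name.empty());  // an empty name is a definition error
    if (name.find('_') != std::string::npos) return Format::SnakeCase;
    if (name.find('-') != std::string::npos) return Format::KebabCase;
    // Assumption: an upper-case letter after the first character starts a new hump.
    for (std::size_t i = 1; i < name.size(); ++i)
        if (isUpper(name[i])) return Format::CamelCase;
    return Format::Normal;
}

// Counts the segments implied by the detected format. This is the number
// that has to equal the number of minimum length values given.
inline int countSegments(const std::string& name) {
    const Format format = detectFormatSketch(name);
    int segments = 1;
    for (std::size_t i = 1; i < name.size(); ++i) {
        const char c = name[i];
        if ((format == Format::SnakeCase && c == '_') ||
            (format == Format::KebabCase && c == '-') ||
            (format == Format::CamelCase && isUpper(c)))
            ++segments;
    }
    return segments;
}
```

With this sketch, "System_Property-ValueTable" is treated as snake_case with two segments, because only the underscore acts as a separator once snake_case has been detected.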

Character Case Sensitivity:

Independent of the token format (normal or snake_case, kebab-case, CamelCase), character case sensitivity can be chosen. With CamelCase and case-sensitive parsing, the first character of the first hump may be defined lower or upper case (called "lowerCamelCase" vs. "UpperCamelCase").

If none of the special formats is detected, the tokens can optionally be abbreviated by just providing a minimum amount of starting characters as specified by the then single entry in minLengths. Otherwise, each segment of the token (e.g., "camel hump") can (again optionally) be shortened on its own. As an example, if for token "SystemProperty" the minimum lengths given are 3 and 4, the minimum abbreviation is "SysProp", while "SystProper" also matches.
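For the simplest case, a token without segments and a single minimum length, abbreviation matching amounts to a prefix test. The following standalone sketch shows the idea. It is not ALib code; the names sameChar and matchNormalSketch are hypothetical, and a minimum length of at least 1 is assumed.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Compares two characters, optionally ignoring letter case.
inline bool sameChar(char a, char b, bool ignoreCase) {
    if (!ignoreCase) return a == b;
    return std::tolower(static_cast<unsigned char>(a)) ==
           std::tolower(static_cast<unsigned char>(b));
}

// The needle matches if it is a prefix of the name and has at least
// minLength characters.
inline bool matchNormalSketch(const std::string& name, const std::string& needle,
                              std::size_t minLength, bool ignoreCase) {
    assert(minLength >= 1 && minLength <= name.size());  // assumption of this sketch
    if (needle.size() < minLength || needle.size() > name.size()) return false;
    for (std::size_t i = 0; i < needle.size(); ++i)
        if (!sameChar(name[i], needle[i], ignoreCase)) return false;
    return true;
}
```

For example, with name "Temperature", a minimum length of 4 and case-insensitive comparison, "temp" and "tempera" match, while "tem" is too short and "temperatures" is too long.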

Limitation To Seven Segments:

This class supports minimum length definitions for up to 7 "camel humps", respectively segments. Should a name contain even more segments, those cannot be abbreviated. Providing more than 7 values for minimum segment lengths with the definition string results in a definition error (see below).

Special Treatment For CamelCase:

Omitable Last Camel Hump:

The minimum length values provided must be greater than 0, with one exception: With CamelCase format and case-insensitive definition, the last "camel hump" may have a minimum length of 0 and hence may be omitted when matched. If so, the "normalized" version of the token, which is obtained by appending an instance to an AString, will have the last letter of the defined name converted to lower case.
The rationale for this specific approach is to support the English plural case. This is best explained with an example. If a token is defined using the definition string:

 "MilliSecondS Ignore 1 1 0"

then all of the following words match:

 milliseconds
 MilliSecs
 millis
 MSec
 MSecs
 MSs
 ms

If the correctly spelled (normalized) token name is to be written, the last character is converted to lower case and the token becomes

 MilliSeconds

This is performed by method GetExportName, which is also used by the specialization of functor AppendableTraits for this type. Hence, when a Token is appended to an AString and its last hump is omitable, the last character of the token name is converted to lower case.

If the above is not suitable, or if a different "normalized" name is wanted for any other reason when writing the token, method Define offers a further mechanism to explicitly define a custom string to be written.
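The export rule described in this section can be sketched as a plain function. This is an illustration only and does not call ALib: exportNameSketch and its parameters are hypothetical names, and an empty std::string stands in for a nulled export name.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Returns the name to write for a token:
//  - the explicit export name, if one was given (empty means "not given" here),
//  - otherwise the definition name, with the last character lower-cased if the
//    token is CamelCase and its last hump has a minimum length of 0.
inline std::string exportNameSketch(const std::string& definitionName,
                                    const std::string& explicitExportName,
                                    bool isCamelCase, int lastMinLength) {
    assert(!definitionName.empty());
    if (!explicitExportName.empty()) return explicitExportName;
    std::string result = definitionName;
    if (isCamelCase && lastMinLength == 0)
        result.back() = static_cast<char>(
            std::tolower(static_cast<unsigned char>(result.back())));
    return result;
}
```

Applied to the sample above, the definition name "MilliSecondS" with a last minimum length of 0 is written as "MilliSeconds".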

Rollback:

CamelCase supports a simple "rollback" mechanism, which is needed for example for token

 "SystemTemperature Ignore 1 1"

and given match argument

 system

All six characters match the first hump, but then no characters are left to match the start of the second hump "Temperature". In this case, a loop of retries is performed by rolling back characters from the back of the hump ('m') and ending with the first optional character of that hump ('y'). The loop is broken when character 't' is found.

However, this is not continued if the term that was rolled back still does not match. This means that certain (very unlikely!) tokens with nested repeating character sequences in camel humps cannot be abbreviated to certain (unlikely wanted) lengths.
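The rollback idea can be illustrated with a standalone, case-insensitive matcher. Note that this sketch uses full recursive backtracking over all humps, which is more permissive than the limited rollback described above, so it is not the algorithm of method Match. All names (splitHumps, equalsIgnoreCase, matchFrom, matchCamelSketch) are hypothetical, and humps are assumed to start at every upper-case letter.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Splits "MilliSecondS" into {"Milli", "Second", "S"}.
inline std::vector<std::string> splitHumps(const std::string& name) {
    std::vector<std::string> humps;
    for (std::size_t i = 0; i < name.size(); ++i) {
        if (i == 0 || std::isupper(static_cast<unsigned char>(name[i])))
            humps.emplace_back();
        humps.back().push_back(name[i]);
    }
    return humps;
}

inline bool equalsIgnoreCase(char a, char b) {
    return std::tolower(static_cast<unsigned char>(a)) ==
           std::tolower(static_cast<unsigned char>(b));
}

// Tries to match needle[pos..] against humps[idx..].
inline bool matchFrom(const std::vector<std::string>& humps,
                      const std::vector<int>& minLengths,
                      const std::string& needle, std::size_t idx, std::size_t pos) {
    if (idx == humps.size()) return pos == needle.size();
    const std::string& hump = humps[idx];
    // Longest prefix of this hump found at the current needle position.
    int longest = 0;
    while (static_cast<std::size_t>(longest) < hump.size()
           && pos + longest < needle.size()
           && equalsIgnoreCase(hump[longest], needle[pos + longest]))
        ++longest;
    // Roll back character by character, down to the minimum length of this hump.
    for (int take = longest; take >= minLengths[idx]; --take)
        if (matchFrom(humps, minLengths, needle, idx + 1, pos + take)) return true;
    return false;
}

inline bool matchCamelSketch(const std::string& name, const std::vector<int>& minLengths,
                             const std::string& needle) {
    const std::vector<std::string> humps = splitHumps(name);
    assert(humps.size() == minLengths.size());
    for (int len : minLengths) assert(len >= 0);
    return matchFrom(humps, minLengths, needle, 0, 0);
}
```

For "SystemTemperature" with minimum lengths 1 and 1, the needle "system" first consumes all six characters for the first hump, fails on the second hump, and then rolls back until "sys" + "tem" is found. The same function accepts all abbreviations listed for "MilliSecondS" with minimum lengths 1, 1 and 0.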

Handling Definition Errors:

The definition strings passed to method Define are considered static (resourced) data. In other words, this definition data should be compile-time defined and not be customizable by end-users, but only by experts. Therefore, thorough checks of the correctness of the definitions are available only in debug-compilations of the library.

The source code of utility namespace function util::LoadResourcedTokens demonstrates how error codes defined with enumeration DbgDefinitionError can be handled in debug-compilations by raising debug-assertions.

Definition at line 130 of file token.inl.


Public Type Index:

enum class  DbgDefinitionError : int8_t {
  OK = 0, EmptyName = -1, ErrorReadingSensitivity = -2, ErrorReadingMinLengths = -3,
  TooManyMinLengthsGiven = -4, InconsistentMinLengths = -5, NoCaseSchemeFound = -6, MinLenExceedsSegmentLength = -7,
  DefinitionStringNotConsumed = -8, ZeroMinLengthAndNotLastCamelHump = -9
}
 
enum class  Formats : int8_t { Normal = 0, SnakeCase = 2, KebabCase = 4, CamelCase = 8 }
 Format types detected with detectFormat.
 

Public Method Index:

 Token ()
 Parameterless constructor. Creates an "undefined" token.
 
 Token (const String &definitionSrc, character separator=';')
 
ALIB_DLL Token (const String &name, lang::Case sensitivity, int8_t minLength, const String &exportName=NULL_STRING)
 
ALIB_DLL Token (const String &name, lang::Case sensitivity, int8_t minLength1, int8_t minLength2, int8_t minLength3=-1, int8_t minLength4=-1, int8_t minLength5=-1, int8_t minLength6=-1, int8_t minLength7=-1)
 
DbgDefinitionError DbgGetError ()
 
ALIB_DLL void Define (const String &definition, character separator=';')
 
const String & GetDefinitionName () const
 
ALIB_DLL void GetExportName (AString &target) const
 
Formats GetFormat () const
 
int8_t GetMinLength (int idx) const
 
ALIB_DLL bool Match (const String &needle)
 
lang::Case Sensitivity () const
 

Protected Static Field Index:

static constexpr Formats ignoreCase = Formats(1)
 Letter case sensitivity. This is combined with the format bits.
 

Protected Field Index:

String definitionName
 The token's definition string part.
 
String exportName = NULL_STRING
 The token's optional explicit export name.
 
Formats format
 Defines the "case type" as well as the letter case sensitivity of this token.
 
int8_t minLengths [7] = {0,0,0,0,0,0,0}
 

Protected Method Index:

ALIB_DLL void detectFormat ()
 Detects snake_case, kebab-case or CamelCase.
 

Enumeration Details:

◆ DbgDefinitionError

Error codes which are written in field format in the case that method Define suffers a parsing error.
This enum, as well as the error detection, is only available in debug-compilations of the library.

Enumerator
OK 

All is fine.

EmptyName 

No token name found.

ErrorReadingSensitivity 

Sensitivity value not found.

ErrorReadingMinLengths 

Error parsing the list of minimum lengths.

TooManyMinLengthsGiven 

A maximum of 7 minimum length values was exceeded.

InconsistentMinLengths 

The number of given minimum length values is greater than 1 but does not match the number of segments in the identifier.

NoCaseSchemeFound 

More than one minimum length value was given but no segmentation scheme could be detected.

MinLenExceedsSegmentLength 

A minimum length was specified that is greater than the length of the token name or of the corresponding segment.

DefinitionStringNotConsumed 

The definition string was not completely consumed.

ZeroMinLengthAndNotLastCamelHump 

A minimum length of 0 was specified for a segment that is not a last camel case hump.

Definition at line 147 of file token.inl.

◆ Formats

enum class alib::strings::util::Token::Formats : int8_t
strong

Format types detected with detectFormat.

Enumerator
Normal 

Normal, optionally abbreviated words.

SnakeCase 

snake_case using underscores.

KebabCase 

kebab-case using hyphens.

CamelCase 

UpperCamelCase or lowerCamelCase.

Definition at line 134 of file token.inl.

Field Details:

◆ definitionName

String alib::strings::util::Token::definitionName
protected

The token's definition string part.

Definition at line 168 of file token.inl.

◆ exportName

String alib::strings::util::Token::exportName = NULL_STRING
protected

The token's optional explicit export name.

Definition at line 171 of file token.inl.

◆ format

Formats alib::strings::util::Token::format
protected

Defines the "case type" as well as the letter case sensitivity of this token.

Definition at line 175 of file token.inl.

◆ ignoreCase

Formats alib::strings::util::Token::ignoreCase = Formats(1)
static constexpr protected

Letter case sensitivity. This is combined with the format bits.

Definition at line 183 of file token.inl.
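Since the enumerators of Formats use the values 0, 2, 4 and 8, the lowest bit remains free for the case-sensitivity flag. The following standalone sketch shows how such a combination could be built and queried. The enumeration mirrors the values listed on this page, while the helper names (kIgnoreCaseBit, withIgnoreCase, ignoresCase, caseScheme) are hypothetical and the exact way the class combines the bits is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of Token::Formats (enumerator values copied from this page).
enum class Formats : std::int8_t { Normal = 0, SnakeCase = 2, KebabCase = 4, CamelCase = 8 };

// Bit 0 carries the letter case sensitivity, like the constant ignoreCase = Formats(1).
constexpr std::int8_t kIgnoreCaseBit = 1;

// Adds the "ignore case" flag to a format value.
constexpr Formats withIgnoreCase(Formats f) {
    return static_cast<Formats>(static_cast<std::int8_t>(f) | kIgnoreCaseBit);
}

// Tests whether the "ignore case" flag is set.
constexpr bool ignoresCase(Formats f) {
    return (static_cast<std::int8_t>(f) & kIgnoreCaseBit) != 0;
}

// Strips the flag and returns the pure case scheme.
constexpr Formats caseScheme(Formats f) {
    return static_cast<Formats>(static_cast<std::int8_t>(f) & ~kIgnoreCaseBit);
}
```

In this sketch, a case-insensitive CamelCase token is stored as the value 9, from which both the scheme (8) and the flag (1) can be recovered.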

◆ minLengths

int8_t alib::strings::util::Token::minLengths[7] = {0,0,0,0,0,0,0}
protected

The minimum abbreviation length per segment. If only one is given (second is -1), then the field format indicates normal tokens. Otherwise, the token is either snake_case, kebab-case or CamelCase.

Definition at line 180 of file token.inl.

Constructor(s) / Destructor Details:

◆ Token() [1/4]

alib::strings::util::Token::Token ( )
inline

Parameterless constructor. Creates an "undefined" token.

Definition at line 190 of file token.inl.

◆ Token() [2/4]

alib::strings::util::Token::Token ( const String & name,
lang::Case sensitivity,
int8_t minLength,
const String & exportName = NULL_STRING )

Constructor used with function names that do not follow a snake_case, kebab-case or CamelCase naming scheme.

Note
Of course, the name may follow such a scheme. With this constructor, it just will not be detected.
Parameters
name: The function name.
sensitivity: The letter case sensitivity of reading the function name.
minLength: The minimum starting portion of the function name to read.
exportName: An optional export name. If not given, the name is used with method GetExportName.

Definition at line 43 of file token.cpp.

◆ Token() [3/4]

alib::strings::util::Token::Token ( const String & name,
lang::Case sensitivity,
int8_t minLength1,
int8_t minLength2,
int8_t minLength3 = -1,
int8_t minLength4 = -1,
int8_t minLength5 = -1,
int8_t minLength6 = -1,
int8_t minLength7 = -1 )

Constructor with at least two minimum length values, used to define tokens that follow snake_case, kebab-case or CamelCase naming schemes.

Parameters
name: The function name.
sensitivity: The letter case sensitivity of reading the function name.
minLength1: The minimum starting portion of the first segment to read.
minLength2: The minimum starting portion of the second segment to read.
minLength3: The minimum starting portion of the third segment to read. Defaults to -1.
minLength4: The minimum starting portion of the fourth segment to read. Defaults to -1.
minLength5: The minimum starting portion of the fifth segment to read. Defaults to -1.
minLength6: The minimum starting portion of the sixth segment to read. Defaults to -1.
minLength7: The minimum starting portion of the seventh segment to read. Defaults to -1.

Definition at line 60 of file token.cpp.

◆ Token() [4/4]

alib::strings::util::Token::Token ( const String & definitionSrc,
character separator = ';' )
inline

Constructor using a (usually resourced) string to read the definitions. Invokes Define.

Availability
This method is available only if the module ALib EnumRecords is included in the ALib Build.
Parameters
definitionSrc: The input string.
separator: Separation character used to parse the input. Defaults to ';'.

Definition at line 239 of file token.inl.

Method Details:

◆ DbgGetError()

DbgDefinitionError alib::strings::util::Token::DbgGetError ( )
inline

Tests if this token was well defined.

Note
This method is only available in debug-compilations. Definition strings are considered static data (preferably resourced). Therefore, in debug-compilations, this method should be invoked and with that, the consistency of the resources be tested. In the case of failure, a debug assertion should be raised.
Returns
DbgDefinitionError::OK, if this token is well defined, a different error code otherwise.

Definition at line 259 of file token.inl.

◆ Define()

void alib::strings::util::Token::Define ( const String & definition,
character separator = ';' )

Defines or redefines this token by parsing the attributes from the given substring. This method is usually invoked by code that loads tokens and other data from resources of ALib lang::Camp objects.

The expected format is defined as a list of the following values, separated by the character given with parameter separator:

  • The definitionName of the token. Even if the letter case is ignored, this should contain the name in "normalized" format, as it may be used with GetExportName, if no specific name to export is given.
  • Letter case sensitivity. This can be "Sensitive" or "Ignore" (or whatever is defined with the resourced ALib Enum Records of type lang::Case). The value can be abbreviated to a single character (i.e., 's' or 'i') and is itself parsed ignoring letter case.
  • Optionally, the standard export string, which is used with method GetExportName and when the token is appended to an AString. Export names defined here must not start with a digit, because a digit at this position of the definition indicates that no export name is given.
  • The list of minimum lengths, one for each segment of the name. The number of values has to match the number of segments. A value of 0 specifies that no abbreviation is allowed and therefore is the same as specifying the exact length of the segment.
Note
The given definition string has to survive the use of the token, which is naturally true if the string resides in resources. (String contents are not copied. Instead, this class later refers to substrings of the given definition.)
Availability
This method is available only if the module ALib EnumRecords is included in the ALib Build.
Parameters
definition: The input string.
separator: Separation character used to parse the input. Defaults to ';'.

Definition at line 100 of file token.cpp.
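The field order of a definition string can be illustrated with a small standalone parser. It is a sketch only, not the implementation of Define: ParsedDefinition and parseDefinitionSketch are hypothetical names, only the first character of the sensitivity field is inspected, and requiring at least one minimum length is an assumption. Unlike the class documented here, the sketch copies the strings instead of referring to substrings of the definition.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical result type of this sketch.
struct ParsedDefinition {
    std::string      name;
    bool             ignoreCase = false;
    std::string      exportName;   // empty if not given
    std::vector<int> minLengths;
    bool             ok = false;   // true only if parsing succeeded
};

inline ParsedDefinition parseDefinitionSketch(const std::string& definition,
                                              char separator = ';') {
    ParsedDefinition result;
    std::vector<std::string> fields;
    std::istringstream in(definition);
    for (std::string field; std::getline(in, field, separator); )
        fields.push_back(field);

    // Field 1: the name. Field 2: sensitivity, 's...' or 'i...', letter case ignored.
    if (fields.size() < 2 || fields[0].empty() || fields[1].empty()) return result;
    result.name = fields[0];
    const int s = std::tolower(static_cast<unsigned char>(fields[1][0]));
    if (s != 's' && s != 'i') return result;
    result.ignoreCase = (s == 'i');

    // Optional field: export name. A leading digit means "no export name given".
    std::size_t next = 2;
    if (next < fields.size() && !fields[next].empty()
        && !std::isdigit(static_cast<unsigned char>(fields[next][0])))
        result.exportName = fields[next++];

    // Remaining fields: one to seven minimum lengths.
    for (; next < fields.size(); ++next) {
        const std::string& f = fields[next];
        if (f.empty() || f.size() > 3) return result;
        for (char c : f)
            if (!std::isdigit(static_cast<unsigned char>(c))) return result;
        result.minLengths.push_back(std::stoi(f));
    }
    if (result.minLengths.empty() || result.minLengths.size() > 7) return result;
    result.ok = true;
    return result;
}
```

With a space as separator, the string "MilliSecondS Ignore 1 1 0" yields the name "MilliSecondS", case-insensitive matching, no export name and the minimum lengths 1, 1 and 0. A definition such as "Temperature;s;Temp;4" carries the explicit export name "Temp".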

◆ detectFormat()

void alib::strings::util::Token::detectFormat ( )
protected

Detects snake_case, kebab-case or CamelCase.

Definition at line 183 of file token.cpp.

◆ GetDefinitionName()

const String & alib::strings::util::Token::GetDefinitionName ( ) const
inline

Returns the definition name used for parsing the token.

Note
To receive the "normalized" name of this token, method GetExportName can be used, or a token can simply be appended to an instance of type AString.
Returns
This token's definitionName.

Definition at line 274 of file token.inl.

◆ GetExportName()

void alib::strings::util::Token::GetExportName ( AString & target) const

If field exportName is not nulled (hence explicitly given with resourced definition string or with a constructor), this is appended.

Otherwise appends the result of Token::GetDefinitionName to the target. If the token is defined CamelCase and the minimum length of the last segment is defined 0, then the last character written is converted to lower case.

As a result, in most cases it is not necessary to provide a specific exportName with the definition. Instead, this method should provide a reasonable output.

See also
Documentation section Omitable Last Camel Hump of this class's documentation, for more information about why the character conversion to lower case might be performed.
Parameters
target: The AString that method Append was invoked on.

Definition at line 72 of file token.cpp.

◆ GetFormat()

Formats alib::strings::util::Token::GetFormat ( ) const
inline

Returns the format of this token.

Note
Same as methods Sensitivity and GetMinLength, this method is usually not of interest to standard API usage. These three informational methods are rather provided to support the unit tests.
Returns
This token's format, used with method Match.

Definition at line 306 of file token.inl.

◆ GetMinLength()

int8_t alib::strings::util::Token::GetMinLength ( int idx) const
inline

Returns the minimum length to be read. In case that this token is not of snake_case, kebab-case or CamelCase naming scheme, only 0 is allowed for parameter idx and this defines the minimal abbreviation length. If one of the naming schemes applies, parameter idx may be as high as the number of segments found in the name (and a maximum of 6, as this class supports only up to seven segments).

The first index that exceeds the number of segments, will return -1 for the length. If even higher index values are requested, then the returned value is undefined.

Parameters
idx: The index of the minimum length to receive.
Note
Same as methods GetFormat and Sensitivity, this method is usually not of interest to standard API usage. These three informational methods are rather provided to support the unit tests.
Returns
The minimum length of segment number idx.

Definition at line 339 of file token.inl.

◆ Match()

bool alib::strings::util::Token::Match ( const String & needle)

Matches a given string with this token. See this class's description for details.

Parameters
needle: The potentially abbreviated input string to match.
Returns
true if needle matches this token, false otherwise.

Definition at line 316 of file token.cpp.

◆ Sensitivity()

lang::Case alib::strings::util::Token::Sensitivity ( ) const
inline

Returns the letter case sensitivity of this token.

Note
Same as methods GetFormat and GetMinLength, this method is usually not of interest to standard API usage. These three informational methods are rather provided to support the unit tests.
Returns
The letter case sensitivity used with method Match.

Definition at line 320 of file token.inl.


The documentation for this class was generated from the following files: