ALib C++ Library
Library Version: 2412 R0
Documentation generated by doxygen
Token Class Reference

Description:

Tokens, in the context of ALib Strings, are human-readable "words" or "symbols" that represent a certain value or entity of a software system. Tokens may be used with configuration files, mathematical or general expressions, programming languages, communication protocols, and so forth.

This struct contains attributes to describe a token, a method to parse the attributes from a (resource) string, and finally method Match, which matches a given string against the token definition.

Token Format:

When a token is constructed or defined, special formats are detected. These formats are:

  • "snake_case"
  • "kebab-case"
  • "CamelCase"
Note
Information about such case formats is given in this Wikipedia article.
If the name indicates a mix of snake_case, kebab-case or CamelCase formats (e.g., "System_Property-ValueTable"), then snake_case supersedes both others and kebab-case supersedes CamelCase.

The format detection is only performed when more than one minimum length is given. In this case, the number of "segments" (e.g., "camel humps") has to match the number of length values.
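
The precedence rule can be illustrated with a small stand-alone sketch. It does not use ALib: the enum `SketchFormat` and function `detectFormatSketch` are hypothetical names, and the CamelCase criterion (an upper-case letter after the first character plus at least one lower-case letter) is an assumption of this sketch, not necessarily what detectFormat does.

```cpp
#include <cctype>
#include <cstddef>
#include <string_view>

// Hypothetical mirror of the documented format kinds (not ALib's own enum).
enum class SketchFormat { Normal, SnakeCase, KebabCase, CamelCase };

// Applies the documented precedence: snake_case > kebab-case > CamelCase.
// The CamelCase test used here is an assumption made for illustration.
SketchFormat detectFormatSketch(std::string_view name) {
    if (name.find('_') != std::string_view::npos) return SketchFormat::SnakeCase;
    if (name.find('-') != std::string_view::npos) return SketchFormat::KebabCase;

    bool hasLower      = false;
    bool hasInnerUpper = false;
    for (std::size_t i = 0; i < name.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(name[i]);
        if (std::islower(c))          hasLower      = true;
        if (i > 0 && std::isupper(c)) hasInnerUpper = true;
    }
    return (hasLower && hasInnerUpper) ? SketchFormat::CamelCase
                                       : SketchFormat::Normal;
}
```

With this sketch, a name that mixes an underscore, a hyphen and camel humps is classified as snake_case, and a name without any separator or inner upper-case letter stays a normal token.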

Character Case Sensitivity:

Independent of the token format (normal or snake_case, kebab-case, CamelCase), character case sensitivity can be chosen. With CamelCase and case-sensitive parsing, the first character of the first hump may be defined lower or upper case (called "lowerCamelCase" vs. "UpperCamelCase").

If none of the special formats is detected, a token can optionally be abbreviated by providing just a minimum number of starting characters, as specified by the then single entry in minLengths. Otherwise, each segment of the token (e.g., a "camel hump") can (again optionally) be shortened on its own. For example, if for token "SystemProperty" the minimum lengths given are 3 and 4, the shortest abbreviation is "SysProp", while "SystProper" also matches.
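
The per-segment abbreviation rule may be easier to follow in code. The following is a minimal sketch that does not use ALib: `matchesAbbreviationSketch` is a hypothetical helper that expects the segments to be split already, compares case-sensitively and consumes characters greedily without any rollback.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Case-sensitive, greedy matching of an abbreviation against pre-split
// segments. Each segment must contribute at least minLengths[i] characters.
// No rollback is attempted; this is a simplification for illustration.
bool matchesAbbreviationSketch(const std::vector<std::string>& segments,
                               const std::vector<int>&         minLengths,
                               std::string_view                needle) {
    if (segments.size() != minLengths.size()) return false;

    std::size_t pos = 0;  // current read position in the needle
    for (std::size_t i = 0; i < segments.size(); ++i) {
        const std::string& seg  = segments[i];
        std::size_t        used = 0;  // characters taken from this segment
        while (used < seg.size() && pos < needle.size() && needle[pos] == seg[used]) {
            ++used;
            ++pos;
        }
        if (used < static_cast<std::size_t>(minLengths[i])) return false;
    }
    return pos == needle.size();  // the whole needle has to be consumed
}
```

For segments "System" and "Property" with minimum lengths 3 and 4, this sketch accepts "SysProp" and "SystProper" and rejects "SyProp", because the first segment then contributes only two characters.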

Limitation To Seven Segments:

This class supports minimum length definitions for up to 7 "camel humps", respectively segments. Should a name contain even more segments, those cannot be abbreviated. Providing more than 7 values for minimum segment lengths with the definition string results in a definition error (see below).

Special Treatment For CamelCase:

Omitable Last Camel Hump:

The minimum length values provided must be greater than 0, with one exception: With CamelCase format and case-insensitive definition, the last "camel hump" may have a minimum length of 0 and hence may be omitted when matched. If so, the "normalized" version of the token, which can be obtained by appending an instance to an AString, will have the last letter of the defined name converted to lower case.
The rationale for this specific approach is to support the English plural case. This is best explained by example. If a token was defined using definition string:

 "MilliSecondS Ignore 1 1 0"

then all of the following words match:

 milliseconds
 MilliSecs
 millis
 MSec
 MSecs
 MSs
 ms

If the correctly spelled (normalized) token name is to be written, then, with the last character converted to lower case, the token becomes

 MilliSeconds

This is performed by method GetExportName, which is also used by the specialization of functor T_Append for this type. Hence, when appending a Token to an AString, if omitable, the last character of the token name is converted to lower case.

If the above is not suitable, or for any other reason a different "normalized" name is wanted when writing the token, then method Define offers a further mechanism to explicitly define any custom string to be written.
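
The default normalization described in this section can be sketched without ALib as follows. `exportNameSketch` is a hypothetical helper; it only models the lower-casing of the last character and ignores both the format check and an explicitly defined export name.

```cpp
#include <cctype>
#include <string>

// Returns the name as it would be written: if the last segment's minimum
// length is 0 (omitable last hump), the final character is lower-cased.
// Hypothetical helper; the CamelCase check and explicit export names are
// deliberately left out of this sketch.
std::string exportNameSketch(std::string definitionName, int lastMinLength) {
    if (lastMinLength == 0 && !definitionName.empty()) {
        char& last = definitionName.back();
        last = static_cast<char>(std::tolower(static_cast<unsigned char>(last)));
    }
    return definitionName;
}
```

Called with "MilliSecondS" and a last minimum length of 0, the sketch yields "MilliSeconds". With a last minimum length greater than 0, the name is returned unchanged.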

Rollback:

CamelCase supports a simple "rollback" mechanism, which is needed for example for token

 "SystemTemperature Ignore 1 1 0"

and given match argument

 system

All six characters match the first hump, but then there are no characters left to match the start of the second hump "Temperature". In this case, a loop of retries is performed by rolling back characters from the back of the hump ('m') and ending with the first optional character of that hump ('y'). The loop is exited when character 't' is found.

However, this is not continued if the term that was rolled back does not yet match. This means that certain (very unlikely!) tokens with nested repeating character sequences in camel humps cannot be abbreviated to certain (unlikely wanted) lengths.
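
To make the idea of rolling back concrete, here is a stand-alone sketch that does not use ALib. Note that it implements full recursive backtracking, which is more general than the simple rollback described above, so it does not share the stated limitation. `matchHumpsSketch` and `sameLetterSketch` are hypothetical names, the humps are passed pre-split, and the sketch assumes one minimum length per hump.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Case-insensitive comparison of two characters.
inline bool sameLetterSketch(char a, char b) {
    return std::tolower(static_cast<unsigned char>(a)) ==
           std::tolower(static_cast<unsigned char>(b));
}

// Matches 'needle' against camel humps, ignoring letter case.
// Hypothetical helper using full backtracking (not ALib's algorithm).
bool matchHumpsSketch(const std::vector<std::string>& humps,
                      const std::vector<int>&         minLengths,
                      std::string_view                needle,
                      std::size_t                     humpIdx = 0,
                      std::size_t                     pos     = 0) {
    if (humps.size() != minLengths.size()) return false;
    if (humpIdx == humps.size()) return pos == needle.size();

    // Longest prefix of the current hump found at the current position.
    const std::string& hump    = humps[humpIdx];
    std::size_t        maxUsed = 0;
    while (maxUsed < hump.size() && pos + maxUsed < needle.size() &&
           sameLetterSketch(hump[maxUsed], needle[pos + maxUsed])) {
        ++maxUsed;
    }

    // Try the longest prefix first, then roll back one character at a time,
    // but never below the hump's minimum length.
    const int minUsed = std::max(0, minLengths[humpIdx]);
    for (int used = static_cast<int>(maxUsed); used >= minUsed; --used) {
        if (matchHumpsSketch(humps, minLengths, needle, humpIdx + 1,
                             pos + static_cast<std::size_t>(used)))
            return true;
    }
    return false;
}
```

For humps "System" and "Temperature" with a minimum length of 1 each (an assumption of this sketch), the argument "system" is first consumed entirely by the first hump, which leaves nothing for the second one. Rolling back to "sys" lets the remaining "tem" match the start of "Temperature".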

Handling Definition Errors:

The definition strings passed to method Define are considered static (resourced) data. In other words, this definition data should be compile-time defined and not be customizable by end-users, but only by experts. Therefore, thorough testing of the correctness of the definitions is available only in debug-compilations of the library.

The source code of static utility method LoadResourcedTokens demonstrates how error codes defined with enumeration DbgDefinitionError can be handled in debug-compilations by raising debug-assertions.

Definition at line 149 of file token.hpp.

#include <token.hpp>


Public Type Index:

enum class DbgDefinitionError : int8_t {
  OK = 0, EmptyName = -1, ErrorReadingSensitivity = -2, ErrorReadingMinLengths = -3,
  TooManyMinLengthsGiven = -4, InconsistentMinLengths = -5, NoCaseSchemeFound = -6, MinLenExceedsSegmentLength = -7,
  DefinitionStringNotConsumed = -8, ZeroMinLengthAndNotLastCamelHump = -9
}
 
enum class Formats : int8_t { Normal = 0, SnakeCase = 2, KebabCase = 4, CamelCase = 8 }
 Format types detected with detectFormat.
 

Public Static Method Index:

static void LoadResourcedTokens (lang::Camp &module, const NString &resourceName, strings::util::Token *target, int dbgSizeVerifier, character outerSeparator=',', character innerSeparator=' ')
 
static ALIB_API void LoadResourcedTokens (lang::resources::ResourcePool &resourcePool, const NString &resourceCategory, const NString &resourceName, strings::util::Token *target, int dbgSizeVerifier, character outerSeparator=',', character innerSeparator=' ')
 

Public Method Index:

 Token ()
 Parameterless constructor. Creates an "undefined" token.
 
 Token (const String &definitionSrc, character separator=';')
 
ALIB_API Token (const String &name, lang::Case sensitivity, int8_t minLength, const String &exportName=NULL_STRING)
 
ALIB_API Token (const String &name, lang::Case sensitivity, int8_t minLength1, int8_t minLength2, int8_t minLength3=-1, int8_t minLength4=-1, int8_t minLength5=-1, int8_t minLength6=-1, int8_t minLength7=-1)
 
DbgDefinitionError DbgGetError ()
 
ALIB_API void Define (const String &definition, character separator=';')
 
const String & GetDefinitionName () const
 
ALIB_API void GetExportName (AString &target) const
 
Formats GetFormat () const
 
int8_t GetMinLength (int idx) const
 
ALIB_API bool Match (const String &needle)
 
lang::Case Sensitivity () const
 

Protected Static Field Index:

static constexpr Formats ignoreCase = Formats(1)
 Letter case sensitivity. This is combined with the format bits.
 

Protected Field Index:

String definitionName
 The token's definition string part.
 
String exportName = NULL_STRING
 The token's optional explicit export name.
 
Formats format
 Defines the "case type" as well as the letter case sensitivity of this token.
 
int8_t minLengths [7] = {0,0,0,0,0,0,0}
 

Protected Method Index:

ALIB_API void detectFormat ()
 Detects snake_case, kebab-case or CamelCase.
 

Enumeration Details:

◆ DbgDefinitionError

enum class DbgDefinitionError : int8_t
strong

Error codes which are written to field format if method Define encounters a parsing error.
This enum, as well as the error detection is only available in debug-compilations of the library.

Enumerator
OK 

All is fine.

EmptyName 

No token name found.

ErrorReadingSensitivity 

Sensitivity value not found.

ErrorReadingMinLengths 

Error parsing the list of minimum lengths.

TooManyMinLengthsGiven 

A maximum of 7 minimum length values was exceeded.

InconsistentMinLengths 

The number of given minimum length values is greater than 1 but does not match the number of segments in the identifier.

NoCaseSchemeFound 

More than one minimum length value was given but no segmentation scheme could be detected.

MinLenExceedsSegmentLength 

A minimum length is specified to be greater than the length of the token name or, respectively, of the corresponding segment.

DefinitionStringNotConsumed 

The definition string was not completely consumed.

ZeroMinLengthAndNotLastCamelHump 

A minimum length of 0 was specified for a segment that is not a last camel case hump.

Definition at line 166 of file token.hpp.
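
How some of these error conditions relate to each other can be illustrated with a stand-alone sketch. It does not use ALib: `SketchError` and `validateMinLengthsSketch` are hypothetical names, only four of the documented conditions are modeled, and the order in which they are checked is an assumption. When only one minimum length is given, the caller is expected to pass the whole name as the single segment.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical subset of the documented error codes.
enum class SketchError {
    OK,
    TooManyMinLengthsGiven,
    InconsistentMinLengths,
    MinLenExceedsSegmentLength,
    ZeroMinLengthAndNotLastCamelHump
};

// Checks minimum lengths against pre-split segments. The check order is an
// assumption of this sketch and may differ from the library.
SketchError validateMinLengthsSketch(const std::vector<std::string>& segments,
                                     const std::vector<int>&         minLengths,
                                     bool camelCaseIgnoringCase) {
    if (minLengths.size() > 7)
        return SketchError::TooManyMinLengthsGiven;
    if (minLengths.size() > 1 && minLengths.size() != segments.size())
        return SketchError::InconsistentMinLengths;

    for (std::size_t i = 0; i < minLengths.size() && i < segments.size(); ++i) {
        const bool isLast = (i + 1 == minLengths.size());
        // Zero is only accepted for the last hump of a case-insensitive
        // CamelCase token with more than one segment.
        const bool zeroAllowed =
            isLast && camelCaseIgnoringCase && minLengths.size() > 1;
        if (minLengths[i] == 0 && !zeroAllowed)
            return SketchError::ZeroMinLengthAndNotLastCamelHump;
        if (minLengths[i] > static_cast<int>(segments[i].size()))
            return SketchError::MinLenExceedsSegmentLength;
    }
    return SketchError::OK;
}
```

In a debug-build, a result other than `SketchError::OK` would typically be turned into an assertion, in the same spirit as described for DbgGetError.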

◆ Formats

enum class Formats : int8_t
strong

Format types detected with detectFormat.

Enumerator
Normal 

Normal, optionally abbreviated words.

SnakeCase 

snake_case using underscores.

KebabCase 

kebab-case using hyphens.

CamelCase 

UpperCamelCase or lowerCamelCase.

Definition at line 153 of file token.hpp.

Field Details:

◆ definitionName

String definitionName
protected

The token's definition string part.

Definition at line 187 of file token.hpp.

◆ exportName

String exportName = NULL_STRING
protected

The token's optional explicit export name.

Definition at line 190 of file token.hpp.

◆ format

Formats format
protected

Defines the "case type" as well as the letter case sensitivity of this token.

Definition at line 194 of file token.hpp.

◆ ignoreCase

Formats ignoreCase = Formats(1)
static constexpr protected

Letter case sensitivity. This is combined with the format bits.

Definition at line 203 of file token.hpp.
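
Because the scheme values are 2, 4 and 8 while the sensitivity flag is 1, both pieces of information fit into one value. The following sketch shows one way such a combination could work. `FormatBits` copies the enumerator values from Formats, while `combineSketch`, `schemeOfSketch` and `ignoresCaseSketch` are hypothetical helpers; how the class really packs and unpacks field format is an assumption here.

```cpp
#include <cstdint>

// Stand-alone copy of the documented enumerator values.
enum class FormatBits : std::int8_t { Normal = 0, SnakeCase = 2, KebabCase = 4, CamelCase = 8 };

// The sensitivity flag occupies the lowest bit.
constexpr FormatBits ignoreCaseBit = FormatBits(1);

// Combines a naming scheme with the letter case flag (hypothetical helper).
constexpr FormatBits combineSketch(FormatBits scheme, bool ignoreCase) {
    return FormatBits(static_cast<std::int8_t>(scheme) |
                      (ignoreCase ? static_cast<std::int8_t>(ignoreCaseBit) : 0));
}

// Extracts the naming scheme by masking out the flag bit.
constexpr FormatBits schemeOfSketch(FormatBits combined) {
    return FormatBits(static_cast<std::int8_t>(combined) &
                      ~static_cast<std::int8_t>(ignoreCaseBit));
}

// Tests whether the flag bit is set.
constexpr bool ignoresCaseSketch(FormatBits combined) {
    return (static_cast<std::int8_t>(combined) &
            static_cast<std::int8_t>(ignoreCaseBit)) != 0;
}
```

Since all helpers are constexpr, the round trip can also be verified at compile time with static_assert.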

◆ minLengths

int8_t minLengths[7] = {0,0,0,0,0,0,0}
protected

The minimum abbreviation length per segment. If only one is given (second is -1), then field format indicates normal tokens. Otherwise, the token is either snake_case, kebab-case or CamelCase.

Definition at line 199 of file token.hpp.

Constructor(s) / Destructor Details:

◆ Token() [1/4]

Token ( )
inline

Parameterless constructor. Creates an "undefined" token.

Definition at line 211 of file token.hpp.

◆ Token() [2/4]

Token ( const String & name,
lang::Case sensitivity,
int8_t minLength,
const String & exportName = NULL_STRING )

Constructor used with function names that do not contain snake_case, kebab-case or CamelCase name scheme.

Note
Of course, the name may follow such scheme. With this constructor, it just will not be detected.
Parameters
name: The function name.
sensitivity: The letter case sensitivity of reading the function name.
minLength: The minimum starting portion of the function name to read.
exportName: An optional export name. If not given, the name is used with method GetExportName.

Definition at line 29 of file token.cpp.


◆ Token() [3/4]

Token ( const String & name,
lang::Case sensitivity,
int8_t minLength1,
int8_t minLength2,
int8_t minLength3 = -1,
int8_t minLength4 = -1,
int8_t minLength5 = -1,
int8_t minLength6 = -1,
int8_t minLength7 = -1 )

Constructor with at least two minimum length values, used to define tokens that follow snake_case, kebab-case or CamelCase naming schemes.

Parameters
name: The function name.
sensitivity: The letter case sensitivity of reading the function name.
minLength1: The minimum starting portion of the first segment to read.
minLength2: The minimum starting portion of the second segment to read.
minLength3: The minimum starting portion of the third segment to read. Defaults to -1.
minLength4: The minimum starting portion of the fourth segment to read. Defaults to -1.
minLength5: The minimum starting portion of the fifth segment to read. Defaults to -1.
minLength6: The minimum starting portion of the sixth segment to read. Defaults to -1.
minLength7: The minimum starting portion of the seventh segment to read. Defaults to -1.

Definition at line 46 of file token.cpp.


◆ Token() [4/4]

Token ( const String & definitionSrc,
character separator = ';' )
inline

Constructor using a (usually resourced) string to read the definitions. Invokes Define.

Parameters
definitionSrc: The input string.
separator: Separation character used to parse the input. Defaults to ';'.
Module Dependencies
This method is only available if module ALib Enums is included in the ALib Distribution.

Definition at line 266 of file token.hpp.


Method Details:

◆ DbgGetError()

DbgDefinitionError DbgGetError ( )
inline

Tests if this token was well defined.

Note
This method is only available in debug-compilations. Definition strings are considered static data (preferably resourced). Therefore, in debug-compilations, this method should be invoked and with that, the consistency of the resources be tested. In the case of failure, a debug assertion should be raised.
Returns
DbgDefinitionError::OK, if this token is well defined, a different error code otherwise.

Definition at line 288 of file token.hpp.

◆ Define()

void Define ( const String & definition,
character separator = ';' )

Defines or redefines this token by parsing the attributes from the given substring. This method is usually invoked by code that loads tokens and other data from resources of ALib lang::Camp objects.

The expected format is defined as a list of the following values, separated by the character given with parameter separator:

  • The definitionName of the token. Even if letter case is ignored, this should contain the name in "normalized" format, as it may be used with GetExportName, if no specific name to export is given.
  • Letter case sensitivity. This can be "Sensitive" or "Ignore" (respectively what is defined with resourced ALib Enum Records of type lang::Case), can be abbreviated to just one character (i.e., 's' and 'i') and itself is not parsed taking letter case into account.
  • Optionally the standard export string used with method GetExportName and when appended to an AString. Output names defined with this function must not start with a digit, because a digit in this position of definition, indicates that no export name is given.
  • The list of minimum lengths for each segment of the name. The number of values has to match the number of segments. A value of 0 specifies that the segment must not be abbreviated and therefore is the same as specifying the exact length of the segment.
Note
The given definition string has to survive the use of the token, which is naturally true if the string resides in resources. (String contents are not copied. Instead, this class later refers to substrings of the given definition.)
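
The field layout listed above can be sketched as a small stand-alone parser. This is not the implementation of Define: `ParsedDefinitionSketch` and `parseDefinitionSketch` are hypothetical names, the sensitivity is recognized by its first letter only, values are copied instead of referenced, and none of the consistency checks against the name's segments are performed.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical result type; unlike Token, it copies the parsed values.
struct ParsedDefinitionSketch {
    std::string      name;
    bool             ignoreCase = false;
    std::string      exportName;  // empty if none was given
    std::vector<int> minLengths;
};

std::optional<ParsedDefinitionSketch>
parseDefinitionSketch(const std::string& definition, char separator = ';') {
    // Split the definition into fields.
    std::vector<std::string> fields;
    std::string              current;
    for (char c : definition) {
        if (c == separator) { fields.push_back(current); current.clear(); }
        else                { current.push_back(c); }
    }
    fields.push_back(current);
    if (fields.size() < 3 || fields[0].empty() || fields[1].empty())
        return std::nullopt;

    ParsedDefinitionSketch result;
    result.name = fields[0];

    // Sensitivity: only the first letter is inspected, ignoring its case.
    const int s = std::tolower(static_cast<unsigned char>(fields[1][0]));
    if      (s == 'i') result.ignoreCase = true;
    else if (s != 's') return std::nullopt;

    // A third field that does not start with a digit is the export name.
    std::size_t next = 2;
    if (!fields[next].empty() &&
        !std::isdigit(static_cast<unsigned char>(fields[next][0])))
        result.exportName = fields[next++];

    // All remaining fields are minimum lengths (at most three digits each).
    for (; next < fields.size(); ++next) {
        const std::string& f = fields[next];
        if (f.empty() || f.size() > 3) return std::nullopt;
        int value = 0;
        for (char c : f) {
            if (!std::isdigit(static_cast<unsigned char>(c))) return std::nullopt;
            value = value * 10 + (c - '0');
        }
        result.minLengths.push_back(value);
    }
    if (result.minLengths.empty() || result.minLengths.size() > 7)
        return std::nullopt;
    return result;
}
```

Parsing "MilliSecondS;Ignore;1;1;0" with this sketch yields the name "MilliSecondS", case-insensitive matching, no export name and the minimum lengths 1, 1 and 0. Passing ' ' as the separator handles the space-separated form used in resourced tables.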
Parameters
definition: The input string.
separator: Separation character used to parse the input. Defaults to ';'.
Module Dependencies
This method is only available if module ALib Enums is included in the ALib Distribution.

Definition at line 86 of file token.cpp.


◆ detectFormat()

void detectFormat ( )
protected

Detects snake_case, kebab-case or CamelCase.

Definition at line 175 of file token.cpp.


◆ GetDefinitionName()

const String & GetDefinitionName ( ) const
inline

Returns the definition name used for parsing the token.

Note
To receive the "normalized" name of this token, method GetExportName can be used, or a token can simply be appended to an instance of type AString.
Returns
This token's definitionName.

Definition at line 305 of file token.hpp.

◆ GetExportName()

void GetExportName ( AString & target) const

If field exportName is not nulled (hence explicitly given with resourced definition string or with a constructor), this is appended.

Otherwise appends the result of Token::GetDefinitionName to the target. If the token is defined CamelCase and the minimum length of the last segment is defined 0, then the last character written is converted to lower case.

As a result, in most cases it is not necessary to provide a specific exportName with the definition. Instead, this method should provide a reasonable output.

See also
Documentation section Omitable Last Camel Hump of this class's documentation, for more information about why the character conversion to lower case might be performed.
Parameters
target: The AString that method Append was invoked on.

Definition at line 58 of file token.cpp.


◆ GetFormat()

Formats GetFormat ( ) const
inline

Returns the format of this token.

Note
Same as methods Sensitivity and GetMinLength, this method is usually not of interest to standard API usage. These three informational methods are rather provided to support the unit tests.
Returns
This token's format, used with method Match.

Definition at line 340 of file token.hpp.

◆ GetMinLength()

int8_t GetMinLength ( int idx) const
inline

Returns the minimum length to be read. In case that this token is not of snake_case, kebab-case or CamelCase naming scheme, only 0 is allowed for parameter idx and this defines the minimal abbreviation length. If one of the naming schemes applies, parameter idx may be as high as the number of segments found in the name (and a maximum of 6, as this class supports only up to seven segments).

The first index that exceeds the number of segments, will return -1 for the length. If even higher index values are requested, then the returned value is undefined.

Parameters
idx: The index of the minimum length to receive.
Note
Same as methods GetFormat and Sensitivity, this method is usually not of interest to standard API usage. These three informational methods are rather provided to support the unit tests.
Returns
The minimum length of segment number idx.

Definition at line 377 of file token.hpp.

◆ LoadResourcedTokens() [1/2]

static void LoadResourcedTokens ( lang::Camp & module,
const NString & resourceName,
strings::util::Token * target,
int dbgSizeVerifier,
character outerSeparator = ',',
character innerSeparator = ' ' )
inline static

Shortcut to LoadResourcedTokens that accepts a module and uses its resource pool and resource category.

Parameters
module: The ALib Camp to load the resource from.
resourceName: The resource name.
target: The table to fill.
dbgSizeVerifier: This parameter has to be specified only in debug-compilations and provides the expected size of the resourced table. To be surrounded by macro ALIB_DBG (not to be given in release-builds).
outerSeparator: The character that separates the entries. Defaults to ','.
innerSeparator: The character that separates the values of an entry. Defaults to ' ' (space).
Module Dependencies
This method is only available if module ALib BaseCamp is included in the ALib Distribution.

◆ LoadResourcedTokens() [2/2]

static ALIB_API void LoadResourcedTokens ( lang::resources::ResourcePool & resourcePool,
const NString & resourceCategory,
const NString & resourceName,
strings::util::Token * target,
int dbgSizeVerifier,
character outerSeparator = ',',
character innerSeparator = ' ' )
static

Static utility function that defines a table of token objects from external resourced strings.

It is possible to provide the table lines in two ways:

  • In one resource string: In this case, parameter outerSeparator has to specify the delimiter that separates the records.
  • In an array of resource strings: If the resource string as given is not defined, this method appends an integral index starting with 0 to the resource name, parses a single record and increments the index. Parsing ends when a resource with a next higher index is not found.

The second option is recommended for larger token sets. While the separation causes some overhead in a resource backend, the external (!) management (translation, manipulation, etc.) is most probably simplified with this approach.
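
The two lookup variants can be sketched with a plain map standing in for the resource pool. This does not use ALib: `MockResourcePool` and `loadTokenRecordsSketch` are hypothetical names, the sketch only collects the raw record strings, and the handling of resource categories is omitted.

```cpp
#include <map>
#include <string>
#include <vector>

// Stand-in for a resource pool: maps resource names to resource strings.
using MockResourcePool = std::map<std::string, std::string>;

// Collects the token records for 'resourceName' (hypothetical helper).
std::vector<std::string> loadTokenRecordsSketch(const MockResourcePool& pool,
                                                const std::string&      resourceName,
                                                char outerSeparator = ',') {
    std::vector<std::string> records;

    // Variant 1: a single resource string holding all records.
    const auto single = pool.find(resourceName);
    if (single != pool.end()) {
        std::string current;
        for (char c : single->second) {
            if (c == outerSeparator) { records.push_back(current); current.clear(); }
            else                     { current.push_back(c); }
        }
        records.push_back(current);
        return records;
    }

    // Variant 2: indexed resources "name0", "name1", ... until one is missing.
    for (int index = 0;; ++index) {
        const auto it = pool.find(resourceName + std::to_string(index));
        if (it == pool.end()) break;
        records.push_back(it->second);
    }
    return records;
}
```

Each collected record would then be handed to a token definition using the inner separator. A pool that contains neither the plain name nor an entry with index 0 yields an empty list in this sketch.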

Note
The length of the given table has to fit to the number of entries found in the resource pool. To ensure this, with debug-builds, parameter dbgSizeVerifier has to be provided (preferably by using macro ALIB_DBG(, N)).
Parameters
resourcePool: The resource pool to load the resource from.
resourceCategory: The resource category.
resourceName: The resource name.
target: The table to fill.
dbgSizeVerifier: This parameter has to be specified only in debug-builds and provides the expected size of the resourced table. To be surrounded by macro ALIB_DBG (not to be given in release-builds).
outerSeparator: The character that separates the entries. Defaults to ','.
innerSeparator: The character that separates the values of an entry. Defaults to ' ' (space).
Module Dependencies
This method is only available if module ALib Enums as well as module ALib BaseCamp are included in the ALib Distribution.

◆ Match()

bool Match ( const String & needle)

Matches a given string with this token. See this class's description for details.

Parameters
needle: The potentially abbreviated input string to match.
Returns
true if needle matches this token, false otherwise.

Definition at line 314 of file token.cpp.


◆ Sensitivity()

lang::Case Sensitivity ( ) const
inline

Returns the letter case sensitivity of this token.

Note
Same as methods GetFormat and GetMinLength, this method is usually not of interest to standard API usage. These three informational methods are rather provided to support the unit tests.
Returns
The letter case sensitivity used with method Match.

Definition at line 356 of file token.hpp.

