Description:

Tokens, in the context of ALib Strings, are human-readable "words" or "symbols" that represent a certain value or entity of a piece of software. Tokens may be used with configuration files, mathematical or general expressions, programming languages, communication protocols and so forth.

This struct contains attributes that describe a token, a method to parse these attributes from a (resource) string, and finally the method Match, which matches a given string against the token definition.

Token Format:

When a token is constructed or defined, the following special formats are detected:

"snake_case"
"kebab-case"
"CamelCase"

Note: Information about such case formats is given in this Wikipedia article. If the name indicates a mix of snake_case, kebab-case or CamelCase formats (e.g. "System_Property-ValueTable"), then snake_case supersedes both others and kebab-case supersedes CamelCase.

The format detection is only performed when more than one minimum length is given. In this case, the number of "segments" (e.g. "camel humps") has to match the number of length values.
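As an illustration of these detection rules, the following sketch splits a name into segments and applies the stated precedence (snake_case before kebab-case before CamelCase). It is a simplified, hypothetical re-creation and not ALib's detectFormat: the names SketchFormat, Detection, detectFormatSketch and splitAt are invented here, and a CamelCase hump is assumed to start at every upper-case letter after the first character.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical mirror of the Formats enumeration (not the ALib type).
enum class SketchFormat { Normal, SnakeCase, KebabCase, CamelCase };

struct Detection {
    SketchFormat             format;
    std::vector<std::string> segments;
};

// Splits 'name' into segments at every occurrence of 'sep'.
static std::vector<std::string> splitAt(const std::string& name, char sep) {
    std::vector<std::string> parts;
    std::string current;
    for (char c : name) {
        if (c == sep) { parts.push_back(current); current.clear(); }
        else          { current += c; }
    }
    parts.push_back(current);
    return parts;
}

// Detection is only performed if more than one minimum length is given.
Detection detectFormatSketch(const std::string& name, int numMinLengths) {
    if (numMinLengths <= 1)
        return {SketchFormat::Normal, {name}};
    // Precedence: snake_case supersedes kebab-case, which supersedes CamelCase.
    if (name.find('_') != std::string::npos)
        return {SketchFormat::SnakeCase, splitAt(name, '_')};
    if (name.find('-') != std::string::npos)
        return {SketchFormat::KebabCase, splitAt(name, '-')};
    // Assumption: a new camel hump starts at each upper-case letter after position 0.
    std::vector<std::string> humps;
    std::string current;
    for (std::size_t i = 0; i < name.size(); ++i) {
        if (i > 0 && std::isupper(static_cast<unsigned char>(name[i]))) {
            humps.push_back(current);
            current.clear();
        }
        current += name[i];
    }
    humps.push_back(current);
    if (humps.size() < 2)  // no scheme found; a caller would report a definition error
        return {SketchFormat::Normal, {name}};
    return {SketchFormat::CamelCase, humps};
}
```

The caller is then expected to compare the number of segments with the number of minimum lengths given.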

Character Case Sensitivity:

Independent of the token format (normal, snake_case, kebab-case or CamelCase), character case sensitivity can be chosen. With CamelCase and case-sensitive parsing, the first character of the first hump may be given in lower or upper case (called "lowerCamelCase" vs. "UpperCamelCase").

If none of the special formats is detected, the token can optionally be abbreviated by providing only a minimum number of starting characters, as specified by the (then single) entry in minLengths. Otherwise, each segment of the token (e.g. "camel hump") can (again optionally) be shortened on its own. As an example, if for token "SystemProperty" the minimum lengths given are 3 and 4, the shortest abbreviation is "SysProp", while "SystProper" also matches.
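For a token without a detected format, matching reduces to a prefix comparison with a minimum length. A minimal sketch, assuming std::string input and ASCII letters; matchNormalSketch is a hypothetical name and not part of ALib:

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Compares two characters, optionally ignoring letter case (ASCII assumed).
static bool charEquals(char a, char b, bool ignoreCase) {
    if (!ignoreCase) return a == b;
    return std::tolower(static_cast<unsigned char>(a))
        == std::tolower(static_cast<unsigned char>(b));
}

// True if 'needle' is a prefix of 'name' that has at least 'minLength' characters.
bool matchNormalSketch(const std::string& name, const std::string& needle,
                       int minLength, bool ignoreCase) {
    if (needle.size() < static_cast<std::size_t>(minLength) || needle.size() > name.size())
        return false;
    for (std::size_t i = 0; i < needle.size(); ++i)
        if (!charEquals(name[i], needle[i], ignoreCase))
            return false;
    return true;
}
```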

Limitation To Seven Segments:

This class supports minimum length definitions for up to 7 "camel humps" or segments. If a name contains more segments, the additional ones cannot be abbreviated. Providing more than 7 minimum segment lengths in the definition string results in a definition error (see below).

Special Treatment For CamelCase:

Omitable Last Camel Hump:

The minimum length values provided must be greater than 0, with one exception: With CamelCase format and a case-insensitive definition, the last "camel hump" may have a minimum length of 0 and hence may be omitted when matched. In this case, the "normalized" version of the token, which is obtained by appending an instance to an AString, has the last letter of the defined name converted to lower case.
The rationale for this specific approach is to support the English plural. This is best explained with an example. If a token is defined using the definition string:

 "MilliSecondS Ignore 1 1 0"

then all of the following words match:

 milliseconds
 MilliSecs
 millis
 MSec
 MSecs
 MSs
 ms

When the correctly spelled (normalized) token name is to be written, the last character is converted to lower case, and the token becomes:

 MilliSeconds

This is ensured, for example, by the specialization of functor T_Append for this type. Hence, when a Token is appended to an AString and its last hump is omittable, the last character of the token name is converted to lower case.
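The normalization rule fits in a few lines. This is a hypothetical helper operating on std::string (appendNormalizedSketch is not an ALib function); in ALib itself, the conversion happens when a Token is appended to an AString.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Appends 'rawName' to 'target'. Only for a CamelCase token whose last minimum
// length is 0 (an omittable last hump) is the final character lower-cased,
// e.g. "MilliSecondS" becomes "MilliSeconds".
void appendNormalizedSketch(std::string& target, const std::string& rawName,
                            bool isCamelCase, int lastMinLength) {
    target += rawName;
    if (isCamelCase && lastMinLength == 0 && !rawName.empty()) {
        char& last = target.back();
        last = static_cast<char>(std::tolower(static_cast<unsigned char>(last)));
    }
}
```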

Rollback:

CamelCase supports a simple "rollback" mechanism, which is needed for example for token

 "SystemTemperature Ignore 1 1 0"

and given match argument

 system

All six characters match the first hump, but then no characters are left to match the start of the second hump "Temperature". In this case, a loop of retries is performed, rolling back characters from the end of the hump (starting with 'm') down to the first optional character of that hump ('y'). The loop is exited as soon as character 't' is found, i.e. when the remaining "tem" matches the start of "Temperature".

However, this rollback is not continued in the case that the rolled-back term still does not match. As a consequence, certain (very unlikely!) tokens with nested, repeating character sequences in camel humps cannot be abbreviated to certain (unlikely wanted) lengths.
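The following sketch matches a needle against CamelCase humps with per-hump minimum lengths, including an omittable last hump (minimum length 0). Note that it deliberately uses full recursive backtracking, which is more thorough than the limited rollback described above and therefore also accepts the rare abbreviations that ALib rejects. All names (matchCamelSketch, matchHumpsFrom, camelCharEquals) are hypothetical, and ASCII input is assumed.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Compares two characters, optionally ignoring letter case.
static bool camelCharEquals(char a, char b, bool ignoreCase) {
    if (!ignoreCase) return a == b;
    return std::tolower(static_cast<unsigned char>(a))
        == std::tolower(static_cast<unsigned char>(b));
}

// Tries to match needle[pos..] against humps[hump..].
static bool matchHumpsFrom(const std::vector<std::string>& humps,
                           const std::vector<int>& minLengths,
                           const std::string& needle,
                           std::size_t hump, std::size_t pos, bool ignoreCase) {
    if (hump == humps.size())
        return pos == needle.size();  // success only if the whole needle was consumed
    const std::string& h = humps[hump];
    // Longest prefix of this hump matching the needle at 'pos'. In case-sensitive
    // mode, the very first character may still differ in case (lowerCamelCase).
    std::size_t len = 0;
    while (len < h.size() && pos + len < needle.size()
           && camelCharEquals(h[len], needle[pos + len],
                              ignoreCase || (hump == 0 && len == 0)))
        ++len;
    // Greedy first, then roll back one character at a time down to the minimum.
    for (int take = static_cast<int>(len); take >= minLengths[hump] && take >= 0; --take)
        if (matchHumpsFrom(humps, minLengths, needle, hump + 1,
                           pos + static_cast<std::size_t>(take), ignoreCase))
            return true;
    return false;
}

bool matchCamelSketch(const std::vector<std::string>& humps,
                      const std::vector<int>& minLengths,
                      const std::string& needle, bool ignoreCase) {
    return !humps.empty() && humps.size() == minLengths.size()
        && matchHumpsFrom(humps, minLengths, needle, 0, 0, ignoreCase);
}
```

Trying the longest match first and backing off keeps the common case fast, while the recursion guarantees that every admissible split of the needle is eventually tried.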

Handling Definition Errors:

The definition strings passed to method Define are considered static (resourced) data. In other words, this definition data should be defined at compile time and should not be customizable by end-users, but only by experts. Therefore, correctness testing of the definitions is available only in debug-compilations of the library.

The source code of static utility method LoadResourcedTokens demonstrates how error codes defined with enumeration DbgDefinitionError can be handled in debug-compilations by raising debug-assertions.
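The idea of checking definitions only in debug builds can be sketched with the standard assert macro, which expands to nothing when NDEBUG is defined. The error codes below are a hypothetical subset mirroring some values of DbgDefinitionError; the names SketchError, validateSketch and checkDefinitionSketch are invented, and the checks follow the rules stated in this description rather than ALib's actual source.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical subset of DbgDefinitionError, for illustration only.
enum class SketchError {
    OK = 0, TooManyMinLengthsGiven = -4, InconsistentMinLengths = -5,
    MinLenExceedsSegmentLength = -7, ZeroMinLengthAndNotLastCamelHump = -9
};

// For a normal token, the caller passes the whole name as a single segment.
SketchError validateSketch(const std::vector<std::string>& segments,
                           const std::vector<int>& minLengths, bool isCamelCase) {
    if (minLengths.size() > 7)
        return SketchError::TooManyMinLengthsGiven;
    if (minLengths.size() != segments.size())
        return SketchError::InconsistentMinLengths;
    for (std::size_t i = 0; i < minLengths.size(); ++i) {
        if (minLengths[i] > static_cast<int>(segments[i].size()))
            return SketchError::MinLenExceedsSegmentLength;
        // 0 is only allowed for the last camel hump.
        if (minLengths[i] == 0 && !(isCamelCase && i + 1 == minLengths.size()))
            return SketchError::ZeroMinLengthAndNotLastCamelHump;
    }
    return SketchError::OK;
}

// Raises a debug assertion on faulty definitions; a no-op with NDEBUG.
inline void checkDefinitionSketch(const std::vector<std::string>& segments,
                                  const std::vector<int>& minLengths, bool isCamelCase) {
    assert(validateSketch(segments, minLengths, isCamelCase) == SketchError::OK
           && "Token definition error");
    (void)segments; (void)minLengths; (void)isCamelCase;  // avoid unused warnings with NDEBUG
}
```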

Definition at line 149 of file token.hpp.

#include <token.hpp>


Public Type Index:
enum class	DbgDefinitionError : int8_t { OK = 0 , EmptyName = -1 , ErrorReadingSensitivity = -2 , ErrorReadingMinLengths = -3 , TooManyMinLengthsGiven = -4 , InconsistentMinLengths = -5 , NoCaseSchemeFound = -6 , MinLenExceedsSegmentLength = -7 , DefinitionStringNotConsumed = -8 , ZeroMinLengthAndNotLastCamelHump = -9 }

enum class	Formats : int8_t { Normal = 0 , SnakeCase = 2 , KebabCase = 4 , CamelCase = 8 }

Public Static Method Index:
static void	LoadResourcedTokens (lang::Camp &module, const NString &resourceName, strings::util::Token *target, int dbgSizeVerifier, character outerSeparator=',', character innerSeparator=' ')

static ALIB_API void	LoadResourcedTokens (lang::resources::ResourcePool &resourcePool, const NString &resourceCategory, const NString &resourceName, strings::util::Token *target, int dbgSizeVerifier, character outerSeparator=',', character innerSeparator=' ')

Public Method Index:
	Token ()

	Token (const String &definition, character separator=';')

ALIB_API	Token (const String &name, lang::Case sensitivity, int8_t minLength)

ALIB_API	Token (const String &name, lang::Case sensitivity, int8_t minLength1, int8_t minLength2, int8_t minLength3=-1, int8_t minLength4=-1, int8_t minLength5=-1, int8_t minLength6=-1, int8_t minLength7=-1)

DbgDefinitionError	DbgGetError ()

ALIB_API void	Define (const String &definition, character separator=';')

Formats	GetFormat () const

int8_t	GetMinLength (int idx) const

const String &	GetRawName () const

ALIB_API bool	Match (const String &needle)

lang::Case	Sensitivity () const

Enumeration Details:

◆ DbgDefinitionError

enum class DbgDefinitionError : int8_t

strong

Error codes which are written into field format in case method Define encounters a parsing error.
This enum, as well as the error detection, is only available in debug-compilations of the library.

Enumerator
OK	All is fine.
EmptyName	No token name found.
ErrorReadingSensitivity	Sensitivity value not found.
ErrorReadingMinLengths	Error parsing the list of minimum lengths.
TooManyMinLengthsGiven	A maximum of `7` minimum length values was exceeded.
InconsistentMinLengths	The number of given minimum length values is greater than `1` but does not match the number of segments in the identifier.
NoCaseSchemeFound	More than one minimum length value was given but no segmentation scheme could be detected.
MinLenExceedsSegmentLength	A minimum length is specified to be greater than the length of the token name or of the corresponding segment name.
DefinitionStringNotConsumed	The definition string was not completely consumed.
ZeroMinLengthAndNotLastCamelHump	A minimum length of `0` was specified for a segment that is not the last camel case hump.

Definition at line 170 of file token.hpp.

◆ Formats

enum class Formats : int8_t

strong

Format types detected with detectFormat.

Enumerator
Normal	Normal, optionally abbreviated words.
SnakeCase	snake_case using underscores.
KebabCase	kebab-case using hyphens.
CamelCase	UpperCamelCase or lowerCamelCase.

Definition at line 155 of file token.hpp.

Field Details:

◆ format

Formats format

protected

Defines the "case type" as well as the letter case sensitivity of this token.

Definition at line 194 of file token.hpp.

◆ ignoreCase

constexpr Formats ignoreCase = Formats(1)

static constexpr protected

Letter case sensitivity. This is combined with the format bits.

Definition at line 205 of file token.hpp.
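A short sketch of how the ignore-case bit can be combined with, and separated from, the format values, assuming the bit layout given above (format values 2, 4 and 8; ignore-case flag 1). SketchFormats, kIgnoreCase and the helper functions are hypothetical re-creations, not the protected members of Token.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical copy of the layout: formats use bits 2, 4, 8; bit 1 means "ignore case".
enum class SketchFormats : std::int8_t { Normal = 0, SnakeCase = 2, KebabCase = 4, CamelCase = 8 };
constexpr SketchFormats kIgnoreCase = SketchFormats(1);

// Packs a format and the case sensitivity into one value.
constexpr SketchFormats combine(SketchFormats format, bool ignoreCase) {
    return SketchFormats(static_cast<std::int8_t>(format)
                         | (ignoreCase ? static_cast<std::int8_t>(kIgnoreCase) : 0));
}

// Tests the ignore-case bit.
constexpr bool isIgnoreCase(SketchFormats f) {
    return (static_cast<std::int8_t>(f) & static_cast<std::int8_t>(kIgnoreCase)) != 0;
}

// Masks out the ignore-case bit, leaving the pure format.
constexpr SketchFormats formatOnly(SketchFormats f) {
    return SketchFormats(static_cast<std::int8_t>(f) & ~static_cast<std::int8_t>(kIgnoreCase));
}
```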

◆ minLengths

int8_t minLengths[7]

protected

The minimum abbreviation length per segment. If only one is given (the second is -1), then field format indicates a normal token. Otherwise, the token is either snake_case, kebab-case or CamelCase.

Definition at line 201 of file token.hpp.

◆ name

String name

protected

The token name.

Definition at line 191 of file token.hpp.

Constructor(s) / Destructor Details:

◆ Token() [1/4]

Token ( )

inline

Parameterless constructor. Creates an "undefined" token.

Definition at line 213 of file token.hpp.

◆ Token() [2/4]

Token	(	const String &	name,
		lang::Case	sensitivity,
		int8_t	minLength )

Constructor used with token names that do not follow the snake_case, kebab-case or CamelCase naming scheme.

Note: Of course, the name may follow such a scheme. With this constructor, it simply will not be detected.

Parameters

name	The token name.
sensitivity	The letter case sensitivity used when reading the token name.
minLength	The minimum starting portion of the token name to read.

Definition at line 42 of file token.cpp.


◆ Token() [3/4]

Token	(	const String &	name,
		lang::Case	sensitivity,
		int8_t	minLength1,
		int8_t	minLength2,
		int8_t	minLength3 = -1,
		int8_t	minLength4 = -1,
		int8_t	minLength5 = -1,
		int8_t	minLength6 = -1,
		int8_t	minLength7 = -1 )

Constructor with at least two minimum length values, used to define tokens that follow snake_case, kebab-case or CamelCase naming schemes.

Parameters

name	The token name.
sensitivity	The letter case sensitivity used when reading the token name.
minLength1	The minimum starting portion of the first segment to read.
minLength2	The minimum starting portion of the second segment to read.
minLength3	The minimum starting portion of the third segment to read. Defaults to `-1` (segment not given).
minLength4	The minimum starting portion of the fourth segment to read. Defaults to `-1` (segment not given).
minLength5	The minimum starting portion of the fifth segment to read. Defaults to `-1` (segment not given).
minLength6	The minimum starting portion of the sixth segment to read. Defaults to `-1` (segment not given).
minLength7	The minimum starting portion of the seventh segment to read. Defaults to `-1` (segment not given).

Definition at line 58 of file token.cpp.


◆ Token() [4/4]

Token	(	const String &	definition,
		character	separator = ';' )

inline

Constructor using a (usually resourced) string to read the definitions. Invokes Define.

Parameters

definition	The input string.
separator	Separation character used to parse the input. Defaults to `';'`.

Module Dependencies: This method is only available if module ALib Enums is included in the ALib Distribution.

Definition at line 267 of file token.hpp.


Method Details:

◆ DbgGetError()

DbgDefinitionError DbgGetError ( )

inline

Tests if this token was well defined.

Note: This method is only available in debug-compilations. Definition strings are considered static data (preferably resourced). Therefore, in debug-compilations this method should be invoked to test the consistency of the resources. In the case of failure, a debug assertion should be raised.

Returns: DbgDefinitionError::OK if this token is well defined, a different error code otherwise.

Definition at line 291 of file token.hpp.

◆ Define()

void Define	(	const String &	definition,
		character	separator = ';' )

Defines or redefines this token by parsing the attributes from the given sub-string. This method is usually invoked by code that loads tokens and other data from resources of ALib Camp objects.

The expected format is defined as a list of the following values, separated by the character given with parameter separator:

The name of the token. Even if letter case is ignored, this should contain the name in "normalized" format, as it may be used to generate human-readable output strings.
Letter case sensitivity. This can be "Sensitive" or "Ignore" (or whatever is defined with the resourced ALib Enum Records of type lang::Case), can be abbreviated to just one character (i.e. 's' and 'i') and is itself parsed ignoring letter case.
The list of minimum lengths for each segment of the name. If more than one value is given, the number of values has to match the number of segments. Values must be greater than 0, except for an omittable last camel hump, which may be given as 0 (see the class description).
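A sketch of parsing a definition string of this shape, assuming ASCII input and using std::getline to split at the separator. ParsedDefinition, isAbbreviationOf and parseDefinitionSketch are hypothetical names; validating the minimum lengths against the segments is left out here, and the real Define additionally accepts whatever the resourced lang::Case records define.

```cpp
#include <cassert>
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

struct ParsedDefinition {
    std::string      name;
    bool             ignoreCase = false;
    std::vector<int> minLengths;
};

// True if 'token' is a non-empty, case-insensitive abbreviation of 'word'.
static bool isAbbreviationOf(const std::string& token, const std::string& word) {
    if (token.empty() || token.size() > word.size()) return false;
    for (std::size_t i = 0; i < token.size(); ++i)
        if (std::tolower(static_cast<unsigned char>(token[i]))
            != std::tolower(static_cast<unsigned char>(word[i])))
            return false;
    return true;
}

// Parses "name<sep>sensitivity<sep>len1[<sep>len2...]"; returns false on errors.
bool parseDefinitionSketch(const std::string& definition, char separator,
                           ParsedDefinition& out) {
    std::vector<std::string> fields;
    std::stringstream stream(definition);
    std::string field;
    while (std::getline(stream, field, separator))
        fields.push_back(field);
    if (fields.size() < 3 || fields[0].empty())  // name, sensitivity, at least one length
        return false;
    out.name = fields[0];
    if      (isAbbreviationOf(fields[1], "Sensitive")) out.ignoreCase = false;
    else if (isAbbreviationOf(fields[1], "Ignore"))    out.ignoreCase = true;
    else    return false;
    out.minLengths.clear();
    for (std::size_t i = 2; i < fields.size(); ++i) {
        const std::string& f = fields[i];
        if (f.empty() || f.size() > 3) return false;  // also guards against overflow
        int value = 0;
        for (char c : f) {
            if (!std::isdigit(static_cast<unsigned char>(c))) return false;
            value = value * 10 + (c - '0');
        }
        out.minLengths.push_back(value);
    }
    return true;
}
```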

Parameters

definition	The input string.
separator	Separation character used to parse the input. Defaults to `';'`.

Module Dependencies: This method is only available if module ALib Enums is included in the ALib Distribution.

Definition at line 70 of file token.cpp.


◆ detectFormat()

void detectFormat ( )

protected

Detects snake_case, kebab-case or CamelCase.

Definition at line 145 of file token.cpp.


◆ GetFormat()

Formats GetFormat ( ) const

inline

Returns the format of this token.

Note: Like methods Sensitivity and GetMinLength, this method is usually not of interest for standard API usage. These three informational methods are provided mainly to support the unit tests.

Returns: This token's format, used with method Match.

Definition at line 326 of file token.hpp.

◆ GetMinLength()

int8_t GetMinLength ( int idx ) const

inline

Returns the minimum length to be read. If this token does not follow a snake_case, kebab-case or CamelCase naming scheme, only 0 is allowed for parameter idx, and the returned value defines the minimal abbreviation length. If one of the naming schemes applies, parameter idx may range up to the number of segments found in the name minus one (with a maximum of 6, as this class supports only up to seven segments).

The first index beyond the last segment returns -1. If even higher index values are requested, the returned value is undefined.

Parameters

idx	The index of the minimum length to receive.

Note: Like methods GetFormat and Sensitivity, this method is usually not of interest for standard API usage. These three informational methods are provided mainly to support the unit tests.

Returns: The minimum length of segment number idx .

Definition at line 374 of file token.hpp.

◆ GetRawName()

const String & GetRawName ( ) const

inline

Returns the "raw" name of the token, as given with Define or with one of the constructors.

Note: To receive the "normalized" name of this token, it can be appended to an instance of type AString . The difference will be that in the case of CamelCase format with a last minimum segment size of 0, the last character of the name will be converted to lower case.

Returns: This token's name.

Definition at line 311 of file token.hpp.

◆ LoadResourcedTokens() [1/2]

static void LoadResourcedTokens	(	lang::Camp &	module,
		const NString &	resourceName,
		strings::util::Token *	target,
		int	dbgSizeVerifier,
		character	outerSeparator = ',',
		character	innerSeparator = ' ' )

inline static

Shortcut to LoadResourcedTokens that accepts a module and uses its resource pool and resource category.

Parameters

module	The ALib Camp to load the resource from.
resourceName	The resource name.
target	The table to fill.
dbgSizeVerifier	This parameter has to be specified only in debug compilations and provides the expected size of the resourced table. It is to be wrapped in macro ALIB_DBG, so that it is not given in release builds.
outerSeparator	The character that separates the entries. Defaults to `','`.
innerSeparator	The character that separates the values of an entry. Defaults to `' '` (space).

Module Dependencies: This method is only available if module ALib BaseCamp is included in the ALib Distribution.

◆ LoadResourcedTokens() [2/2]

static ALIB_API void LoadResourcedTokens	(	lang::resources::ResourcePool &	resourcePool,
		const NString &	resourceCategory,
		const NString &	resourceName,
		strings::util::Token *	target,
		int	dbgSizeVerifier,
		character	outerSeparator = ',',
		character	innerSeparator = ' ' )

static

Static utility function that defines a table of token objects from external resourced strings.

It is possible to provide the table lines in two ways:

In one resource string: In this case, parameter outerSeparator has to specify the delimiter that separates the records.
In an array of resource strings: If the resource string as given is not defined, this method appends an integral index starting with 0 to the resource name, parses a single record and increments the index. Parsing ends when a resource with a next higher index is not found.

The second option is recommended for larger token sets. While the separation causes some overhead in a resource backend, the external (!) management (translation, manipulation, etc.) is most probably simplified with this approach.

Note: The length of the given table has to match the number of entries found in the resource pool. To ensure this, with debug builds, parameter dbgSizeVerifier has to be provided (preferably by using macro ALIB_DBG(, N)).
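The two table layouts and the debug size check can be sketched with a std::map standing in for the resource pool. This is a hypothetical illustration: ResourceMapSketch, loadRecordsSketch and verifyTableSizeSketch are invented names, and the real method additionally parses each record into a Token using innerSeparator.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a resource pool: resource name -> resource string.
using ResourceMapSketch = std::map<std::string, std::string>;

std::vector<std::string> loadRecordsSketch(const ResourceMapSketch& pool,
                                           const std::string& resourceName,
                                           char outerSeparator) {
    std::vector<std::string> records;
    auto it = pool.find(resourceName);
    if (it != pool.end()) {
        // Variant 1: all records in one resource string, split at outerSeparator.
        std::stringstream stream(it->second);
        std::string record;
        while (std::getline(stream, record, outerSeparator))
            records.push_back(record);
        return records;
    }
    // Variant 2: one record per resource "<name>0", "<name>1", ... until one is missing.
    for (int index = 0;; ++index) {
        auto next = pool.find(resourceName + std::to_string(index));
        if (next == pool.end())
            break;
        records.push_back(next->second);
    }
    return records;
}

// Mirrors the idea of dbgSizeVerifier: in debug builds, the number of records
// found must equal the size of the target table.
inline void verifyTableSizeSketch(const std::vector<std::string>& records,
                                  std::size_t expected) {
    assert(records.size() == expected && "Resourced token table size mismatch");
    (void)records; (void)expected;  // avoid unused warnings with NDEBUG
}
```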

Parameters

resourcePool	The resource pool to load the resource from.
resourceCategory	The resource category.
resourceName	The resource name.
target	The table to fill.
dbgSizeVerifier	This parameter has to be specified only in debug builds and provides the expected size of the resourced table. It is to be wrapped in macro ALIB_DBG, so that it is not given in release builds.
outerSeparator	The character that separates the entries. Defaults to `','`.
innerSeparator	The character that separates the values of an entry. Defaults to `' '` (space).

Module Dependencies: This method is only available if module ALib Enums as well as module ALib BaseCamp are included in the ALib Distribution.

◆ Match()

bool Match ( const String & needle )

Matches a given string with this token. See this class's description for details.

Parameters

needle	The potentially abbreviated input string to match.

Returns: true if needle matches this token, false otherwise.

Definition at line 284 of file token.cpp.


◆ Sensitivity()

lang::Case Sensitivity ( ) const

inline

Returns the letter case sensitivity of this token.

Note: Like methods GetFormat and GetMinLength, this method is usually not of interest for standard API usage. These three informational methods are provided mainly to support the unit tests.

Returns: The letter case sensitivity used with method Match.

Definition at line 350 of file token.hpp.
