Tokens, in the context of ALib Strings, are human-readable "words" or "symbols" that represent a certain value or entity of a piece of software. Tokens may be used with configuration files, mathematical or general expressions, programming languages, communication protocols, and so forth.
This struct contains attributes that describe a token, a method to parse the attributes from a (resource) string, and finally method Match, which matches a given string against the token definition.
When a token is constructed or defined, special formats are detected. These formats are snake_case, kebab-case, and CamelCase.
The format detection is only performed when more than one minimum length is given. In this case, the number of "segments" (e.g. "camel humps") has to match the number of length values.
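The detection rule above can be pictured with a small standalone sketch. It does not use the library; `FormatSketch`, `countDelimitedSegments`, `countCamelHumps` and `detectFormatSketch` are hypothetical names, and the order in which the schemes are tried is an assumption.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical stand-in that mirrors the names and values of enumeration Formats.
enum class FormatSketch : signed char { Normal = 0, SnakeCase = 2, KebabCase = 4, CamelCase = 8 };

// Number of segments when the name is split at the given delimiter.
int countDelimitedSegments(const std::string& name, char delimiter) {
    int count = 1;
    for (char c : name)
        if (c == delimiter) ++count;
    return count;
}

// Number of camel humps: the first character opens a hump, as does every later upper-case letter.
int countCamelHumps(const std::string& name) {
    int count = name.empty() ? 0 : 1;
    for (std::size_t i = 1; i < name.size(); ++i)
        if (std::isupper(static_cast<unsigned char>(name[i]))) ++count;
    return count;
}

// Detection is attempted only if more than one minimum length is given. A scheme is
// accepted only if its segment count equals the number of length values.
FormatSketch detectFormatSketch(const std::string& name, int numMinLengths) {
    if (numMinLengths <= 1) return FormatSketch::Normal;
    if (countDelimitedSegments(name, '_') == numMinLengths) return FormatSketch::SnakeCase;
    if (countDelimitedSegments(name, '-') == numMinLengths) return FormatSketch::KebabCase;
    if (countCamelHumps(name) == numMinLengths)             return FormatSketch::CamelCase;
    return FormatSketch::Normal;  // no scheme found (compare DbgDefinitionError::NoCaseSchemeFound)
}

void detectFormatExample() {
    assert(detectFormatSketch("SystemProperty", 2) == FormatSketch::CamelCase);
    assert(detectFormatSketch("SystemProperty", 1) == FormatSketch::Normal);
}
```

With a single length value the name is always treated as a normal word, even if it contains humps or delimiters.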
Independent from the token format (normal or snake_case, kebab-case, CamelCase), character case sensitivity can be chosen. With CamelCase and case sensitive parsing, the first character of the first hump may be defined lower or upper case (called "lowerCamelCase" vs. "UpperCamelCase").
If none of the special formats is detected, the tokens can optionally be abbreviated by just providing a minimum amount of starting characters as specified by the then single entry in minLengths. Otherwise, each segment of the token (e.g. "camel hump") can (again optionally) be shortened on its own. As an example, if for token "SystemProperty" the minimum lengths given are 3 and 4, the minimum abbreviation is "SysProp", while "SystProper" also matches.
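As a rough illustration of segment-wise abbreviation, the following standalone sketch checks a candidate against a delimiter-separated name. It is not the library's code: the helper names are hypothetical, the comparison is case-sensitive only, and it assumes that the candidate has to repeat the delimiters of the name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// True if `needle` is a prefix of `segment` that has at least `minLength` characters.
bool isAbbreviationOf(const std::string& segment, const std::string& needle, std::size_t minLength) {
    return needle.size() >= minLength
        && needle.size() <= segment.size()
        && segment.compare(0, needle.size(), needle) == 0;
}

// Splits `text` at every occurrence of `delimiter`, keeping empty parts.
std::vector<std::string> splitAt(const std::string& text, char delimiter) {
    std::vector<std::string> parts;
    std::size_t start = 0;
    for (;;) {
        std::size_t pos = text.find(delimiter, start);
        if (pos == std::string::npos) {
            parts.push_back(text.substr(start));
            return parts;
        }
        parts.push_back(text.substr(start, pos - start));
        start = pos + 1;
    }
}

// Each segment of the name may be shortened on its own, down to its minimum length.
bool matchDelimitedSketch(const std::string& name, const std::string& needle,
                          char delimiter, const std::vector<std::size_t>& minLengths) {
    const std::vector<std::string> segments = splitAt(name, delimiter);
    const std::vector<std::string> given    = splitAt(needle, delimiter);
    if (segments.size() != given.size() || segments.size() != minLengths.size())
        return false;
    for (std::size_t i = 0; i < segments.size(); ++i)
        if (!isAbbreviationOf(segments[i], given[i], minLengths[i]))
            return false;
    return true;
}

void matchDelimitedExample() {
    assert(matchDelimitedSketch("system_property", "sys_prop", '_', {3, 4}));
    assert(!matchDelimitedSketch("system_property", "sy_prop", '_', {3, 4}));
}
```

For a token of normal format, `isAbbreviationOf` alone covers the single-length case, for example `isAbbreviationOf("Property", "Prop", 4)`.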
This class supports minimum length definitions for up to 7 "camel humps", respectively segments. Should a name contain even more segments, those cannot be abbreviated. Providing more than 7 values for minimum segment lengths with the definition string results in a definition error (see below).
The minimum length values provided must be greater than 0, with one exception: With CamelCase format and a case-insensitive definition, the last "camel hump" may have a minimum length of 0 and hence may be omitted when matched. If so, the "normalized" version of the token, which can be received by appending an instance to an AString, will have the last letter of the defined name converted to lower case.
The rationale for this specific approach is to support the English plural case. This is best explained with an example. If a token was defined using definition string:
"MilliSecondS Ignore 1 1 0"
then all of the following words match:
milliseconds MilliSecs millis MSec MSecs MSs ms
If the correctly spelled (normalized) token name is to be written, the last character is converted to lower case and the token becomes
MilliSeconds
This is assured, for example, by the specialization of functor T_Append for this type. Hence, when a Token is appended to an AString, the last character of the token name is converted to lower case if it may be omitted.
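The normalization step can be sketched without the library as follows. The function `normalizedNameSketch` and its parameters are hypothetical; the sketch only mirrors the rule stated above and is not the code of the T_Append specialization.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Returns the name as it would be written in normalized form: if the token is of
// CamelCase format and its last hump has a minimum length of 0, the last character
// is converted to lower case. Otherwise the raw name is returned unchanged.
std::string normalizedNameSketch(std::string rawName, bool isCamelCase, int lastMinLength) {
    if (isCamelCase && lastMinLength == 0 && !rawName.empty()) {
        const unsigned char last = static_cast<unsigned char>(rawName.back());
        rawName.back() = static_cast<char>(std::tolower(last));
    }
    return rawName;
}

void normalizedNameExample() {
    assert(normalizedNameSketch("MilliSecondS", true, 0) == "MilliSeconds");
}
```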
CamelCase supports a simple "rollback" mechanism, which is needed for example for token
"SystemTemperature Ignore 1 1"
and given match argument
system
All six characters match the first hump, but then there are no characters left to match the start of the second hump "Temperature". In this case, a loop of retries is performed by rolling back characters from the back of the hump ('m') and ending with the first optional character of that hump ('y'). The loop will be broken when character 't' is found.
However, this is not continued if the term that was rolled back still does not match. This means that certain (very unlikely!) tokens with nested repeating character sequences in camel humps cannot be abbreviated to certain (unlikely wanted) lengths.
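The following standalone sketch illustrates hump-wise matching with rollback. Note that it deliberately swaps in full recursive backtracking, which is more permissive than the limited rollback described above, and that it compares case-insensitively only. All names are hypothetical, and the two-hump token is assumed to be defined with minimum lengths 1 and 1.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Splits a CamelCase name into humps. Every upper-case letter opens a new hump.
std::vector<std::string> splitHumps(const std::string& name) {
    std::vector<std::string> humps;
    for (char c : name) {
        if (humps.empty() || std::isupper(static_cast<unsigned char>(c)))
            humps.emplace_back();
        humps.back() += c;
    }
    return humps;
}

bool equalsIgnoreCase(char a, char b) {
    return std::tolower(static_cast<unsigned char>(a)) == std::tolower(static_cast<unsigned char>(b));
}

// Matches needle[pos..] against humps[idx..]. The current hump first consumes as many
// characters as possible and then rolls back one character at a time, down to its
// minimum length, until the remaining humps match the rest of the needle.
bool matchHumpsFrom(const std::vector<std::string>& humps, const std::vector<int>& minLengths,
                    const std::string& needle, std::size_t idx, std::size_t pos) {
    if (idx == humps.size())
        return pos == needle.size();
    const std::string& hump = humps[idx];
    int maxLen = 0;
    while (static_cast<std::size_t>(maxLen) < hump.size()
           && pos + static_cast<std::size_t>(maxLen) < needle.size()
           && equalsIgnoreCase(hump[static_cast<std::size_t>(maxLen)],
                               needle[pos + static_cast<std::size_t>(maxLen)]))
        ++maxLen;
    for (int len = maxLen; len >= minLengths[idx]; --len)
        if (matchHumpsFrom(humps, minLengths, needle, idx + 1, pos + static_cast<std::size_t>(len)))
            return true;
    return false;
}

bool matchCamelSketch(const std::string& name, const std::vector<int>& minLengths,
                      const std::string& needle) {
    const std::vector<std::string> humps = splitHumps(name);
    if (humps.size() != minLengths.size())
        return false;  // number of humps and number of length values have to agree
    return matchHumpsFrom(humps, minLengths, needle, 0, 0);
}

void matchCamelExample() {
    // "system" is first consumed completely by hump "System", then rolled back to "sys".
    assert(matchCamelSketch("SystemTemperature", {1, 1}, "system"));
    assert(matchCamelSketch("SystemProperty", {3, 4}, "SysProp"));
}
```

A minimum length of 0 lets a hump match zero characters, which is how the optional plural hump of "MilliSecondS" is handled in this sketch. The sketch does not verify that only the last hump uses 0.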
The definition strings passed to method Define are considered static (resourced) data. In other words, this definition data should be defined at compile time and should not be customizable by end-users, but only by experts. Therefore, the correctness of the definitions is tested only in debug-compilations of the library.
The source code of static utility method LoadResourcedTokens demonstrates how error codes defined with enumeration DbgDefinitionError can be handled in debug-compilations by raising debug-assertions.
#include <token.hpp>
Public Type Index: | |
enum class | DbgDefinitionError : int8_t { OK = 0, EmptyName = -1, ErrorReadingSensitivity = -2, ErrorReadingMinLengths = -3, TooManyMinLengthsGiven = -4, InconsistentMinLengths = -5, NoCaseSchemeFound = -6, MinLenExceedsSegmentLength = -7, DefinitionStringNotConsumed = -8, ZeroMinLengthAndNotLastCamelHump = -9 } |
enum class | Formats : int8_t { Normal = 0, SnakeCase = 2, KebabCase = 4, CamelCase = 8 } |
Public Static Method Index: | |
static void | LoadResourcedTokens (lang::Camp &module, const NString &resourceName, strings::util::Token *target, int dbgSizeVerifier, character outerSeparator=',', character innerSeparator=' ') |
static ALIB_API void | LoadResourcedTokens (lang::resources::ResourcePool &resourcePool, const NString &resourceCategory, const NString &resourceName, strings::util::Token *target, int dbgSizeVerifier, character outerSeparator=',', character innerSeparator=' ') |
Public Method Index: | |
Token () | |
Token (const String &definition, character separator=';') | |
ALIB_API | Token (const String &name, lang::Case sensitivity, int8_t minLength) |
ALIB_API | Token (const String &name, lang::Case sensitivity, int8_t minLength1, int8_t minLength2, int8_t minLength3=-1, int8_t minLength4=-1, int8_t minLength5=-1, int8_t minLength6=-1, int8_t minLength7=-1) |
DbgDefinitionError | DbgGetError () |
ALIB_API void | Define (const String &definition, character separator=';') |
Formats | GetFormat () const |
int8_t | GetMinLength (int idx) const |
const String & | GetRawName () const |
ALIB_API bool | Match (const String &needle) |
lang::Case | Sensitivity () const |
enum class | DbgDefinitionError : int8_t | strong |
Error codes which are written in field format in the case that method Define suffers a parsing error.
This enum, as well as the error detection is only available in debug-compilations of the library.
enum class | Formats : int8_t | strong |
Format types detected with detectFormat.
Enumerator | |
---|---|
Normal | Normal, optionally abbreviated words. |
SnakeCase | snake_case using underscores. |
KebabCase | kebab-case using hyphens. |
CamelCase | UpperCamelCase or lowerCamelCase. |
Token | ( | const String & | name, |
lang::Case | sensitivity, | ||
int8_t | minLength ) |
Constructor used with function names that do not follow a snake_case, kebab-case or CamelCase naming scheme.
name | The function name. |
sensitivity | The letter case sensitivity of reading the function name. |
minLength | The minimum starting portion of the function name to read. |
Definition at line 42 of file token.cpp.
Token | ( | const String & | name, |
lang::Case | sensitivity, | ||
int8_t | minLength1, | ||
int8_t | minLength2, | ||
int8_t | minLength3 = -1, | ||
int8_t | minLength4 = -1, | ||
int8_t | minLength5 = -1, | ||
int8_t | minLength6 = -1, | ||
int8_t | minLength7 = -1 ) |
Constructor with at least two minimum length values, used to define tokens that follow snake_case, kebab-case or CamelCase naming schemes.
name | The function name. |
sensitivity | The letter case sensitivity of reading the function name. |
minLength1 | The minimum starting portion of the first segment to read. |
minLength2 | The minimum starting portion of the second segment to read. |
minLength3 | The minimum starting portion of the third segment to read. Defaults to -1. |
minLength4 | The minimum starting portion of the fourth segment to read. Defaults to -1. |
minLength5 | The minimum starting portion of the fifth segment to read. Defaults to -1. |
minLength6 | The minimum starting portion of the sixth segment to read. Defaults to -1. |
minLength7 | The minimum starting portion of the seventh segment to read. Defaults to -1. |
Definition at line 58 of file token.cpp.
Token (const String &definition, character separator=';') | inline |
Constructor using a (usually resourced) string to read the definitions. Invokes Define.
definition | The input string. |
separator | Separation character used to parse the input. Defaults to ';'. |
Definition at line 267 of file token.hpp.
DbgDefinitionError | DbgGetError () | inline |
Tests if this token was well defined.
ALIB_API void | Define (const String &definition, character separator=';') |
Defines or redefines this token by parsing the attributes from the given sub-string. This method is usually invoked by code that loads tokens and other data from resources of ALib Camp objects.
The expected format is a list of the following values, separated by the character given with parameter separator:
- The name of the token.
- The letter case sensitivity. This value may be abbreviated to a single character ('s' and 'i') and itself is not parsed taking letter case into account.
- The minimum lengths of the segments of the name. A value of 0 specifies that no abbreviation must be done and therefore is the same as specifying the exact length of the segment.
definition | The input string. |
separator | Separation character used to parse the input. Defaults to ';'. |
Definition at line 70 of file token.cpp.
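To make the definition format more tangible, here is a standalone parsing sketch. It is not the library's parser: `ParsedDefinition` and `parseDefinitionSketch` are hypothetical, only the first letter of the sensitivity value is inspected, and whether length values are mandatory is left open (the sketch accepts none).

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical result type, not part of the library.
struct ParsedDefinition {
    std::string      name;
    bool             sensitive = false;
    std::vector<int> minLengths;
    bool             ok = false;
};

ParsedDefinition parseDefinitionSketch(const std::string& definition, char separator = ';') {
    ParsedDefinition result;

    // Split the definition into its fields.
    std::vector<std::string> fields;
    std::istringstream input(definition);
    for (std::string field; std::getline(input, field, separator); )
        fields.push_back(field);
    if (fields.size() < 2 || fields[0].empty() || fields[1].empty())
        return result;
    result.name = fields[0];

    // Assumption: the first letter decides, 's' for sensitive and 'i' for ignore, in any letter case.
    const char first = static_cast<char>(std::tolower(static_cast<unsigned char>(fields[1][0])));
    if (first != 's' && first != 'i')
        return result;
    result.sensitive = (first == 's');

    // Up to seven minimum length values are accepted.
    if (fields.size() - 2 > 7)
        return result;
    for (std::size_t i = 2; i < fields.size(); ++i) {
        const std::string& field = fields[i];
        if (field.empty() || field.size() > 3
            || field.find_first_not_of("0123456789") != std::string::npos)
            return result;
        result.minLengths.push_back(std::stoi(field));
    }
    result.ok = true;
    return result;
}

void parseDefinitionExample() {
    const ParsedDefinition def = parseDefinitionSketch("MilliSecondS Ignore 1 1 0", ' ');
    assert(def.ok && def.name == "MilliSecondS" && def.minLengths.size() == 3);
}
```

The sketch reports failures with a single flag. The class itself distinguishes the causes with enumeration DbgDefinitionError in debug-compilations.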
Formats | GetFormat () const | inline |
Returns the format of this token.
int8_t | GetMinLength (int idx) const | inline |
Returns the minimum length to be read. In case that this token is not of snake_case, kebab-case or CamelCase naming scheme, only 0 is allowed for parameter idx and this defines the minimal abbreviation length. If one of the naming schemes applies, parameter idx may be as high as the number of segments found in the name (and a maximum of 6, as this class supports only up to seven segments).
The first index that exceeds the number of segments will return -1 for the length. If even higher index values are requested, then the returned value is undefined.
idx | The index of the minimum length to receive. |
const String & | GetRawName () const | inline |
Returns the "raw" name of the token as given with Define, respectively with one of the constructors.
If, instead, the token is appended to an AString and the minimum length of the last camel hump is 0, the last character of the name will be converted to lower case.
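static void | LoadResourcedTokens (lang::Camp &module, const NString &resourceName, strings::util::Token *target, int dbgSizeVerifier, character outerSeparator=',', character innerSeparator=' ') | inline static |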
Shortcut to LoadResourcedTokens that accepts a module and uses its resource pool and resource category.
module | The ALib Camp to load the resource from. |
resourceName | The resource name. |
target | The table to fill. |
dbgSizeVerifier | This parameter has to be specified only in debug-compilations and provides the expected size of the resourced table. To be surrounded by macro ALIB_DBG (not to be given in release builds). |
outerSeparator | The character that separates the entries. Defaults to ','. |
innerSeparator | The character that separates the values of an entry. Defaults to ' ' (space). |
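static ALIB_API void | LoadResourcedTokens (lang::resources::ResourcePool &resourcePool, const NString &resourceCategory, const NString &resourceName, strings::util::Token *target, int dbgSizeVerifier, character outerSeparator=',', character innerSeparator=' ') | static |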
Static utility function that defines a table of token objects from external resourced strings.
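It is possible to provide the table lines in two ways:
- In a single resource string, in which the entries are separated by the character given with parameter outerSeparator.
- In separate resource strings. In this case, the function appends an integral index, starting with 0, to the resource name, parses a single record and increments the index. Parsing ends when a resource with a next higher index is not found.
The second option is recommended for larger token sets. While the separation causes some overhead in a resource backend, the external (!) management (translation, manipulation, etc.) is most probably simplified with this approach.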
resourcePool | The resource pool to load the resource from. |
resourceCategory | The resource category. |
resourceName | The resource name. |
target | The table to fill. |
dbgSizeVerifier | This parameter has to be specified only in debug builds and provides the expected size of the resourced table. To be surrounded by macro ALIB_DBG (not to be given in release builds). |
outerSeparator | The character that separates the entries. Defaults to ','. |
innerSeparator | The character that separates the values of an entry. Defaults to ' ' (space). |
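The two ways of providing the table can be sketched with a plain map standing in for a resource pool. Everything here is hypothetical (`ResourceMap`, `loadRecordsSketch`, the resource names and the record contents), and the order in which the two variants are tried is an assumption. The sketch only collects the record strings, each of which would then be parsed with innerSeparator.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a resource pool: resource name -> resource string.
using ResourceMap = std::map<std::string, std::string>;

std::vector<std::string> loadRecordsSketch(const ResourceMap& pool, const std::string& resourceName,
                                           char outerSeparator = ',') {
    std::vector<std::string> records;

    // Variant 1: a single resource string that holds all records.
    const auto single = pool.find(resourceName);
    if (single != pool.end()) {
        std::istringstream input(single->second);
        for (std::string record; std::getline(input, record, outerSeparator); )
            records.push_back(record);
        return records;
    }

    // Variant 2: one resource per record, named with an index that starts at 0.
    // Loading ends at the first index for which no resource is found.
    for (int index = 0; ; ++index) {
        const auto entry = pool.find(resourceName + std::to_string(index));
        if (entry == pool.end())
            break;
        records.push_back(entry->second);
    }
    return records;
}

void loadRecordsExample() {
    const ResourceMap pool{{"Units0", "MilliSecondS Ignore 1 1 0"}, {"Units1", "Second Ignore 1"}};
    assert(loadRecordsSketch(pool, "Units").size() == 2);
}
```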
bool Match | ( | const String & | needle | ) |
lang::Case | Sensitivity () const | inline |
Returns the letter case sensitivity of this token.