Tokens, in the context of ALib Strings, are human-readable "words" or "symbols" that represent a certain value or entity of a piece of software. Tokens may be used with configuration files, mathematical or general expressions, programming languages, communication protocols and so forth.
This struct contains attributes to describe a token, a method to parse the attributes from a (resource) string and, finally, method Match, which matches a given string against the token definition.
When a token is constructed or defined, special formats are detected. These formats are snake_case, kebab-case and CamelCase.
The format detection is only performed when more than one minimum length is given. In this case, the number of "segments" (e.g., "camel humps") has to match the number of length values.
Independent of the token format (normal or snake_case, kebab-case, CamelCase), character case sensitivity can be chosen. With CamelCase and case-sensitive parsing, the first character of the first hump may be defined lower or upper case (called "lowerCamelCase" vs. "UpperCamelCase").
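The detection rule above can be illustrated with a small, self-contained sketch. It is not the code of detectFormat: the function name detectFormatSketch, the order in which the schemes are tested, and the fallback to Normal are assumptions made for illustration only. Only the enumerator values are taken from the type index of this page.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <string>

// Enumerator values as listed for Token::Formats on this page.
enum class Formats : std::int8_t { Normal = 0, SnakeCase = 2, KebabCase = 4, CamelCase = 8 };

// Hypothetical helper, not part of ALib.
// A scheme is only looked for when more than one minimum length was given.
Formats detectFormatSketch(const std::string& name, std::size_t minLengthCount) {
    if (minLengthCount <= 1)
        return Formats::Normal;

    // Assumption: separators are tested before camel humps.
    if (name.find('_') != std::string::npos) return Formats::SnakeCase;
    if (name.find('-') != std::string::npos) return Formats::KebabCase;

    // An upper case letter behind the first character starts a second camel hump.
    for (std::size_t i = 1; i < name.size(); ++i)
        if (std::isupper(static_cast<unsigned char>(name[i])))
            return Formats::CamelCase;

    // Assumption: the real implementation may report a definition error here instead.
    return Formats::Normal;
}
```

The sketch ignores letter case sensitivity, which is chosen independently of the format.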
If none of the special formats is detected, the tokens can optionally be abbreviated by just providing a minimum amount of starting characters as specified by the then single entry in minLengths. Otherwise, each segment of the token (e.g., "camel hump") can (again optionally) be shortened on its own. As an example, if for token "SystemProperty" the minimum lengths given are 3 and 4, the minimum abbreviation is "SysProp", while "SystProper" also matches.
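The per-segment abbreviation can be sketched as follows. This is a simplified illustration, assuming case-sensitive matching and a greedy scan without any rollback. The helpers splitCamelHumps and matchesAbbreviation are hypothetical and do not exist in ALib.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: a new hump starts at every upper case letter.
std::vector<std::string> splitCamelHumps(const std::string& name) {
    std::vector<std::string> humps;
    for (char c : name) {
        if (humps.empty() || std::isupper(static_cast<unsigned char>(c)))
            humps.emplace_back();
        humps.back().push_back(c);
    }
    return humps;
}

// Hypothetical helper: greedy, case-sensitive match. Each hump has to contribute
// at least minLengths[i] of its leading characters to the needle.
bool matchesAbbreviation(const std::string& name,
                         const std::vector<int>& minLengths,
                         const std::string& needle) {
    const std::vector<std::string> humps = splitCamelHumps(name);
    if (humps.size() != minLengths.size())
        return false;                    // one minimum length per hump is required

    std::size_t pos = 0;                 // read position in the needle
    for (std::size_t i = 0; i < humps.size(); ++i) {
        std::size_t taken = 0;           // characters consumed for this hump
        while (taken < humps[i].size() && pos < needle.size()
               && needle[pos] == humps[i][taken]) {
            ++taken;
            ++pos;
        }
        if (taken < static_cast<std::size_t>(minLengths[i]))
            return false;                // hump was abbreviated too much
    }
    return pos == needle.size();         // no unmatched rest allowed
}
```

With name "SystemProperty" and minimum lengths 3 and 4, this accepts "SysProp" and "SystProper" and rejects "SyProp".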
This class supports minimum length definitions for up to 7 "camel humps", respectively segments. Should a name contain even more segments, those cannot be abbreviated. Providing more than 7 values for minimum segment lengths with the definition string results in a definition error (see below).
The minimum length values provided must be greater than 0, with one exception: With CamelCase format and case-insensitive definition, the last "camel hump" may have a minimum length of 0 and hence may be omitted when matched. If so, the "normalized" version of the token, which can be received by appending an instance to an AString, will have the last letter of the defined name converted to lower case.
The rationale for this specific approach is to support the English plural case. This is best explained with an example. If a token was defined using definition string:
"MilliSecondS Ignore 1 1 0"
then all of the following words match:
milliseconds MilliSecs millis MSec MSecs MSs ms
When the correctly spelled (normalized) token name is to be written, the last character is converted to lower case and the token becomes
MilliSeconds
This is performed with method GetExportName, which is also used by the specialization of functor T_Append for this type. Hence, when appending a Token to an AString, the last character of the token name is converted to lower case if it may be omitted.
If the above is not suitable, or if for any other reason a different "normalized" name is wanted when writing the token, method Define offers a further mechanism to explicitly define any custom string to be written.
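The rule for the written name can be sketched in a few lines. The function exportNameSketch and its parameters are hypothetical. It works on std::string instead of AString and represents "no explicit export name" by an empty string, which is an assumption of this sketch.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper, not part of ALib.
// explicitExportName: empty means that no custom export name was defined.
// lastMinLength:      minimum length of the last segment of the token.
std::string exportNameSketch(const std::string& definitionName,
                             const std::string& explicitExportName,
                             bool isCamelCase, int lastMinLength) {
    // An explicitly defined export name always wins.
    if (!explicitExportName.empty())
        return explicitExportName;

    // Otherwise the definition name is written. If the last camel hump is
    // optional, its final character is converted to lower case.
    std::string result = definitionName;
    if (isCamelCase && lastMinLength == 0 && !result.empty())
        result.back() = static_cast<char>(
            std::tolower(static_cast<unsigned char>(result.back())));
    return result;
}
```

For definition name "MilliSecondS" with a last minimum length of 0, the sketch yields "MilliSeconds".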
CamelCase supports a simple "rollback" mechanism, which is needed for example for token
"SystemTemperature Ignore 1 1 0"
and given match argument
system
All six characters match the first hump, but then there are no characters left to match the start of the second hump "Temperature". In this case, a loop of retries is performed by rolling back characters from the back of the hump ('m') and ending with the first optional character of that hump ('y'). The loop will be broken when character 't' is found.
However, this is not continued if the term that was rolled back still does not match. This means that certain (very unlikely) tokens with nested repeating character sequences in camel humps cannot be abbreviated to certain (unlikely wanted) lengths.
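The rollback idea can be sketched with a small recursive matcher. This is an illustration only: the functions matchFrom and matchesCamelToken are hypothetical, the humps are passed in already split, matching ignores letter case, and the sketch backtracks exhaustively, which goes further than the limited rollback described above. For the example it is assumed that the two humps "System" and "Temperature" have the minimum lengths 1 and 1.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Case-insensitive character comparison.
bool sameIgnoringCase(char a, char b) {
    return std::tolower(static_cast<unsigned char>(a))
        == std::tolower(static_cast<unsigned char>(b));
}

// Hypothetical helper: tries to match needle[pos..] against humps[hump..].
bool matchFrom(const std::vector<std::string>& humps, const std::vector<int>& minLengths,
               std::size_t hump, const std::string& needle, std::size_t pos) {
    if (hump == humps.size())
        return pos == needle.size();     // all humps done: needle must be consumed

    // Greedy step: consume as many characters of this hump as possible.
    std::size_t taken = 0;
    while (taken < humps[hump].size() && pos + taken < needle.size()
           && sameIgnoringCase(needle[pos + taken], humps[hump][taken]))
        ++taken;

    // Rollback loop: give characters back, one at a time, down to the minimum
    // length, until the remaining humps match the rest of the needle.
    for (int t = static_cast<int>(taken); t >= minLengths[hump]; --t)
        if (matchFrom(humps, minLengths, hump + 1, needle,
                      pos + static_cast<std::size_t>(t)))
            return true;
    return false;
}

// Hypothetical entry point: one minimum length per hump is expected.
bool matchesCamelToken(const std::vector<std::string>& humps,
                       const std::vector<int>& minLengths, const std::string& needle) {
    return humps.size() == minLengths.size() && matchFrom(humps, minLengths, 0, needle, 0);
}
```

For needle "system", the first hump initially consumes all six characters. The loop then gives back 'm', 'e' and 't' until the rest "tem" matches the start of "Temperature". With humps "Milli", "Second", "S" and minimum lengths 1, 1 and 0, the same function accepts the words listed for "MilliSecondS" above.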
The definition strings passed to method Define are considered static (resourced) data. In other words, this definition data should be compile-time defined and not be customizable by end-users, but only by experts. Therefore, the correctness of the definitions is tested only in debug-compilations of the library.
The source code of static utility method LoadResourcedTokens demonstrates how error codes defined with enumeration DbgDefinitionError can be handled in debug-compilations by raising debug-assertions.
#include <token.hpp>
Public Type Index: | |
enum class | DbgDefinitionError : int8_t { OK = 0, EmptyName = -1, ErrorReadingSensitivity = -2, ErrorReadingMinLengths = -3, TooManyMinLengthsGiven = -4, InconsistentMinLengths = -5, NoCaseSchemeFound = -6, MinLenExceedsSegmentLength = -7, DefinitionStringNotConsumed = -8, ZeroMinLengthAndNotLastCamelHump = -9 } |
enum class | Formats : int8_t { Normal = 0, SnakeCase = 2, KebabCase = 4, CamelCase = 8 } |
Format types detected with detectFormat. | |
Public Static Method Index: | |
static void | LoadResourcedTokens (lang::Camp &module, const NString &resourceName, strings::util::Token *target, int dbgSizeVerifier, character outerSeparator=',', character innerSeparator=' ') |
static ALIB_API void | LoadResourcedTokens (lang::resources::ResourcePool &resourcePool, const NString &resourceCategory, const NString &resourceName, strings::util::Token *target, int dbgSizeVerifier, character outerSeparator=',', character innerSeparator=' ') |
Public Method Index: | |
Token () | |
Parameterless constructor. Creates an "undefined" token. | |
Token (const String &definitionSrc, character separator=';') | |
ALIB_API | Token (const String &name, lang::Case sensitivity, int8_t minLength, const String &exportName=NULL_STRING) |
ALIB_API | Token (const String &name, lang::Case sensitivity, int8_t minLength1, int8_t minLength2, int8_t minLength3=-1, int8_t minLength4=-1, int8_t minLength5=-1, int8_t minLength6=-1, int8_t minLength7=-1) |
DbgDefinitionError | DbgGetError () |
ALIB_API void | Define (const String &definition, character separator=';') |
const String & | GetDefinitionName () const |
ALIB_API void | GetExportName (AString &target) const |
Formats | GetFormat () const |
int8_t | GetMinLength (int idx) const |
ALIB_API bool | Match (const String &needle) |
lang::Case | Sensitivity () const |
Protected Static Field Index: | |
static constexpr Formats | ignoreCase = Formats(1) |
Letter case sensitivity. This is combined with the format bits. | |
Protected Field Index: | |
String | definitionName |
The token's definition string part. | |
String | exportName = NULL_STRING |
The token's optional explicit export name. | |
Formats | format |
Defines the "case type" as well as the letter case sensitivity of this token. | |
int8_t | minLengths [7] = {0,0,0,0,0,0,0} |
Protected Method Index: | |
ALIB_API void | detectFormat () |
Detects snake_case, kebab-case or CamelCase. | |
|
strong |
Error codes which are written in field format in the case that method Define suffers a parsing error.
This enum, as well as the error detection is only available in debug-compilations of the library.
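How some of these codes relate to the rules for minimum lengths can be sketched as follows. The mapping of conditions to codes is inferred from the enumerator names and from the rules stated in the class description. It is an assumption and not taken from the implementation of Define. The function checkMinLengths is hypothetical and expects non-negative values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Enumerators and values as listed in the type index of this page.
enum class DbgDefinitionError : std::int8_t {
    OK = 0, EmptyName = -1, ErrorReadingSensitivity = -2, ErrorReadingMinLengths = -3,
    TooManyMinLengthsGiven = -4, InconsistentMinLengths = -5, NoCaseSchemeFound = -6,
    MinLenExceedsSegmentLength = -7, DefinitionStringNotConsumed = -8,
    ZeroMinLengthAndNotLastCamelHump = -9
};

// Hypothetical helper, not part of ALib.
// segmentLengths: lengths of the segments found in the token name.
// minLengths:     minimum lengths given with the definition (non-negative).
DbgDefinitionError checkMinLengths(const std::vector<int>& segmentLengths,
                                   const std::vector<int>& minLengths,
                                   bool camelCaseIgnoringCase) {
    if (minLengths.size() > 7)
        return DbgDefinitionError::TooManyMinLengthsGiven;
    if (minLengths.size() != segmentLengths.size())
        return DbgDefinitionError::InconsistentMinLengths;

    for (std::size_t i = 0; i < minLengths.size(); ++i) {
        if (minLengths[i] > segmentLengths[i])
            return DbgDefinitionError::MinLenExceedsSegmentLength;

        // Zero is accepted only for the last hump of a case-insensitive CamelCase token.
        const bool isLast = (i + 1 == minLengths.size());
        if (minLengths[i] == 0 && !(isLast && camelCaseIgnoringCase))
            return DbgDefinitionError::ZeroMinLengthAndNotLastCamelHump;
    }
    return DbgDefinitionError::OK;
}
```

In a debug-build, a caller could raise an assertion whenever the returned code differs from DbgDefinitionError::OK.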
|
strong |
Format types detected with detectFormat.
Enumerator | |
---|---|
Normal | Normal, optionally abbreviated words. |
SnakeCase | snake_case using underscores. |
KebabCase | kebab-case using hyphens. |
CamelCase | UpperCamelCase or lowerCamelCase. |
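Because the enumerator values are distinct bits and the protected constant ignoreCase occupies the lowest bit, format and letter case sensitivity fit into one value. The following sketch shows one way to combine and separate them. The helper functions withSensitivity, schemeOf and ignoresCase are hypothetical. How field format is encoded and decoded internally is an assumption here.

```cpp
#include <cassert>
#include <cstdint>

// Values as documented on this page.
enum class Formats : std::int8_t { Normal = 0, SnakeCase = 2, KebabCase = 4, CamelCase = 8 };
constexpr Formats ignoreCase = Formats(1);

// Hypothetical helper: stores the scheme and, optionally, the ignore-case bit.
constexpr Formats withSensitivity(Formats scheme, bool caseInsensitive) {
    return caseInsensitive
        ? static_cast<Formats>(static_cast<std::int8_t>(scheme)
                               | static_cast<std::int8_t>(ignoreCase))
        : scheme;
}

// Hypothetical helper: masks out the ignore-case bit to get the plain scheme.
constexpr Formats schemeOf(Formats stored) {
    return static_cast<Formats>(static_cast<std::int8_t>(stored)
                                & ~static_cast<std::int8_t>(ignoreCase));
}

// Hypothetical helper: tests the ignore-case bit.
constexpr bool ignoresCase(Formats stored) {
    return (static_cast<std::int8_t>(stored) & static_cast<std::int8_t>(ignoreCase)) != 0;
}
```

A case-insensitive CamelCase token would be stored as the value 9 under this assumption, from which both parts can be recovered.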
|
inline |
Token | ( | const String & | name, |
lang::Case | sensitivity, | ||
int8_t | minLength, | ||
const String & | exportName = NULL_STRING ) |
Constructor used with function names that do not follow a snake_case, kebab-case or CamelCase naming scheme.
name | The function name. |
sensitivity | The letter case sensitivity of reading the function name. |
minLength | The minimum starting portion of the function name to read. |
exportName | An optional export name. If not given, the name is used with method GetExportName. |
Definition at line 29 of file token.cpp.
Token | ( | const String & | name, |
lang::Case | sensitivity, | ||
int8_t | minLength1, | ||
int8_t | minLength2, | ||
int8_t | minLength3 = -1, | ||
int8_t | minLength4 = -1, | ||
int8_t | minLength5 = -1, | ||
int8_t | minLength6 = -1, | ||
int8_t | minLength7 = -1 ) |
Constructor with at least two minimum length values, used to define tokens that follow snake_case, kebab-case or CamelCase naming schemes.
name | The function name. |
sensitivity | The letter case sensitivity of reading the function name. |
minLength1 | The minimum starting portion of the first segment to read. |
minLength2 | The minimum starting portion of the second segment to read. |
minLength3 | The minimum starting portion of the third segment to read. Defaults to -1 . |
minLength4 | The minimum starting portion of the fourth segment to read. Defaults to -1 . |
minLength5 | The minimum starting portion of the fifth segment to read. Defaults to -1 . |
minLength6 | The minimum starting portion of the sixth segment to read. Defaults to -1 . |
minLength7 | The minimum starting portion of the seventh segment to read. Defaults to -1 . |
Definition at line 46 of file token.cpp.
Constructor using a (usually resourced) string to read the definitions. Invokes Define.
definitionSrc | The input string. |
separator | Separation character used to parse the input. Defaults to ';' . |
Definition at line 266 of file token.hpp.
|
inline |
Tests if this token was well defined.
Defines or redefines this token by parsing the attributes from the given substring. This method is usually invoked by code that loads tokens and other data from resources of ALib lang::Camp objects.
The expected format is defined as a list of the following values, separated by the character given with parameter separator:
The token name.
The letter case sensitivity. This value may be abbreviated to a single character ('s' and 'i') and itself is not parsed taking letter case into account.
The list of minimum lengths, one value per segment of the name. A value of 0 specifies that no abbreviation must be done and therefore is the same as specifying the exact length of the segment.
An explicit export name may also be given with the definition string (see GetExportName).
definition | The input string. |
separator | Separation character used to parse the input. Defaults to ';' . |
Definition at line 86 of file token.cpp.
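The field layout of a definition string can be illustrated with a simplified parser. It is a sketch under assumptions and not the implementation of Define: the struct ParsedDefinition and the function parseDefinition are hypothetical, std::string replaces the ALib string types, at least one minimum length is required, and the optional export name is not supported.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical result type of this sketch.
struct ParsedDefinition {
    std::string      name;
    bool             ignoreCase = false;
    std::vector<int> minLengths;
    bool             ok = false;      // false if the definition could not be parsed
};

// Hypothetical helper: parses "name<sep>sensitivity<sep>len1[<sep>len2...]".
ParsedDefinition parseDefinition(const std::string& definition, char separator = ';') {
    ParsedDefinition result;

    // Split the definition into its fields.
    std::vector<std::string> fields;
    std::istringstream in(definition);
    for (std::string field; std::getline(in, field, separator); )
        fields.push_back(field);
    if (fields.size() < 3 || fields[0].empty() || fields[1].empty())
        return result;
    result.name = fields[0];

    // Only the first character of the sensitivity field is inspected, ignoring its case.
    const char s = static_cast<char>(std::tolower(static_cast<unsigned char>(fields[1][0])));
    if (s == 'i')      result.ignoreCase = true;
    else if (s != 's') return result;

    // All remaining fields have to be small non-negative numbers.
    for (std::size_t i = 2; i < fields.size(); ++i) {
        const std::string& field = fields[i];
        if (field.empty() || field.size() > 3)
            return result;
        for (char c : field)
            if (!std::isdigit(static_cast<unsigned char>(c)))
                return result;
        result.minLengths.push_back(std::stoi(field));
    }
    result.ok = true;
    return result;
}
```

Parsing "MilliSecondS;Ignore;1;1;0" with the default separator yields the name "MilliSecondS", case-insensitive matching and the minimum lengths 1, 1 and 0. The same definition with spaces is parsed when ' ' is passed as the separator.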
|
protected |
|
inline |
Returns the definition name used for parsing the token.
void GetExportName | ( | AString & | target | ) | const |
If field exportName is not nulled (hence explicitly given with resourced definition string or with a constructor), this is appended.
Otherwise appends the result of Token::GetDefinitionName to the target. If the token is defined CamelCase and the minimum length of the last segment is defined 0, then the last character written is converted to lower case.
As a result, in most cases it is not necessary to provide a specific exportName with the definition. Instead, this method should provide a reasonable output.
target | The AString that method Append was invoked on. |
Definition at line 58 of file token.cpp.
|
inline |
Returns the format of this token.
|
inline |
Returns the minimum length to be read. In case that this token is not of snake_case, kebab-case or CamelCase naming scheme, only 0 is allowed for parameter idx and this defines the minimal abbreviation length. If one of the naming schemes applies, parameter idx may be as high as the number of segments found in the name (and a maximum of 6, as this class supports only up to seven segments).
The first index that exceeds the number of segments will return -1 for the length. If even higher index values are requested, then the returned value is undefined.
idx | The index of the minimum length to receive. |
|
inlinestatic |
Shortcut to LoadResourcedTokens that accepts a module and uses its resource pool and resource category.
module | The ALib Camp to load the resource from. |
resourceName | The resource name. |
target | The table to fill. |
dbgSizeVerifier | This parameter has to be specified only in debug-compilations and provides the expected size of the resourced table. To be surrounded by macro ALIB_DBG (not to be given in release-builds). |
outerSeparator | The character that separates the entries. Defaults to ',' . |
innerSeparator | The character that separates the values of an entry. Defaults to ' ' (space). |
|
static |
Static utility function that defines a table of token objects from external resourced strings.
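It is possible to provide the table lines in two ways:
In one resource string: all entries are stored in a single resource and are separated by outerSeparator.
In separate resource strings: each entry is stored in its own resource. The method appends an index, starting with 0, to the resource name, parses a single record and increments the index. Parsing ends when a resource with the next higher index is not found.
The second option is recommended for larger token sets. While the separation causes some overhead in a resource backend, the external (!) management (translation, manipulation, etc.) is most probably simplified with this approach.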
resourcePool | The resource pool to load the resource from. |
resourceCategory | The resource category. |
resourceName | The resource name. |
target | The table to fill. |
dbgSizeVerifier | This parameter has to be specified only in debug-builds and provides the expected size of the resourced table. To be surrounded by macro ALIB_DBG (not to be given in release-builds). |
outerSeparator | The character that separates the entries. Defaults to ',' . |
innerSeparator | The character that separates the values of an entry. Defaults to ' ' (space). |
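The two ways of providing the table can be sketched with a plain map standing in for a resource pool. The type alias ResourcePool and the function loadTokenDefinitions are hypothetical and are not related to lang::resources::ResourcePool. Trying the plain resource name first and the indexed names second is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a resource pool: resource name -> resource string.
using ResourcePool = std::map<std::string, std::string>;

// Hypothetical helper: collects the definition strings of all table entries.
std::vector<std::string> loadTokenDefinitions(const ResourcePool& pool,
                                              const std::string& resourceName,
                                              char outerSeparator = ',') {
    std::vector<std::string> definitions;

    // First way: one resource that holds all entries, split at the outer separator.
    const auto single = pool.find(resourceName);
    if (single != pool.end()) {
        std::istringstream in(single->second);
        for (std::string entry; std::getline(in, entry, outerSeparator); )
            definitions.push_back(entry);
        return definitions;
    }

    // Second way: one resource per entry, named resourceName + index, starting at 0.
    // Loading ends when the next higher index is not found.
    for (std::size_t index = 0; ; ++index) {
        const auto it = pool.find(resourceName + std::to_string(index));
        if (it == pool.end())
            break;
        definitions.push_back(it->second);
    }
    return definitions;
}
```

Each returned entry would then be split at innerSeparator to define one token of the target table. In a debug-build, the number of entries found can be compared with the expected table size.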
bool Match | ( | const String & | needle | ) |
|
inline |
Returns the letter case sensitivity of this token.