ALib C++ Library
Library Version: 2312 R0
Documentation generated by doxygen
TTokenizer< TChar > Class Template Reference

#include <tokenizer.hpp>


Class Description

template<typename TChar>
class aworx::lib::strings::util::TTokenizer< TChar >


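This class operates on strings that contain data separated by a delimiter character. It identifies the substrings between the delimiters as tokens of type Substring. After an instance of this class is constructed, three methods are available:

HasNext: Indicates whether a further token is available.
Next: Returns the next token, which is afterwards also available through field Actual. A different delimiter can be provided with each call.
GetRest: Returns the remaining string without searching for further delimiters. Afterwards, HasNext returns false.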

After a token has been retrieved, it may be modified using the interface of class Substring, because the tokenizer does not rely on the bounds of the current token when retrieving the next one. Furthermore, even field Rest may be changed using the interface of Substring if that seems appropriate. The effect is the same as if method Set had been invoked to apply a different source string.

Objects of this class can be reused by freshly initializing them using method Set.

Sample code:
The following code sample shows how to tokenize a string:

// data string to tokenize
String data= A_CHAR("test; abc ; 1,2 , 3 ; xyz ; including;separator");
// create tokenizer on data with ';' as delimiter
Tokenizer tknzr( data, ';' );
// read tokens
cout << tknzr.Next() << endl; // will print "test"
cout << tknzr.Next() << endl; // will print "abc"
cout << tknzr.Next() << endl; // will print "1,2 , 3"
// tokenize actual (third) token (nested tokenizer)
Tokenizer subTknzr( tknzr.Actual, ',');
cout << subTknzr.Next();
while( subTknzr.HasNext() )
    cout << '~' << subTknzr.Next();
cout << endl;
// continue with the main tokenizer
cout << tknzr.Next() << endl; // will print "xyz"
// grab the rest, as we know that the last token might include our separator character
cout << tknzr.GetRest() << endl; // will print "including;separator"

The output will be:

test
abc
1,2 , 3
1~2~3
xyz
including;separator
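
The sample above depends on the library itself. As a rough, stand-alone illustration of the same behaviour, the following sketch models the tokenizer with two views on the source string. MiniTokenizer, its members, and the whitespace set " \t\r\n" are hypothetical names and assumptions made for this sketch; they are not part of ALib, and the library's actual implementation may differ.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in for illustration only; this is NOT the ALib class.
class MiniTokenizer {
public:
    std::string_view actual;               // models field Actual
    std::optional<std::string_view> rest;  // models field Rest; nullopt = "nulled"

    MiniTokenizer(std::string_view src, char delimiter)
        : rest(src), delim(delimiter) {}

    bool hasNext() const { return rest.has_value(); }

    // Returns the next trimmed token, or nullopt when nothing is left.
    std::optional<std::string_view> next() {
        if (!rest) return std::nullopt;
        const std::size_t pos = rest->find(delim);
        if (pos == std::string_view::npos) {
            actual = *rest;                // last token: consume everything
            rest.reset();
        } else {
            actual = rest->substr(0, pos);
            rest   = rest->substr(pos + 1);
        }
        actual = trim(actual);
        return actual;
    }

    // Returns everything not yet tokenized, without searching for delimiters.
    std::string_view getRest() {
        actual = trim(rest.value_or(std::string_view{}));
        rest.reset();
        return actual;
    }

private:
    char delim;

    static std::string_view trim(std::string_view s) {
        const char* ws = " \t\r\n";        // assumed whitespace set
        const std::size_t first = s.find_first_not_of(ws);
        if (first == std::string_view::npos) return {};
        return s.substr(first, s.find_last_not_of(ws) - first + 1);
    }
};

// Replays the sample from this page and collects the lines it would print.
std::vector<std::string> replaySample() {
    std::vector<std::string> lines;
    MiniTokenizer tknzr("test; abc ; 1,2 , 3 ; xyz ; including;separator", ';');
    for (int i = 0; i < 3; ++i)
        lines.emplace_back(*tknzr.next());

    // nested tokenizer on the current (third) token
    MiniTokenizer sub(tknzr.actual, ',');
    std::string joined(*sub.next());
    while (sub.hasNext())
        joined.append("~").append(*sub.next());
    lines.push_back(joined);

    lines.emplace_back(*tknzr.next());
    lines.emplace_back(tknzr.getRest());   // remainder, delimiter included
    return lines;
}
```

Because each token is only a view on the source, the nested tokenizer can work on the current token without affecting the outer one, which keeps its own view on the untokenized remainder.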
Template Parameters
TChar  The character type. Implementations for nchar and wchar are provided with type definitions aworx::TokenizerN and aworx::TokenizerW.

Definition at line 66 of file tokenizer.hpp.

Public Fields

TSubstring< TChar > Actual
 
TSubstring< TChar > Rest
 
TLocalString< TChar, 8 > TrimChars
 

Public Methods

 TTokenizer ()
 
 TTokenizer (const TString< TChar > &src, TChar delimiter, bool skipEmptyTokens=false)
 
TSubstring< TChar > & GetRest (Whitespaces trimming=Whitespaces::Trim)
 
bool HasNext ()
 
ALIB_API TSubstring< TChar > & Next (Whitespaces trimming=Whitespaces::Trim, TChar newDelim='\0')
 
void Set (const TString< TChar > &src, TChar delimiter, bool skipEmptyTokens=false)
 

Protected Fields

TChar delim
 
bool skipEmpty
 

Constructor & Destructor Documentation

◆ TTokenizer() [1/2]

TTokenizer ( )
inline

Constructs an empty tokenizer. To initialize, method Set needs to be invoked.

Definition at line 111 of file tokenizer.hpp.

◆ TTokenizer() [2/2]

TTokenizer ( const TString< TChar > &  src,
TChar  delimiter,
bool  skipEmptyTokens = false 
)
inline

Constructs a tokenizer to work on a given string.

Parameters
src  The string to be tokenized.
delimiter  The delimiter that separates the tokens. Can be changed with every next token.
skipEmptyTokens  If true, empty tokens are omitted. Optional and defaults to false.

Definition at line 123 of file tokenizer.hpp.
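
To illustrate what parameter skipEmptyTokens is described to do, the following stand-alone sketch splits a string with and without dropping empty tokens. The function splitTokens is a hypothetical helper written for this page, not an ALib function. Trimming is left out to keep the sketch short, and the handling of a trailing delimiter shown here is an assumption.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper, not part of ALib: splits src at delimiter and
// optionally drops empty tokens.
std::vector<std::string> splitTokens(std::string_view src, char delimiter,
                                     bool skipEmptyTokens = false) {
    std::vector<std::string> tokens;
    std::size_t start = 0;
    while (true) {
        const std::size_t pos = src.find(delimiter, start);
        // Without a further delimiter, the token runs to the end of src.
        const std::string_view token =
            src.substr(start, pos == std::string_view::npos ? pos : pos - start);
        if (!(skipEmptyTokens && token.empty()))
            tokens.emplace_back(token);
        if (pos == std::string_view::npos) break;
        start = pos + 1;
    }
    return tokens;
}
```

With the input "a;;b;" and delimiter ';', this sketch yields the four tokens "a", "", "b", "" by default, and only "a" and "b" when skipEmptyTokens is true.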

Member Function Documentation

◆ GetRest()

TSubstring<TChar>& GetRest ( Whitespaces  trimming = Whitespaces::Trim)
inline

Returns the currently remaining string (without searching for further delimiter characters). After this call HasNext will return false and Next will return a nulled Substring.

Parameters
trimming  Determines whether the token is trimmed with respect to the whitespace characters defined in field TrimChars. Defaults to Whitespaces::Trim.
Returns
The rest of the original source string that has not yet been returned by Next().

Definition at line 184 of file tokenizer.hpp.

◆ HasNext()

bool HasNext ( )
inline

If this returns true, a call to Next will be successful and will return a Substring which is not nulled.

Returns
true if a next token is available.

Definition at line 199 of file tokenizer.hpp.

◆ Next()

ALIB_API TSubstring< TChar > & Next ( Whitespaces  trimming = Whitespaces::Trim,
TChar  newDelim = '\0' 
)

Returns the next token, which is afterwards also available through field Actual. If no further token is available, the returned Substring will be nulled (see String::IsNull). To prevent this, the availability of a next token should be checked using method HasNext().

For clarification, see the explanation and sample code in this class's documentation.

Parameters
trimming  Determines whether the token is trimmed with respect to the whitespace characters defined in field TrimChars. Defaults to Whitespaces::Trim.
newDelim  The delimiter that separates the tokens. Defaults to 0, which keeps the current delimiter intact. A new delimiter can be provided for every next token.
Returns
The next token as a Substring, which is nulled if no further token was available.

Definition at line 18 of file tokenizer.cpp.
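
The effect of parameter newDelim can be pictured with a small stand-alone model in which the delimiter is sticky and may be replaced on any call. DelimSwitchingTokenizer and parseKeyValues are hypothetical names introduced for this sketch and are not part of ALib; trimming is omitted, and a std::optional without a value stands in for a nulled Substring.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical model, not the ALib class: the delimiter stays in effect
// until a call provides a different one.
struct DelimSwitchingTokenizer {
    std::optional<std::string_view> rest;  // nullopt models a nulled Rest
    char delim;

    DelimSwitchingTokenizer(std::string_view src, char delimiter)
        : rest(src), delim(delimiter) {}

    // newDelim == '\0' keeps the current delimiter, as the parameter
    // description above states for the default value.
    std::optional<std::string_view> next(char newDelim = '\0') {
        if (!rest) return std::nullopt;
        if (newDelim != '\0') delim = newDelim;
        const std::size_t pos = rest->find(delim);
        const std::string_view token = rest->substr(0, pos);
        if (pos == std::string_view::npos) rest.reset();
        else                               rest = rest->substr(pos + 1);
        return token;
    }
};

// Parses "key=v1,v2,...": the key ends at '=', the values are separated by ','.
std::vector<std::string> parseKeyValues(std::string_view line) {
    std::vector<std::string> parts;
    DelimSwitchingTokenizer t(line, '=');
    if (auto key = t.next())                      // token up to '='
        parts.emplace_back(*key);
    for (auto v = t.next(','); v; v = t.next())   // switch to ',' once
        parts.emplace_back(*v);
    return parts;
}
```

In parseKeyValues the delimiter is switched only once; the later calls pass nothing and therefore keep using ','.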

◆ Set()

void Set ( const TString< TChar > &  src,
TChar  delimiter,
bool  skipEmptyTokens = false 
)
inline

Resets a tokenizer to work on a given string.

Parameters
src  The string to be tokenized.
delimiter  The delimiter that separates the tokens. Can be changed with every next token.
skipEmptyTokens  If true, empty tokens are omitted. Optional and defaults to false.

Definition at line 144 of file tokenizer.hpp.

Member Data Documentation

◆ Actual

TSubstring<TChar> Actual

The current token, which is returned by every invocation of Next() or GetRest(). This field may be manipulated at any time.

Definition at line 84 of file tokenizer.hpp.

◆ delim

TChar delim
protected

The most recently set delimiter used by default for the next token extraction.

Definition at line 98 of file tokenizer.hpp.

◆ Rest

TSubstring<TChar> Rest

A Substring that represents the part of the underlying data that has not been tokenized yet. This public field may be manipulated, which has an effect similar to using method Set.

Definition at line 78 of file tokenizer.hpp.

◆ skipEmpty

bool skipEmpty
protected

If true, empty tokens are omitted.

Definition at line 101 of file tokenizer.hpp.

◆ TrimChars

TLocalString<TChar, 8> TrimChars

The whitespace characters used to trim the tokens. Defaults to aworx::DefaultWhitespaces.

Definition at line 90 of file tokenizer.hpp.
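
As a sketch of what trimming against a configurable character set means, the following stand-alone function strips any of the given characters from both ends of a token. The function trimWith is hypothetical and not part of ALib. Its default set " \t\r\n" is an assumption, as the contents of aworx::DefaultWhitespaces are not listed on this page.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical helper, not the ALib API: strips any of trimChars from both
// ends of s. The default character set is an assumption.
std::string_view trimWith(std::string_view s,
                          std::string_view trimChars = " \t\r\n") {
    const std::size_t first = s.find_first_not_of(trimChars);
    if (first == std::string_view::npos) return {};  // only trim characters
    const std::size_t last = s.find_last_not_of(trimChars);
    return s.substr(first, last - first + 1);
}
```

Passing a different set changes what counts as trimmable. For example, trimWith("  [x] ", " []") yields "x", whereas the default set would leave the brackets in place.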


The documentation for this class was generated from the following files:
tokenizer.hpp
tokenizer.cpp