Description:

template<typename TChar>
class alib::strings::util::TTokenizer< TChar >

This class operates on strings that contain data separated by a delimiter character. It identifies the substrings between the delimiters as tokens of type Substring. After an instance of this class is constructed, three methods are available:

HasNext: Indicates if there are further tokens available.
Next: Sets field Actual (which is of type Substring) to reference the next token and returns it.
With each call to Next, a different delimiter can be provided, which then serves as the delimiter for this and subsequent tokens.
By default, the returned token is trimmed using the characters currently set in field TrimChars.
GetRest: Like Next, but returns the complete remaining region without searching for further delimiters (and tokens).
After this method has been invoked, HasNext() will return false.

After a token has been retrieved, it may be modified using the interface of class Substring, because the tokenizer does not rely on the bounds of the current token when retrieving the next one. Furthermore, even field Rest may be changed using the interface of Substring if that seems appropriate. The effect is the same as if method Set was invoked to apply a different source string.

Objects of this class can be reused by freshly initializing them using method Set.

Sample code:
The following code sample shows how to tokenize a string:

    // data string to tokenize
    String data=  A_CHAR("test;  abc ; 1,2 , 3 ; xyz ; including;separator");
 
    // create tokenizer on data with ';' as delimiter
    Tokenizer tknzr( data, ';' );
 
    // read tokens
    cout << tknzr.Next() << endl; // will print "test"
    cout << tknzr.Next() << endl; // will print "abc"
    cout << tknzr.Next() << endl; // will print "1,2 , 3"
 
    // tokenize actual (third) token (nested tokenizer)
    Tokenizer subTknzr( tknzr.Actual,  ',');
    cout << subTknzr.Next();
 
    while( subTknzr.HasNext() )
        cout << '~' << subTknzr.Next();
 
    cout << endl;
 
    // continue with the main tokenizer
    cout << tknzr.Next() << endl; // will print "xyz"
 
    // grab the rest, as we know that the last token might include our separator character
    cout << tknzr.GetRest()      << endl; // will print "including;separator"

The output will be:

test
abc
1,2 , 3
1~2~3
xyz
including;separator
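
For readers without ALib at hand, the behavior of Next, HasNext and GetRest described above can be approximated with the C++17 standard library alone. The following is a minimal sketch and not ALib's implementation: the names MiniTokenizer, trimView and tokenizeSample are hypothetical, an empty std::optional stands in for a "nulled" Substring, and the set of trimmed characters is an assumption.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: strips the given characters from both ends of a view.
inline std::string_view trimView(std::string_view s, std::string_view chars = " \t\r\n") {
    const auto first = s.find_first_not_of(chars);
    if (first == std::string_view::npos) return std::string_view{};
    return s.substr(first, s.find_last_not_of(chars) - first + 1);
}

// Hypothetical, simplified model of the tokenizer described above.
// An empty optional plays the role of a "nulled" Substring.
struct MiniTokenizer {
    std::optional<std::string_view> rest;    // part that has not been tokenized yet
    std::optional<std::string_view> actual;  // most recently returned token
    char delim;

    MiniTokenizer(std::string_view src, char delimiter) : rest(src), delim(delimiter) {}

    bool hasNext() const { return rest.has_value(); }

    // Cuts the next token off the remainder; newDelim == '\0' keeps the delimiter.
    std::optional<std::string_view> next(char newDelim = '\0') {
        if (!rest) { actual.reset(); return actual; }
        if (newDelim != '\0') delim = newDelim;
        const auto pos = rest->find(delim);
        if (pos == std::string_view::npos) {
            actual = trimView(*rest);          // last token: no delimiter left
            rest.reset();
        } else {
            actual = trimView(rest->substr(0, pos));
            rest = rest->substr(pos + 1);      // continue behind the delimiter
        }
        return actual;
    }

    // Returns the whole remainder without searching for further delimiters.
    std::optional<std::string_view> getRest() {
        if (rest) actual = trimView(*rest); else actual.reset();
        rest.reset();
        return actual;
    }
};

// Replays the sample above and collects one string per output line.
std::vector<std::string> tokenizeSample() {
    MiniTokenizer tknzr("test;  abc ; 1,2 , 3 ; xyz ; including;separator", ';');
    std::vector<std::string> lines;
    for (int i = 0; i < 3; ++i)
        lines.emplace_back(*tknzr.next());

    // nested tokenizer on the third token
    MiniTokenizer subTknzr(*tknzr.actual, ',');
    std::string joined(*subTknzr.next());
    while (subTknzr.hasNext()) {
        joined += '~';
        joined += *subTknzr.next();
    }
    lines.push_back(joined);

    lines.emplace_back(*tknzr.next());
    lines.emplace_back(*tknzr.getRest());
    assert(!tknzr.hasNext());                  // nothing left after getRest()
    return lines;
}
```

Because the nested tokenizer only copies a view of the current token, the outer tokenizer's remainder is untouched, which is why the main loop can simply continue afterwards.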

Template Parameters

TChar The character type. Implementations for nchar and wchar are provided with type definitions alib::TokenizerN and alib::TokenizerW.

Definition at line 59 of file tokenizer.hpp.

#include <tokenizer.hpp>

Public Field Index:
TSubstring< TChar >	Actual

TSubstring< TChar >	Rest

TLocalString< TChar, 8 >	TrimChars

Public Method Index:
	TTokenizer ()
	Constructs an empty tokenizer. To initialize, method Set needs to be invoked.

	TTokenizer (const TString< TChar > &src, TChar delimiter, bool skipEmptyTokens=false)

TSubstring< TChar > &	GetRest (lang::Whitespaces trimming=lang::Whitespaces::Trim)

bool	HasNext ()

ALIB_API TSubstring< TChar > &	Next (lang::Whitespaces trimming=lang::Whitespaces::Trim, TChar newDelim='\0')

void	Set (const TString< TChar > &src, TChar delimiter, bool skipEmptyTokens=false)

Protected Field Index:
TChar	delim
	The most recently set delimiter used by default for the next token extraction.

bool	skipEmpty
	If `true`, empty tokens are omitted.

Field Details:

◆ Actual

template<typename TChar >

TSubstring<TChar> Actual

The actual token, which is returned with every invocation of Next() or GetRest(). This field may be manipulated at any time.

Definition at line 73 of file tokenizer.hpp.

◆ delim

template<typename TChar >

TChar delim

protected

The most recently set delimiter used by default for the next token extraction.

Definition at line 85 of file tokenizer.hpp.

◆ Rest

template<typename TChar >

TSubstring<TChar> Rest

A Substring that represents the part of the underlying data that has not been tokenized yet. This public field may be manipulated, which has a similar effect as using method Set.

Definition at line 69 of file tokenizer.hpp.

◆ skipEmpty

template<typename TChar >

bool skipEmpty

protected

If true, empty tokens are omitted.

Definition at line 88 of file tokenizer.hpp.

◆ TrimChars

template<typename TChar >

TLocalString<TChar, 8> TrimChars

The whitespace characters used to trim the tokens. Defaults to alib::DEFAULT_WHITESPACES.

Definition at line 77 of file tokenizer.hpp.

Constructor(s) / Destructor Details:

◆ TTokenizer() [1/2]

template<typename TChar >

TTokenizer ( )

inline

Constructs an empty tokenizer. To initialize, method Set needs to be invoked.

Definition at line 98 of file tokenizer.hpp.

◆ TTokenizer() [2/2]

template<typename TChar >

TTokenizer	(	const TString< TChar > &	src,
		TChar	delimiter,
		bool	skipEmptyTokens = false )

inline

Constructs a tokenizer to work on a given string.

Parameters

src	The string to be tokenized.
delimiter	The delimiter that separates the tokens. Can be changed with every next token.
skipEmptyTokens	If `true`, empty tokens are omitted. Optional and defaults to `false`.

Definition at line 110 of file tokenizer.hpp.
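
The effect of skipEmptyTokens can be illustrated with a small standard-library model. This is a sketch under assumptions, not ALib code: splitTokens is a hypothetical helper, only spaces are trimmed, and "empty" is taken to mean empty after trimming. How ALib itself treats edge cases such as a trailing delimiter is not stated in the text above.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper modelling the skipEmptyTokens flag: splits src at delim,
// trims spaces, and optionally drops tokens that are empty after trimming.
std::vector<std::string> splitTokens(std::string_view src, char delim,
                                     bool skipEmptyTokens = false) {
    assert(delim != '\0');                       // a delimiter must be given
    std::vector<std::string> tokens;
    bool more = true;
    while (more) {
        const auto pos = src.find(delim);
        std::string_view token = src.substr(0, pos);   // npos: whole remainder
        if (pos == std::string_view::npos) more = false;
        else src.remove_prefix(pos + 1);

        // trim spaces on both sides
        const auto first = token.find_first_not_of(' ');
        token = (first == std::string_view::npos)
                    ? std::string_view{}
                    : token.substr(first, token.find_last_not_of(' ') - first + 1);

        if (!(skipEmptyTokens && token.empty()))
            tokens.emplace_back(token);
    }
    return tokens;
}
```

In this model, splitting "a;; b ;" at ';' yields four tokens when the flag is false (two of them empty) and only "a" and "b" when it is true.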

Method Details:

◆ GetRest()

template<typename TChar >

TSubstring< TChar > & GetRest ( lang::Whitespaces trimming = lang::Whitespaces::Trim )

inline

Returns the currently remaining string (without searching for further delimiter characters). After this call HasNext will return false and Next will return a nulled Substring.

Parameters

trimming	Determines if the token is trimmed with respect to the whitespace characters defined in field TrimChars. Defaults to Whitespaces::Trim.

Returns: The rest of the original source string, which has not yet been returned by Next().

Definition at line 172 of file tokenizer.hpp.

◆ HasNext()

template<typename TChar >

bool HasNext ( )

inline

If this returns true, a call to Next will be successful and will return a Substring which is not nulled.

Returns: true if a next token is available.

Definition at line 187 of file tokenizer.hpp.

◆ Next()

template<typename TChar >

ALIB_API TSubstring< TChar > & Next	(	lang::Whitespaces	trimming = lang::Whitespaces::Trim,
		TChar	newDelim = '\0' )

Returns the next token, which is afterwards also available through field Actual. If no further token is available, the returned Substring will be nulled (see String::IsNull). To prevent this, the availability of a next token should be checked using method HasNext().

For clarification, see the explanation and sample code in this class's documentation.

Parameters

trimming	Determines if the token is trimmed with respect to the whitespace characters defined in field TrimChars. Defaults to Whitespaces::Trim.
newDelim	The delimiter that separates the tokens. Defaults to 0, which keeps the current delimiter intact. A new delimiter can be provided for every next token.

Returns: The next token as Substring. A nulled string is returned if no next token is available.

Definition at line 16 of file tokenizer.cpp.
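
Changing the delimiter between calls is useful for inputs such as "key = v1, v2, v3", where the first token ends at '=' and the remaining ones at ','. The sketch below models the two parameters of Next with standard C++ only. DelimSwitchingTokenizer and describeAssignment are hypothetical names, a bool replaces lang::Whitespaces, and an empty std::string replaces the nulled Substring, so this is an illustration of the idea rather than ALib's behavior in every detail.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical stand-in for a tokenizer whose delimiter can change per call.
class DelimSwitchingTokenizer {
    std::string_view rest_;
    char delim_;
    bool done_ = false;

public:
    DelimSwitchingTokenizer(std::string_view src, char delim)
        : rest_(src), delim_(delim) {}

    bool hasNext() const { return !done_; }

    // trimToken == false returns the raw token;
    // newDelim == '\0' keeps the delimiter that is currently set.
    std::string next(bool trimToken = true, char newDelim = '\0') {
        if (done_) return {};
        if (newDelim != '\0') delim_ = newDelim;   // applies to this and later tokens

        const auto pos = rest_.find(delim_);
        std::string_view token = rest_.substr(0, pos);
        if (pos == std::string_view::npos) done_ = true;
        else rest_.remove_prefix(pos + 1);

        if (trimToken) {
            const auto first = token.find_first_not_of(' ');
            token = (first == std::string_view::npos)
                        ? std::string_view{}
                        : token.substr(first, token.find_last_not_of(' ') - first + 1);
        }
        return std::string(token);
    }
};

// Splits a "key = v1, v2, ..." line: '=' for the key, then ',' for the values.
std::string describeAssignment() {
    DelimSwitchingTokenizer t("colors = red, green ,blue", '=');
    std::string result = t.next();        // key, split at '='
    result += ':';
    result += t.next(true, ',');          // first value; ',' is the delimiter from now on
    while (t.hasNext()) {
        result += '|';
        result += t.next();               // remaining values, still split at ','
    }
    assert(!t.hasNext());
    return result;
}
```

Passing the new delimiter once is enough in this model: it is stored and reused by the later calls that omit the argument, mirroring the description of newDelim above.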

◆ Set()

template<typename TChar >

void Set	(	const TString< TChar > &	src,
		TChar	delimiter,
		bool	skipEmptyTokens = false )

inline

Resets a tokenizer to work on a given string.

Parameters

src	The string to be tokenized.
delimiter	The delimiter that separates the tokens. Can be changed with every next token.
skipEmptyTokens	If `true`, empty tokens are omitted. Optional and defaults to `false`.

Definition at line 131 of file tokenizer.hpp.

The documentation for this class was generated from the following files:
