ALib C++ Library
Library Version: 2402 R1
Documentation generated by doxygen
ALib Module Expressions - Programmer's Manual

1. Introduction

1.1 Goals

The goal of this ALib Module is to provide a C++ library that helps to integrate functionality in custom software to allow end users to write expression strings, which are understood and evaluated at run-time by that software.

Usually, to achieve this, it is needed to

This is of course a lot of work, and a month of programming time is quickly consumed, unless a programmer has done this several times before.

Let's quickly consider two samples.

  1. An application that processes files and folders. The end-user should be allowed to write "filter" expressions like:
         ( date > today - days(7) )  & (name = "*.jpg")
         isFolder & notEmpty
    
  2. An application that manages a table of employee data. The end-user should be allowed to write expressions like:
         (StartDate + years(10) < today)  & (NumberOfSalaryRaises = 0)
         StartDate( find( "John", "Miller", "Accounting" ) )
         BirthDate( find( 832735 ) )
    
    While at first glance these are very different types of expressions, they still have a lot in common:
  • They use functions and identifiers like date, name or find()
  • They do calculations like today - days(7).
  • They use comparisons like <, > or =
  • They use boolean operators like & or |
  • They use brackets to overrule operator precedences or just as a redundant helper for better readability

The areas where the expressions of the two samples differ are:

  • They offer different named functions and identifiers.
  • Such functions and identifiers may return custom types. Those types may be intermediate results as well as results of the expression.
  • The operators and functions used are eventually defined ("overloaded") for custom types.

With this said, we can much better explain what module ALib Expressions offers:

"ALib Expressions provides an expression string parser, formatter and evaluator using customizable operators, identifiers and functions, which can process or return built-in and custom types."

You will see later in this documentation, that the amount of coding needed to implement functionality like given in the samples above is surprisingly low.

1.2 Pros and Cons: When To Use ALib Expressions

To give you some help in deciding whether module ALib Expressions suits your needs, the "pros" and "cons" should be listed in bullets. We start with the cons:

Reasons to NOT use ALib Expressions Library

  • The syntax and grammar rules of expressions are rather fixed along the lines of C++ expressions.
    This is mitigated by several tweaks and options, and furthermore by the possibility to define custom operator symbols and to change the precedence of existing and custom operators.
  • ALib Expressions builds on other core modules of ALib.
    Why is this a disadvantage? Well, if you are used to ALib, it is absolutely not. If not, you may also have to learn some basics of the "underlying" ALib modules that this module builds on.
    Especially important modules to name are ALib Boxing and ALib Strings.

The pros should be given as a feature list:

Features of ALib Expressions

  • Free software, boost open source license.
  • Well tested under GNU/Linux, Windows OS and macOS.
  • Very fast, handwritten code, no generation tools or 3rd party libraries needed for the build.
  • Complete coverage of expression syntax along the lines of C++ expressions
    All operators implemented, including:
    • Ternary, conditional Q ? T : F
    • Elvis operator A ?: B
    • Array subscript operator [] to access array elements.
      This may also be used as hash-map access operator to form expressions like:
         Preferences["DATA_FOLDER"] + "/database.dat"
      
  • All functions and operators can be "overloaded" to support custom types without interfering with existing functions and operators.
  • Verbal operators like "not", "and", "equals" or "greater" can be defined.
  • Optional localization of operator names, identifiers, functions, the number format, etc.
  • More than 130 built-in functions and 180 (overloaded) operators!
    Areas that are covered:

    • Boolean, integer and floating point arithmetics.
    • Math functions.
    • String manipulation, including wildcard and regex matching.
    • Date and time functions.
    • File and directory filtering and inspection. (Brought with sibling ALib module ALib Files .)

    As a sample, the following expression:

    Format("Result: {}", GetDayOfWeek( today + Years(42) ) * int( remainder( PI * exp( sin( E ) ), 1.2345) * random ) % 7 ) != ""

    compiles with (optional) built-in functionality. (Compile time less than 40 µs, evaluation time less than 15 µs, on a year 2018 developer machine.)

  • All built-in identifiers, functions and operators are optional/configurable.
  • Support for n-ary and variadic custom functions, including "ellipsis" parameter definitions, like in Format(formatString, ...).
  • Optional definition of custom operators, including custom parsing precedence.
  • Support of nested expressions, which is support of "named" expressions that are recursively referred to from within other (named or anonymous) expressions.
    Supports mechanics to externally define nested expressions using command line parameters, environment variables or arbitrary (custom) configuration resources, e.g., INI-files.
  • Easy use, integration and customization of the library. (This is proved in the tutorial sections below).
  • Compile-time type safety
    Note
    What does this mean and why is this important? Because almost all malformed expression input (by end-users) is detected at "compile-time" of the expression. This way, a software can tell a user that an expression is malformed (almost always) already in the moment that a user announces an expression to the software. With that, a software can in turn reject the expression before taking any action to start working with it.
    The other way round: Once an expression got compiled, its evaluation is deemed to succeed.
  • "Seamless" support of arbitrary custom types within expressions. Types digested by expressions can be any C++ type (class).
    Note
    Custom types are "introduced" to module ALib Expressions just by having custom identifiers, functions and/or operators return them!
    To then further "support" these types, operators and functions can be added (or overloaded) to work with the types. Of course, the result of expressions can be of such arbitrary types as well.
  • Support of automatic type cast of built-in types as well as custom types.
    (This reduces the amount of needed "permutations" of overloaded operators and the types they support, and thus the time to customize).
  • Localization of number formats in expression literals, including thousands separator character.
  • All identifier and function names are "resourced" and can be changed without touching the library code.
  • Identifiers and functions support optional abbreviation and case sensitivity. If "CamelCase" or "snake_case" formats are used, minimum lengths can be defined for each "camel hump" or "snake segment", respectively.
  • Decimal, hexadecimal, binary and octal integer number literals. Scientific and normal floating point parsing and formatting.
  • Largely configurable normalization of user-defined expression strings. Configuration offers a choice of more than 30 options, including:
    • Removal of redundant brackets and whitespaces (not optional, always performed)
    • Addition of redundant brackets that make expression more readable (several sub-options).
    • Addition of whitespaces for better readability (several sub-options)
    • Replacement of 'alias' operators (e.g. & on boolean converts to && or assign = converts to ==).
    • Replacement of abbreviations of identifiers and functions to their full name.
  • Configurable compiler options, some most obvious ones with simple flags. For example to allow comparison operator '==' to be aliased by assign operator '=', which is more intuitive to end-users.
  • Very fast expression evaluation

    • Expressions get compiled to a "program", which is executed by an extremely lightweight built-in virtual machine. This avoids the otherwise needed evaluation based on an "abstract syntax tree" with expensive recursive invocations of virtual functions.
    • The expression compiler performs various optimizations. For example, expression
         2 * 3 + 4
      
      results in one single program command that provides the constant result 10.
      Compile-time optimization is also supported with custom identifiers, functions and operators.
  • Optional decompilation of expression programs. This can be used for generating a normalized expression string of the optimized expression. (Just needed if you are mean enough to tell your user about the redundancies in his/her given expressions :-)
  • Throws detailed exceptions (exceptions with additional information collected along the stacktrace) that contain information that can be displayed to the user to help finding errors in given expressions.
    All exceptions and other strings are resourced and can be changed and/or translated to target languages.
  • Generation of formatted, commented listings of the compiled expression programs.
    (Available with debug-compilations only. Offered just for curious users of the library that are eager to view the simplistic beauty of a stack machine.)
  • Extensive documentation. (Please excuse verbosity, writing docs inspires us to do better code.)
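
The two speed-related features named in the list above (compilation to a program for a stack machine, and compile-time optimization of constant terms such as 2 * 3 + 4) can be illustrated with a small sketch. This is not the library's implementation: the types Command and Program and the functions pushBinary, run and compileConstantSample are hypothetical and exist only for this illustration.

```cpp
#include <cstddef>
#include <vector>

// One command of a hypothetical stack-machine program (not ALib's real format).
struct Command {
    enum class Kind { Constant, Variable, Add, Mul };
    Kind   kind;
    double value;   // payload, used by Kind::Constant only
};

struct Program {
    std::vector<Command> commands;

    void pushConstant(double v) { commands.push_back({Command::Kind::Constant, v}); }
    void pushVariable()         { commands.push_back({Command::Kind::Variable, 0.0}); }

    // Appends a binary operator in postfix order. If both operands are
    // constants, the result is computed right away and replaces them.
    void pushBinary(Command::Kind op) {
        const std::size_t n = commands.size();
        if (n >= 2 && commands[n - 1].kind == Command::Kind::Constant
                   && commands[n - 2].kind == Command::Kind::Constant) {
            const double rhs = commands[n - 1].value;
            const double lhs = commands[n - 2].value;
            commands.pop_back();
            commands.back().value = (op == Command::Kind::Add) ? lhs + rhs : lhs * rhs;
            return;
        }
        commands.push_back({op, 0.0});
    }
};

// The "virtual machine": a flat loop over the commands, no recursion.
double run(const Program& program, double variable) {
    std::vector<double> stack;
    for (const Command& c : program.commands) {
        if (c.kind == Command::Kind::Constant) { stack.push_back(c.value);  continue; }
        if (c.kind == Command::Kind::Variable) { stack.push_back(variable); continue; }
        const double rhs = stack.back(); stack.pop_back();
        const double lhs = stack.back(); stack.pop_back();
        stack.push_back(c.kind == Command::Kind::Add ? lhs + rhs : lhs * rhs);
    }
    return stack.back();
}

// Emits "2 * 3 + 4" in postfix order; folding leaves a single constant command.
Program compileConstantSample() {
    Program p;
    p.pushConstant(2); p.pushConstant(3); p.pushBinary(Command::Kind::Mul);
    p.pushConstant(4); p.pushBinary(Command::Kind::Add);
    return p;
}
```

In this sketch, compileConstantSample() yields a program with exactly one command holding the value 10, while a term that involves the variable keeps its operator commands and is computed by run().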

2. Tutorial: Hello Calculator

This documentation switches between in-depth informational sections and tutorial-like sample sections. Let's start with a quick tutorial section!

What is "hello world" for each new programming language is a "simple calculator" for expression compilers. Here is the code for implementing one using module ALib Expressions:

// Using the expression compiler in this code entity, as well as the evaluation of expressions.
// Get support for writing expression result values to std::cout
// ALib Exception class
// ALib module initialization (has to be done in main())
// std::cout
#include <iostream>

using namespace std;
using namespace alib;

//----- The Command Line Calculator Program -----
int main( int argc, const char **argv )
{
    // 0. Initialize ALib (this has to be done once at bootstrap with any software using ALib)
    alib::ArgC=  argc;
    alib::ArgVN= argv;
    alib::Bootstrap();

    // 1. Create a defaulted expression compiler. This adds all built-in stuff, like number
    //    arithmetics, strings, time/date, etc.
    Compiler compiler;
    compiler.SetupDefaults();

    // 2. Compile. Catch exceptions (must not trust user input)
    SPExpression expression;
    try
    {
        ALIB_STRINGS_FROM_NARROW( argv[1], expressionString, 256 )
        expression= compiler.Compile( expressionString );
    }
    catch (Exception& e)
    {
        cout << "An exception occurred compiling the expression. Details follow:" << endl
             << e << endl;
        return static_cast<int>( e.Type().Integral() );
    }

    // 3. We need an evaluation "scope"
    //    (later we will use a custom type here, that allows custom identifiers, functions and
    //    operators to access application data)
    expressions::Scope scope( compiler.CfgFormatter );

    // 4. Evaluate the expression
    //    (We must not fear exceptions here, as the compiler did all type checking, and resolved
    //    everything to a duly checked internal "program" running on a virtual machine.)
    Box result= expression->Evaluate( scope );

    // 5. Write result
    cout << "Input:      " << expression->GetOriginalString()   << endl;
    cout << "Normalized: " << expression->GetNormalizedString() << endl;
    cout << "Result:     " << result                            << endl;

    // 6. Terminate library
    alib::Shutdown();
    return 0;
}

Compile the program and run it by passing some simple sample expressions (or be lazy and just read on). We give it some tries:

Input:      1 + 2 * 3
Normalized: 1 + (2 * 3)
Result:     7

Fine, it calculates! Notable in this first simple sample are the brackets inserted in what we call the "normalized" expression string. Compare this to the next sample:

Input:      1 * 2 + 3
Normalized: 1 * 2 + 3
Result:     5

Why are the brackets gone here, while in the first case they had been redundant anyhow? The answer is that human beings could easily misunderstand the first version, so module ALib Expressions feels free to help make an expression more readable.
You think this is childish? Ok, then what do you think about this expression:

       true && false == true < false

Either you are "a pro" or you need to consult a C++ reference manual and check for the operator precedence. Here is what our calculator says:

Input:      true && false == true < false
Normalized: true && (false == (true < false))
Result:     true
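
Because the expression syntax follows the lines of C++ expressions, the grouping shown in the normalized string can be cross-checked with plain C++. The following minimal sketch uses two hypothetical helper functions, asTyped and normalized, which are not part of the library. Note that a compiler may emit a warning suggesting parentheses for the first variant, which is exactly the readability concern that normalization addresses.

```cpp
// The expression exactly as the user typed it: grouping is left to
// the C++ operator precedence rules ('<' binds before '==', '==' before '&&').
constexpr bool asTyped() {
    return true && false == true < false;
}

// The same expression with the brackets that normalization inserted.
constexpr bool normalized() {
    return true && (false == (true < false));
}

// Both variants are evaluated by the C++ compiler and have to agree.
static_assert(asTyped() == normalized(), "implicit and explicit grouping differ");
```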

The insertion of redundant brackets is one of more than 30 normalization options that are switchable with enumeration flags.
The recent sample has more to show:

  • Boolean arithmetics and operators
  • Built-in identifiers, namely true and false.

Note, that we use the term "identifier" for parameterless expression functions. By default, the parameter brackets can be omitted with parameterless functions.

Functions with parameters are for example found in the area of maths:

Input:      asin(1.0) * 2.0
Normalized: asin( 1.0 ) * 2.0
Result:     3.141592653589793

or with string processing:

Input:      tolo("Hello ") + toup("World")
Normalized: ToLower( "Hello " ) + ToUpper( "World" )
Result:     hello WORLD

"tolo()" and "toup()"? Well, ALib Expressions supports shortcuts for function names. Normalization optionally replaces abbreviated names.

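How an abbreviation like tolo can be matched against a CamelCase name like ToLower is sketched below. This is only an illustration under simplifying assumptions and not the library's algorithm: one shared minimum length is used for all "camel humps", letter case is ignored, and the functions splitHumps, matchFrom and matchesAbbreviation are hypothetical names.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Splits a CamelCase name into its humps, e.g. "ToLower" -> {"To", "Lower"}.
std::vector<std::string> splitHumps(const std::string& name) {
    std::vector<std::string> humps;
    for (char c : name) {
        if (humps.empty() || std::isupper(static_cast<unsigned char>(c)))
            humps.emplace_back();
        humps.back().push_back(c);
    }
    return humps;
}

// Tries to match abbr[pos..] against humps[h..], taking at least minLen
// characters from the start of every hump. Backtracks over shorter prefixes.
bool matchFrom(const std::vector<std::string>& humps, std::size_t h,
               const std::string& abbr, std::size_t pos, std::size_t minLen) {
    if (h == humps.size())
        return pos == abbr.size();          // all humps done: abbreviation must be consumed

    const std::string& hump = humps[h];
    auto lower = [](char c) { return std::tolower(static_cast<unsigned char>(c)); };

    // Longest prefix of this hump that the abbreviation continues with.
    std::size_t maxLen = 0;
    while (maxLen < hump.size() && pos + maxLen < abbr.size()
           && lower(hump[maxLen]) == lower(abbr[pos + maxLen]))
        ++maxLen;

    // Try prefix lengths from maxLen down to minLen.
    for (std::size_t len = maxLen + 1; len-- > minLen; )
        if (matchFrom(humps, h + 1, abbr, pos + len, minLen))
            return true;
    return false;
}

bool matchesAbbreviation(const std::string& fullName, const std::string& abbr,
                         std::size_t minLen) {
    return matchFrom(splitHumps(fullName), 0, abbr, 0, minLen);
}
```

With a minimum length of 2 per hump, matchesAbbreviation("ToLower", "tolo", 2) succeeds, while the shorter "tl" is rejected and would only be accepted with a minimum length of 1.
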
Finally, a more complex string function sample:

Input:      Format( "Today is: {:yyyy/MM/dd}", today )
Normalized: Format( "Today is: {:yyyy/MM/dd}", Today )
Result:     Today is: 2024/03/20

As it can be seen, a whole lot of identifiers, functions and operators are already available with the simple calculator example. All of these built-in definitions can be switched off. In fact, the built-in stuff is implemented with the very same interface that custom extensions would be. The only difference between built-in expression identifiers, functions and operators and custom ones is that the built-in ones are distributed with the library.

To get an overview of the built-in functionality, you might have a quick look at the tables found in the reference documentation of the following classes:

Note
Camp module ALib Files introduces a further "plug-in", dedicated to expression functions working on mass-storage files. These functions are documented with that module.

3. Prerequisites

To fully understand this tutorial and the library source code, and as a prerequisite to implementing your custom expression compiler, a certain level of understanding of some underlying libraries and principles is helpful.

3.1 ALib Boxing

As mentioned in the introduction, module ALib Expressions makes intensive use of underlying module ALib Boxing.

For the time being, let's quickly summarize what module ALib Boxing provides:

  • Encapsulates any C++ value or pointer in an object of type Box.
  • A box is very lightweight (3 x 8 bytes on a 64-bit system) and contains a copy of the value (if possible) or a pointer to the object that it encapsulates.
  • Construction of boxes is seamless: Using template meta programming (TMP) and implicit constructors, values of (almost) any type can just be assigned to a box.
  • Similar features in other programming languages are called auto-boxing. It is especially useful if function arguments or return types are of type Box: Such functions can be invoked with (almost) any parameter, without providing explicit conversions.
  • ALib Boxing is 100% type-safe: The boxed type can be queried, and trying to unbox a wrong type raises a run-time assertion (with debug builds).
  • ALib Boxing supports a sort of "virtual function" invocation on boxes. This means, that functions can be invoked on boxes without prior type-checking and/or unboxing of values. Such functions can simply be implemented (according to the required function signature) and then registered for a boxed-type.

For all details, a comprehensive Programmer's Manual for ALib Boxing is available.
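
For readers who do not know ALib Boxing yet, the idea of auto-boxing and type-safe unboxing can be approximated with std::any of the C++17 standard library. The sketch below is an analogy only and does not use the ALib API; the function describe is a hypothetical name. One difference to keep in mind: unboxing a wrong type from std::any throws std::bad_any_cast, while ALib Boxing raises an assertion in debug builds.

```cpp
#include <any>
#include <string>
#include <typeinfo>

// The parameter type std::any accepts (almost) any copyable value through its
// implicit converting constructor. This plays the role of auto-boxing here.
std::string describe(const std::any& box) {
    // Query the stored type first, then unbox exactly that type.
    if (box.type() == typeid(bool))
        return std::string("bool: ") + (std::any_cast<bool>(box) ? "true" : "false");
    if (box.type() == typeid(int))
        return "int: " + std::to_string(std::any_cast<int>(box));
    if (box.type() == typeid(std::string))
        return "string: " + std::any_cast<const std::string&>(box);
    return "unknown type";
}
```

A call like describe(42) or describe(std::string("hi")) needs no explicit conversion at the call site, which is the convenience that the list above attributes to functions taking a Box.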

3.2 Type Definitions With "Sample Boxes"

The type-safety mechanisms and the possibilities of querying the type encapsulated in a box are used by module ALib Expressions in an inarguably lazy fashion: Wherever this expression library needs type information, such information is given as a "sample box" which is created with a sample value of the corresponding C++ type.

Consequently, the value stored in (and passed with) the box is ignored and may even become invalid after the creation of the box without any harm (for example in cases of pointer types).

While this approach causes a little overhead in run-time performance, the benefit in respect to simplification of the API surpasses any such penalty by far! Also, note that the performance drawback is restricted to the code that compiles an expression. During the evaluation, no "sample boxes" are created or passed.

The following code shows how to create sample boxes for some of the built-in standard types :

   Box sampleBool      =    false;
   Box sampleInteger   =        0;
   Box sampleFloat     =      0.0;
   Box sampleString    = String();

The values assigned in the samples are meaningless. Instead of false, the value true could be used and instead of 0.0, we could have written 3.1415. Note that the construction of the empty String instance will even be optimized away by the C++ compiler in release compilations.

For custom types, there is no need for more efforts, as this code snippet demonstrates:

   struct Person
   {
       String   Name;
       int      Age;
       String   Street;
       String   City;
       String   PostCode;
   };

   Box samplePerson= Person();

By default, with ALib Boxing, non-trivial C++ types that do not fit into the small placeholder embedded in the box are boxed as pointers. This means that even though a value of a custom type was assigned to the box, a pointer to it is stored. In the sample above, the pointer will be invalid in the next line, but that is OK, as only the type information stored in the box is of interest.

Therefore, we can "simplify" the previous code to the following:

   Box samplePerson= reinterpret_cast<Person*>(0);

Besides the advantage that this omits the creation of an otherwise unused object, this approach is the only way to get sample boxes of abstract C++ types!

The magic of module ALib Boxing makes life as simple as this! Let us preempt what is explained in the following chapters: All native callback functions to be implemented for custom operators, identifiers and functions are defined to return an object of type Box. Thus, these functions can return values of arbitrary custom type. The type of the returned (boxed) value has to correspond with what a custom CompilerPlugin suggested by providing a sample box at expression compile-time. Once understood, this is all very simple!
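
The contract described above (a callback has to return a boxed value of the type that was announced with a sample box) can be sketched with std::any as a stand-in for Box. Everything in this block is hypothetical and only mirrors the idea: the names sameType, getAge and getPerson are not part of the library, and the local Person struct is a reduced variant of the one shown earlier.

```cpp
#include <any>
#include <typeinfo>

struct Person { int age = 0; };

// Only the type stored in 'sample' matters, its value is never looked at.
bool sameType(const std::any& sample, const std::any& value) {
    return sample.type() == value.type();
}

// Two "native callbacks": both return a boxed value of arbitrary type.
std::any getAge()    { return 42; }                          // boxes an int
std::any getPerson() { static Person p; return &p; }         // boxes a Person*
```

With these definitions, a sample created from the literal 0 matches the result of getAge(), and a sample created from static_cast<Person*>(nullptr) matches the result of getPerson(), although no Person object was created for the sample.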

Note
Wherever possible, this library uses alias type definition Type instead of const Box& to indicate that a box received is a sample box and not a real value. However, sometimes this is not possible. In these cases, the parameter or member itself, as well as the corresponding documentation, will give a hint whether an object is just a "sample box" or a boxed value.
For the built-in types, static one-time sample boxes are defined with struct Types . It is recommended to use those and, if custom types are introduced, create one singleton sample box for each custom type in a similar fashion. This approach makes the code smaller, because mostly only a reference to the static box is passed, and the creation of a sample box on the stack is avoided. Also the use of static constant objects in bulk-information-tables (introduced later), allows the compiler to build static compile-time tables.
Attention
If sample boxes for custom types should be globally defined and initialized, like the built-in ones found in Types, an "optimization step" has to be performed. Details are given in chapter 12.2 Optimizations With Static VTables of the Programmer's Manual of ALib Boxing.
Note that this is required only if your custom code has to be able to reside in read-only memory, e.g. with embedded systems. (It can of course also be done by enthusiasts to minimize an executable's code size, or just for the fun this brings!)
If these optimizations are not performed, global or static sample boxes have to be default constructed with their definition and then initialized with the right sample value only at bootstrap of the library, preferably in the constructor of a custom CompilerPlugin that introduces these types.

3.3 Use Of Virtual Types Rather Than Templates

A design decision of this ALib Module is to rather use "classic" virtual types instead of using templates, with all the pros and cons of such a decision taken into account. As a result, some "contracts" have to be assured to be fulfilled by the user of the library. The term "contracts" here means: If at some place a certain specialization of a virtual type is expected, at a different place the creation of an object of that virtual type has to be assured. Details of these contracts will be explained in the next chapters.

Note
The main reason to use this traditional virtual library design is the use of plenty (mostly very short) native callback functions, which this way can be placed in anonymous namespaces of compilation units and thus completely be hidden from library header files and even from the C++ linker.

3.4 Bauhaus Code Style

ALib sometimes uses what we call "Bauhaus Code Style". It is not easy to state exactly what we mean by this, but a little notion of what it could be may have come to a programmer's mind already by reading the previous two chapters about:

  • (Mis-)using class Box for just type propagation, and
  • Imposing contract rules with specialized types, instead of templating things.

In addition to that, it is notable, that a lot of the types of module ALib Expressions are structs rather than classes. Hence, fields and methods are exposed publicly.

The goal of this library is to allow other software (libraries or internal units of a software) to expose an interface that has two main functions:

  • Allow the input of expression strings.
  • Allow the evaluation of compiled expressions.

Now, let's take a sample: A list of files should be 'filtered' by name, size, date etc. The custom library or internal software unit would probably expose

  • A class named FileFilter that takes an expression string in the constructor.
  • A method called "Includes" that takes a file object and returns true if the file matches the filter.

Using this custom class could look like this:

   FileFilter  photosOfToday( "name * +\".jpg\" && date >= today" );

   if( photosOfToday.Includes( aFile ) )
   {
       ...
   }

As is easily understood, nothing of library module ALib Expressions needs to be exposed to the "end user" of the code. Especially:

  • Only the sources (compilation units) that implement class FileFilter need to include headers of module ALib Expressions
  • Consequently, not only details of module ALib Expressions, like detail::Parser, detail::Program or detail::VirtualMachine, but also central types like Expression, Compiler, CompilerPlugin or Scope, usually remain completely invisible to most parts of the custom software.
  • The same is true for custom derived types and therefore also for the "contract rules" (see previous chapter) between these types.

This all means that the "natural way" of using module ALib Expressions automatically hides away all internals, which in turn gives this module the freedom to generously use Bauhaus style, which here translates to:

  • Generously exposing types and their internals.
  • Avoid redundant getter/setter methods.
  • Impose contracts and avoid templates.
  • Optimizations for speed.
  • Optimizations for short code.

4. Tutorial: Implementing A File Filter

After this already lengthy introduction and discussion of prerequisites, it is now time to implement custom expression logic. The sample application that we use to demonstrate how this is done, implements expressions to filter files of directories, as it may be required by a simple file search software or otherwise be used by a third party application.

As a foundation, we are using the Filesystem Library of C++ 17. Note that this, as of the time of writing this documentation, is an upcoming feature and with some compilers it might not be available today, or instead of including header

   #include <filesystem>

header

   #include <experimental/filesystem>

needs to be used. This library originates from a development of the boost C++ Libraries and even if you have never used it, this should not introduce more burden to understand this sample, as it is very straightforward.

For example, the following few lines of code:

// search source path from current
auto sourceDir = fs::path(ALIB_BASE_DIR);
sourceDir+= "/src/alib/expressions";
ALIB_ASSERT_ERROR( fs::exists(sourceDir), "UNITTESTS", String512("Test directory not found: ") << sourceDir.c_str() )
// list files
for( auto& directoryEntry : fs::directory_iterator( sourceDir ) )
    cout << directoryEntry.path().filename().string() << endl;

produce the following output:

compilerplugin.hpp
expressions.hpp
standardrepository.cpp
expression.cpp
expression.hpp
standardrepository.hpp
detail
plugins
compiler.hpp
compiler.cpp
expressions.cpp
util
scope.hpp
Note
As all sample code is extracted directly from special unit tests that exist just for the purpose of being tutorial sample code and generating tutorial sample output, above and in the following sections we are addressing some parent directories. This results from the fact that the unit tests are executed in the build directory, which is a sub-directory of this library's main directory.
Consequently, our samples are about searching and filtering the source files of the library! This avoids introducing sample files and other overhead with respect to documentation maintenance.
Furthermore, please note that we are using the following statement to shortcut the C++ 17 namespace:
 namespace fs = experimental::filesystem;

4.1 Skeleton Code For Filtering Files

Now, the loop of the above sample should be extended to use a filter to select a subset of the files and folders to be printed. Hence, a filter is needed. We start with a skeleton definition of a struct:

namespace step1 {
struct FileFilter
{
    // Constructor.
    FileFilter(const String& expressionString)
    {
        (void) expressionString;
    }

    // Filter function. Takes a directory entry and returns 'true' if the entry is included.
    bool Includes(const fs::directory_entry& directoryEntry)
    {
        (void) directoryEntry;
        return true;
    }
};
} // namespace step1

As we have no clue yet what our custom filter expressions will look like, we pass a dummy string, which is ignored by the filter skeleton anyhow. The loop then looks as follows:

step1::FileFilter filter(A_CHAR("expression string syntax not defined yet"));
for( auto& directoryEntry : fs::directory_iterator( sourceDir ) )
    if( filter.Includes( directoryEntry ) )
        cout << directoryEntry.path().filename().string() << endl;

Of course, the output of this loop remains the same, because constant true is returned by the filter skeleton's method Includes.

What we nevertheless have achieved: The interface of how ALib Expressions will be used is already defined!
This is a good point in time to quickly sort out the different perspectives on "interfaces", "libraries" or "APIs" explicitly:

  1. Library module ALib Expressions exposes an interface/API to compile and evaluate expression strings.
  2. The software that uses ALib Expressions usually exposes an own interface/API, either
    • to other parts of the same software, or
    • to other software - in case that this 2nd level is a library itself.
  3. The "end user" that uses a software of course does not know about any software interface or API. What she needs to know is just the syntax of expression strings that she can pass into the software!

The goal should be that on the 2nd level, the API of the 1st level (which is this ALib Expressions library), is not visible any more.
Well, and with the simple skeleton code above, this goal is already achieved!

4.2 Adding Generic Ingredients Needed For Expression Evaluation

The next step is about adding all components that we need to compile and evaluate expression strings to the filter class. And this is not much effort. We had seen the ingredients before in the sample code of previous section 2. Tutorial: Hello Calculator.

Because it is so simple, we just present the resulting code of the filter class:

namespace step2 {
struct FileFilter
{
    Compiler            compiler;
    expressions::Scope  scope;       // declared after 'compiler', which it is constructed from
    SPExpression        expression;

    // Constructor. Compiles the expression
    FileFilter( const String& expressionString )
    : compiler()
    , scope( compiler.CfgFormatter )
    {
        compiler.SetupDefaults();
        expression= compiler.Compile( expressionString );
    }

    // Filter function. Evaluates the expression.
    bool Includes(const fs::directory_entry& directoryEntry)
    {
        (void) directoryEntry;
        return expression->Evaluate( scope ).Unbox<bool>();
    }
};
} // namespace step2

Et voilà: We can now use expression strings to filter the files. Here are two samples:

Sample 1: All files are included with constant expression "true":

cout << "--- Files using expression {true}: ---" << endl;
step2::FileFilter trueFilter(A_CHAR("true"));
for( auto& directoryEntry : fs::directory_iterator( sourceDir ) )
    if( trueFilter.Includes( directoryEntry ) )
        cout << directoryEntry.path().filename().string() << endl;

The output is:

--- Files using expression {true}: ---
compilerplugin.hpp
expressions.hpp
standardrepository.cpp
expression.cpp
expression.hpp
standardrepository.hpp
detail
plugins
compiler.hpp
compiler.cpp
expressions.cpp
util
scope.hpp

Sample 2: All files are filtered out with constant expression "false":

cout << "--- Files using expression {false}: ---" << endl;
step2::FileFilter falseFilter(A_CHAR("false"));
for( auto& directoryEntry : fs::directory_iterator( sourceDir ) )
    if( falseFilter.Includes( directoryEntry ) )
        cout << directoryEntry.path().filename().string() << endl;

This results in empty output:

--- Files using expression {false}: ---

While this demonstrates fast progress towards our aim to filter files, of course we have not linked the expression library with this custom code example, yet. All we can do is provide expressions that do not refer to the file given, hence either evaluate to true for any file or to false.
But before we feel free to start working on this, we first need to put one stumbling block aside.

4.3 Checking An Expression's Result Type

In the samples above we used simple, constant expressions "true" and "false". As we already learned in chapter 2, these are built-in identifiers that return the corresponding boolean value. Well, and a boolean value is what the filter needs. Other valid expressions would be

   5 > 3                 // constant true
   Year(Today) < 1984    // constant false

"Valid" here means, that the expression returns a boolean value! But what would happen if we constructed the filter class with expression string

   1 + 2

which returns an integral value? The answer is that in method Includes of the filter class presented in the previous sections a run-time assertion would be raised in the following line of code:

   return expression->Evaluate( scope ).Unbox<bool>();

The code unboxes a value of type bool, but it is not asserted that the result of the evaluation is of that type. This quickly leads us to an enhanced version of that method:

bool Includes(const fs::directory_entry& directoryEntry)
{
(void) directoryEntry;
Box result= expression->Evaluate( scope );
if( result.IsType<bool>() )
return result.Unbox<bool>();
// throw exception
throw std::runtime_error( "Expression result type mismatch: expecting boolean result!" );
}

So here is some bad news: there is obviously no way around the effort of throwing and catching exceptions (or performing some other kind of error processing) as soon as software allows an end-user to "express herself" by passing expression strings to it. Besides wrong return types, the whole expression might be malformed, for example by omitting a closing bracket or by any other breach of the expression syntax rules.

The good news, however, is that with the use of module ALib Expressions , most - if not all - of these errors can be detected at compile-time, that is, when the expression string is compiled. Once an expression is compiled, not much can go wrong when it is evaluated later.

And this is also true for our current threat of facing a wrong result type: Because module ALib Expressions implements a type-safe compiler, we can detect the result type at compile-time.

Consequently, we revert our most recent code changes and rather check the result type already right after the compilation:

FileFilter( const String& expressionString )
: compiler()
, scope( compiler.CfgFormatter )
{
compiler.SetupDefaults();
expression= compiler.Compile( expressionString );
// check result type of the expression
if( !expression->ResultType().IsType<bool>() )
throw std::runtime_error( "Expression result type mismatch: expecting boolean result!" );
}
bool Includes(const fs::directory_entry& directoryEntry)
{
(void) directoryEntry;
// no result type check needed: It is asserted that Evaluate() returns a boxed boolean value.
return expression->Evaluate( scope ).Unbox<bool>();
}
Note
It is up to the user of this library to decide how strict an implementation would be. Later in this tutorial, we will read permissions from the files, which might get "tested" using bitwise boolean operators, e.g.
     Permissions & OwnerWrite == OwnerWrite
Similar to programming languages, it could be allowed to shorten this expression to just
     Permissions & OwnerWrite
The result is an integral value, respectively a user defined permission type that probably represents an underlying integral value. It is up to the filter class's method Includes, to check for and interpret other types than boolean.
To provide the biggest degree of freedom, the result of box-function FIsTrue might be returned instead of unboxing a boolean value. This interface is a good candidate to convert just any boxed value to a reasonable representation of a boolean value. Again, this is a design decision of the software that uses this library. It has to be documented to the end-user what type of expression results are allowed.
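
The idea of accepting non-boolean results can be illustrated independently of ALib. The following is a minimal sketch only: type Value (a std::variant) and function isTrue are hypothetical names introduced for this illustration, and the conversion rules (non-zero numbers and non-empty strings count as true) are assumptions. It is not the implementation of box-function FIsTrue; the exact rules of that function should be taken from the ALib Boxing documentation.

```cpp
#include <string>
#include <type_traits>
#include <variant>

// Hypothetical stand-in for a boxed expression result (not an ALib type).
using Value = std::variant<bool, long long, double, std::string>;

// Maps any supported result to a boolean value.
// Assumed rules of this sketch: numbers are true if non-zero,
// strings are true if non-empty.
inline bool isTrue(const Value& value)
{
    return std::visit([](const auto& v) -> bool
    {
        using T = std::decay_t<decltype(v)>;
        if constexpr (std::is_same_v<T, std::string>)
            return !v.empty();
        else
            return v != 0;          // bool, long long and double
    }, value);
}
```

A filter built this way would accept a result such as the integral value of Permissions & OwnerWrite and treat any non-zero value as "include this file". Whether this is desirable remains a design decision that has to be documented to the end-user.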

4.4 Exposing The Directory Entry To ALib Expressions

It is time to finally make our sample meaningful, namely to allow to filter selected files by their attributes.

For this, two steps are needed. The first is again extremely simple: We have to expose the current directory entry of our filter loop to the file filter. All we need to do is to derive a custom type from class Scope that provides the current object.
Here is our new struct:

namespace step4 {
struct FFScope : public ExpressionScope
{
// the current directory entry
const fs::directory_entry* directoryEntry;
// expose parent constructor
using Scope::Scope;
};
} // namespace step4

With this in place, we just need two small changes in our file filter:

namespace step4 {
struct FileFilter
{
Compiler compiler;
FFScope scope; // CHANGE 1: we use FFScope now
SPExpression expression;
FileFilter( const String& expressionString )
: compiler()
, scope( compiler.CfgFormatter )
{
compiler.SetupDefaults();
expression= compiler.Compile( expressionString );
if( !expression->ResultType().IsType<bool>() )
throw std::runtime_error( "Expression result type mismatch: expecting boolean result!" );
}
bool Includes(const fs::directory_entry& directoryEntry)
{
// CHANGE 2: Store the given entry in our scope singleton which is then passed into
// Evaluate().
scope.directoryEntry= &directoryEntry;
return expression->Evaluate( scope ).Unbox<bool>();
}
};
} // namespace step4

Now, the expression's detail::Program that gets compiled in the constructor of the filter class and that is executed by the built-in detail::VirtualMachine with the invocation of Evaluate , potentially has access to the directory entry.

The next section connects the final dots and leads to a working sample.

4.5 Implementing A Compiler Plug-In

We have come quite far without ever thinking about the syntax of the custom expressions that we need to be able to filter files from a directory. Without much reflection of that, it is obvious that filtering files by name should be enabled, maybe with support of "wildcards" just like most users know them from the command prompt:

   ls -l *.hpp  // GNU/Linux
   dir *.hpp    // Windows OS

Thus, the first thing we need is to retrieve the file name from the entry. This is done with a simple custom identifier. As it was said already, an identifier is a "parameterless function". So why don't we need a parameter, namely the file entry in the expression syntax? Well, because the entry is part of the scope. It is the central piece of custom information that the whole effort is done for. Therefore, the expression:

   Name

should return the name of the actual directory entry that is "in scope". This is pleasantly simple, so let's start. Again we start with a skeleton struct, this time derived from CompilerPlugin :

namespace step5 {
struct FFCompilerPlugin : public CompilerPlugin
{
FFCompilerPlugin( Compiler& compiler )
: CompilerPlugin( "FF Plug-in", compiler )
{}
};
} // namespace step5

To make use of the plug-in, we again need just three small changes in the custom filter class:

namespace step5 {
struct FileFilter
{
Compiler compiler;
FFScope scope;
SPExpression expression;
FFCompilerPlugin ffPlugin; // CHANGE 1: We own an instance of our custom plug-in.
FileFilter( const String& expressionString )
: compiler()
, scope( compiler.CfgFormatter )
, ffPlugin( compiler ) // CHANGE 2: Initialize the plug-in with the compiler.
{
compiler.SetupDefaults();
// CHANGE 3: Add our custom plug-in to the compiler prior to compiling the expression
compiler.InsertPlugin( & ffPlugin, CompilePriorities::Custom );
expression= compiler.Compile( expressionString );
if( !expression->ResultType().IsType<bool>() )
throw std::runtime_error( "Expression result type mismatch: expecting boolean result!" );
}
bool Includes(const fs::directory_entry& directoryEntry)
{
scope.directoryEntry= &directoryEntry;
return expression->Evaluate( scope ).Unbox<bool>();
}
};
} // namespace step5

With this, the plug-in is in place and during compilation it is now asked for help. Parent class CompilerPlugin exposes a set of overloaded virtual functions named TryCompilation. In their existing default implementation each function just returns constant false, indicating that a plug-in is not responsible. Thus, we have to make our plug-in now responsible for identifier "Name". For this we choose to override one of the offered virtual functions as follows:

namespace step5 {
struct FFCompilerPlugin : public CompilerPlugin
{
FFCompilerPlugin( Compiler& compiler )
: CompilerPlugin( "FF Plug-in", compiler )
{}
// implement "TryCompilation" for functions
virtual bool TryCompilation( CIFunction& ciFunction ) override
{
// Is parameterless and function name equals "Name"?
if( ciFunction.QtyArgs() == 0
&& ciFunction.Name.Equals<false, lang::Case::Ignore>( A_CHAR("Name") ) )
{
// set callback function, its return type and indicate success
ciFunction.Callback = getName;
ciFunction.TypeOrValue = Types::String;
return true;
}
// For anything else, we are not responsible
return false;
}
};
} // namespace step5

As the code shows, the overridden function simply checks for the given name and the function "signature". If both match, then a native C++ callback function is provided together with the expected result type of that callback function.

The final step before we can test the code is to implement the callback function. This is usually done in an anonymous namespace at the start of the compilation unit of the plug-in itself. The signature of any callback function that ALib Expressions expects is given with CallbackDecl . The documentation shows that it has three parameters: the scope and the begin- and end-iterators for the input parameters. The input parameters are boxed in objects of class Box and the same type is expected to be returned.

Because ALib Boxing makes a programmer's life extremely easy, especially when used with various kinds of strings, and because we are not reading any input parameters, the implementation of the callback function is done with just one line of code:

Box getName( ExpressionScope& scope, ArgIterator argsBegin, ArgIterator argsEnd )
{
// Create a copy of the string using the scope string allocator. This is done by using
// class MAString, which, when returned, right away is boxed as a usual string,
// aka char[]. Therefore, no intermediate string objects need to be stored, neither the
// std::string returned by "string()", nor the MAString itself.
return MAString( scope.Allocator,
dynamic_cast<FFScope&>( scope ).directoryEntry->path().filename().string(),
0 );
}
Note
The callback function casts the scope object to our custom type FFScope. The function can trust that this succeeds, if each expression that gets compiled with the compiler that uses the plug-in gets a scope object of exactly this derived custom type passed when evaluated.
This is a sample of the "contracts" that have to be fulfilled by the user of the library, as already stated in previous chapter 4.3 Use Of Virtual Types. Another such contract can be seen in the code of the compiler plug-in: The type of the boxed value returned by the callback function has to match the type specified in TryCompilation. Furthermore, all code paths of the callback function have to return a boxed value of that very same type, regardless of the input parameters.
These are constraints that the user of this library has to assure. However, if classes like FileFilter of our sample are implemented as recommended, the responsibility to keep the contracts is shared only among a few implementation units. What we previously called "the custom 2nd level api" hides these constraints completely, along with all other parts of ALib Expressions .
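
What happens if the scope contract is broken can be shown with plain C++, without ALib. In the following sketch, BaseScope, FileScope, OtherScope, getFileName and violatesContract are hypothetical names introduced only for this illustration. It relies on standard C++ behavior: a dynamic_cast to a reference type throws std::bad_cast if the object is not of the requested type.

```cpp
#include <string>
#include <typeinfo>

// Hypothetical stand-ins for the library's scope base class and custom scopes.
struct BaseScope  { virtual ~BaseScope() = default; };
struct FileScope  : BaseScope { std::string fileName; };
struct OtherScope : BaseScope {};

// A callback that relies on the contract "the scope is always a FileScope".
// The reference cast throws std::bad_cast if the contract is broken.
inline std::string getFileName(BaseScope& scope)
{
    return dynamic_cast<FileScope&>(scope).fileName;
}

// Returns true if the callback rejects the given scope with std::bad_cast.
inline bool violatesContract(BaseScope& scope)
{
    try
    {
        (void) getFileName(scope);
        return false;
    }
    catch (const std::bad_cast&)
    {
        return true;
    }
}
```

Because such an exception would only surface at evaluation time, it is preferable to make the violation impossible by design, as done with class FileFilter, which owns both the compiler and the one scope object it ever passes to Evaluate.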

We are set! Our first "real" filter expressions should work. Here are some filter loops and their output:

Sample 1:

cout << "--- Files using expression {name == \"compiler.hpp\"}: ---" << endl;
step5::FileFilter filter1(A_CHAR("name == \"compiler.hpp\""));
for( auto& directoryEntry : fs::directory_iterator( sourceDir ) )
if( filter1.Includes( directoryEntry ) )
cout << directoryEntry.path().filename().string() << endl;

Output:

--- Files using expression {name == "compiler.hpp"}: ---
compiler.hpp

Sample 2:

cout << "--- Files using expression {WildcardMatch(name, \"*.hpp\"}: ---" << endl;
step5::FileFilter filter2(A_CHAR("WildcardMatch(name, \"*.hpp\")"));
for( auto& directoryEntry : fs::directory_iterator( sourceDir ) )
if( filter2.Includes( directoryEntry ) )
cout << directoryEntry.path().filename().string() << endl;

Output:

--- Files using expression {WildcardMatch(name, "*.hpp"}: ---
compilerplugin.hpp
expressions.hpp
expression.hpp
standardrepository.hpp
compiler.hpp
scope.hpp

Sample 3:

cout << "--- Files using expression {name * \"*.cpp\"}: ---" << endl;
step5::FileFilter filter3(A_CHAR("name * \"*.cpp\""));
for( auto& directoryEntry : fs::directory_iterator( sourceDir ) )
if( filter3.Includes( directoryEntry ) )
cout << directoryEntry.path().filename().string() << endl;

Output:

--- Files using expression {name * "*.cpp"}: ---
standardrepository.cpp
expression.cpp
compiler.cpp
expressions.cpp

This seems to work - mission accomplished!

Some notes on these samples:

  • Because the custom identifier Name does not introduce a custom type, but returns built-in type Types::String , no operators have to be overloaded. In later chapters we will see what needs to be done when custom-types are returned by identifiers, functions or operators.
  • Built-in expression function WildcardMatch accepts two strings, the first is the string that is matched, the second contains the wildcard string. Function WildcardMatch is provided with built-in compiler plug-in Strings .
  • The third sample uses an overloaded version of binary operator '*', with left- and right-hand side being strings. This binary operator is also provided with plug-in Strings and is just an "alias" for function WildcardMatch.
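
To give an impression of what a wildcard match like the one used in samples 2 and 3 has to do, here is a minimal, self-contained sketch. Function wildcardMatch is a hypothetical name, and the rules are assumptions of this sketch: '*' matches any (possibly empty) sequence of characters, '?' matches exactly one character, and the comparison is case-sensitive. It is not the implementation used by built-in plug-in Strings, whose exact behavior is given in the reference documentation.

```cpp
#include <cstddef>
#include <string_view>

// Returns true if 'text' matches 'pattern'.
// Assumed rules: '*' matches any sequence (also the empty one),
// '?' matches exactly one character, comparison is case-sensitive.
inline bool wildcardMatch(std::string_view text, std::string_view pattern)
{
    constexpr std::size_t npos = std::string_view::npos;
    std::size_t t       = 0;     // current position in text
    std::size_t p       = 0;     // current position in pattern
    std::size_t starPos = npos;  // position of the most recent '*' in pattern
    std::size_t retry   = 0;     // text position at which that '*' was met

    while (t < text.size())
    {
        if (p < pattern.size() && pattern[p] == '*')
        {
            starPos = p++;       // remember the star, first try the empty match
            retry   = t;
        }
        else if (p < pattern.size() && (pattern[p] == '?' || pattern[p] == text[t]))
        {
            ++t;                 // single character matches
            ++p;
        }
        else if (starPos != npos)
        {
            p = starPos + 1;     // backtrack: let the star consume one more char
            t = ++retry;
        }
        else
            return false;
    }

    // Only trailing stars may remain in the pattern.
    while (p < pattern.size() && pattern[p] == '*')
        ++p;
    return p == pattern.size();
}
```

With these rules, wildcardMatch("compiler.hpp", "*.hpp") yields true, while wildcardMatch("compiler.cpp", "*.hpp") yields false.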

We could now easily continue implementing further identifiers, for example:

  • IsDirectory: Returns true if the directory entry is a sub-directory, false if it is a file.
  • Size: Returns the size of the file as built-in type Types::Integer .
  • Date: Returns the date of the entry as built-in type Types::DateTime .
  • Permissions: Returns the access rights of the file or folder. For this, we would probably return an integral value and introduce further identifiers like GroupRead, GroupWrite, OwnerRead,... and so forth that return constants.

This would lead to inserting further if-statements to the custom plug-in, similar to the one demonstrated for identifier Name.

Before this should be sampled, the next chapter explains the general possibilities of compiler plug-ins and shows how the creation of a plug-in can be even further simplified.

5. Compiler Plug-Ins And Class Calculus

In the previous tutorial section, a fully working example program was developed that allows to use custom expression strings to filter files and folders by their name.

It was demonstrated how to attach a custom compiler plug-in to the expression compiler, which selects a native C++ callback function at compile-time. This callback function is then invoked each time a compiled expression is evaluated against a scope. The sample implemented the retrieval of a string value from an object found in a custom specialization of class Scope .

5.1 The Compilation Process

When an expression string gets compiled, such compilation is done in two phases. The first step is called "parsing".

The result of the parsing process is a recursive data structure called "abstract syntax tree". The nodes of this tree can represent one of the following types:

  1. Literals:
    Literals are constants found in the expression strings. There are three types of literals supported: Integral (e.g. "42"), floating-point (e.g. "3.14") and string values (e.g. "Hello"). A literal node is a so-called "terminal" node, which means it has no child nodes.
  2. Identifiers:
    These are named tokens, starting with an alphabetical character and further consisting of alpha-numerical characters or character '_'.
    Like literals, identifiers are terminal nodes.
  3. Functions:
    Functions are identifiers followed by a pair of round brackets '()'. Within the brackets, a list of expressions, separated by commas (','), may be given. Hence, functions are n-ary nodes, having as many child nodes as parameters are given in the brackets.
  4. Unary operators:
    These are nodes that represent a unary operation like "boolean not" ('!') or arithmetic negation ('-'). These nodes have one child node.
  5. Binary operators:
    Samples of binary operators are "boolean and" ('&&') or arithmetic subtraction ('-'). These nodes have two child nodes.
  6. Ternary operators:
    Only one ternary operator is supported. It is called "conditional operator" and parsed in the form "Q ? T : F" with Q, T and F being expressions. The result of the operation is T if Q evaluates to true, otherwise it is F.

This first phase of compilation that builds the AST (abstract syntax tree) usually does not need too much customization.

Note
Various customization options are nevertheless provided. The most important ones are described in:

It could reasonably be argued that building this tree is all that an expression library needs to do, and in fact, many similar libraries stop at this point. To evaluate an expression, all that is needed is to recursively walk the AST in a so-called "depth-first search" manner and perform the operations. The result of the evaluation would be the result of the root node of the tree.
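
Such a direct evaluation of the tree can be sketched in a few lines. The following is an illustration under simplifying assumptions: struct Node, its four node kinds and functions makeNode and evaluate are hypothetical names that support integral values only. They are not the node types used internally by ALib Expressions .

```cpp
#include <memory>
#include <stdexcept>
#include <utility>

// Hypothetical AST for integer arithmetic (not ALib's internal node types).
struct Node
{
    enum class Kind { Literal, Negate, Add, Multiply } kind;
    long long             value = 0;   // used by Literal nodes only
    std::unique_ptr<Node> lhs;         // first (or only) child
    std::unique_ptr<Node> rhs;         // second child of binary nodes
};

// Convenience factory for all node kinds.
inline std::unique_ptr<Node> makeNode(Node::Kind            kind,
                                      long long             value = 0,
                                      std::unique_ptr<Node> lhs   = nullptr,
                                      std::unique_ptr<Node> rhs   = nullptr)
{
    auto node   = std::make_unique<Node>();
    node->kind  = kind;
    node->value = value;
    node->lhs   = std::move(lhs);
    node->rhs   = std::move(rhs);
    return node;
}

// Depth-first evaluation: children are evaluated before their parent.
inline long long evaluate(const Node& node)
{
    switch (node.kind)
    {
        case Node::Kind::Literal:  return node.value;
        case Node::Kind::Negate:   return -evaluate(*node.lhs);
        case Node::Kind::Add:      return evaluate(*node.lhs) + evaluate(*node.rhs);
        case Node::Kind::Multiply: return evaluate(*node.lhs) * evaluate(*node.rhs);
    }
    throw std::logic_error("unknown node kind");
}
```

Note that in this approach the switch over the node kind (and, in a library with more than one value type, a check of the operand types) is repeated with every evaluation.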

ALib Expressions goes one step further, performing a second phase of compilation. In this phase, the recursive walk over the AST is done. The result of the walk is an expression Program . Such program is a list of "commands" which are later, when the expression is evaluated, executed by a virtual stack machine. (This stack machine is implemented with class detail::VirtualMachine ).
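
The principle of a linear program executed by a stack machine can be sketched as follows. All names (Stack, Command, opAdd, opMul, run) are hypothetical and the sketch supports integral values only; it is meant to show the idea and does not reflect the actual layout of classes Program and detail::VirtualMachine.

```cpp
#include <cassert>
#include <vector>

using Stack = std::vector<long long>;

// One program command: either pushes a constant or invokes a callback that
// pops its operands from the stack and pushes its result.
struct Command
{
    using Callback = void (*)(Stack&);
    Callback  callback;   // nullptr means: push 'constant'
    long long constant;   // ignored if a callback is set
};

// Callbacks for two binary operators.
inline void opAdd(Stack& stack)
{
    assert(stack.size() >= 2);
    const long long rhs = stack.back();
    stack.pop_back();
    stack.back() += rhs;
}

inline void opMul(Stack& stack)
{
    assert(stack.size() >= 2);
    const long long rhs = stack.back();
    stack.pop_back();
    stack.back() *= rhs;
}

// Executes the commands in order. A well-formed program leaves exactly one
// value on the stack, which is the result of the expression.
inline long long run(const std::vector<Command>& program)
{
    Stack stack;
    for (const Command& command : program)
    {
        if (command.callback)
            command.callback(stack);
        else
            stack.push_back(command.constant);
    }
    assert(stack.size() == 1);
    return stack.back();
}
```

For expression 1 + 2 * 3, a depth-first walk of the AST would emit the commands "push 1, push 2, push 3, multiply, add". At evaluation time, no tree has to be walked and no decision about which operation to perform has to be taken; the callbacks were selected once, during compilation.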

This second phase is where the customization takes place. When a node of the AST is translated into a program command for the virtual machine, the compiler iterates through an ordered list of CompilerPlugin s to ask for compilation information. As soon as one plug-in provides such info, the compiler creates the command and continues walking the tree.

Now, what exactly does the compiler "ask" a plug-in for and what information is included in the question? To answer this, let us first look at the list of AST nodes given above. Of the six types of AST nodes listed, two do not need customization. These are literals and the ternary operator. What remains is

  • Identifiers and Functions,
  • Unary operators and
  • Binary operators.

It was mentioned before that ALib Expressions is type-safe. To achieve this, the result type of each node is identified (deepest nodes first). Whenever a node with child nodes is compiled, the result type of each child node has already been identified.

With this in mind, the input and output information that compiler plug-ins receive and return becomes obvious. Input is:

  • The node type and according information (e.g. the operator, the identifier or function name)
  • The result type of each child

The output information is:

  • A pointer to a native C++ callback function that will be invoked by the virtual machine when the program command resulting from the node is executed.
  • The result type of that native C++ callback function.
  • Alternatively to this, in case of "constant nodes", a constant result value may be returned (which likewise defines the node type). For example, built-in identifier "true" returns constant value boolean true.

To finalize this section, a quick hint to the benefits of taking this approach should be given:

  • Compile-time type safety allows almost all errors in user-defined expression strings to be identified at compile-time. This allows malformed expressions to be rejected right at the moment they are given. If such detection was deferred to evaluation-time, software would usually have to spend quite some effort to "undo" actions it already performed to prepare the evaluation.
  • Both compile-time type safety and the fact that the AST is translated into a linear program of course increase compile time, but this is done in favor of evaluation time. In many use-case scenarios, there is an overwhelmingly high ratio of evaluations per expression. Therefore, this library is optimized for evaluation performance, while compilation performance is considered of minor importance.
  • Operator overloading avoids type checking at evaluation time and leads to very thin callback functions, many of them being just a single line of code. In addition, the implementation of the native C++ callback functions can be separated into various compilation units, as already demonstrated with the built-in plug-ins, each of which addresses a certain dedicated "topic", like "string handling", "date and time", etc.

5.2 The Built-In Compiler Plug-Ins

With the information given in the previous sub-chapter, some important consequences can be noted:

The compilation process works on "permutations" of the following information:
  • node types,
  • node type specific information (e.g. unary/binary operator type or identifier/function name) and
  • all types of child nodes.

This fact in turn leads to the following statements:

  • The compilation process fails, if no plug-in returns compilation information for a certain permutation.
  • Each permutation may lead to a different C++ callback function and result type.
    (In the C++ language, this behavior is called "operator and function overloading".)
  • A compiler plug-in with a higher priority may replace (disable) the implementation of a permutation that would otherwise be provided by a plug-in of lower priority.

As a sample, let's take two simple expressions

   1 + 2
   "Result " + 42

Both expressions consist of two literal nodes, which are the two children of binary operator '+'. As literals are not compiled using plug-ins, only the binary operator is passed to the plug-ins. To successfully compile both, plug-ins have to be available that cover the following permutations:

   binary op, + , integer, integer
   binary op, + , string, integer

For the addition of integer values, built-in compiler plug-in Arithmetics is responsible. For the concatenation of integer values to string values, plug-in Strings steps in.
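
How a chain of plug-ins answers such permutations can be sketched without ALib. In the following illustration, all names (Type, BinaryOpInfo, Plugin, ArithmeticsLike, StringsLike, compile) are hypothetical, and the callback is represented only by its name instead of a function pointer. The real interface is given with class CompilerPlugin and its inner struct CIBinaryOp.

```cpp
#include <string>
#include <vector>

enum class Type { Integer, String };

// Input (operator and argument types) and output (callback and result type)
// of one compilation request. The callback is represented by its name only.
struct BinaryOpInfo
{
    char        op;
    Type        lhs;
    Type        rhs;
    std::string callbackName;   // output
    Type        resultType;     // output
};

struct Plugin
{
    virtual ~Plugin() = default;
    // Default implementation: this plug-in is not responsible.
    virtual bool tryCompilation(BinaryOpInfo&) { return false; }
};

// Covers the permutation: binary op, + , integer, integer
struct ArithmeticsLike : Plugin
{
    bool tryCompilation(BinaryOpInfo& ci) override
    {
        if (ci.op != '+' || ci.lhs != Type::Integer || ci.rhs != Type::Integer)
            return false;
        ci.callbackName = "addIntegers";
        ci.resultType   = Type::Integer;
        return true;
    }
};

// Covers the permutation: binary op, + , string, integer
struct StringsLike : Plugin
{
    bool tryCompilation(BinaryOpInfo& ci) override
    {
        if (ci.op != '+' || ci.lhs != Type::String || ci.rhs != Type::Integer)
            return false;
        ci.callbackName = "appendInteger";
        ci.resultType   = Type::String;
        return true;
    }
};

// Asks the plug-ins in the given order (highest priority first).
// The first plug-in that feels responsible wins; if none does, compilation fails.
inline bool compile(const std::vector<Plugin*>& plugins, BinaryOpInfo& ci)
{
    for (Plugin* plugin : plugins)
        if (plugin->tryCompilation(ci))
            return true;
    return false;
}
```

In this sketch, a custom plug-in placed in front of the list could answer the permutation "+, integer, integer" itself and thereby replace the behavior of the arithmetic plug-in.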

The documentation of the plug-ins therefore mainly consists of tables that list permutations of operators, function names and input types, together with a description of what is done in the C++ callback function and what result type is to be expected.

The use of the built-in plug-ins is optional and configurable. Configuration is done by tweaking member Compiler::CfgBuiltInPlugins prior to invoking method Compiler::SetupDefaults . However, a use case for doing so is hard to find, not least because custom plug-ins default to a higher priority and can this way replace selected built-in behavior.

To implement a custom compiler plug-in, the following "bottom-up" approach is recommended:

  • An application usually provides simple custom identifier names, which for example read property values from application objects defined from a specialized version of type Scope . The compilation of such identifier should be implemented first.
  • If an identifier callback function returns values of an application-specific type, then in addition a reasonable set of operators overloaded for this type should be implemented. (Obviously, if that was not done, only simple expressions consisting just of the custom identifiers that return that custom type could be compiled.)
  • If more complex custom functionality is needed, finally custom expression functions can be implemented. Of course, if such functions again introduce so-far unknown return types, operators for these types have to be implemented as well.

To finalize this chapter, some obvious facts should be named:

  • Each and every calculation is performed by plug-ins. Even a simple "1 + 2" calculation might be handled by custom code.
  • Usually, there is no need to omit the default plug-ins. Compiling an expression with more plug-ins installed causes only a very small performance decrease, and there is absolutely no impact on the usually much more important evaluation performance.
  • Three of the built-in types, namely Types::Integer , Types::Float and Types::String "emerge" from parsing literals.
  • The other built-in types, namely Types::Boolean , Types::DateTime and Types::Duration instead emerge as result types of built-in compiler plug-ins. For example, type Boolean is the result type of identifier "True", as well as the result type of DefaultBinaryOperators::Smaller ('<'), which is usable with various combinations of argument types provided by different built-in compiler plug-ins.
  • The introduction of custom types is done by just introducing a custom plug-in that compiles AST nodes with returning such custom type. There is no need to register the types. (With the exception that for the purpose of the creation of human readable compiler exceptions, method Compiler::AddType is provided.)
  • After the compilation process is done, the AST data structure is deleted.
  • The compilation process is a little more complex than presented here. More details will be explained in later chapters, for example in 11.5 Optimizations and 11.1 Types.

5.3 Class CompilerPlugin

After a lot of theory was given, it is now quite straightforward to explain how struct CompilerPlugin is used.

The struct provides an inner struct CompilationInfo which is the base of several derived further (inner) specializations. The base struct exposes the common base of the input and all of the output information provided to, and received from compiler plug-ins. According to the different node types of the parsed AST, the specializations are:

Along with this, for each of these structs, an overloaded virtual method called TryCompilation is defined. A custom plug-in now simply derives from the plug-in struct, and overrides one or more of the virtual methods. The original implementation of the base struct returns constant false. In the case that the given information corresponds to a permutation that the custom plug-in chooses to compile, the plug-in needs to fill in the output parameters of the given struct and return true.

Note
A fourth specialization of CompilationInfo is given with CIAutoCast together with a corresponding overloaded method TryCompilation. Its purpose and use is explained in chapter 11.1 Types.

5.4 Class Calculus

The architecture of the expression compiler and the use of according plug-ins was explained and we could continue now with extending the sample plug-in given in section 4.5 Implementing A Compiler Plug-In.

This would quickly lead to inserting a bunch of if-statements into the already overridden method TryCompilation. Considering all possible permutations of operators and types, this results in repetitive code. To avoid this, the library provides an optional helper class.
All built-in compiler plug-ins (with the exception of ElvisOperator and AutoCast ) use this class and are therefore not derived from CompilerPlugin, but from plugins::Calculus .

The trick with that type is that permutations of operators, identifiers, function names and argument types are provided as static table data, together with the information of how to compile the permutations.

Then, in a custom constructor, these static tables are fed into a hash table that allows a performant search. The custom plug-in does not need to override any TryCompilation method, as class Calculus provides a default implementation that simply searches the hash table.

Consequently, all that these built-in plug-ins do is feed their keys and corresponding callback methods to the hash table during construction. This is not just very efficient with respect to this library's code size and to the compilation performance of expressions, it also makes the creation of a plug-in an even simpler and more straightforward task.
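
The table-driven approach can be illustrated with a small, self-contained sketch. All names (Type, Key, KeyHash, Entry, sampleTable, TableDrivenPlugin) are hypothetical, the callbacks are represented only by their names, and the key layout is an assumption of this sketch. It is not the data layout used by class Calculus.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

enum class Type { Integer, String };

// The "permutation": operator plus the types of both arguments.
struct Key
{
    char op;
    Type lhs;
    Type rhs;
    bool operator==(const Key& other) const
    {
        return op == other.op && lhs == other.lhs && rhs == other.rhs;
    }
};

struct KeyHash
{
    std::size_t operator()(const Key& key) const
    {
        return (static_cast<std::size_t>(static_cast<unsigned char>(key.op)) << 8)
             ^ (static_cast<std::size_t>(key.lhs) << 4)
             ^  static_cast<std::size_t>(key.rhs);
    }
};

// One row of static table data: key, callback (here only its name), result type.
struct Entry
{
    Key         key;
    std::string callbackName;
    Type        resultType;
};

// Static table data of a sample plug-in.
inline const std::vector<Entry>& sampleTable()
{
    static const std::vector<Entry> table =
    {
        { { '+', Type::Integer, Type::Integer }, "addIntegers"     , Type::Integer },
        { { '*', Type::Integer, Type::Integer }, "multiplyIntegers", Type::Integer },
        { { '+', Type::String , Type::Integer }, "appendInteger"   , Type::String  },
    };
    return table;
}

// The constructor feeds the table into a hash map; the lookup replaces any
// hand-written chain of if-statements. The table has to outlive the plug-in.
class TableDrivenPlugin
{
    std::unordered_map<Key, const Entry*, KeyHash> lookup;

  public:
    explicit TableDrivenPlugin(const std::vector<Entry>& table)
    {
        for (const Entry& entry : table)
            lookup.emplace(entry.key, &entry);
    }

    // Returns the compilation information or nullptr if not responsible.
    const Entry* find(char op, Type lhs, Type rhs) const
    {
        const auto it = lookup.find(Key{ op, lhs, rhs });
        return it == lookup.end() ? nullptr : it->second;
    }
};
```

Adding a further operator or function then means adding one row to the table; the lookup code itself remains untouched.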

Hence, the advice to library users is to also use helper type plugins::Calculus as the parent class for custom compiler plug-ins, instead of deriving from CompilerPlugin.

The permutations of function arguments that class Calculus uses to identify static compilation information include an option to keep a trailing portion of such arguments variadic. A sample of such a variadic function implemented using this helper class is expression function Format.

We now go back to our tutorial sample and add more file filter functionality, by using this helper class Calculus.

6. Tutorial: Extending The File Filter Sample

6.1 Replacing CompilerPlugin By Calculus

Before we start adding new features to the sample code of section 4. Tutorial: Implementing A File Filter, the first task is to refactor the sample to use helper type plugins::Calculus .

The callback function defined by the sample plug-in presented so far was:

Box getName( ExpressionScope& scope, ArgIterator argsBegin, ArgIterator argsEnd )
{
// Create a copy of the string using the scope string allocator. This is done by using
// class MAString, which, when returned, right away is boxed as a usual string,
// aka char[]. Therefore, no intermediate string objects need to be stored, neither the
// std::string returned by "string()", nor the MAString itself.
return MAString( scope.Allocator,
dynamic_cast<FFScope&>( scope ).directoryEntry->path().filename().string(),
0 );
}

Furthermore our compiler plugin was derived from CompilerPlugin and implemented method TryCompilation for functions (identifiers):

namespace step5 {
struct FFCompilerPlugin : public CompilerPlugin
{
FFCompilerPlugin( Compiler& compiler )
: CompilerPlugin( "FF Plug-in", compiler )
{}
// implement "TryCompilation" for functions
virtual bool TryCompilation( CIFunction& ciFunction ) override
{
// Is parameterless and function name equals "Name"?
if( ciFunction.QtyArgs() == 0
&& ciFunction.Name.Equals<false, lang::Case::Ignore>( A_CHAR("Name") ) )
{
// set callback function, its return type and indicate success
ciFunction.Callback = getName;
ciFunction.TypeOrValue = Types::String;
return true;
}
// For anything else, we are not responsible
return false;
}
};
} // namespace step5

The callback function remains untouched. Struct FFCompilerPlugin is changed in three aspects:

  • It is to be derived from struct Calculus,
  • it fills the function table (just one entry so far) and
  • the own implementation of TryCompilation is to be removed.

The resulting code of the plugin looks as follows:

struct FFCompilerPlugin : public Calculus
{
FFCompilerPlugin( Compiler& compiler )
: Calculus( "FF Plug-in", compiler )
{
Functions=
{
{ { A_CHAR("Name"), lang::Case::Ignore, 4 }, // The function name, letter case sensitivity and min. abbreviation length (using class strings::util::Token).
nullptr, 0 , // No arguments (otherwise an array of sample boxes defining expected argument types).
CALCULUS_CALLBACK(getName) , // The callback function. In debug mode, also the name of the callback.
&Types::String , // The return type of the callback function, given as pointer to sample box.
ETI // Denotes "evaluation time invokable only". Alternative is "CTI".
},
};
}
};

6.2 Adding More Identifiers

We can now finally continue with adding more functionality to our file filter sample. At the end of chapter 4.5 Implementing A Compiler Plug-In we already thought about what we could add:

  • IsDirectory: Returns true if the directory entry is a sub-directory, false if it is a file.
  • Size: Returns the size of the file as built-in type Types::Integer .
  • Date: Returns the date of the entry as built-in type Types::DateTime .
  • Permissions: Returns the access rights of the file or folder.

OK, let's do that! First we add some boxed values that define constants for permission rights. This is still done in the anonymous namespace, hence the following boxes are on namespace scope, just as the callback functions are:

namespace
{
Box constOwnRead = static_cast<integer>( UnderlyingIntegral( fs::perms::owner_read ) );
Box constOwnWrite = static_cast<integer>( UnderlyingIntegral( fs::perms::owner_write ) );
Box constOwnExec = static_cast<integer>( UnderlyingIntegral( fs::perms::owner_exec ) );
Box constGrpRead = static_cast<integer>( UnderlyingIntegral( fs::perms::group_read ) );
Box constGrpWrite = static_cast<integer>( UnderlyingIntegral( fs::perms::group_write ) );
Box constGrpExec = static_cast<integer>( UnderlyingIntegral( fs::perms::group_exec ) );
Box constOthRead = static_cast<integer>( UnderlyingIntegral( fs::perms::others_read ) );
Box constOthWrite = static_cast<integer>( UnderlyingIntegral( fs::perms::others_write) );
Box constOthExec = static_cast<integer>( UnderlyingIntegral( fs::perms::others_exec ) );
}

We are doing two casts here: The first obtains the underlying integral value from the filesystem library's constants. If we did not do this, we would introduce a new type to ALib Expressions. In principle, this would not be a bad thing! The advantages and disadvantages will be explained in a later chapter.
The second cast converts the unsigned integral value to a signed one. Again, if we did not do this, we would introduce a new type, namely uinteger. Note that this library does not provide built-in operators for unsigned integers.

With these casts, the permission values become compatible with the built-in binary operators DefaultBinaryOperators::BitAnd, DefaultBinaryOperators::BitOr and DefaultBinaryOperators::BitXOr. These are defined for built-in type Types::Integer, which in turn is nothing else but an integer!
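The effect of the two casts can be reproduced with plain standard C++, outside of ALib. The following is only a minimal sketch: alias `integer_t` and helpers `permBits` and `ownerMayExecute` are hypothetical stand-ins chosen for illustration, they are not part of the library.

```cpp
#include <cstdint>
#include <filesystem>
#include <type_traits>

namespace fs = std::filesystem;

// Hypothetical stand-in for ALib's signed integral type "integer".
using integer_t = std::intptr_t;

// First cast:  enum element -> its (implementation-defined) underlying integral type.
// Second cast: that integral type -> the signed type the built-in operators work on.
constexpr integer_t permBits( fs::perms p )
{
    using Underlying = std::underlying_type_t<fs::perms>;
    return static_cast<integer_t>( static_cast<Underlying>( p ) );
}

// With both operands being plain signed integers, ordinary bitwise operators apply.
constexpr bool ownerMayExecute( fs::perms p )
{
    return ( permBits( p ) & permBits( fs::perms::owner_exec ) ) != 0;
}
```

The second helper mirrors what an expression like (permissions & OwnerExecute) != 0 computes once all values are integers.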

Next, we add the new callback functions:

namespace
{
Box isFolder( ExpressionScope& scope, ArgIterator argsBegin, ArgIterator argsEnd )
{
return dynamic_cast<FFScope&>( scope ).directoryEntry->status().type()
== fs::file_type::directory;
}
Box getSize( ExpressionScope& scope, ArgIterator argsBegin, ArgIterator argsEnd )
{
return ( dynamic_cast<FFScope&>( scope ).directoryEntry->status().type()
== fs::file_type::directory )
? 0
: static_cast<integer>(fs::file_size( *dynamic_cast<FFScope&>( scope ).directoryEntry ));
}
Box getDate( ExpressionScope& scope, ArgIterator argsBegin, ArgIterator argsEnd )
{
auto fsTime = fs::last_write_time( *dynamic_cast<FFScope&>( scope ).directoryEntry );
#if ALIB_CPP_STANDARD == 17 || defined(__APPLE__) || defined(__ANDROID_NDK__)
return DateTime::FromEpochSeconds( to_time_t( fsTime ) );
#else
return DateTime::FromEpochSeconds( chrono::system_clock::to_time_t(
chrono::clock_cast<chrono::system_clock>(fsTime) ) );
#endif
}
Box getPerm( ExpressionScope& scope, ArgIterator argsBegin, ArgIterator argsEnd )
{
return static_cast<integer>( dynamic_cast<FFScope&>( scope ).directoryEntry->status().permissions() );
}
}

All that is left to do is "announcing" the availability of these constants and functions to class Calculus in the constructor of the custom plug-in. As shown before, functions are added to table Calculus::Functions . The constant, parameterless functions are put into a simplified version of this table found with field Calculus::ConstantIdentifiers .
The entries of both tables expect an object of type Token. This object is used by class Calculus to match identifiers and functions found in expression strings against the names that are defined by a plug-in. Class Token provides a flexible way of optional name abbreviation, taking "CamelCase" or "snake_case" token formats into account. In our case, we allow each "camel hump" of the constant identifiers to be shortened to one letter, hence each identifier to just two letters. For example, identifier "OwnerExecute" can be abbreviated "OE", "oe", "ownE", etc.
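To get a feeling for such abbreviation rules, here is a strongly simplified matcher written in plain C++. It is a sketch under assumptions and not the algorithm of class Token: all function names are hypothetical, only the "CamelCase" format is handled, matching always ignores case, and one common minimum length is used for all humps.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Splits "OwnerExecute" into {"Owner", "Execute"}: a new hump starts at each upper-case letter.
inline std::vector<std::string> splitHumps( const std::string& name )
{
    std::vector<std::string> humps;
    for( char c : name )
    {
        if( humps.empty() || std::isupper( static_cast<unsigned char>( c ) ) )
            humps.emplace_back();
        humps.back().push_back( c );
    }
    return humps;
}

inline bool equalsIgnoreCase( char a, char b )
{
    return std::tolower( static_cast<unsigned char>( a ) )
        == std::tolower( static_cast<unsigned char>( b ) );
}

// Tries to consume 'input' from position 'pos' using humps[h], humps[h+1], ...
// Each hump has to contribute a prefix of at least 'minLen' characters.
inline bool matchHumps( const std::vector<std::string>& humps, std::size_t h,
                        const std::string& input, std::size_t pos, std::size_t minLen )
{
    if( h == humps.size() )
        return pos == input.size();

    const std::string& hump = humps[h];
    for( std::size_t len = 1; len <= hump.size() && pos + len <= input.size(); ++len )
    {
        if( !equalsIgnoreCase( hump[len - 1], input[pos + len - 1] ) )
            break;                              // longer prefixes cannot match either
        if( len >= minLen && matchHumps( humps, h + 1, input, pos + len, minLen ) )
            return true;
    }
    return false;
}

inline bool isAbbreviationOf( const std::string& name, const std::string& input,
                              std::size_t minLen = 1 )
{
    return matchHumps( splitHumps( name ), 0, input, 0, minLen );
}
```

With this sketch, "OE", "oe" and "ownE" are accepted for "OwnerExecute", while "ownR" is rejected because its second part is not a prefix of "Execute".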

Here comes the sample snippet:

struct FFCompilerPlugin : public plugins::Calculus
{
FFCompilerPlugin( Compiler& compiler )
: Calculus( "FF Plug-in", compiler )
{
ConstantIdentifiers=
{
// Parameters: "1, 1" denote the minimum abbreviation of each "camel hump"
{ { A_CHAR("OwnerRead") , lang::Case::Ignore, 1, 1}, constOwnRead },
{ { A_CHAR("OwnerWrite") , lang::Case::Ignore, 1, 1}, constOwnWrite },
{ { A_CHAR("OwnerExecute") , lang::Case::Ignore, 1, 1}, constOwnExec },
{ { A_CHAR("GroupRead") , lang::Case::Ignore, 1, 1}, constGrpRead },
{ { A_CHAR("GroupWrite") , lang::Case::Ignore, 1, 1}, constGrpWrite },
{ { A_CHAR("GroupExecute") , lang::Case::Ignore, 1, 1}, constGrpExec },
{ { A_CHAR("OthersRead") , lang::Case::Ignore, 1, 1}, constOthRead },
{ { A_CHAR("OthersWrite") , lang::Case::Ignore, 1, 1}, constOthWrite },
{ { A_CHAR("OthersExecute"), lang::Case::Ignore, 1, 1}, constOthExec },
};
Functions=
{
{ {A_CHAR("Name") , lang::Case::Ignore, 4 }, CALCULUS_SIGNATURE(nullptr), CALCULUS_CALLBACK(getName ), &Types::String , ETI },
{ {A_CHAR("IsDirectory") , lang::Case::Ignore, 2, 3}, CALCULUS_SIGNATURE(nullptr), CALCULUS_CALLBACK(isFolder), &Types::Boolean , ETI },
{ {A_CHAR("Size") , lang::Case::Ignore, 4 }, CALCULUS_SIGNATURE(nullptr), CALCULUS_CALLBACK(getSize ), &Types::Integer , ETI },
{ {A_CHAR("Date") , lang::Case::Ignore, 4 }, CALCULUS_SIGNATURE(nullptr), CALCULUS_CALLBACK(getDate ), &Types::DateTime, ETI },
{ {A_CHAR("Permissions") , lang::Case::Ignore, 4 }, CALCULUS_SIGNATURE(nullptr), CALCULUS_CALLBACK(getPerm ), &Types::Integer , ETI },
};
}
};

After all this theory and discussion, this is surprisingly simple and short code! Our file filter is already quite powerful. Here are some sample expressions and their output:

--- Filter Expression {IsDirectory}: ---
detail
plugins
util
--- Filter Expression {!IsDirectory && size < 20000}: ---
standardrepository.cpp
expression.cpp
expression.hpp
standardrepository.hpp
compiler.cpp
scope.hpp
--- Filter Expression {date > DateTime(2019,2,5)}: ---
compilerplugin.hpp
expressions.hpp
standardrepository.cpp
expression.cpp
expression.hpp
standardrepository.hpp
detail
plugins
compiler.hpp
compiler.cpp
expressions.cpp
util
scope.hpp
--- Filter Expression {(permissions & OwnerExecute) != 0}: ---
detail
plugins
util
--- Filter Expression {size > 20480}: ---
compilerplugin.hpp
expressions.hpp
compiler.hpp
expressions.cpp
Note
Looking at the last sample: If you are wondering why file expressionslib.cpp is so huge, the answer is: it contains this whole manual and tutorial that you are just reading, created with marvelous Doxygen !

6.3 Adding Functions

The latest sample expression was:

      size > 20480

It would be nicer to allow:

      size > kilobytes(20)

Ok, let us add three functions. Here are the callbacks:

Box kiloBytes( ExpressionScope& scope, ArgIterator argsBegin, ArgIterator argsEnd )
{
return argsBegin->Unbox<integer>() * 1024;
}
Box megaBytes( ExpressionScope& scope, ArgIterator argsBegin, ArgIterator argsEnd )
{
return argsBegin->Unbox<integer>() * 1024 * 1024;
}
Box gigaBytes( ExpressionScope& scope, ArgIterator argsBegin, ArgIterator argsEnd )
{
return argsBegin->Unbox<integer>() * 1024 * 1024 * 1024;
}

The functions unbox the first parameter. For this, due to the type-safe compilation of ALib Expressions , neither the availability nor the type of the given argument needs to be checked.

Next we need to define the function "signature", which defines the number and types of arguments that the functions expect. Class Calculus allows us to do this in a very simple fashion: all that is needed is an array of pointers to sample boxes. As all three simple functions have the same signature (they all just receive one argument of type integer), we need only one signature object:

Box* OneInt[1]= { &Types::Integer };
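The interplay of a signature array and a callback that unboxes without checking can be pictured with standard C++ only. The following sketch rests on assumptions: `std::any` serves as a hypothetical stand-in for class Box, and `fitsSignature` is an invented helper that mimics the check a compiler would perform once, at compile-time of the expression.

```cpp
#include <any>
#include <cstddef>
#include <cstdint>
#include <vector>

using Box  = std::any;                    // hypothetical stand-in for ALib's Box
using Args = std::vector<Box>;

// A "sample box": only the type it holds matters, never its value.
const Box sampleInteger = std::int64_t( 0 );

// A signature is an array of pointers to sample boxes.
const Box* const oneInt[1] = { &sampleInteger };

// Check done when the expression is compiled: do the argument types fit the signature?
inline bool fitsSignature( const Box* const* signature, std::size_t length, const Args& args )
{
    if( args.size() != length )
        return false;
    for( std::size_t i = 0; i < length; ++i )
        if( args[i].type() != signature[i]->type() )
            return false;
    return true;
}

// Evaluation-time callback: unboxes without any check, relying on the test above.
inline Box kiloBytes( const Args& args )
{
    return std::any_cast<std::int64_t>( args[0] ) * 1024;
}
```

Because the check runs only once per compiled expression, the callback itself stays free of any validation code.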

This was all we needed to prepare: here is the new version of the plug-in:

struct FFCompilerPlugin : public plugins::Calculus
{
FFCompilerPlugin( Compiler& compiler )
: Calculus( "FF Plug-in", compiler )
{
ConstantIdentifiers=
{
{ {A_CHAR("OwnerRead") , lang::Case::Ignore, 1, 1}, constOwnRead },
{ {A_CHAR("OwnerWrite") , lang::Case::Ignore, 1, 1}, constOwnWrite },
{ {A_CHAR("OwnerExecute") , lang::Case::Ignore, 1, 1}, constOwnExec },
{ {A_CHAR("GroupRead") , lang::Case::Ignore, 1, 1}, constGrpRead },
{ {A_CHAR("GroupWrite") , lang::Case::Ignore, 1, 1}, constGrpWrite },
{ {A_CHAR("GroupExecute") , lang::Case::Ignore, 1, 1}, constGrpExec },
{ {A_CHAR("OthersRead") , lang::Case::Ignore, 1, 1}, constOthRead },
{ {A_CHAR("OthersWrite") , lang::Case::Ignore, 1, 1}, constOthWrite },
{ {A_CHAR("OthersExecute"), lang::Case::Ignore, 1, 1}, constOthExec },
};
Functions=
{
{ {A_CHAR("Name") , lang::Case::Ignore, 4 }, CALCULUS_SIGNATURE(nullptr), CALCULUS_CALLBACK(getName ), &Types::String , ETI },
{ {A_CHAR("IsDirectory") , lang::Case::Ignore, 2, 3}, CALCULUS_SIGNATURE(nullptr), CALCULUS_CALLBACK(isFolder ), &Types::Boolean , ETI },
{ {A_CHAR("Size") , lang::Case::Ignore, 4 }, CALCULUS_SIGNATURE(nullptr), CALCULUS_CALLBACK(getSize ), &Types::Integer , ETI },
{ {A_CHAR("Date") , lang::Case::Ignore, 4 }, CALCULUS_SIGNATURE(nullptr), CALCULUS_CALLBACK(getDate ), &Types::DateTime, ETI },
{ {A_CHAR("Permissions") , lang::Case::Ignore, 4 }, CALCULUS_SIGNATURE(nullptr), CALCULUS_CALLBACK(getPerm ), &Types::Integer , ETI },
// the new functions:
{ {A_CHAR("KiloBytes") , lang::Case::Ignore, 1, 1}, CALCULUS_SIGNATURE(OneInt ), CALCULUS_CALLBACK(kiloBytes), &Types::Integer , CTI },
{ {A_CHAR("MegaBytes") , lang::Case::Ignore, 1, 1}, CALCULUS_SIGNATURE(OneInt ), CALCULUS_CALLBACK(megaBytes), &Types::Integer , CTI },
{ {A_CHAR("GigaBytes") , lang::Case::Ignore, 1, 1}, CALCULUS_SIGNATURE(OneInt ), CALCULUS_CALLBACK(gigaBytes), &Types::Integer , CTI },
};
}
};

Macro CALCULUS_SIGNATURE simply provides two arguments from the one given: the pointer to the start of the array along with the array's length. Those two values will be assigned to fields FunctionEntry::Signature and FunctionEntry::SignatureLength of function table records.
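How one macro argument can turn into two initializer values is shown by the following sketch. Macro `SKETCH_SIGNATURE` and struct `SketchEntry` are hypothetical and only assume one possible technique (`std::extent`). The actual definition of CALCULUS_SIGNATURE is found in the library's reference documentation.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical macro: expands ONE array name into TWO arguments, pointer and length.
// For 'nullptr', std::extent yields 0, which denotes "no arguments".
#define SKETCH_SIGNATURE( array )  array, std::extent<decltype( array )>::value

struct SketchEntry                        // hypothetical, much simplified table record
{
    const char*         name;
    const char* const*  signature;        // start of the array of argument type names
    std::size_t         signatureLength;  // number of arguments
};

const char* const oneInt[1] = { "Integer" };

const SketchEntry sizeEntry      = { "Size"     , SKETCH_SIGNATURE( nullptr ) };
const SketchEntry kiloBytesEntry = { "KiloBytes", SKETCH_SIGNATURE( oneInt  ) };
```

The table rows stay compact, and the length can never get out of sync with the array it belongs to.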

And here is a quick test using one of the functions:

--- Filter Expression {size > kilobytes(20)}: ---
compilerplugin.hpp
expressions.hpp
compiler.hpp
expressions.cpp

This worked well!

Note
Above, we sampled the creation of an array of pointers to boxes to denote the function signature. The built-in plug-ins of the library also need a certain set of signatures. These are collected in static struct Signatures. In the upcoming samples, object OneInt is removed and replaced by the corresponding field of the library, because custom code is very well allowed to use the built-in arrays.
Custom arrays have to be created only for signatures that are not found with built-in functions.

A picky reader might now think: well it is more efficient to use expression:

      size > 81920

instead of:

      size > kilobytes(80)

because the latter introduces a function call and hence is less efficient. But this is not the case, at least not in respect to evaluating the expression against a directory entry. The evaluation time of both expressions is exactly the same, because both expressions result in exactly the same expression program.

The only effort for the library is at compile-time. While later chapter 11.5 Optimizations will discuss the details, here we only briefly note what is going on: The definition entry of the function table for function KiloBytes states Calculus::CTI in the last column. This tells class Calculus that the function might be evaluated at compile-time in the case that all arguments are constant. Because the single argument given is constant literal 80, this condition is met. Thus, the callback function is invoked at compile-time and instead of the function's address, the result value is passed back to the compiler. The compiler notes this, and replaces the original command that created the constant value 80 with the constant result value 81920. This is why both expressions lead to exactly the same program.
In contrast to this, the identifiers of the previous chapter are marked as Calculus::ETI, which means "evaluation-time invokable only". The obvious rationale is that these functions access custom data in the Scope object, and such custom data is available only when the expression is evaluated for a specific directory entry.
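The decision described above can be condensed into a few lines of standard C++. This is a sketch of the principle only, not the library's implementation: all types and the function `tryFold` are hypothetical.

```cpp
#include <cstdint>
#include <functional>
#include <optional>
#include <vector>

using Value = std::int64_t;

enum class Invokable { CompileTime, EvaluationTimeOnly };   // sketch of "CTI" / "ETI"

struct Argument
{
    bool  isConstant;   // true for literals and already folded sub-expressions
    Value value;        // meaningful only if isConstant is true
};

struct Function
{
    std::function<Value( const std::vector<Value>& )> callback;
    Invokable                                         invokable;
};

// Returns the folded constant if folding is allowed, otherwise nothing. In the latter
// case a real compiler would emit a call to the callback into the expression program.
inline std::optional<Value> tryFold( const Function& f, const std::vector<Argument>& args )
{
    if( f.invokable != Invokable::CompileTime )
        return std::nullopt;

    std::vector<Value> values;
    for( const Argument& a : args )
    {
        if( !a.isConstant )
            return std::nullopt;
        values.push_back( a.value );
    }
    return f.callback( values );
}
```

Applied to a "kilobytes" function with constant argument 80, the sketch yields 81920 at compile-time. A function flagged as evaluation-time only is never folded, even if it has no arguments at all.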

6.4 Adding Operators

Next, some binary operator definitions are to be showcased.

We had implemented identifier Permissions to return a value of Types::Integer instead of returning the C++ 17 filesystem library's internal type. The advantage of this was that the built-in bitwise-boolean operators defined for integral values, could instantly be used with expressions. This was demonstrated in above sample expression:

   (permissions & OwnerExecute) != 0

The disadvantage however is, that the filter expressions are not really type-safe. An end-user could pass the expression:

   (permissions & 42) != 0

without receiving an error. While this is a design decision when using ALib Expressions, in most cases, type-safeness has definite advantages. To achieve type-safeness, we now change the definition of the callback function of identifier Permissions as follows:

Box getPerm( ExpressionScope& scope, ArgIterator argsBegin, ArgIterator argsEnd )
{
return dynamic_cast<FFScope&>( scope ).directoryEntry->status().permissions();
}

In the previous version we had cast the enumeration elements of fs::perms to their underlying integral type. Now we are boxing the uncast enumeration element value.

To denote type fs::perms as being the return type of identifier Permissions, we need a sample box. This is an easy task: we just arbitrarily choose one enumeration element and assign it to a new variable of type Box.

A further small change results from a requirement of class Box: Global (or static) objects must not be initialized with custom types (in this case with elements of enum fs::perms). Such initialization has to happen after module ALib Boxing is duly bootstrapped. Therefore, the initializations of all constant boxes, as well as of the sample box, are now moved to the constructor of the compiler plug-in.

Note
It is possible to customize ALib Boxing to support the initialization of global boxes with custom types, but we do not do this exercise here.
Details are explained in the Programmer's Manual of ALib Boxing with chapter 12.2 Optimizations With Static VTables.

The new code for the compiler plug-in's constructor now is:

struct FFCompilerPlugin : public plugins::Calculus
{
FFCompilerPlugin( Compiler& compiler )
: Calculus( "FF Plug-in", compiler )
{
// Initializations of constant values. This must not be done with their definition
// anymore, because now type "fs::perms" is boxed instead of type "integer".
constOwnRead = fs::perms::owner_read;
constOwnWrite = fs::perms::owner_write;
constOwnExec = fs::perms::owner_exec;
constGrpRead = fs::perms::group_read;
constGrpWrite = fs::perms::group_write;
constGrpExec = fs::perms::group_exec;
constOthRead = fs::perms::others_read;
constOthWrite = fs::perms::others_write;
constOthExec = fs::perms::others_exec;
// A sample box for the new type "fs::perms"
TypePermission = fs::perms::owner_read; // ...could be any other enum element as well!
ConstantIdentifiers=
{
{ {A_CHAR("OwnerRead") , lang::Case::Ignore, 1, 1}, constOwnRead },
{ {A_CHAR("OwnerWrite") , lang::Case::Ignore, 1, 1}, constOwnWrite },
{ {A_CHAR("OwnerExecute") , lang::Case::Ignore, 1, 1}, constOwnExec },
{ {A_CHAR("GroupRead") , lang::Case::Ignore, 1, 1}, constGrpRead },
{ {A_CHAR("GroupWrite") , lang::Case::Ignore, 1, 1}, constGrpWrite },
{ {A_CHAR("GroupExecute") , lang::Case::Ignore, 1, 1}, constGrpExec },
{ {A_CHAR("OthersRead") , lang::Case::Ignore, 1, 1}, constOthRead },
{ {A_CHAR("OthersWrite") , lang::Case::Ignore, 1, 1}, constOthWrite },
{ {A_CHAR("OthersExecute"), lang::Case::Ignore, 1, 1}, constOthExec },
};
Functions=
{
{ {A_CHAR("Name") , lang::Case::Ignore, 4 }, CALCULUS_SIGNATURE(nullptr ), CALCULUS_CALLBACK(getName ), &Types::String , ETI },
{ {A_CHAR("IsDirectory") , lang::Case::Ignore, 2, 3}, CALCULUS_SIGNATURE(nullptr ), CALCULUS_CALLBACK(isFolder ), &Types::Boolean , ETI },
{ {A_CHAR("Size") , lang::Case::Ignore, 4 }, CALCULUS_SIGNATURE(nullptr ), CALCULUS_CALLBACK(getSize ), &Types::Integer , ETI },
{ {A_CHAR("Date") , lang::Case::Ignore, 4 }, CALCULUS_SIGNATURE(nullptr ), CALCULUS_CALLBACK(getDate ), &Types::DateTime, ETI },
// change return type to TypePermission
{ {A_CHAR("Permissions") , lang::Case::Ignore, 4 }, CALCULUS_SIGNATURE(nullptr ), CALCULUS_CALLBACK(getPerm ), &TypePermission , ETI },
{ {A_CHAR("KiloBytes") , lang::Case::Ignore, 1, 1}, CALCULUS_SIGNATURE(Signatures::I), CALCULUS_CALLBACK(kiloBytes), &Types::Integer , CTI },
{ {A_CHAR("MegaBytes") , lang::Case::Ignore, 1, 1}, CALCULUS_SIGNATURE(Signatures::I), CALCULUS_CALLBACK(megaBytes), &Types::Integer , CTI },
{ {A_CHAR("GigaBytes") , lang::Case::Ignore, 1, 1}, CALCULUS_SIGNATURE(Signatures::I), CALCULUS_CALLBACK(gigaBytes), &Types::Integer , CTI },
};
}
};

Apart from initializing the constant boxes, the only new line of code is the definition of the sample box for the return value, which is then used in the function table to denote the return type of function Permissions.

Note
The constant identifiers we had added to the sample return a constant value, hence no "sample box" was needed for them.
The other way round, the definition of object TypePermission could have been omitted and one of the constant values, e.g. constOwnRead, could have been used as the return type sample box in the function table entry.
We therefore define one unnecessary object of type Box, which resides in the compiled software occupying 24 bytes, in favour of more readable code. Real-life plug-ins could find other solutions, e.g. using a preprocessor macro, to save this small overhead.

Let's see what happens if we try to compile the previous expression:

E1: <expressions::BinaryOperatorNotDefined>
Operator '&' not defined for types "Unknown Type" and "Unknown Type".
I2: <expressions::ExpressionInfo>
Expression: {(permissions & OwnerExecute) != 0}
^->

The compiler throws a run-time exception, noting that operator '&' is not defined. The first thing we want to fix is the output information of this exception itself. In general it is not necessary to announce custom types explicitly. The exception to this rule is that the human-readable information collected in exceptions thrown by the library benefits from it. For just this purpose, method Compiler::AddType is available. Consequently, we add statement

// Announce our custom type to the compiler
compiler.AddType( TypePermission, "Permission" );
//...

to the constructor of our plug-in.
With this in place, the exception thrown looks as follows:

E1: <expressions::BinaryOperatorNotDefined>
Operator '&' not defined for types "Permission" and "Permission".
I2: <expressions::ExpressionInfo>
Expression: {(permissions & OwnerExecute) != 0}
^->
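Why announcing a type name changes nothing but the message text can be illustrated with a small registry written in standard C++. The sketch is hypothetical throughout: class `TypeNames` and function `notDefinedMessage` are invented for this illustration and `std::any` stands in for class Box.

```cpp
#include <any>
#include <string>
#include <typeindex>
#include <unordered_map>

// Hypothetical registry mapping C++ types to names shown to end-users.
class TypeNames
{
    std::unordered_map<std::type_index, std::string> names;

  public:
    // The sample value is used only to determine the type, its value is ignored.
    void add( const std::any& sample, const std::string& name )
    {
        names[ std::type_index( sample.type() ) ] = name;
    }

    std::string nameOf( const std::any& value ) const
    {
        auto it = names.find( std::type_index( value.type() ) );
        return it != names.end() ? it->second : "Unknown Type";
    }
};

// Builds a message in the style of the exception entry shown above.
inline std::string notDefinedMessage( const TypeNames& types, const std::string& op,
                                      const std::any& lhs, const std::any& rhs )
{
    return "Operator '" + op + "' not defined for types \""
         + types.nameOf( lhs ) + "\" and \"" + types.nameOf( rhs ) + "\".";
}
```

The lookup of operators is not affected by the registry at all. Only the text that a human reads improves, which is what turns "Unknown Type" into "Permission" in the second message.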

This looks better, but it is still an exception. What it tells us is to define the operator. We do this for a bunch of operators at once. Firstly, we need the callbacks for the operators:

Box opPermAnd( ExpressionScope& scope, ArgIterator argsBegin, ArgIterator argsEnd )
{
return argsBegin ->Unbox<fs::perms>()
& (argsBegin + 1)->Unbox<fs::perms>();
}
Box opPermOr( ExpressionScope& scope, ArgIterator argsBegin, ArgIterator argsEnd )
{
return argsBegin ->Unbox<fs::perms>()
| (argsBegin + 1)->Unbox<fs::perms>();
}
Box opPermXOr( ExpressionScope& scope, ArgIterator argsBegin, ArgIterator argsEnd )
{
return argsBegin ->Unbox<fs::perms>()
^ (argsBegin + 1)->Unbox<fs::perms>();
}
Box opPermEq( ExpressionScope& scope, ArgIterator argsBegin, ArgIterator argsEnd )
{
return argsBegin ->Unbox<fs::perms>()
==(argsBegin + 1)->Unbox<fs::perms>();
}
Box opPermNEq( ExpressionScope& scope, ArgIterator argsBegin, ArgIterator argsEnd )
{
return argsBegin ->Unbox<fs::perms>()
!=(argsBegin + 1)->Unbox<fs::perms>();
}

This is the first time that two parameters are read in the callbacks. It is done using simple iterator arithmetic.
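For readers who want to try the pattern without the library, here is a minimal stand-alone sketch. It assumes `std::any` as a hypothetical replacement for class Box and a vector iterator as a replacement for ArgIterator. The scope parameter is omitted.

```cpp
#include <any>
#include <cassert>
#include <filesystem>
#include <vector>

namespace fs = std::filesystem;

using Box         = std::any;                         // hypothetical stand-in for ALib's Box
using ArgIterator = std::vector<Box>::const_iterator;

// Binary operator callback: left-hand side at argsBegin, right-hand side one position later.
inline Box permAnd( ArgIterator argsBegin, ArgIterator argsEnd )
{
    assert( argsEnd - argsBegin == 2 );               // sanity check, active in debug builds only
    return std::any_cast<fs::perms>( *argsBegin )
         & std::any_cast<fs::perms>( *(argsBegin + 1) );
}
```

The result is boxed as fs::perms again, so the outcome of the operator keeps its custom type.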

Struct Calculus organizes compilation information on unary and binary operators in a hash map. For filling the map, a convenience function is available that accepts a simple array of information entries. This array usually is defined in the anonymous namespace of the compilation unit:

Calculus::OperatorTableEntry binaryOpTable[] =
{
{ A_CHAR("&") , TypePermission, TypePermission, CALCULUS_CALLBACK( opPermAnd ), TypePermission , Calculus::CTI },
{ A_CHAR("|") , TypePermission, TypePermission, CALCULUS_CALLBACK( opPermOr ), TypePermission , Calculus::CTI },
{ A_CHAR("^") , TypePermission, TypePermission, CALCULUS_CALLBACK( opPermXOr ), TypePermission , Calculus::CTI },
{ A_CHAR("=="), TypePermission, TypePermission, CALCULUS_CALLBACK( opPermEq ), Types::Boolean , Calculus::CTI },
{ A_CHAR("!="), TypePermission, TypePermission, CALCULUS_CALLBACK( opPermNEq ), Types::Boolean , Calculus::CTI },
};

For information about the meaning of the values of the table, consult the documentation of Calculus::OperatorTableEntry. But looking at the code, and reflecting on what was already presented in this tutorial, the meaning should be quite self-explanatory. It should just be noted that flags Calculus::CTI or Calculus::ETI may be given for operators as well. If, like in our case, CTI is specified, then whenever both operands are constant, the compiler optimizes and the callbacks are pruned from the compiled expression. This means that, for example, sub-expression:

   ( OwnerRead | GroupRead | OwnerExecute | GroupExecute )

will be reduced to one single constant in the compiled expression program, because each of the identifiers returns a constant value.
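The idea of a hash map that is keyed by the operator symbol and the two operand types can be sketched as follows. This is an illustration under assumptions, not the data structure used by struct Calculus: `OperatorMap`, `OperatorKey` and `OperatorKeyHash` are hypothetical, and `std::any` stands in for class Box.

```cpp
#include <any>
#include <cstddef>
#include <functional>
#include <string>
#include <typeindex>
#include <unordered_map>
#include <utility>

using Box      = std::any;                                   // hypothetical stand-in
using Callback = std::function<Box( const Box&, const Box& )>;

struct OperatorKey
{
    std::string     op;
    std::type_index lhs;
    std::type_index rhs;

    bool operator==( const OperatorKey& o ) const
    {
        return op == o.op && lhs == o.lhs && rhs == o.rhs;
    }
};

struct OperatorKeyHash
{
    std::size_t operator()( const OperatorKey& k ) const
    {
        return std::hash<std::string>()( k.op )
             ^ ( k.lhs.hash_code() << 1 )
             ^ ( k.rhs.hash_code() << 2 );
    }
};

class OperatorMap
{
    std::unordered_map<OperatorKey, Callback, OperatorKeyHash> map;

  public:
    // The two sample boxes only denote the operand types.
    void add( const std::string& op, const Box& lhsSample, const Box& rhsSample, Callback cb )
    {
        map[ OperatorKey{ op, lhsSample.type(), rhsSample.type() } ] = std::move( cb );
    }

    // Returns nullptr if the operator is not defined for this pair of types.
    const Callback* find( const std::string& op, const Box& lhs, const Box& rhs ) const
    {
        auto it = map.find( OperatorKey{ op, lhs.type(), rhs.type() } );
        return it != map.end() ? &it->second : nullptr;
    }
};
```

A lookup succeeds only for the exact combination of symbol and operand types that was registered. This is what makes an expression like (permissions & 42) fail to compile once operator '&' is defined for the custom type only.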

Finally, in the constructor of the plug-in we now add the following line of code:

AddOperators( binaryOpTable );

With this in place, the expression now compiles in a type-safe way:

--- Filter Expression {(permissions & OwnerExecute) == OwnerExecute}: ---
detail
plugins
util
Note
The expression got slightly changed from:
 (permissions & OwnerExecute) != 0
to
 (permissions & OwnerExecute) == OwnerExecute
which is logically the same. To allow the first version, a replacement identifier for integral value 0, e.g. NoPermission, would have to be added to the plug-in.

6.5 Implementing Auto-Casts

To finalize this tutorial part of the documentation, a last and quite powerful feature of ALib Expressions is presented. Let us reconsider what we did in the previous section:

  • We changed identifier Permissions to return values of custom type fs::perms.
  • Because of this, the built-in bitwise boolean operators are not applicable any more and therefore, we implemented operators for the custom type.

For the latter, there is an alternative available, called "auto-casting". If no compiler plug-in compiles an operator for a given argument or pair of arguments, then the compiler invokes method CompilerPlugin::TryCompilation(CIAutoCast&) for each plugin. In the case that one of the plug-ins positively responds by providing one or two "cast functions", the compiler inserts the cast functions for one or both arguments and performs the search for an operator of this now new type (respectively pair of types) a second time.

We add such "auto-casts" for allowing the compiler to convert fs::perms to integer. This approach obviously has the following consequences:

  • "Permissions" still is a distinguishable type.
  • Specific operators using the type can be defined and will be selected with preference.
  • Existing binary operators for integers become available to the type in addition, but with a lower priority. (Because auto-casting is performed only if no direct match exists.)
  • Expression terms using the type are "type-unsafe", the same as in the first implementation, when identifier Permissions returned an integer type. For example, expression { Permissions == 0 } is well compiled and evaluated.
Note
It is a design decision of ALib Expressions that auto-casts are performed only once per expression operator and not a second or third time: If, after one plug-in performed auto-casts, still no matching operator is found, then compilation fails.
One rationale for this behavior is that repeated auto-casts would undermine the "type safeness" of ALib Expressions too much by leading to possibly unforeseen effects.
For example, consider a custom application that uses two different enum types, both of which are exposed as result values of expression functions. Now, the application programmer might want to allow built-in bitwise boolean operators that are defined on integral values to become compatible with both types. For both types she would implement an auto-cast to integer type. The effect is that the operators work, but also combinations of the custom enum types and integer literals. So far, so good. Now, if the library performed repeated auto-casts, then even binary operators that contain different custom types for lhs and rhs would be compiled and evaluated. Hence, this would result in even less strict type-safeness!
If strict type-safeness is demanded, auto-casts are to be avoided! Instead, the implementation of all needed operators for the custom types is to be preferred. The decision for either approach is a matter of taste and field of application.
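The lookup order described in this section (direct match first, then exactly one round of auto-casts, then failure) can be written down in a few lines. The following sketch is hypothetical in all its names, and types are represented by plain strings to keep it short.

```cpp
#include <map>
#include <optional>
#include <set>
#include <string>
#include <tuple>

using TypeName = std::string;
using Operator = std::tuple<std::string, TypeName, TypeName>;   // symbol, lhs type, rhs type

struct CastingResolver
{
    std::set<Operator>           operators;   // defined binary operators
    std::map<TypeName, TypeName> autoCasts;   // source type -> target type

    TypeName castOnce( const TypeName& t ) const
    {
        auto it = autoCasts.find( t );
        return it != autoCasts.end() ? it->second : t;
    }

    // Returns the operator finally used, or nothing if compilation fails.
    std::optional<Operator> resolve( const std::string& op,
                                     const TypeName& lhs, const TypeName& rhs ) const
    {
        Operator direct{ op, lhs, rhs };
        if( operators.count( direct ) )
            return direct;                        // a direct match is always preferred

        Operator afterCast{ op, castOnce( lhs ), castOnce( rhs ) };
        if( afterCast != direct && operators.count( afterCast ) )
            return afterCast;                     // found after exactly one round of casts

        return std::nullopt;                      // no second round: compilation fails
    }
};
```

With an operator '&' defined for integers only and a cast from "Permission" to "Integer", the term (Permission & Permission) resolves to the integer operator. A type that would need two consecutive casts to reach "Integer" is rejected.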

6.5.1 Implementing Auto-Casts Using A Compiler Plug-in

To implement this, we revert the most recent code changes (the operator callbacks, the binary operator table and the single line of code that feeds the table to parent Calculus).

As a replacement, we add the following callback function which casts a permission type to Types::Integer :

Box perm2Int( ExpressionScope& scope, ArgIterator argsBegin, ArgIterator argsEnd )
{
return static_cast<integer>( argsBegin->Unbox<fs::perms>() );
}

A cast function takes one parameter of the originating type and returns the converted value. In this sample, this is trivial. Sometimes more complex code is needed. Casting one type to another might even include memory allocations to create a certain custom type from a given value. Such allocations have to be performed using the provided Scope object, which, as explained later, optionally is of custom type. Allocations done by auto-casting are then to be deleted when the scope object is deleted or reset.

Note
In the case that casting is done at compile-time (due to expression optimization), such allocations will then be performed using the compile-time scope, which survives the expression's life cycle.

With this casting callback function in place, we add the following method to the custom plugin:

virtual bool TryCompilation( CIAutoCast& ciAutoCast ) override
{
// we don't cast for conditional operator "Q ? T : F"
// Note: It is usually a good practice to also cast for this operator.
// This code is just a sample to demonstrate how to omit casting for certain operator(s).
if( ciAutoCast.Operator.Equals<false>( A_CHAR("Q?T:F") ) )
return false;
bool result= false;
// cast first argument (lhs, if binary op)
if( ciAutoCast.ArgsBegin->IsType<fs::perms>() )
{
result= true;
if( ciAutoCast.IsConst )
{
// compile-time invocation
ciAutoCast.TypeOrValue= perm2Int( ciAutoCast.CompileTimeScope,
ciAutoCast.ArgsBegin,
ciAutoCast.ArgsEnd );
}
else
{
ciAutoCast.Callback = perm2Int;
ciAutoCast.TypeOrValue = Types::Integer;
ALIB_DBG( ciAutoCast.DbgCallbackName= "perm2Int"; )
}
}
// cast RHS, if given
if( ciAutoCast.ArgsBegin + 1 < ciAutoCast.ArgsEnd
&& (ciAutoCast.ArgsBegin + 1)->IsType<fs::perms>() )
{
result= true;
if( ciAutoCast.RhsIsConst )
{
// compile-time invocation
ciAutoCast.TypeOrValueRhs= perm2Int( ciAutoCast.CompileTimeScope,
ciAutoCast.ArgsBegin + 1,
ciAutoCast.ArgsEnd );
}
else
{
ciAutoCast.CallbackRhs = perm2Int;
ciAutoCast.TypeOrValueRhs = Types::Integer;
ALIB_DBG( ciAutoCast.DbgCallbackNameRhs= "perm2Int"; )
}
}
return result;
}

As with the previous solution, our sample expression compiles with the very same result:

--- Filter Expression {(permissions & OwnerExecute) == OwnerExecute}: ---
detail
plugins
util

However, unlike the recent implementation, compilation is not type-safe in respect to mixing fs::perms with integer values:

--- Filter Expression {(permissions & 64) != 0}: ---
detail
plugins
util

6.5.2 Implementing Auto-Casts Using Class Calculus

This was a rather simple use case, but a very frequent one. Again, class plugins::Calculus may be used to avoid the code from the previous section and replace it by just one line of static table data.

To demonstrate this, we remove the code of the previous section. This applies not only to method TryCompilation: we can also remove the custom callback function that performed the cast!

Now, we add the following statement to the constructor of our custom compiler plug-in, which is already derived from class Calculus:

AutoCasts=
{
{ TypePermission, nullptr, nullptr, CALCULUS_DEFAULT_AUTOCAST, nullptr, nullptr },
};

A quick check confirms that our sample expression compiles and evaluates the same as before:

--- Filter Expression {(permissions & OwnerExecute) == OwnerExecute}: ---
detail
plugins
util

The various options and fields of table Calculus::AutoCasts are not explained here, but well documented with inner struct Calculus::AutoCastEntry .
Further documentation is found with method Calculus::TryCompilation(CIAutoCast&) , including some hints about the use cases not covered by this helper class, hence those that demand the implementation of a custom TryCompilation method.

7. Built-In Expression Functionality

The types, identifiers, functions and operators presented in this manual section are called "built-in" in the sense that they are available by default. In truth, they are implemented using the very same technique of providing compiler plug-ins that has been explained in the previous section. This way, the built-in logic is fully optional and can easily be switched off, partly or completely.

For doing so, class Compiler offers a set of configurable flags, gathered in member Compiler::CfgBuiltInPlugins . The flags are declared with enumeration Compiler::BuiltInPlugins and are evaluated in method Compiler::SetupDefaults . Field CfgBuiltInPlugins defaults to Compiler::BuiltInPlugins::ALL . With this information, it is easy to understand that the following code of setting up a compiler:

Compiler compiler;
compiler.CfgBuiltInPlugins= Compiler::BuiltInPlugins::NONE;
compiler.SetupDefaults();

leads to a compiler that does not compile anything.

Disabling one or more of the built-in compiler plug-ins should very seldom be needed. Here are some rationales for this statement:

  • While in a certain domain-specific use case scenario, e.g. string handling or math functions are not needed, it should not impose a negative effect if they are provided.
  • The compiler plug-ins are invoked by the compiler in an ordered fashion. Custom plug-ins have highest priority. This way, selected functionality can be "overridden" by a custom plug-in by just compiling permutations of operators/function names and their arguments, which otherwise would be compiled by a lower-prioritized, built-in plug-in.
  • The availability of unused functionality provided by unnecessary plug-ins has no effect on the evaluation-time of expressions. There is only a very small downside in compile-time.
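The "overriding" by priority mentioned in the second point can be pictured with the following sketch. All names and the numeric priorities are assumptions made for this illustration. The real interface is the one of class CompilerPlugin.

```cpp
#include <algorithm>
#include <functional>
#include <optional>
#include <string>
#include <utility>
#include <vector>

struct SketchPlugin
{
    std::string name;
    int         priority;   // higher values are asked first
    // Returns a description of the compiled function, or nothing if the plug-in declines.
    std::function<std::optional<std::string>( const std::string& identifier )> tryCompile;
};

class PluginChain
{
    std::vector<SketchPlugin> plugins;

  public:
    void insert( SketchPlugin plugin )
    {
        plugins.push_back( std::move( plugin ) );
        std::stable_sort( plugins.begin(), plugins.end(),
                          []( const SketchPlugin& a, const SketchPlugin& b )
                          { return a.priority > b.priority; } );
    }

    // The first plug-in that responds wins; lower-prioritized ones are not asked.
    std::optional<std::string> compile( const std::string& identifier ) const
    {
        for( const SketchPlugin& p : plugins )
            if( auto result = p.tryCompile( identifier ) )
                return result;
        return std::nullopt;
    }
};
```

A custom plug-in with a higher priority that answers for "sin" hides the implementation of a lower-prioritized plug-in, while everything it declines is still compiled by the latter.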

7.1 Completeness Of Built-In Functionality

In the default setup (all built-in plug-ins are active), ALib Expressions is considered to be "complete" in respect to providing all reasonable operators for permutations of arguments of all built-in types.

This manual does not elaborate about implications in respect to such completeness in the case that selected built-in plug-ins are omitted. It is up to the user of the library to think about such implications and provide alternatives to the built-in functionality that is decided to be left out.
By the same token, there is no mechanism to disable the compilation of selected built-in compiler plug-ins and with that, their inclusion in the library code. If this is to be achieved in favour of code size, a custom build-process has to be set up.

7.2 Types

As explained in previous sections of this manual, the introduction of types to ALib Expressions is performed in an implicit fashion: New types are introduced in the moment a callback function chooses to return one and consequently, the corresponding compiler plug-in announces this return type of such callback to the compiler during the compilation process.

Therefore, the set of built-in types is resulting from the set of built-in compiler plug-ins. Nevertheless, the library design opted to collect sample boxes for the set in struct Types , which is defined right in namespace alib::expressions.

It is notable that no built-in support for unsigned integral values is provided. In the unlikely event that this is needed for any reason, such support can quite easily be implemented by a custom plug-in. As a jump-start, the source code of class Arithmetics might be used.

Furthermore, all possible sizes of C++ integral values are collectively cast to integer, which is a 64-bit signed integral value on a 64-bit platform and a 32-bit signed integral value on a 32-bit platform.

Finally, Types::Float is internally implemented using C++ type double. No (built-in!) support for C++ types float and long double is provided.

This reduction of used types simplifies the built-in plug-ins dramatically and reduces the library's footprint, as it reduces the number of type-permutations to a reasonable minimum.

Due to the type-safe compilation, adding custom types has no impact on evaluation performance of operators and functions that use the built-in types (or other custom types).

7.3 Arithmetics

What is called "arithmetics" with this library comprises the implementation of unary and binary operators for permutations of types Boolean , Integer and Float .

The operators and some few identifiers and functions are collectively implemented and documented with plug-in Arithmetics .

7.4 Math Functions

Fundamental mathematical functions, such as trigonometric and logarithmic functions, are collectively implemented and documented with plug-in Math.

7.5 String Expressions

Plug-In Strings provides quite powerful string operations. The library here benefits tremendously from underlying modules ALib Strings and ALib Boxing.

For example, operator Add ('+') allows to concatenate two strings, but also a string with "any" other built-in or custom type. The latter - namely that there is no need to define an overloaded expression operator for strings and custom types - is achieved by leveraging box-function FAppend . Consult the user manual of ALib Boxing for details on how to implement this interface for your custom types, to allow end-users to concatenate your types to strings within expressions.

All built-in string features, including:

  • wildcard matching (using '*' and '?'),
  • matching of regular expressions, as well as
  • a powerful Format(String, ...) function

are described in the plug-in's documentation.

7.6 Date And Time Expressions

The built-in types Types::DateTime and Types::Duration represent ALib classes DateTime and TimePointBase::Duration of the same name. The corresponding Expression functionality is implemented and documented with plug-in DateAndTime .

If a user of ALib Expressions prefers to use different, own or 3rd-party types, then support for such types needs to be implemented by a custom plug-in. Such an implementation may be created by copying the source code of built-in plug-in DateAndTime and replacing all corresponding code lines to work with the desired date and time types. If wanted, some or all identifiers might remain the same, and the built-in plug-in may even be kept active. In the latter case, no clash of identifiers would occur. This is because the custom plug-in would usually be inserted to the compiler with a higher priority than the priority of the built-in plug-in.

7.7 Conditional Operator

The conditional operator Q ? T : F is the only ternary operator, and (for technical reasons) not implemented as a plug-in. Instead, it is hard-coded in the details of this library's implementation.
This is not considered a huge limitation, as there is no obvious use case for overloading this operator: Its meaning is the same for any use of types.

The conditional argument 'Q', which of-course could result in a value of any built-in or custom type, is interpreted as a boolean value using box-function FIsTrue . While a default implementation for this box-function exists that evaluates any custom type, a provision of this interface for a custom type may be used to override this default implementation.

For result arguments 'T' and 'F', the only requirement that needs to be fulfilled is that both are of the same type or that a compilation plug-in for auto-casting them to a joint type exists.

This means:

  • Support for custom types is given, if both arguments share the same custom type.
  • To support a mix of one custom type with a different built-in or custom type, corresponding auto-cast mechanics have to be provided.

A variant of the conditional operator is the so called "Elvis Operator", A ?: B. This variant is duly supported by this library and compiled as binary operator DefaultBinaryOperators::Elvis just as any other operator is - including that the compiler tries to perform an auto-cast, if needed.

Built-in compiler plug-in ElvisOperator handles this operator for built-in types as well as for custom-types, in the case that 'A' and 'B' share the same type.

Similar to the conditional operator, the default implementation invokes box-function FIsTrue on argument 'A' and decides whether 'A' or 'B' is chosen. This default behavior can be changed by just implementing the elvis operator, in the same way any other operator would be implemented.
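
The semantics of the default implementation can be sketched without the library. In the following, IsTrue is a hypothetical stand-in for box-function FIsTrue, and its rule ("a value is true if it differs from its default-constructed value") is an assumption of this sketch, not the library's definition:

```cpp
#include <string>

// Hypothetical truth test playing the role of box-function FIsTrue.
// Assumption: a value is "true" if it differs from a default-constructed one
// (non-zero numbers, non-empty strings).
template<typename T>
bool IsTrue( const T& value )
{
    return value != T();
}

// A ?: B  yields A if A is considered true, otherwise B.
// Both operands share one type here, which mirrors the case that the
// built-in plug-in handles without auto-casting.
template<typename T>
T Elvis( const T& a, const T& b )
{
    return IsTrue( a ) ? a : b;
}
```

For example, Elvis( std::string(""), std::string("fallback") ) selects the second operand, because the first one is empty.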

7.8 Auto-casts

Built-in compiler plug-in AutoCast offers casting proposals to the compiler in respect to the built-in types.

For details on the casting facilities, consult the class's documentation .

8 Scopes

As it was demonstrated in 4.4 Exposing The Directory Entry To ALib Expressions, a customized (derived) version of struct Scope is passed to method Expression::Evaluate , and the very same object is passed to the callback functions, when the expression program is executed by the built-in virtual machine. As a result, a custom callback function can rely on the fact that it is possible to dynamically cast parameter scope back to the custom type and access "scoped data" which exposes an interface into the application that uses ALib Expressions .

This is the most obvious and also intuitively well understandable role of struct Scope. But there are other important things that this class provides.

Note
Struct Scope is a good (or bad!) sample of this library's design principle discussed in chapter 3.4 Bauhaus Code Style. Remember, that the software that uses ALib Expressions is supposed to hide struct Scope with all its publicly accessible members, same as all other details of this library.
In other words: not all members that are accessible should be accessed. Some care has to be taken.

8.1 Provision Of The Evaluation Stack

Struct Scope incorporates field Scope::Stack . This vector is used by the built-in virtual stack-machine implementation during evaluation. This way, it was possible to implement the machine's execution method without using any data exposed by the machine (in fact, the machine is a pure static class).

The important consequence is:

A Scope object must not be used in parallel execution threads, for evaluating two different expressions. If two scopes are used, the parallel evaluation of two different expressions is allowed.

It is always a good design principle to pack an instance of a scope for evaluation together with one expression into a containing, encapsulating object. This was demonstrated in section 4.2 Adding Generic Ingredients Needed For Expression Evaluation when sample type FileFilter was introduced.

8.2 Scope Allocations

Another important role that struct Scope fulfills is to provide fields that allow to allocate temporary data. With a simple arithmetic expression like this:

   1 * 2 + 3

no allocations are needed. The reason is that the intermediate result of the multiplication of integer constants can be (and is) stored as a value in the Box object that operator '*' returned. However, an expression with string operations like this:

   "Hello " + "beautiful " + "world!"

incorporates intermediate results (in this case "Hello beautiful "). Space for such intermediate results has to be allocated somewhere, because the Box object stores only a pointer to a character array, together with its length. In fact, the final result string has to be allocated as well, because again, the result of the expression is a boxed string which needs allocation.

For this reason, struct Scope incorporates some built-in "facilities" to allocate data. Those are briefly:

  • Scope::Allocator
    This is a simple but powerful "monotonic memory allocator" of type MonoAllocator. As its name indicates, it allocates chunks of memory and thus leads to high run-time performance, because it reduces allocation costs in contrast to repeated allocations for each object needed. And even better: the allocated chunks remain allocated and are reused, when an expression is evaluated against a next scope. This usually reduces the need for memory allocations to zero, starting with the second evaluation!
    Its templated methods Alloc and Emplace allow the creation of custom object types.
    Furthermore, this allocator is useful to allocate string data, when used in combination with type TMAString .
  • Scope::Resources
    A simple vector of pointers to objects of type ScopeResource. This type is an extremely simple container. All it does is to provide a virtual destructor which deletes data contained in custom derived types.
  • Scope::NamedResources
    A hash map of pointers to objects of type ScopeResource. Its purpose and use will be discussed in a later section.

All objects allocated are deleted with method Scope::Reset , which is internally invoked when appropriate.
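
Why a monotonic allocator makes repeated evaluations cheap can be seen with a strongly simplified stand-in. Class BumpAllocator below is hypothetical and is not ALib's MonoAllocator: it manages a single fixed chunk, whereas a real implementation would add further chunks on demand. What it shows is that resetting only rewinds an offset, so the memory is reused by the next evaluation:

```cpp
#include <cstddef>
#include <vector>

// Simplified, hypothetical stand-in for a monotonic allocator.
// Memory is handed out by bumping an offset within one chunk; Reset()
// rewinds the offset but keeps the chunk allocated for reuse.
class BumpAllocator
{
    std::vector<unsigned char> chunk;
    std::size_t                used = 0;

  public:
    explicit BumpAllocator( std::size_t capacity ) : chunk( capacity ) {}

    // Returns size bytes at an offset that is a multiple of alignment,
    // or nullptr if the chunk is exhausted (a real allocator would grow).
    void* Alloc( std::size_t size, std::size_t alignment = alignof(std::max_align_t) )
    {
        std::size_t start = (used + alignment - 1) / alignment * alignment;
        if( start + size > chunk.size() )
            return nullptr;
        used = start + size;
        return chunk.data() + start;
    }

    // Forgets all allocations without releasing the chunk.
    void Reset()              { used = 0; }

    std::size_t Used() const  { return used; }
};
```

After a call to Reset, the next allocation of the same size returns the same address as before, which is the reason why a second evaluation typically needs no further heap allocation.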

Of-course, custom specializations of the class, which anyway have to be created for the purposes discussed before, may provide other fields that can be used to allocate memory resources, tailored to the type of objects needed. But it has to be made sure, that method Scope::Reset is overridden to free all resources. And: when overriding this method, it has to be assured that the original virtual function of the base class is invoked as well, because built-in plug-ins allocate resources by using the built-in features of struct Scope.
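
The rule to invoke the base implementation can be sketched with stand-in types. BaseScope and MyScope below are hypothetical and only mirror the shape of the situation; they are not the library's declarations:

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the library's scope type.
struct BaseScope
{
    std::vector<int> Stack;               // stands in for the built-in resources

    virtual ~BaseScope() = default;
    virtual void Reset()  { Stack.clear(); }
};

// Custom scope that adds its own per-evaluation resources.
struct MyScope : BaseScope
{
    std::vector<std::string> TempStrings;

    void Reset() override
    {
        TempStrings.clear();    // free what the custom scope allocated
        BaseScope::Reset();     // let the base class free the built-in resources
    }
};
```

If the call to BaseScope::Reset was omitted, resources allocated by built-in plug-ins would survive the reset.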

8.3 Compile-Time Scopes

So far, we talked only about the instance of struct Scope that is provided to method Expression::Evaluate . But there is a second scope object created, that is called the "compile-time scope". If you reconsider the sample expression from the previous section:

   "Hello " + "beautiful " + "world!"

All three string type arguments are constant string literals. The operator '+' is implemented with built-in compiler plug-in Strings, which defines the operator being "compile-time evaluable". As explained in the tutorial section, this means that in the moment all arguments are constant, struct Calculus (the parent of struct Strings) invokes the operator's callback function at compile-time. Callback functions however rely on a scope object, e.g. for memory allocation, as just discussed.

For this reason, a compile-time singleton of type Scope is created and provided to the callback functions during compilation of constant terms. Intermediate results may this way be stored either in the compile-time scope instance or in the evaluation-time instance. The latter is cleared upon each evaluation, the data allocated in the compile-time scope is cleared only with the deletion of the expression.

8.4 Custom Compile-Time Scopes

If at least one custom callback function that is compile-time invokable uses custom allocation tools which are only provided by a corresponding custom version of the type, then - ooops!

To support this scenario, a derived version of class Compiler has to be created, which re-implements virtual method getCompileTimeScope. This method is internally called with method Compile to allocate the compile-time scope.

If the conditions described above are met, then this method has to be overridden to return a heap-allocated custom scope object. This object will internally be deleted with the deletion of the expression.

Custom callback functions can then rely on the fact that the compile-time scope object can be dynamically cast to the custom type and use its custom allocation facilities.
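
The mechanism can be sketched with stand-in types. All classes below are hypothetical and strongly simplified; in particular, the parameter list of the real getCompileTimeScope and the way the library owns the returned object are not reproduced here and should be taken from the reference documentation:

```cpp
#include <memory>

// Hypothetical stand-ins for the library types.
struct Scope            { virtual ~Scope() = default; };
struct MyScope : Scope  { int customPool = 0; };   // custom allocation facility

class Compiler
{
  public:
    virtual ~Compiler() = default;

    // Stands in for the part of Compile() that creates the compile-time scope.
    std::unique_ptr<Scope> CreateScopeForCompilation()
    {
        return std::unique_ptr<Scope>( getCompileTimeScope() );
    }

  protected:
    // Default: a plain scope object, allocated on the heap.
    virtual Scope* getCompileTimeScope()  { return new Scope(); }
};

// Derived compiler that provides the custom scope type at compile-time.
class MyCompiler : public Compiler
{
  protected:
    Scope* getCompileTimeScope() override  { return new MyScope(); }
};

// What a callback can rely on: the scope can be cast back to the custom type.
inline bool UsesCustomScope( Scope& scope )
{
    return dynamic_cast<MyScope*>( &scope ) != nullptr;
}
```

With the derived compiler in place, a compile-time invocation of a custom callback receives an object of the custom type, just as it does at evaluation time.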

8.5 Using Compile-Time Resources At Evaluation Time

So far, things have been quite straightforward. Let us quickly recap what was said about scopes:

  • Scopes are used to provide evaluation time data from the application.
  • Scopes are used to allocate data for intermediate and final expression result objects.
  • Due to the fact that ALib Expressions provides the feature of compile-time optimization, a compile-time scope is created with the compilation of the expression.
  • Of-course, any evaluation specific field of custom scopes which provide access to the application data are nulled in the compile-time scope and accessing them is unspecified behavior (usually the program crashes).
  • The life-cycle of the compile-time scope is bound to the life-cycle of a compiled expression. Its method Scope::Reset is only called with destruction.
  • The life-cycle of the evaluation-time scope is user dependent. It is strongly recommended to create one object and reuse this object for each evaluation (as sampled in the tutorial). Its Reset method is automatically (internally) called at the beginning of method Expression::Evaluate. This also means that the expression result object of the previous call to Evaluate becomes invalid (if it relies on evaluation-time allocated data).

This concept of having two separated scope objects in certain cases is extended. In general terms, it could be phrased as follows:

Compiler plug-ins may choose to create resources at compile-time, which are not intermediate constant results, but which are objects used at evaluation time.

To support this, two further fields are found in class Scope:

  • CTScope
    During evaluation, this pointer of the evaluation time scope, provides access to the compile-time scope. In contrast to this, at compilation-time this field equals nullptr, because the given scope object already is the compile-time scope.
    (Consequently, simple inline method Scope::IsCompileTime just checks this pointer for being nullptr.)
  • NamedResources
    This hash-map provides access to "named" resources. It is provided to allow creation and storage of resources at compile-time, which are then retrieved during evaluation using a specific resource name.

The following sample taken from the built-in compiler plug-in Strings nicely demonstrates what can be achieved with this concept.

8.6 Sample For Using Compile-Time Resources At Evaluation Time

Built-in compiler plug-in Strings provides expression function WildcardMatch, which matches a pattern against a given string. For example, expression

       WildcardMatch( "MyPhoto.jpg", "*.jpg" )

evaluates to true.

Note
The function is alternatively available through overloaded binary operator '*'. The sample expression of above can this way also be phrased:
    "MyPhoto.jpg" * "*.jpg"

To implement this function, internally helper class WildcardMatcher provided by underlying library module ALib Strings is used. For performance reasons, this class implements a two-phased approach: First, the "pattern" string (here "*.jpg") is parsed and translated into a set of internal information. Then, for performing a single match, this internal information is used, which is much faster than if the pattern still had to be parsed.

In most cases, an expression string given by an end-user would contain a non-constant string to match and a constant pattern string, like in the following expression:

       filename * "*.jpg"

In this case, it would be most efficient, if the pattern string was passed to an instance of ALib class WildcardMatcher at compile-time, while at evaluation time this exact matcher would be used to perform the match.

This setup already explains it all:

  • The instance of WildcardMatcher is to be created at compile-time and stored as a named resource in the compile-time scope object. For the name (storage key) of the resource, the pattern string is used.
  • At evaluation time, the second function argument (the pattern) is used to search a named resource in the compile-time scope. If one is found, the already set-up matcher is used to perform the match against the first given function argument.
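
Before turning to the actual implementation, the two steps can be sketched with stand-in types. Everything below is hypothetical and simplified: Scope and Resource only mimic the fields discussed in this chapter, and PreparedPattern is a toy "matcher" that supports nothing but patterns of the form "*suffix". Only the storage and lookup scheme is of interest:

```cpp
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-ins for the library types (not ALib code).
struct Resource  { virtual ~Resource() = default; };

struct Scope
{
    Scope* CTScope = nullptr;   // nullptr: this object is the compile-time scope
    std::map<std::string, std::unique_ptr<Resource>> NamedResources;

    bool IsCompileTime() const  { return CTScope == nullptr; }
};

// Toy matcher: "compiles" a pattern of the form "*suffix" into its suffix.
struct PreparedPattern : Resource
{
    std::string suffix;

    explicit PreparedPattern( const std::string& pattern )
    : suffix( pattern.empty() ? std::string() : pattern.substr( 1 ) )  {}

    bool Match( const std::string& s ) const
    {
        return s.size() >= suffix.size()
            && s.compare( s.size() - suffix.size(), suffix.size(), suffix ) == 0;
    }
};

// Compile-time step: store one prepared matcher per constant pattern.
inline void PrepareAtCompileTime( Scope& ctScope, const std::string& pattern )
{
    std::unique_ptr<Resource>& slot = ctScope.NamedResources[ "_wc" + pattern ];
    if( !slot )
        slot.reset( new PreparedPattern( pattern ) );
}

// Callback: at evaluation time, reuse the stored matcher if one exists.
inline bool WildcardCallback( Scope& scope, const std::string& haystack,
                                            const std::string& pattern )
{
    if( !scope.IsCompileTime() )
    {
        auto it = scope.CTScope->NamedResources.find( "_wc" + pattern );
        if( it != scope.CTScope->NamedResources.end() )
            return dynamic_cast<PreparedPattern&>( *it->second ).Match( haystack );
    }
    // Compile-time invocation or non-constant pattern: one-time matcher.
    return PreparedPattern( pattern ).Match( haystack );
}
```

Using the same pattern twice stores only one matcher, because the pattern itself is part of the storage key.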

You might not be interested in the details of the implementation and skip the rest of the chapter. The code becomes a little more complex than usual plug-in code. The reason is that helper struct Calculus does not provide a mechanism to support this.

We start with defining the resource type, derived from struct ScopeResource . This simply wraps a matcher object and its sole purpose is to have a virtual destructor that later allows internal code to delete the matcher:

struct ScopeWildcardMatcher : public ScopeResource
{
    // the matcher object
    WildcardMatcher matcher;

    // virtual destructor, implicitly deletes the matcher.
    virtual ~ScopeWildcardMatcher() override {}
};

Next, method TryCompilation needs to be overridden to be able to fetch the function:

bool Strings::TryCompilation( CIFunction& ciFunction )
{
    // invoke parent
    if( !Calculus::TryCompilation( ciFunction ) )
        return false;

The method starts by invoking the original implementation of parent Calculus. Because the wildcard function is compile-time invokable, in the (unlikely) case that both parameters are constant, a constant value would be returned. Only if one of the parameters is non-constant, then the callback is set to callback function wldcrd.

The following if-statement selects this case that we are interested in:

    if( ciFunction.Callback == wldcrd && (ciFunction.ArgsBegin + 1)->UnboxLength() > 0)
    {

If the second parameter is not an empty string, obviously a constant value was given.

Note
The reason for this is: if it was not a constant string, a sample box value was provided to only denote the argument type and this would be of zero length.
Furthermore, it might be mentioned that we do not need to check the parameter type(s) to be of string type, because if it was not, then the call to the original implementation that was done at the beginning had not returned true!

Now, we extract the pattern string and combine it with prefix "_wc" to a key string to store the resource:

        String pattern= (ciFunction.ArgsBegin + 1)->Unbox<String>();
        NString128 keyString(A_CHAR("_wc"));
        keyString.DbgDisableBufferReplacementWarning();
        keyString << pattern;

It may happen, that an expression uses the same pattern twice. In this case, the same matcher object can be used. Therefore, it has to be checked, if a matcher with that same pattern already exists. If not, it is created:

        auto hashCode     = keyString.Hashcode();
        auto storedMatcher= ciFunction.CompileTimeScope.NamedResources.Find( keyString, hashCode );
        if( storedMatcher == ciFunction.CompileTimeScope.NamedResources.end() )
        {
            ScopeWildcardMatcher* matcher= new ScopeWildcardMatcher();
            matcher->matcher.Compile( pattern );
            NString keyCopy= ciFunction.CompileTimeScope.Allocator.EmplaceString( keyString );
            ciFunction.CompileTimeScope.NamedResources.InsertUnique( std::make_pair(keyCopy, matcher),
                                                                     hashCode);
        }
    }
    return true;

After that, TryCompilation exits, signaling compilation success. All that is left to do is the implementation of the callback function. At the beginning the function checks if this is an evaluation-time invocation. In this case, it searches a named resource according to the given pattern string. If this is found, the function uses the resourced matcher and exits:

Box wldcrd( Scope& scope, ArgIterator args, ArgIterator end )
{
    String haystack= STR(ARG0);
    String pattern = STR(ARG1);
    lang::Case sensitivity= ( end-args > 2 && BOL(ARG2) ) ? lang::Case::Ignore
                                                          : lang::Case::Sensitive;
    if( !scope.IsCompileTime() )
    {
        // Search for resource named "_wc"+ pattern.
        NString128 keyString("_wc");
        keyString.DbgDisableBufferReplacementWarning();
        keyString << pattern;
        auto storedMatcher= scope.CTScope->NamedResources.Find( keyString );
        if( storedMatcher != scope.CTScope->NamedResources.end() )
            return dynamic_cast<ScopeWildcardMatcher*>( storedMatcher.Mapped() )
                   ->matcher.Match( haystack, sensitivity );
    }

If either this is compile-time or no resource matcher was found (which indicates that the pattern argument is not constant), the match is performed using a local, one-time matcher object.

    // This is either compile-time or the pattern string is not constant
    {
        WildcardMatcher matcher( pattern );
        return matcher.Match( haystack, sensitivity );
    }
}

9. Operators

9.1 Built-In And Custom Operators

In its default configuration, module ALib Expressions parses and compiles an almost complete set of operators known from the C++ language. Not supported by default are, for example, compound assignment operators such as '+=' or increments '++'. Operators included are:

Unary Operators:

  • Positive, '+'
  • Negative, '-'
  • BoolNot, '!'
  • BitNot, '~'
  • Indirection, '*'

Binary Operators:

  • Multiply , '*'
  • Divide , '/'
  • Modulo , '%'
  • Add , '+'
  • Subtract , '-'
  • ShiftLeft , '<<'
  • ShiftRight , '>>'
  • Smaller , '<'
  • SmallerOrEqual , '<='
  • Greater , '>'
  • GreaterOrEqual , '>='
  • Equal , '=='
  • NotEqual , '!='
  • BitAnd , '&'
  • BitXOr , '^'
  • BitOr , '|'
  • BoolAnd , '&&'
  • BoolOr , '||'
  • Assign , '='

Special Operators:

  • Ternary , 'Q ? T : F'
  • Elvis , 'A ?: B'
  • Subscript , '[]'

Not only the operators were taken from C++, but in the case of binary operators, also the definition of their precedence.

The built-in operators are set by method Compiler::SetupDefaults if flag Compilation::DefaultUnaryOperators , respectively Compilation::DefaultBinaryOperators is set in bitfield Compiler::CfgCompilation .

Internally the following approach is taken:

This is a rather simple process, and thus it is similarly simple to intervene and customize the operators. While removing what is built-in is seldom necessary, adding an operator might be wanted. This is exercised in the next section.

9.2 Tutorial: Adding A Custom Operator

With what was described in the previous chapter, the following options of customizing the operators parsed and compiled by module ALib Expressions can be taken:

As a sample, the goal is to have a new binary operator '{}' that allows to format the right-hand side operand according to a format provided with the left-hand side operand. Let's first check what happens if we just start and use the operator:

Compiler compiler;
compiler.SetupDefaults();
ExpressionScope scope( compiler.CfgFormatter );
SPExpression expression= compiler.Compile( A_CHAR("\"Hexadecimal: 0x{:x}\" {} 42") );

This produces the following exception which indicates that parsing the expression fails due to a syntax error:

E1: <expressions::SyntaxError>
Syntax error parsing expression.
I2: <expressions::ExpressionInfo>
Expression: {"Hexadecimal: 0x{:x}" {} 42}
^->

Now we define the operator:

Compiler compiler;
compiler.SetupDefaults();
compiler.AddBinaryOperator( A_CHAR("{}") , 900);
ExpressionScope scope( compiler.CfgFormatter );
SPExpression expression= compiler.Compile( A_CHAR("\"Hexadecimal: 0x{:x}\" {} 42") );

We give the operator a high precedence, on the level of operator '*'. The operator precedence values are documented with DefaultBinaryOperators .
The exception changes to:

E1: <expressions::BinaryOperatorNotDefined>
Operator '{}' not defined for types "String" and "Integer".
I2: <expressions::ExpressionInfo>
Expression: {"Hexadecimal: 0x{:x}" {} 42}
^->

Obviously, the parser now recognizes the operator. This single line of code was all we needed to do to define the operator.

To get a working sample, a compiler plug-in that compiles the operator for left-hand side strings and any right hand side type is needed. Here it is:

struct FormatOperator : CompilerPlugin
{
    FormatOperator( Compiler& compiler )
    : CompilerPlugin( "Tutorial Plugin", compiler )
    {}

    virtual bool TryCompilation( CIBinaryOp& ciBinaryOp ) override
    {
        // check if it is not us
        if(     ciBinaryOp.Operator != A_CHAR("{}")
            || !ciBinaryOp.ArgsBegin->IsSameType( Types::String ) )
            return false;

        // set debug info
        ALIB_DBG( ciBinaryOp.DbgCallbackName = "CBFormat"; )

        // all is const? We can do it at compile-time!
        if( ciBinaryOp.LhsIsConst && ciBinaryOp.RhsIsConst )
        {
            ciBinaryOp.TypeOrValue= expressions::plugins::CBFormat(ciBinaryOp.CompileTimeScope,
                                                                   ciBinaryOp.ArgsBegin,
                                                                   ciBinaryOp.ArgsEnd );
            return true;
        }

        // set callback
        ciBinaryOp.Callback    = expressions::plugins::CBFormat;
        ciBinaryOp.TypeOrValue = Types::String;
        return true;
    }
};

With the plug-in attached:

Compiler compiler;
compiler.SetupDefaults();
compiler.AddBinaryOperator( A_CHAR("{}") , 900);
FormatOperator plugin( compiler );
compiler.InsertPlugin( &plugin, CompilePriorities::Custom );
ExpressionScope scope( compiler.CfgFormatter );
SPExpression expression= compiler.Compile( A_CHAR("\"Hexadecimal: 0x{:x}\" {} 42") );
cout << expression->Evaluate( scope ) << endl;

The expression compiles and results in:

Hexadecimal: 0x2a
Attention
While "verbal" operator names are allowed as aliases of operators (see next section), operator symbols must not contain alphanumerical characters or character '_' (underscore).
Note
  • Any changes in respect to operator setup have to be made prior to invoking method Compiler::Compile for the first time, because with that, the internal parser is created (once) and configured according to these settings. Later changes will have no effect or result in undefined behavior.
  • The registration of custom compiler plug-ins may be done before or after modifying the operator setup of the compiler.
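
The restriction on operator symbols stated above can be expressed as a small check. Function IsValidOperatorSymbol is hypothetical and not part of the library; rejecting the empty string is an additional assumption of this sketch:

```cpp
#include <cctype>
#include <string>

// Returns true if the symbol contains neither alphanumerical characters
// nor the underscore character. Assumption: an empty symbol is rejected.
inline bool IsValidOperatorSymbol( const std::string& symbol )
{
    if( symbol.empty() )
        return false;

    for( unsigned char c : symbol )
        if( std::isalnum( c ) || c == '_' )
            return false;

    return true;
}
```

A symbol like "{}" passes this check, while "a+" or "_" do not. Such a check could be used by an application before it passes user-configurable operator symbols on to the compiler.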

9.3 Verbal Operator Aliases

End-users that are not too familiar with programming languages might find it easier to use verbal operators. Instead of writing:

GetYear(Today) == 2017 && GetDayOfWeek(Today) != Monday

they prefer:

GetYear(Today) equals 2017 and GetDayOfWeek(Today) not_equals Monday

Such sort of "verbal" expressions are supported and enabled by default, with the concept of "Verbal Operator Aliases". As the term explains already, verbal operators can not be defined with this library as being full featured "stand-alone" operators, but only as aliases for existing symbolic operators.

Note
The rationale behind this design decision is that usually, verbal operators are in fact just aliases. By restricting verbal operators to be aliases, the number of "real" operators does not increase when verbal aliases are added. If it did, the number of permutations of operators and types that had to be overloaded would drastically increase.

The default built-in (resourced) verbal operator aliases are:

Verbal Operator     Is Alias For
Not                 Unary operator '!'
And                 Binary operator '&&'
Or                  Binary operator '||'
Sm                  Binary operator '<'
Smaller             Binary operator '<'
Smeq                Binary operator '<='
Smaller_or_equal    Binary operator '<='
Gt                  Binary operator '>'
Greater             Binary operator '>'
Gteq                Binary operator '>='
Greater_or_equal    Binary operator '>='
Eq                  Binary operator '=='
Equals              Binary operator '=='
Neq                 Binary operator '!='
Not_equals          Binary operator '!='

Like the operators themselves, ALib Expressions defines the names and alias operators using resourced ALib Enum Records assigned to enumeration class DefaultAlphabeticBinaryOperatorAliases.
The resource data is processed by method SetupDefaults dependent on flag DefaultAlphabeticOperatorAliases of bitfield CfgCompilation (which is set by default).

Additional flag AlphabeticOperatorsIgnoreCase controls whether the alias names are matched ignoring letter case (which is also set by default).

Class Compiler simply stores the alias information in its public hash tables AlphabeticUnaryOperatorAliases and AlphabeticBinaryOperatorAliases, which can be altered prior to or after the invocation of SetupDefaults, but before a first expression is compiled.
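
How an alias table with case-insensitive matching behaves can be sketched as follows. The table is a hypothetical stand-in for the compiler's hash table of binary aliases, filled with the default names listed above; functions ToLower and ResolveBinaryAlias are illustrative only and always ignore letter case, which corresponds to the default setting:

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <unordered_map>

// Converts a name to lower case for case-insensitive comparison.
inline std::string ToLower( std::string s )
{
    std::transform( s.begin(), s.end(), s.begin(),
                    []( unsigned char c ) { return static_cast<char>( std::tolower( c ) ); } );
    return s;
}

// Hypothetical alias table, keyed by lower-case alias names.
inline const std::unordered_map<std::string, std::string>& BinaryAliases()
{
    static const std::unordered_map<std::string, std::string> table =
    {
        { "and", "&&" },  { "or"              , "||" },
        { "sm" , "<"  },  { "smaller"         , "<"  },
        { "smeq", "<=" }, { "smaller_or_equal", "<=" },
        { "gt" , ">"  },  { "greater"         , ">"  },
        { "gteq", ">=" }, { "greater_or_equal", ">=" },
        { "eq" , "==" },  { "equals"          , "==" },
        { "neq", "!=" },  { "not_equals"      , "!=" },
    };
    return table;
}

// Returns the aliased symbolic operator, or an empty string if the
// given name is not a known alias.
inline std::string ResolveBinaryAlias( const std::string& name )
{
    auto it = BinaryAliases().find( ToLower( name ) );
    return it != BinaryAliases().end() ? it->second : std::string();
}
```

Because the translation happens before compilation, a compiler plug-in only ever sees the symbolic operator that such a lookup returns.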

Some further notes:

  • Even with flag AlphabeticOperatorsIgnoreCase cleared, two verbal operator aliases that differ only in letter case must not be defined (e.g. the definition of "or" in parallel to "OR" is forbidden).
  • There are five possible configuration settings to normalize verbal operator names. See flag ReplaceVerbalOperatorsToSymbolic for more information.
  • If the resources of built-in alias operators are changed (e.g. for translation/localization), it is allowed to set single names of the predefined enum element names to empty strings. These will be ignored with method SetupDefaults. By the same token, when changing the resources, completely different values and meanings may be used, because the enum class DefaultAlphabeticBinaryOperatorAliases is exclusively used for accessing its enum records.
  • The concept of verbal operator aliases must not be confused with the concept of operator aliases performed with compiler plug-ins (explained in next section). Verbal aliases are defined globally and a compiler plug-in will never "see" the alias names as those get translated to the aliased operator internally before the compilation is performed.
  • Identifiers/functions may not be equal to unary verbal operator aliases. By default the only unary verbal operator is 'not'.
    In contrast, identifiers/functions may have the same name as verbal binary operator aliases. (The expression parser can distinguish both by the context.)

9.4 Operator Aliases

This library makes some effort to support operator aliases, which is about telling the system that an operator used with certain argument types is to be considered equal to another operator using the same argument types.

The only occasion where this is internally used, is with combinations of boolean, integer and float types and bitwise operators '~', '&' and '|': Any use of these operators with a mix of these types - excluding those permutations that only consist of integers - is optionally aliased to boolean operators '!', '&&' and '||'. It would have been less effort to just define the bitwise operators for the types to perform boolean calculations! So, why does the library make the effort then?

The motivation for taking the effort comes from normalization. While the library should be configurable to accept the expression:

   date = today  &  size > 1024

it should at the same time be configurable to "normalize" this expression to:

   date == today  &&  size > 1024

Maybe, custom fields of application identify other operators where such aliasing is reasonable as well.

The following parts of the library's API are involved in operator aliasing:

  • Inner structs CIUnaryOp and CIBinaryOp of class CompilerPlugin:
    Both structs provide string field Operator by reference and thus allow a compiler plug-in to change the operator string to the one that is aliased by the given one.
  • Helper class Calculus with methods AddOperatorAlias and AddOperatorAliases , which simplify feeding entries of operator alias definitions stored in hash map OperatorAliases .
  • Normalization flag ReplaceAliasOperators which controls if alias operators are replaced in the normalized expression string.
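
The aliasing rule for the built-in types can be sketched as a plain function. Enumeration Type and function ResolveAlias are hypothetical and cover only the binary operators '&' and '|'; in the library, the corresponding entries are stored in the hash map of alias definitions and applied by helper class Calculus:

```cpp
#include <string>

// Hypothetical stand-in for the three built-in types involved.
enum class Type { Boolean, Integer, Float };

// Returns the operator that the given operator is an alias of for the
// given argument types, or the operator itself if no alias applies.
inline std::string ResolveAlias( const std::string& op, Type lhs, Type rhs )
{
    // Pure integer permutations keep their bitwise meaning.
    if( lhs == Type::Integer && rhs == Type::Integer )
        return op;

    if( op == "&" )  return "&&";
    if( op == "|" )  return "||";
    return op;
}
```

With normalization enabled, the replaced operator string is what appears in the normalized expression, which turns a '&' between two boolean operands into '&&'.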

9.5 Array Subscript Operator

The "array subscript operator" '[]' is only in so far an exceptional binary operator, as it is parsed differently than other binary operators. While usually the operator is placed between the left-hand side (Lhs) and right-hand side (Rhs) arguments, the subscript operator is expressed in the form

   Lhs[ Rhs ]

In any other respect it is completely the same! The only built-in use of the operator is with lhs-type String and rhs-type Integer. With it, the sub-string of length 1 at the given position is returned.
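
The built-in behavior for strings can be sketched as follows. Function Subscript is hypothetical, and returning an empty string for positions outside of the string is an assumption of this sketch, not a documented property of the built-in operator:

```cpp
#include <cstdint>
#include <string>

// Returns the sub-string of length 1 at the given position.
// Assumption: positions outside of the string yield an empty string.
inline std::string Subscript( const std::string& lhs, std::int64_t index )
{
    if( index < 0 || static_cast<std::uint64_t>( index ) >= lhs.size() )
        return std::string();

    return lhs.substr( static_cast<std::size_t>( index ), 1 );
}
```

In expression syntax, this corresponds to terms of the form "Hello"[1].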

Its use and meaning is of-course not bound to array access. For example, with the right-hand side operands of type String, a mapped access to pairs of keys and values can be realized. To implement this, the left-hand side type would be a custom type returned by an identifier, say Properties. Now, if the subscript operator was defined for this type and strings, expressions like

   Properties["HOME_PATH"]

are possible. The operator's callback function could catch certain key values and return appropriate results from objects accessible through the custom scope object.
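
The core of such a callback can be sketched independently of the library. Struct PropertyScope and function PropertiesSubscript are hypothetical: the first stands in for a custom scope that exposes application data, the second for the body of a callback registered for the subscript operator. Returning an empty string for unknown keys is an assumption:

```cpp
#include <map>
#include <string>

// Hypothetical custom scope data: key/value pairs exposed by the application.
struct PropertyScope
{
    std::map<std::string, std::string> properties;
};

// Body of a hypothetical callback for  Properties[ key ].
// Assumption: unknown keys yield an empty string.
inline std::string PropertiesSubscript( const PropertyScope& scope, const std::string& key )
{
    auto it = scope.properties.find( key );
    return it != scope.properties.end() ? it->second : std::string();
}
```

In a real plug-in, the left-hand side argument would only serve to select this callback by its custom type, while the data itself is fetched from the scope object.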

9.6 Unary Operator For Nested Expressions

Nested expressions have not been discussed yet. This concept is introduced only with the next chapter, 10. Nested Expressions.

Here, we just quickly want to explain that this operator exists, that it has a special meaning and how it can be changed.

The definition of the operator is made with field Compiler::CfgNestedExpressionOperator . Its default value is '*', which in C/C++ is also called the "indirection operator". With this default definition, expression:

       date < today && *"myNested"

refers to nested expression "myNested".

Changing the operator needs to be done prior to invoking Compiler::SetupDefaults . Should operator definitions be changed as explained in the previous chapters, it is important to know that the nested expression operator itself has to be duly defined. In other words: Specifying an operator with field CfgNestedExpressionOperator does not define the operator.

The operator internally works on string arguments which name the nested expression that is addressed. However, to overcome the need of quoting names of nested expressions, a built-in mechanism is provided that allows the quotes to be omitted. This feature is enabled by default and controlled with compilation flag AllowIdentifiersForNestedExpressions . For this reason, by default, the sample expression given above can be equally stated as:

        date < today && *myNested

Note that this does not introduce a conflict with defined identifiers or function names. For example, if a nested expression was named "PI", just as math constant identifier PI, then still the following works:

   5.0 * PI    // multiplies 5 with math constant PI
   5.0 * *PI   // multiplies 5 with the result of nested expression named "PI".

When changing the nested expression operator, some thinking about the consequences is advised. Other candidates for nested expression operators may be '$', '' or '@', which are more commonly used to denote variables or other "scoped" or "nested" entities. But exactly for this reason, module ALib Expressions opted to default the operator to '*'. Often, applications offer to provide expressions via a command line interface, which in turn allows using bash CLI and any scripting language. The asterisk character '*' seems to be least clashing with such external environments.
Therefore, we recommend to do some careful thinking about potential conflicts in the desired field of application and use case environments, before changing this operator.

10. Nested Expressions

Often certain "terms" of an expression are to be repeated in more than one expression. Furthermore, it sometimes is valuable to be able to split an expression into two parts, for example parts that have different levels of validity. This latter is often the case when it comes to "filtering" records from a set. A first filter might be a more general, long living expression. A second expression adds to this filter by applying more concrete demands. In the filter sample, there are two ways of achieving this:

  1. First apply a filter using the general term and then apply a second filter.
  2. Apply one filter with both terms concatenated via boolean operator &&.

Besides being faster, the second has one huge advantage: it is up to the end user if the single filter refers to a different term - or not. There is no need to hard-code two filters into a software.

These thoughts bring us to the concept of "nested expressions", which is referring to expressions from expressions!

The foundation for such a feature is to provide a way to store expressions and to retrieve them using a key value.

10.1 Named Expressions

Module ALib Expressions provides a built-in container to store compiled expressions. At the moment an expression is stored, a name has to be given, and that is all that makes an expression a named expression.

So far in this manual, we have compiled expressions using method Compiler::Compile . What is returned is an anonymous expression. It is not named. To create a named expression, method Compiler::AddNamed is used. This method internally compiles the expression and stores it with the given name. The expression itself is not returned. Instead, the method reports whether an expression with that name already existed (and thus was replaced).

For retrieval of named expressions, method GetNamed is offered and for removal method RemoveNamed .

Note
The optional internal storage of expressions is the reason why the library addresses expressions exclusively through type SPExpression which evaluates to std::shared_ptr<Expression>. With this, a named expression is automatically deleted if it is removed from the storage and not externally referred to.

By default, letter case is ignored when using the given name as a storage key. Hence adding "MYEXPRESSION" after adding "MyExpression" replaces the previous instance. This behavior can be changed using compilation flag CaseSensitiveNamedExpressions . Changes of this flag must be made only prior to adding a first named expression. Later changes lead to undefined behavior.

Named expressions are not much of a feature when viewed by themselves. But they are the important prerequisite for nested expressions explained in the following sections.
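The storage behavior described in this section can be modeled with a few lines of standard C++. The following is a toy sketch only: class ToyNamedStore and its members are hypothetical names, "compiling" is reduced to remembering the source string, and folding names to upper case is merely one possible way to ignore letter case.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <map>
#include <memory>
#include <string>

// Stand-in for a compiled expression; here it only remembers its source text.
struct ToyExpression {
    std::string source;
};
using ToySPExpression = std::shared_ptr<ToyExpression>;

class ToyNamedStore {
public:
    // Stores an expression under a name. Returns true if an expression of
    // that name existed and was replaced.
    bool addNamed(const std::string& name, const std::string& source) {
        const std::string key = makeKey(name);
        const bool existed = store_.count(key) != 0;
        store_[key] = std::make_shared<ToyExpression>(ToyExpression{source});
        return existed;
    }

    // Returns the stored expression or an empty pointer if none exists.
    ToySPExpression getNamed(const std::string& name) const {
        auto it = store_.find(makeKey(name));
        if (it == store_.end())
            return nullptr;
        return it->second;
    }

    // Returns true if an expression of that name was found and removed.
    bool removeNamed(const std::string& name) {
        return store_.erase(makeKey(name)) != 0;
    }

private:
    // Letter case is ignored by folding every name to upper case.
    static std::string makeKey(std::string name) {
        std::transform(name.begin(), name.end(), name.begin(),
                       [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
        return name;
    }

    std::map<std::string, ToySPExpression> store_;
};
```

Because the store hands out shared pointers, an expression that was removed from the store stays alive for as long as some caller still holds a pointer to it.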

10.2 Nested Expressions Identified At Compile-Time

The simplest form of addressing nested expressions is by using unary operator '*', which allows to embed a named expression into another expression.

While the operator defaults to being '*', this can be changed as described in 9.6 Unary Operator For Nested Expressions.

The operator expects a string providing the name, but for convenience, this string does not need to be quoted, but may be given like identifiers are.

Note
This behavior is configurable with compilation flag AllowIdentifiersForNestedExpressions and enabled by default. If the name of an expression does not conform to the rules given in 11.3 Identifiers/Functions, for example if it begins with a numeric character, then the otherwise optional quotes have to be provided.

With this operator, expressions:

     *"MyNestedExpression"
     *MyNestedExpression

both simply "invoke" the expression named "MyNestedExpression" and return its result.

Of course, this sample was just the shortest possible. Nested expressions can be used just like any other identifier:

   GetDayOfWeek( Today ) == Monday &&  *MyNestedExpression

This expression evaluates to true on Mondays, provided that named expression "MyNestedExpression" evaluates to true as well.

One might think that this is all that has to be said about nested expressions. Unfortunately, it is not. An attentive reader might have noticed an important restriction with nesting expressions like this: Because ALib Expressions is a type-safe library, the compiler can compile operator && in the above sample only if it knows the result type of "MyNestedExpression". As a consequence, we have to state the following rule:

Attention
Expressions addressed with the unary operator have to exist at the time the referring expression is compiled.

Let us simply have a try. The following code:

Compiler compiler;
compiler.SetupDefaults();
ExpressionScope scope( compiler.CfgFormatter );
SPExpression expression= compiler.Compile(A_CHAR(R"( *MyNestedExpression )"));
cout << "Result: " << expression->Evaluate( scope );

Produces the following exception:

E1: <expressions::NamedExpressionNotFound>
Named expression "MyNestedExpression" not found.
E2: <expressions::NestedExpressionNotFoundCT>
Compile-time defined nested expression "MyNestedExpression" not found.

The exception tells us that this is a "compile-time defined nested expression". This indicates that there is a way out, which we will see in the next section. For the time being, let us fix the sample by adding the named expression upfront:

Compiler compiler;
compiler.SetupDefaults();
ExpressionScope scope( compiler.CfgFormatter );
compiler.AddNamed( A_CHAR("MyNestedExpression"), A_CHAR("6 * 7") );
SPExpression expression= compiler.Compile(A_CHAR(R"( *MyNestedExpression )"));
cout << "Result: " << expression->Evaluate( scope ) << endl;

Now this works:

Result: 42

The compiler found the nested expression, identified its return type and is now even able to use it in more complex terms like this:

SPExpression expression= compiler.Compile(A_CHAR(R"( 2 * *MyNestedExpression )"));

This results in:

Result: 84

But there is also another restriction that has to be kept in mind with the use of the unary operator for nested expressions.
While this sample still works well:

SPExpression expression= compiler.Compile( A_CHAR(R"( *("MyNested" + "Expression") )") );

Result: 42

This expression does not work:

SPExpression expression= compiler.Compile(A_CHAR(R"( *("MyNested" + ( random >= 0.0 ? "Expression" : "" )) )") );

as it throws:

E1: <expressions::NamedExpressionNotConstant>
    Expression name has to be constant, if no expression return type is given.
I2: <expressions::ExpressionInfo>
    Expression: {    *("MyNested" + ( random >= 0.0 ? "Expression" : "" ))     }
                     ^->

While - due to the compile-time optimization of ALib Expressions - the constant concatenation term "MyNested" + "Expression" is still accepted, the compiler complains if we use the function random. The compiler does not have the information that random>=0 evaluates to constant true, and hence the term is not optimized and not constant.
The exception names the problem, which leads us to a second rule:

Attention
Expression names addressed with the unary operator have to be constant at compile-time.

This is obvious, as the expression has to exist and be known. But still, it is a restriction.
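Why "MyNested" + "Expression" counts as constant while the term using random does not can be illustrated with a toy term tree. This is a sketch under simplifying assumptions and not the library's optimizer: struct ToyTerm and function isConstant are hypothetical, and each term only records whether its own literal, operator or function could be evaluated at compile-time. C++17 is assumed for the self-referential vector member.

```cpp
#include <cassert>
#include <vector>

// Toy term of an expression tree.
struct ToyTerm {
    bool selfConstant;          // literal, or callback that may run at compile-time
    std::vector<ToyTerm> args;  // operands or function arguments
};

// A term is constant only if it is compile-time evaluable itself and all of
// its arguments are constant as well.
bool isConstant(const ToyTerm& term) {
    if (!term.selfConstant)
        return false;
    for (const ToyTerm& arg : term.args)
        if (!isConstant(arg))
            return false;
    return true;
}
```

In this model, a single non-constant leaf such as random makes every enclosing term non-constant, which is the situation reported by the exception above.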

There are many use cases, where still this simple operator notation for nested expressions is all that is needed. For example, imagine a set of expressions is defined in an INI-file of a software. If the software loads and compiles these "predefined" expressions at start, a user can use them, for example with expressions given as command line parameters. This way, a user can store "shortcuts" in the INI-file and use those as nested expressions at the command line.

A final note on compile-time nested expressions: After an expression that refers to a named nested expression is compiled, the named nested expression may be removed using Compiler::RemoveNamed . The program of the outer expression stores the shared pointers to all compile-time nested expressions used. While after the removal from the compiler the nested expression is not addressable for future nesting, the nested expression is only deleted at the moment the last expression that refers to it is deleted!

10.3 Nested Expressions Identified At Evaluation-Time

In the previous section we saw the first samples of nested expressions. The unary operator '*' was used to address nested expressions. These nested expressions suffer from two restrictions:

  1. The nested expression has to be defined prior to using it.
  2. The name of the nested expression must not be an expression itself, but rather a constant string term or an identifier.

Let us recall what the reason for these restrictions was: The compiler needs to know the result type of the nested expression to continue its type-safe compilation.

The way to overcome this restriction is to use function Expression() instead of unary operator '*'.

Note
Just as the unary nested expression operator is configurable with member CfgNestedExpressionOperator of class Compiler, the name, letter case sensitivity and abbreviation options of the nested expression function are configurable with member CfgNestedExpressionFunction .
It just defaults to "Expression()".

This function has three overloaded versions. The first is using just one parameter of string type and with that is 100% equivalent to the use of the unary expression operator - including its restrictions.

The second overload however takes a "replacement expression" as its second value. This is how it may be used:

Compiler compiler;
compiler.SetupDefaults();
ExpressionScope scope( compiler.CfgFormatter );
SPExpression expression= compiler.Compile(A_CHAR(R"( Expression( "MyNestedExpression", -1 ) )"));
cout << "Result: " << expression->Evaluate( scope ) << endl;

The output of this sample is:

Result: -1

As you see, although the nested expression was not defined, this sample now compiles. The compiler uses the result type of the second parameter and assumes that the expression will return the same type. Moreover, the expression even evaluates! On evaluation it is noticed that the expression does not exist, hence the result of the "replacement expression" is used. While in this case the replacement is simply value -1, any expression might be stated here, even one that contains a further nested expression.

We extend the sample by adding the nested expression:

Compiler compiler;
compiler.SetupDefaults();
ExpressionScope scope( compiler.CfgFormatter );
SPExpression expression= compiler.Compile(A_CHAR(R"( Expression( "MyNestedExpression", -1 ) )"));
compiler.AddNamed( A_CHAR("MyNestedExpression"), A_CHAR("3 * 3") );
cout << "Result: " << expression->Evaluate( scope ) << endl;

Result: 9

As a "proof" that the nested expression is identified only at evaluation time, the following sample may serve:

Compiler compiler;
compiler.SetupDefaults();
ExpressionScope scope( compiler.CfgFormatter );
SPExpression expression= compiler.Compile(A_CHAR(R"( Expression( "MyNestedExpression", -1 ) )"));
compiler.AddNamed( A_CHAR("MyNestedExpression"), A_CHAR("3 * 3") );
cout << "Result1: " << expression->Evaluate( scope ) << endl;
compiler.AddNamed( A_CHAR("MyNestedExpression"), A_CHAR("4 * 4") );
cout << "Result2: " << expression->Evaluate( scope ) << endl;

Result1: 9
Result2: 16

Above, we said that the compiler "assumes" that the named expression addressed, has the same return type. The following code shows that this was the right verb:

Compiler compiler;
compiler.SetupDefaults();
ExpressionScope scope( compiler.CfgFormatter );
SPExpression expression= compiler.Compile(A_CHAR(R"( Expression( "MyNestedExpression", -1 ) )"));
compiler.AddNamed( A_CHAR("MyNestedExpression"), A_CHAR(R"( "Hello" )"));

No exception is thrown on compilation. The compiler does not check the return type at compile-time. The simple reason is: At the time the expression becomes evaluated, the named expression might have been changed to return the right type. This is why the return type is only checked at evaluation time. Let's see what happens when we evaluate:

cout << "Result: " << expression->Evaluate( scope ) << endl;

E1: <expressions::NestedExpressionResultTypeError>
    Nested expression "MyNestedExpression" returned wrong result type.
    Type expected: Integer
    Type returned: String
I2: <expressions::ExpressionInfo>
    Expression: {   Expression( "MyNestedExpression", -1 )   }
                    ^->

This shows that with the use of function Expression() we are leaving the secure terrain of expression evaluation a little: So far, the only exceptions that could happen at evaluation-time were ones that occurred in callback functions (for example in expression "5 / 0"). With nested expressions that are identified only at evaluation-time, we have a first exception that is thrown by the virtual machine that executes the expression program.
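The evaluation-time behavior shown in the samples of this section (late lookup, fallback to the replacement value, and the result type check) can be modeled roughly as follows. This is a toy sketch assuming C++17: ToyValue, ToyNamed and evaluateNested are hypothetical names, named expressions are represented by plain callables, and std::runtime_error merely stands in for the library's exception.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <variant>

using ToyValue = std::variant<long, std::string>;
using ToyNamed = std::map<std::string, std::function<ToyValue()>>;

// Toy model of Expression(name, replacement): the name is looked up only at
// the time the outer expression is evaluated.
ToyValue evaluateNested(const ToyNamed& named, const std::string& name,
                        const ToyValue& replacement) {
    auto it = named.find(name);
    if (it == named.end())
        return replacement;  // not defined: use the replacement value

    ToyValue result = it->second();

    // The replacement fixed the expected type at compile-time. Whether the
    // named expression really returns that type can only be verified now.
    if (result.index() != replacement.index())
        throw std::runtime_error("Nested expression \"" + name +
                                 "\" returned wrong result type.");
    return result;
}
```

Replacing the entry in the map between two calls changes the result of the second call, which mirrors the Result1/Result2 sample above.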

At the beginning of this section, a third overload of Expression() was mentioned. We postpone its documentation to the next manual section, and end this chapter with another quick note:

As with unary operator '*' for nested expressions, compilation flag AllowIdentifiersForNestedExpressions allows the quotes to be omitted and identifier syntax to be used instead. Hence, this expression compiles fine:

SPExpression expression= compiler.Compile(A_CHAR(R"( Expression( MyNestedExpression, -1 ) )"));

In addition, with this form of embedding nested expressions, the restriction that expression names have to be constant is lifted. This way, the sample of a random name is now allowed:

SPExpression expression= compiler.Compile(A_CHAR(R"( Expression( ("MyNested" + ( random >= 0.0 ? "Expression" : "" )), -1 ) )"));

10.4 Forcing The Existence Of Nested Expressions

The previous two chapters explained the differences between nested expressions that are identified at compile-time and those identified at evaluation time.

Compile-time nested expressions are usually expressed with unary operator '*', but can also be expressed when using only one parameter with nested expression function Expression().

Evaluation-time nested expressions are expressed by giving a second parameter to function Expression(), which provides the "replacement expression" for the case that a nested expression of the name given with the first parameter does not exist.

In some situations however, a user might not want to provide a "replacement expression". It might be rather in her interest, that an expression just fails to evaluate in the case that the nested expression is not found. For example, if this indicates a configuration error of a software.

Such behavior can be achieved by adding a third parameter to Expression(). This third parameter is keyword "throw".

We take the sample of the previous section where the expression was not defined, which resulted in default value -1. The only difference is the use of keyword throw:

Compiler compiler;
compiler.SetupDefaults();
ExpressionScope scope( compiler.CfgFormatter );
SPExpression expression= compiler.Compile(A_CHAR(R"( Expression( "MyNestedExpression", -1, throw ) )"));
cout << "Result: " << expression->Evaluate( scope ) << endl;

The output of this sample is:

E1: <expressions::NamedExpressionNotFound>
Named expression "MyNestedExpression" not found.
E2: <expressions::NestedExpressionNotFoundET>
Evaluation-time defined nested expression "MyNestedExpression" not found.

So, why do we need the second parameter which previously gave the "replacement expression" when this is not evaluated? Well, the only purpose of the replacement expression is to determine the nested expression's return type. Otherwise it is ignored. Note, however, that it is not optimized out and its result will be calculated with each evaluation of an expression against a scope. Different to other areas, where the library puts some effort in optimization, here this was omitted. An end-user should simply be advised to put a constant "sample value" for this parameter. A user that uses this third version of the nested expression function is supposed to be a "pro" and to understand the impacts.

Note
"throw" is the only keyword of ALib Expressions !

10.5 Automated Named Expressions

As described in a previous section, a prerequisite for the nested expression feature is to have named expressions. Methods AddNamed , GetNamed and RemoveNamed of class Compiler had been already described briefly.

Nested expressions often can be seen as building blocks of other expressions. A software might want to provide a predefined and/or configurable set of expressions to be usable within end user's expressions.
To support such a scenario, a mechanism is needed that allows named expression strings to be retrieved (and compiled) right at the moment an unknown identifier of a nested expression occurs during the compilation of the main expression.

Module ALib Expressions provides abstract virtual class ExpressionRepository as the interface for a customizable implementation of such a mechanism. This interface is used as follows:

  • With field Compiler::Repository , class Compiler exposes a public pointer to an object of type ExpressionRepository .
  • If the field is set, named expressions that are referred by other expressions (during compilation), and that have not been previously defined by a prior invocation of method Compiler::AddNamed will be defined on the fly by method Compiler::GetNamed . For this, GetNamed will use the expression repository to retrieve an expression string associated with the given identifier.

In other words, method GetNamed supports a "lazy" approach to compile nested expressions "on the fly" as needed. The expression strings are received using the abstract virtual method ExpressionRepository::Get .

A built-in implementation of this interface class is provided with class StandardRepository . This implementation retrieves expression strings from

  1. Static resource strings as provided with module ALib BaseCamp and documented here.
  2. External configuration data, like command line parameters, environment variables, INI-files or any other custom resource that is attached to the configuration object.
Note
The second option is available only if module ALib Configuration is included in the ALib Distribution .

For the details of using the built-in implementation, consult the reference documentation of class ExpressionRepository.

The creation of an own implementation that receives predefined expression strings in a custom way, should be a straight-forward task.
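The "lazy" retrieval described in this section can be sketched without the library as follows. Everything here is hypothetical and only loosely modeled on the description above: ToyRepository is not class ExpressionRepository, its method get is an assumed signature, and ToyCompiler reduces compilation to storing the source string.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical interface modeled on the idea of an expression repository.
struct ToyRepository {
    virtual ~ToyRepository() = default;
    // Writes the expression string for `name` to `target`; false if unknown.
    virtual bool get(const std::string& name, std::string& target) = 0;
};

// Custom implementation that serves strings from an in-memory table, which
// could just as well be filled from a database or a configuration file.
struct ToyMapRepository : ToyRepository {
    std::map<std::string, std::string> table;

    bool get(const std::string& name, std::string& target) override {
        auto it = table.find(name);
        if (it == table.end())
            return false;
        target = it->second;
        return true;
    }
};

struct ToyCompiled {
    std::string source;
};

class ToyCompiler {
public:
    ToyRepository* repository = nullptr;  // optional, set by the application
    int compilations = 0;                 // counts lazy "compilations"

    // Returns the named expression, "compiling" it from the repository on
    // first use. Returns an empty pointer if the name cannot be resolved.
    std::shared_ptr<ToyCompiled> getNamed(const std::string& name) {
        auto it = named_.find(name);
        if (it != named_.end())
            return it->second;

        std::string source;
        if (repository == nullptr || !repository->get(name, source))
            return nullptr;

        ++compilations;
        auto compiled = std::make_shared<ToyCompiled>(ToyCompiled{source});
        named_[name] = compiled;
        return compiled;
    }

private:
    std::map<std::string, std::shared_ptr<ToyCompiled>> named_;
};
```

In this model the repository is consulted only once per name, because the result is stored as a named expression after the first retrieval.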

10.6 Summary and Final Notes On Nested Expressions

Nested expressions are a powerful feature of ALib Expressions , but they also require a thoughtful and knowledgeable user because of the different approaches of compile-time and evaluation-time defined nested expressions.
If a software offers an end-user to "express herself", a certain level of understanding is required anyhow. Often software hides expression syntax behind a graphical user interface with checkboxes and input fields per "attribute", e.g. to define an email filter, and creates an expression string unseen by the user in the background. Then, only in a certain "expert mode" an end-user is allowed to freely write expressions, which then may be more complex and probably even allow to "address" nested expressions that such end-user had defined in external configuration resources.

So, it is a bit of a task to define the user interface and user experience when it comes to allowing expressions. This library tries to cover the broad spectrum of use cases and this can probably be noticed in the area of nested expressions very well.

To end this chapter about nested expressions, some final hints and notes should be collected here:

  • To disallow nested expressions, simply fields CfgNestedExpressionOperator and CfgNestedExpressionFunction of class Compiler are to be cleared.
  • It is undefined behavior if a nested expression that is successfully identified at compile-time is deleted and the referring expression still evaluated afterwards.
  • To disallow compile-time nested expressions only, compilation flag AllowCompileTimeNestedExpressions is to be cleared.
  • To disallow evaluation-time nested expressions only, field CfgNestedExpressionFunction is to be cleared. In this case nested expressions are available only using the unary operator.
  • Evaluation-time nested expressions may be changed (replaced) prior to evaluating an expression that uses them.
  • As long as only compile-time nested expressions are used, no circular nesting can occur. As soon as evaluation-time nested expressions are used, circular nesting might happen. The library detects such circular nesting and throws Exceptions::CircularNestedExpressions during evaluation. The exception includes informational entries of type Exceptions::CircularNestedExpressionsInfo that list the "call stack" of named expressions that caused the circle.
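The detection of circular nesting mentioned in the last note can be pictured with a small model that keeps a call stack of expression names during evaluation. This is a sketch of the general technique, not of the library's implementation: ToyReferences and evaluateChain are hypothetical, each named expression refers to at most one other expression, and std::runtime_error stands in for the library's exception.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Toy model: maps the name of an expression to the name of the nested
// expression it refers to at evaluation-time. An empty string means "none".
using ToyReferences = std::map<std::string, std::string>;

// Follows the chain of nested expressions starting at `name`. Throws if a name
// re-appears in the current call stack; the message lists that stack.
// Note: on a throw, the call stack is left as it was at the point of failure.
void evaluateChain(const ToyReferences& refs, const std::string& name,
                   std::vector<std::string>& callStack) {
    if (std::find(callStack.begin(), callStack.end(), name) != callStack.end()) {
        std::string message = "Circular nested expressions:";
        for (const std::string& entry : callStack)
            message += " " + entry + " ->";
        throw std::runtime_error(message + " " + name);
    }

    callStack.push_back(name);
    auto it = refs.find(name);
    if (it != refs.end() && !it->second.empty())
        evaluateChain(refs, it->second, callStack);
    callStack.pop_back();
}
```

With references A to B, B to C and C back to A, evaluating A in this model fails with the message "Circular nested expressions: A -> B -> C -> A".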

11. Detail Topics

In the previous chapters of this manual, most features of module ALib Expressions have been touched, either as tutorial sample code or in a more theoretic fashion. This chapter now provides a list of in-depth discussions on different dedicated topics.

11.1 Types

Much has been said about the intermediate or final result types of expressions in various sections of this manual. The use of the underlying run-time type information library ALib Boxing, with its very "seamless" nature, helped tremendously to implement ALib Expressions .

But being so seamless, it is not so easy to understand all aspects of its use and meaning in this library. Therefore, this quick chapter tries to review various aspects of the library from the angle of types. For simplification of writing and reading this chapter, this is done with a list of bullet points.

  • A prerequisite to fully understand how type information is handled by this library, is to read and understand the documentation of ALib Boxing .
  • While custom types are to be registered with the compiler using AddType , such registration is purely used for the generation of exception messages or other sorts of end-user information. It is not necessary otherwise.
  • This library follows a "type-safe paradigm". This means that during compilation of expressions each of its terms is determined in respect to what exact type it will result to during evaluation. The disadvantages of this approach are not easy to be named: A non-type-safe library would just look a lot different and it could name advantages along that ultimately different design. Hence we rather talk about the consequences of this library's type-safe approach:
    • Exceptions caused by malformed expression strings, are as far as possible happening at compile-time.
    • The choice of overloaded operators and functions happens at compile-time, which allows a very performant evaluation. In fact, at evaluation time, the run-time type information included in boxed intermediate result values can mostly be ignored, in the respect that a callback function does not need to perform checks on the types of its input values: The implementations just unbox values without doing type checking.
      In this matter, it might be hinted to the fact that library ALib Boxing is designed to not throw run-time exceptions. It rather raises assertions in debug-compilation, e.g. if inconsistent types are tried to be unboxed. In release builds, a software simply has undefined behavior (crashes).
    • Overloaded versions of one operator (or function) can be implemented in very separated software units, not "knowing" each other and not interfering with each other.

  • While it may seem to a user of this library, that for each possible permutation of input parameter types, a distinct callback function has to be provided, this is not the case. The concept of quite strict assignment of such permutations to corresponding callbacks is not hard-coded in the depth of this library, but rather "voluntarily suggested" with using high-level helper struct plugins::Calculus . By using the underlying type CompilerPlugin directly, it is possible to provide just one callback function that is enabled to process various combinations of input parameters. The disadvantage of doing so however is that this moves the effort of identifying the types to evaluation-time, which implies a drop of evaluation performance. This is why this library "suggests" to use struct Calculus as the foundation for custom plug-ins.
  • As with the previous note, the concept of "variadic function parameters" is not something that arises from the depth of the library, but again comes only with the use of helper struct plugins::Calculus . It may need some time and thinking about the relationship of structs CompilerPlugin and Calculus to fully grasp the differences and benefits of each. And by the use of virtual functions, in some situations it makes very much sense to mix both concepts, by inheriting a custom plug-in from struct Calculus but still overriding parts of the underlying interface of CompilerPlugin.
    Such mixed approach is also used with some of the built-in compiler plug-ins.
  • The concept of auto-casting of types is located somewhere in the middle! Auto-casts can be fully prevented by providing either a dedicated callback function for each permutation of types or by doing "manual" casts just within a callback function that accepts multiple permutations. Still, this library takes the effort of supporting auto-cast in the details of the implementation of the compilation process which assembles the evaluation program. How it is done can be seen a little like "a last call for help" before throwing a compilation exception.

    The compiler does these calls on two occasions: When a binary operator could not be compiled and when terms T and F of ternary conditional operator Q ? T : F are not of the same type. In this moment, the compiler just calls for help by asking each plug-in for an auto-cast of one or both of the types. It does this only once! After a first plug-in provided some conversion, it is retried to compile the actual operator. If this still fails, the exception is thrown, although it might have been possible that a next plug-in provided a different cast that would lead to success.
    This of-course is a design decision of the library. Complexity was traded against effectiveness. At the end of the day, the whole concept of auto-cast could be described as being not really necessary to do any sort of custom type processing. Therefore, auto-cast is being offered as an optional way of reducing the number of necessarily provided callback functions, that has two disadvantages: First, the auto-cast has to be implemented as a compiler plug-in functionality and second auto-casts increase the length of the evaluation program and hence constitute a penalty on an expression's evaluation performance.

  • The variety of built-in types has been reduced to a bare minimum needed. While module ALib Boxing (by default!) already drops the distinction of C++ integral types of different sizes (short, long, int, etc.), module ALib Expressions in addition drops the distinction between signed and unsigned integral types. All integral types are signed. (Given that the "complete" Java programming language dropped unsigned integers, we thought it might not be too problematic).

    The good news for users of this library is that it is no problem to implement support for unsigned types, because "dropping" here just means, that just none of the built-in operators and functions "produces" a result value of unsigned integral type. In other words, unsigned integral types are considered just another custom type.

    If - unexpectedly - unsigned integer types and corresponding operations need to be supported, custom operators and function definitions have to be added.

  • For technical reasons, ALib Boxing is not "aware" of type inheritance relations. It is not possible to detect if a type of one box inherits a different type in another box. Usually it is sufficient to expose only the base type to ALib Expressions . It is important to note, that in this case callback functions have to perform a dynamic_cast prior to returning derived types. If they do not do that, the derived type becomes introduced to ALib Expressions !
    Finally, if the common base type is abstract, and therefore no sample value can be created, the trick to create a sample box is to use reinterpret_cast<>() like this:
     Box myTypeSample   = reinterpret_cast<MyAbstractType*>(0);
    
  • To make custom types compatible in full with all features of the library, it might be needed to do some side-implementations along the lines of underlying ALib features. For example, to allow nicely formatted string output of custom data using built-in expression function Format(formatString,...) , box-function FAppend and/or FFormat have to be implemented for the custom type.
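The auto-cast note in the list above, with its "last call for help" that is made only once, can be pictured with the following toy model of compile-time overload selection. All names are hypothetical and nothing here mirrors the real plug-in interface: an operator is represented by the set of operand type pairs it is defined for, and a cast provider is a callable that may rewrite the operand types.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <stdexcept>
#include <utility>
#include <vector>

enum class ToyType { Integer, Float, String };
using ToyTypePair = std::pair<ToyType, ToyType>;

// A cast provider may rewrite the operand types. It returns false if it has
// nothing to offer for the given pair.
using ToyCastProvider = std::function<bool(ToyTypePair&)>;

// Selects the overload of one binary operator at "compile-time". Returns the
// operand types of the chosen overload or throws if none can be found.
ToyTypePair compileBinary(const std::set<ToyTypePair>& overloads,
                          const std::vector<ToyCastProvider>& providers,
                          ToyTypePair operands) {
    if (overloads.count(operands) != 0)
        return operands;

    for (const ToyCastProvider& provider : providers) {
        if (!provider(operands))
            continue;
        // Only the first offered cast is tried; later providers are not asked.
        if (overloads.count(operands) != 0)
            return operands;
        break;
    }
    throw std::runtime_error("binary operator not defined for the given types");
}
```

The early break reflects the trade-off described above: a later provider might have offered a cast that succeeds, but the search is not continued.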

11.2 Literals

While it was in some places of this manual indicated that the built-in types listed with Types are all "inherently introduced" by the built-in compiler plug-ins just as any custom type could be, this is not the full truth. In fact, types Integer , Float and String are in so far "hard-coded" as values of these types are also created (and thus introduced) by expression "literals".

With the current version of the library it is not possible to change the internal "boxed" types which result from "parsing" a literal. The term "parsing" denotes the first phase of the compilation of an expression string. Changes on how literals are parsed and in what types such parsing results can only be made by touching the library code, which is not further documented here.

11.2.1 Numerical Literals

The parsing of numerical constants found in expression strings is done with the help of member Formatter::DefaultNumberFormat which in turn is found in member Compiler::CfgFormatter .

The use of this helper type allows influencing how numerical literals are parsed. For example, integral types can be parsed in decimal, hexadecimal, octal and binary formats. For this, a set of prefix symbols, which default to "0x", "0o" and "0b", can be customized. Support for one or more of the formats can also be dropped, if this is wanted for whatever reason.

Likewise, the format of floating point numbers and its scientific variants can be defined. With respect to the topic of localization (see also 11.4 Localization), this is especially of interest if a floating-point separation character other than '.' is to be supported. The use of character ',', as is standard in many countries, is supported and covered by the unit tests of this library. In this case, an end-user only has to be aware of the fact that the two expressions:

       MyFunc(1,2)
       MyFunc(1 , 2)

have a different meaning: The first is a call to a unary function providing floating point argument 1.2, the second is a call to a binary function providing integral values 1 and 2.

Even worse:

       MyFunc(1,2,3)

is parsed as two arguments, the first being float value 1.2 and the second integral value 3. This means, the end-user has to insert spaces to separate function parameters.
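The effect can be reproduced with a small stand-alone sketch that does not use ALib at all. The helper parseArguments below is hypothetical and only assumes what was described above: the number parser is greedy, so a ',' that is directly followed by a digit is consumed as the decimal separator, while any other ',' separates arguments.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical illustration, not ALib code: splits the argument list of a
// function call into numeric literals when ',' is used both as the decimal
// separator and as the argument separator.
std::vector<std::string> parseArguments(const std::string& args) {
    auto isDigit = [](char c) {
        return std::isdigit(static_cast<unsigned char>(c)) != 0;
    };

    std::vector<std::string> literals;
    std::size_t i = 0;
    while (i < args.size()) {
        // Skip spaces, separating commas and anything else that is no digit.
        if (!isDigit(args[i])) { ++i; continue; }

        // Read the integral part.
        std::string literal;
        while (i < args.size() && isDigit(args[i])) literal += args[i++];

        // Greedy rule: ',' directly followed by a digit is a decimal separator.
        if (i + 1 < args.size() && args[i] == ',' && isDigit(args[i + 1])) {
            literal += '.';   // store the value in a normalized form
            ++i;
            while (i < args.size() && isDigit(args[i])) literal += args[i++];
        }
        literals.push_back(literal);
    }
    return literals;
}
```

With this sketch, the argument list "1,2,3" yields the two literals 1.2 and 3, while "1 , 2" yields 1 and 2.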

As this is a source of ambiguity, applications that address end-users with a high degree of professionalism should rather not localize number formats but instead document with their software that English standards are to be used.

In general, all flags and options in respect to parsing and formatting (normalizing) number literals that are available through class NumberFormat are compatible with ALib Expressions . This even includes setting character ' ' (space) as a grouping character for any number format! This might be used to allow quite nicely readable numbers in expression strings.

Finally, normalization flags KeepScientificFormat , ForceHexadecimal , ForceOctal and ForceBinary may be used to further tweak how numbers are converted in normalized strings.

11.2.2 String Literals

String literals are to be enclosed in quote characters '"'. If a string literal should contain the quote character itself, this needs to be "escaped" using backslash character '\'. Some further escape characters are supported by the internal use of the ALib string feature documented with Format::Escape .

For the output of string literals in the normalized version of expression string, the reverse functions of Format::Escape are used.
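The round trip between a string value and its literal form can be sketched as follows. This is a stand-alone illustration and not the implementation of Format::Escape : the function names are hypothetical, and the set of escape sequences handled besides the quote and the backslash (here only '\n' and '\t') is an assumption made for the sample.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical sketch: converts a raw string value into the body of a
// string literal as it could appear between the quotes of an expression.
std::string escapeLiteral(const std::string& raw) {
    std::string out;
    for (char c : raw) {
        switch (c) {
            case '"':  out += "\\\""; break;   // quote      -> \"
            case '\\': out += "\\\\"; break;   // backslash  -> \\ (two characters)
            case '\n': out += "\\n";  break;   // assumption: newline supported
            case '\t': out += "\\t";  break;   // assumption: tab supported
            default:   out += c;      break;
        }
    }
    return out;
}

// Reverse operation, applied when a literal is parsed.
std::string unescapeLiteral(const std::string& escaped) {
    std::string out;
    for (std::size_t i = 0; i < escaped.size(); ++i) {
        if (escaped[i] == '\\' && i + 1 < escaped.size()) {
            char next = escaped[++i];
            switch (next) {
                case 'n': out += '\n'; break;
                case 't': out += '\t'; break;
                default:  out += next; break;  // covers \" and the escaped backslash
            }
        } else {
            out += escaped[i];
        }
    }
    return out;
}
```

Applying unescapeLiteral to the result of escapeLiteral returns the original value, which is the property needed to parse a normalized expression string back.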

11.2.3 Box-Function FToLiteral

Ultimately, box-function FToLiteral might be implemented for one of the types (Integer , Float or String ) to do any imaginable custom conversions, other than possible with the standards provided by the mechanics of the ALib types used. But this should be seldom needed. The main purpose of this boxing-function is described with 11.5 Optimizations.

11.3 Identifiers/Functions

Identifier (parameterless functions) and function names are recognized (parsed) in expression strings at appropriate places only if the following rules apply:

  • The name starts with a letter, 'a' to 'z' or 'A' to 'Z'.
  • The rest of the name consists of letters, underscore character '_' and numbers '0' to '9'.

In the current version of module ALib Expressions , this is hard-coded and not configurable.
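The two rules can be expressed in a few lines. The function below is a hypothetical stand-alone check written for this manual, not the parser code of the library:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: tests a name against the two rules given above.
bool isValidIdentifier(const std::string& name) {
    auto isLetter = [](char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    };
    auto isDigit = [](char c) { return c >= '0' && c <= '9'; };

    // Rule 1: the name starts with a letter.
    if (name.empty() || !isLetter(name[0])) return false;

    // Rule 2: the rest consists of letters, '_' and digits.
    for (std::size_t i = 1; i < name.size(); ++i) {
        char c = name[i];
        if (!isLetter(c) && !isDigit(c) && c != '_') return false;
    }
    return true;
}
```

According to this check, names like FileDate, today2 or my_func are accepted, while _x, 2nd or a-b are not.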

11.4 Localization

In section 11.2.1 Numerical Literals, information about localizing number formats in respect to parsing expression strings and their output as a normalized expression was already given.

A second area where localization may become an obvious requirement is the naming of built-in and custom expression functions. The built-in compiler plug-ins use mechanics provided by ALib classes ResourcePool and Camp to externalize the names, letter case sensitivity and optional minimum abbreviation length of identifiers and functions. The matching of identifier and function names found in expression strings is performed using class Token , which allows not only simple abbreviations, but also "CamelCase" and "snake_case" specific abbreviations.

These mechanics allow to replace such resources using an arbitrary custom "string/data resource backend": The one that your application uses! With this, it is possible for example to translate certain identifiers (e.g. Minutes or True) to different locales and languages.

While no detailed documentation or step-by-step sample on how to perform such localization is given, investigating the documentation and optionally the simple source code of the entities named above should enable a user of this library to quite quickly succeed in integrating any custom localization mechanics used otherwise with her software. For creating a custom plug-in, the way to go is of-course to copy the setup code from the built-in plug-ins of this library.

A third area where localization might become a need are callback functions processing expression data. Again, for formatting and parsing, an instance of ALib class Formatter is used, which has (as was explained above) an instance of NumberFormat attached.
A compile-time scope (used with optimizations) is created with virtual method getCompileTimeScope , which in its default implementation attaches the same formatter to the compile-time scope that is used with parsing, namely the one found in CfgFormatter .
The scope object used for evaluation should be constructed passing again the very same formatter. This way, formatting and number formats remain the same throughout the whole chain of processing an expression and can be tweaked collectively through this one instance CfgFormatter.

Finally, a fourth area where localization might be applied is when it comes to exceptions during compilation or evaluation of expressions. All exceptions used in this library provide human readable information, which is built from resourced strings and hence can be localized. See chapter 11.7 Exceptions for details.

11.5 Optimizations

11.5.1 Goals

One very important design goal of this library was to favour evaluation-time performance of expressions over compile-time performance. This way, the library is optimized for use cases where a single expression that is compiled only once is evaluated against many different scopes. The higher the number of evaluations per compiled expression, the more the overall process performance gains from this design principle. This design goal caused a great deal of effort and its implications were far reaching.

The concept of "expression optimization", which was touched on in this manual various times already, is all about optimizing the evaluation-time performance. The library willingly spends some effort at compile-time to shorten the compiled expression program that is run on the built-in virtual machine as much as possible.

A simple sample for optimization might be an expression that calculates the circumference for a given radius. In case the radius is received from the scope with a custom expression identifier Radius, then the expression would be:

       2 * PI * Radius

If no optimization was applied, each time this expression was evaluated, four callback functions had to be invoked: Two for receiving the values PI and Radius and two for the multiplications. Now, we know that PI is constant and so is the term "2 * PI". The goal of optimization consequently is to reduce the expression program to just do two callback invocations: one for retrieving the radius from the scope and a second for multiplying the radius with the constant 2 * PI.

To express this goal the other way round: An end-user should be allowed to provide operations that introduce some redundancy, but are easier readable and understandable for human beings, without impacting evaluation performance.

11.5.2 Optimization Mechanics

The foundation of compile-time optimization of this library is implemented with the assembly of the expression program: During the assembly, the compiler keeps track of previous results being constant or not. Each time a compiler plug-in is asked to provide compilation information, this information about whether the arguments are all or partly constant is provided. Then it is up to the plug-ins to decide whether the expression term is a constant in turn or implies a callback function call.

In the simple case of identifiers (parameterless functions), no arguments exist and hence all arguments are constant. Nevertheless, custom identifiers are usually not constant, because they return data received from the (custom) scope object. Therefore, the compiler does not "know" if identifier "PI" is a constant, only the plug-in that compiles the identifier knows that. While in the case of PI it is, in the custom case of FileDate it is not: It depends on the currently examined scope data.

This way, the compiler and its plug-ins have to work hand in hand: The compiler provides information about arguments being constant and the plug-ins can return either a callback function or leave the callback function nullptr and return a constant value instead. The compiler then either assembles a callback function or the use of the constant value, which is an internal program command of the virtual machine's "assembly language".
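The division of labour described above can be modelled with a few lines of stand-alone C++. All names in this sketch (Operand, CompileResult, compileMultiply) are hypothetical and chosen for illustration only; the real interface is the one documented with CompilerPlugin and its information structs.

```cpp
#include <cassert>
#include <functional>

// Hypothetical model of what the compiler tells a plug-in about an argument.
struct Operand {
    bool   isConstant;   // known at compile-time?
    double value;        // only meaningful if isConstant is true
};

// Hypothetical model of a plug-in's answer: either a constant value
// (callback left empty) or a callback to be invoked at evaluation-time.
struct CompileResult {
    bool                                   isConstant;
    double                                 constant;
    std::function<double(double, double)>  callback;
};

// A plug-in compiling operator '*': folds the term if both arguments are
// constant, otherwise returns the callback that multiplies at evaluation-time.
CompileResult compileMultiply(const Operand& lhs, const Operand& rhs) {
    if (lhs.isConstant && rhs.isConstant)
        return CompileResult{ true, lhs.value * rhs.value, nullptr };

    return CompileResult{ false, 0.0,
                          [](double a, double b) { return a * b; } };
}
```

In this model, the compiler would emit a "constant" program command for the first kind of result and a "function" command for the second.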

With binary operators, a further option is available: In the case that one operand is constant, while the other is not, some operators might inform the compiler to either optimize out the complete term or at least to optimize out the constant argument. Again, this information has to be encoded in the result data provided by the compiler plug-ins. The compiler will then modify the existing program and remove the program code for one or both arguments. (Further samples of binary operator optimizations are given in documentation of struct CompilerPlugin::CIBinaryOp .)

11.5.3 Optimizations Of The Built-In Compiler Plug-ins

As explained earlier, the built-in compiler plug-ins mostly rely on helper struct plugins::Calculus instead of deriving directly from CompilerPlugin . Calculus provides very convenient ways to assure that every operation that can be optimized at compile-time truly is optimized.

For example, callback functions can be denoted "compile-time invokable". If so, helper struct Calculus automatically invokes them at compile-time if all arguments provided are constant (or no arguments are given) and returns the calculated result to the compiler instead of the callback function itself.

Furthermore, struct Calculus provides a special sort of optimization for binary operators that may be applied when only one of the two arguments is constant. For example, an arithmetic multiplication with constant 0 results in 0, and with constant 1 it results in the other argument's value. These kinds of rules can be encoded using quite simple static data tables.
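To give an impression of what such a table may look like, here is a stand-alone sketch. It is not the table format of struct Calculus : the type and function names are hypothetical, and only the two multiplication rules mentioned above are encoded.

```cpp
#include <cassert>

// Hypothetical outcome of a rule lookup.
enum class Folding {
    None,                   // no rule applies, a callback is needed
    ResultIsConstant,       // the whole term equals the constant operand
    ResultIsOtherOperand    // the term equals the non-constant operand
};

// One table row: operator, value of the constant operand, resulting action.
struct FoldingRule {
    char    op;
    long    constant;
    Folding folding;
};

// Static data table with the two rules given in the text.
static const FoldingRule kRules[] = {
    { '*', 0, Folding::ResultIsConstant     },   // x * 0  ->  0
    { '*', 1, Folding::ResultIsOtherOperand },   // x * 1  ->  x
};

// Searches the table for a rule matching operator and constant operand.
Folding lookupFolding(char op, long constantOperand) {
    for (const FoldingRule& rule : kRules)
        if (rule.op == op && rule.constant == constantOperand)
            return rule.folding;
    return Folding::None;
}
```

The benefit of the table-driven approach is that adding a further rule is a matter of adding one row, without touching the lookup code.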

Overall, the use of struct Calculus makes the implementation of optimization features of custom plug-ins as easy as possible. Consult the struct's documentation for further details.

11.5.4 Compile- And Evaluation-Time Optimization Of The Conditional Operator

In the current version of ALib Expressions there is only one evaluation-time optimization performed. This considers built-in ternary operator Q ? T : F (conditional operator).

As with a C/C++ compiler, depending on the evaluation-time value of Q, only the program code to calculate either T or F is executed by the virtual machine.

Note
This is important for end-users to know: The "side effect" of this optimization is that the left-out term does not produce the "side effects" it might be expected to produce if this optimization was not performed and custom callbacks were thus invoked for both terms T and F.

The optimization of the conditional operator is as well performed at compile-time: In the case that Q is a compile-time constant, the code for either T or F is not even included in the program.
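The evaluation-time behaviour can be illustrated without the virtual machine. In the following hypothetical sketch, the terms T and F are modelled as callables, and only the selected one is invoked:

```cpp
#include <cassert>
#include <functional>

// Hypothetical model of "Q ? T : F": the terms T and F are passed as
// callables, so that only the selected term is ever computed.
int evaluateConditional(bool q,
                        const std::function<int()>& termT,
                        const std::function<int()>& termF) {
    return q ? termT() : termF();
}

// Usage sample: counts how many of the two terms were really evaluated.
int countEvaluatedTerms(bool q) {
    int invocations = 0;
    auto termT = [&invocations]() { ++invocations; return 1; };
    auto termF = [&invocations]() { ++invocations; return 2; };
    evaluateConditional(q, termT, termF);
    return invocations;
}
```

Function countEvaluatedTerms returns 1 for either value of Q, which is exactly the observable difference to an evaluation that computes both terms first.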

11.5.5 Limits Of Optimization

The current version of this library has an important limit in respect to optimizations. While - as we saw - expression:

        2 * PI * Radius

is optimized to perform only two callbacks instead of four, the mathematically equivalent expression:

        2 * Radius * PI

is only optimized by one callback, hence still includes three.

The reason is that there are no built-in mechanics to tell the compiler that the associative and commutative laws apply to the two multiplications, which would allow transforming the latter expression back to the first one.

Instead, the compiler "sees" two multiplications that both are not performed on constant operands and hence can not be optimized. Only the callback of constant identifier PI is removed.
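The callback counts given for the two expressions can be retraced with a simplified model. The sketch below is hypothetical and deliberately limited: it only tracks, from left to right, whether the value calculated so far is constant, and it does not model any further optimization a binary operator might offer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// The three kinds of terms that occur in the samples.
enum class Kind {
    Literal,             // e.g. 2: never needs a callback
    ConstantIdentifier,  // e.g. PI: callback can be removed by optimization
    ScopeIdentifier      // e.g. Radius: always needs a callback
};

// Counts the callbacks needed to evaluate  t0 * t1 * t2 ...  when the
// product is processed strictly from left to right.
int countCallbacks(const std::vector<Kind>& terms, bool optimize) {
    int  callbacks     = 0;
    bool accIsConstant = false;  // is the value calculated so far constant?

    for (std::size_t i = 0; i < terms.size(); ++i) {
        bool termIsConstant = terms[i] != Kind::ScopeIdentifier;

        // Callback to receive the value of an identifier.
        if (terms[i] == Kind::ScopeIdentifier ||
            (terms[i] == Kind::ConstantIdentifier && !optimize))
            ++callbacks;

        if (i == 0) {
            accIsConstant = termIsConstant;
        } else {
            // A multiplication is folded only if both operands are constant.
            bool folded = optimize && accIsConstant && termIsConstant;
            if (!folded) ++callbacks;
            accIsConstant = folded;
        }
    }
    return callbacks;
}
```

For 2 * PI * Radius the model counts four callbacks without and two with optimization, while for 2 * Radius * PI three callbacks remain even if optimization is enabled.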

Consequently, if performance is key, it might be documented to an end-user that she is good to write:

   HoursPassed  * 60 * 60 * 1000

because this expression is optimized to:

   HoursPassed * 3600000

but that she should "sort" constants to constants, because expression

   60 * HoursPassed * 60 * 1000

is less effectively optimized to

   60 * HoursPassed * 60000

In a more abstract way, it could be stated that a C++ compiler does such optimization, because it "knows" about the rules of the multiply operator of integral values. The compiler of this library does not know about that and hence can not perform this kind of optimization. If in the case of C++, the operands were custom types with overloaded operator '*', the C++ code would also not be optimized, because in this case, the compiler does not know the mathematical "traits" of the operator. The C++ language has no syntax to express operator traits.

From the "point of view" of the expression compiler provided with this library, the built-in operators are just "built-in custom operators". This leads to the inability of optimizing such rather simple mathematics.

Finally, evaluation-time optimization of operator && (as known from the C++ language) is not implemented with this library. For example, with expression:

   IsDirectory && ( name == "..")

the right-hand side operand of && is evaluated even if IsDirectory returned false already.

11.5.6 Normalized, Optimized Expression Strings

In the case that professional, experienced end-users are the addressees of a software, it might be wanted to tell such end-users about the result of optimizations. To stay with the sample of the previous sections, this means to be able to show an end-user that the expression:

   2 * PI * Radius

was optimized to

   6.283185307179586 * Radius.

To be able to do this, a normalized expression string of the optimized expression has to be generated. This way, the interface of class expressions allows access to three strings with methods

The generation of the normalized string during compilation can not be disabled and hence is available in constant (zero) time after the compilation of an expression. However, the first invocation of method GetOptimizedString is anything but a constant-time task! With this, the library "decompiles" the optimized expression program with the result being an abstract syntax tree (AST). This AST is then compiled into a new program and with this compilation a "normalized" expression string is generated.

Consequently, this normalized string is the optimized version of the original expression string! Once done, the AST and the compiled (second) program are disposed, while the optimized string is stored.

It is questionable if the result is worth the effort! The decision if a software using library ALib Expressions presents "optimized normalized expression strings" to the end-user is highly use-case dependent. In case of doubt our recommendation is to not do it. The feature may be helpful during development of custom compiler plug-ins.

In any case, to receive correct, compilable, optimized expression strings, a last hurdle might have to be taken. In the sample above, the optimized term 2 * PI results in floating point value 6.283185307179586. This value can easily be written out and - if wanted - later be even parsed back to a correct expression. But this is only the case, because the type Float is expressible as a literal. Imagine the following sample:

   Seconds(1) * 60

Built-in identifier Seconds returns an object of type Duration . The multiplication operator is overloaded and in turn results in a value of type Duration. And yes, it is a constant value. The challenge now is to produce an expression string that creates a constant time span value representing 60 seconds. The result needs to be

   Seconds(60)

or even better:

   Minutes(1)

To achieve this, this library introduces an ALib box-function which is declared with FToLiteral . This function has to be implemented for all custom boxed types that might occur as results of constant expression terms. Only if this is assured, the "optimized normalized expression string" is correct, meaningful and re-compilable.

For details and a source code sample consult the documentation of the box-function descriptor class FToLiteral .

Besides this box-function to create constant expression terms for custom types, a further prerequisite might have to be met to receive compilable expression strings. This is in the area of auto-cast functionality. If custom auto-casts are in place, such auto-casts, if decompiled, have to be replaced by a function call which takes the original value and returns the casted value. The names of these functions have to be provided with members CIAutoCast::ReverseCastFunctionName and CIAutoCast::ReverseCastFunctionNameRhs of the auto-cast information struct at the moment an auto-cast is compiled. If optimized, normalized expression strings are not used, these fields do not need to be set and, moreover, the corresponding expression functions that create the constant values may not be needed (they might still be needed for the expression syntax a programmer wants to offer).

Note
All built-in identifiers, functions, operators and auto-casts are fully compatible with optimized expression strings. Most unit tests provided with this ALib Module perform the "full circle" of compilation and evaluation, which is:
  • compile the given expression string and check the evaluation result.
  • compile the normalized version of the expression string and check the evaluation result.
  • create the optimized version of the expression string, recompile it and check the evaluation result.
  • switch off optimization (see next section), compile the original expression string and check the evaluation result.

11.5.7 Disabling Optimization

While there is no reason to switch off optimization, the library offers compilation flag Compilation::NoOptimization for completeness.

11.6 Shared Resources And Concurrent Use

This library does not make use of semaphores (aka "thread locks") to protect resources against violating concurrent access. Consequently, it is up to the user of the library to assure some rules and potentially implement semaphores if needed. This goes well along with the design principle proposed in chapter 3.4 Bauhaus Code Style.

Attention
While class Compiler indirectly inherits class ThreadLock (by inheriting lang::PluginContainer ), the constructor disables its semaphore (by invoking SetSafeness(Safeness::Unsafe) ) and furthermore its interface methods do not acquire the lock. The latter fact implies that reverting the locking mode to Safeness::Safe does not enable thread-safeness with this class!

Therefore, this chapter lists the rules of allowed and denied parallel actions:

  • Any use of the same expression compiler instance in multi-threaded software, has to be locked by the using software. This includes changes of compiler properties, addition and removal of plug-ins, addition and removal of named expressions and the compilation of expressions.
  • Instances of compiler plug-ins may be shared between different compiler instances. The parallel compilation using compilers that use shared instances of plug-ins is allowed. This is true for all built-in plug-ins. The concurrent use of custom plug-ins depends on their implementation.
  • The parallel evaluation of one single expression object, is not allowed.
  • The parallel evaluation of a set of different expressions originating from either the same or different compiler instances, is allowed, if:
    • different evaluation scope objects are used.
    • all custom callback functions allow parallel invocation (the built-in callbacks do allow this).
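The first rule, locking any shared use of one compiler instance in the using software, can be sketched with standard C++ means. Both classes below are hypothetical: DemoCompiler merely stands in for an expression compiler, and LockedCompiler shows one possible way to serialize access to it with a std::mutex.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical stand-in for an expression compiler that is not thread-safe.
struct DemoCompiler {
    std::vector<std::string> compiled;

    void Compile(const std::string& expressionString) {
        compiled.push_back(expressionString);
    }
};

// Wrapper owned by the using software: every access to the shared compiler
// instance is performed while holding the mutex.
class LockedCompiler {
    DemoCompiler compiler_;
    std::mutex   mutex_;

public:
    // Compiles an expression and returns the number of expressions compiled
    // so far. The lock is released when 'lock' goes out of scope.
    std::size_t Compile(const std::string& expressionString) {
        std::lock_guard<std::mutex> lock(mutex_);
        compiler_.Compile(expressionString);
        return compiler_.compiled.size();
    }
};
```

Changes of compiler properties and the addition or removal of plug-ins and named expressions would have to go through the same mutex. Evaluation needs no such lock, as long as each thread uses its own expression and scope objects as listed above.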

11.7 Exceptions

11.7.1 Catching Exceptions

In the moment end-users are allowed to provide expression strings, some error handling to deal with malformed expression strings is unavoidable. Module ALib Expressions chooses to implement error handling using C++ exceptions.

One of the design goals of this library is to allow to recognize erroneous expressions at compile-time if possible. The advantage of this is that compilation is often performed at a point in time where the consequences of exceptions are usually less "harmful". Of-course, a software can not continue its tasks if exceptions occur, but the implicated effort of performing a "rollback" should be much lower.

For this, the following general approach should be taken:

  1. Compile expressions
  2. On exceptions inform the end user (e.g. stop program)
  3. Allocate resources (scope) needed.
  4. Evaluate expressions
  5. On exceptions rollback resource allocations and inform the end user (e.g. stop program)

As evaluation-time exceptions can occur anyhow, in simple cases step 2 might be omitted and steps 1-4 be wrapped in one try statement.
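The five steps can be sketched as follows. The sample is stand-alone and does not use the library: types CompiledExpression and Scope as well as functions compile and evaluate are hypothetical stand-ins, and std::exception is caught in place of the ALib exception type.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for a compiled expression and an evaluation scope.
struct CompiledExpression { std::string text; };
struct Scope              { int value; };

// Fails on malformed input (here: an empty string).
CompiledExpression compile(const std::string& expressionString) {
    if (expressionString.empty())
        throw std::invalid_argument("empty expression");
    return CompiledExpression{ expressionString };
}

// Fails at evaluation-time for the artificial expression "fail".
int evaluate(const CompiledExpression& expression, const Scope& scope) {
    if (expression.text == "fail")
        throw std::runtime_error("callback failed");
    return scope.value;
}

std::string process(const std::string& expressionString) {
    CompiledExpression expression;
    try {
        expression = compile(expressionString);                  // step 1
    } catch (const std::exception& e) {
        return std::string("compile error: ") + e.what();        // step 2
    }

    std::unique_ptr<Scope> scope(new Scope{ 42 });                // step 3

    try {
        int result = evaluate(expression, *scope);                // step 4
        return "result: " + std::to_string(result);
    } catch (const std::exception& e) {
        scope.reset();                                            // step 5: rollback
        return std::string("evaluation error: ") + e.what();
    }
}
```

Because compilation errors are detected before the scope is allocated, nothing has to be rolled back in step 2.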

The exception object thrown by any ALib Module is of type alib::lang::Exception.

Note
This class combines the advantages of two paradigms frequently discussed as alternative approaches to exception handling. For more information and to fully leverage its use with this ALib Module , please consult the class's documentation .

11.7.2 Exceptions In Compiler Plug-Ins

A compiler plug-in may throw an exception during compilation. Helper struct Calculus already throws exceptions MissingFunctionParentheses and IdentifierWithFunctionParentheses . Furthermore, a callback function may throw an exception during the compile-time evaluation of a constant expression term.

Exceptions of type std::exception as well as those of type alib::Exception that are not exposed by this ALib Module itself (hence using values of enum types different than expressions::Exceptions), by default are "wrapped" by the compiler into an exception of enum type Exceptions::ExceptionInPlugin . Such wrapping can be disabled by setting flag Compilation::PluginExceptionFallThrough .

In addition, plug-in exceptions of type alib::Exception are extended by an informational entry of type ExpressionInfo .

Exception objects of other types are never caught and wrapped and therefore have to be caught in a custom way.

11.7.3 Evaluation-Time Exceptions In Callback Functions

In the case that a callback function throws an exception during the evaluation of an expression, such exceptions by default are "wrapped" into ExceptionInCallback . Wrapping is performed with exceptions of type std::exception and ALib Exception. Other exception types are never caught and wrapped and therefore have to be caught in a custom way.

The wrapping of evaluation-time exceptions can be disabled by setting flag Compilation::CallbackExceptionFallThrough . Note, that even while this flag is tested at evaluation-time, it is still stored in member Compiler::CfgCompilation .
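The wrapping behaviour and its fall-through switch can be modelled with standard C++ exceptions. In this hypothetical sketch, CallbackError takes the role of ExceptionInCallback and parameter fallThrough the role of the compilation flag; neither name is part of the library.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical wrapper exception, standing in for "ExceptionInCallback".
struct CallbackError : std::runtime_error {
    explicit CallbackError(const std::string& message)
        : std::runtime_error(message) {}
};

// Invokes a callback. Unless fall-through is requested, exceptions derived
// from std::exception are caught and re-thrown as CallbackError. Exceptions
// of unrelated types pass through untouched in both cases.
int invokeCallback(const std::function<int()>& callback, bool fallThrough) {
    if (fallThrough)
        return callback();

    try {
        return callback();
    } catch (const std::exception& e) {
        throw CallbackError(std::string("Exception in callback: ") + e.what());
    }
}
```

A caller that only handles CallbackError thus sees every std::exception thrown by a callback, while a caller that needs the original exception type requests the fall-through.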

A. Appendix

A.1 Built-In Identifier, Function And Operator Reference

The built-in expression functionality is provided via the built-in compiler plug-ins which by default are enabled and used.

Reference tables about identifiers, functions and operators are provided with each plug-in's class documentation. Those are:

Note
Sibling ALib module ALib Files , provides a compiler plug-in dedicated to file and directory trees.

A.2 Expression Based String Formatting

The goal of using a library like this is to allow end-users to write expressions. One common field of application is filter expressions, as sampled in this manual.

Another common requirement is to allow users to define output formats. To - once more - stay with the file-system sample of this manual, a software may want to allow a user to specify how a line of output for a directory entry should look like.

With built-in plug-in Strings , expressions that return a string can be created quite easily. For example:

   String( Name )  + "  " + Size/1024 + "kB"

could be such an output expression.

However, there is a more comfortable and powerful way to do this! The key to that is the use of format strings as processed by ALib Formatters in a combination with expression strings that comprise the placeholder values found in the format strings.

Utility class ExpressionFormatter implements such combination.

For details and usage information, please consult the class's documentation .

A.3 The Built-In Virtual Machine

Talking about a virtual machine, most people today would consider the JAVA Virtual Machine as a good sample. While this is true and comparable, the machine that is included in this library is a million times simpler. In fact, the current implementation that executes an expression program consists of less than 300 lines of code: A very simple "stack machine" that has only just five commands!

For people who are interested in how the machine works, besides investigating its source code, a look at some sample programs for it leads to a quick understanding.

With debug builds of this library, static method VirtualMachine::DbgList may be invoked to generate a listing of an expression's program. Because the originating expression string itself is given with these listings, in this chapter, we just sample the listing output, without sampling the expressions explicitly.

Let's have an easy start with a simple expression of a constant value:

--------------------------------------------------------------------------------------
ALib Expression Compiler
(c) 2024 AWorx GmbH. Published under MIT License (Open Source).
More Info: https://alib.dev
--------------------------------------------------------------------------------------
Expression name: ANONYMOUS
     Normalized: {42}

PC | ResultType | Command  | Param | Stack | Description      | ArgNo{Start..End} | 42
--------------------------------------------------------------------------------------
00 | Integer    | Constant | '42'  |     1 | Literal constant |                   |_^_

This shows the first command "Constant", which pushes a constant value that is a parameter of the command to the stack.

Let's do some multiplication:

-----------------------------------------------------------------------------------------------
ALib Expression Compiler
(c) 2024 AWorx GmbH. Published under MIT License (Open Source).
More Info: https://alib.dev
-----------------------------------------------------------------------------------------------
Expression name: ANONYMOUS
     Normalized: {42 * 2}

PC | ResultType | Command  | Param | Stack | Description           | ArgNo{Start..End} | 42 * 2
-----------------------------------------------------------------------------------------------
00 | Integer    | Constant | '84'  |     1 | Optimization constant |                   |   _^_

Ooops, it is still one command, which includes the result. The reason for this is the optimizing compiler that detected two constants, passed this information to the compiler plug-in and this in turn did the calculation at compile-time. Consequently, we have still a constant expression program.
We now have two options: Use non-constant functions like built-in math function Random, or just switch off optimization . The latter is what we do:

--------------------------------------------------------------------------------------------------------------------------
ALib Expression Compiler
(c) 2024 AWorx GmbH. Published under MIT License (Open Source).
More Info: https://alib.dev
--------------------------------------------------------------------------------------------------------------------------
Expression name: ANONYMOUS
     Normalized: {42 * 2}

PC | ResultType | Command  | Param      | Stack | Description                                | ArgNo{Start..End} | 42 * 2
--------------------------------------------------------------------------------------------------------------------------
00 | Integer    | Constant | '42'       |     1 | Literal constant                           |                   |_^_
01 | Integer    | Constant | '2'        |     2 | Literal constant                           |                   |     _^_
02 | Integer    | Function | mul_II(#2) |     1 | Binary operator '*', CP="ALib Arithmetics" | 0{0..0}, 1{1..1}  |   _^_

We now see two pushes of constant values and then virtual machine command "Function", which invokes a C++ callback function as provided by the compiler plug-ins. In this case it is a callback named "mul_II", which implements operator '*' for two integer arguments. Those arguments will be taken from the current execution stack. The result of the callback will be pushed to the stack.

Note
A correct expression program leaves one value at the stack when finished. This value is the result value of the expression.

Command "Function" is used for expression terms of type unary operator, binary operator, identifier and function.

In column "Description" the listing tells us that the callback "mul_II" in the third and final program command was compiled by plug-in "ALib Arithmetics" with operator '*'. Such information is debug-information and not available in release compilations of the library.

We now know two out of five virtual machine commands and already quite complex expressions can be compiled:

----------------------------------------------------------------------------------------------------------------------------------------------
ALib Expression Compiler
(c) 2024 AWorx GmbH. Published under MIT License (Open Source).
More Info: https://alib.dev
----------------------------------------------------------------------------------------------------------------------------------------------
Expression name: ANONYMOUS
     Normalized: {(42 * 2 / 5) * (2 + 3) * 7}

PC | ResultType | Command  | Param      | Stack | Description                                | ArgNo{Start..End} | (42 * 2 / 5) * (2 + 3) * 7
----------------------------------------------------------------------------------------------------------------------------------------------
00 | Integer    | Constant | '42'       |     1 | Literal constant                           |                   | _^_
01 | Integer    | Constant | '2'        |     2 | Literal constant                           |                   |      _^_
02 | Integer    | Function | mul_II(#2) |     1 | Binary operator '*', CP="ALib Arithmetics" | 0{0..0}, 1{1..1}  |    _^_
03 | Integer    | Constant | '5'        |     2 | Literal constant                           |                   |          _^_
04 | Integer    | Function | div_II(#2) |     1 | Binary operator '/', CP="ALib Arithmetics" | 0{0..2}, 1{3..3}  |        _^_
05 | Integer    | Constant | '2'        |     2 | Literal constant                           |                   |                _^_
06 | Integer    | Constant | '3'        |     3 | Literal constant                           |                   |                    _^_
07 | Integer    | Function | add_II(#2) |     2 | Binary operator '+', CP="ALib Arithmetics" | 0{5..5}, 1{6..6}  |                  _^_
08 | Integer    | Function | mul_II(#2) |     1 | Binary operator '*', CP="ALib Arithmetics" | 0{0..4}, 1{5..7}  |             _^_
09 | Integer    | Constant | '7'        |     2 | Literal constant                           |                   |                         _^_
10 | Integer    | Function | mul_II(#2) |     1 | Binary operator '*', CP="ALib Arithmetics" | 0{0..8}, 1{9..9}  |                       _^_

Note that the listing column "ArgNo" denotes, for each argument, the range of program lines responsible for calculating it on the stack. In other words: each code segment {x..y} noted in this column produces exactly one result value on the stack, just as the whole expression produces one.
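
The stack behavior described above can be illustrated with a minimal sketch of a stack machine that knows only the two commands seen so far. This is not the ALib implementation: the types `Command` and `Callback` and the functions `constant`, `call`, `run` and `sampleProgram` are hypothetical names chosen for this illustration, and all values are plain 64-bit integers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical miniature of a stack-based VM with two commands.
enum class CommandType { Constant, Function };

// A callback receives a pointer to its first argument on the stack.
using Callback = std::int64_t (*)(const std::int64_t*);

struct Command {
    CommandType  type;
    std::int64_t value;     // used by Constant
    std::size_t  argCount;  // used by Function
    Callback     callback;  // used by Function
};

Command constant(std::int64_t v)         { return {CommandType::Constant, v, 0, nullptr}; }
Command call(Callback cb, std::size_t n) { return {CommandType::Function, 0, n, cb}; }

std::int64_t mul(const std::int64_t* a)    { return a[0] * a[1]; }
std::int64_t divide(const std::int64_t* a) { return a[0] / a[1]; }
std::int64_t add(const std::int64_t* a)    { return a[0] + a[1]; }

// Executes the program: constants are pushed, functions replace their
// arguments on the stack by exactly one result value.
std::int64_t run(const std::vector<Command>& program) {
    std::vector<std::int64_t> stack;
    for (const Command& c : program) {
        if (c.type == CommandType::Constant) {
            stack.push_back(c.value);
        } else {
            assert(stack.size() >= c.argCount);
            const std::int64_t* args = stack.data() + (stack.size() - c.argCount);
            const std::int64_t result = c.callback(args);
            stack.resize(stack.size() - c.argCount);
            stack.push_back(result);
        }
    }
    assert(stack.size() == 1);  // a complete program leaves one value
    return stack.back();
}

// Mirrors the eleven listing lines of (42 * 2 / 5) * (2 + 3) * 7.
std::vector<Command> sampleProgram() {
    return {
        constant(42), constant(2), call(mul, 2),
        constant(5),  call(divide, 2),
        constant(2),  constant(3), call(add, 2),
        call(mul, 2),
        constant(7),  call(mul, 2),
    };
}
```

With integer division, `run(sampleProgram())` yields 560. The stack depth after each command of this sketch matches the "Stack" column of the listing above.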

The following sample uses a function that consumes three arguments:

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
ALib Expression Compiler
(c) 2024 AWorx GmbH. Published under MIT License (Open Source).
More Info: https://alib.dev
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Expression name: ANONYMOUS
     Normalized: {Format( "Result of: {}", "2 * 3", 2 * 3 )}

PC | ResultType | Command  | Param           | Stack | Description                                | ArgNo{Start..End}         | Format( "Result of: {}", "2 * 3", 2 * 3 )
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
00 | String     | Constant | "Result of: {}" |     1 | Literal constant                           |                           |        _^_
01 | String     | Constant | "2 * 3"         |     2 | Literal constant                           |                           |                         _^_
02 | Integer    | Constant | '2'             |     3 | Literal constant                           |                           |                                  _^_
03 | Integer    | Constant | '3'             |     4 | Literal constant                           |                           |                                      _^_
04 | Integer    | Function | mul_II(#2)      |     3 | Binary operator '*', CP="ALib Arithmetics" | 0{2..2}, 1{3..3}          |                                    _^_
05 | String     | Function | CBFormat(#3)    |     1 | Function "Format(#3)", CP="ALib Strings"   | 0{0..0}, 1{1..1}, 2{2..4} |_^_

Now that two VM commands are understood, let's continue with two further ones. To implement the ternary conditional operator Q ? T : B, two types of jump commands are needed: a conditional jump and an unconditional one:

---------------------------------------------------------------------------------------------------------------------------------------
ALib Expression Compiler
(c) 2024 AWorx GmbH. Published under MIT License (Open Source).
More Info: https://alib.dev
---------------------------------------------------------------------------------------------------------------------------------------
Expression name: ANONYMOUS
     Normalized: {true ? 1 : 2}

PC | ResultType | Command     | Param        | Stack | Description                                  | ArgNo{Start..End} | true ? 1 : 2
---------------------------------------------------------------------------------------------------------------------------------------
00 | Boolean    | Constant    | 'true'       |     1 | Optimization constant, CP="ALib Arithmetics" |                   |_^_
01 | NONE       | JumpIfFalse | 4 (absolute) |     1 | '?'                                          | 0{0..0}           |     _^_
02 | Integer    | Constant    | '1'          |     2 | Literal constant                             |                   |       _^_
03 | NONE       | Jump        | 5 (absolute) |     2 | ':'                                          | 0{2..2}           |         _^_
04 | Integer    | Constant    | '2'          |     3 | Literal constant                             |                   |           _^_

Note that while the program listing, for convenience, presents the destination address as an absolute program counter number (first column "PC"), relative addressing is used internally. The insertion of the two jump commands explains what is said in 11.5.4 Compile- And Evaluation-Time Optimization Of The Conditional Operator.
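
How two jump commands with relative addressing implement a conditional can be sketched as follows. This is an illustration only and not the library's code: the names `Op`, `Instr`, `runConditional` and `ternary` are hypothetical, and the sketch assumes that the conditional jump consumes the condition value from the stack.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical instruction set with a constant and two jump commands.
enum class Op { Constant, JumpIfFalse, Jump };

struct Instr {
    Op           op;
    std::int64_t operand;  // Constant: the value. Jumps: offset relative to the jump's own PC.
};

std::int64_t runConditional(const std::vector<Instr>& program) {
    std::vector<std::int64_t> stack;
    std::size_t pc = 0;
    while (pc < program.size()) {
        const Instr& in = program[pc];
        switch (in.op) {
        case Op::Constant:
            stack.push_back(in.operand);
            ++pc;
            break;
        case Op::JumpIfFalse: {
            assert(!stack.empty());
            const std::int64_t condition = stack.back();
            stack.pop_back();  // assumption: the condition is consumed
            pc += (condition != 0) ? 1 : static_cast<std::size_t>(in.operand);
            break;
        }
        case Op::Jump:
            pc += static_cast<std::size_t>(in.operand);
            break;
        }
    }
    assert(stack.size() == 1);
    return stack.back();
}

// Program for "q ? t : f", laid out like the listing above.
std::vector<Instr> ternary(std::int64_t q, std::int64_t t, std::int64_t f) {
    return {
        {Op::Constant,    q},
        {Op::JumpIfFalse, 3},  // PC 1 + 3 = PC 4, the "false" branch
        {Op::Constant,    t},
        {Op::Jump,        2},  // PC 3 + 2 = PC 5, the end of the program
        {Op::Constant,    f},
    };
}
```

Only one of the two branches is executed per run, which is why the unoptimized program keeps both branches in its code.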
Just for fun, we enable compile-time optimization and check the output:

------------------------------------------------------------------------------------------------
ALib Expression Compiler
(c) 2024 AWorx GmbH. Published under MIT License (Open Source).
More Info: https://alib.dev
------------------------------------------------------------------------------------------------
Expression name: ANONYMOUS
     Normalized: {true ? 1 : 2}

PC | ResultType | Command  | Param | Stack | Description      | ArgNo{Start..End} | true ? 1 : 2
------------------------------------------------------------------------------------------------
00 | Integer    | Constant | '1'   |     1 | Literal constant |                   |       _^_

The fifth and final command "Subroutine" is needed to allow Nested Expressions. We add an expression named "nested" and refer to it:

----------------------------------------------------------------------------------------------------------------------
ALib Expression Compiler
(c) 2024 AWorx GmbH. Published under MIT License (Open Source).
More Info: https://alib.dev
----------------------------------------------------------------------------------------------------------------------
Expression name: ANONYMOUS
     Normalized: {*nested}

PC | ResultType | Command    | Param     | Stack | Description                           | ArgNo{Start..End} | *nested
----------------------------------------------------------------------------------------------------------------------
00 | Integer    | Subroutine | *"nested" |     1 | Nested expr. searched at compile-time |                   |_^_

Using the alternative version that locates nested expressions at evaluation-time only, the program looks like this:

---------------------------------------------------------------------------------------------------------------------------------------------------------------
ALib Expression Compiler
(c) 2024 AWorx GmbH. Published under MIT License (Open Source).
More Info: https://alib.dev
---------------------------------------------------------------------------------------------------------------------------------------------------------------
Expression name: ANONYMOUS
     Normalized: {Expression( nested, -1, throw )}

PC | ResultType | Command    | Param                   | Stack | Description                              | ArgNo{Start..End} | Expression( nested, -1, throw )
---------------------------------------------------------------------------------------------------------------------------------------------------------------
00 | String     | Constant   | "nested"                |     1 | Literal constant                         |                   |            _^_
01 | Integer    | Constant   | '-1'                    |     2 | Literal constant                         |                   |                    _^_
02 | Integer    | Subroutine | Expr(name, type, throw) |     1 | Nested expr. searched at evaluation-time |                   |_^_
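
The difference between locating a nested expression at compile-time and at evaluation-time can be sketched with a simple name registry. Everything in this sketch is hypothetical and independent of the library's API: `Registry`, `bindAtCompileTime` and `bindAtEvaluationTime` are made-up names, and using the second argument as a fallback value when nothing is found is an assumption made for the illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>

using Expr     = std::function<std::int64_t()>;
using Registry = std::map<std::string, Expr>;

// Compile-time binding: the nested expression is looked up once, right now.
// A missing name is an error at compile-time.
Expr bindAtCompileTime(const Registry& registry, const std::string& name) {
    auto it = registry.find(name);
    if (it == registry.end())
        throw std::runtime_error("nested expression not found at compile-time: " + name);
    assert(it->second && "registered expression must be callable");
    Expr target = it->second;  // the found expression is captured by value
    return [target] { return target(); };
}

// Evaluation-time binding: the name is looked up on every evaluation.
// The registry must outlive the returned expression.
Expr bindAtEvaluationTime(const Registry& registry, const std::string& name,
                          std::int64_t fallback, bool throwIfMissing) {
    return [&registry, name, fallback, throwIfMissing]() -> std::int64_t {
        auto it = registry.find(name);
        if (it != registry.end())
            return it->second();
        if (throwIfMissing)
            throw std::runtime_error("nested expression not found at evaluation-time: " + name);
        return fallback;  // assumption of this sketch
    };
}
```

In this sketch, an expression bound at evaluation-time follows later changes of the registry, while one bound at compile-time keeps referring to the expression found when it was compiled.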

With these few simple samples, all five commands of class VirtualMachine are covered.

A.4 Notes On The Architecture Of The Library

Reading this short chapter is optional. We wrote it for those who want to study the source code, understand how module ALib Expressions is implemented, and maybe extend it or add internal features.

Often, there are two different perspectives needed when you think about the architecture of a software library. The first is from the viewpoint of the user of the library. This may be called the "API perspective". It basically asks: What types do I need to create and which methods do I need to invoke? The second is from the implementer's perspective. Here, it is more about what types implement which functionality and how do they interact internally.
During the development of this small library, these two perspectives were in constant conflict. The decision was taken to follow the needs of the API perspective.

A user of the library just needs to "see":

  • Type Compiler, which she extends with custom derived types of
  • type CompilerPlugin. Together, these create objects of
  • type Expression, which, under provision of an object of
  • type Scope, become evaluated. That's roughly it. Very simple.

From an implementation perspective, there are some more things:

  • Expression strings need to be parsed into an abstract syntax tree (AST),
  • ASTs need to be compiled into a program,
  • Programs need to be executed by a virtual machine,
  • Optimized programs need to be decompiled back into ASTs to create normalized, optimized expression strings.
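
The steps from AST to program and back can be sketched in a few lines. The sketch below is a strong simplification under stated assumptions: `Ast`, `Cmd`, `lit`, `bin`, `compile` and `decompile` are hypothetical names, only literals and binary operators exist, and the decompiler rebuilds a fully bracketed string directly instead of creating an AST first.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical AST node: a literal (no children) or a binary operator.
struct Ast {
    std::string          text;
    std::unique_ptr<Ast> lhs, rhs;
};

std::unique_ptr<Ast> lit(std::string text) {
    auto node  = std::make_unique<Ast>();
    node->text = std::move(text);
    return node;
}

std::unique_ptr<Ast> bin(std::string op, std::unique_ptr<Ast> lhs, std::unique_ptr<Ast> rhs) {
    auto node  = std::make_unique<Ast>();
    node->text = std::move(op);
    node->lhs  = std::move(lhs);
    node->rhs  = std::move(rhs);
    return node;
}

// Hypothetical VM command: pushes a literal or applies a binary operator.
struct Cmd {
    bool        isOperator;
    std::string text;
};

// Post-order traversal: arguments first, then the operator that consumes them.
void compile(const Ast& node, std::vector<Cmd>& program) {
    if (!node.lhs) {
        program.push_back({false, node.text});
        return;
    }
    compile(*node.lhs, program);
    compile(*node.rhs, program);
    program.push_back({true, node.text});
}

// "Runs" the program on a stack of strings to recover an expression string.
std::string decompile(const std::vector<Cmd>& program) {
    std::vector<std::string> stack;
    for (const Cmd& c : program) {
        if (!c.isOperator) {
            stack.push_back(c.text);
            continue;
        }
        assert(stack.size() >= 2);
        std::string rhs = std::move(stack.back()); stack.pop_back();
        std::string lhs = std::move(stack.back()); stack.pop_back();
        stack.push_back("(" + lhs + " " + c.text + " " + rhs + ")");
    }
    assert(stack.size() == 1);
    return stack.back();
}
```

For the AST of (2 + 3) * 7, `compile` emits five commands and `decompile` returns "((2 + 3) * 7)". Decompilation works like execution, except that the stack holds expression fragments instead of values.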

To keep the types that are needed from the API perspective clean and lean, responsibilities have been moved into maybe "unnatural" places. A few more quick bullets and we have said what this chapter aims to say:

  • Types necessary for the user may be abstract and show only a minimum set of interface methods. Corresponding implementations have been shifted to sub-namespace detail. The differentiation between the abstract base and the implementation is a pure design decision. It even costs some nanoseconds of run-time overhead by invoking virtual functions where no such abstract concept is technically needed (while it reduces compile time for a user's software).
  • To keep class Compiler clean, it just contains configuration options and holds the plug-ins, while
  • the compilation itself is implemented in class detail::Program. Maybe a class named "Program" should not compile and assemble itself. Well, but it does. If it didn't, the class would probably not exist: it would be just a std::vector of virtual machine commands residing in the expression. Therefore, it was a conveniently empty place to put the assembly code into, which keeps class Compiler free of it.
  • Well, and we admit: to keep the program concentrated on just assembly, the virtual machine has, besides its duty to run programs, two other responsibilities. The first can almost be considered "OK": in debug-compilations of the library, it creates program listings. But then:
  • The virtual machine decompiles programs back to ASTs!
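
The split between a lean abstract type and an implementation in a detail namespace, as described in the first bullet, can be sketched like this. The names are hypothetical and placed in a namespace `sketch` to make clear that this is neither the library's class Expression nor its factory interface.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

namespace sketch {

// What the user of the API sees: a minimal abstract interface.
class Expression {
public:
    virtual ~Expression() = default;
    virtual const std::string& name() const = 0;
    virtual long evaluate() const = 0;
};

namespace detail {

// What the implementer sees: all data and logic live here. In a real
// library, this class would reside in a separate header or source file
// that user code never includes.
class ConstantExpression final : public Expression {
public:
    ConstantExpression(std::string name, long value)
        : name_(std::move(name)), value_(value) {}

    const std::string& name() const override { return name_; }
    long evaluate() const override { return value_; }

private:
    std::string name_;
    long        value_;
};

}  // namespace detail

// Hypothetical factory: hands out the abstract type only.
std::unique_ptr<Expression> compileConstant(std::string name, long value) {
    assert(!name.empty());
    return std::make_unique<detail::ConstantExpression>(std::move(name), value);
}

}  // namespace sketch
```

Each call through the abstract type costs one virtual dispatch, which is the small run-time overhead mentioned above. In exchange, user code depends only on the small interface.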

This design and structure might be questionable. Probably, a virtual machine should not perform decompilation and should not "know" about ASTs, which otherwise constitute the intermediate data layer between a parser and a compiler. Please do not blame us. We do not foresee bigger feature updates of this library. If such were needed, the current code design may fail and need some refactoring. But as it stands, it is a compromise that strongly favors simplicity of the API as well as of the internal code.