The goal of this ALib Module is to provide a C++ library that enables custom software to let its end users write expression strings, which the software understands and evaluates at run-time.
Usually, achieving this requires implementing an expression parser, a compiler, and an evaluation engine. This is of course a lot of work, and a month of programming time is quickly consumed unless a programmer has done this several times before.
Let's quickly consider two samples.
( date > today - days(7) ) & ( name = "*.jpg" )
isFolder & notEmpty

( StartDate + years(10) < today ) & ( NumberOfSalaryRaises = 0 )
StartDate( find( "John", "Miller", "Accounting" ) )
BirthDate( find( 832735 ) )
While at first glance these are very different types of expressions, they still have a lot in common:
- Identifiers like date or name, and functions like find().
- Terms like today - days(7).
- Comparison operators like <, > or =.
- Boolean operators & or |.
The areas where the expressions of the two samples differ are:
With this said, we can much better explain what module ALib Expressions offers:
You will see later in this documentation that the amount of coding needed to implement functionality like that given in the samples above is surprisingly low.
To help you decide whether module ALib Expressions suits your needs, the "pros" and "cons" are listed in bullets. We start with the cons:
Reasons to NOT use ALib Expressions Library
The pros should be given as a feature list:
Features of ALib Expressions
- Ternary operator Q ? T : F.
- Elvis operator A ?: B.
- Subscript operator [] to access array elements, for example Preferences["DATA_FOLDER"] + "/database.dat".
More than 130 built-in functions and 180 (overloaded) operators!
Areas that are covered:
As a sample, the following expression:
compiles with (optional) built-in functionality. (Compile time less than 40 µs, evaluation time less than 15 µs, on a year 2018 developer machine.)
- Expression function Format(formatString, ...) for string formatting.
- Operator aliasing: for example, '&' on boolean operands converts to '&&', and the assign operator '=' converts to '=='. This allows '==' to be aliased by assign operator '=', which is more intuitive to end-users.
- Very fast expression evaluation.
The expression compiler performs various optimizations. For example, expression 2 * 3 + 4 results in one single program command that provides the constant result 10.
Compile-time optimization is also supported with custom identifiers, functions and operators.
This documentation switches between in-depth informational sections and tutorial-like sample sections. Let's start with a quick tutorial section!
What is "hello world" for each new programming language is a "simple calculator" for expression compilers. Here is the code for implementing one using module ALib Expressions:
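The sample's source code is not reproduced in this extract. The following is a minimal sketch of its core, under the assumption of the shown header path and of the Scope constructor taking the compiler's CfgFormatter; only methods named in this manual (SetupDefaults, Compile, Evaluate, Unbox) are used.

#include <iostream>
#include "alib/expressions/compiler.hpp"     // header path assumed

using namespace alib;
using namespace alib::expressions;

int main()
{
    Compiler compiler;
    compiler.SetupDefaults();                           // attach the built-in compiler plug-ins

    // The real sample reads the expression string from the command line.
    auto expression= compiler.Compile( A_CHAR("1 + 2 * 3") );

    Scope scope( compiler.CfgFormatter );               // evaluation-time scope (constructor argument assumed)
    Box   result= expression->Evaluate( scope );

    // The real sample also prints the normalized expression string and handles
    // arbitrary result types; here only the integer case is shown.
    std::cout << "Result: " << result.Unbox<integer>() << std::endl;
    return 0;
}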
Compile the program and run it, passing some simple sample expressions (or be lazy and just read on). We give it some tries:
Input:      1 + 2 * 3
Normalized: 1 + (2 * 3)
Result:     7
Fine, it calculates! Notable on this first simple sample are the brackets inserted in what we call the "normalized" expression string. Compare this to the next sample:
Input:      1 * 2 + 3
Normalized: 1 * 2 + 3
Result:     5
Why are the brackets gone here, while in the first case they had been redundant anyhow? The answer is that human beings could easily misunderstand the first version, so module ALib Expressions feels free to help make an expression more readable.
You think this is childish? Ok, then what do you think about this expression:
true && false == true < false
Either you are "a pro" or you need to consult a C++ reference manual and check for the operator precedence. Here is what our calculator says:
Input:      true && false == true < false
Normalized: true && (false == (true < false))
Result:     true
The insertion of redundant brackets is one of more than 30 normalization options that are switchable with enumeration flags.
The recent sample has more to show:
- Identifiers true and false: Note that we use the term "identifier" for parameterless expression functions. By default, the parameter brackets can be omitted with parameterless functions.
Functions with parameters are for example found in the area of maths:
Input:      asin(1.0) * 2.0
Normalized: asin( 1.0 ) * 2.0
Result:     3.141592653589793
or with string processing:
Input:      tolo("Hello ") + toup("World")
Normalized: ToLower( "Hello " ) + ToUpper( "World" )
Result:     hello WORLD
"tolo()" and "toup()"? Well, ALib Expressions support shortcuts for function names. Normalization optionally replaces abbreviated names.
Finally, a more complex string function sample:
Input:      Format( "Today is: {:yyyy/MM/dd}", today )
Normalized: Format( "Today is: {:yyyy/MM/dd}", Today )
Result:     Today is: 2024/12/15
As can be seen, a whole lot of identifiers, functions and operators are already available with the simple calculator example. All of these built-in definitions can be switched off. In fact, the built-in functionality is implemented with the very same interface that custom extensions would use. The only difference between built-in expression identifiers, functions and operators and custom ones is that the built-in ones are distributed with the library.
To get an overview of the built-in functionality, you might have a quick look at the tables found in the reference documentation of the following classes:
To fully understand this tutorial and the library source code, and finally as a prerequisite to implementing your custom expression compiler, a certain level of understanding of some underlying libraries and principles is helpful.
As mentioned in the introduction, module ALib Expressions makes intensive use of underlying module ALib Boxing.
For the time being, let's quickly summarize what module ALib Boxing provides:
For all details, a comprehensive Programmer's Manual for ALib Boxing is available.
The type-safety mechanisms and the possibility of querying the type encapsulated in a box are used by module ALib Expressions in an inarguably lazy fashion: wherever this expression library needs type information, such information is given as a "sample box", which is created with a sample value of the corresponding C++ type.
Consequently, the value stored in (and passed with) the box is ignored and may even become invalid after the creation of the box without any harm (for example in cases of pointer types).
While this approach causes a little overhead in run-time performance, the benefit in respect to simplification of the API surpasses any such penalty by far! Also, note that the performance drawback is restricted to the code that compiles an expression. During the evaluation, no "sample boxes" are created or passed.
The following code shows how to create sample boxes for some of the built-in standard types:
Box sampleBool    = false;
Box sampleInteger = 0;
Box sampleFloat   = 0.0;
Box sampleString  = String();
The values assigned in the samples are meaningless. Instead of false, the value true could be used, and instead of 0.0, we could have written 3.1415. Note that the construction of the empty String instance will even be optimized away by the C++ compiler in release compilations.
For custom types, there is no need for more efforts, as this code snippet demonstrates:
struct Person
{
    String  Name;
    int     Age;
    String  Street;
    String  City;
    String  PostCode;
};

Box samplePerson= Person();
By default, with ALib Boxing, non-trivial C++ types that do not fit into the small placeholder embedded in the box are boxed as pointers. This means that even though a value of a custom type was assigned to the box, a pointer to it is stored. In the sample above, the pointer will be invalid in the next line, but that is OK, as only the type information stored in the box is of interest.
Therefore, we can "simplify" the previous code to the following:
Box samplePerson= reinterpret_cast<Person*>(0);
Besides the advantage that this omits the creation of an otherwise unused object, this approach is the only way to get sample boxes of abstract C++ types!
The magic of module ALib Boxing makes life as simple as this! Let us preempt what is explained in the following chapters: All native callback functions to be implemented for custom operators, identifiers and functions are defined to return an object of type Box. Thus, these functions can return values of arbitrary custom type. The type of the returned (boxed) value has to correspond with what a custom CompilerPlugin suggested by providing a sample box at expression compile-time. Once understood, this is all very simple!
Where possible, parameter type const Box& is used to indicate that a box received is a sample box and not a real value. However, sometimes this is not possible. In these cases the parameter or member itself, as well as the corresponding documentation, will give a hint whether an object is just a "sample box" or a boxed value.

A design decision of this ALib Module is to rather use "classic" virtual types instead of templates, with all the pros and cons of such a decision taken into account. As a result, some "contracts" have to be fulfilled by the user of the library. The term "contracts" here means: if at some place a certain specialization of a virtual type is expected, at a different place the creation of an object of that virtual type has to be ensured. Details of these contracts will be explained in the next chapters.
ALib sometimes uses what we call "Bauhaus Code Style". It is not easy to state exactly what we mean by this, but a little notion of what it could be may have come to a programmer's mind already by reading the previous two chapters about:
In addition, it is notable that a lot of the types of module ALib Expressions are structs rather than classes. Hence, fields and methods are exposed publicly.
The goal of this library is to allow other software (libraries or internal units of a software) to expose an interface that has two main functions:
Now, let's take a sample: a list of files should be "filtered" by name, size, date, etc. The custom library or internal software unit would probably expose a filter class whose constructor takes an expression string, along with a method Includes that returns true if a given file matches the filter. Using this custom class could look like this:
FileFilter photosOfToday( "name * \"*.jpg\" && date >= today" );

if( photosOfToday.Includes( aFile ) )
{
    ...
}
As is easily understood, nothing of module ALib Expressions needs to be exposed to the "end user" of the code. Especially:
This all means that the "natural way" of using module ALib Expressions automatically hides away all internals, which in turn gives this module the freedom to generously use Bauhaus style, which here finally translates to:
After this already lengthy introduction and discussion of prerequisites, it is now time to implement custom expression logic. The sample application that we use to demonstrate how this is done, implements expressions to filter files of directories, as it may be required by a simple file search software or otherwise be used by a third party application.
As a foundation, we are using the Filesystem Library of C++17. Note that, as of the time of writing this documentation, this is an upcoming feature: with some compilers it might not be available yet, or instead of header #include <filesystem>, header #include <experimental/filesystem> needs to be used. This library originates from a development of the boost C++ Libraries, and even if you have never used it, this should not introduce any burden to understanding this sample, as it is very straightforward.
For example, the following few lines of code:
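The snippet itself is not included in this extract; a sketch of the idea (directory path handling assumed) looks like this:

#include <iostream>
#include <filesystem>             // with older toolchains: <experimental/filesystem>

namespace fs = std::filesystem;   // respectively: namespace fs = experimental::filesystem;

void listDirectory( const fs::path& sourceDir )
{
    // Iterate over the given directory and print the name of each entry found.
    for( auto& directoryEntry : fs::directory_iterator( sourceDir ) )
        std::cout << directoryEntry.path().filename().string() << std::endl;
}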
produce a plain listing of the files and folders found in the given directory.
Now, the loop of the above sample should be extended to use a filter to select a subset of the files and folders to be printed. Hence, a filter is needed. We start with a skeleton definition of a struct:
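The skeleton is not reproduced here; a minimal sketch (the names FileFilter and Includes match the later samples, the rest is an assumption) could look as follows:

// Skeleton: the expression string is accepted but ignored, and every entry is included.
struct FileFilter
{
    FileFilter( const String& expressionString )
    {
        (void) expressionString;              // not used yet
    }

    bool Includes( const fs::directory_entry& directoryEntry )
    {
        (void) directoryEntry;
        return true;                          // constant result, for now
    }
};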
As we have no clue yet how our custom filter expressions will look, we pass a dummy string, which is anyhow ignored by the filter skeleton. The loop then looks as follows:
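A sketch of the adapted loop, reusing the names introduced above:

FileFilter filter( A_CHAR("not yet used") );  // dummy expression string

for( auto& directoryEntry : fs::directory_iterator( sourceDir ) )
    if( filter.Includes( directoryEntry ) )
        std::cout << directoryEntry.path().filename().string() << std::endl;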
Of course, the output of this loop remains the same, because constant true is returned by the filter skeleton's method Includes.
What we nevertheless have achieved: The interface of how ALib Expressions will be used is already defined!
This is a good point in time to quickly sort out the different perspectives on "interfaces", "libraries" or "APIs" explicitly:
The goal should be that on the 2nd level, the API of the 1st level (which is this ALib Expressions library), is not visible any more.
Well, and with the simple skeleton code above, this goal is already achieved!
The next step is about adding all components that we need to compile and evaluate expression strings to the filter class. And this is not much effort. We had seen the ingredients before in the sample code of previous section 2. Tutorial: Hello Calculator.
Because it is so simple, we just present the resulting code of the filter class:
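The resulting class is not quoted in this extract. The following sketch shows the idea; the member type Expression and the Scope constructor argument are assumptions, while the methods used (SetupDefaults, Compile, Evaluate, Unbox) are those introduced with the calculator sample:

struct FileFilter
{
    Compiler    compiler;
    Scope       scope;
    Expression  expression;

    FileFilter( const String& expressionString )
    : compiler()
    , scope( compiler.CfgFormatter )                    // constructor argument assumed
    {
        compiler.SetupDefaults();                       // attach built-in compiler plug-ins
        expression= compiler.Compile( expressionString );
    }

    bool Includes( const fs::directory_entry& directoryEntry )
    {
        (void) directoryEntry;                          // not used yet (see the next sections)
        return expression->Evaluate( scope ).Unbox<bool>();
    }
};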
Et voilà: We can now use expression strings to filter the files. Here are two samples:
Sample 1: All files are included with constant expression "true":
The output is:
Sample 2: All files are filtered out with constant expression "false":
Which results in the empty output:
--- Files using expression {false}: ---
While this demonstrates fast progress towards our aim to filter files, of course we have not linked the expression library with this custom code example yet. All we can do is provide expressions that do not refer to the given file, and hence evaluate either to true for any file or to false.
But before we feel free to start working on this, we first need to put one stumbling block aside.
In the samples above we used the simple, constant expressions "true" and "false". As we already learned in chapter 3, these are built-in identifiers that return the corresponding boolean value. Well, and a boolean value is what the filter needs. Other valid expressions would be

5 > 3                 // constant true
Year(Today) < 1984    // constant false
"Valid" here means, that the expression returns a boolean value! But what would happen if we constructed the filter class with expression string
1 + 2
which returns an integral value? The answer is that in method Includes of the filter class presented in the previous sections a run-time assertion would be raised in the following line of code:
return expression->Evaluate( scope ).Unbox<bool>();
The code unboxes a value of type bool, but it is not asserted that the result of the evaluation is of that type. This quickly leads us to an enhanced version of that method:
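One possible enhanced version is sketched below; how the error is reported (exception, log entry, or a default value) is up to the using code and not prescribed by this manual:

bool Includes( const fs::directory_entry& directoryEntry )
{
    (void) directoryEntry;
    Box result= expression->Evaluate( scope );

    // Assure that the expression evaluated to a boolean value before unboxing it.
    if( !result.IsType<bool>() )
        return false;                // or: throw/log an error

    return result.Unbox<bool>();
}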
So here is some bad news: it is obvious that there is no way around the effort of throwing and catching exceptions (or otherwise doing some error processing) as soon as software allows an end-user to "express herself" by passing expression strings to it. Besides wrong return types, the whole expression might be malformed, for example by omitting a closing bracket or any other breach of the expression syntax rules.
The good news is that, with the use of module ALib Expressions, most - if not all - of the errors can be handled already at compile-time! Once an expression is compiled, not much can happen when it is later evaluated.
And this is also true for our current threat of facing a wrong result type: Due to the fact that module ALib Expressions implements a type-safe compiler, we can detect the result type at compile-time.
Consequently, we revert our most recent code changes and rather check the result type already right after the compilation:
Permissions & OwnerWrite == OwnerWrite
Permissions & OwnerWrite
It is time to finally make our sample meaningful, namely to allow filtering selected files by their attributes.
For this, two steps are needed. The first again is extremely simple: we have to expose the current directory entry of our filter loop to the file filter. All we need to do is specialize class Scope to a custom version that provides the current object.
Here is our new struct:
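A sketch of such a specialization (the name FFScope and the use of inherited constructors are choices of this sketch):

struct FFScope : public Scope
{
    // The current directory entry; set by the filter before each evaluation.
    const fs::directory_entry*  directory  = nullptr;

    using Scope::Scope;      // inherit the parent constructor(s)
};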
With this in place, we just need two small changes in our file filter:
Now, the expression's detail::Program that gets compiled in the constructor of the filter class and that is executed by the built-in detail::VirtualMachine with the invocation of Evaluate, potentially has access to the directory entry.
The next section connects the final dots and leads to a working sample.
We have come quite far without ever thinking about the syntax of the custom expressions that we need to be able to filter files from a directory. Without much reflection of that, it is obvious that filtering files by name should be enabled, maybe with support of "wildcards" just like most users know them from the command prompt:
ls -l *.hpp    // GNU/Linux
dir *.hpp      // Windows OS
Thus, the first thing we need is to retrieve the file name from the entry. This is done with a simple custom identifier. As it was said already, an identifier is a "parameterless function". So why don't we need a parameter, namely the file entry in the expression syntax? Well, because the entry is part of the scope. It is the central piece of custom information that the whole effort is done for. Therefore, the expression:
Name
should return the name of the actual directory entry that is "in scope". This is delightfully simple, so let's start. Again we begin with a skeleton struct, this time derived from CompilerPlugin:
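A sketch of the skeleton; the parameters of the parent constructor (a plug-in name and the compiler) are assumptions:

struct FFCompilerPlugin : public CompilerPlugin
{
    FFCompilerPlugin( Compiler& compiler )
    : CompilerPlugin( "FF Plug-in", compiler )      // parent constructor arguments assumed
    {}

    // The TryCompilation override is added in the next step.
};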
To make use of the plug-in, we have again two small changes in the custom filter class:
With this, the plug-in is in place and during compilation it is now asked for help. Parent class CompilerPlugin exposes a set of overloaded virtual functions named TryCompilation. In their existing default implementation, each function just returns constant false, indicating that the plug-in is not responsible. Thus, we now have to make our plug-in responsible for identifier "Name". For this we choose to override one of the offered virtual functions as follows:
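A sketch of such an override is given below. Caution: the member names used on struct CIFunction (QtyArgs, Name, Callback, TypeOrValue) are assumptions of this sketch; consult the reference documentation of CompilerPlugin for the actual interface.

virtual bool TryCompilation( CIFunction& ciFunction )                      override
{
    // Responsible only for parameterless function (identifier) "Name".
    if( ciFunction.QtyArgs() == 0 && ciFunction.Name.Equals( A_CHAR("Name") ) )
    {
        ciFunction.Callback    = getName;          // native callback (see next step)
        ciFunction.TypeOrValue = Types::String;    // sample box denoting the result type
        return true;
    }
    return false;
}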
As the code shows, the overridden function simply checks for the given name and the function "signature". If both match, then a native C++ callback function is provided together with the expected result type of that callback function.
The final step before we can test the code is to implement the callback function. This is usually done in an anonymous namespace at the start of the compilation unit of the plug-in itself. The signature of any callback function that ALib Expressions expects is given with CallbackDecl. The documentation shows that it has three parameters: the scope and the begin- and end-iterators for the input parameters. The input parameters are boxed in objects of class Box, and the same type is expected to be returned.
Because ALib Boxing makes a programmer's life extremely easy, especially when used with various kinds of strings, and because we are not reading any input parameters, the implementation of the callback function is done with just one line of code:
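A sketch of the callback follows. The original is a one-liner that lets the scope manage the string memory; to keep this sketch self-contained, a thread-local buffer is used instead to keep the boxed character array alive (boxing std::string requires ALib's std-string compatibility header).

namespace
{
    Box getName( Scope& scope, ArgIterator, ArgIterator )
    {
        // Fetch the current directory entry from our custom scope and return its file name.
        static thread_local std::string nameBuffer;
        nameBuffer= dynamic_cast<FFScope&>( scope ).directory->path().filename().string();
        return nameBuffer;
    }
}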
We are set! Our first "real" filter expressions should work. Here are some filter loops and their output:
Sample 1:
Output:
Sample 2:
Output:
Sample 3:
Output:
This seems to work - mission accomplished!
Some notes on these samples:
- Because identifier Name does not introduce a custom type, but returns built-in type Types::String, no operators have to be overloaded. In later chapters we will see what needs to be done when custom types are returned by identifiers, functions or operators.
- The samples use binary operator '*', with left- and right-hand sides being strings. This binary operator is also provided with plug-in Strings and is just an "alias" for function WildcardMatch.

We could now easily continue implementing further identifiers, for example:
- An identifier that returns true if the directory entry is a subfolder, and false if it is a file.

This would lead to inserting further if-statements into the custom plug-in, similar to the one demonstrated for identifier Name.
Before this should be sampled, the next chapter explains the general possibilities of compiler plug-ins and shows how the creation of a plug-in can be even further simplified.
In the previous tutorial section, a fully working example program was developed that allows using custom expression strings to filter files and folders by their name.
It was demonstrated how to attach a custom compiler plug-in to the expression compiler, which selects a native C++ callback function at compile-time. This callback function is then invoked each time a compiled expression is evaluated against a scope. The sample implemented the retrieval of a string value from an object found in a custom specialization of class Scope.
When an expression string gets compiled, such compilation is done in two phases. The first step is called "parsing".
The result of the parsing process is a recursive data structure called "abstract syntax tree". The nodes of this tree can represent one of the following types:
- Literal: a constant value given in the expression string.
- Identifier: a parameterless function; its name consists of alphanumeric characters and '_'.
- Function: an identifier followed by brackets '()'. Within the brackets, a list of expressions, separated by a comma (','), may be given. Hence, functions are n-ary nodes, having as many child nodes as parameters are given in the brackets.
- Unary operator: for example, boolean negation ('!') or arithmetic negation ('-'). These nodes have one child node.
- Binary operator: for example, boolean and ('&&') or arithmetic subtraction ('-'). These nodes have two child nodes.
- Ternary operator: the conditional operator Q ? T : F. These nodes have three child nodes. If Q evaluates to true, the result is T, otherwise it is F.

This first phase of compilation, which builds the AST (abstract syntax tree), usually does not need too much customization.
It could reasonably be argued that building this tree is all that an expression library needs to do, and in fact, many similar libraries stop at this point. What needs to be done to evaluate an expression is to recursively walk the AST in a so-called "depth-first search" manner and perform the operations. The result of the evaluation would be the result of the root node of the tree.
ALib Expressions goes one step further, performing a second phase of compilation. In this phase, the recursive walk over the AST is done. The result of the walk is an expression Program. Such a program is a list of "commands" which are later, when the expression is evaluated, executed by a virtual stack machine. (This stack machine is implemented with class detail::VirtualMachine.)
This second phase is where the customization takes place. When a node of the AST is translated into a program command for the virtual machine, the compiler iterates through an ordered list of CompilerPlugins to ask for compilation information. As soon as one plug-in provides such info, the compiler creates the command and continues walking the tree.
Now, what does the compiler exactly "ask" a plug-in for, and what information is included in the question? To answer this, let us first look at the list of AST nodes given above. Of the six types of AST nodes listed, two do not need customization. These are literals and the ternary operator. What remains is
It was mentioned before that ALib Expressions is type-safe. To achieve this, the result type of each node is identified (deepest nodes first). Whenever a node with child nodes is compiled, the result type of each child node has already been identified.
With this in mind, the input and output information that compiler plug-ins receive and return becomes obvious. Input is:
The output information is:
- Whether the plug-in chose to compile the node: in that case, it returns true.

To finalize this section, a quick hint at the benefits of taking this approach should be given:
With the information given in the previous subchapter, some important consequence can be noted:
This fact in turn leads to the following statements:
As a sample, let's take two simple expressions
1 + 2
"Result " + 42

Both expressions consist of two literal nodes, which are the two children of binary operator '+'. As literals are not compiled using plug-ins, only the binary operator is passed to the plug-ins. To successfully compile both, plug-ins have to be available that cover the following permutations:

binary op, '+', integer, integer
binary op, '+', string, integer
For the addition of integer values, built-in compiler plug-in Arithmetics is responsible. For the concatenation of integer values to string values, plug-in Strings steps in.
The documentation of the plug-ins therefore mainly consists of tables that list permutations of operators, function names and input types, together with a description of what is done in the C++ callback function and what result type is to be expected.
The use of the built-in plug-ins is optional and configurable. Configuration is done by tweaking member Compiler::CfgBuiltInPlugins before invoking method Compiler::SetupDefaults. But a use-case for doing so is not easy to find, also because custom plug-ins default to a higher priority and this way may replace selected built-in behavior.
To implement a custom compiler plug-in, the following "bottom-up" approach is recommended:
To finalize this chapter, some obvious facts should be named:
"1 + 2"
calculation might be handled by custom code.
- The same operator (for example, comparison operator '<') is usable with various combinations of argument types provided with different built-in compiler plug-ins.

After a lot of theory was given, it is now quite straightforward to explain how struct CompilerPlugin is used.
The struct provides an inner struct CompilationInfo, which is the base of several further derived (inner) specializations. The base struct exposes the common base of the input and output information provided to, and received from, compiler plug-ins. According to the different node types of the parsed AST, the specializations are:
Along with this, for each of these structs, an overloaded virtual method called TryCompilation is defined. A custom plug-in now simply derives from the plug-in struct and overrides one or more of the virtual methods. The original implementation of the base struct returns constant false. In the case that the given information corresponds to a permutation that the custom plug-in chooses to compile, the plug-in needs to fill in the output parameters of the given struct and return true.
The architecture of the expression compiler and the use of according plug-ins was explained and we could continue now with extending the sample plug-in given in section 4.5 Implementing A Compiler Plug-In.
This would quickly lead to inserting a bunch of if-statements into the already overridden method TryCompilation. Considering all possible permutations of operators and types, this results in repetitive code. To avoid this, the library provides an optional helper-class.
All built-in compiler plug-ins (with the exception of ElvisOperator and AutoCast) use this class and are therefore not derived from CompilerPlugin, but from plugins::Calculus.
The trick with that type is that permutations of operators, identifiers, function names and argument types are provided as static table data, together with the information of how to compile the permutations.
Then, in a custom constructor, these static tables are fed into a hash table that allows a performant search. The custom plug-in does not need to override any TryCompilation method, as class Calculus provides a default implementation that simply searches the hash table.
Consequently, all that these built-in plug-ins do is feed their keys and corresponding callback methods into the hash table during construction. This is not just very efficient in respect to this library's code size and the compilation performance of expressions, it also makes the creation of a plug-in an even simpler and more straightforward task.
The permutations of function arguments that class Calculus uses to identify static compilation information include an option to keep a trailing portion of such arguments variadic. A sample of such a variadic function implemented using this helper-class is expression function Format.
We now go back to our tutorial sample and add more file filter functionality, by using this helper-class Calculus.
Before we start adding new features to the sample code of section 4. Tutorial: Implementing A File Filter the first task is to refactor the sample to use helper-type plugins::Calculus.
The callback function that the already presented sample plug-in defined was:
Furthermore our compiler plugin was derived from CompilerPlugin and implemented method TryCompilation for functions (identifiers):
The callback function remains untouched. Struct FFCompilerPlugin is changed in three aspects:
The resulting code of the plugin looks as follows:
We can now finally continue with adding more functionality to our file filter sample. At the end of chapter 4.5 Implementing A Compiler Plug-In we already thought about what we could add:
- An identifier that returns true if the directory entry is a subfolder, and false if it is a file.

OK, let's do that! First we add some boxed values that define constants for permission rights. This is still done in the anonymous namespace; hence the following boxes are on namespace scope, just as the callback functions are:
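A sketch of such constants (variable names chosen here; <type_traits> is needed for the underlying-type cast):

namespace
{
    // Both casts discussed below are applied: first to the underlying integral type
    // of fs::perms, then to ALib's signed 'integer' type.
    Box constOwnRead  = static_cast<integer>( static_cast<std::underlying_type_t<fs::perms>>( fs::perms::owner_read  ) );
    Box constOwnWrite = static_cast<integer>( static_cast<std::underlying_type_t<fs::perms>>( fs::perms::owner_write ) );
    Box constOwnExec  = static_cast<integer>( static_cast<std::underlying_type_t<fs::perms>>( fs::perms::owner_exec  ) );
    Box constGrpRead  = static_cast<integer>( static_cast<std::underlying_type_t<fs::perms>>( fs::perms::group_read  ) );
}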
We are doing two casts here: The first is to get the underlying integral value from the filesystem library's constants. If we did not do this, we would introduce a new type to ALib Expressions. In principle, this would not be a bad thing! The advantages and disadvantages will be explained in a later chapter.
The second cast is to convert the unsigned integral value to a signed one. Again, if we did not do this, we would introduce a new type, namely uinteger. Note that this library does not provide built-in operators for unsigned integers.
With these casts, the permission values become compatible with built-in binary operators DefaultBinaryOperators::BitAnd, DefaultBinaryOperators::BitOr and DefaultBinaryOperators::BitXOr, which are defined for built-in type Types::Integer, which in turn is nothing else but an integer!
Next, we add the new callback functions:
All that is left to do is "announcing" the availability of these constants and functions to class Calculus in the constructor of the custom plug-in. As shown before, functions are added to table Calculus::Functions. The constant, parameterless functions are put into a simplified version of this table found with field Calculus::ConstantIdentifiers.
The entries of both tables expect an object of type Token. This object is used by class Calculus to match identifiers and functions found in expression strings against the names that are defined by a plug-in. With the use of class Token, a flexible way of optional name abbreviation is provided, taking "CamelCase" or "snake_case" token formats into account. In our case, for example, we allow all constant identifiers to be shorted to just two letters. For example Identifier "OwnerExecute" can be abbreviated "OE", "oe", "ownR", etc.
Here comes the sample snippet:
After all this theory and discussion, this is surprisingly simple and short code! Our file filter is already quite powerful. Here are some sample expressions and their output:
--- Filter Expression {IsDirectory}: ---
detail plugins util

--- Filter Expression {!IsDirectory && size < 20000}: ---
standardrepository.cpp expression.cpp expression.hpp standardrepository.hpp compiler.cpp scope.hpp

--- Filter Expression {date > DateTime(2019,2,5)}: ---
compilerplugin.hpp standardrepository.cpp expression.cpp expression.hpp standardrepository.hpp detail plugins compiler.hpp expressionscamp.cpp compiler.cpp util scope.hpp expressionscamp.hpp

--- Filter Expression {(permissions & OwnerExecute) != 0}: ---
detail plugins util

--- Filter Expression {size > 20480}: ---
compilerplugin.hpp compiler.hpp expressionscamp.cpp expressionscamp.hpp
(If you wonder why file expressionslib.cpp is so huge, the answer is: it contains this whole manual and tutorial that you are just reading, created with the marvelous Doxygen!)

The latest sample expression was:
size > 81920
It would be nicer to allow:
size > kilobytes(80)
Ok, let us add three functions. Here are the callbacks:
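A sketch of the three callbacks (function names chosen here); each unboxes its single integer argument and scales it:

namespace
{
    Box kiloBytes( Scope&, ArgIterator args, ArgIterator )  { return (*args).Unbox<integer>() * 1024; }
    Box megaBytes( Scope&, ArgIterator args, ArgIterator )  { return (*args).Unbox<integer>() * 1024 * 1024; }
    Box gigaBytes( Scope&, ArgIterator args, ArgIterator )  { return (*args).Unbox<integer>() * 1024 * 1024 * 1024; }
}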
The functions unbox the first parameter. For this, due to the type-safe compilation of ALib Expressions, neither the availability nor the type of the given argument needs to be checked.
Next we need to define the function "signature", which is defining the number and types of arguments that the functions expect. Class Calculus allows us to do this in a very simple fashion. It is just about defining an array of pointers to sample boxes. As all three simple functions have the same signature (they all just receive one argument of type integer), we need only one signature object:
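Using the array name TakesOneInt that is referred to later in this section, such a signature object might look as follows (the exact declaration form - a plain array of pointers to sample boxes - is an assumption):

namespace
{
    // One argument of type integer: the shared signature of the three functions above.
    Box*  TakesOneInt[1]  = { &Types::Integer };
}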
This was all we needed to prepare: here is the new version of the plug-in:
Macro CALCULUS_SIGNATURE simply provides two arguments from the one given: the pointer to the start of the array along with the array's length. Those two values will be assigned to fields FunctionEntry::Signature and FunctionEntry::SignatureLength of function table records.
And here is a quick test using one of the functions:
This worked well!
Array TakesOneInt is later removed and replaced by the corresponding field of the library, because custom code is very well allowed to use the built-in signature arrays.

A picky reader might now think: well, it is more efficient to use expression:
size > 81920
instead of:
size > kilobytes(80)
because the latter introduces a function call and hence is less efficient. But this is not the case, at least not in respect to evaluating the expression against a directory entry. The evaluation time of both expressions is exactly the same, because both expressions result in exactly the same expression program.
The only effort for the library is at compile-time. While later chapter 11.5 Optimizations will discuss the details, here we only briefly note what is going on: the definition entry of the function table for function Kilobytes states Calculus::CTI in the last column. This tells class Calculus that the function might be evaluated at compile-time in the case that all arguments are constant. Because the single argument given is constant literal 80, this condition is met. Thus, the callback function is invoked at compile-time and, instead of the function's address, the result value is passed back to the compiler. The compiler notes this and replaces the original command that created the constant value 80 with the constant result value 81920. This is why both expressions lead to exactly the same program.
In contrast to this, the identifiers of the previous chapter are marked as Calculus::ETI, which means "evaluation-time invokable only". The obvious rationale is, that these functions access custom data in the Scope object and such custom data is available only when the expression is evaluated for a specific directory entry.
Next, some binary operator definitions are to be showcased.
We had implemented identifier Permissions to return a value of Types::Integer instead of returning the C++ 17 filesystem library's internal type. The advantage of this was that the built-in bitwise-boolean operators defined for integral values, could instantly be used with expressions. This was demonstrated in above sample expression:
(permissions & OwnerExecute) != 0
The disadvantage is that the filter expressions are not really type-safe. An end-user could pass the expression:
(permissions & 42) != 0
without receiving an error. While this is a design decision when using ALib Expressions, in most cases, type-safeness has definite advantages. To achieve type-safeness, we now change the definition of the callback function of identifier Permission as follows:
In the previous version we had cast the enumeration elements of fs::perms to their underlying integral type. Now we are boxing the un-cast enumeration element value.

To denote type fs::perms as being the return type of identifier Permission, we need a sample box. This is an easy task: we just randomly choose one enumeration element and assign it to a new variable of type Box.

The next small change needed results from a requirement of class Box: global (or static) objects must not be initialized with custom types (in this case with elements of enum fs::perms). Such initialization has to happen after module ALib Boxing is duly bootstrapped. Therefore, the initializations of all constant boxes, as well as of the sample box, are now moved to the constructor of the compiler plug-in.
The new code for the compiler plug-in's constructor now is:
Apart from initializing the constant boxes, the only new line of code is the definition of the sample box for the return value, which is then used in the function table to denote the return type of function Permissions.
Note that sample box TypePermission could have been omitted and just one of the constant values, e.g., constOwnRead, could have been used as the return type sample box in the function table entry. Here we opted for an additional object of type Box, which resides in the compiled software occupying 24 bytes, in favor of better readable code. Real-life plug-ins could find other solutions, e.g., using a preprocessor macro, to save this small overhead.
The compiler throws a run-time exception, noting that operator '&' is not defined. The first thing we want to fix is the output information of this Exception itself. While in general it is not necessary to announce custom types explicitly, the exception is that the human-readable information collected in exceptions thrown by the library benefits from it. For just this purpose, method Compiler::AddType is available. Consequently, we add a corresponding statement to the constructor of our plug-in:
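A sketch of that statement; the name of the inherited compiler reference (Cmplr) and the second parameter (a human-readable type name) are assumptions:

Cmplr.AddType( TypePermission, "Permission" );   // announce fs::perms under a readable name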
With this in place, the exception thrown looks as follows:
This looks better, but still it's an exception. What it tells us is to define the operator. We do this for a bunch of operators at once. Firstly, we need the callbacks for the operators:
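A sketch of such callbacks for type fs::perms (function names chosen here):

namespace
{
    Box perm_and( Scope&, ArgIterator args, ArgIterator )
    {
        // The two operands are accessed with simple iterator arithmetics.
        return  (*args).Unbox<fs::perms>()  &  (*(args + 1)).Unbox<fs::perms>();
    }

    Box perm_or ( Scope&, ArgIterator args, ArgIterator )
    {
        return  (*args).Unbox<fs::perms>()  |  (*(args + 1)).Unbox<fs::perms>();
    }

    Box perm_eq ( Scope&, ArgIterator args, ArgIterator )
    {
        // Comparison operators return a boolean value.
        return  (*args).Unbox<fs::perms>()  ==  (*(args + 1)).Unbox<fs::perms>();
    }
}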
This is the first time that two parameters are read in the callbacks. It is done using simple iterator arithmetics.
Struct Calculus organizes compilation information on unary and binary operators in a hash map. For filling the map, a convenience function is available that accepts a simple array of information entries. This array usually is defined in the anonymous namespace of the compilation unit:
For information about the meaning of the values of the table, consult the documentation of Calculus::OperatorTableEntry. But looking at the code, and reflecting what was already presented in this tutorial, the meaning should be quite self-explanatory. It should just be noted that also for operators, flags Calculus::CTI or Calculus::ETI may be given. If, like in our case, CTI is specified, then at the moment that both operands are constant, the compiler will optimize and the callbacks are pruned from the compiled expression. This means that, for example, sub-expression:
( OwnerRead | GroupRead | OwnerExecute | GroupExecute )
will be reduced to one single constant in the compiled expression program, because each of the identifiers returns a constant value.
Finally, in the constructor of the plug-in we now add the following line of code:
With this in place, the expression now compiles in a type-safe way:
(permissions & OwnerExecute) != 0
(permissions & OwnerExecute) == OwnerExecute
For this, a constant identifier for the value 0, e.g., NoPermission, had to be inserted into the plug-in.

To finalize this tutorial part of the documentation, a last, quite powerful feature of ALib Expressions is presented. We re-think again what we did in the previous section:
- We changed identifier Permissions to return values of the custom type fs::perms, and defined a set of binary operators for that type.

For the latter, there is an alternative available, called "auto-casting". If no compiler plug-in compiles an operator for a given argument or pair of arguments, then the compiler invokes method CompilerPlugin::TryCompilation(CIAutoCast&) for each plug-in. In the case that one of the plug-ins positively responds by providing one or two "cast functions", the compiler inserts the cast functions for one or both arguments and performs the search for an operator of this now new type (respectively pair of types) a second time.
We add such "auto-casts" to allow the compiler to convert fs::perms to integer. This approach obviously has the following consequence: expressions that mix permissions with integer values, for example { Permissions == 0 }, are well compiled and evaluated.

To implement this, we revert the most recent code changes (the operator callbacks, the binary operator table and the single line of code that feeds the table to parent Calculus).
As a replacement, we add the following callback function which casts a permission type to Types::Integer:
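A sketch of such a cast function (name chosen here):

namespace
{
    // Converts a boxed fs::perms value to the built-in (signed) integer type.
    Box perm2Int( Scope&, ArgIterator args, ArgIterator )
    {
        return static_cast<integer>( (*args).Unbox<fs::perms>() );
    }
}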
A cast function takes one parameter of the originating type and returns the converted value. In this sample, this is trivial. Sometimes more complex code is needed. Casting one type to another might even include memory allocations to create a certain custom type from a given value. Such allocations have to be performed using the provided Scope object, which, as explained later, optionally is of custom type. Allocations done by auto-casting are then to be deleted when the scope object is deleted or reset.
With this casting callback function in place, we add the following method to the custom plugin:
As with the previous solution, our sample expression compiles with the very same result:
However, unlike the recent implementation, compilation is not type-safe in respect to mixing fs::perms with integer values:
This was a rather simple use case, but a very frequent one. Again, class plugins::Calculus may be used to avoid the code from the previous section and replace it by just one line of static table data.
To demonstrate this, we remove the code of the previous section. This concerns not only method TryCompilation: we can also remove the custom callback function that performed the cast!
Now, we add the following statement to the constructor of our custom compiler plug-in, which is already derived from class Calculus:
A quick check confirms that our sample expression compiles and evaluates the same as before:
The various options and fields of table Calculus::AutoCasts are not explained here, but well documented with inner struct Calculus::AutoCastEntry.
Further documentation is found with method Calculus::TryCompilation(CIAutoCast&) , including some hints about the use cases not covered by this helper-class, hence those that demand the implementation of a custom TryCompilation method.
The types, identifiers, functions and operators presented in this manual section are named "built-in" in the respect that they are available by default. But the truth is that they are implemented using the very same technique of providing compiler plug-ins that has been explained in the previous section. This way, this built-in logic is fully optional and can easily be switched off, partly or completely.
For doing so, class Compiler offers a set of configurable flags, gathered in member Compiler::CfgBuiltInPlugins. The flags are declared with enumeration Compiler::BuiltInPlugins and are evaluated in method Compiler::SetupDefaults. Field CfgBuiltInPlugins defaults to Compiler::BuiltInPlugins::ALL. With this information, it is easy to understand that the following code of setting up a compiler:
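A sketch of such a setup; the enumeration value name NONE is an assumption:

Compiler compiler;
compiler.CfgBuiltInPlugins= Compiler::BuiltInPlugins::NONE;   // disable all built-in plug-ins
compiler.SetupDefaults();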
leads to a compiler that does not compile anything.
It should be very seldom that disabling one or more of the built-in compiler plug-ins is needed. Here are some rationales for this statement:
In the default setup (all built-in plug-ins are active), ALib Expressions is considered to be "complete" in respect to providing all reasonable operators for permutations of arguments of all built-in types.
This manual does not elaborate about implications in respect to such completeness in the case that selected built-in plug-ins are omitted. It is up to the user of the library to think about such implications and provide alternatives to the built-in functionality that is decided to be left out.
By the same token, there is no mechanism to disable the compilation of selected built-in compiler plug-ins and, with that, their inclusion in the library code. If such is to be achieved in favor of code size, a custom build-process has to be set up.
As explained in previous sections of this manual, the introduction of types to ALib Expressions is performed in an implicit fashion: New types are introduced at the moment a callback function chooses to return one and consequently, the corresponding compiler plug-in announces this return type of such callback to the compiler during the compilation process.
Therefore, the set of built-in types is resulting from the set of built-in compiler plug-ins. Nevertheless, the library design opted to collect sample boxes for the set in struct Types, which is defined right in namespace alib::expressions.
It is notable that no built-in support for unsigned integral values is provided. In the unlikely event that this is needed for any reason, such support can quite easily be implemented by a custom plug-in. As a jump-start, the source code of class Arithmetics might be used.
Furthermore, all possible sizes of C++ integral values are collectively cast to integer, which is a 64-bit signed integral value on a 64-bit platform and a 32-bit signed integral value on a 32-bit platform.
Finally, Types::Float is internally implemented using C++ type double. No (built-in!) support for C++ types float and long double is provided.
This reduction of used types simplifies the built-in plug-ins dramatically and reduces the library's footprint, as it reduces the number of type permutations to a reasonable minimum.
Due to the type-safe compilation, adding custom types has no impact on evaluation performance of operators and functions that use the built-in types (or other custom types).
What is called "arithmetics" with this library comprises the implementation of unary and binary operators for permutations of types Boolean, Integer and Float.
The operators and some few identifiers and functions are collectively implemented and documented with plug-in Arithmetics.
Fundamental mathematical functions like trigonometrical, logarithms, etc. are collectively implemented and documented with plug-in Math.
Plug-In Strings provides quite powerful string operations. The library here benefits tremendously from underling modules ALib Strings and ALib Boxing.
For example, operator Add ('+') allows concatenating two strings, but also a string with "any" other built-in or custom type. The latter - namely that there is no need to define an overloaded expression operator for strings and custom types - is achieved by leveraging box-function FAppend. Consult the user manual of ALib Boxing for details on how to implement this interface for your custom types, to allow end-users to concatenate your types to strings within expressions.
An overview of all built-in string features, including wildcard matching (with '*' and '?'), is given with the plug-in's documentation.
The built-in types Types::DateTime and Types::Duration represent ALib classes DateTime and TimePointBase::Duration of the same name. The corresponding Expression functionality is implemented and documented with plug-in DateAndTime.
If a user of ALib Expressions prefers to use different, own or 3rd-party types, then support for such types needs to be implemented by a custom plug-in. Such an implementation may be created by copying the source code of built-in plug-in DateAndTime and replacing all corresponding code lines to work with the desired date and time types. If wanted, some or all identifier names might remain the same, even if the built-in plug-in is kept active. In the latter case, no clash of identifiers would occur, because the custom plug-in would usually be inserted into the compiler with a higher priority than the priority of the built-in plug-in.
The conditional operator Q ? T : F is the only ternary operator and (for technical reasons) is not implemented as a plug-in. Instead, it is hard-coded in the details of this library's implementation.

This is not considered a huge limitation, as there is no obvious use case why this operator should be overloaded: its meaning is the same for any use of types.
The conditional argument 'Q', which of course could result in a value of any built-in or custom type, is interpreted as a boolean value using box-function FIsTrue. While a default implementation for this box-function exists that evaluates any custom type, a provision of this interface for a custom type may be used to override this default implementation.
For result arguments 'T' and 'F', the only requirement that needs to be fulfilled is that both are of the same type or that a compilation plug-in for auto-casting them to a joint type exists.

This means:
A variant of the conditional operator is the so-called "Elvis operator", A ?: B. This variant is duly supported by this library and compiled as binary operator DefaultBinaryOperators::Elvis just as any other operator is - including that the compiler tries to perform an auto-cast, if needed.
Built-in compiler plug-in ElvisOperator handles this operator for built-in types as well as for custom types, in the case that 'A' and 'B' share the same type.

Similar to the conditional operator, the default implementation invokes box-function FIsTrue on argument 'A' and decides whether 'A' or 'B' is chosen. This default behavior can be changed by just implementing the Elvis operator, likewise any other operator would be implemented.
Built-in compiler plug-in AutoCast offers casting proposals to the compiler in respect to the built-in types.
For details on the casting facilities, consult the class's documentation.
As was demonstrated in 4.4 Exposing The Directory Entry To ALib Expressions, a customized (derived) version of struct Scope is passed to method ExpressionVal::Evaluate, and the very same object is passed to the callback functions when the expression program is executed by the built-in virtual machine. As a result, a custom callback function can rely on the fact that it is possible to dynamically cast parameter scope back to the custom type and access "scoped data", which exposes an interface into the application that uses ALib Expressions.
This is the most obvious and also intuitively well understandable role of struct Scope. But there are other important things that this class provides.
Struct Scope incorporates field Scope::Stack. This vector is used by the built-in virtual stack-machine implementation during evaluation. This way, it was possible to implement the machine's execution method without using any data exposed by the machine (in fact, the machine is a pure static class).
The important consequence is:
A next important role that struct Scope fulfills is to provide fields that allow to allocate temporary data. With a simple arithmetic expression like this:
1 * 2 + 3
no allocations are needed. The reason is that the intermediate result of the multiplication of integer constants can be (and is) stored as a value in the Box object that operator '*' returned. However, an expression with string operations like this:
"Hello " + "beautiful " + "world!"
incorporates intermediate results (in this case "Hello beautiful "). Space for such intermediate results has to be allocated somewhere, because the Box object stores only a pointer to a character array, together with its length. In fact, the final result string has to be allocated as well, because again, the result of the expression is a boxed string which needs allocation.
For this reason, struct Scope incorporates some built-in "facilities" to allocate data. Those are briefly:
(If objects were allocated with new and then simply added to the container, a memory leak would occur.)

So far, we talked only about the instance of struct Scope that is provided to method ExpressionVal::Evaluate. But there is a second scope object created, which is called the "compile-time scope". If you reconsider the sample expression from the previous section:
"Hello " + "beautiful " + "world!"
All three string-type arguments are constant string literals. The operator '+' is implemented with built-in compiler plug-in Strings, which defines the operator as "compile-time-evaluable". As explained in the tutorial section, this means that at the moment all arguments are constant, struct Calculus (the parent of struct Strings) invokes the operator's callback function at compile-time. Callback functions rely on a scope object, e.g., for memory allocation, as just discussed.
For this reason, a compile-time singleton of type Scope is created and provided to the callback functions during compilation of constant terms. Intermediate results may this way be stored either in the compile-time scope instance or in the evaluation-time instance. The latter is cleared upon each evaluation, while the data allocated in the compile-time scope is cleared only with the deletion of the expression.
If at least one custom callback function that is compile-time invokable uses custom allocation tools which are only provided by a corresponding custom version of the type, then - ooops!
To support this scenario, a derived version of class Compiler has to be created, which re-implements virtual method createCompileTimeScope. This method is internally called with method Compile to allocate the compile-time scope.
If the conditions described above are met, then this method has to be overwritten to return a heap-allocated custom scope object. This object will internally be deleted with the deletion of the expression.
Custom callback functions can then rely on the fact that the compile-time scope object can be dynamically casted to the custom type and use its custom allocation facilities.
So far, things had been still quite straight forward. Let us quickly recap what was said about scopes:
This concept of having two separated scope objects in certain cases is extended. In general terms, it could be phrased as follows:
Compiler plug-ins may choose to create resources at compile-time, which are not intermediate constant results, but which are objects used at evaluation time.
To support this, two further fields are found in class Scope:
At compile-time, the first of these is nullptr, because the given scope object already is the compile-time scope. (At evaluation time, the second is nullptr.)

The following sample, taken from the built-in compiler plug-in Strings, nicely demonstrates what can be achieved with this concept.
Built-in compiler plug-in Strings provides expression function WildcardMatch, which matches a pattern against a given string. For example, expression
WildcardMatch( "MyPhoto.jpg", "*.jpg" )
evaluates to true.
Plug-in Strings also allows using binary operator '*' as an alias for this function. The sample expression of above can this way also be phrased:

"MyPhoto.jpg" * "*.jpg"
To implement this function, internally helper-class WildcardMatcher, provided by underlying library module ALib Strings, is used. For performance reasons, this class implements a two-phased approach: first, the "pattern" string (here "*.jpg") is parsed and translated into a set of internal information. Then, for performing a single match, this internal information is used, which is much faster than if the pattern still had to be parsed.
In most cases, an expression string given by an end-user would contain a non-constant string to match and a constant pattern string, like in the following expression:
filename * "*.jpg"
In this case, it would be most efficient, if the pattern string was passed to an instance of ALib class WildcardMatcher at compile-time, while at evaluation time this exact matcher would be used to perform the match.
This setup already explains it all:
You might not be interested in the details of the implementation and skip the rest of the chapter. The code becomes a little more complex than usual plug-in code. The reason is that helper-struct Calculus does not provide a mechanism to support this.
We start with defining the resource type, derived from struct ScopeResource. This simply wraps a matcher object and its sole purpose is to have a virtual destructor that later allows internal code to delete the matcher:
Next, method TryCompilation needs to be overwritten to be able to fetch the function:
The method starts by invoking the original implementation of parent Calculus. Because the wildcard function is compile-time invokable, in the (unlikely) case that both parameters are constant, a constant value would be returned. Only if one of the parameters is non-constant is the callback set to callback function wldcrd.
The following if-statement selects this case that we are interested in:
If the second parameter is not an empty string, obviously a constant value was given.
true!

Now, we extract the pattern string and combine it with prefix "_wc" to form a key string for storing the resource:
It may happen, that an expression uses the same pattern twice. In this case, the same matcher object can be used. Therefore, it has to be checked, if a matcher with that same pattern already exists. If not, it is created:
After that, TryCompilation exits, signaling compilation success. All that is left to do is the implementation of the callback function. At the beginning the function checks if this is an evaluation-time invocation. In this case, it searches a named resource according to the given pattern string. If this is found, the function uses the resourced matcher and exits:
If it was not found, then two possibilities are left:
Consequently, all that is needed to be done now, is to perform the match operation by using a local, one-time matcher object:
In its default configuration, module ALib Expressions parses and compiles an almost complete set of operators known from the C++ language. Not supported by default are, for example, assignment operators like '+=' or increments like '++'. Operators included are:
Unary operators: '+', '-', '!', '~', '*'

Binary operators: '*', '/', '%', '+', '-', '<<', '>>', '<', '<=', '>', '>=', '==', '!=', '&', '|', '^', '&&', '||', '='

Special operators: 'Q ? T : F', 'A ?: B', '[]'
Not only the operators themselves were taken from C++; in the case of binary operators, also the definition of their precedence was adopted.
The built-in operators are set by method Compiler::SetupDefaults if flag Compilation::DefaultUnaryOperators, respectively Compilation::DefaultBinaryOperators is set in bitfield Compiler::CfgCompilation.
Internally the following approach is taken:
This is a rather simple process, and thus it is similarly simple to intervene and customize the operators. While removing what is built in is seldom necessary, adding an operator might be wanted. This is exercised in the next section.
With what was described in the previous chapter, the following options for customizing the operators parsed and compiled by module ALib Expressions are available:
As a sample, the goal is to have a new binary operator '{}' that allows formatting the right-hand side operand according to a format string provided as the left-hand side operand. Let's first check what happens if we just go ahead and use the operator:
This produces the following exception which indicates that parsing the expression fails due to a syntax error:
Now we define the operator:
We give the operator a high precedence, on the level of operator '*'. The operator precedence values are documented with DefaultBinaryOperators.
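The definition itself may look as simple as in the following sketch. Note that the registration call shown here is only an assumption; the actual method name, its signature and the concrete precedence value have to be taken from the reference documentation of class Compiler and of DefaultBinaryOperators:

    // Hypothetical registration of the new binary operator "{}", using the
    // documented precedence value of operator '*' (placeholder name used here):
    compiler.AddBinaryOperator( A_CHAR("{}"), precedenceOfMultiplication );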
The exception changes to:
Obviously, the parser now recognizes the operator. This single line of code was all we needed to do to define the operator.
To get a working sample, a compiler plug-in that compiles the operator for left-hand side strings and any right-hand side type is needed. Here it is:
With the plug-in attached:
The expression compiles and results in:
Hexadecimal: 0x2a
'_' (underscore).

End-users that are not too familiar with programming languages might find it easier to use verbal operators instead of symbolic ones. Instead of writing an expression with symbolic operators, they may prefer a verbal form.
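For example (an illustrative pair of equivalent expressions, using aliases from the table below):

date == today && size > 1024

date Equals today And size Greater 1024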
Such "verbal" expressions are supported and enabled by default through the concept of "Verbal Operator Aliases". As the term already explains, verbal operators cannot be defined with this library as full-featured "stand-alone" operators, but only as aliases for existing symbolic operators.
The default built-in (resourced) verbal operator aliases are:
Verbal Operator | Is Alias For |
---|---|
Not | Unary operator '!' |
And | Binary operator '&&' |
Or | Binary operator '||' |
Sm | Binary operator '<' |
Smaller | Binary operator '<' |
Smeq | Binary operator '<=' |
Smaller_or_equal | Binary operator '<=' |
Gt | Binary operator '>' |
Greater | Binary operator '>' |
Gteq | Binary operator '>=' |
Greater_or_equal | Binary operator '>=' |
Eq | Binary operator '==' |
Equals | Binary operator '==' |
Neq | Binary operator '!=' |
Not_equals | Binary operator '!=' |
As with the operators themselves, ALib Expressions defines the names and aliased operators using resourced ALib Enum Records assigned to enumeration class DefaultAlphabeticBinaryOperatorAliases.
The resource data is processed by method SetupDefaults dependent on flag DefaultAlphabeticOperatorAliases of bitfield CfgCompilation (which is set by default).
Additional flag AlphabeticOperatorsIgnoreCase controls whether the alias names are matched ignoring letter case (which is also set by default).
Class Compiler simply stores the alias information in its public hash tables AlphabeticUnaryOperatorAliases and AlphabeticBinaryOperatorAliases, which can be altered prior to or after the invocation of SetupDefaults, but before a first expression is compiled.
Some further notes:
This library puts some effort into supporting operator aliases, which is about telling the system that an operator used with certain argument types is to be considered equal to another operator used with the same argument types.
The only occasion where this is internally used is with combinations of boolean, integer and float types and the bitwise operators '~', '&' and '|': Any use of these operators on a mix of these types - excluding those permutations that only consist of integers - is optionally aliased to the boolean operators '!', '&&' and '||'. It would have been less effort to just define the bitwise operators for these types to perform boolean calculations! So, why does the library make the effort then?
The motivation for taking the effort comes from normalization. While the library should be configurable to accept the expression:
date = today & size > 1024
it should at the same time be configurable to "normalize" this expression to:
date == today && size > 1024
Maybe, custom fields of application identify other operators where such aliasing is reasonable as well.
The following parts of the library's API are involved in operator aliasing:
The "array subscript operator" '[]' is an exceptional binary operator only insofar as it is parsed differently than other binary operators. While usually the operator is placed between the left-hand side (Lhs) and right-hand side (Rhs) arguments, the subscript operator is expressed in the form

Lhs[ Rhs ]

In any other respect it is completely the same! The only built-in use of the operator is with lhs-type String and rhs-type Integer. With it, the substring of length 1 at the given position is returned.
Its use and meaning is of course not bound to array access. For example, with the right-hand side operands of type String, a mapped access to pairs of keys and values can be realized. To implement this, the left-hand side type would be a custom type returned by an identifier, say Properties. Now, if the subscript operator was defined for this type and strings, expressions like
Properties["HOME_PATH"]
are possible. The operator's callback function could catch certain key values and return appropriate results from objects accessible through the custom scope object.
Nested expressions have not been talked about yet. This concept is introduced only with the next chapter, 10. Nested Expressions. Here, we just quickly want to explain that this operator exists, that it has a special meaning, and how it can be changed.
The definition of the operator is made with field Compiler::CfgNestedExpressionOperator. Its default value is '*', which in C/C++ is also called the "indirection operator". With this default definition, the expression:

date < today && *"myNested"

refers to nested expression "myNested".
Changing the operator needs to be done before invoking Compiler::SetupDefaults. Should operator definitions be changed as explained in the previous chapters, it is important to know that the nested expression operator itself has to be duly defined. In other words: Specifying an operator with field CfgNestedExpressionOperator does not define the operator.
The operator internally works on string arguments which name the nested expression that is addressed. However, to overcome the need of quoting names of nested expressions, a built-in mechanism is provided that allows omitting the quotes. This feature is enabled by default and controlled with compilation flag AllowIdentifiersForNestedExpressions. For this reason, by default, the sample expression given above can equally be stated as:
date < today && *myNested
Note that this does not introduce a conflict with defined identifiers or function names. For example, if a nested expression was named "PI", just as math constant identifier PI, then still the following works:
5.0 * PI    // multiplies 5 with math constant PI
5.0 * *PI   // multiplies 5 with the result of nested expression named "PI"
When changing the nested expression operator, some thinking about the consequences is advised. Other candidates for nested expression operators may be '$', '%' or '@', which are more commonly used to denote variables or other "scoped" or "nested" entities. But exactly for this reason, module ALib Expressions opted to default the operator to '*'. Often, applications offer to provide expressions via a command line interface, which in turn allows using the bash CLI and any scripting language. The asterisk character '*' seems to clash least with such external environments.
Therefore, we recommend doing some careful thinking about potential conflicts in the desired field of application and use-case environments before changing this operator.
Often, certain "terms" of an expression are to be repeated in more than one expression. Furthermore, it sometimes is valuable to be able to split an expression into two parts, for example parts that have different levels of validity. The latter is often the case when it comes to "filtering" records from a set: A first filter might be a more general, long-living expression, while a second expression adds to this filter by applying more concrete demands. In the filter sample, there are two ways of achieving this: either both filter expressions are applied one after the other, or the end-user combines them into one expression, for example using operator &&.

Besides being faster, the second has one huge advantage: it is up to the end user whether the single filter refers to a different term - or not. There is no need to hard-code two filters into a software.
These thoughts bring us to the concept of "nested expressions", which means referring to expressions from within other expressions!
The foundation to achieve such feature, is first to provide a way to store expressions and retrieve them back using a key value.
Module ALib Expressions provides a built-in container to store compiled expressions. At the moment an expression is stored, a name has to be given and that is all that makes an expression a named expression.
So far in this manual, we had compiled expressions using method Compiler::Compile. What is returned is an anonymous expression - it is not named. To create a named expression, method Compiler::AddNamed is used. This method internally compiles the expression and stores it under the given name. The expression itself is not returned; instead, information about whether an expression with that name already existed (and thus was replaced) is.
For retrieval of named expressions, method GetNamed is offered and for removal method RemoveNamed.
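A minimal usage sketch follows. The expression string and its name are illustrative, and the exact parameter and return types should be taken from the reference documentation of class Compiler:

    // Compile and store an expression under a name. The returned flag tells whether an
    // expression with that name existed before (and hence was replaced).
    bool replaced= compiler.AddNamed( A_CHAR("MyNestedExpression"), A_CHAR("6 * 7") );

    // Retrieve the named expression back...
    auto nested= compiler.GetNamed( A_CHAR("MyNestedExpression") );

    // ...and remove it when it is not needed anymore.
    compiler.RemoveNamed( A_CHAR("MyNestedExpression") );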
Note that compiled expressions are managed using std::shared_ptr<Expression>. With this, a named expression is automatically deleted if it is removed from the storage and not externally referred to.

By default, letter case is ignored when using the given name as a storage key. Hence, adding "MYEXPRESSION" after adding "MyExpression" replaces the previous instance. This behavior can be changed using compilation flag CaseSensitiveNamedExpressions. Changes of this flag must be made only before adding a first named expression; later changes lead to undefined behavior.
Named expressions are not too much of a feature if viewed by themselves. But they are the important prerequisite for nested expressions, explained in the following sections.
The simplest form of addressing nested expressions is by using unary operator '*', which allows embedding a named expression into another expression.

While the operator defaults to '*', this can be changed as described in 9.6 Unary Operator For Nested Expressions.

The operator expects a string providing the name; for convenience, this string does not need to be quoted, but may be given the way identifiers are.
With this operator, the expressions:

*"MyNestedExpression"

*MyNestedExpression

both simply "invoke" the expression named "MyNestedExpression" and return its result.
Of course, this sample was just the shortest possible. Nested expressions can be used just like any other identifier:
GetDayOfWeek( Today ) == Monday && *MyNestedExpression
This expression evaluates to true on Mondays, and only if named expression "MyNestedExpression" evaluates to true in parallel.
One might think that this is all that has to be said about nested expressions. But unfortunately, it is not. An attentive reader might have noticed an important restriction with nesting expressions like this: Because ALib Expressions is a type-safe library, the compiler can compile operator && in the above sample only if it knows the result type of "MyNestedExpression". As a consequence, we have to state the following rule:
Let us simply have a try. The following code:
Produces the following exception:
The exception tells us that this is a "compile-time defined nested expression". This indicates that there will be a way out, as we will see in the next chapter; but for the time being, let us fix the sample by adding the named expression upfront:
Now this works:
Result: 42
The compiler found the nested expression, identified its return type and is now even able to use it in more complex terms like this:
This results in:
Result: 84
But there is also another restriction that has to be kept in mind with the use of the unary operator for nested expressions.
While this sample still works well:
Result: 42
This expression does not work:
as it throws:
E1: <expressions::NamedExpressionNotConstant> Expression name has to be constant, if no expression return type is given. [@ /home/dev/A-Worx/ALib/src/alib/expressions/detail/program.cpp:470 from 'Program::AssembleUnaryOp()' by 'MAIN_THREAD(-1,0x00007814AA223740)'] I2: <expressions::ExpressionInfo> Expression: { *("MyNested" + ( random >= 0.0 ? "Expression" : "" )) } ^-> [@ /home/dev/A-Worx/ALib/src/alib/expressions/detail/program.cpp:471 from 'Program::AssembleUnaryOp()' by 'MAIN_THREAD(-1,0x00007814AA223740)']
While - due to the compile-time optimization of ALib Expressions - the constant concatenation term "MyNested" + "Expression" is still accepted, the compiler complains if we use the function random. The compiler does not have the information that random >= 0.0 evaluates to constant true, and hence the term is not optimized and not constant.
The exception names the problem, which leads us to a second rule:
This is obvious, as the expression has to exist and be known. But still, it is a restriction.
There are many use cases, where still this simple operator notation for nested expressions is all that is needed. For example, imagine a set of expressions is defined in an INI-file of a software. If the software loads and compiles these "predefined" expressions at start, a user can use them, for example with expressions given as command line parameters. This way, a user can store "shortcuts" in the INI-file and use those as nested expressions at the command line.
A final note to compile-time nested expressions: After an expression that refers to a named nested expression is compiled, the named nested expression may be removed using Compiler::RemoveNamed. The program of the outer expression stores the shared pointers to all compile-time nested expressions used. While after the removal from the compiler the nested expression is not addressable for future nesting, the nested expression is only deleted at the moment the last expression that refers to it is deleted!
In the previous section we saw the first samples of nested expressions. The unary operator '*' was used to address nested expressions. These nested expressions suffer from two restrictions:
Let us recall what the reason for this restriction was: The compiler needs to know the result type of the nested expression to continue its type-safe compilation.
The way to overcome this restriction is to use function Expression() instead of unary operator '*'.
This function has three overloaded versions. The first uses just one parameter of a string-type and is 100% equivalent to the use of the unary nested expression operator - including its restrictions.
The second overload takes a "replacement expression" as its second value. This is how it may be used:
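A plausible sample expression, consistent with the results shown in the remainder of this section, is:

Expression( "MyNestedExpression", -1 )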
The output of this sample is:
Result: -1
As you see, although the nested expression was not defined, this sample now compiles. The compiler uses the result type of the second parameter and assumes that the expression will return the same type. And even more: the expression even evaluates! On evaluation it is noticed that the expression does not exist, hence the result of the "replacement expression" is used. While in this case the replacement is simply value -1, any expression might be stated here - even one that contains a next nested expression.
We extend the sample by adding the nested expression:
Result: 9
As a "proof" that the nested expression is identified only at evaluation time, the following sample might work:
Result1: 9 Result2: 16
Above, we said that the compiler "assumes" that the named expression addressed has the same return type. The following code shows that this was the right verb:
No exception is thrown on compilation. The compiler does not check the return type at compile-time. The simple reason is: At the time the expression becomes evaluated, the named expression might have been changed to return the right type. This is why the return type is only checked at evaluation time. Let's see what happens when we evaluate:
E1: <expressions::NestedExpressionResultTypeError> Nested expression "MyNestedExpression" returned wrong result type. Type expected: Integer Type returned: String [@ /home/dev/A-Worx/ALib/src/alib/expressions/detail/virtualmachine.cpp:296 from 'run()' by 'MAIN_THREAD(-1,0x00007814AA223740)'] I2: <expressions::ExpressionInfo> Expression: { Expression( "MyNestedExpression", -1 ) } ^-> [@ /home/dev/A-Worx/ALib/src/alib/expressions/detail/virtualmachine.cpp:300 from 'run()' by 'MAIN_THREAD(-1,0x00007814AA223740)']
This shows that, when using function Expression(), we are leaving the secure terrain of expression evaluation a little: While so far the only exceptions that could happen at evaluation-time were ones that occurred in callback functions (for example in expression "5 / 0"), with nested expressions that are identified only at evaluation-time, we have a first exception that is thrown by the virtual machine that executes the expression program.
At the beginning of this section, a third overload of Expression() was mentioned. We postpone its documentation to the next manual section, and end this chapter with another quick note:
As with unary operator '*' for nested expressions, compilation flag AllowIdentifiersForNestedExpressions allows omitting the quotes and accepting identifier syntax instead. Hence, this expression compiles fine:
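For example:

Expression( MyNestedExpression, -1 )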
In addition, with this form of embedding nested expressions, the restriction that expression names have to be constant falls away. This way, the sample with a random name is now allowed:
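For instance, the expression name that previously failed to compile may now be stated as:

Expression( "MyNested" + ( random >= 0.0 ? "Expression" : "" ), -1 )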
The previous two chapters explained the differences between nested expressions that are identified at compile-time and those identified at evaluation time.
Compile-time nested expressions are usually expressed with unary operator '*', but can also be expressed by using only one parameter with nested expression function Expression().
Evaluation-time nested expressions are expressed by giving a second parameter to function Expression(), which provides the "replacement expression" for the case that a nested expression of the name given with the first parameter does not exist.
In some situations, however, a user might not want to provide a "replacement expression". It might rather be in her interest that an expression just fails to evaluate in case the nested expression is not found - for example, if this indicates a configuration error of a software.
Such behavior can be achieved by adding a third parameter to Expression(). This third parameter is keyword "throw".

We take the sample of the previous section where the expression was not defined, which resulted in default value -1. The only difference is the use of keyword throw:
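The expression from the previous sample, extended by the keyword, then reads for example:

Expression( "MyNestedExpression", -1, throw )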
The output of this sample is:
So, why do we need the second parameter, which previously gave the "replacement expression", when its result is never used? Well, the only purpose of the replacement expression is to determine the nested expression's return type; apart from that, its result is ignored. Note, however, that it is not optimized out: its result will still be calculated with each evaluation of the expression against a scope. In contrast to other areas, where the library puts some effort into optimization, this was omitted here. An end-user should simply be advised to state a constant "sample value" for this parameter. A user that uses this third version of the nested expression function is supposed to be a "pro" and to understand the impact.
As described in a previous section, a prerequisite for the nested expression feature is to have named expressions. Methods AddNamed, GetNamed and RemoveNamed of class Compiler had been already described briefly.
Nested expressions often can be seen as building blocks of other expressions. A software might want to provide a predefined and/or configurable set of expressions to be usable within end user's expressions.
To support such a scenario, a mechanism is needed that allows retrieving (and compiling) named expression strings right at the moment an unknown identifier of a nested expression occurs during the compilation of the main expression.
Module ALib Expressions offers abstract virtual class ExpressionRepository which offers a customizable implementation of such mechanism. This interface is used as follows:
In other words, method GetNamed supports a "lazy" approach to compile nested expressions "on the fly" as needed. The expression strings are received using the abstract virtual method ExpressionRepository::Get.
A built-in implementation of this interface class is provided with class StandardRepository. This implementation retrieves expression strings from
For the details of using the built-in implementation, consult the reference documentation of class ExpressionRepository.
The creation of one's own implementation that receives predefined expression strings in a custom way should be a straightforward task.
Nested expressions are a powerful feature of ALib Expressions, but they also need a thoughtful and knowledgeable user because of the different approaches of compile-time and evaluation-time defined nested expressions.
If a software offers an end-user to "express herself", a certain level of understanding is required anyhow. Often, software hides expression syntax behind a graphical user interface with checkboxes and input fields per "attribute", e.g., to define an email filter, and creates an expression string unseen by the user in the background. Then, only in a certain "expert mode" is an end-user allowed to freely phrase expressions, which then may be more complex and probably even allowed to "address" nested expressions that such an end-user had defined in external configuration resources.
So, it is a bit of a task to define the user interface and user experience when it comes to allowing expressions. This library tries to cover the broad spectrum of use cases and this can probably be noticed in the area of nested expressions very well.
To end this chapter about nested expressions, some final hints and notes should be collected here:
In the previous chapters of this manual, most features of module ALib Expressions have been touched, either as tutorial sample code or in a more theoretic fashion. This chapter now provides a list of in-depth discussions on different dedicated topics.
A lot was said about the intermediate or final result types of expressions in various sections of this manual. The use of the underlying run-time type information library ALib Boxing, with its very "seamless" nature, helped tremendously to implement ALib Expressions.
But being so seamless, it is not so easy to understand all aspects of its use and meaning in this library. Therefore, this quick chapter tries to review various aspects of the library from the angle of types. For simplification of writing and reading this chapter, this is done with a list of bullet points.
The concept of auto-casting of types is located somewhere in the middle! Auto-casts can be fully prevented by providing either a dedicated callback function for each permutation of types or by doing "manual" casts just within a callback function that accepts multiple permutations. Still, this library takes the effort of supporting auto-cast in the details of the implementation of the compilation process which assembles the evaluation program. How it is done can be seen a little like "a last call for help" before throwing a compilation exception.
The compiler makes these calls on two occasions: when a binary operator could not be compiled, and when terms T and F of ternary conditional operator Q ? T : F are not of the same type. In this moment, the compiler just calls for help by asking each plug-in for an auto-cast of one or both of the types. It does this only once! After a first plug-in provided some conversion, it is retried to compile the actual operator. If this still fails, the exception is thrown, although it might have been possible that a next plug-in provided a different cast that would lead to success.
This of course is a design decision of the library: complexity was traded against effectiveness. At the end of the day, the whole concept of auto-cast could be described as not really necessary for any sort of custom type processing. Therefore, auto-cast is offered as an optional way of reducing the number of callback functions that have to be provided - a way that has two disadvantages: First, the auto-cast has to be implemented as compiler plug-in functionality, and second, auto-casts increase the length of the evaluation program and hence constitute a penalty on an expression's evaluation performance.
The variety of built-in types has been reduced to the bare minimum needed. While module ALib Boxing (by default!) already drops the distinction between C++ integral types of different sizes (short, long, int, etc.), module ALib Expressions in addition drops the distinction between signed and unsigned integral types. All integral types are signed. (Given that the "complete" JAVA programming language dropped unsigned integers, we thought it might not be too problematic.)
The good news for users of this library is that it is no problem to implement support for unsigned types, because "dropping" here just means that none of the built-in operators and functions "produces" a result value of unsigned integral type. In other words, unsigned integral types are considered just another custom type.
If - unexpectedly - unsigned integer types and corresponding operations need to be supported, custom operators and function definitions have to be added.
Custom callback functions that return objects of a derived type should use dynamic_cast (or an equivalent cast to the announced base type) before returning derived types. If they do not do that, the derived type becomes introduced to ALib Expressions!

For abstract custom types, a type sample may be created using reinterpret_cast<>() like this: Box myTypeSample = reinterpret_cast<MyAbstractType*>(0);
While it was in some places of this manual indicated that the built-in types listed with Types are all "inherently introduced" by the built-in compiler plug-ins just as any custom type could be, this is not the full truth. In fact, types Integer, Float and String are in so far "hard-coded" as values of these types are also created (and thus introduced) by expression "literals".
With the current version of the library it is not possible to change the internal "boxed" types which result from "parsing" a literal. The term "parsing" denotes the first phase of the compilation of an expression string. Changes on how literals are parsed and in what types such parsing results can only be made by touching the library code, which is not further documented here.
The parsing of numerical constants found in expression strings is done with the help of member Formatter::DefaultNumberFormat which in turn is found in member Compiler::CfgFormatter.
The use of this helper-type allows influencing how numerical literals are parsed. For example, integral types can be parsed in decimal, hexadecimal, octal and binary formats. For this, a set of prefix symbols, which default to "0x", "0o" and "0b", can be customized. The support for one or some of the formats can also be dropped, if this is wanted for whatever reason.
Likewise, the format of floating point numbers and their scientific variants can be defined. In respect to the topic of localization (see also 11.6 Localization), this is especially of interest if a floating-point separation character other than '.' is to be supported. It is supported and tested in the unit tests of this library to allow the use of character ',' as is standard in many countries. In this case, an end-user only has to be aware of the fact that the two expressions:
MyFunc(1,2)

MyFunc(1 , 2)

have a different meaning: The first is a call to a unary function providing floating point argument 1.2, the second is a call to a binary function providing integral values 1 and 2.
Even worse:
MyFunc(1,2,3)
is parsed as two arguments, the first being float value 1.2 and the second integral value 3. This means, the end-user has to insert spaces to separate function parameters.
As this is a source of ambiguity, applications that address end-users with a high degree of professionalism should rather not localize number formats but instead document with their software that English standards are to be used.
In general, all flags and options in respect to parsing and formatting (normalizing) number literals that are available through class NumberFormat are compatible with ALib Expressions. This even includes setting character ' ' (space) as a grouping character for any number format! This might be used to allow quite nicely readable numbers in expression strings.
Finally, normalization flags KeepScientificFormat, ForceHexadecimal, ForceOctal and ForceBinary may be used to further tweak how numbers are converted in normalized strings.
String literals are to be enclosed in quote characters '"'. If a string literal should contain the quote character itself, it needs to be "escaped" using backslash character '\'. Some further escape characters are supported through the internal use of the ALib string feature documented with Format::Escape.
For the output of string literals in the normalized version of expression string, the reverse functions of Format::Escape are used.
Ultimately, box-function FToLiteral might be implemented for one of the types (Integer, Float or String) to do any imaginable custom conversion beyond what is possible with the standard mechanics of the ALib types used. But this should seldom be needed. The main purpose of this boxing-function is described with 11.5 Optimizations.
Identifier (parameterless function) and function names are recognized (parsed) in expression strings at appropriate places only if the following rules apply: The first character has to be an alphabetic character 'a' to 'z' or 'A' to 'Z'. The following characters may in addition be underscore '_' and digits '0' to '9'.
In the previous section, information about localizing number formats in respect to parsing expression strings and their output as normalized expressions was already given.
A second area where localization may become an obvious requirement is the naming of built-in and custom expression functions. The built-in compiler plug-ins use mechanics provided by ALib classes ResourcePool and Camp to externalize the names, letter-case sensitivity and optional minimum abbreviation length of identifiers and functions. The matching of identifier and function names found in expression strings is performed using class Token, which allows not only simple abbreviations, but also "CamelCase"- and "snake_case"-specific abbreviations.
These mechanics allow replacing such resources using an arbitrary custom "string/data resource backend": the one that your application uses! With this, it is possible, for example, to translate certain identifiers (e.g., Minutes or True) to different locales and languages.
While no detailed documentation or step-by-step sample on how to perform such localization is given, investigating the documentation and optionally the simple source code of the entities named above should enable a user of this library to quite quickly succeed in integrating any custom localization mechanics otherwise used with her software. For creating a custom plug-in, the way to go is of course to copy the setup code from the built-in plug-ins of this library.
A third area where localization might become a need are callback functions processing expression data. Again, for formatting and parsing, an instance of ALib class Formatter is used, which has (as was explained above) an instance of NumberFormat attached.
A compile-time scope (used with optimizations) is created with virtual method createCompileTimeScope which in its default implementation attaches the same formatter to the compile-time scope that is used with parsing, namely the one found in CfgFormatter.
The scope object used for evaluation should be constructed passing again the very same formatter. This way, formatting and number formats remain the same throughout the whole chain of processing an expression and can be collectively tweaked through this one instance CfgFormatter.
Finally a fourth area where localization might be applied is when it comes to exceptions during compilation or evaluation of expressions. All exceptions used in this library provide human readable information, which is built from resourced strings and hence can be localized. See chapter 11.6 Exceptions for details.
One very important design goal of this library was to favor evaluation-time performance of expressions over compile-time performance. This way, the library is optimized for use cases where a single expression that is compiled only once is evaluated against many different scopes. The higher the ratio of evaluations per compiled expression, the more the overall process performance increases when this design principle is applied. This design goal caused a great deal of effort and its implications were far-reaching.
The concept of "expression optimization", which was touched on in this manual various times already, is all about optimizing the evaluation-time performance. The library accepts some extra effort at compile-time to shorten the compiled expression program that is run on the built-in virtual machine as much as possible.
A simple sample for optimization might be an expression that calculates the circumference for a given radius. In case the radius is received from the scope with a custom expression identifier Radius, then the expression would be:
2 * PI * Radius
If no optimization was applied, each time this expression was evaluated, four callback functions would have to be invoked: two for receiving the values PI and Radius and two for the multiplications. Now, we know that PI is constant and so is the term "2 * PI". The goal of optimization consequently is to reduce the expression program to just two callback invocations: one for retrieving the radius from the scope and a second for multiplying the radius with the constant 2 * PI.
To express this goal the other way round: An end-user should be allowed to provide operations that introduce some redundancy, but are easier readable and understandable for human beings, without impacting evaluation performance.
The foundation of compile-time optimization of this library is implemented with the assembly of the expression program: During the assembly, the compiler keeps track of previous results being constant or not. Each time a compiler plug-in is asked to provide compilation information, this information about whether the arguments are all or partly constant is provided. Then it is up to the plug-ins to decide whether the expression term is a constant in turn or implies a callback function call.
In the simple case of identifiers (parameterless functions), no arguments exist and hence all arguments are constant. Nevertheless, custom identifiers are usually not constant, because they return data received from the (custom) scope object. Therefore, the compiler does not "know" if identifier "PI" is a constant, only the plug-in that compiles the identifier knows that. While in the case of PI it is, in the custom case of FileDate it is not: It depends on the currently examined scope data.
This way, the compiler and its plug-ins have to work hand in hand: The compiler provides information about arguments being constant, and the plug-ins can return either a callback function, or leave the callback function nullptr and return a constant value instead. The compiler then either assembles a callback function call or the use of the constant value, which is an internal program command of the virtual machine's "assembly language".
With binary operators, a further option is available: In the case that one operand is constant while the other is not, some operators might inform the compiler to either optimize out the complete term or at least to optimize out the constant argument. Again, this information has to be encoded in the result data provided by the compiler plug-ins. The compiler will then modify the existing program and remove the program code for one or both arguments. (Further samples of binary operator optimizations are given in the documentation of struct CompilerPlugin::CIBinaryOp.)
As explained earlier, the built-in compiler plug-ins mostly rely on helper-struct plugins::Calculus instead of deriving directly from CompilerPlugin. Calculus provides very convenient ways to ensure that every operation that can be optimized at compile-time truly is optimized.
For example, callback functions can be denoted "compile-time invokable". If so, helper-struct Calculus automatically invokes them at compile-time if all arguments provided are constant (or no arguments are given) and returns the calculated result to the compiler instead of the callback function itself.
Furthermore, struct Calculus provides a special sort of optimization applied to binary operators that may be used when only one of the two arguments is constant. For example, an arithmetic multiplication with constant 0 results in 0, and with constant 1 it results in the other argument's value. These kinds of rules can be encoded using quite simple static data tables.
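For illustration, with a non-constant custom identifier Radius, terms like the following can be reduced at compile-time:

Radius * 1    // can be optimized to: Radius (the multiplication is removed)

Radius * 0    // can be optimized to: 0 (the whole term becomes a compile-time constant)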
Overall, the use of struct Calculus makes the implementation of optimization features of custom plug-ins as easy as possible. Consult the struct's documentation for further details.
In the current version of ALib Expressions there is only one evaluation-time optimization performed. This concerns built-in ternary operator Q ? T : F (the conditional operator).
As with a C/C++ compiler, depending on the evaluation-time value of Q, only the program code to calculate either T or F is executed by the virtual machine; the code never evaluates both T and F in parallel.
is a compile-time constant, the code for either T
or F
is not even included in the program.
The current version of this library has an important limit in respect to optimizations. While - as we saw - expression:
2 * PI * Radius
is optimized to perform only two callbacks instead of four, the mathematically equivalent expression:
2 * Radius * PI
is only optimized by one callback, hence still includes three.
The reason is that there is no built-in mechanics to tell the compiler that for the two multiplications, the associative and commutative laws apply, which would allow to transform the latter expression back to the first one.
Instead, the compiler "sees" two multiplications that both are not performed on constant operands and hence cannot be optimized. Only the callback of constant identifier PI is removed.
Consequently, if performance is key, it might be documented to an end-user that she is well advised to write:
HoursPassed * 60 * 60 * 1000
because this expression is optimized to:
HoursPassed * 3600000
but that she should "sort" constants to constants, because expression
60 * HoursPassed * 60 * 1000
is less effectively optimized to
60 * HoursPassed * 60000
In a more abstract way, it could be stated that a C++ compiler does such optimization because it "knows" about the rules of the multiplication operator for integral values. The compiler of this library does not know about that and hence cannot perform these kinds of operations. If, in the case of C++, the operands were custom types with an overloaded operator '*', the C++ code would also not be optimized, because in this case the compiler does not know the mathematical "traits" of the operator. The C++ language has no syntax to express operator traits.
From the "point of view" of the expression compiler provided with this library, the built-in operators are just "built-in custom operators". This leads to the inability of optimizing such rather simple mathematics.
Finally, evaluation-time optimization of operator && (as known from the C++ language) is not implemented with this library. For example, with expression:
IsDirectory && ( name == "..")
the right-hand side operand of && is evaluated even if IsDirectory already returned false.
In the case that professional, experienced end-users are the addressees of a software, it might be wanted to tell such end-users about the result of optimizations. To stay with the sample of the previous sections, this means to be able to show an end-user that the expression:
2 * PI * Radius
was optimized to
6.283185307179586 * Radius.
To be able to do this, a normalized expression string of the optimized expression has to be generated. For this, the interface of class Expression allows access to three strings with dedicated methods.
The generation of the normalized string during compilation cannot be disabled and hence is available in constant (zero) time after the compilation of an expression. However, the first invocation of method GetOptimizedString is anything but a constant-time task! With this, the library "decompiles" the optimized expression program, with the result being an abstract syntax tree (AST). This AST is then compiled into a new program, and with this compilation a "normalized" expression string is generated.
Consequently, this normalized string is the optimized version of the original expression string! Once done, the AST and the compiled (second) program are disposed, while the optimized string is stored.
It is questionable if the result is worth the effort! The decision if a software using library ALib Expressions presents "optimized normalized expression strings" to the end-user is highly use-case dependent. In case of doubt our recommendation is to not do it. The feature may be helpful during development of custom compiler plug-ins.
In any case, to receive correct, compilable, optimized expression strings, a last hurdle might have to be taken. In the sample above, the optimized term 2 * PI results in floating point value 6.283185307179586. This value can easily be written out and - if wanted - later even be parsed back to a correct expression. But this is only the case because the type Float is expressible as a literal. Imagine the following sample:
Seconds(1) * 60
Built-in identifier Seconds returns an object of type Duration. The multiplication operator is overloaded and in turn results in a value of type Duration. And yes, it is a constant value. The challenge now is to produce an expression string that creates a constant time span value representing 60 seconds. The result needs to be
Seconds(60)
or even better:
Minutes(1)
To achieve this, this library introduces an ALib box-function, declared with FToLiteral. This function has to be implemented for all custom boxed types that might occur as results of constant expression terms. Only if this is ensured is the "optimized normalized expression string" correct, meaningful and re-compilable.
For details and a source code sample consult the documentation of the box-function descriptor class FToLiteral.
Besides this box-function to create constant expression terms for custom types, a further prerequisite might have to be met to receive compilable expression strings. This concerns the area of auto-cast functionality. If custom auto-casts are in place, such auto-casts, if decompiled, have to be replaced by a function call which takes the original value and returns the casted value. The names of these functions have to be provided with members CIAutoCast::ReverseCastFunctionName and CIAutoCast::ReverseCastFunctionNameRhs of the auto-cast information struct at the moment an auto-cast is compiled. If optimized, normalized expression strings are not used, these fields do not need to be set and, furthermore, the corresponding expression functions that create the constant values may not be needed (they might still be needed for the expression syntax a programmer wants to offer).
While there is no reason to switch off optimization, the library offers compilation flag Compilation::NoOptimization for completeness.
This library does not make use of semaphores (aka "thread locks") to protect resources against violating concurrent access. Consequently, it is up to the user of the library to ensure some rules and potentially implement semaphores if needed. This goes well along with the design principles of most foundational, use-case agnostic libraries.
Therefore, this chapter lists the rules of allowed and denied parallel actions:
At the moment end-users are allowed to provide expression strings, some error handling to deal with malformed expression strings is unavoidable. Module ALib Expressions chooses to implement error handling using C++ exceptions.
One of the design goals of this library is to allow erroneous expressions to be recognized at compile-time if possible. The advantage of this is that compilation is often performed at a point in time where the consequences of exceptions are usually less "harmful". Of course, a software cannot continue its task if exceptions occur, but the implied effort of performing a "rollback" should be much lower.
For this, the following general approach should be taken:
As evaluation-time exceptions can occur anyhow, in simple cases step 2 might be left out and steps 1-4 be wrapped in one try statement.
The exception object thrown by any ALib Module is of type alib::lang::Exception.
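A minimal sketch of such a wrapper might look as follows. The names of the compiler, scope and input variables are illustrative, and it is assumed here that the evaluation method of a compiled expression is the one documented with class Expression (consult the reference documentation for the exact signatures):

    try
    {
        // Compile the expression string received from the end-user.
        auto expression= compiler.Compile( userInput );

        // Optionally check the result type, then evaluate against a scope
        // and process the result.
        auto result= expression->Evaluate( scope );
        // ... use result ...
    }
    catch( alib::lang::Exception& )
    {
        // Report the human-readable exception information back to the end-user
        // and roll back whatever is necessary.
    }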
A compiler plug-in may throw an exception during compilation. Helper-struct Calculus already throws exceptions MissingFunctionParentheses and IdentifierWithFunctionParentheses. Furthermore, a callback function may throw an exception during the compile-time evaluation of a constant expression term.
Exceptions of type std::exception, as well as those of type alib::Exception that are not exposed by this ALib Module itself (hence using values of enum types different than expressions::Exceptions), are by default "wrapped" by the compiler into an exception of enum type Exceptions::ExceptionInPlugin. Such wrapping can be disabled by setting flag Compilation::PluginExceptionFallThrough.
In addition, plug-in exceptions of type alib::Exception are extended by an informational entry of type ExpressionInfo.
Exception objects of other types are never caught and wrapped and therefore have to be caught in a custom way.
In the case that a callback function throws an exception during the evaluation of an expression, such exceptions are by default "wrapped" into ExceptionInCallback. Wrapping is performed for exceptions of type std::exception and ALib Exception. Other exception types are never caught and wrapped and therefore have to be caught in a custom way.
The wrapping of evaluation-time exceptions can be disabled by setting flag Compilation::CallbackExceptionFallThrough. Note, that even while this flag is tested at evaluation-time, it is still stored in member Compiler::CfgCompilation.
The built-in expression functionality is provided via the built-in compiler plug-ins which by default are enabled and used.
Reference tables about identifiers, functions and operators are provided with each plug-in's class documentation. These are:
The goal of using a library like this is to allow end-users to write expressions. One common field of application are filter expressions, as sampled in this manual.
Another common requirement is to allow users to define output formats. To - once more - stay with the file-system sample of this manual, a software may want to allow a user to specify how a line of output for a directory entry should look.
With built-in plug-in Strings, expressions that return a string can be created quite easily. For example:
String( Name ) + " " + Size/1024 + "kB"
could be such an output expression.
However, there is a more comfortable and powerful way to do this! The key to that is the use of format strings as processed by ALib Formatters in a combination with expression strings that comprise the placeholder values found in the format strings.
Utility class ExpressionFormatter implements such combination.
For details and usage information, please consult the class's documentation.
Talking about virtual machines, most people today would consider the JAVA Virtual Machine a good sample. While this is true and comparable, the machine that is included in this library is a million times simpler. In fact, the current implementation that executes an expression program consists of less than 300 lines of code: a very simple "stack machine" that has just five commands!
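To illustrate the principle - explicitly not the actual class VirtualMachine, but a self-contained toy model with the same five command types - consider the following sketch:

    #include <cstdint>
    #include <functional>
    #include <vector>

    // A toy stack machine with the five command types discussed below. Values are
    // reduced to integers and "functions" to a single callback signature.
    struct ToyVM
    {
        enum class Cmd { Constant, Function, JumpIfFalse, Jump, Subroutine };

        struct Command
        {
            Cmd                                           cmd;
            int64_t                                       constant  {};  // used by Constant
            std::function<int64_t(std::vector<int64_t>&)> callback  {};  // used by Function
            std::size_t                                   target    {};  // used by the jumps (absolute PC)
            const std::vector<Command>*                   subProgram{};  // used by Subroutine
        };

        static int64_t Run( const std::vector<Command>& program )
        {
            std::vector<int64_t> stack;
            for( std::size_t pc= 0; pc < program.size(); ++pc )
            {
                const Command& c= program[pc];
                switch( c.cmd )
                {
                    case Cmd::Constant:                  // push a literal or optimization constant
                        stack.push_back( c.constant );
                        break;

                    case Cmd::Function:                  // invoke a callback; it pops its arguments
                        stack.push_back( c.callback( stack ) );
                        break;

                    case Cmd::JumpIfFalse:               // conditional jump ('?')
                    {
                        bool q= stack.back() != 0;  stack.pop_back();
                        if( !q )  pc= c.target - 1;      // -1 because the loop increments pc
                        break;
                    }

                    case Cmd::Jump:                      // unconditional jump (':')
                        pc= c.target - 1;
                        break;

                    case Cmd::Subroutine:                // run a nested expression's program
                        stack.push_back( Run( *c.subProgram ) );
                        break;
                }
            }
            return stack.back();   // a well-formed program leaves exactly one value
        }
    };

A program for expression "42 * 2", for example, would consist of two Constant commands followed by one Function command that pops both values and pushes their product - which is exactly the structure of the real listings shown below.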
For people who are interested in how the machine works, besides investigating its source code, a look at some sample programs for it leads to a quick understanding.
With debug-builds of this library, static method VirtualMachine::DbgList may be invoked to generate a listing of an expression's program. Because the originating expression string itself is given with these listings, in this chapter, we just sample the listing output, without sampling the expressions explicitly.
Let's have an easy start with a simple expression of a constant value:
-------------------------------------------------------------------------------------- ALib Expression Compiler (c) 2024 AWorx GmbH. Published under MIT License (Open Source). More Info: https://alib.dev -------------------------------------------------------------------------------------- Expression name: ANONYMOUS Normalized: {42} PC | ResultType | Command | Param | Stack | Description | ArgNo{Start..End} | 42 -------------------------------------------------------------------------------------- 00 | Integer | Constant | '42' | 1 | Literal constant | |_^_
This shows the first command "Constant", which pushes a constant value, given as a parameter of the command, to the stack.
Let's do some multiplication:
----------------------------------------------------------------------------------------------- ALib Expression Compiler (c) 2024 AWorx GmbH. Published under MIT License (Open Source). More Info: https://alib.dev ----------------------------------------------------------------------------------------------- Expression name: ANONYMOUS Normalized: {42 * 2} PC | ResultType | Command | Param | Stack | Description | ArgNo{Start..End} | 42 * 2 ----------------------------------------------------------------------------------------------- 00 | Integer | Constant | '84' | 1 | Optimization constant | | _^_
Oops, it is still one command, which includes the result. The reason for this is the optimizing compiler, which detected two constants, passed this information to the compiler plug-in, and this in turn did the calculation at compile-time. Consequently, we still have a constant expression program.
We now have two options: Use non-constant functions like built-in math function Random, or just switch off optimization. The latter is what we do:
-------------------------------------------------------------------------------------------------------------------------- ALib Expression Compiler (c) 2024 AWorx GmbH. Published under MIT License (Open Source). More Info: https://alib.dev -------------------------------------------------------------------------------------------------------------------------- Expression name: ANONYMOUS Normalized: {42 * 2} PC | ResultType | Command | Param | Stack | Description | ArgNo{Start..End} | 42 * 2 -------------------------------------------------------------------------------------------------------------------------- 00 | Integer | Constant | '42' | 1 | Literal constant | |_^_ 01 | Integer | Constant | '2' | 2 | Literal constant | | _^_ 02 | Integer | Function | mul_II(#2) | 1 | Binary operator '*', CP="ALib Arithmetics" | 0{0..0}, 1{1..1} | _^_
We now see two pushes of constant values and then virtual machine command "Function", which invokes a C++ callback function as provided by the compiler plug-ins. In this case it is a callback named "mul_II", which implements operator '*' for two integer arguments. Those arguments will be taken from the current execution stack. The result of the callback will be pushed to the stack.
Command "Function" is used for expression terms of type unary operator, binary operator, identifier and function.
In column "Description" the listing tells us that the callback "mul_II" in the third and final program command was compiled by plug-in "ALib Arithmetics" for operator '*'. Such information is debug-information and not available in release compilations of the library.
We now know two out of five virtual machine commands and already quite complex expressions can be compiled:
---------------------------------------------------------------------------------------------------------------------------------------------- ALib Expression Compiler (c) 2024 AWorx GmbH. Published under MIT License (Open Source). More Info: https://alib.dev ---------------------------------------------------------------------------------------------------------------------------------------------- Expression name: ANONYMOUS Normalized: {(42 * 2 / 5) * (2 + 3) * 7} PC | ResultType | Command | Param | Stack | Description | ArgNo{Start..End} | (42 * 2 / 5) * (2 + 3) * 7 ---------------------------------------------------------------------------------------------------------------------------------------------- 00 | Integer | Constant | '42' | 1 | Literal constant | | _^_ 01 | Integer | Constant | '2' | 2 | Literal constant | | _^_ 02 | Integer | Function | mul_II(#2) | 1 | Binary operator '*', CP="ALib Arithmetics" | 0{0..0}, 1{1..1} | _^_ 03 | Integer | Constant | '5' | 2 | Literal constant | | _^_ 04 | Integer | Function | div_II(#2) | 1 | Binary operator '/', CP="ALib Arithmetics" | 0{0..2}, 1{3..3} | _^_ 05 | Integer | Constant | '2' | 2 | Literal constant | | _^_ 06 | Integer | Constant | '3' | 3 | Literal constant | | _^_ 07 | Integer | Function | add_II(#2) | 2 | Binary operator '+', CP="ALib Arithmetics" | 0{5..5}, 1{6..6} | _^_ 08 | Integer | Function | mul_II(#2) | 1 | Binary operator '*', CP="ALib Arithmetics" | 0{0..4}, 1{5..7} | _^_ 09 | Integer | Constant | '7' | 2 | Literal constant | | _^_ 10 | Integer | Function | mul_II(#2) | 1 | Binary operator '*', CP="ALib Arithmetics" | 0{0..8}, 1{9..9} | _^_
Note that listing column "ArgNo" denotes for each argument the program code lines which are responsible for calculating it on the stack. In other words: each segment of code {x..y} noted in this column produces exactly one result value on the stack, just as the whole expression produces one.
The following sample uses a function that consumes three arguments:
------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ALib Expression Compiler (c) 2024 AWorx GmbH. Published under MIT License (Open Source). More Info: https://alib.dev ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Expression name: ANONYMOUS Normalized: {Format( "Result of: {}", "2 * 3", 2 * 3 )} PC | ResultType | Command | Param | Stack | Description | ArgNo{Start..End} | Format( "Result of: {}", "2 * 3", 2 * 3 ) ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 00 | String | Constant | "Result of: {}" | 1 | Literal constant | | _^_ 01 | String | Constant | "2 * 3" | 2 | Literal constant | | _^_ 02 | Integer | Constant | '2' | 3 | Literal constant | | _^_ 03 | Integer | Constant | '3' | 4 | Literal constant | | _^_ 04 | Integer | Function | mul_II(#2) | 3 | Binary operator '*', CP="ALib Arithmetics" | 0{2..2}, 1{3..3} | _^_ 05 | String | Function | CBFormat(#3) | 1 | Function "Format(#3)", CP="ALib Strings" | 0{0..0}, 1{1..1}, 2{2..4} |_^_
Now, as two VM-commands are understood, let's continue with two further ones. For implementing the ternary conditional operator Q ? T : F, two types of jump commands are needed: a conditional jump and an unconditional one:
--------------------------------------------------------------------------------------------------------------------------------------- ALib Expression Compiler (c) 2024 AWorx GmbH. Published under MIT License (Open Source). More Info: https://alib.dev --------------------------------------------------------------------------------------------------------------------------------------- Expression name: ANONYMOUS Normalized: {true ? 1 : 2} PC | ResultType | Command | Param | Stack | Description | ArgNo{Start..End} | true ? 1 : 2 --------------------------------------------------------------------------------------------------------------------------------------- 00 | Boolean | Constant | 'true' | 1 | Optimization constant, CP="ALib Arithmetics" | |_^_ 01 | NONE | JumpIfFalse | 4 (absolute) | 1 | '?' | 0{0..0} | _^_ 02 | Integer | Constant | '1' | 2 | Literal constant | | _^_ 03 | NONE | Jump | 5 (absolute) | 2 | ':' | 0{2..2} | _^_ 04 | Integer | Constant | '2' | 3 | Literal constant | | _^_
Note that while the program listing, for convenience, shows the jump destination as an absolute program counter value (first column "PC"), internally relative addressing is used. The insertion of the two jump commands explains what is said in 11.5.4 Compile- And Evaluation-Time Optimization Of The Conditional Operator.
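As a small, hedged illustration (plain C++, not ALib code; the variable names are made up), the difference between the printed absolute target and an internally stored relative offset could look like this:

    #include <iostream>

    int main()
    {
        // The listing shows "4 (absolute)" only for readability; internally a
        // relative offset would be stored, so a program can be relocated or
        // embedded without patching its jump targets.
        int  pc     = 1;     // program counter at the JumpIfFalse command
        int  offset = 3;     // stored relative distance: 1 + 3 == absolute PC 4
        bool q      = true;  // condition value popped from the stack

        pc += q ? 1          // condition met: fall through to the 'T' branch (PC 2)
                : offset;    // otherwise: jump to the 'F' branch (PC 4)

        std::cout << "next PC: " << pc << '\n';   // prints: next PC: 2
    }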
Just for fun, we enable compile-time optimization and check the output:
------------------------------------------------------------------------------------------------
ALib Expression Compiler (c) 2024 AWorx GmbH. Published under MIT License (Open Source). More Info: https://alib.dev
------------------------------------------------------------------------------------------------
Expression name: ANONYMOUS
     Normalized: {true ? 1 : 2}

PC | ResultType | Command  | Param | Stack | Description      | ArgNo{Start..End} | true ? 1 : 2
------------------------------------------------------------------------------------------------
00 | Integer    | Constant | '1'   |   1   | Literal constant |                   | _^_
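The following self-contained sketch shows the constant-folding idea behind the optimized listing above. It is an assumption about how such a compiler could work, not ALib's actual implementation; the types Command and Program and the function assembleConditional are made up for illustration. The point: if the condition of Q ? T : F assembles to a single constant, only the selected branch is emitted and no jump commands are needed.

    #include <iostream>
    #include <string>
    #include <vector>

    struct Command { std::string op; long value; };
    using Program = std::vector<Command>;

    // Hypothetical helper: assemble 'q ? t : f' where all three parts are
    // already-compiled sub-programs.
    Program assembleConditional( const Program& q, const Program& t, const Program& f )
    {
        // Condition folded to a single constant? Then emit only the taken branch.
        if( q.size() == 1 && q[0].op == "Constant" )
            return q[0].value != 0 ? t : f;

        // Otherwise, the generic case with relative jump offsets is assembled.
        Program result= q;
        result.push_back( { "JumpIfFalse", long(t.size()) + 2 } );  // skip T-branch and Jump
        result.insert( result.end(), t.begin(), t.end() );
        result.push_back( { "Jump",        long(f.size()) + 1 } );  // skip F-branch
        result.insert( result.end(), f.begin(), f.end() );
        return result;
    }

    int main()
    {
        Program q{ { "Constant", 1 } };                  // 'true'
        Program t{ { "Constant", 1 } };                  // '1'
        Program f{ { "Constant", 2 } };                  // '2'
        std::cout << assembleConditional( q, t, f ).size() << " command(s)\n";  // prints: 1 command(s)
    }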
The fifth and final command, "Subroutine", is needed to allow Nested Expressions. We add an expression named "nested" and refer to it:
------------------------------------------------------------------------------------------------------------------------
ALib Expression Compiler (c) 2024 AWorx GmbH. Published under MIT License (Open Source). More Info: https://alib.dev
------------------------------------------------------------------------------------------------------------------------
Expression name: ANONYMOUS
     Normalized: {*nested}

PC | ResultType | Command    | Param     | Stack | Description                            | ArgNo{Start..End} | *nested
------------------------------------------------------------------------------------------------------------------------
00 | Integer    | Subroutine | *"nested" |   1   | Nested expr. searched at compile-time  |                   | _^_
Using the alternative version that locates nested expressions at evaluation-time only, the program looks like this:
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
ALib Expression Compiler (c) 2024 AWorx GmbH. Published under MIT License (Open Source). More Info: https://alib.dev
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
Expression name: ANONYMOUS
     Normalized: {Expression( nested, -1, throw )}

PC | ResultType | Command    | Param                   | Stack | Description                               | ArgNo{Start..End} | Expression( nested, -1, throw )
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
00 | String     | Constant   | "nested"                |   1   | Literal constant                          |                   | _^_
01 | Integer    | Constant   | '-1'                    |   2   | Literal constant                          |                   | _^_
02 | Integer    | Subroutine | Expr(name, type, throw) |   1   | Nested expr. searched at evaluation-time  |                   | _^_
With these few simple samples, all five commands of class VirtualMachine are covered.
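To wrap this chapter up, here is a compact, self-contained toy evaluator that shows how the five command types - Constant, Function, JumpIfFalse, Jump and Subroutine - can operate on a single evaluation stack. It is only a sketch under the assumption of integer-only values; it is not ALib's class VirtualMachine, and all names in it are made up:

    #include <cstdint>
    #include <functional>
    #include <iostream>
    #include <vector>

    struct Cmd
    {
        enum class Type { Constant, Function, JumpIfFalse, Jump, Subroutine } type;
        std::int64_t                                     value  = 0;       // Constant: the literal
        int                                              argc   = 0;       // Function: number of arguments
        std::function<std::int64_t(const std::int64_t*)> func;             // Function: callback
        int                                              offset = 0;       // Jump*: relative target
        const std::vector<Cmd>*                          nested = nullptr; // Subroutine: nested program
    };

    std::int64_t run( const std::vector<Cmd>& program )
    {
        std::vector<std::int64_t> stack;
        for( std::size_t pc= 0; pc < program.size(); )
        {
            const Cmd& cmd= program[pc];
            switch( cmd.type )
            {
                case Cmd::Type::Constant:                     // push the literal
                    stack.push_back( cmd.value );       ++pc; break;

                case Cmd::Type::Function:                     // pop N arguments, push one result
                {
                    std::int64_t result= cmd.func( &stack[ stack.size() - cmd.argc ] );
                    stack.resize( stack.size() - cmd.argc );
                    stack.push_back( result );          ++pc; break;
                }

                case Cmd::Type::JumpIfFalse:                  // pop the condition, maybe jump
                {
                    std::int64_t q= stack.back();  stack.pop_back();
                    pc+= q ? 1 : cmd.offset;                  break;
                }

                case Cmd::Type::Jump:                         // unconditional relative jump
                    pc+= cmd.offset;                          break;

                case Cmd::Type::Subroutine:                   // evaluate the nested program
                    stack.push_back( run( *cmd.nested ) ); ++pc; break;
            }
        }
        return stack.back();                                  // exactly one value remains
    }

    int main()
    {
        auto mul= []( const std::int64_t* a ) { return a[0] * a[1]; };

        // "2 * 3", assembled as: Constant 2, Constant 3, Function mul(#2)
        std::vector<Cmd> program= {
            { Cmd::Type::Constant, 2 },
            { Cmd::Type::Constant, 3 },
            { Cmd::Type::Function, 0, 2, mul },
        };
        std::cout << run( program ) << '\n';                  // prints: 6
    }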
This quick chapter does not need to be read. It was written for those who want to look at the source code and understand how module ALib Expressions was implemented, and who maybe want to extend it or add internal features.
Often, two different perspectives are needed when thinking about the architecture of a software library. The first is the viewpoint of the user of the library. This may be called the "API perspective". It basically asks: Which types do I need to create and which methods do I need to invoke? The second is the implementer's perspective. Here, it is more about which types implement which functionality and how they interact internally.
During the development of this small library, these two perspectives were in constant conflict. The decision was taken to follow the needs of the API perspective.
A user of the library just needs to "see":
From the implementation perspective, there are some more things:
To keep the types that are needed from the API perspective clean and lean, responsibilities have been moved to maybe "unnatural" places. A few more quick bullets and this chapter has said what it aimed to say:
The implementation classes reside in namespace detail. The differentiation between the abstract base and the implementation is a pure design decision. It even costs some nanoseconds of run-time overhead, by invoking virtual functions where no such abstract concept is technically needed (while it reduces compile time for a user's software).
A compiled expression program is just a std::vector of virtual machine commands residing in the expression. Therefore, it simply was a nice, empty place to put the assembly code in, to keep class Compiler free of that.
This design and structure might be questionable. Probably, a virtual machine should not perform de-compilation and should not "know" about ASTs, which otherwise constitute the intermediate data layer between a parser and a compiler. Please do not blame us. We do not foresee bigger feature updates of this library. If such were needed, this current code design might fail and need some refactoring. But as it stands, it is a compromise strongly in favor of simplicity, of the API as well as of the internal code.