This manual covers two ALib Modules, namely:
The rationale for joining both modules in one manual is that both topics are closely tied together:
The reason why two separate ALib Modules have been established is that both topics are nevertheless technically separate:
This technical separation is also expressed by the fact that both modules (as always) live in different namespaces:
Finally, outside both modules, in namespace alib::lang, which is available independently of the list of modules included in an ALib Distribution, a few types implement an abstraction layer for memory allocation. Among those types are:
Their use is also covered in this manual.
Two "built-in" sorts of memory are available with the C++ language: local (automatic) variables, which reside on the stack, and dynamic memory, which is allocated with new and de-allocated with delete. For the latter, internally the functions std::malloc and std::free are used, which have complex (and smart) implementations in place. In short, these functions usually interface with the underlying operating system to retrieve bigger memory buffers, but provide an own "management layer" to efficiently organize allocated and de-allocated fragments.

While the use of local allocation is very restricted (namely to local variables), dynamic allocation is the opposite: it provides the maximum freedom possible. But this degree of freedom comes at a high cost, due to the necessary memory-management overhead and other factors discussed in the next quick chapters.
To be motivated to use monotonic allocation, the following is important to understand (and to believe):
This is why ALib (and various other 3rd-party libraries) offers further memory allocation models.
The module name "monomem" stands for "monotonic memory". The term monotonic here relates to the way that memory allocation may be performed by software: At a certain point in execution time, a set of allocations is performed, which are all deleted at a distinct future point in time. The memory used for allocation is "monotonically growing" during a defined time span.
This is what we call the "strict" definition of monotonic allocation. And this use case is what common C++ libraries in this area offer to support. In this introduction chapter, we will see how "weaker" definitions lead to a much wider range of possible use cases and thus unlock great potential for performance improvements.
The possibility of using monotonic allocation has to be identified and decided by a programmer. Such analysis and decision is already daily business in respect to whether an object can be a fast automatic local variable, or whether it needs to be allocated on the heap using new.
Due to the name "local variable", the decision about automatic vs. dynamic allocation seems to be bound to the fact whether an object is used only locally in a method or function, or if it is shared between different code entities. But this is only partly true: A local variable may well be passed to arbitrary functions invoked by the code block that declares the variable.
Instead, the decision is much more about an object's life cycle: If an object can be deleted at the end of the code block it was declared in, then an automatic variable is to be used. If, otherwise, the life cycle has to survive the execution of such a code block, dynamic allocation has to be used. (Note: An exception is of course the needed object size: Large objects may need to be dynamically allocated, due to the limited size of stack memory.)
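This life-cycle rule can be illustrated with a small sketch (the type Record and both functions here are invented for this example):

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

struct Record { std::string name; };

// Life cycle ends with the declaring code block: an automatic (local)
// variable suffices - even if the object is passed to functions it calls.
std::size_t localUse() {
    Record r{"temporary"};          // automatic allocation on the stack
    return r.name.size();           // r is destructed at the end of the block
}

// Life cycle survives the declaring code block: dynamic allocation is needed.
std::unique_ptr<Record> makeRecord() {
    return std::make_unique<Record>(Record{"long-lived"});  // heap allocation
}
```

Note that in the first function, the decisive criterion is not whether the object is used only locally, but that its life ends with the block.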
Now, the extended decision about the potential use of monotonic allocation is also bound to observing life cycles. But this time, it is not about the life cycle of one single object, but instead about the life cycle of a set of objects.
The decision rule can be phrased as follows:
With this definition, the implementation of monotonic allocation can be easily specified:
Allocated objects are not de-allocated using delete. Instead, the objects' destructor methods are invoked "manually".

The performance advantages can be tremendous, not only because of what was already mentioned:
But further reasons of performance gain are:
On the hardware level: Access operations to the memory of monotonically allocated objects are likely performed in code sequences that execute close together in time, respectively in series. This way, the hardware memory cache will need to be reloaded less often, as all objects tend to lie side-by-side in the same memory buffer.
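The implementation idea described above - allocate objects side-by-side in a bigger buffer, destruct manually, release everything at once - can be condensed into a minimal sketch. (This is a simplified illustration of the principle, not ALib's actual implementation.)

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <utility>

// A minimal monotonic ("bump") allocator: memory use only grows, objects are
// laid side-by-side, and no individual de-allocation exists.
class BumpAllocator {
    alignas(std::max_align_t) char buffer[4096];
    std::size_t used = 0;
  public:
    void* allocate(std::size_t size, std::size_t alignment) {
        std::size_t aligned = (used + alignment - 1) & ~(alignment - 1);
        if (aligned + size > sizeof(buffer)) throw std::bad_alloc();
        used = aligned + size;                  // just "bump" the fill level
        return buffer + aligned;
    }
    template<typename T, typename... Args>
    T* emplace(Args&&... args) {                // placement-new into the buffer
        return new (allocate(sizeof(T), alignof(T))) T(std::forward<Args>(args)...);
    }
    // Note: no deallocate(). Destructors are invoked manually by the using
    // code; the memory is released as a whole when the allocator dies.
};
```

Allocation degrades to a pointer bump and a bounds check, which is why the speed advantage over general-purpose heap allocation can be so large.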
An easily understandable sample use case for monotonic allocation is found with module ALib Expressions. There, this technique is used twice:
The second use of monotonic allocation is even more important due to the fact that an expression is compiled only once, but may be evaluated plenty of times against different scopes. Now, with the first evaluation, buffers for monotonic memory are allocated, but after the evaluation is done, they are not freed! While the intermediate objects are duly destructed, the buffers already allocated are kept for the next run.
This way, starting with the second evaluation of an expression, not a single dynamic memory allocation is performed!
Another prominent sample is found in server software that offers a certain service. Let's consider a simple http-server. Usual implementations maintain a pool of threads, and with each incoming request, a free thread is chosen to process it. Now, all memory needed during processing can be considered monotonically allocatable. And you can bet that all prominent implementations of http-servers use this concept, as introduced here.
The definition given above stated that those objects that share the same life cycle may be considered a set and be monotonically allocated. This definition is quite strict and therefore strongly narrows the use cases.
There are several ways to weaken the requirements. The following "thought experiment" may easily demonstrate this: Consider a simple command-line calculator software, as demonstrated with module Expressions, or any other "simple" software that accepts an input, creates an output, and exits. What would happen if every dynamic allocation were turned into a monotonic allocation by extending the life cycle of all allocations to the overall life cycle of the software process?
The answer is: no problem! The resulting executable would potentially show slightly better performance, while it would most probably allocate a higher total amount of dynamic memory during its run. But because it is known that the total amount of allocations is very limited - considering the simplicity of the task - it is ensured that no situation of memory exhaustion occurs - which is exactly the danger if the strictness of the definition of monotonic allocation is weakened.
With this thought experiment, we can "weaken" the definition and extend the use cases in the following ways:
The technical implementation of weak monotonic allocation is the same as that of strict monotonic allocation, and for this reason, this ALib Module does not provide any specific interface to support such an approach. Moreover, whether monotonic allocation is strict or weak is not even visible to the module's code entities.
Nevertheless, if the increase of the peak memory consumption is guaranteed to be limited, then the performance gain of monotonic allocation usually goes far beyond the performance loss that might be caused (in the whole system!) by a higher peak memory consumption.
The takeaway from the previous section was that weakening the requirements of monotonic allocation might immensely broaden its use cases, and with that, the potential to optimize the performance of software.
Now, let us construct a sample that - at first sight - is not a candidate for optimization, and let's think of ways to turn it into a candidate. Consider the following scenario:
A typical implementation in C++ would use a std::unordered_map (aka hash table) to store the session objects. With each insertion, the std::unordered_map allocates an internal "node" object, and with removals, such node objects are deleted.
Together with dynamic memory allocations performed by the server in other places, over time a considerable fragmentation of the heap memory will occur: The internal node objects allocated by the hash table at a certain point in time will be spread (fragmented) across the whole area of dynamic memory allocated from the operating system.
There is a very simple way to mitigate this problem: Instead of de-allocating a node object when a session is removed from the storage, such a node object only has to be destructed, while its allocated memory is pushed to the front of a simple linked list of "recyclable nodes". When a new user session is inserted, it is checked whether a recyclable node is available, and only if not, a new node is allocated.
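The described recycling scheme can be sketched as follows. (Type and method names here are invented for this illustration; they are not ALib's API.)

```cpp
#include <cassert>
#include <cstdlib>
#include <new>

// Hypothetical session type and a hash-table-like "node" wrapping it.
struct Session { int id; };
struct Node    { Session value; Node* next; };

// Instead of de-allocating removed nodes, their memory is pushed onto a
// simple linked list of "recyclable nodes".
class NodeRecycler {
    Node* recyclables = nullptr;
  public:
    Node* get(int id) {
        Node* n;
        if (recyclables) {                     // re-use a recycled node first
            n = recyclables;
            recyclables = n->next;
        } else {                               // allocate only if none is left
            n = static_cast<Node*>(std::malloc(sizeof(Node)));
        }
        return new (n) Node{Session{id}, nullptr};
    }
    void recycle(Node* n) {
        n->~Node();                            // destruct only - keep the memory
        n->next = recyclables;                 // (storage is re-used as the link)
        recyclables = n;
    }
    ~NodeRecycler() {                          // all memory is freed at once
        while (recyclables) {
            Node* n = recyclables;
            recyclables = n->next;
            std::free(n);
        }
    }
};
```

Note that all collected memory is released together when the recycler dies, which matches the monotonic idea of freeing a whole set of objects at once.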
The consequences of this approach are:
But maybe the best of all is:
A reader could now comment that such a recycling technique is not necessarily related to monotonic memory and can easily be used without it. This is very true and is reflected by the fact that we are talking about two separate ALib Modules in this manual: ALib Containers and ALib Monomem. Neither depends on the other. The topic of "recycling container objects" is completely covered by module ALib Containers; furthermore, type lang::StdContainerAllocatorRecycling even provides mechanics to enable recycling for containers of the C++ Standard Library. Finally, in a further independent effort, module ALib Monomem offers, besides monotonic allocation, mechanics for recycling non-container objects.
Note that with both in place, recycling, and monotonic allocation, the joint benefit surpasses the benefit of the two single efforts: The number of potential cache-misses on the hardware level is greatly reduced, if already the first memory buffer allocated is large enough to store all (recyclable) nodes needed to handle the peak of parallel allocated objects.
Besides the fact that the use of container types can often be turned into a use case of monotonic allocation, the more general takeaway of this final introductory chapter is:
This was quite a long introduction chapter, and unfortunately before we start with monotonic allocation, a few prerequisites that ALib provides outside modules ALib Containers and ALib Monomem, have to be introduced.
These prerequisites concern the generalization of the idioms introduced by module ALib Monomem, with the result that all other ALib Modules can make use of monotonic allocation without having any header file dependencies on module ALib Monomem.
Consequently, the types introduced here neither belong to ALib Monomem nor to ALib Containers. Instead, they do belong to the core of ALib and thus reside in namespace alib::lang. Remember that types of that namespace are always included in an ALib Distribution, independent of its module selection.
Class Allocator is a prototype class. In fact, this class is not exposed to the C++ compiler and it is just living in the documentation! The prototype defines what an "allocator" needs to offer to be usable as a template parameter that expects an "ALib Allocator". Such template parameters are always named TAllocator across all of ALib.
Allocators have to provide four basic allocation/de-allocation functions, which are quite user-unfriendly. Besides those, a few informational values and methods for debug-purposes are to be given.
Finally, types that implement this prototype, need to provide operator()(), which is explained in the next section.
As just seen, the prototype for ALib Allocator implementations demands to specify the operator()() to return a value of type AllocatorInterface. This type provides a comfortable (and much safer!), high-level interface into allocators.
Of course, these high-level convenient methods make use of the four "basic" allocation/de-allocation methods provided with any Allocator.
With this "design trick" in place, the using code can always "call" any given templated operator and receive the very same high-level interface. Standard C++ compilers will optimize out these intermediate steps: the invocation of the call operator as well as the calls of the methods found with AllocatorInterface.
The third type found in namespace alib::lang related to allocation is AllocatorMember. This type is used to implement templated types that are allocation-agnostic, and it enables eliminating size overhead in a type in the case that heap allocation is chosen.
Finally, namespace alib::lang introduces types StdContainerAllocator as well as StdContainerAllocatorRecycling. For now, let's skip reading their reference documentation. All details of these types will be discussed in a later chapter 5. C++ Standard Library Container Types.
(Comparable allocator types are found in the standard library's namespaces std and std::pmr.) The central type of module ALib Monomem is class TMonoAllocator, which is usually referred to by its default type-definition MonoAllocator. Its interface follows the standards described in the previous chapter 2.2 Class AllocatorInterface.
Internally, bigger buffers are requested from the heap and used for the provision of allocations. No meta-information about the allocations is stored. Instead, allocated objects are laid side-by-side, with no gaps apart from those that may arise from alignment requirements.
When the next allocation does not fit into the current buffer, a new buffer is allocated, usually with a larger size than the previous one. This growth behavior is configurable.
The memory buffers that an instance of class MonoAllocator allocates on the heap are stored in a singly linked list and are necessarily only deleted with the deletion of the allocator instance.
The allocator offers a method named Reset. Before the invocation of this method, all monotonically allocated objects have to be destructed (by the using software) - but not deleted. This ensures that any resources they have acquired will be released.
After invocation of Reset, the allocator can be re-used, which means that the previously allocated buffers are re-used as well! Only in the case that more memory is required than in a prior "run", a new buffer is requested from the heap.
The simple concept of resetting an allocator to be able to reuse it is internally realized by a more sophisticated concept: Class MonoAllocator not only allows being reset to its initial empty state, but also allows creating a Snapshot of its current use and later resetting back to exactly this state.
A snapshot is taken with the method TakeSnapshot. The returned object is a lightweight, 16-byte value on 64-bit systems (8 bytes on 32-bit systems). A previously taken snapshot may be passed to method Reset, where it replaces the default argument that completely resets the allocator.
If several snapshots are taken, those are "nested": A later snapshot "points to" memory covered by a prior one. Consequently, if an allocator is reset to an "older" snapshot, all subsequent snapshots become invalid. In debug-compilations, an ALib assertion is raised if an invalid snapshot is passed to method Reset.
This concept furthermore allows reducing heap allocations by splitting a set of monotonic allocations into several nested ones. Nested snapshots and resets slightly extend the definition of monotonic allocation as given in the introduction chapters. They are especially useful in situations where weak monotonic allocation is nested in a stricter use case, because this has the potential to mitigate peaks of overall memory usage.
While the exact conditions of use are not easy to precisely define, during the course of implementing monotonic allocation, a programmer will probably identify use cases for nested allocations quite naturally.
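Because a monotonic allocator stores no per-object information, a snapshot can be reduced to little more than the current fill level. The following self-contained sketch (which ignores alignment and buffer chaining, and is not ALib's implementation) illustrates the nesting:

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// A snapshot of a monotonic allocator can be as small as its fill level.
class SnapshotBumpAllocator {
    char buffer[1024];
    std::size_t used = 0;
  public:
    using Snapshot = std::size_t;
    void* allocate(std::size_t size) {
        void* p = buffer + used;
        used += size;
        return p;
    }
    Snapshot takeSnapshot() const { return used; }
    // Resetting to a snapshot invalidates only allocations made after it;
    // the default argument resets the allocator completely.
    void reset(Snapshot s = 0) { used = s; }
};
```

An "outer" allocation stays valid across a reset to a snapshot taken after it, while all "inner" allocations made after the snapshot are invalidated.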
Class TLocalAllocator is a derivative of MonoAllocator, which holds a large internal member that is used as the first memory buffer. The size of the member is defined by a template parameter. In namespace alib, several predefined type definitions for different sizes are given with
This class is designed to be used as a local function variable. With that, the first buffer of memory (and typically the single allocated buffer) resides on the execution stack of the current thread. This is the most efficient allocation method available: Not only does it avoid any heap allocation, but the memory used also lies in the same hardware range as other local variables of the execution thread. With this, it is almost guaranteed that the memory is available in the first-level cache of the microprocessor.
Of course, a user has to be aware that:
Module ALib Containers provides implementations of container types that are very similar to types found in the C++ standard library. Some of the provided types are considered alternatives for the standard types to be used with allocators, others are types not existing in the standard.
Only types that internally allocate node objects are replaced. Standard containers that do not allocate node objects are std::vector and std::deque. No alternatives are given for these two types. This is true for the current release of the library, and it is foreseeable that no alternatives will be given in future releases either.
One motivation for providing alternatives lies in the restrictions that the design of the std-types imposes towards implementing efficient allocation strategies. We had once hoped that with C++ 17 namespace std::pmr things would become better, but this was only partly the case. The following container types are provided with module ALib Containers:
- A dynamic array type along the lines of std::vector. With that, type definition FixedSizePriorityQueue is also given.
- List, an alternative to std::list.
- LRUCacheTable, implementing a simple caching container that follows the "least recently used" principle. Along with it come several type definitions. This type has no alternative in the C++ standard library.
In addition, a smart-pointer type is provided that competes with the smart-pointer types of namespace std. In fact, this type provides even more flexibility than the four together and is still more convenient and equally fast. Along with it come type definitions, among them one that is similar to std::shared_ptr but holds a value instead of a pointer, and others that compare to std::shared_ptr with pros and cons.

All container types provided accept an Allocator with construction, which they use for allocations. In the "simple" case of using class lang::HeapAllocator (which is always included in any ALib Distribution, independent of the inclusion of module ALib Monomem), alternative constructors exist that do not expect an external allocator.
Of course, a user of the library could also provide an own allocator type and run these containers with that.
For a detailed description of the container types, please consult their reference documentation, as linked above with the type names.
Besides generic allocation, some of the container types support recycling of internal node objects. This is discussed in the next chapter.
The concept of "node recycling", as it was introduced in chapter 1.5 Recycling, is supported by all container types that internally allocate nodes, which are List, HashTable and StringTree. For that, each type specifies a template parameter named TRecycling which has to be set as any of the following three enum elements:
None switches recycling off and supports use cases of strict monotonic allocation. Its advantage is that the (negligible) overhead of collecting destructed nodes and re-using them with insertions is omitted.
The default value Private establishes an internal stack implementation (a forward list) and all recycling takes place internally.
The next chapter introduces details of shared recycling.
The concept of shared recycling aims to reduce the overall peak size of allocated memory: from the sum of each container's peak size (in the case of private recycling) down to the true peak of all concurrently allocated nodes.
But this slight optimization is not the main reason to offer shared recycling. More importantly, use cases exist that cannot be made suitable for monotonic allocation if only private recycling was available.
As a sample, let us have a look at a situation where a top-level list contains container objects, which in turn contain the element objects. In other words, a programmer wants to have a tree of entries, restricted to a depth of two. Such a situation might, for example, be found with INI-files: The outer list is the list of INI-file sections, and each section holds a list of INI-file variables.
Now, software should be free to create and delete arbitrary sections and arbitrary variables within such sections during the life cycle of the object. The problem arises if a complete section is deleted: While the object for the section itself is recycled with the creation of a next section, the list of entries of such a recycled section was destructed on removal. Now, if private recycling is used with the list of entries, the recycled entries are "lost" with that destruction. Such loss of objects makes this use case unsuitable for monotonic allocation, because an "infinite" number of insertions and deletions of sections might occur.
The way out of this is to share the stack of recyclable elements among different containers. With that, each variable list uses the same "external" recycler instance and when a list is destructed, the recycled elements are not lost.
As already noted above, this approach does not only resolve the original problem, it further optimizes the overall memory use: As soon as a single variable is erased, the next insertion will recycle this entry, no matter which section is concerned.
Shared recycling is activated by passing Recycling::Shared for template parameter TRecycling of the corresponding container instance. When doing so, the container requires a constructor parameter that provides the external, shared recycler object. This parameter replaces the otherwise necessary parameter allocator, because the allocator is embedded in the externally created shared recycler. The exact (templated) type of the recycler that has to be created and passed is determined by a type definition named SharedRecyclerType that each container type offers. (For example List::SharedRecyclerType or HashTable::SharedRecyclerType.)
The object passed has to outlive each container instance that uses it.
Here is a quick sample taken from the unit tests that demonstrates the use of shared recycling:
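In spirit, a shared recycler used by several containers works like the following self-contained sketch. (All names and types here are made up for illustration; they are not ALib's actual API.)

```cpp
#include <cassert>
#include <cstdlib>
#include <new>

struct Node { int value; Node* next; };

// The shared recycler is created externally and must outlive all containers
// using it.
struct SharedRecycler {
    Node* stack = nullptr;                      // shared list of recyclables
    Node* get() {
        if (!stack) return static_cast<Node*>(std::malloc(sizeof(Node)));
        Node* n = stack; stack = n->next;       // re-use a recycled node
        return n;
    }
    void put(Node* n) { n->next = stack; stack = n; }
    ~SharedRecycler() {                         // free all memory at once
        while (stack) { Node* n = stack; stack = n->next; std::free(n); }
    }
};

// A minimal list that allocates its nodes through the shared recycler.
class SharedList {
    SharedRecycler& recycler;   // replaces the allocator constructor parameter
    Node* head = nullptr;
  public:
    explicit SharedList(SharedRecycler& r) : recycler(r) {}
    void push(int v) { Node* n = recycler.get(); *n = Node{v, head}; head = n; }
    ~SharedList() {             // nodes go back to the shared recycler - not lost!
        while (head) { Node* n = head; head = n->next; recycler.put(n); }
    }
};
```

When one list is destructed, its nodes remain available to every other list sharing the recycler, which is exactly what resolves the section/variable problem described above.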
The implementation of class HashTable allocates so-called "bucket arrays" besides node objects. If, with the insertion of a new element, the maximum load factor of the hash table is reached, a new, bigger bucket array is allocated and the existing nodes are "re-hashed". Then the former bucket array becomes useless, hence it would be de-allocated if dynamic memory allocation was used.

In general, this is not a problematic situation, because the growth continues only up to a certain limit, which does not violate the concept of weak monotonic allocation. Nevertheless, if recycling (private or shared) is enabled, class HashTable passes this non-node piece of memory to its recycler object: There, the array is sliced into as many (duly aligned) pieces of the internal node size as possible, and these are then passed over to the recycling stack.
What was said above is, however, only true if the underlying allocator is of type TMonoAllocator - more precisely, if method Allocator::allowsMemSplit returns true. The rationale behind this is that with allocators that do have their allocated objects freed as they were allocated (which is the case for heap allocations), such bucket arrays must not be split into nodes.
As seen in the last lines of the sample above, the built-in recyclers even allow reserving an amount of recyclables upfront. The concept of reserving container capacity, in case the future size of a container is known or at least assessable, is familiar from the standard library. With recycling in place, camp ALib Containers extends this feature to likewise pre-allocate a corresponding amount of node objects. For this, interface methods RecyclablesCount and ReserveRecyclables (for example List::ReserveRecyclables or HashTable::RecyclablesCount) are provided.
Note that pre-allocation of recyclable node objects can reasonably improve execution performance. This is due to the caching levels of modern processor hardware. More information on this was given in chapter 1.5 Recycling.
Furthermore, reserving capacity can be useful when monotonic allocation is used. As a sample, consider the creation of (nested) snapshots: During the execution of an "inner" monotonic allocation sequence, which will be reset in the near future, no insertions of elements into containers that belong to the "outer" sequence may be done - unless it is ensured that such insertions do not allocate node objects, because those would become invalid with the reset! Reserving capacity, including recyclables, is a way to mitigate this common source of "undefined behavior" when using nested monotonic allocations. Of course, such an approach needs to be taken with a lot of care: the number of preallocated recyclables must not be accidentally underestimated, and some assertions in debug-compilations should be added. Still, often it is absolutely foreseeable how many insertions are expected until an allocator can be reset, and here, the concept can be used without imposing too many risks.
As quickly mentioned in chapter 2.4 Struct StdContainerAllocator, namespace alib::lang introduces type StdContainerAllocator. This is independent of the inclusion of either module ALib Containers or ALib Monomem in the ALib Distribution.
As seen in the sample, the type enables C++ standard container types like std::vector to be used with class TMonoAllocator. Some limitations apply in respect to "node recycling", which is discussed in the next sections.
For all use cases that constitute monotonic allocation without the need of recycling internal container node objects, template types that accept a std::allocator can be equipped with SCAMono, which specializes StdContainerAllocator to use monotonic allocation.

To use SCAMono, an understanding of the general concept of using the std::allocator type is helpful.
A standard container is declared by providing type alib::SCAMono<T> as the allocator template argument. The contained object's type has to be repeated:
With the definition of an instance of a container, an instance of the allocator type has to be passed to the constructor of the container:
Note that we simply pass the "original" allocator type MonoAllocator here, which implicitly constructs the StdContainerAllocator, aka SCAMono with the constructor call.
If an allocator used by a standard container is reset, the container object itself has to be reset. Because the standard containers do not support such a reset, the simple way out of it is to:
The documentation of class StdContainerAllocator provides a source code sample that demonstrates this.
A next shortcut to using types SCAMono and SCAPool in combination with std::vector is given with alias StdVectorMono. The type definition of the sample above then looks like this:
The sibling aliases provided are:
The challenge of implementing recycling for the C++ standard container types lies in the fact that the standard specification does not cover a container's implementation but only its interface. Consequently, the internal node type is not determined. Even worse, it is not even determined if the type allocates and de-allocates single node objects at all, or if any other implementation strategy is followed.
Nevertheless, this ALib Module provides type StdContainerAllocatorRecycling which performs recycling. The problem is: It is not guaranteed to work on each platform/toolset!
The type internally uses RTTRAllocator, where RTTR stands for run-time type recycler. This means: The type of recyclable node objects is only detected at run-time. If such detection fails, in debug-compilations different ALib warnings are raised.
In practice, the type works with all implementations of the standard library known to the authors of ALib. Nevertheless, be warned that switching the library/platform/tools of a compilation's target imposes the need to test the new tool chain.
For this reason it is generally recommended to use the container types this module provides instead of using the standard types.
Being a platform agnostic library, of course ALib does not use type StdContainerAllocatorRecycling anywhere internally.
With C++ language version 17, the standard library was extended by namespace std::pmr, which stands for "polymorphic memory resource". The standard here primarily aims to allow two containers that use different types of memory allocation to be of the same type.
Included in this namespace is class std::pmr::monotonic_buffer_resource which introduces monotonic memory allocation to the standard and is comparable to class TMonoAllocator.
As of today, ALib does not provide a compatibility layer to entities found in std::pmr. In future versions of ALib, it may be worthwhile to provide more specialized, built-in bridges to allow a tight integration with the std::pmr infrastructure. A foreseeable first bridging entity would be an implementation of the abstract interface type std::pmr::memory_resource using class MonoAllocator, and vice versa, an implementation of a template type along the lines of prototype lang::Allocator that internally uses a std::pmr::memory_resource.
If a user needs such layer between ALib and the C++ Library, such implementation should be possible with a few lines of code only. Let's quickly sample this:
We need to implement a class that derives from std::pmr::memory_resource and forwards memory allocation and de-allocation requests to ALib's allocator. Below is an implementation that demonstrates how to wrap ALib's MonoAllocator.
Here is a usage example of how to use the PmrMonoAllocator with a standard C++17 std::pmr::vector.
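A minimal sketch along these lines is shown below. Note that, lacking the ALib library in this context, a trivial internal bump buffer stands in for MonoAllocator; with ALib present, do_allocate() would instead forward to the wrapped MonoAllocator instance (the exact forwarding call is an assumption, not shown here).

```cpp
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <new>
#include <vector>

// Bridge into the std::pmr world. The internal bump buffer is a stand-in
// for ALib's MonoAllocator, used here only to keep the sketch self-contained.
class PmrMonoAllocator : public std::pmr::memory_resource {
    alignas(std::max_align_t) char buffer[8192];
    std::size_t used = 0;

    void* do_allocate(std::size_t bytes, std::size_t alignment) override {
        std::size_t aligned = (used + alignment - 1) & ~(alignment - 1);
        if (aligned + bytes > sizeof(buffer)) throw std::bad_alloc();
        used = aligned + bytes;
        return buffer + aligned;
    }
    void do_deallocate(void*, std::size_t, std::size_t) override {
        // Intentionally empty: monotonic allocation releases all memory at once.
    }
    bool do_is_equal(const std::pmr::memory_resource& other) const noexcept override {
        return this == &other;
    }
};
```

With this in place, a std::pmr::vector can be constructed with a pointer to a PmrMonoAllocator instance, and all of its allocations become monotonic.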
Notes:
- The PmrMonoAllocator::do_deallocate() method is left empty, as it is unnecessary in this context.
- The allocator can be used with other std::pmr container types as well (std::pmr::unordered_map, std::pmr::list, etc.).
Type PoolAllocator is the next allocator that module ALib Monomem introduces.
What you have just learned about the pool allocator should remind you of the previous chapters 1.5 Recycling and 4.2 Recycling Support. And yes, the PoolAllocator is nothing else but a proxy class in front of a MonoAllocator that simply recycles everything it allocates. As explained in the reference documentation, to reach this goal, an average of 25% allocation overhead is taken into account. Well, there is no free lunch in programming; it is always a tradeoff between speed and memory consumption. And speed is key here. The use of a pool allocator has almost no impact on the pure allocation speed. When used correctly (with the right use cases), the overall performance gain can be dramatic.
This allocator type was the reason to make parameters size and newSize of prototype methods Allocator::allocate and Allocator::reallocate in/out parameters. These methods expect a reference to the size parameter, and PoolAllocator is the one that in fact changes the value. This is because the requested memory size is rounded up to the next higher power of 2. Some types might make use of the extra space. A prominent sample is class AString, which in its pool allocator version is addressed via type definition AStringPA.
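The rounding itself can be sketched in a few lines. (This is an illustration of the principle, not ALib's code.)

```cpp
#include <cassert>
#include <cstddef>

// Round a requested size up to the next power of 2, as a pool allocator
// keeping one free-list per power-of-2 size class would do.
std::size_t roundUpPow2(std::size_t size) {
    std::size_t result = 1;
    while (result < size) result <<= 1;
    return result;
}
```

A request for 40 bytes hence yields a 64-byte piece; a type like AStringPA can turn the surplus 24 bytes into additional string capacity, which also hints at where the average 25% overhead mentioned above comes from.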
Like thousands of other string class implementations, class AString increases an internal buffer when things are appended to the string. For this, a new, larger buffer is requested, the current data is copied, and the old buffer is freed. The growth factor of C++ class std::string is not specified; the same is true for AString. However, with type definition AStringPA, which uses a PoolAllocator (more on this in a later chapter), any exceeding memory returned by the pool allocator will be added to the internal capacity member.
Apart from this, there is no other use case found in ALib (as of the time of writing), because most of the time distinct memory sizes are requested and oversized returns are of no use. Nevertheless, it is good to know and understand this feature.
The high-level allocation methods available through AllocatorInterface are missing this feature. They accept only values and will thus never increase the given sizes. This is a design decision: in case a change of the size should be recognized, the low-level methods allocate and reallocate have to be used.
Finally, if compiler symbol ALIB_DEBUG_ALLOCATIONS is set, then the acceptance of a size higher than requested has to be announced by calling method Allocator::dbgAcknowledgeIncreasedAllocSize. Otherwise, the internal debug-checks would (rightfully) report size-mismatch errors when an object is freed. More on this topic is given in chapter 11.4 Debugging.
Besides the core interface defined with Allocator and the available high-level interface AllocatorInterface, this class offers a specific allocation technique. This is constituted by the following methods:
As is quickly understandable, on deallocation the allocation size of the memory has to be known by this allocator. While other allocator types may just ignore this parameter, and function std::free
does not even expect a size to be passed back, this allocator produces undefined behavior if wrong values are given.
In some use-case scenarios, objects of different sizes are allocated, and at the point of freeing them, this size might not be known to the part of the software that frees the objects. In this case, one possibility is to store the allocation information with each allocated object. As you learned from the reference documentation, such storage can be achieved by reserving just 6 bits, respectively 5 bits on 32-bit systems.
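A sketch of this first option, under the assumption of power-of-2 allocation sizes (the helper names here are invented for illustration, not ALib API):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Because a pool allocator serves only power-of-2 sizes, the allocation
// size can be stored as its exponent: exponents 0..63 need just 6 bits
// on 64-bit systems (respectively 0..31, i.e., 5 bits, on 32-bit systems).
struct ObjectHeader {
    std::uint8_t sizeExp : 6;   // enough to reconstruct the allocation size
};

inline std::uint8_t encodeSize(std::size_t powerOf2Size) {
    std::uint8_t exp = 0;
    while ((std::size_t{1} << exp) < powerOf2Size) ++exp;
    return exp;
}

inline std::size_t decodeSize(std::uint8_t exp) {
    return std::size_t{1} << exp;
}
```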
Another option could be to equip allocated objects with a virtual method that returns their size.
A third way of coping with the issue of providing the allocation size when freeing memory is given in the next chapter.
std::free(void*)
does not have such a parameter.

This is a hands-on chapter providing a recipe for the safe and efficient use of class TPoolAllocator, along with a concrete implementation. Instead of introducing an artificial sample, we look at the internals of camp ALib Configuration, which provides access to external configuration variables with ALib.
The explanation is given as a step-by-step recipe:
Configuration
provides the method RegisterType<typename TVMeta>(). This method takes only a single template parameter and no run-time parameters. The template parameter has to specify a type that is derived from the virtual abstract base type VMeta.

This approach imposes only one restriction on the custom types that are usable with run-time variables offered by that ALib Module: the alignment of the types must not exceed what is specified as the fixed alignment used by the pool allocator. Class Configuration uses the default value of the template parameter TAlignment, which is defined by compiler symbol ALIB_MONOMEM_POOLALLOCATOR_DEFAULT_ALIGNMENT. If greater alignments are needed, the library and the code entities using it have to be compiled with this symbol specified. But common use cases of configuration variables do not require greater alignments for custom types.
On a final note, when you look at the ALox type FormatTimeDiff mentioned
above, you will see that this class has several members of type AStringPA. Consequently, this class needs a pool allocator at construction which is passed to each string member. This is, of course, again the pool allocator of the Configuration object. It is offered for custom use with methods VMeta::construct and VMeta::destruct.
We hope that walking you through this real-world sample has motivated you to consider pool allocation in future software designs. Compared to other allocation strategies, and especially compared to using the system heap, the performance is unrivalled. And the field of applications is vast.
Module ALib Monomem provides utility class TSharedMonoVal. This class has two features, which are described in the next sections.
Sets of objects that are eligible for monotonic allocation (see the introduction chapter) often
share their scope and life-cycle with a certain, rather prominent compound type.
As a sample for this, consider class FTree of module ALib Files: The type uses a StringTree, which is an extended form of a HashTable,
to collect information about files in a filesystem. Such a tree may contain many entries, and some
or all of such entries may even have some custom information attached. Altogether, a plethora of single allocations have to be performed when working with large FTree instances.
The lifecycle of each entry in the tree ends with the lifecycle of the tree itself.
While class FTree requires an allocator to be given on construction, which may be used for other allocations as well, in most use cases it makes perfect sense to use a dedicated allocator just for the tree and delete this allocator with the disposal of the tree.
Therefore, the first feature of class TSharedMonoVal is to combine a dedicated MonoAllocator with a custom type. In contrast to just creating a compound type by adding an internal allocator member, the approach is a little trickier: The custom type itself is placed inside the monotonic allocator. Furthermore - and this may take a moment to think about - the monotonic allocator itself is placed in this first buffer!
As a consequence, the "footprint" of class TSharedMonoVal (aka the result of sizeof(TSharedMonoVal<CustomType>)
) is always the size of a single pointer. On 64-bit systems the size is 8
bytes, on 32-bit systems it is 4
bytes. This single pointer points somewhere inside the first buffer of the monotonic allocator where both objects are placed.
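The layout idea can be illustrated with a few lines of plain C++ (hypothetical types, not the ALib implementation): the handle consists of nothing but a single pointer into a buffer that holds both the allocator state and the contained value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Sketch of the TSharedMonoVal layout idea: allocator state and the custom
// value live *inside* the first buffer, and the handle object itself is
// nothing but one pointer into it.
struct Inside {               // what is emplaced at the buffer's front
    char* bufferEnd;          // stand-in for the allocator's state
    int   value;              // stand-in for the custom type T
};

struct Handle {
    Inside* ptr = nullptr;    // the complete footprint of the handle

    static Handle create(std::size_t bufferSize, int v) {
        char* buffer = static_cast<char*>(std::malloc(bufferSize));
        Handle h;
        h.ptr = new (buffer) Inside{buffer + bufferSize, v};  // placement-new into the buffer
        return h;
    }
    void dispose() { std::free(ptr); ptr = nullptr; }         // ptr equals the buffer start
};

static_assert(sizeof(Handle) == sizeof(void*), "handle footprint is one pointer");
```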
Before the second fundamental feature of class TSharedMonoVal is discussed in the next section, a simple sample code should show how to use this class.
A dictionary for translating single words from one language into another should be implemented. The dictionary is nothing but a simple HashMap assigning an input string to an output string:
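As a rough standard-C++ analogy (using std::pmr containers, not the ALib types), such a monotonically allocated dictionary could be modeled like this:

```cpp
#include <cassert>
#include <memory_resource>
#include <string>
#include <unordered_map>

// Analogy only: std::pmr::monotonic_buffer_resource plays the role of the
// MonoAllocator, and the hash map's nodes are carved from its growing buffer.
struct Dictionary {
    using Map = std::pmr::unordered_map<std::pmr::string, std::pmr::string>;
    Map map;

    explicit Dictionary(std::pmr::memory_resource* allocator) : map(allocator) {}

    void learn(const char* src, const char* dst) { map.emplace(src, dst); }

    std::pmr::string translate(const char* src) {
        auto it = map.find(src);
        return it != map.end() ? it->second : std::pmr::string{};
    }
};
```

The ALib version differs in its interface, but the ownership idea is the same: the container receives an external allocator on construction and performs all its allocations through it.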
The HashMap is defined to use monotonic allocation, and thus the constructor of class Dictionary expects such an external allocator to be passed. This is how it would be created without using class TSharedMonoVal:
The recipe for moving this type into a TSharedMonoVal class is:
That is all that is needed and in the sample looks like this:
Now the instantiation of the type does not need an external MonoAllocator anymore:
To access the members of the original Dictionary class, the overloaded operator->() is to be used:
With type TSharedMonoVal, the functionality to reset a MonoAllocator is extended to destructing the contained object, resetting the allocator, and re-constructing the contained type. Consequently, this method expects the parameters necessary for the (re-)construction of the contained type:
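The three steps of such a reset can be sketched in plain C++ (with hypothetical Arena, Contained, and Shared types; this is not ALib code):

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Minimal bump arena standing in for the MonoAllocator.
struct Arena {
    alignas(std::max_align_t) char buffer[256];
    std::size_t used = 0;
    void* allocate(std::size_t size) { void* p = buffer + used; used += size; return p; }
    void  reset(std::size_t keep = 0) { used = keep; }
};

struct Contained { int id; explicit Contained(int i) : id(i) {} };

struct Shared {
    Arena      arena;
    Contained* value;
    explicit Shared(int i)
    : value(new (arena.allocate(sizeof(Contained))) Contained(i)) {}

    void Reset(int i) {                     // parameters for re-construction
        value->~Contained();                // 1) destruct the contained object
        arena.reset(sizeof(Contained));     // 2) reset the allocator (keep the object's slot)
        value = new (value) Contained(i);   // 3) re-construct the contained type in place
    }
};
```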
As elaborated in the previous sections, class TSharedMonoVal makes only one single (heap) allocation and packs the MonoAllocator as well as the custom type in the front of this first buffer. Furthermore, the only member that the type has is a pointer into its allocator's first buffer, and thus the type has a minimal footprint equalling sizeof(void*)
. Of course, this makes the type attractive to be copied to other places and shared between code entities.
Consequently, the type received a further feature which is very similar to what C++ standard library type std::shared_ptr
offers, respectively exactly the same what ALib type SharedVal offers.
Technically speaking, together with the MonoAllocator and the custom type T, an atomic reference counter is embedded in addition. Along with the provision of corresponding copy and move semantics, the type offers methods:
This explains why we equipped the sample type SharedDictionary
with a default constructor, which omits the parameter that determines the size of the MonoAllocator's initial buffer. If the class is default-constructed, nothing is done and the internal pointer is simply set to nullptr
. Secondly, we added a constructor taking nullptr
. This allows assigning nullptr
to values of that type.
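In standard-C++ terms, these semantics mirror those of std::shared_ptr (the Dict type below is merely a stand-in for the contained type):

```cpp
#include <cassert>
#include <memory>

// Stand-in for the contained type. SharedDict mirrors the described behavior:
// default- and nullptr-construction yield an empty handle, copies increment an
// atomic reference count, and assigning nullptr releases one reference.
struct Dict { int entries = 0; };
using SharedDict = std::shared_ptr<Dict>;
```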
The following further excerpt from the unit-tests demonstrates this and should be all that is needed for understanding how it works:
Across ALib, two schemes for naming the "inner" type and the type derived from TSharedMonoVal exist. The choice depends on whether the "inner" type is commonly used without being encapsulated in the outer type.
Module ALib Threads provides comfortable support for locking critical sections against race conditions in multithreaded software. In short:
For details consult that module's Programmer's Manual.
It is evident that shared values often need to be accompanied by a corresponding mutex. To provide and share an instance of such a lock right with the shared value itself, class TSharedMonoVal provides the template parameter TLock. While it defaults to void
, which indicates not to include a lock, it can be set to any of the six ALib lock types. If so, not only is a corresponding instance of a lock included, but furthermore the specific interface methods of the given lock type become available as corresponding interface methods.
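The effect can be sketched in plain C++ (a hypothetical type, with std::mutex standing in for an ALib lock): by re-exposing the embedded lock's interface, the handle itself becomes lockable, so RAII owner types can be applied to it directly.

```cpp
#include <cassert>
#include <mutex>

// Sketch only: the shared value embeds a lock and re-exposes its interface,
// so the handle itself satisfies "BasicLockable".
struct SharedValue {
    int        data = 0;
    std::mutex embeddedLock;

    void lock()   { embeddedLock.lock();   }   // re-exposed lock interface
    void unlock() { embeddedLock.unlock(); }
};
```

With this, `std::lock_guard<SharedValue> guard(value);` protects a critical section, analogous to applying ALib's Owner types or locking macros to a TSharedMonoVal instance.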
As a result of having the lock's interface methods directly invocable on class TSharedMonoVal, the Owner types and the set of locking macros can all be applied conveniently to instances of the class.
Here is a quick sample snippet that shows this with type TSharedFTree:
void
.

With debug-compilations and, furthermore, compiler symbol ALIB_DEBUG_CRITICAL_SECTIONS set, the integrated lock is automatically attached to the instance of DbgCriticalSections that is embedded in the underlying StringTree of the contained class FTree. With that, assertions are (often) raised in case the instance is used without prior acquisition.
If parts of the using code are protected otherwise and thus do not need the lock to be set in addition, the assertions can be switched on and off with method TSharedFTree::DbgCriticalSections.
The very same approach was taken with class TSharedConfiguration and its alias type SharedConfiguration.
Any allocator - at the end of the day - needs to make calls to the operating system to allocate "heap memory", because this is the one and only true memory source. Even class TLocalAllocator uses stack-memory only for its first buffer. Once this buffer is full, the next buffer has to come from the heap.
Now, instead of hard-coding the underlying memory source to be heap memory, the ALib allocation architecture allows what we call "allocator chaining". For this, each allocator itself receives a template parameter named TAllocator, which is exposed as a type definition ChainedAllocator, as with:
void
),

The allocators implemented with ALib allow access to the chained allocators through
the publicly inherited interface of type lang::AllocatorMember, namely methods
The type definition alib::MonoAllocator hard-codes template parameter TAllocator to type HeapAllocator:
TMonoAllocator<HeapAllocator>
Because this "chain" covers most use cases of monotonic allocation, you could name this the "natural choice".
A similar natural choice for the designers of this module was to set type MonoAllocator as the source for type definition PoolAllocator:
TPoolAllocator<MonoAllocator>
Resolving this shows that PoolAllocator in fact equals type:
TPoolAllocator<TMonoAllocator<HeapAllocator>>
The answer to the question of why this decision can be called a "natural choice" is easiest given by thinking about the alternative: Why should a pool allocator use heap memory directly? A pool allocator is a perfect recycler and as this allocator never frees memory until it is destructed, here recycling turns any use case into a monotonic allocation case.
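The chaining principle itself can be sketched in a few lines (hypothetical, heavily simplified types; buffer reuse, alignment, and growth policy are omitted):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

struct HeapAlloc {                               // the end of every chain: the heap
    void* allocate(std::size_t size) { return std::malloc(size); }
};

// Each allocator draws its memory from a chained allocator given as a
// template parameter, mirroring the described architecture.
template <typename TChained>
struct MonoAlloc {
    using ChainedAllocator = TChained;           // exposed like ALib's type definition
    TChained    chained;                         // this allocator's memory source
    char*       buffer   = nullptr;
    std::size_t used     = 0;
    std::size_t capacity = 0;

    void* allocate(std::size_t size) {
        if (used + size > capacity) {            // draw the next buffer from the chain
            capacity = size > 1024 ? size : 1024;
            buffer   = static_cast<char*>(chained.allocate(capacity));
            used     = 0;
        }
        void* result = buffer + used;
        used += size;
        return result;
    }
};

using Mono = MonoAlloc<HeapAlloc>;               // the "natural choice"
```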
Because a user still might wish to have a pool allocator chained directly to HeapAllocator, the library provides a second alias type with PoolAllocatorHA, which is defined as
TPoolAllocator<HeapAllocator>
Two chains that are not instantiated with the library, and thus would cause linker errors when used, are:
TMonoAllocator<TPoolAllocator<HeapAllocator>>
TMonoAllocator<TPoolAllocator<TMonoAllocator<HeapAllocator>>>
Still, these chains could be valuable in complex scenarios.
Another scenario where instantiations of other chains may come in handy is when a user of the library introduces an own type that implements the prototype Allocator. A user would do this with the intention of chaining it with ALib allocators (why else would he or she use this prototype?).
The ALib source file layout allows instantiating the built-in allocators for different chains. The explanation and a source-code sample that concretely shows how to instantiate type:
TMonoAllocator<TPoolAllocator<HeapAllocator>>
is given in the general ALib manual, because the recipe applies to other template types of ALib as well.
If interested, consult chapter A.5 T.HPP-Files And Custom Template Instantiations.
This chapter can be made brief: There is no built-in protection against racing conditions caused by parallel allocations invoked by different threads. Consequently, any mechanism of protection has to be performed by the using code.
With special compiler symbol ALIB_DEBUG_CRITICAL_SECTIONS set, which is enabled by default with the inclusion of module ALib Threads in the ALib Distribution, allocators will add code that checks the exclusive use of critical sections. If those checks raise an assertion, the using code needs to add locks to prevent racing conditions in multithreaded software.
While this is in alignment with all low-level ALib entities (and with the usual design of low-level libraries), programmers are very much used to the new/delete operators
, respectively to calls to std::malloc/free
, being thread-safe. This is why this quick chapter points out the fact that with ALib allocators, these calls are not thread-safe.
Besides using protection mechanisms, race conditions can be avoided by using different allocator instances in different threads. Thus, the design decision about which parts of a software share the same allocator, and which parts have an instance of their own, finally decides about the necessity of protecting the allocators from parallel access.
For the latter reason, this small manual section was placed together with the next section into one chapter.
In the case of multithreading, things become even more complex and a well-thought-out design is key. On the other hand, the performance benefits of using monotonic allocation may, for various reasons, become very significant in multithreaded applications.
An expected multithreaded use of a software entity can be a driver for both: for sharing an allocator as well as for creating dedicated allocators used by specific threads. It is use-case-dependent and cannot simply serve as a guideline for the decision.
The feature of taking snapshots of a MonoAllocator again increases the danger of undefined behavior caused by faulty code, but on the other hand can dramatically improve performance. Once more, the combination with multithreaded access further increases such danger and complexity.
Similar concerns apply to the built-in recycling features and class PoolAllocator.
An alternative to creating a local snapshot, which might impose the need to protect the code section against race conditions, might be the use of class LocalAllocator instead. This alternative should be kept in mind because of the potentially huge benefit that stack memory is already cached. Depending on the use case, however, the currently used buffer of a MonoAllocator might likewise already be cached on the hardware level.
std::malloc/free
, ALib allocators are not inherently thread-safe, and users need to manage synchronization themselves.

In the case that the life-span of an object set that is suitable for monotonic allocation equals the complete life-span of an application itself, the global allocator singleton GLOBAL_ALLOCATOR may be used for allocations.
Because this allocator is deemed to be shared across many compilation units, its use has to be protected against race conditions of multithreaded access by acquiring GLOBAL_ALLOCATOR_LOCK before use.
This is most conveniently done by using macro ALIB_LOCK_RECURSIVE_WITH. With single-threaded applications, the allocator's lock is not present and thus has to be ignored, which is automatically done with that macro.
With multithreaded applications, it has to be acknowledged that the global instance has to be locked whenever it is used. If during acquisition the global instance is, for example, passed to one of the container types, then mutex GLOBAL_ALLOCATOR_LOCK has to be acquired whenever this container type potentially allocates memory. This consequence can easily be overlooked. Also note that several ALib modules make use of this allocator instance also after bootstrapping, for example, when new instances of classes Thread and Lox are registered, or when class Path detects default directories for the first time.
For this reason, with debug-compilations, it is asserted that the global allocator is duly locked. Further details are given in chapter 1.4.3 Asserting Critical Section Locks of the Programmer's Manual of module ALib Threads.
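The locking discipline can be sketched as follows (hypothetical names; in real code, macro ALIB_LOCK_RECURSIVE_WITH provides the guard):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <mutex>

struct GlobalArena {                       // stand-in for the global MonoAllocator
    std::size_t allocations = 0;
    void* allocate(std::size_t size) { ++allocations; return std::malloc(size); }
};

GlobalArena          globalAllocator;      // shared across compilation units
std::recursive_mutex globalAllocatorLock;  // must be held while allocating

void* useGlobalAllocator(std::size_t size) {
    // the lock is held for the whole duration of the allocating operation,
    // including any container call that might allocate internally
    std::scoped_lock guard(globalAllocatorLock);
    return globalAllocator.allocate(size);
}
```

A recursive mutex is used so that an operation which already holds the lock may call other allocating operations without deadlocking.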
A second case for using the global allocator singleton is its combination with snapshots: sets of monotonically allocatable objects with a short life-cycle may use the allocator singleton and reset it to its previous state afterwards.
The advantage of doing so is that memory buffers allocated by the allocator during the "nested" monotonic allocation are reused on subsequent use. This further reduces the overall number of dynamic allocations.
Apart from this advantage, the disadvantage may be huge: In multithreaded applications, the global instance has to remain locked until it is reset to the snapshot. Furthermore, it has to be ensured that no other code invoked is using the global instance, as resetting the global instance would invalidate any allocation made since the snapshot was taken. Errors resulting from such behavior might be challenging to debug.
As a conclusion, this use pattern is not recommended unless a programmer knows exactly what she is doing and why.
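For completeness, the snapshot mechanics themselves are simple (a hypothetical Arena type, not ALib's MonoAllocator): a snapshot records the current fill level, and resetting to it discards everything allocated after it while keeping the buffers.

```cpp
#include <cassert>
#include <cstddef>

struct Arena {
    char        buffer[256];
    std::size_t used = 0;

    using Snapshot = std::size_t;          // a snapshot is just the fill level
    void*    allocate(std::size_t n) { void* p = buffer + used; used += n; return p; }
    Snapshot takeSnapshot() const    { return used; }
    void     reset(Snapshot s)       { used = s; }  // invalidates later allocations!
};
```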
The global instance is constructed using special constructor TMonoAllocator(const char*, std::nullptr_t), which does not allocate a first buffer and produces an invalid instance. At the beginning of the bootstrapping process method IsInitialized is used to check if the allocator was rightfully initialized already. If not, a placement-new is performed, passing an initial size of 128
kilobytes and a growth factor of 200
.
Now, if a user wants to change the initial size and/or the growth factor, all that needs to be done is to perform such placement-new before bootstrapping. The code could look as follows:
While it is not specified by the C++ language, as a prerequisite, let's list some alignments of types as used with common toolchains:
Type | Alignment on 64-bit systems | Alignment on 32-bit systems |
---|---|---|
char | 1 | 1 |
void* | 8 | 4 |
int64_t | 8 | 4 or 8 |
std::max_align_t | 16 | 8 |
Again, these values may differ with other compilers/systems, but already when comparing GCC with MSVC
under a 32-bit architecture, a difference can be noticed for type std::int64_t
.
Function std::malloc
does not take an alignment parameter and always aligns to alignof(std::max_align_t)
. If a higher alignment is required, a programmer needs to allocate a larger memory piece and shift the start of the used memory on their own responsibility.
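A sketch of this manual over-alignment (the helper name is invented for illustration):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <memory>

// std::malloc only guarantees alignof(std::max_align_t). For a higher
// alignment, over-allocate by the alignment and shift the start pointer
// with std::align. 'raw' receives the original pointer, which is what
// must later be passed to std::free.
inline void* allocAligned(std::size_t alignment, std::size_t size, void*& raw) {
    std::size_t space = size + alignment;          // extra room for the shift
    raw = std::malloc(space);
    void* p = raw;
    return std::align(alignment, size, p, space);  // bump p to the next boundary
}
```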
In contrast to this, the interface functions defined with ALib allocators, which are prototyped by (non-existing) type lang::Allocator, allow the alignment-specification of the allocated memory. But attention: This parameter is not necessarily respected and cannot be freely chosen in all cases.
Instead, the following rules apply:
std::malloc
and std::free
, ignores the parameter and always aligns to alignof(std::max_align_t)
.

Type TPoolAllocator ignores the parameter and uses the fixed alignment that is given with its template parameter TAlignment. This parameter defaults to compiler symbol ALIB_MONOMEM_POOLALLOCATOR_DEFAULT_ALIGNMENT, and this symbol in turn defaults to alignof(void*)
.
The type definition alib::PoolAllocator, which is used across other modules of ALib, uses this default value. This has some implications the moment a library (or a code unit that is not under the control of a programmer) fixes the alignment to a value that is too low to allocate a user's type. A good example is the restriction concerning the alignment of custom types that are to be stored in instances of class Variable provided by module ALib Configuration. (See chapter 5.2 Custom Variable Types of the Programmer's Manual of module ALib Configuration for more information.)
While the maximum value of template parameter TAlignment depends on the chained allocator,
for technical reasons, passing a value lower than alignof(void*)
to template parameter TAlignment is not allowed.
Whenever possible, static assertions are raised if illegal values are given. In debug-compilations ALib assertions are raised when unsupported alignments are requested.
String data is often assembled (e.g., using a Formatter), and the resulting character buffer sometimes needs to be allocated for later use. The simplest approach to this is method TString::Allocate (or the corresponding constructor TString(TAllocator& allocator, const TString<TChar>&)), which is templated and thus accepts any Allocator. Besides allocating the character array, these methods also copy the given source to it.
However, some attention has to be given to the fact that only class AString manages allocated buffers. In contrast, class String does not know - and thus does not care - whether explicit allocation was used. Therefore, on destruction, the memory has to be explicitly freed by the using software. Still, the use of type String can have performance benefits over AString.
ALib's string buffer class AString is only called "AString" everywhere for better readability. In fact it is a type definition providing both template parameters to the "real" string buffer type alib::strings::TString<TChar, TAllocator>. Besides choosing the standard character size, type definition AString chooses heap allocation.
With the inclusion of module ALib Monomem in the ALib Distribution, further type definitions for MonoAllocator and PoolAllocator become available. The following table gives an overview of the different combinations of character and allocator types. Note that the alias type definitions themselves are made in outer namespace alib.
The aliases associated with a MonoAllocator are declared in header alib/monomem/aliases/astringma.hpp, those associated with PoolAllocator in header alib/monomem/aliases/astringpa.hpp.
For cases where the initial buffer size is not known, but an average or maximum size is, an alternative is to emplace an instance of class LocalString inside monotonic memory. The effect of doing so is that the internal reserved buffer is monotonically allocated together with the object itself. If the buffer is exceeded, dynamic allocation is used.
It is good practice to use type LocalString (for example, String32) with container types. They can be used for attributes of the contained value just as well as for key-attributes (for example, in the case of HashTable
). Their fixed (templated) internal buffer size has to be chosen carefully, depending on the data that the strings are supposed to hold. The goal here is to find a good trade-off between minimizing the average allocation overhead and minimizing the number of occasional dynamic allocations caused by exceeding the buffer.
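In standard C++, the same trade-off can be mimicked with a stack buffer feeding a std::pmr::string (an analogy only; ALib's LocalString embeds its buffer directly in the string object):

```cpp
#include <cassert>
#include <memory_resource>
#include <string>

// Returns true if the string's character data ended up in the 64-byte
// stack buffer; longer content silently falls back to heap allocation
// (the upstream resource), just like an exceeded LocalString buffer.
inline bool fitsInLocalBuffer(const char* content) {
    char stackBuffer[64];
    std::pmr::monotonic_buffer_resource resource(stackBuffer, sizeof stackBuffer);
    std::pmr::string s(content, &resource);
    return s.data() >= stackBuffer && s.data() < stackBuffer + sizeof stackBuffer;
}
```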
Corresponding aliases using different character and allocator types are given:
The aliases associated with a MonoAllocator are declared in header alib/monomem/aliases/localstringma.hpp, those associated with PoolAllocator in header alib/monomem/aliases/localstringpa.hpp.
Furthermore, type definitions corresponding to String8, String16, String32..., are given in these headers with:
Class util::TStringVector is a utility type of module ALib Strings which is templated with an allocator.
With the inclusion of header alib/monomem/aliases/stringvector.hpp, the following type definitions become available: StringVectorMA, StringVectorPA, NStringVectorMA, NStringVectorPA, WStringVectorMA, and WStringVectorPA.
Class TBoxes constitutes a std::vector
of elements of type Box. The definitions of types BoxesMA and BoxesPA are made with the inclusion of header alib/boxing/boxing.hpp already. This is done via forward declarations of types MonoAllocator and PoolAllocator, even if module ALib Containers is not included in the
ALib Distribution.
With debug-compilations and symbol ALIB_DEBUG set, the following debug features become available:
true
is passed, the allocator will assert on further allocations. This is useful in more complex scenarios, for example, to assert that no other code entities perform allocations while a snapshot was taken and a Reset is scheduled.

With special compiler symbol ALIB_DEBUG_MONOMEM, the following features are activated:
With special compiler symbol ALIB_DEBUG_CONTAINERS, the following features are activated for class HashTable:
With special compiler symbol ALIB_DEBUG_ALLOCATIONS, the following features are activated:
Finally, with special compiler symbol ALIB_DEBUG_CRITICAL_SECTIONS set, which is enabled by default with the inclusion of module ALib Threads in the ALib Distribution, the following features are activated:
The set of allocators introduced here, together with the "recycling container types", provides a well-chosen balance of flexibility, simplicity, and performance for a broad range of use cases.
By leveraging chaining, recycling, and snapshotting, most performance-critical memory allocation scenarios can be handled, without needing to introduce additional, more complex allocation strategies.
If your project serves general-purpose C++ development, especially in environments where performance and control over memory allocation are critical (e.g., game development, real-time systems, server-side applications), then the offered types MonoAllocator, LocalAllocator, and PoolAllocator should be more than sufficient.
The fact that this Programmer's Manual needed quite lengthy introduction chapters and contains many general-knowledge sections explains why alternative allocation strategies are not often found in programming languages.
Even the C++ language received valuable and consistent types in that area only at its C++ 17 level.
From our perspective, the mechanisms that these two combined ALib Modules provide,
It remains to note: if the allocation models are chosen rightfully, the performance gain can be enormous for many applications.
"Bigger" software, like databases, web-servers, CAD systems, etc., have all, in their development evolution, come to the point of using this concept! Due to the long-term lack of standards, mostly by implementing their own tools and strategies.