This module's name "monomem" stands for "monotonic memory". The term monotonic here relates to the way that memory allocation is performed: at a certain point in execution time, a set of allocations is performed, all of which are deleted at the same future point in time. The memory used for allocation is "monotonically growing" during a defined time span.
This is the "strict" definition. In this introduction we will see how "weaker" definitions lead to a wide range of possible use-cases.
Two "built-in" sorts of memory allocation are available with the C++ language:

- Local allocation: Objects are allocated on the stack as "automatic" variables and are de-allocated automatically when the enclosing code block is left.
- Dynamic allocation: Objects are allocated on the heap using keyword new and de-allocated with delete. Internally, std::malloc and std::free are used, which have very complex (and smart) implementations in place. In short, these functions usually interface with the underlying operating system to retrieve bigger memory chunks, but provide an own "management layer" to efficiently organize allocated and de-allocated fragments.

While the use of local allocation is very restricted (namely to local variables), dynamic allocation is the opposite: it provides the maximum possible freedom. But this degree of freedom comes at a high cost due to the necessary management overhead and other factors discussed in the next quick chapters.
To be motivated to use monotonic allocation, the following is important to understand (and to believe):
Besides these two, other memory allocation models are available with 3rd-party libraries. One fundamental model of those is provided with this ALib Module.
Monotonic allocation builds on dynamic allocation but avoids memory fragmentation and thus reduces the performance bottleneck caused by a fragmented heap.
Just as a programmer has to identify and decide whether an object can be allocated locally as a fast automatic variable, or needs to be allocated on the heap using new, the possibility of using monotonic allocation has to be identified and decided by the programmer.
Due to the name "local variable", the decision between automatic and dynamic allocation seems to be bound to the question of whether an object is used only locally in a method or function, or if it is shared between different code entities. But this is not the case: a local variable may well be passed to arbitrary functions invoked by the code block that declares the variable.
Instead, the decision concerns an object's life cycle: if an object can be deleted at the end of the code block it was declared in, then an automatic variable is to be used. If, otherwise, the object has to survive the execution of that code block, dynamic allocation has to be used.
The decision about the potential use of monotonic allocation is also bound to observing life cycles. But this time, it is not about the life cycle of one single object, but instead about the life cycle of a set of objects.
The decision rule is as follows:
With this definition, the implementation of monotonic allocation can be easily specified:
The objects of such a set are never deleted using keyword delete. Instead, the objects' destructor methods are "manually" invoked.

The performance advantages can be tremendous, not only because of what was already mentioned:
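The implementation technique can be sketched as follows, under the simplifying assumption of a single pre-allocated chunk (all names are made up; this is not the ALib implementation): objects are "emplaced" into the chunk with placement-new, destructors are invoked manually, and the chunk is given back to the heap in one single operation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <new>
#include <string>
#include <utility>

// Sketch of strict monotonic allocation: one chunk, many objects, one free.
struct ChunkArena
{
    char*       chunk;
    std::size_t size;
    std::size_t used = 0;

    explicit ChunkArena( std::size_t s )
    : chunk( static_cast<char*>( std::malloc(s) ) ), size(s) {}

    ~ChunkArena() { std::free( chunk ); }  // one de-allocation for the whole set

    template<typename T, typename... Args>
    T* Emplace( Args&&... args )
    {
        std::size_t space = size - used;
        void*       pos   = chunk + used;
        pos = std::align( alignof(T), sizeof(T), pos, space );  // align for T
        assert( pos != nullptr );          // sketch: no chunk growth implemented
        used = (size - space) + sizeof(T);
        return new (pos) T( std::forward<Args>(args)... );      // placement-new: no heap call
    }
};
```

Note that the emplaced objects are never passed to delete; their destructors are invoked explicitly (for example with std::destroy_at) before the arena itself is destructed.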
But there are further reasons for performance gains:

- Thread safety: new and delete are allowed to be used from different threads without taking own precautions against race conditions, because such precautions are taken internally - which imposes a further performance penalty that monotonic allocation avoids.
- On the hardware level: It is likely that the different access operations to the memory of monotonically allocated objects are performed in code sequences that are executed in timely relation, respectively in code sequences that are executed in series. This way, the hardware memory cache will be reloaded less often, as all objects tend to lie side-by-side in the same memory chunk.
An easily understandable sample use case for monotonic allocation is found with module ALib Expressions. There, this technique is used twice:
The second use of monotonic allocation is even more important due to the fact that an expression is compiled only once, but may be evaluated many times against different scopes. Now, with the first evaluation, chunks of monotonic memory are allocated, but after the evaluation is done, they are not freed! While the intermediate objects are duly destructed, the chunks already allocated are kept for the next run.
This way, starting with the second evaluation of an expression, not a single dynamic memory allocation is performed!
Another prominent sample is found in server software that offers a certain service. Let's consider a simple http-server. Usual implementations maintain a pool of threads, and with each incoming request, a next free thread is chosen to process the request. Now, all memory needed during processing can be considered monotonically allocatable. And you can bet that all prominent implementations of http-servers use this concept, as introduced here.
The definition given above stated that those objects that share the same life cycle may be considered a set and be monotonically allocated. This definition is quite strict and therefore considerably narrows the use cases.
There are several ways to weaken the requirements. The following "thought experiment" may easily demonstrate this: Consider a simple command-line calculator software, as demonstrated with module Expressions, or any other "simple" software that accepts an input, creates an output, and exits. What would happen if each and every dynamic allocation were turned into a monotonic allocation by extending the life cycle of all allocations to the overall life cycle of the software process?
The answer is: no problem! The resulting executable would potentially show slightly better performance, while it would most probably allocate a higher total amount of dynamic memory during its run. But because it is known that the total amount of allocations is very limited - considering the simplicity of the task - it is assured that no situation of memory exhaustion occurs - which is exactly the danger when the strictness of the definition of monotonic allocation is weakened.
With this thought experiment, we can "weaken" the definition and extend the use cases in the following ways:
The technical implementation of weak monotonic allocation is the same as that of strict monotonic allocation, and for this reason, this ALib Module does not provide any specific interface to support such an approach. Moreover, it is not transparent to the module's code entities whether monotonic allocation is strict or weak.
Nevertheless, if the increase of the peak memory consumption is guaranteed to be limited, then the performance gain of monotonic allocation usually goes far beyond the performance loss that might be caused (in the whole system!) by a higher peak memory consumption.
The takeaway from the previous section was that weakening the requirements to perform monotonic allocation might immensely broaden its use cases and with that the potential to optimize the performance of a software.
Now, let us construct a sample that - at first sight - is not a candidate for optimization, and let us think of ways to turn it into one. Consider the following scenario:
A typical implementation in C++ would use a std::unordered_map (aka hash table) to store the session objects. With each insertion, the std::unordered_map allocates an internal "node" object, and with each removal, such a node object is deleted.
Together with dynamic memory allocations performed by the server in other places, over time a reasonable fragmentation of the heap memory will occur. The internal node objects allocated by the hash table at a certain point in time, will be spread (fragmented) across the whole area of dynamic memory allocated from the operating system.
There is a very simple way to mitigate this problem: instead of de-allocating a node object when a session is removed from the storage, such a node object is only destructed, while its allocated memory is pushed to the front of a simple linked list of "recyclable nodes". With the next insertion of a new user session, it is checked whether a recyclable node is available, and only if not, a new node is allocated.
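The described recycling scheme can be sketched as follows (hypothetical names; this is not the ALib implementation): erased nodes are only destructed, and their memory is pushed to a linked "stack" of recyclables instead of being freed.

```cpp
#include <cassert>
#include <cstdlib>
#include <memory>
#include <new>
#include <string>
#include <utility>

struct SessionNode
{
    std::string  userName;   // the payload: a (simplified) user session
    SessionNode* next;       // used to link recyclable nodes
};

struct NodeRecycler
{
    SessionNode* stack           = nullptr; // linked list of recyclable nodes
    int          heapAllocations = 0;       // counts true heap allocations (demo only)

    SessionNode* Create( std::string name )
    {
        void* mem;
        if( stack )                                    // recyclable node available?
        {
            mem   = stack;
            stack = stack->next;                       // pop: reuse memory, no heap call
        }
        else
        {
            mem = std::malloc( sizeof(SessionNode) );  // only now: a true allocation
            ++heapAllocations;
        }
        return new (mem) SessionNode{ std::move(name), nullptr };
    }

    void Recycle( SessionNode* node )
    {
        std::destroy_at( &node->userName );            // destruct the payload only...
        node->next = stack;                            // ...the memory is kept and
        stack      = node;                             // pushed to the recycling stack
    }
};
```

After the first "peak" of parallel sessions, insertions and removals can alternate indefinitely without any further heap activity.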
The consequences of this approach are:
And the best of all:
Note that with "recycling" and monotonic allocation in place, the potential cache-misses on hardware level are completely mitigated, if only the first chunk of memory allocated is large enough to store all nodes needed to handle the peak of parallel user sessions.
Besides the fact that the use of container types can often be turned into a use case of monotonic allocation, the more general takeaway of this final introductory chapter is:
The central type of this ALib Module is class MonoAllocator. Its interface allows allocating memory of a given size and alignment, emplacing objects, and allocating and initializing array types.
Internally bigger chunks of memory are requested from the heap and used for the provision of the data.
The memory chunks that an instance of class MonoAllocator allocates on the heap are stored in a single-linked list and only deleted with the deletion of the allocator instance. This approach allows to offer method MonoAllocator::Reset. Before the invocation of this method, all monotonically allocated objects have to be destructed (by the using software). After that, the allocator can be reused, which means that the previously allocated chunks are reused as well! Only in the case that more memory is needed than in a prior "run", a new chunk is requested from the heap.
The simple concept of resetting an allocator for later reuse is internally realized by a more sophisticated mechanism: class MonoAllocator does not only allow to be reset to its initial, empty state, but also allows creating "snapshots" of its current state of use and later resetting back to exactly this state.
A snapshot is taken with MonoAllocator::TakeSnapshot . A previously taken snapshot may be passed to method MonoAllocator::Reset which replaces its default argument that completely resets an allocator.
If several snapshots are taken, those are "nested": A later snapshot resides in the prior one. Consequently, if an allocator is reset to an "older" snapshot, all subsequent snapshots automatically become invalid. In debug-compilations, an ALib assertion is raised if an invalid snapshot is passed to method Reset.
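The snapshot mechanism can be sketched with a simplified, single-chunk arena (hypothetical names; the real class MonoAllocator manages a list of chunks): a snapshot merely records the current fill state, and a reset restores it.

```cpp
#include <cassert>
#include <cstddef>

// Sketch of "snapshot and reset" on a trivial bump-pointer arena.
struct Arena
{
    char        buffer[1024];
    std::size_t used = 0;

    struct Snapshot { std::size_t used; };

    Snapshot TakeSnapshot() const              { return Snapshot{ used }; }
    void     Reset( Snapshot s = Snapshot{0} ) { used = s.used; }  // default: full reset

    void* Alloc( std::size_t size )
    {
        void* result = buffer + used;   // sketch: ignores alignment and overflow
        used += size;
        return result;
    }
};
```

A later snapshot always "nests" inside an earlier one, because it records a larger fill state; resetting to the earlier snapshot therefore implicitly invalidates the later one.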
This concept allows further reducing heap allocations by splitting a set of monotonic allocations into several, nested ones. Nested snapshots and resets slightly extend the definition of monotonic allocation as given in the introductory chapters. They are especially useful in situations where weak monotonic allocation is nested in a stricter use case, because this has the potential to mitigate peaks of overall memory usage.
While the exact conditions of use are not easy to precisely define, during the course of implementing monotonic allocation, a programmer will probably identify use cases for nested allocations quite naturally.
This module provides implementations of container types that are very similar to types found in the C++ standard library. The provided types are considered alternatives for the standard types to be used with monotonic allocation.
Only types that internally allocate node objects are replaced. Standard containers that do not allocate node objects are std::vector and std::deque. No alternatives are given for these two types. This is true for the current release of the library, and it is foreseeable that no alternatives will be given in future releases either.

Types std::forward_list, std::map and std::set - which internally create node objects - are missing in this module. They will be provided in a future update. (We're looking for volunteers!)

The rationale for the existence of the alternatives and the differences to the standard types are given in this and the next chapter 4. C++ Library Container Types.
The following container types are provided with module ALib Memory:

- An alternative to std::list.
- Alternatives to std::unordered_set and std::unordered_multiset.
- Alternatives to std::unordered_map and std::unordered_multimap.

All container types provided accept an external MonoAllocator with construction, which they use for allocations.
Besides monotonic allocation, the types support recycling of internal node objects, as explained in the next chapter.
The concept of "node recycling", as it was introduced in chapter 1.5 Recycling, is supported by all built-in container types. For that, each type specifies a template parameter named TRecycling which has to be set to one of the following three types:

- Recycling::None switches recycling off and may be used with use cases of strict monotonic allocation. Its advantage is that the (negligible) overhead of collecting destructed nodes and reusing them with insertions is omitted.
- The default value Recycling::Private establishes an internal stack implementation (a simple pointer to the first recyclable node); all recycling takes place internally.
- Recycling::Shared uses an external recycler object that may be shared between several container instances.
The next chapter introduces details of shared recycling.
The concept of shared recycling aims to reduce the overall peak size of allocated memory from being the sum of each container's peak size (in the case of private recycling) to reflecting the true peak of all concurrently allocated nodes.
But this slight optimization is not the main reason to offer shared recycling. More importantly, use cases exist which cannot be made suitable for monotonic allocation if only private recycling were available.
As a sample, let us have a look at class InMemoryPlugin of module ALib Configuration. This type internally stores a List of "categories". Each category in turn contains a List of named "entries", and the using software is free to create and delete arbitrary categories and arbitrary entries within such categories during the life cycle of the object. The problem arises if a complete category is deleted: while the object for the category itself is recycled with the creation of the next category, the list of entries of such a recycled category was destructed on removal. Now, if private recycling is used with the list of entries, the recycled entries are "lost" with that destruction. Such loss of objects makes this use case unsuitable for monotonic allocation, because an infinite number of insertions and deletions of categories might occur.
The way out of this is to share the stack of recyclable elements among different containers.
In class InMemoryPlugin, this is activated for the list of entries found in each category. With that, each entry list uses the same "external" recycler instance and when a list is destructed, the recycled elements are not lost.
As already noted above, this approach does not only resolve the original problem, it further optimizes the overall memory use: As soon as an entry is erased, the next insertion will recycle this entry, no matter which category is concerned.
Shared recycling is activated by setting tag-type Recycling::Shared for template parameter TRecycling of the corresponding container instance. When doing so, the container instance requires an additional constructor parameter which provides the external, shared recycler object. The type of the recycler that has to be created and passed is determined by a type definition named TSharedRecycler that each container type offers (for example List::TSharedRecycler or HashTable::TSharedRecycler).
The object passed has to survive the life-span of each container instance that uses it.
Here is a quick sample taken from the unit tests that demonstrates the use of shared recycling:
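Independently of the concrete ALib interface, the principle can be illustrated with a self-contained sketch (all names hypothetical): two containers receive the same external recycler at construction, and nodes erased from one container can be reused by the other.

```cpp
#include <cassert>
#include <cstdlib>
#include <new>

struct Node { int value; Node* next; };

struct SharedRecycler                 // the external, shared recycler object
{
    Node* stack           = nullptr;
    int   heapAllocations = 0;
};

struct TinyList                       // a minimal list using an external recycler
{
    SharedRecycler& recycler;         // passed with construction; must outlive the list
    Node*           head = nullptr;

    explicit TinyList( SharedRecycler& r ) : recycler(r) {}

    void PushFront( int v )
    {
        Node* n;
        if( recycler.stack )          // reuse a node recycled by ANY sharing container
        {
            n              = recycler.stack;
            recycler.stack = n->next;
        }
        else
        {
            n = static_cast<Node*>( std::malloc( sizeof(Node) ) );
            ++recycler.heapAllocations;
        }
        n->value = v;  n->next = head;  head = n;
    }

    void PopFront()                   // erased nodes go back to the shared recycler
    {
        Node* n = head;  head = n->next;
        n->next        = recycler.stack;
        recycler.stack = n;
    }
};
```

With this setup, destructing one list (after recycling its nodes) does not "lose" the nodes for the other lists sharing the recycler.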
The implementation of class HashTable allocates so-called "bucket arrays" besides node objects. If, with the insertion of a new element, the maximum load factor of the hash-table is reached, a new, bigger bucket array is allocated and the existing nodes are "re-hashed". Then, the former bucket array becomes useless and hence would be de-allocated if dynamic memory allocation was used.
In general, this is not a problematic situation, because the growth continues only up to a certain limit, which does not violate the concept of weak monotonic allocation. Nevertheless, if recycling (private or shared) is enabled, class HashTable passes this non-node piece of memory to its recycler object: there, the array is sliced into as many (duly aligned) pieces of the internal node size as possible, which are then pushed to the recycling stack.
The concept of reserving container capacity - if the future size of a container is known or at least assessable - is extended by this ALib Module's container types to optionally include allocating a corresponding amount of recyclables as well. For this, interface methods RecyclablesCount and ReserveRecyclables (for example List::ReserveRecyclables or HashTable::RecyclablesCount) are provided.
Note that their use can reasonably improve execution performance, due to the caching levels of modern processor hardware. More information on this was given in chapter 1.5 Recycling.
Furthermore, reserving capacity is useful with the use of (nested) snapshots: during the execution of an "inner" monotonic allocation sequence, which will be reset in the near future, no insertions of elements into containers that belong to the "outer" sequence may be done - unless it is assured that such insertions do not allocate node objects, because those would become invalid with the reset! Reserving capacity, including recyclables, is a way to mitigate this common source of "undefined behaviour" when using nested monotonic allocations.
While the C++ standard container types can be used with this module's allocator, some limitations apply with respect to "node recycling".
For all use cases that constitute monotonic allocation without the need of recycling internal container node objects, template types that satisfy the std::allocator requirements are provided with StdContMA and StdContMAOptional. Both can be used with any container type found in namespace std. To use them, an understanding of the general concept of std::allocator types is helpful.
As most ALib types do, these types also dispose of an alias type definition in namespace alib.
A standard container is declared by providing the allocator types alib::StdContMA<T> or alib::StdContMAOptional<T> for the allocator template typename with repeating the contained object's type:
With the definition of an instance of a container, an instance of the allocator type has to be passed to the constructor of the container. This often requires a pair of extra brackets, as otherwise the compiler might consider the construction of a temporary of type StdContMA<T> to be a function declaration:
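For illustration, the following is a minimal std::allocator-compatible type that forwards to a simple monotonic buffer (all names hypothetical; ALib's StdContMA wraps class MonoAllocator instead). The usage in the test below also shows the extra pair of brackets around the temporary allocator object.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct MonoBuffer
{
    alignas(alignof(std::max_align_t)) char mem[4096];
    std::size_t used = 0;
};

// Minimal allocator satisfying the std::allocator requirements; std::allocator_traits
// fills in all remaining members. Sketch only: no overflow checks, no chunk growth.
template<typename T>
struct MonoStdAllocator
{
    using value_type = T;
    MonoBuffer* buffer;

    explicit MonoStdAllocator( MonoBuffer& b ) : buffer(&b) {}

    template<typename U>                                  // rebind-style conversion
    MonoStdAllocator( const MonoStdAllocator<U>& other ) : buffer(other.buffer) {}

    T* allocate( std::size_t n )
    {
        // round the fill position up to T's alignment, then bump it
        buffer->used = (buffer->used + alignof(T) - 1) / alignof(T) * alignof(T);
        T* result    = reinterpret_cast<T*>( buffer->mem + buffer->used );
        buffer->used += n * sizeof(T);
        return result;
    }

    void deallocate( T*, std::size_t ) {}  // monotonic: individual frees are no-ops
};

template<typename T, typename U>
bool operator==( const MonoStdAllocator<T>& l, const MonoStdAllocator<U>& r )
{ return l.buffer == r.buffer; }
template<typename T, typename U>
bool operator!=( const MonoStdAllocator<T>& l, const MonoStdAllocator<U>& r )
{ return !(l == r); }
```

Without the extra brackets in `std::vector<int, MonoStdAllocator<int>> v( (MonoStdAllocator<int>( buf )) );`, the declaration would be parsed as a function declaration ("most vexing parse").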
If an allocator that is used by a standard container is reset, the container object itself has to be reset as well. Because the standard containers do not support such a reset, the simple way out is to:
The documentation of class StdContMA provides a source code sample that demonstrates this.
The challenge of implementing recycling for the C++ standard container types lies in the fact that the standard specification does not cover a container's implementation but only its interface. Consequently, the internal node type is not determined. Even worse, it is not even determined if the type allocates and de-allocates single node objects at all, or if any other implementation strategy is followed.
Nevertheless, this ALib Module provides type StdContMARecycling which performs recycling. The problem is: It is not guaranteed to work on each platform/toolset!
The type internally uses util::RTTRAllocator , where RTTR stands for run-time type recycler. This means: The type of recyclable node objects is only detected at run-time. If such detection fails, in debug-compilations different ALib warnings are raised.
In practice, the type works with all implementations of the standard library known to the authors of ALib. Nevertheless, be warned that switching the library, platform, or tools of a compilation's target imposes the need to test the new tool chain.
For this reason it is generally recommended to use the container types this module provides instead of using the standard types.
Being a platform-agnostic library, ALib of course does not use type StdContMARecycling anywhere internally.
With C++ language version 17, the standard library was extended by namespace std::pmr, which stands for "polymorphic memory resource". Here, the standard primarily aims to allow two containers that use different types of memory allocation to be of the same type.
Included in this namespace is class std::pmr::monotonic_buffer_resource which introduces monotonic memory allocation to the standard and is comparable to class MonoAllocator .
As of today, ALib does not provide a compatibility layer to the entities found in std::pmr. A foreseeable first bridging entity would be an implementation of abstract interface type std::pmr::memory_resource using class MonoAllocator. For now, such an implementation can be done by a user of both ALib and std::pmr with only a few lines of code.
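Such a bridging implementation might look as follows; here, a trivial bump-pointer arena stands in for class MonoAllocator (a sketch under that assumption, with hypothetical names):

```cpp
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <vector>

// Stand-in for a monotonic allocator: a single fixed chunk, no individual frees.
struct BumpArena
{
    alignas(alignof(std::max_align_t)) char buffer[4096];
    std::size_t used = 0;

    void* Alloc( std::size_t bytes, std::size_t alignment )
    {
        used = (used + alignment - 1) / alignment * alignment;  // align fill position
        void* result = buffer + used;                           // sketch: no overflow check
        used += bytes;
        return result;
    }
};

// The "few lines of code": an implementation of std::pmr::memory_resource
// that forwards allocations to the monotonic arena.
class MonoMemoryResource : public std::pmr::memory_resource
{
    BumpArena& arena;
  public:
    explicit MonoMemoryResource( BumpArena& a ) : arena(a) {}
  private:
    void* do_allocate( std::size_t bytes, std::size_t alignment ) override
    { return arena.Alloc( bytes, alignment ); }

    void  do_deallocate( void*, std::size_t, std::size_t ) override
    { /* monotonic allocation: de-allocation is a no-op */ }

    bool  do_is_equal( const std::pmr::memory_resource& other ) const noexcept override
    { return this == &other; }
};
```

Any std::pmr container can then be pointed at this resource via its constructor.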
It is possible to emplace an instance of type MonoAllocator in the first chunk of memory it allocates. To achieve this, static method MonoAllocator::Create is provided, which returns a pointer to the allocator.
Allocators created with this technique must not be destructed using keyword delete; instead, their destructor must be invoked "manually".
The following code snippet demonstrates this:
Note that monomem::Destruct is a templated inline namespace utility function that simply calls the destructor of the object given. In theory, any object could be passed, regardless of monotonic allocation. The function is just given for better readability.
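The underlying technique can be sketched with a simplified stand-in for class MonoAllocator (hypothetical names): the allocator object is placed at the start of the first chunk it allocates, and destruction happens via an explicit destructor call, never via delete.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

struct EmbeddedAllocator
{
    char*       chunk;     // the first (and here: only) chunk
    std::size_t used;

    static EmbeddedAllocator* Create( std::size_t chunkSize )
    {
        char* chunk = static_cast<char*>( std::malloc( chunkSize ) );
        auto* self  = new (chunk) EmbeddedAllocator();  // emplace allocator in its own chunk
        self->chunk = chunk;
        self->used  = sizeof(EmbeddedAllocator);
        return self;
    }

    void* Alloc( std::size_t bytes )
    {
        void* result = chunk + used;   // sketch: ignores alignment and overflow
        used += bytes;
        return result;
    }

    ~EmbeddedAllocator() { std::free( chunk ); }  // frees the memory it lives in
};
```

Because the object was never allocated with new, using delete on it would be undefined behaviour; only the explicit destructor call (or a helper like monomem::Destruct) is correct.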
The concept of placing a monotonic allocator inside its first chunk of memory is extended to furthermore placing any custom type inside that first chunk and making the allocator a field of that custom type.
Let us look at a sample. A dictionary that maps words to other words should be implemented. First we need to define the data members needed, which is just one hash table:
Then we can define the custom type, which inherits class SelfContained . The struct that defines the custom fields is given as a template parameter to this base class.
When implementing a method, keyword this has to be replaced by a call to method SelfContained::Self.
The sample class then looks like this:
Note that the hash map that implements the dictionary of course uses the very same monotonic allocator that class SelfContained creates for itself. However, as the allocator cannot be accessed at construction time of struct FieldsOfDictionary, the constructor of class Dictionary catches up on providing it.
If an object of type Dictionary is constructed, all is internally set up and cleaned on destruction:
While it may not be very reasonable to make the effort of emplacing a type that implements a dictionary inside the allocator that it uses, on other occasions this surely is. An advantage of this concept is that the footprint of the resulting type has just the size of one pointer, and this way instances can be moved to other places very easily.
The standard throwable type of ALib , class Exception , is a prominent sample.
There is no built-in protection against race conditions caused by parallel allocations invoked by different threads. Any mechanism of protection has to be performed by the using code. This is in contrast to dynamic allocation, where invocations of the new operator or calls to std::malloc are protected. This is why this quick chapter hints at that fact.

In the case that the life-span of a set of objects suitable for monotonic allocation equals the complete life-span of the software process itself, the global allocator singleton GlobalAllocator may be used for allocations.
As this allocator is deemed to be shared across many compilation units, its use has to be protected against racing conditions of multi-threaded access by acquiring GlobalAllocatorLock before use.
The standard pattern of use is to gain access to the global allocator using namespace function AcquireGlobalAllocator and after its use to release it with sibling function ReleaseGlobalAllocator .
With single-threaded applications, the allocator's lock may be ignored and instead the global allocator instance might be accessed and used directly.
With multi-threaded applications, it has to be acknowledged that the global instance has to be locked whenever it is used. If, during acquisition, the global instance is for example passed to one of the container types, then whenever this container type potentially allocates memory, mutex GlobalAllocatorLock has to be acquired. This can easily be overlooked. For this reason, the interface of class MonoAllocator asserts in debug-compilations that either the current instance (aka the this pointer) does not equal the object GlobalAllocator, or that it is duly locked.
A second case for using the global allocator singleton is its use in combination with snapshots: sets of monotonically allocatable objects with a short life cycle may use the allocator singleton and reset it to its previous state afterwards.
The advantage of doing so is that memory chunks allocated by the allocator during the "nested" monotonic allocation are reused with the next use. This further reduces the overall amount of dynamic allocations.
Apart from this advantage, the disadvantage may be huge: In multi-threaded applications, the global instance has to remain locked until it is reset to the snapshot. Furthermore, it has to be assured that no other code invoked is using the global instance, as resetting the global instance would invalidate any allocation made since the snapshot was taken. Errors resulting from such behaviour might be difficult to debug.
In conclusion, this use pattern is not recommended, unless a programmer knows exactly what she is doing, and why.
To monotonically allocate copies of constant strings, convenience method MonoAllocator::EmplaceString is given.
With type MAString, a buffered string type is available. It specializes ALib class AString and uses monotonic allocation for the initial memory needed to copy the parameter given with construction. If, with later use, the initially reserved character array is exceeded, the type falls back to dynamic allocation.
For cases where the initial buffer size is not known, but an average or maximum size is, an alternative to type MAString is to emplace an instance of class LocalString inside monotonic memory. The effect of doing so is that the internal reserved buffer is monotonically allocated together with the object itself. If the buffer is exceeded, dynamic allocation is used.
It is good practice to use type LocalString (for example String32) with container types. Such strings can be used for attributes of the contained value as well as for key-attributes (for example in the case of HashTable). Their fixed (templated) internal buffer size has to be chosen carefully, depending on the data that the strings are supposed to hold. The goal is to find a good trade-off between minimizing the average allocation overhead and minimizing the number of occasional dynamic allocations caused by exceeding the buffer.
Note that no string type is given that stores the monotonic allocator for use with future capacity increases. Not providing such a type is a design decision.
With debug compilations and symbol ALIB_DEBUG set, common assertions are activated in the sources of this module.
With special compilation symbol ALIB_DEBUG_MONOMEM, the following features are activated:
Among them, for example, an ALib warning is raised in the case that more than 15 chunks have been allocated.