Jun 4, 2018

[C++][cppcon 2018][note] "C++17's std::pmr Comes With a Cost" - David Sankel



What is std::pmr trying to solve?
  • Allocation is slow
  • Fragmentation hurts performance
  • The pre-C++17 allocator model is overly complicated
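
To make the first two points concrete, here is a minimal sketch (my own, not from the talk) of a std::pmr::vector that draws from a stack buffer through std::pmr::monotonic_buffer_resource. The function name sum_with_stack_buffer and the 1024-byte buffer size are assumptions chosen for illustration. Passing std::pmr::null_memory_resource() as the upstream means any spill to the heap would throw std::bad_alloc instead of silently allocating.

```cpp
#include <array>
#include <cstddef>
#include <memory_resource>
#include <numeric>
#include <vector>

// Hypothetical helper: sums 1..10 using only stack memory for the vector.
int sum_with_stack_buffer() {
    std::array<std::byte, 1024> buffer;  // assumed size, large enough for this demo

    // Bump allocator over the buffer; the null upstream forbids heap fallback.
    std::pmr::monotonic_buffer_resource arena{
        buffer.data(), buffer.size(), std::pmr::null_memory_resource()};

    std::pmr::vector<int> ints{&arena};  // the allocator is fixed at construction
    ints.reserve(16);                    // one allocation, served from the buffer
    for (int i = 1; i <= 10; ++i) {
        ints.push_back(i);
    }
    return std::accumulate(ints.begin(), ints.end(), 0);
}
```

Note that the vector is declared after the arena, so it is destroyed first and never touches a dead resource.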

Speed
The memory_resource should have global lifetime.
When passing a pointer into a function (including a constructor), always consider the lifetime of the object it points to.


Ok, here's the tricky part.
Consider the code below, presented by David Sankel:
int main() {
    // Loggingresource is a user-defined memory_resource (its definition is not shown here).
    static Loggingresource memoryResource{
        std::pmr::new_delete_resource()}; // why static? why not local?
    
    std::pmr::set_default_resource(&memoryResource);
    
    std::pmr::vector<int> ints;
    ints.push_back(42);
}

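Why give memoryResource static storage duration instead of making it a plain local variable? Functions in other TUs (translation units) can have static local variables as well, and each of those is initialized the first time its function is called. Static objects are destroyed after main() returns, in reverse order of construction, whereas main's local variables are destroyed when main() returns. So if a function-local static in another TU is a std::pmr::vector that uses the default resource, it is destroyed later than any local variable of main and would deallocate through a dangling memory_resource pointer.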
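
The ordering argument can be sketched as follows. This is my own illustration under stated assumptions: CountingResource is a hypothetical stand-in for the talk's logging resource, and global_resource(), cache() and demo() are made-up names. cache() plays the role of a function in another TU that owns a static std::pmr::vector.

```cpp
#include <cstddef>
#include <memory_resource>
#include <vector>

// Hypothetical stand-in for the talk's logging resource: counts calls, forwards upstream.
class CountingResource : public std::pmr::memory_resource {
public:
    explicit CountingResource(std::pmr::memory_resource* upstream)
        : d_upstream(upstream) {}
    int allocations() const { return d_allocations; }
    int deallocations() const { return d_deallocations; }

private:
    void* do_allocate(std::size_t bytes, std::size_t alignment) override {
        ++d_allocations;
        return d_upstream->allocate(bytes, alignment);
    }
    void do_deallocate(void* p, std::size_t bytes, std::size_t alignment) override {
        ++d_deallocations;
        d_upstream->deallocate(p, bytes, alignment);
    }
    bool do_is_equal(const std::pmr::memory_resource& other) const noexcept override {
        return this == &other;
    }

    std::pmr::memory_resource* d_upstream;
    int d_allocations = 0;
    int d_deallocations = 0;
};

// Static storage: constructed on first call, destroyed during static destruction.
CountingResource& global_resource() {
    static CountingResource resource{std::pmr::new_delete_resource()};
    return resource;
}

// Imagine this lives in another TU. The vector captures the default resource
// when it is constructed, i.e. on the first call to cache().
std::pmr::vector<int>& cache() {
    static std::pmr::vector<int> values;
    return values;
}

// The resource is constructed before the vector, so it is destroyed after it.
int demo() {
    std::pmr::set_default_resource(&global_resource());
    cache().push_back(42);
    return global_resource().allocations();
}
```

Because the resource finishes construction before the vector does, reverse-order destruction tears the vector down first, and its final deallocate call still reaches a live resource. Had the resource been a local variable of the caller, that call would go through a dangling pointer.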

This recalls the well-known article by Alexander Bernauer: RAII vs. exit()

ALWAYS CONSIDER THE CONSTRUCTION/DESTRUCTION ORDER OF VARIABLES

Refer to Linking notes and How main() is executed on Linux
  1. starts up program threading
  2. call _init()
    __attribute__((constructor))
  3. **registers** _fini() and _rtld_fini(), which are called after the program terminates.
    __attribute__((destructor))
  4. call main()
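
As a small sketch of steps 2 and 3, the following uses the GCC/Clang-specific constructor and destructor function attributes (an assumption: other compilers may not support them). The names g_phase, before_main, after_main and constructor_ran are made up for this illustration.

```cpp
#include <cstdio>

namespace {

int g_phase = 0;  // constant-initialized, so it is valid before any code runs

// Runs during program start-up, before main() is entered (GCC/Clang extension).
__attribute__((constructor)) void before_main() {
    g_phase = 1;
    std::puts("constructor attribute: before main()");
}

// Runs during program shutdown, after main() has returned (GCC/Clang extension).
__attribute__((destructor)) void after_main() {
    std::puts("destructor attribute: after main()");
}

}  // namespace

// By the time main() or anything it calls runs, before_main() has already executed.
bool constructor_ran() {
    return g_phase == 1;
}
```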
class Bar2 {
    std::string data{"data"};
};


class Foo2 {
    std::unique_ptr<Bar2, 
         polymorphic_allocator_delete /* our deleter taking memory_resource ptr */> 
             d_bar;
    
    public:
        Foo2() : d_bar(nullptr /* init. with null */, {{std::pmr::get_default_resource()}}) {
            std::pmr::polymorphic_allocator<Bar2> alloc{
                std::pmr::get_default_resource()};
            Bar2 *const bar = alloc.allocate(1); // 1 as in one Bar2 instance, not sizeof(Bar2)
            // Beware: we should have a try/catch here,
            // since the Bar2 constructor can throw,
            // and manually call alloc.deallocate in the handler.
            // Remember Effective C++ Item 52: placement new
            // and placement delete should go in pairs.
            alloc.construct(bar);
            d_bar.reset(bar);        
        }
};
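
The comment in the constructor above says a try/catch is needed around construction. One possible shape for it is sketched below; this is my own version, not code from the talk. TrackingResource, Fragile, create_with and destroy_with are hypothetical names introduced only to show that the memory is returned when the constructor throws.

```cpp
#include <cstddef>
#include <memory>
#include <memory_resource>
#include <stdexcept>
#include <utility>

// Hypothetical resource that tracks how many blocks are currently allocated.
class TrackingResource : public std::pmr::memory_resource {
public:
    int outstanding() const { return d_outstanding; }

private:
    void* do_allocate(std::size_t bytes, std::size_t alignment) override {
        void* p = std::pmr::new_delete_resource()->allocate(bytes, alignment);
        ++d_outstanding;
        return p;
    }
    void do_deallocate(void* p, std::size_t bytes, std::size_t alignment) override {
        std::pmr::new_delete_resource()->deallocate(p, bytes, alignment);
        --d_outstanding;
    }
    bool do_is_equal(const std::pmr::memory_resource& other) const noexcept override {
        return this == &other;
    }
    int d_outstanding = 0;
};

// Hypothetical type whose constructor may throw.
struct Fragile {
    explicit Fragile(bool fail) {
        if (fail) throw std::runtime_error("construction failed");
    }
};

// Allocate, then construct; if construction throws, give the memory back.
template <typename T, typename... Args>
T* create_with(std::pmr::memory_resource* resource, Args&&... args) {
    std::pmr::polymorphic_allocator<T> alloc{resource};
    T* p = alloc.allocate(1);
    try {
        alloc.construct(p, std::forward<Args>(args)...);
    } catch (...) {
        alloc.deallocate(p, 1);  // pairs with allocate(1) above
        throw;
    }
    return p;
}

// Destroy the object, then release its memory to the same resource.
template <typename T>
void destroy_with(std::pmr::memory_resource* resource, T* p) {
    std::pmr::polymorphic_allocator<T> alloc{resource};
    std::destroy_at(p);
    alloc.deallocate(p, 1);
}
```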
An audience member asked: why use a polymorphic allocator instead of placement new?

Because, because...
operator new has scope...
there's the deleting destructor...
An allocator gives us a uniform way to grab memory.

Destructors should NOT throw.
It's hard, but take that into consideration, seriously.

Recap:
[C++11] destructor with noexcept
[C++] Exception in detail.


Yet the deleter below calls two functions that the standard does not declare noexcept:
std::pmr::polymorphic_allocator::destroy noexcept(false)
std::pmr::polymorphic_allocator::deallocate noexcept(false)
class polymorphic_allocator_delete{
    public:
        polymorphic_allocator_delete(
            std::pmr::polymorphic_allocator<std::byte> allocator)
            : d_allocator(std::move(allocator) /* actually, a copy, not move */) 
            {}
        template<typename T>
        void operator() (T *ptr) {
            std::pmr::polymorphic_allocator<T>(d_allocator).destroy(ptr);
            std::pmr::polymorphic_allocator<T>(d_allocator).deallocate(ptr, 1);
        }
        
    private:
        std::pmr::polymorphic_allocator<std::byte> d_allocator;
};
std::pmr::polymorphic_allocator has no move constructor, so the std::move above falls back to a copy.

With this implementation, the size being allocated grows by an extra 4 words because the unique_ptr has captured a pointer to the deleter.

Try using a type (a functor) as the deleter instead of a pointer to function; the former occupies only 1 byte.
Ref: [C++14] unique_ptr with type erasure as shared_ptr
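
The size difference between the two kinds of deleter can be checked with a few lines. This is a sketch of my own: IntDeleter, delete_int and the two aliases are hypothetical names, and the exact sizes depend on the standard library implementation.

```cpp
#include <cstddef>
#include <memory>

// Stateless functor deleter: an empty class, so sizeof(IntDeleter) == 1.
struct IntDeleter {
    void operator()(int* p) const noexcept { delete p; }
};

// Free function with the same job, to be used through a function pointer.
inline void delete_int(int* p) noexcept { delete p; }

// Deleter stored as a type: common implementations add no storage for it.
using FunctorOwned = std::unique_ptr<int, IntDeleter>;

// Deleter stored as a function pointer: an extra pointer rides along.
using FunctionOwned = std::unique_ptr<int, void (*)(int*)>;

constexpr std::size_t functor_owned_size = sizeof(FunctorOwned);
constexpr std::size_t function_owned_size = sizeof(FunctionOwned);
```

A stateful deleter such as polymorphic_allocator_delete sits in between: it has to store its polymorphic_allocator, so it cannot be optimized away like the empty functor.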

Most STL containers have a version in the std::pmr namespace.

Do NOT change the default global memory_resource at run time. It's dangerous.

We can use std::byte or void for polymorphic_allocator:
std::pmr::polymorphic_allocator<std::byte>
std::pmr::polymorphic_allocator<void>
Using void aligns with operator new, which returns void*.
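
Whichever placeholder type is stored, the allocator is converted to the concrete type at the point of use, as the deleter earlier does. A minimal sketch of that conversion, using std::byte only; the function name rebind_keeps_resource is made up for this example.

```cpp
#include <cstddef>
#include <memory_resource>

// Converts a byte allocator to an int allocator and checks that both
// still refer to the same underlying memory_resource.
bool rebind_keeps_resource(std::pmr::memory_resource* resource) {
    std::pmr::polymorphic_allocator<std::byte> generic{resource};

    // Converting constructor: copies the resource pointer, nothing else.
    std::pmr::polymorphic_allocator<int> typed{generic};

    int* p = typed.allocate(1);  // room for one int
    typed.construct(p, 42);
    const bool ok = (*p == 42) && (typed.resource() == generic.resource());
    typed.deallocate(p, 1);      // int is trivially destructible, no destroy needed
    return ok;
}
```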


Strong exception safety

A container moves its elements only if the value type's move constructor is noexcept (or the type cannot be copied); otherwise it falls back to the value type's copy constructor. When? For std::vector, when it needs more slots and has to reallocate.
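
A minimal sketch to observe this, with made-up types NoexceptMover and ThrowingMover that count how they are copied and moved. The helper relocate_once forces exactly one reallocation of a one-element std::vector.

```cpp
#include <vector>

struct Counts {
    int copies = 0;
    int moves = 0;
};

// Move constructor is noexcept: the vector may move elements when it grows.
struct NoexceptMover {
    static inline Counts counts{};
    NoexceptMover() = default;
    NoexceptMover(const NoexceptMover&) { ++counts.copies; }
    NoexceptMover(NoexceptMover&&) noexcept { ++counts.moves; }
};

// Move constructor may throw: the vector copies instead to stay exception safe.
struct ThrowingMover {
    static inline Counts counts{};
    ThrowingMover() = default;
    ThrowingMover(const ThrowingMover&) { ++counts.copies; }
    ThrowingMover(ThrowingMover&&) { ++counts.moves; }
};

// Builds a one-element vector, forces a reallocation, and reports what happened.
template <typename T>
Counts relocate_once() {
    std::vector<T> values;
    values.emplace_back();                   // constructed in place, no copy or move
    T::counts = {};                          // reset before the interesting part
    values.reserve(values.capacity() + 1);   // must reallocate and relocate one element
    return T::counts;
}
```

With this setup, relocate_once<NoexceptMover>() reports one move and no copies, while relocate_once<ThrowingMover>() reports one copy and no moves.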

class Foo {
    std::pmr::polymorphic_allocator<std::byte> d_allocator;
    std::unique_ptr<Bar2, 
        polymorphic_allocator_delete
           /* our deleter taking memory_resource ptr */> 
               d_bar;
    
    public:
        // Let's focus on move constructor

        // passing in new allocator
        Foo(Foo&& rhs, std::pmr::polymorphic_allocator<std::byte> allocator);
        Foo(Foo&& rhs) noexcept : 
        d_allocator(rhs.d_allocator /* no need to move because it just copies a pointer anyway */), 
        d_bar(nullptr, {d_allocator /* Make sure d_allocator is declared before d_bar */})
        {
            d_bar.reset(rhs.d_bar.release());
        }
};
So, can we get rid of d_allocator, since it's just a placeholder for the global memory_resource?

Yes, we can.

How? The data member already holds the shared memory_resource, so we can grab it from there.
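
Here is a sketch of that idea with a container member instead of the unique_ptr. Widget is a hypothetical class of my own, not from the talk. It stores no allocator of its own and asks its std::pmr::string member for it via get_allocator().

```cpp
#include <cstddef>
#include <memory_resource>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical allocator-aware class without a separate allocator member.
class Widget {
    std::pmr::string d_name;  // already carries the memory_resource pointer

public:
    using allocator_type = std::pmr::polymorphic_allocator<std::byte>;

    explicit Widget(std::string_view name, allocator_type alloc = {})
        : d_name(name, alloc) {}

    // Copy: uses the allocator passed in (the default resource if none is given).
    Widget(const Widget& other, allocator_type alloc = {})
        : d_name(other.d_name, alloc) {}

    // Plain move: keeps the source's allocator and can stay noexcept.
    Widget(Widget&& other) noexcept : d_name(std::move(other.d_name)) {}

    // Move with a new allocator: may have to copy the bytes, so not noexcept.
    Widget(Widget&& other, allocator_type alloc)
        : d_name(std::move(other.d_name), alloc) {}

    // The allocator is fetched from the data member instead of being stored twice.
    allocator_type get_allocator() const { return d_name.get_allocator(); }

    const std::pmr::string& name() const { return d_name; }
};
```

The design choice is the one the notes describe: the allocator is fixed at construction and shared through the member, so there is no second copy that could get out of sync.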

Sum up

Excerpt from the talk, allocator awareness best practices:
  • Fix the allocator at construction
  • Allocator argument to constructor, copy constructor, move constructor, and move copy constructor, passing the allocator on to data members (containers)
  • The move constructor takes an extra new-allocator argument
  • A member of type std::pmr::polymorphic_allocator<std::byte>
  • Call get_allocator on a data member to get the shared memory_resource
  • Always use global storage for the default allocator (arguable..)
  • Set the default allocator only in main (reason: the initialization order of static namespace-scope variables in different TUs isn't guaranteed)
It's becoming complicated...
The doctrine to keep in mind: the allocator should be fixed along a code path. If it isn't, there is a problem.

I do not think the pmr addition is mature enough to use for every data type, since not all std:: types support it and there are corner cases to worry about. It can, however, be used in a limited scope to tackle specific problems,
e.g. a memory pool.

Reference:
How tcmalloc Works
jemalloc
