Speed up trace mode (GSoC Project)

Fri Apr 5 19:35:08 UTC 2019

Hi Clemens,

I see you're getting the hang of the difficulties of the project now :)
>

That’s really encouraging for me to know that you think so ^_^

> You have the right ideas to solve the problem. Do keep in mind though
> that CAS will only work up to a word size or a double word size at most,
> i.e. swapping more than 64 bit with CAS atomically is probably not going
> to work.
>

Got it. I may swap one by one instead if swapping the complete structure
and its just 4 variables.

>
> I'm not sure whether we will actually need a separate process to
> increase the allocation size. Consider this proposal:
>
>   struct shared_block_mgmt {
>     size_t last_known_size;
>     size_t refcnt;
>     int blockfd;
>     struct shared_block* block;
>   };
>
>   struct shared_block_mgmt block_mgmt;
>
>   struct shared_block {
>     size_t size;
>     size_t used;
>     char memory[];
>   };
>
> This struct would mean that the first `used` bytes of `memory` are
> in-use by our data structure and everything after that up to `size`
> bytes is free[1].
>
> Any allocation request where used + request_size <= size would succeed
> and we could change `used` using CAS to ensure that this allocation
> operation is atomic.
>
> Any allocation request where used + request_size > size would trigger
> growing the memory area. Growing would calculate the size we want to
> grow to, let's call this `target_size`. Then, it would attempt to grow
> atomically using:
>
>   size_t local_size;
>   size_t local_used;
>   size_t target_used;
>   size_t target_size;
>   bool needs_resize;
>   do {
>     local_used = block_mgmt.block->used;
>     local_size = block_mgmt.block->size;
>     target_used = local_used + request_size;
>     needs_resize = target_used < local_size;
>     if (needs_resize) {
>       // growing required
>       target_size = local_size + BLOCK_GROWTH;
>       ftruncate(block_mgmt.blockfd, target_size);
>

What I was doing is to keep a file created beforehand and append it
whenever needed to the existing file.
ftruncate() is much better & cleaner approach I guess.

>     }
>   } while (
>       (needs_resize &&
>         !CAS(&block_mgmt.block->size, local_size, target_size) &&
>         !CAS(&block_mgmt.block->used, local_used, target_used)) ||
>       (!needs_resize &&
>         !CAS(&block_mgmt.block->used, local_used, target_used)));
>
>   // At this point, either resizing was not needed and block->used is
>   // what we expect it to be (i.e. allocation succeeded), or resizing
>   // was needed, and we did successfully resize, and after resizing we
>   // did update block->used (i.e. allocation also succeeded).
>
> This will oportunistically call ftruncate with the bigger size on the
> mmap'd file descriptor, but calling ftruncate in multiple processes at
> the same time should not be a problem unless the truncation would ever
> shrink the file (which we can not completely avoid, but make very
> unlikely by making BLOCK_GROWTH

Maybe, we can call ftruncate outside the loop and just set size first.
I dunno if it is possible but if it is we can modify compare and swap to
behave in the following way :
If the process detects that used + request_size > size and area needs to be
grown, instead of swapping the size with
its new value, we can just increment the old value without caring what is
the old value by ‘requested_size’.
Now in this type of swapping, we just don’t need to care what is old value,
all we do is increment requested size to it.
So if 2 or more process do this simultaneously, they just end up
incrementing the memory correctly. And to keep a margin we can add some
extra_growth to each.
Although I m not sure if atomic operations allow this,
a better approach maybe to just share a variable called maxFileMemory which
can only be swapped if the new value being provided is greater.
Before mmaping for write or read we may just ensure if current file size
matches maxFileMemory.
This maybe would prevent shrinking.

Or maybe we somehow can block shrinking by implementing our own ftruncate
or some other way maybe.

I guess I will need to think of them more briefly about cases and
everything to get it in right words I guess.

> sufficiently large so that any pending
> extension requests would be processed before another extension is
> required).

>
You correctly identified that we'd have to re-mmap(2) the file after
> this operation, which is why I've included last_known_size in struct
> shared_block_mgmt. If last_known_size differs from the size stored in
> struct shared_memory, that would trigger creation of a new struct
> shared_block_mgmt with a fresh mmap with the larger size. We cannot
> unmap the old mapping until all threads are no longer using it, so we'd
> have to keep a reference count (field refcnt). And finally, we'd
> probably have to check whether any offsets in the data structure are
> larger than last_known_size, and if so, retry the current lookup
> operation with a fresh mmap(2) of the shared memory area, since it must
> have grown during the pending lookup.
>

Yes true and actually this way implementing it makes it more precise and
clean.

>
> I believe this would allow us to avoid the separate process to grow the
> shared memory and also not require the use of a spinlock to block all
> processes for the resize operation. What do you think?
>

That’s right, this will do it in a single process.

The reasons for me creating a new process were:

a) The process expanding the memory can  die in between because we dunno
about that process
Although the above approach would make the other process enter a loop of
creating memory automatically, so this removes the need of separate process.

b) As I was creating a brand new file which was to be appended, if many
build process tried to create more memory simultaneously, the last process
which creates that new file would actually create that and if the any other
processes of the many existing ones were gonna append the file of what they
think it was, would end up appending the one created by last process and
not actually appending the file they desired.

c) It mainly was intended to give more control over the tasks.

> [1] This memory management approach may actually not work for us later
> on and we may have to use a free-list or buddy system with a bitmap, but
> for the point of discussing the extension, let's simplify the memory
> allocation for now.
>

Got it!

Tomorrow I will try to analyse all the cases on this implementation so if
any downsides are left, they can be removed and I will update you with a
pdf.

Regards,
Mihir
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.macports.org/pipermail/macports-dev/attachments/20190406/7c00b670/attachment.html>