
Conversation

rockwotj (Contributor)

Wasmtime assumes that memory returned from allocators is zero filled. We need to do this for our custom allocators too.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.2.x
  • v23.1.x
  • v22.3.x

Release Notes

  • none

This is assumed by wasmtime for these memory APIs

Signed-off-by: Tyler Rockwood <[email protected]>
@rockwotj (Contributor Author)

Force push: Fix release build

@rockwotj (Contributor Author)

Force push: Remove a debug build check that was causing asan false positives

  * don't have to touch all the bytes.
  */
- void deallocate(heap_memory);
+ void deallocate(heap_memory, size_t used_amount);
rockwotj (Contributor Author)

This amount is user configurable and in theory could be tens of MB. I guess that is too expensive to memset in a single go? Do we need to do this in an async-friendly way?

Member

Like chunk-wise with yields interspersed or something?

rockwotj (Contributor Author)

yeah

Member

Does N have a concrete upper bound?

rockwotj (Contributor Author)

4GiB, but in practice it's probably much lower (the default is 2MiB). For example, golang (not tinygo) requires 50MB in my tests for a non-trivial program (seems like a lot, but 🤷)

rockwotj (Contributor Author)

I have an idea to do this in a fairly lightweight way, but I will do in a followup PR, not this one.
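A chunk-wise approach along the lines discussed above could be sketched like this. This is purely illustrative: `zero_chunked` and its 128 KiB chunk size are hypothetical, and the `yield` callback stands in for a real scheduler suspension point (in Seastar-style code it would be a coroutine yield, not a function call):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <vector>

// Zero `used` bytes of `mem` in fixed-size chunks, invoking `yield`
// between chunks so a cooperative scheduler could run other tasks.
// Names and chunk size are assumptions for illustration only.
void zero_chunked(unsigned char* mem, size_t used,
                  const std::function<void()>& yield,
                  size_t chunk_size = 128 * 1024) {
    size_t offset = 0;
    while (offset < used) {
        size_t n = std::min(chunk_size, used - offset);
        std::memset(mem + offset, 0, n);
        offset += n;
        if (offset < used) {
            yield(); // let other tasks run between chunks
        }
    }
}
```

The per-chunk memset stays cheap and bounded, so the reactor stall is limited to one chunk regardless of how large the heap is.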

Member

i would assume that as long as the memset to 0 is on a contiguous region it is very fast. i dunno tho what the limit is where it becomes costly enough to be a concern.

rockwotj (Contributor Author)

My suggestion is to get this in, then I'll create a benchmark

rockwotj (Contributor Author)

It's also page aligned; that should help

@oleiman (Member) left a comment

lgtm


Comment on lines +125 to +127
allocated = allocator.allocate(req);
ASSERT_TRUE(allocated.has_value());
EXPECT_THAT(allocated, Optional(HeapIsZeroed()));
Member

question: what's the purpose of this second allocation?

rockwotj (Contributor Author)

The memory is reused, so this ensures it is zeroed when returned "dirty" to the allocator.

Member

ah i see
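For anyone reading along, the reuse path is the interesting one. A toy free-list allocator (purely illustrative; not the PR's actual heap_allocator API) shows why the test allocates twice:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <vector>

// Toy free-list allocator. The second allocate() is the interesting
// case: it hands back memory that was previously dirtied, so it is
// only still zero filled if deallocate() restored the invariant.
struct toy_allocator {
    explicit toy_allocator(size_t s) : size(s) {}

    std::unique_ptr<unsigned char[]> allocate() {
        if (!free_list.empty()) {
            auto p = std::move(free_list.back());
            free_list.pop_back();
            return p; // reused; relies on deallocate() having zeroed it
        }
        return std::make_unique<unsigned char[]>(size); // value-init: zeroed
    }

    void deallocate(std::unique_ptr<unsigned char[]> p, size_t used) {
        std::memset(p.get(), 0, used); // restore the zero-fill invariant
        free_list.push_back(std::move(p));
    }

    size_t size;
    std::vector<std::unique_ptr<unsigned char[]>> free_list;
};
```

A first allocation only exercises the fresh path, which most allocators zero anyway; the second allocation is what catches a deallocate that forgets to scrub.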

@rockwotj rockwotj requested a review from oleiman November 22, 2023 02:05
@rockwotj (Contributor Author)

/ci-repeat

oleiman previously approved these changes Nov 22, 2023
@oleiman (Member) left a comment

might want input from noah regarding the big memset, but lgtm

@vbotbuildovich (Collaborator)

@dotnwat (Member) left a comment

lgtm


- void heap_allocator::deallocate(heap_memory m) {
+ void heap_allocator::deallocate(heap_memory m, size_t used_amount) {
+     std::memset(m.data.get(), 0, std::min(used_amount, m.size));
Member

defensive programming and all, yay. but it also reads as if more could have been used than the buffer size. is that a thing? maybe used_amount corresponds to an amount that perhaps spans instances of heap_memory? that would explain it i think

rockwotj (Contributor Author)

You're right, I should just remove the std::min; it's overly defensive.
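For illustration, the simplified version without the clamp might look like the following. This is a self-contained sketch with stand-in types; the real heap_memory and heap_allocator definitions (and whatever deallocate does after the memset, such as returning the buffer to a free list) are assumptions:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <vector>

// Stand-in types, assumed for illustration only.
struct heap_memory {
    std::unique_ptr<unsigned char[]> data;
    size_t size;
};

struct heap_allocator {
    std::vector<heap_memory> free_list;

    // Callers guarantee used_amount <= m.size, so no std::min clamp:
    // only the bytes that could have been dirtied get zeroed.
    void deallocate(heap_memory m, size_t used_amount) {
        std::memset(m.data.get(), 0, used_amount);
        free_list.push_back(std::move(m)); // assumed recycling step
    }
};
```

Dropping the clamp makes the precondition explicit: a caller passing used_amount > m.size is a bug, not something to silently paper over.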


wasmtime assumes that memory is zero filled, so ensure that heap memory
is zero filled. We also force callers to tell the allocator how much of
that heap was actually used, so that deallocation can memset only the
changed memory - this is an optimization for large memories if only part
of the memory is used.

Signed-off-by: Tyler Rockwood <[email protected]>
Adds tests that we ensure memory is always zero filled

Signed-off-by: Tyler Rockwood <[email protected]>
@rockwotj (Contributor Author)

Force push: Remove overly defensive cap to memset

@rockwotj rockwotj requested review from oleiman and dotnwat November 22, 2023 15:27
@oleiman (Member) left a comment

rubber stamp

@rockwotj rockwotj merged commit 35528be into redpanda-data:dev Nov 22, 2023
@rockwotj rockwotj deleted the memset branch November 22, 2023 16:21
@rockwotj (Contributor Author)

@oleiman @dotnwat on my local machine:

test                 iterations  median     mad        min        max        allocs  tasks  inst
Memset.Speed_20MiB   3323        299.810us  208.985ns  299.365us  300.527us  1.000   0.000  367.9
Memset.Speed_100MiB  719         1.383ms    4.891us    1.378ms    1.392ms    1.000   0.000  371.1
Memset.Speed_2MiB    64465       16.100us   9.519ns    15.781us   16.112us   1.000   0.000  361.0

Presumably arm is a little slower, so I suspect most initial transforms will be fine in terms of budget, but there are other things going on in this task when we dealloc an instance, so I can make it async
