wasm: asynchronously zero deallocated heaps #15101
Conversation
Turns out the VM guest code failures were due to the memory not being zeroed when allocated. Now that we memset(0) memory before giving it to the VM, we can run this without failures, so let's enable it in CI. Signed-off-by: Tyler Rockwood <[email protected]>
We can't use this with our custom heap allocator, so just disable its usage. Previously, disabling this caused our transforms to fail, but now that we memset(0) memory before giving it to the VM we can disable its generation. Signed-off-by: Tyler Rockwood <[email protected]>
We have to memset linear memory, and this adds a benchmark to see if doing so could exceed the task budget. It looks like larger memories can.

| test                 | iterations | median    | mad       | min       | max       | inst  |
|----------------------|------------|-----------|-----------|-----------|-----------|-------|
| Memset.Speed_2_MiB   | 62234      | 16.173us  | 3.127ns   | 16.152us  | 16.176us  | 340.0 |
| Memset.Speed_10_MiB  | 7069       | 141.089us | 173.902ns | 140.483us | 141.595us | 343.4 |
| Memset.Speed_20_MiB  | 3323       | 299.730us | 2.807us   | 296.837us | 325.636us | 347.1 |
| Memset.Speed_30_MiB  | 2267       | 438.801us | 2.918us   | 434.562us | 448.917us | 347.3 |
| Memset.Speed_50_MiB  | 1443       | 690.689us | 208.414ns | 690.007us | 695.562us | 423.0 |
| Memset.Speed_80_MiB  | 902        | 1.104ms   | 543.139ns | 1.104ms   | 1.105ms   | 349.3 |
| Memset.Speed_100_MiB | 719        | 1.401ms   | 20.215us  | 1.381ms   | 1.422ms   | 350.1 |

Signed-off-by: Tyler Rockwood <[email protected]>
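As a rough standalone sketch of the kind of measurement reported above (the PR uses the project's own benchmark harness, which is not reproduced here; treating Seastar's default task quota as roughly 500us is an assumption):

```cpp
// Illustrative only: time memset(0) over wasm-heap-sized buffers so the
// result can be compared against the reactor task budget. Reading a byte
// back afterwards keeps the buffer live; a real harness would use a
// stronger do-not-optimize barrier.
#include <chrono>
#include <cstddef>
#include <cstring>
#include <iostream>
#include <vector>

int main() {
    for (size_t mib : {2, 10, 20, 30, 50, 80, 100}) {
        std::vector<char> buf(mib * 1024 * 1024, 1);
        auto start = std::chrono::steady_clock::now();
        std::memset(buf.data(), 0, buf.size());
        auto elapsed = std::chrono::steady_clock::now() - start;
        std::cout << mib << " MiB: "
                  << std::chrono::duration_cast<std::chrono::microseconds>(
                       elapsed)
                       .count()
                  << "us (first byte: " << int(buf.front()) << ")\n";
    }
}
```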
Force push: fixup an extra header
Force push: simplify allocator API and pass memory into VM via a thread local
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/41828#018c1454-490e-45a0-b1ef-18e53a54013f
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/41871#018c1711-fe02-4597-ac81-46f5d1fa44a1
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/41891#018c182f-a723-4a39-b2b2-367614cead6b
// should always be unset
//
// NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)
static thread_local std::optional<heap_memory> prereserved_memory;
iiuc then this locks us into always running on the reactor? not that we will go back to the before times, but it seems like a month or two ago we rid ourselves of the last remaining bits running off-reactor? are there changes that might occur in the future where the thread local assumption (i.e. the reactor and the consumer/producer of this rendezvous object are always on the same thread) will no longer hold and we'll get undefined behavior, since we have an unchecked dereference to this?
are there changes that might occur in the future where the thread local assumption
Maybe? I cannot predict what we do in the future, but the tests fail loudly. And there is no ub or unchecked dereference - or did I miss something?
Where we grab the memory, we use standard error handling:
auto memory = std::exchange(prereserved_memory, std::nullopt);
if (!memory) {
vlog(
wasm_log.error,
"attempted to allocate heap memory without any memory in thread "
"local storage");
return wasmtime_error_new("preserved memory was missing");
}
And there is no ub or unchecked dereference - or did I miss something?
No, you didn't miss anything. I was mixing up the name of the thread_local prereserved_memory with this other line that was added: auto requested = _preinitialized->mem_limits();
I cannot predict what we do in the future
yeah, sorry, I didn't mean to suggest we needed to predict the future, just that it should be loud when it does fail. That the dereference is checked satisfies that goal, I think.
mixing up the name
Happy to take naming suggestions! BTW I've been noodling on breaking this file up into multiple files, as it's grown quite large and I think it's harder to grok what's going on at this point than when I initially wrote it. The benefit is that constructs like prereserved_memory could be hidden behind some API like with_scoped_variable([]() { /* */ }).
We need to memset(0) memory for wasmtime; however, if the user has configured large chunks of memory, this can cause us to go over the task budget. In an effort to prevent that, we deallocate asynchronously. Since allocation is now asynchronous (due to a pending zero operation), we need a way to fit that into wasmtime's custom memory allocator API. To do that we use a thread local variable as a side channel to pass the allocated buffer into the VM, since wasmtime's APIs don't give us the ability to pass any data into the allocation request. Signed-off-by: Tyler Rockwood <[email protected]>
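A minimal sketch of that side-channel handoff, under the assumption that the VM's memory-creation hook has a fixed signature with no user-data pointer. heap_memory and prereserved_memory mirror names quoted in the review; the hook itself is a stand-in, not wasmtime's real C API:

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <optional>
#include <utility>

// Owner of a pre-allocated, already-zeroed linear memory buffer.
struct heap_memory {
    std::unique_ptr<uint8_t[]> data;
    size_t size = 0;
};

// Should always be unset except between stashing and the callback running.
// NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)
static thread_local std::optional<heap_memory> prereserved_memory;

// Caller side: stash the buffer right before invoking the VM API that will
// call the allocation hook on this same thread.
void stash_prereserved_memory(heap_memory mem) {
    prereserved_memory = std::move(mem);
}

// Callback side: the hook cannot receive user data, so it takes the buffer
// out of thread-local storage and fails loudly if nothing was stashed
// (the real code logs and returns a wasmtime error rather than a bool).
bool create_vm_heap(heap_memory* out) {
    auto memory = std::exchange(prereserved_memory, std::nullopt);
    if (!memory) {
        return false;
    }
    *out = std::move(*memory);
    return true;
}
```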
Force push: clarify comment about how the thread local is used.
Zeroing memory can cause reactor stalls if a user has configured dozens of MB of memory for a single transform. Prevent this from happening by chunking zeroing into many tasks.
Followup from #15063
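A minimal sketch of the chunking idea, assuming Seastar coroutines and seastar::coroutine::maybe_yield(); the chunk size and function name are illustrative, not the values or names used in the PR:

```cpp
#include <seastar/core/coroutine.hh>
#include <seastar/core/future.hh>
#include <seastar/coroutine/maybe_yield.hh>

#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>

namespace {

// Illustrative chunk size: small enough that a single memset stays well
// under the reactor task budget.
constexpr size_t zero_chunk_size = 128 * 1024;

// Zero the buffer in bounded chunks, yielding to the reactor between
// chunks so no single task stalls it.
seastar::future<> async_zero(uint8_t* data, size_t size) {
    for (size_t offset = 0; offset < size; offset += zero_chunk_size) {
        std::memset(
          data + offset, 0, std::min(zero_chunk_size, size - offset));
        co_await seastar::coroutine::maybe_yield();
    }
}

} // namespace
```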
Backports Required
Release Notes