
Conversation

rockwotj
Contributor

@rockwotj rockwotj commented Nov 22, 2023

Zeroing memory can cause reactor stalls if a user has configured dozens of MB of memory for a single transform. Prevent this from happening by chunking zeroing into many tasks.
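The chunking approach described above can be sketched roughly as follows. This is a minimal standalone sketch, not the PR's implementation: the chunk size is an assumption, and the real code runs on the Seastar reactor, where each chunk boundary is a scheduling point rather than a comment.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical chunk size; the actual value used in the PR may differ.
constexpr std::size_t zero_chunk_size = 128 * 1024;

// Zero `len` bytes of `buf` one chunk at a time. In the real reactor-based
// code each iteration would contain a scheduling point (or be its own task)
// so that zeroing dozens of MiB cannot stall the reactor.
void zero_in_chunks(char* buf, std::size_t len) {
    for (std::size_t off = 0; off < len; off += zero_chunk_size) {
        std::size_t n = std::min(zero_chunk_size, len - off);
        std::memset(buf + off, 0, n);
        // <-- in the asynchronous version, yield to the scheduler here
    }
}
```

The point of the structure is that no single uninterruptible `memset` call touches more than one chunk's worth of memory, so the worst-case stall is bounded by the chunk size rather than the total configured memory.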

Followup from #15063

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.2.x
  • v23.1.x
  • v22.3.x

Release Notes

  • none

Turns out the VM guest code failures were due to the memory not being
zero-allocated. Now that we memset(0) memory before giving it to the VM,
we can run this without failures, so let's enable it in CI.

Signed-off-by: Tyler Rockwood <[email protected]>
@rockwotj rockwotj self-assigned this Nov 22, 2023
We can't use this with our custom heap allocator, so just disable its
usage. Previously, disabling this caused our transforms to fail, but now
that we memset(0) memory before giving it to the VM we can disable its
generation.

Signed-off-by: Tyler Rockwood <[email protected]>
We have to memset linear memory, and this adds a benchmark to see whether
that could exceed the task budget. It looks like larger memories can.

+---------------------------------------------------------------------------------------------+
| test                   iterations      median         mad         min         max      inst |
+=============================================================================================+
| Memset.Speed_2_MiB          62234    16.173us     3.127ns    16.152us    16.176us     340.0 |
+---------------------------------------------------------------------------------------------+
| Memset.Speed_10_MiB          7069   141.089us   173.902ns   140.483us   141.595us     343.4 |
+---------------------------------------------------------------------------------------------+
| Memset.Speed_20_MiB          3323   299.730us     2.807us   296.837us   325.636us     347.1 |
+---------------------------------------------------------------------------------------------+
| Memset.Speed_30_MiB          2267   438.801us     2.918us   434.562us   448.917us     347.3 |
+---------------------------------------------------------------------------------------------+
| Memset.Speed_50_MiB          1443   690.689us   208.414ns   690.007us   695.562us     423.0 |
+---------------------------------------------------------------------------------------------+
| Memset.Speed_80_MiB           902     1.104ms   543.139ns     1.104ms     1.105ms     349.3 |
+---------------------------------------------------------------------------------------------+
| Memset.Speed_100_MiB          719     1.401ms    20.215us     1.381ms     1.422ms     350.1 |
+---------------------------------------------------------------------------------------------+

Signed-off-by: Tyler Rockwood <[email protected]>
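The table above comes from Seastar's benchmark harness. For reference, a crude plain-`<chrono>` version of the same measurement could look like this; the function name and methodology are illustrative, not the harness that produced those numbers (which runs many iterations and reports medians).

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstring>
#include <vector>

// Time a single memset(0) over `bytes` bytes, in microseconds.
// A single steady_clock sample: noisier than an iteration-based
// harness, but enough to see the linear growth with size.
double memset_zero_us(std::size_t bytes) {
    std::vector<char> buf(bytes, '\xff');
    auto start = std::chrono::steady_clock::now();
    std::memset(buf.data(), 0, bytes);
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::micro>(stop - start).count();
}
```

Scaling the table's numbers, a 100 MiB memset takes roughly 1.4 ms, well over a typical sub-millisecond reactor task budget, which is what motivates chunking.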
@rockwotj rockwotj force-pushed the async-memset branch 2 times, most recently from 250dea9 to fd02ec4 on November 22, 2023 21:08
@rockwotj
Contributor Author

Force push: fixup an extra header

@rockwotj
Contributor Author

Force Push: simplify allocator API and pass memory into VM via a thread local

@vbotbuildovich
Collaborator

vbotbuildovich commented Nov 28, 2023

@rockwotj rockwotj requested a review from dotnwat November 28, 2023 15:05
@rockwotj rockwotj force-pushed the async-memset branch 2 times, most recently from 1d41140 to 9941b6a on November 28, 2023 16:22
dotnwat
dotnwat previously approved these changes Nov 28, 2023
// should always be unset
//
// NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)
static thread_local std::optional<heap_memory> prereserved_memory;
Member

IIUC, this locks us into always running on the reactor? Not that we'll go back to the before times, but it seems like a month or two ago we rid ourselves of the last remaining bits running off-reactor. Are there changes that might occur in the future where the thread-local assumption (i.e. that the reactor and the consumer/producer of this rendezvous object are always on the same thread) no longer holds, and we'd get undefined behavior from an unchecked dereference of this?

Contributor Author

are there changes that might occur in the future where the thread local assumption

Maybe? I can't predict what we'll do in the future, but the tests fail loudly. And there is no UB or unchecked dereference - or did I miss something?

Where we grab the memory we use standard error handling:

    auto memory = std::exchange(prereserved_memory, std::nullopt);
    if (!memory) {
        vlog(
          wasm_log.error,
          "attempted to allocate heap memory without any memory in thread "
          "local storage");
        return wasmtime_error_new("preserved memory was missing");
    }

Member

And there is no ub or unchecked dereference - or did I miss something?

No you didn't miss anything. I was mixing up the name of the thread_local prereserved_memory with this other line auto requested = _preinitialized->mem_limits(); that was added.

I cannot predict what we do in the future

Yeah, sorry, I didn't mean to suggest we needed to predict the future, just that it should be loud when it does fail. That the dereference is checked satisfies that goal, I think.

Contributor Author

mixing up the name

Happy to take naming suggestions! BTW, I've been noodling on breaking this file up into several, as it's grown quite large and I think it's harder to grok what's going on now than when I initially wrote it. One benefit is that constructs like prereserved_memory could be hidden behind some API like with_scoped_variable([]() { /* */ })
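The with_scoped_variable idea floated here could look something like the following. This is purely hypothetical: neither the name nor the signature exists in the codebase; it's just the API shape the comment is suggesting, sketched as an RAII guard around a thread_local slot.

```cpp
#include <cassert>
#include <optional>
#include <utility>

// Hypothetical helper: install `value` in a thread_local (or other
// external) slot for the duration of `fn`, and clear it on the way out,
// even if `fn` throws, so the slot can never leak outside the callback.
template <typename T, typename Fn>
decltype(auto) with_scoped_variable(std::optional<T>& slot, T value, Fn&& fn) {
    slot.emplace(std::move(value));
    struct clear_on_exit {
        std::optional<T>& s;
        ~clear_on_exit() { s.reset(); }
    } guard{slot};
    return std::forward<Fn>(fn)();
}
```

A caller would then write something like with_scoped_variable(prereserved_memory, std::move(mem), [&] { /* enter the VM allocation path */ }), making it impossible to forget to clear the side channel.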

We need to memset(0) memory for wasmtime; however, if the user has
configured large chunks of memory, this can cause us to go over the
task budget. In an effort to prevent that, we deallocate asynchronously.

Since allocation is now asynchronous (due to a pending zero operation)
we need a way to fit that into wasmtime's custom memory allocator API.
In order to do that we use a thread local variable as a side channel to
pass the allocated buffer into the VM, since wasmtime's APIs don't give
us the ability to pass any data into the allocation request.

Signed-off-by: Tyler Rockwood <[email protected]>
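Putting the pieces together, the side-channel pattern this commit describes looks roughly like the sketch below. It is standalone and simplified: heap_memory is a stand-in struct, and the hook returns an error string rather than the wasmtime_error* (and vlog logging) the real code uses. The consuming side mirrors the std::exchange snippet quoted earlier in the review.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>

// Stand-in for the real heap_memory type.
struct heap_memory {
    char* data = nullptr;
    std::size_t size = 0;
};

// The rendezvous slot: set by the asynchronous allocation path right
// before entering the VM on the same thread, consumed by the allocator
// hook below. Outside that window it should always be unset.
static thread_local std::optional<heap_memory> prereserved_memory;

// Sketch of the wasmtime allocation hook: the API gives us no way to
// pass user data in, so the hook picks the buffer up from the
// thread_local instead. Returns an error message on failure, nullptr
// on success.
const char* take_prereserved_memory(heap_memory* out) {
    auto memory = std::exchange(prereserved_memory, std::nullopt);
    if (!memory) {
        return "preserved memory was missing";
    }
    *out = *memory;
    return nullptr;
}
```

The std::exchange both reads and clears the slot in one step, so a second allocation attempt without a fresh buffer parked in the slot fails loudly instead of reusing stale memory.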
@rockwotj
Copy link
Contributor Author

Force push: Clarify comment about how the thread local is used.

@rockwotj rockwotj requested a review from dotnwat November 28, 2023 21:54
@rockwotj rockwotj merged commit dbd628d into redpanda-data:dev Nov 29, 2023