sensors: cache spec when loading maps #3685
For every map we load, we find its program, load its spec and then load the map. This introduces unnecessary overhead when loading the maps. This commit caches the specs and only loads them if we have not seen the program before.
This speeds up sensor loading as well as the tests.
For example, without the patch, running:
go test -exec sudo ./pkg/sensors/tracing -bpf-lib $(pwd)/bpf/objs -test.run TestKprobeSelectors -count 1
three times results in:
ok github.com/cilium/tetragon/pkg/sensors/tracing 8.773s
ok github.com/cilium/tetragon/pkg/sensors/tracing 8.703s
ok github.com/cilium/tetragon/pkg/sensors/tracing 8.739s
With the patch, the same command results in:
ok github.com/cilium/tetragon/pkg/sensors/tracing 7.419s
ok github.com/cilium/tetragon/pkg/sensors/tracing 7.532s
ok github.com/cilium/tetragon/pkg/sensors/tracing 7.491s
That is a ~14% improvement.
Signed-off-by: Kornilios Kourtis <[email protected]>
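The caching idea in this description can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: `progSpec`, `specCache`, `fakeLoad`, and the second object file name are hypothetical stand-ins, and the real loader would parse a BPF object file rather than return an empty struct.

```go
package main

import "fmt"

// progSpec is a hypothetical stand-in for a parsed program/collection spec.
type progSpec struct {
	path string
}

// specCache memoizes specs by object file path. The loads counter records
// how many times the (expensive) loader actually ran.
type specCache struct {
	specs map[string]*progSpec
	load  func(path string) (*progSpec, error)
	loads int
}

func newSpecCache(load func(string) (*progSpec, error)) *specCache {
	return &specCache{specs: make(map[string]*progSpec), load: load}
}

// get returns the cached spec for path, loading it only on first use.
func (c *specCache) get(path string) (*progSpec, error) {
	if s, ok := c.specs[path]; ok {
		return s, nil
	}
	s, err := c.load(path)
	if err != nil {
		return nil, err
	}
	c.loads++
	c.specs[path] = s
	return s, nil
}

// fakeLoad is a hypothetical loader used only for this illustration.
func fakeLoad(path string) (*progSpec, error) {
	return &progSpec{path: path}, nil
}

func main() {
	cache := newSpecCache(fakeLoad)
	// Several maps that belong to the same program share one spec load.
	progs := []string{
		"bpf_generic_kprobe_v511.o",
		"bpf_generic_kprobe_v511.o",
		"other_prog.o", // hypothetical second object file
		"bpf_generic_kprobe_v511.o",
	}
	for _, p := range progs {
		if _, err := cache.get(p); err != nil {
			fmt.Println("load failed:", err)
			return
		}
	}
	fmt.Println("loader calls:", cache.loads) // prints "loader calls: 2"
}
```

Note that `get` hands out the same pointer to every caller, so any caller that mutates the returned spec changes it for all later callers.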
Looks good. I wonder if we could put loaderCache in the Sensor object and factor the code so we could use it from within doLoadProgram .. but that might end up with some struggle, so it could be a follow up ;-) thanks
Might be nice to check memory usage from caching here.
Commit 8ce47ef introduced a cache for the program specs when loading maps. The Go tests did not run in the corresponding PR (#3685) and, as a result, we ended up with the following failure in main on some kernels (rhel8.9, 5.15, 5.10, 5.3) for the TestKprobeOverrideMulti test:
observer_test_helper.go:465: SensorManager.AddTracingPolicy error: sensor generic_kprobe from collection sys-openat-signal-override failed to load: failed prog /home/kkourt/src/tetragon/bpf/objs/bpf_generic_kprobe_v511.o kern_version 331700 loadInstance: opening collection '/home/kkourt/src/tetragon/bpf/objs/bpf_generic_kprobe_v511.o' failed: using replacement map override_tasks: MaxEntries: 32768 changed to 1: map spec is incompatible with existing map
The problem is that the map spec is updated and, because it is now cached, it is reused across multiple instances of the map. These instances have different pin paths and may have different configurations. Specifically, in TestKprobeOverrideMulti one instance of the map is configured with a single entry, while all others are configured with 32768 entries.
To address this problem, this commit makes a copy of the mapSpec before modifying it.
Compared to 8ce47ef, this leads to worse performance, but it is still better than without the cache. Running:
go test -exec sudo ./pkg/sensors/tracing -bpf-lib $(pwd)/bpf/objs -test.run TestKprobeSelectors -count 1
This commit:
ok github.com/cilium/tetragon/pkg/sensors/tracing 7.831s
ok github.com/cilium/tetragon/pkg/sensors/tracing 7.849s
ok github.com/cilium/tetragon/pkg/sensors/tracing 7.781s
Without the cache (21b326c):
ok github.com/cilium/tetragon/pkg/sensors/tracing 9.078s
ok github.com/cilium/tetragon/pkg/sensors/tracing 9.074s
ok github.com/cilium/tetragon/pkg/sensors/tracing 9.068s
That is a ~13.5% improvement.
NB: not sure why the "without the cache" numbers are different from the ones reported in 8ce47ef. Maybe something changed in overhead in the meantime.
Fixes: 8ce47ef ("sensors: cache spec when loading maps")
Signed-off-by: Kornilios Kourtis <[email protected]>
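The aliasing problem and the copy-before-modify fix described in this commit message can be illustrated with a small sketch. The `mapSpec` type, its `Copy` method, and the two `configure*` helpers here are hypothetical stand-ins written for this example; they are not the Tetragon code. The cilium/ebpf library does offer a `Copy` method on its `MapSpec` type, but whether the fix uses it is an assumption.

```go
package main

import "fmt"

// mapSpec is a hypothetical, simplified stand-in for a BPF map spec.
type mapSpec struct {
	Name       string
	MaxEntries uint32
}

// Copy returns an independent copy. A plain struct copy is enough here
// because this stand-in type has no pointer or slice fields.
func (m *mapSpec) Copy() *mapSpec {
	c := *m
	return &c
}

// configureShared mutates the cached spec in place. Every later user of
// the cache then sees the modified MaxEntries (the bug pattern).
func configureShared(cached *mapSpec, maxEntries uint32) *mapSpec {
	cached.MaxEntries = maxEntries
	return cached
}

// configureCopy copies the cached spec first and mutates only the copy,
// so the cached spec stays untouched (the fix pattern).
func configureCopy(cached *mapSpec, maxEntries uint32) *mapSpec {
	s := cached.Copy()
	s.MaxEntries = maxEntries
	return s
}

func main() {
	cached := &mapSpec{Name: "override_tasks", MaxEntries: 32768}
	a := configureShared(cached, 1)
	fmt.Println(a.MaxEntries, cached.MaxEntries) // prints "1 1"

	cached = &mapSpec{Name: "override_tasks", MaxEntries: 32768}
	b := configureCopy(cached, 1)
	fmt.Println(b.MaxEntries, cached.MaxEntries) // prints "1 32768"
}
```

The copy costs some time on every map load, which is consistent with the commit message reporting slightly slower runs than 8ce47ef while still beating the uncached baseline.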
Yeah, my initial patch did this, but I thought it required more substantial changes for not much performance benefit. We can revisit though.
The cache is only maintained while we preload the maps of a sensor. When we are done with them, it goes away, so there shouldn't be a memory issue here.