Description
I recently wrote a similar tool to run-resourcewatch here: https://codeberg.org/mdbooth/k8s-object-collector. I suspect I could bring some of that code to run-resourcewatch.
From skimming code comments I might be able to help with:
- Robust handling of restarts, including detection of missed deletes
origin/pkg/resourcewatch/operator/starter.go
Lines 18 to 19 in 2525941
// this doesn't appear to handle restarts cleanly. To do so it would need to compare the resource version that it is applying
// to the resource version present and it would need to handle unobserved deletions properly. both are possible, neither is easy.
- Asynchronous handling of informer notifications (and no informer)
origin/pkg/resourcewatch/controller/configmonitor/crd_controller.go
Lines 17 to 21 in 2525941
// this is an unusual controller. it really wants an pure watch stream, but that change is too big to reason about at
// the moment. For the moment we'll allow it have synchronous handling of informer notifications. This has severe consequences
// for cache correctness and latency, but it keeps me from having rip out more logic than I want to.
// It doesn't logically need to run because there is no sync method. it's all handled by the gitStorage.
// if you ask for a resource that doesn't exist, it will simply repeated error until it appears while watching all the other types.
k8s-object-collector runs each resource collector in a separate goroutine, in a loop which does `list` and then `watch`. It doesn't use an informer. It emits meta events for the list and watch operations, so it can detect when an object it knows about wasn't listed, i.e. we missed a delete. The goroutines all write to a single channel, so the output is a single, serialized stream of objects.
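The missed-delete detection described above can be sketched roughly as follows. This is a minimal illustration, not the code from k8s-object-collector: the `Event` type, the event names, and the `collect` and `reconstructDeletes` functions are all hypothetical, and each collector replays a canned sequence instead of talking to an API server. It assumes every relist is bracketed by a start and an end meta event.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// EventType distinguishes object events from the meta events that bracket a list.
// All names in this sketch are hypothetical.
type EventType string

const (
	ListStart EventType = "LIST_START" // meta: a collector began a fresh list
	ListEnd   EventType = "LIST_END"   // meta: that list finished
	Upsert    EventType = "UPSERT"     // object seen in a list or a watch
	Delete    EventType = "DELETE"     // deletion seen in a watch, or reconstructed
)

// Event is one entry in the combined stream. Name is empty for meta events.
type Event struct {
	Type     EventType
	Resource string
	Name     string
}

// collect stands in for one list+watch loop: it replays a canned script
// onto the shared channel instead of calling a real API server.
func collect(script []Event, out chan<- Event, wg *sync.WaitGroup) {
	defer wg.Done()
	for _, ev := range script {
		out <- ev
	}
}

// reconstructDeletes consumes the combined stream and returns it with a
// synthetic Delete inserted, just before each ListEnd, for every object
// that was known before the relist but absent from it.
func reconstructDeletes(in <-chan Event) []Event {
	known := map[string]map[string]bool{}   // resource -> names believed to exist
	listing := map[string]map[string]bool{} // resource -> names seen in the current list
	var out []Event
	for ev := range in {
		switch ev.Type {
		case ListStart:
			listing[ev.Resource] = map[string]bool{}
		case Upsert:
			if known[ev.Resource] == nil {
				known[ev.Resource] = map[string]bool{}
			}
			known[ev.Resource][ev.Name] = true
			if seen := listing[ev.Resource]; seen != nil {
				seen[ev.Name] = true
			}
		case Delete:
			delete(known[ev.Resource], ev.Name)
		case ListEnd:
			if seen := listing[ev.Resource]; seen != nil {
				var missed []string
				for name := range known[ev.Resource] {
					if !seen[name] {
						missed = append(missed, name)
					}
				}
				sort.Strings(missed) // deterministic order for the synthetic deletes
				for _, name := range missed {
					delete(known[ev.Resource], name)
					out = append(out, Event{Delete, ev.Resource, name})
				}
				delete(listing, ev.Resource)
			}
		}
		out = append(out, ev)
	}
	return out
}

// demo runs two collectors concurrently and filters their combined stream.
func demo() []Event {
	pods := []Event{
		{ListStart, "pods", ""}, {Upsert, "pods", "a"}, {Upsert, "pods", "b"}, {ListEnd, "pods", ""},
		// The watch drops here and "b" is deleted while nobody is watching.
		{ListStart, "pods", ""}, {Upsert, "pods", "a"}, {ListEnd, "pods", ""},
	}
	events := []Event{{ListStart, "events", ""}, {Upsert, "events", "e1"}, {ListEnd, "events", ""}}

	out := make(chan Event)
	var wg sync.WaitGroup
	for _, script := range [][]Event{pods, events} {
		wg.Add(1)
		go collect(script, out, &wg)
	}
	go func() { wg.Wait(); close(out) }()
	return reconstructDeletes(out)
}

func main() {
	for _, ev := range demo() {
		if ev.Type == Delete {
			fmt.Printf("%s %s/%s\n", ev.Type, ev.Resource, ev.Name)
		}
	}
}
```

Because the filter only needs the stream, it can run either inline or later over a saved file. The interleaving between resources is not deterministic, but ordering within one resource is preserved, which is all the reconstruction relies on.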
My test case wasn't collecting as many resources as run-resourcewatch, but it was collecting events and pods, which likely make up the bulk of the volume. My workstation was entirely unstressed when simply writing an unprocessed stream of JSON objects.
A concrete proposal:
- Copy (with appropriate refactoring) the `collect` and `filter` packages from k8s-object-collector, which do raw collection and de-duplication/delete reconstruction respectively.
- Add 2 new commands to run-resourcewatch:
  - `collect` does resource collection only, writing to a raw JSON file
  - `to-git` post-processes the output of `collect` into the same format that is currently produced
- Maintain the existing behaviour of run-resourcewatch when called with the same arguments, doing both synchronously for compatibility with existing jobs
- Re-use the existing `resourcewatch/storage` package to ensure the resulting git-repo output remains the same.
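As a rough sketch of how the proposed command split could fit together: the `Record` format, the `Sink` interface, and the `run` dispatcher below are all hypothetical. In particular, `Sink` only stands in for the existing `resourcewatch/storage` package and does not reflect its real API, and treating any other arguments as the legacy mode is an assumption about how compatibility might be kept.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
)

// Record is a hypothetical raw-stream entry, written one JSON object per line.
type Record struct {
	Type     string `json:"type"` // "UPSERT" or "DELETE"
	Resource string `json:"resource"`
	Name     string `json:"name"`
}

// Sink is a hypothetical stand-in for the existing storage package.
type Sink interface {
	Apply(r Record) error
}

// collectTo is the "collect" stage: it only serializes records to w.
func collectTo(w io.Writer, recs []Record) error {
	enc := json.NewEncoder(w) // Encode appends a newline after each value
	for _, r := range recs {
		if err := enc.Encode(r); err != nil {
			return err
		}
	}
	return nil
}

// toGit is the "to-git" stage: it replays a raw stream into the sink.
func toGit(r io.Reader, sink Sink) error {
	dec := json.NewDecoder(r)
	for {
		var rec Record
		if err := dec.Decode(&rec); err == io.EOF {
			return nil
		} else if err != nil {
			return err
		}
		if err := sink.Apply(rec); err != nil {
			return err
		}
	}
}

// memSink is an in-memory Sink used only for this illustration.
type memSink struct {
	state  map[string]bool
	writes int
}

func (m *memSink) Apply(r Record) error {
	key := r.Resource + "/" + r.Name
	switch r.Type {
	case "UPSERT":
		m.state[key] = true
	case "DELETE":
		delete(m.state, key)
	default:
		return fmt.Errorf("unknown record type %q", r.Type)
	}
	m.writes++
	return nil
}

// run dispatches on the first argument. Anything other than the two new
// commands falls through to the legacy behaviour: both stages back to back.
func run(args []string, recs []Record, raw io.ReadWriter, sink Sink) error {
	cmd := ""
	if len(args) > 0 {
		cmd = args[0]
	}
	switch cmd {
	case "collect":
		return collectTo(raw, recs)
	case "to-git":
		return toGit(raw, sink)
	default:
		if err := collectTo(raw, recs); err != nil {
			return err
		}
		return toGit(raw, sink)
	}
}

// demo runs "collect" and then "to-git" as two separate invocations.
func demo() (writesAfterCollect, writesAfterToGit int, state map[string]bool) {
	raw := &bytes.Buffer{}
	sink := &memSink{state: map[string]bool{}}
	recs := []Record{{"UPSERT", "pods", "a"}, {"UPSERT", "pods", "b"}, {"DELETE", "pods", "b"}}
	if err := run([]string{"collect"}, recs, raw, sink); err != nil {
		panic(err)
	}
	writesAfterCollect = sink.writes
	if err := run([]string{"to-git"}, nil, raw, sink); err != nil {
		panic(err)
	}
	return writesAfterCollect, sink.writes, sink.state
}

func main() {
	before, after, state := demo()
	fmt.Println(before, after, state)
}
```

The point of the split is that the collection stage never touches the sink, so slow storage writes cannot back up collection.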
IIUC an issue with the current implementation is the performance of writing to git. Assuming the performance of writing a stream of JSON objects remains acceptable, moving the creation of the git repo to a post-processing step should resolve this.