
Conversation

odinuge
Member

@odinuge odinuge commented Apr 10, 2025

The HUBBLE_NODE_NAME env var is useful for adding extra metadata to the "node_name" field of exported events. The Kubernetes node name alone is often not very descriptive, so additional information is useful. This can be achieved by, e.g., overriding the following Helm value, where Kubernetes replaces $(NODE_NAME) with the existing NODE_NAME env var defined in the Helm chart, which is populated from the actual node name.

tetragon.extraEnv=[{"name":"HUBBLE_NODE_NAME", "value":"$(NODE_NAME)-additional-info-here.domain.tld"}]

Prior to bcf9429 ("Watcher: fix NODE_NAME if missing") this worked as expected: exported events kept this value, and the Kubernetes pod watcher used the existing NODE_NAME. After that commit, the pod watcher started using HUBBLE_NODE_NAME instead, so it no longer received any pod events and pod attribution stopped working for exported events.
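
For illustration, a minimal Go sketch of the intended split between the two env vars (hypothetical function names, not Tetragon's actual code): the exported "node_name" may carry the user-supplied HUBBLE_NODE_NAME, while the pod watcher must filter on the real Kubernetes node name from NODE_NAME.

package main

import "os"

// exportNodeName is the name written to the "node_name" field of
// exported events; it may carry user-supplied extra metadata.
func exportNodeName() string {
	if v := os.Getenv("HUBBLE_NODE_NAME"); v != "" {
		return v // e.g. "kind-control-plane-additional-info-here.domain.tld"
	}
	return os.Getenv("NODE_NAME")
}

// watcherNodeName is the name the pod watcher filters pods on (via a
// spec.nodeName field selector); it must match the real Kubernetes
// node name, so it must not pick up HUBBLE_NODE_NAME.
func watcherNodeName() string {
	return os.Getenv("NODE_NAME")
}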

Fixes: bcf9429 ("Watcher: fix NODE_NAME if missing")


Changelog

events: fix source pod attribution when env var HUBBLE_NODE_NAME is set

@odinuge
Member Author

odinuge commented Apr 10, 2025

When installing using:

$ helm install tetragon cilium/tetragon -n kube-system '--set-json=tetragon.extraEnv=[{"name":"HUBBLE_NODE_NAME", "value":"$(NODE_NAME)-additional-info-here.domain.tld"}]'

Current main:

{
  "process_exec": {
    "process": {
      "exec_id": "a2luZC1jb250cm9sLXBsYW5lLWFkZGl0aW9uYWwtaW5mby1oZXJlLmRvbWFpbi50bGQ6MzEzNDAyODE0ODA3MjE5Ojg2OTIyOA==",
      "pid": 869228,
      "uid": 0,
      "cwd": "/",
      "binary": "/usr/bin/wget",
      "arguments": "tetragon.io",
      "flags": "execve rootcwd",
      "start_time": "2025-04-10T08:29:58.907019715Z",
      "auid": 4294967295,
      "docker": "97efb07e10aefbfacb68ee439eb1099",
      "parent_exec_id": "a2luZC1jb250cm9sLXBsYW5lLWFkZGl0aW9uYWwtaW5mby1oZXJlLmRvbWFpbi50bGQ6MDo4Njg5OTY=",
      "tid": 869228,
      "in_init_tree": false
    }
  },
  "node_name": "kind-control-plane-additional-info-here.domain.tld",
  "time": "2025-04-10T08:29:58.906997952Z"
}

With this patch:

{
  "process_exec": {
    "process": {
      "exec_id": "a2luZC1jb250cm9sLXBsYW5lLWFkZGl0aW9uYWwtaW5mby1oZXJlLmRvbWFpbi50bGQ6MzE1MzI2MDIzMTU0Mzg4Ojg5MjMxOA==",
      "pid": 892318,
      "uid": 0,
      "cwd": "/",
      "binary": "/usr/bin/wget",
      "arguments": "tetragon.io",
      "flags": "execve rootcwd",
      "start_time": "2025-04-10T09:02:02.143911508Z",
      "auid": 4294967295,
      "pod": {
        "namespace": "default",
        "name": "sh",
        "container": {
          "id": "containerd://97efb07e10aefbfacb68ee439eb109998d40639ea2a04dc4184b53b7efffecd1",
          "name": "sh",
          "image": {
            "id": "docker.io/library/alpine@sha256:a8560b36e8b8210634f77d9f7f9efd7ffa463e380b75e2e74aff4511df3ef88c",
            "name": "docker.io/library/alpine:latest"
          },
          "start_time": "2025-04-10T08:27:23Z",
          "pid": 81
        },
        "pod_labels": {
          "run": "sh"
        },
        "workload": "sh",
        "workload_kind": "Pod"
      },
      "docker": "97efb07e10aefbfacb68ee439eb1099",
      "parent_exec_id": "a2luZC1jb250cm9sLXBsYW5lLWFkZGl0aW9uYWwtaW5mby1oZXJlLmRvbWFpbi50bGQ6MDo4Njg5OTY=",
      "tid": 892318,
      "in_init_tree": false
    }
  },
  "node_name": "kind-control-plane-additional-info-here.domain.tld",
  "time": "2025-04-10T09:02:02.143938494Z"
}

Notice that the pod field is missing from the former output, and that in both cases the node_name field contains the additional information we provided.
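
As a side note, the exec_id values above are base64-encoded and embed the node name, so decoding one shows which name an event was attributed to. A quick Go check (the decoded node:ktime:pid layout is an assumption based on the IDs above):

package main

import (
	"encoding/base64"
	"fmt"
)

func main() {
	// exec_id taken from the patched output above.
	id := "a2luZC1jb250cm9sLXBsYW5lLWFkZGl0aW9uYWwtaW5mby1oZXJlLmRvbWFpbi50bGQ6MzE1MzI2MDIzMTU0Mzg4Ojg5MjMxOA=="
	b, err := base64.StdEncoding.DecodeString(id)
	if err != nil {
		panic(err)
	}
	// Expect something like:
	// kind-control-plane-additional-info-here.domain.tld:<ktime>:<pid>
	fmt.Println(string(b))
}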

@odinuge
Member Author

odinuge commented Apr 10, 2025

cc @kevsecurity since you made the initial PR. I'm not 110% sold we need the fallback for k8s node since the helm chart always adds the NODE_NAME env var, and I'm not fully sure about the intention of #2824 - but I'm keeping the fallback for now.

We could also log a line on startup with the actual name for easier debugging, since we spent a lot of time debugging why our setup broke.
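
Something as simple as the following would have saved us that time (a rough sketch using the standard library, not Tetragon's actual logger):

package main

import (
	"log"
	"os"
)

func main() {
	// Log both names at startup so a mismatch between the name used for
	// exported events and the name used by the pod watcher is visible
	// at a glance.
	log.Printf("node names: HUBBLE_NODE_NAME=%q NODE_NAME=%q",
		os.Getenv("HUBBLE_NODE_NAME"), os.Getenv("NODE_NAME"))
}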

@kkourt kkourt added the release-note/bug This PR fixes an issue in a previous release of Tetragon. label Apr 10, 2025
Copy link
Contributor

@kkourt kkourt left a comment


LGTM, thanks.

@odinuge odinuge marked this pull request as ready for review April 10, 2025 09:41
@odinuge odinuge requested a review from a team as a code owner April 10, 2025 09:41
@odinuge odinuge requested a review from tixxdz April 10, 2025 09:41
@kkourt kkourt added needs-backport/1.2 This PR needs backporting to 1.2 needs-backport/1.3 This PR needs backporting to 1.3 needs-backport/1.4 labels Apr 10, 2025
@kevsecurity
Contributor

cc @kevsecurity since you made the initial PR. I'm not 110% sold we need the fallback for k8s node since the helm chart always adds the NODE_NAME env var, and I'm not fully sure about the intention of #2824 - but I'm keeping the fallback for now.

We could also log a line on startup with the actual name for easier debugging, since we spent a lot of time debugging why our setup broke.

Happy to revert. If the original situation that prompted my change arises again, I'll document better and take this use case into account.

@kkourt
Contributor

kkourt commented Apr 10, 2025

cc @kevsecurity since you made the initial PR. I'm not 110% sold we need the fallback for k8s node since the helm chart always adds the NODE_NAME env var, and I'm not fully sure about the intention of #2824 - but I'm keeping the fallback for now.

For some context, one reason for this fallback is setups where the Tetragon agent runs on a k8s node but is deployed not via a Helm-managed daemonset, but via, for example, a systemd service.
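
In such setups NODE_NAME may be unset, so the watcher's name resolution would look roughly like this (a sketch of the fallback being discussed, not the actual implementation):

package main

import "os"

// watcherNodeName resolves the node name the pod watcher filters on:
// prefer the real NODE_NAME, fall back to HUBBLE_NODE_NAME, and as a
// last resort use the OS hostname, which outside Helm-managed
// daemonsets is often the node name.
func watcherNodeName() string {
	if v := os.Getenv("NODE_NAME"); v != "" {
		return v
	}
	if v := os.Getenv("HUBBLE_NODE_NAME"); v != "" {
		return v
	}
	host, _ := os.Hostname() // best effort; empty string on error
	return host
}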

@kkourt kkourt merged commit 29e9ebe into cilium:main Apr 10, 2025
43 of 44 checks passed
@kkourt kkourt mentioned this pull request Apr 10, 2025
@odinuge odinuge deleted the node-name branch April 10, 2025 14:49