
After enabling web authentication for Prometheus and Alertmanager, they may be incorrectly identified as being in a "down" state. #2515

@pepezzzz

Description


Bug Report

Please answer these questions before submitting your issue. Thanks!

  1. What did you do?

$ htpasswd -nBC 12 '' | tr -d ':\n'
New password: test
Re-type new password: test
$2y$12$lxnYTbGAW8m08Ss5GkCuSut7dw4kOKQ2abR.iXMr0o7LfwRD1mKNW

$ cd /tidb-deploy/prometheus-9090/bin/prometheus
$ cat webconfig.yml
basic_auth_users:
  admin: $2y$12$lxnYTbGAW8m08Ss5GkCuSut7dw4kOKQ2abR.iXMr0o7LfwRD1mKNW
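Because bcrypt hashes contain `$`, the hash has to be kept away from shell expansion when webconfig.yml is generated by a script. Single-quote it at the call site and write it with printf rather than an unquoted heredoc. The following is a minimal sketch; `write_webconfig` is a hypothetical helper, not part of TiUP:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of TiUP): write a web config file with a
# single basic-auth user. The bcrypt hash is written verbatim because
# printf's %s does not re-expand "$hash".
write_webconfig() {
  local path="$1" user="$2" hash="$3"
  printf 'basic_auth_users:\n  %s: %s\n' "$user" "$hash" > "$path"
}
```

Usage: `write_webconfig /tidb-deploy/prometheus-9090/bin/prometheus/webconfig.yml admin '$2y$12$lxnYTbGAW8m08Ss5GkCuSut7dw4kOKQ2abR.iXMr0o7LfwRD1mKNW'`. Recent promtool releases can validate the result with `promtool check web-config webconfig.yml`.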

$ tiup cluster edit-config tidb-test

monitoring_servers:
- host: 172.16.201.145
  ssh_port: 22
  port: 9090
  ng_port: 12020
  deploy_dir: /tidb-deploy/prometheus-9090
  data_dir: /tidb-data/prometheus-9090
  log_dir: /tidb-deploy/prometheus-9090/log
  external_alertmanagers: []
  arch: amd64
  os: linux
  additional_args:
  - --web.config.file=/tidb-deploy/prometheus-9090/bin/prometheus/webconfig.yml

$ tiup cluster reload tidb-test -R prometheus

a. Prometheus is incorrectly reported as being in a "Down" state.

$ tiup cluster display tidb-test
...
172.16.201.145:9090 prometheus 172.16.201.145 9090/12020 linux/x86_64 Down /tidb-data/prometheus-9090 /tidb-deploy/prometheus-9090

b. tiup cluster diag cannot collect monitoring data.

tiup cluster diag reports "Error collecting metrics from Prometheus node: failed to get metric list from 172.16.201.18:9103: operation exceeds the max retry attempts of 3.error of last attempt: [401] Unauthorized the data might be incomplete."

c. Grafana cannot access Prometheus once web authentication is enabled.
Grafana reports "Templating [instance] Error updating options: Authentication to data source failed".
If the credentials are entered manually in the Grafana data source settings, they are lost the next time the cluster is reloaded.

d. Prometheus cannot send alerts to an Alertmanager that has web authentication enabled.
If the credentials are added manually to the Prometheus YAML configuration file, they are lost the next time the cluster is reloaded.
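These errors are consistent with clients that do not send credentials. With basic_auth_users set, Prometheus answers unauthenticated requests with HTTP 401, which matches the "[401] Unauthorized" reported by tiup cluster diag. A client authenticates by sending an HTTP Basic Authorization header (RFC 7617), the base64 encoding of user:password. A minimal sketch of how that header is built; `basic_auth_header` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the value of an HTTP Basic Authorization
# header (RFC 7617) from a username and a plaintext password.
basic_auth_header() {
  local user="$1" password="$2"
  # base64 of "user:password"; tr strips any line wrapping added by base64.
  printf 'Basic %s' "$(printf '%s:%s' "$user" "$password" | base64 | tr -d '\n')"
}
```

For example, `curl -H "Authorization: $(basic_auth_header admin test)" http://172.16.201.145:9090/-/healthy` is equivalent to `curl -u admin:test ...`. Note that clients such as Grafana and Prometheus (toward Alertmanager) must be given the plaintext password, not the bcrypt hash stored in webconfig.yml.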

  2. What did you expect to see?

Prometheus status is Up.
tiup cluster diag collects monitoring data from Prometheus successfully.
Grafana can query Prometheus with web authentication enabled, and the credentials persist across reloads.
Prometheus can send alerts to Alertmanager with web authentication enabled, and the credentials persist across reloads.

  3. What did you see instead?

Prometheus status is Down. TiUP's liveness probe for Prometheus does not support web authentication.
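TiUP's actual probe code is not shown in this report. As a hedged sketch of what an auth-aware check could look like, the following assumes Prometheus's `/-/healthy` endpoint and hypothetical `PROM_USER`/`PROM_PASS` variables; `probe_state` and `probe_prometheus` are illustrative names, not TiUP functions. The idea is to send credentials when they are configured and to report a 401 as an authentication problem rather than as Down:

```shell
#!/usr/bin/env bash
# Hypothetical mapping from the HTTP status returned by /-/healthy to a
# display state.
probe_state() {
  case "$1" in
    200) echo "Up" ;;
    401) echo "Unauthorized" ;;  # reachable, but credentials missing or wrong
    *)   echo "Down" ;;          # anything else, incl. 000 when curl cannot connect
  esac
}

# Hypothetical probe: query /-/healthy, sending basic-auth credentials
# only when PROM_USER is set, then print the mapped state.
probe_prometheus() {
  local url="$1" code
  if [ -n "${PROM_USER:-}" ]; then
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 \
      -u "${PROM_USER}:${PROM_PASS:-}" "${url}/-/healthy")
  else
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "${url}/-/healthy")
  fi
  probe_state "$code"
}
```

Usage: `PROM_USER=admin PROM_PASS=test probe_prometheus http://172.16.201.145:9090`. Distinguishing 401 from an unreachable host keeps a misconfigured credential from being shown as a crashed service.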

  4. What version of TiUP are you using (tiup --version)?

tiup 1.16.1

Metadata


Assignees: No one assigned
Labels: type/bug
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
