[Feature Request]: Prometheus Metrics Instrumentation using prometheus-fastapi-instrumentator

### 🧭 Epic

**Title:** Prometheus Metrics Instrumentation
**Goal:** Every FastAPI service publishes a rich set of **Prometheus‑compatible** metrics (request count, latency, size, in‑progress, custom labels) at **/metrics** for unified observability across the platform.
**Why now:** Enables SLO dashboards, proactive alerting, and capacity planning before the traffic ramp in Q4.

---

### 🧭 Type of Feature

* [x] Observability / Monitoring

---

### 🙋‍♂️ User Story 1 — *Expose metrics endpoint*

**As a:** Site‑Reliability Engineer
**I want:** each container to expose Prometheus metrics at **/metrics**
**So that:** the platform Prometheus server can scrape, store and alert on service behaviour.

#### ✅ Acceptance Criteria

```gherkin
Scenario: Prometheus scrapes metrics
Given a running service container
When Prometheus sends GET /metrics
Then the response status is 200 OK
And the payload contains "http_requests_total" and "http_request_duration_seconds" metrics
```
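
The scenario's last step could be checked with a small helper along these lines. This is a sketch only: the function names `metric_families` and `missing_metrics` are hypothetical, and it assumes the payload is the standard Prometheus text exposition format.

```python
import re

# Metric families the acceptance scenario expects in the scrape payload.
REQUIRED_METRICS = ("http_requests_total", "http_request_duration_seconds")

# A sample line starts with the metric name, optionally followed by labels.
_SAMPLE_NAME = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)")

# Suffixes that histogram/summary samples append to the family name.
_SUFFIXES = ("_bucket", "_sum", "_count", "_created")


def metric_families(payload: str) -> set[str]:
    """Collect metric family names from a text exposition payload."""
    families: set[str] = set()
    for line in payload.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _SAMPLE_NAME.match(line)
        if not match:
            continue
        name = match.group(1)
        families.add(name)
        # Also record the family name without a histogram/summary suffix.
        for suffix in _SUFFIXES:
            if name.endswith(suffix):
                families.add(name[: -len(suffix)])
    return families


def missing_metrics(payload: str, required=REQUIRED_METRICS) -> list[str]:
    """Return the required metric names that the payload does not contain."""
    found = metric_families(payload)
    return [name for name in required if name not in found]
```

An empty list from `missing_metrics(response_text)` would mean the payload satisfies the "contains" step.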

---

### 🙋‍♂️ User Story 2 — *Standard HTTP request metrics*

**As a:** Backend Developer
**I want:** automatic instrumentation of request count, latency, and payload sizes broken down by handler, method, and status code
**So that:** I can track performance regressions without writing boilerplate.

#### ✅ Acceptance Criteria

* **Counter** `http_requests_total{handler,method,status}` increments on every request.
* **Histogram** `http_request_duration_seconds{handler,method}` uses buckets `0.05,0.1,0.3,1,3,5`.
* **Summary** `http_request_size_bytes` & `http_response_size_bytes` aggregate payload sizes per handler.
* Metrics appear within one scrape interval (≤ 15 s) after the first request.
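
To make the bucket list concrete, the sketch below shows how request durations would land in cumulative `le` buckets, which is how Prometheus histograms are exposed. The helper name `cumulative_bucket_counts` is hypothetical; it only illustrates the counting and is not part of the library.

```python
from bisect import bisect_left

# Upper bounds (seconds) from the acceptance criterion above.
BUCKETS = (0.05, 0.1, 0.3, 1.0, 3.0, 5.0)


def cumulative_bucket_counts(durations, buckets=BUCKETS):
    """Return {le_label: count}, counting cumulatively like a histogram."""
    per_bucket = [0] * (len(buckets) + 1)  # the last slot is the +Inf bucket
    for duration in durations:
        # Index of the first bucket whose upper bound is >= duration.
        per_bucket[bisect_left(buckets, duration)] += 1

    counts = {}
    running = 0
    labels = [str(bound) for bound in buckets] + ["+Inf"]
    for label, n in zip(labels, per_bucket):
        running += n  # each bucket includes everything below it
        counts[label] = running
    return counts
```

For example, durations of 0.02 s, 0.05 s, 0.2 s, 0.9 s and 7 s give a count of 2 for `le="0.05"`, 4 for `le="1.0"` and 5 for `le="+Inf"`.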

---

### 🙋‍♂️ User Story 3 — *Configurable & performant*

**As a:** Platform Engineer
**I want:** to toggle instrumentation via an env var and exclude noisy paths
**So that:** the overhead stays below **3 % CPU** and metric cardinality remains manageable.

#### ✅ Acceptance Criteria

* Setting `ENABLE_METRICS=false` disables both instrumentation and the `/metrics` route.
* Regex list `METRICS_EXCLUDED_HANDLERS` prevents instrumentation of matching paths (e.g. `.*admin.*`).
* P99 latency of a no‑op endpoint increases by **< 1 ms** with instrumentation enabled.
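
The exclusion rule might behave roughly as follows. This is a sketch under assumptions: the issue does not define the format of `METRICS_EXCLUDED_HANDLERS`, so a comma-separated list of regexes is assumed here (which would break patterns that themselves contain commas, such as `a{1,3}`), and the helper names are hypothetical.

```python
import os
import re


def excluded_patterns(environ=os.environ):
    """Compile METRICS_EXCLUDED_HANDLERS (assumed comma-separated regexes)."""
    raw = environ.get("METRICS_EXCLUDED_HANDLERS", "")
    return [re.compile(part.strip()) for part in raw.split(",") if part.strip()]


def is_excluded(path, patterns) -> bool:
    """True if any pattern matches somewhere in the request path."""
    return any(pattern.search(path) for pattern in patterns)
```

With `METRICS_EXCLUDED_HANDLERS=".*admin.*,^/health$"`, a request to `/api/admin/users` would be skipped while `/items/42` would still be instrumented.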

---

### 🗺️ High‑Level Implementation Notes

| Area / Component     | Change (what/where)                                                                                                                                                                                      |
| -------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Application code** | Create singleton `Instrumentator()` in `service/metrics.py`; configure with `should_group_status_codes=False`, `should_ignore_untemplated=True`, `should_respect_env_var=True`, `excluded_handlers=[…]`. |
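| **App lifespan**     | In `service/main.py` call `instrumentator.instrument(app)` right after the app is created (it adds middleware, which Starlette rejects once the app has started), then call `instrumentator.expose(app, include_in_schema=False, should_gzip=True)`, for example from the startup hook, when metrics are enabled. |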
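| **Environment**      | New env vars: `ENABLE_METRICS` (default `"true"`, set explicitly in the image or chart because `should_respect_env_var=True` treats an unset variable as disabled), `METRICS_EXCLUDED_HANDLERS`, `METRICS_NAMESPACE`, `METRICS_SUBSYSTEM`. |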
| **Helm chart**       | `values.yaml`: `metrics.enabled`, `metrics.port`, `metrics.serviceMonitor.enabled`, `metrics.customLabels`. Template Deployment/ServiceMonitor resources.                                                |
| **Dockerfile**       | Add `prometheus-fastapi-instrumentator*` to `pip install` layer. No extra port—metrics served on same container port (e.g. 8000).                                                                    |
| **CI / Tests**       | Add job `make metrics-test`: start container → probe `/metrics` → assert required metric names exist.                                                                                                    |
| **Docs**             | `docs/docs/manage/observability.md`: how to enable metrics locally, dashboards link, common pitfalls (high cardinality, gzip vs CPU).                                                                                |
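
A minimal sketch of what `service/metrics.py` could look like under the notes above. It is an illustration rather than the final implementation: `instrumentator_options` and `LATENCY_BUCKETS` are hypothetical names, the comma-separated format of `METRICS_EXCLUDED_HANDLERS` is an assumption, and passing `latency_lowr_buckets` assumes a library release whose `instrument()` accepts that argument.

```python
import os

try:
    from prometheus_fastapi_instrumentator import Instrumentator
except ImportError:  # package missing, e.g. in a tooling-only environment
    Instrumentator = None

# Histogram buckets (seconds) from User Story 2.
LATENCY_BUCKETS = (0.05, 0.1, 0.3, 1, 3, 5)


def instrumentator_options(environ=os.environ) -> dict:
    """Constructor keyword arguments listed in the implementation notes."""
    raw = environ.get("METRICS_EXCLUDED_HANDLERS", "")  # assumed comma-separated
    return {
        "should_group_status_codes": False,
        "should_ignore_untemplated": True,
        # The library itself reads ENABLE_METRICS when this flag is set.
        "should_respect_env_var": True,
        "excluded_handlers": [p.strip() for p in raw.split(",") if p.strip()],
    }


def setup_metrics(app, environ=os.environ):
    """Instrument `app` and expose /metrics; call right after app creation."""
    if Instrumentator is None:
        return None
    instrumentator = Instrumentator(**instrumentator_options(environ))
    instrumentator.instrument(
        app,
        metric_namespace=environ.get("METRICS_NAMESPACE", ""),
        metric_subsystem=environ.get("METRICS_SUBSYSTEM", ""),
        latency_lowr_buckets=LATENCY_BUCKETS,  # assumption: supported by the installed release
    )
    instrumentator.expose(app, include_in_schema=False, should_gzip=True)
    return instrumentator
```

In `service/main.py` this would be used as `app = FastAPI()` followed by `setup_metrics(app)` at module level, so the middleware is registered before the application starts serving.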

---

### 🛠 Required Code Changes (proposed)

| File                                | Module / Role           | Change                                                                       |
| ----------------------------------- | ----------------------- | ---------------------------------------------------------------------------- |
| `service/metrics.py`                | **NEW**                 | Instrumentator factory with helper `def setup_metrics(app): …`               |
| `service/main.py`                   | FastAPI entry‑point     | Call `setup_metrics(app)` when the app is created; add `metrics_router` if separated |
| `service/settings.py`               | Pydantic config         | Add `ENABLE_METRICS`, `METRICS_*` fields with defaults                       |
| `charts/values.yaml`                | Helm values             | New key `metrics:` block                                                     |
| `charts/templates/deployment.tpl`   | K8s Deployment template | Conditional container env + port + annotations for ServiceMonitor            |
| `tests/test_metrics.py`             | Pytest                  | Integration test to assert `/metrics` endpoint presence and sample label set |
| `docs/docs/manage/observability.md` | Docs                    | Usage guide and troubleshooting                                              |

---

### 📐 Design Sketch

```mermaid
sequenceDiagram
    participant Prometheus
    participant Service
    Prometheus->>Service: GET /metrics (scrape)
    Service-->>Prometheus: 200 OK + text/plain (metrics)
```

---

### 🔄 Alternatives Considered

| Option                                          | Pros                                 | Cons                                                            |
| ----------------------------------------------- | ------------------------------------ | --------------------------------------------------------------- |
| Hand‑written middleware on `prometheus_client`  | Full control, battle‑tested          | More boilerplate, manual handler mapping                        |
| OpenTelemetry + OTEL Collector exporter         | Vendor‑neutral, traces + metrics     | Extra infra (collector), multi‑hop latency, slightly higher CPU |
| **Chosen:** `prometheus-fastapi-instrumentator` | Minimal code, rich defaults, modular | Slight overhead, fewer power‑user knobs than raw client         |

---

### 📓 Additional Context / Checklist

* [ ] `/metrics` **must not** require auth inside the cluster; use NetworkPolicy/Ingress to limit external access.
* [ ] Enable gzip compression by default; measure CPU impact.
* [ ] Use `CUSTOM_LABELS` to add `service` & `environment` tags for multi‑cluster federation.
* [ ] Alert rules: HTTP 5xx ratio > 1 % for 5 min; P99 latency > 1 s.
* [ ] Dashboard widgets: request rate, error rate, latency histogram, in‑progress gauge.
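
The arithmetic behind the proposed 5xx alert can be sketched as follows. In practice the rule would be written as a Prometheus alerting rule; the Python below only illustrates the condition. The helper names are hypothetical, and each "window" is assumed to be the per-status increase of `http_requests_total` over one evaluation interval.

```python
def error_ratio(requests_by_status: dict) -> float:
    """Share of requests whose status code is 5xx."""
    total = sum(requests_by_status.values())
    if total == 0:
        return 0.0  # no traffic, so no error ratio to alert on
    errors = sum(
        count
        for status, count in requests_by_status.items()
        if str(status).startswith("5")
    )
    return errors / total


def breaches_5xx_alert(windows, threshold=0.01) -> bool:
    """True if every window in the 5-minute span exceeds the threshold."""
    return bool(windows) and all(error_ratio(w) > threshold for w in windows)
```

With a 15 s evaluation interval, a 5‑minute span corresponds to 20 consecutive windows; a single window at or below 1 % resets the condition.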

