Commit 25f7419

Liudmila Molkova and trask authored
DB: fix how operation and collection names are recorded for complex queries (opt-in db.query.text on metrics, new db.query.summary recommended attribute) (#1482)
Co-authored-by: Trask Stalnaker <[email protected]>
1 parent a365a5e commit 25f7419

19 files changed, +676 -225 lines changed

.chloggen/1482.yaml

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+# Use this changelog template to create an entry for release notes.
+#
+# If your change doesn't affect end users you should instead start
+# your pull request title with [chore] or use the "Skip Changelog" label.
+
+# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
+change_type: bug_fix
+component: db
+note: |
+  Fix telemetry for complex queries:
+
+  - introduce the `db.query.summary` attribute to provide a concise, low-cardinality
+    representation of the query text.
+  - use `db.query.summary` as the span name and as a recommended attribute on metrics.
+  - avoid capturing `db.operation.name` and `db.collection.name` when the query
+    involves multiple operations or collections, to prevent ambiguity.
+
+issues: [521, 805, 1159]
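
The rule described in this changelog entry can be sketched as follows. This is a minimal Python illustration and not code from the commit: `build_db_attributes` and its arguments are hypothetical, and the sketch assumes that a query parser (or an instrumentation hook) has already produced the distinct operation names, the distinct collection names, and optionally a summary.

```python
from typing import Dict, Optional, Sequence


def build_db_attributes(
    operations: Sequence[str],
    collections: Sequence[str],
    summary: Optional[str] = None,
) -> Dict[str, str]:
    """Hypothetical helper: choose which db.* attributes to record for one query."""
    attrs: Dict[str, str] = {}
    # Record the operation name only when exactly one operation describes the query.
    if len(operations) == 1:
        attrs["db.operation.name"] = operations[0]
    # Record the collection name only when the query touches exactly one collection.
    if len(collections) == 1:
        attrs["db.collection.name"] = collections[0]
    # The low-cardinality summary is recorded whenever one is available.
    if summary:
        attrs["db.query.summary"] = summary
    return attrs


# Single operation on a single collection: all three attributes are recorded.
simple = build_db_attributes(["SELECT"], ["wuser_table"], "SELECT wuser_table")

# Two operations on two collections: only the summary is recorded.
complex_query = build_db_attributes(
    ["INSERT", "SELECT"],
    ["shipping_details", "orders"],
    "INSERT shipping_details SELECT orders",
)
```

For the complex query, the summary is the only grouping key left, which is why the entry also uses it as the span name.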

docs/attributes-registry/db.md

Lines changed: 48 additions & 22 deletions
Large diffs are not rendered by default.

docs/database/cassandra.md

Lines changed: 46 additions & 16 deletions
@@ -34,30 +34,52 @@ The Semantic Conventions for [Cassandra](https://cassandra.apache.org/) extend a
 | [`db.cassandra.page_size`](/docs/attributes-registry/db.md) | int | The fetch size used for paging, i.e. how many rows will be returned at once. | `5000` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
 | [`db.cassandra.speculative_execution_count`](/docs/attributes-registry/db.md) | int | The number of times a query was speculatively executed. Not set or `0` if the query was not executed speculatively. | `0`; `2` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
 | [`db.operation.batch.size`](/docs/attributes-registry/db.md) | int | The number of queries included in a batch operation. [11] | `2`; `3`; `4` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
-| [`db.query.text`](/docs/attributes-registry/db.md) | string | The database query being executed. [12] | `SELECT * FROM wuser_table where username = ?`; `SET mykey "WuValue"` | `Recommended` [13] | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
-| [`network.peer.address`](/docs/attributes-registry/network.md) | string | Peer address of the database node where the operation was performed. [14] | `10.1.2.80`; `/tmp/my.sock` | `Recommended` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
+| [`db.query.summary`](/docs/attributes-registry/db.md) | string | Low cardinality representation of a database query text. [12] | `SELECT wuser_table`; `INSERT shipping_details SELECT orders`; `get user by id` | `Recommended` [13] | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
+| [`db.query.text`](/docs/attributes-registry/db.md) | string | The database query being executed. [14] | `SELECT * FROM wuser_table where username = ?`; `SET mykey ?` | `Recommended` [15] | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
+| [`network.peer.address`](/docs/attributes-registry/network.md) | string | Peer address of the database node where the operation was performed. [16] | `10.1.2.80`; `/tmp/my.sock` | `Recommended` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
 | [`network.peer.port`](/docs/attributes-registry/network.md) | int | Peer port number of the network connection. | `65123` | `Recommended` if and only if `network.peer.address` is set. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
-| [`server.address`](/docs/attributes-registry/server.md) | string | Name of the database host. [15] | `example.com`; `10.1.2.80`; `/tmp/my.sock` | `Recommended` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
-| [`db.query.parameter.<key>`](/docs/attributes-registry/db.md) | string | A query parameter used in `db.query.text`, with `<key>` being the parameter name, and the attribute value being a string representation of the parameter value. [16] | `someval`; `55` | `Opt-In` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
+| [`server.address`](/docs/attributes-registry/server.md) | string | Name of the database host. [17] | `example.com`; `10.1.2.80`; `/tmp/my.sock` | `Recommended` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
+| [`db.query.parameter.<key>`](/docs/attributes-registry/db.md) | string | A query parameter used in `db.query.text`, with `<key>` being the parameter name, and the attribute value being a string representation of the parameter value. [18] | `someval`; `55` | `Opt-In` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |

 **[1]:** It is RECOMMENDED to capture the value as provided by the application without attempting to do any case normalization.
-If the collection name is parsed from the query text, it SHOULD be the first collection name found in the query and it SHOULD match the value provided in the query text including any schema and database name prefix.
-For batch operations, if the individual operations are known to have the same collection name then that collection name SHOULD be used, otherwise `db.collection.name` SHOULD NOT be captured.
+
+A single database query may involve multiple collections.
+
+If the collection name is parsed from the query text, it SHOULD only be captured for queries that
+contain a single collection and it SHOULD match the value provided in
+the query text including any schema and database name prefix.
+
+For batch operations, if the individual operations are known to have the same collection name
+then that collection name SHOULD be used.
+
+If the operation or query involves multiple collections, `db.collection.name`
+SHOULD NOT be captured.
+
 This attribute has stability level RELEASE CANDIDATE.

-**[2]:** If readily available. The collection name MAY be parsed from the query text, in which case it SHOULD be the first collection name found in the query.
+**[2]:** If readily available and if a database call is performed on a single collection. The collection name MAY be parsed from the query text, in which case it SHOULD be the single collection name in the query.

 **[3]:** If a database system has multiple namespace components, they SHOULD be concatenated (potentially using database system specific conventions) from most general to most specific namespace component, and more specific namespaces SHOULD NOT be captured without the more general namespaces, to ensure that "startswith" queries for the more general namespaces will be valid.
 Semantic conventions for individual database systems SHOULD document what `db.namespace` means in the context of that system.
 It is RECOMMENDED to capture the value as provided by the application without attempting to do any case normalization.
 This attribute has stability level RELEASE CANDIDATE.

-**[4]:** It is RECOMMENDED to capture the value as provided by the application without attempting to do any case normalization.
-If the operation name is parsed from the query text, it SHOULD be the first operation name found in the query.
-For batch operations, if the individual operations are known to have the same operation name then that operation name SHOULD be used prepended by `BATCH `, otherwise `db.operation.name` SHOULD be `BATCH` or some other database system specific term if more applicable.
+**[4]:** It is RECOMMENDED to capture the value as provided by the application
+without attempting to do any case normalization.
+
+A single database query may involve multiple operations. If the operation
+name is parsed from the query text, it SHOULD only be captured for queries that
+contain a single operation or when the operation name describing the
+whole query is available by other means.
+
+For batch operations, if the individual operations are known to have the same operation name
+then that operation name SHOULD be used prepended by `BATCH `,
+otherwise `db.operation.name` SHOULD be `BATCH` or some other database
+system specific term if more applicable.
+
 This attribute has stability level RELEASE CANDIDATE.

-**[5]:** If readily available. The operation name MAY be parsed from the query text, in which case it SHOULD be the first operation name found in the query.
+**[5]:** If readily available and if there is a single operation name that describes the database call. The operation name MAY be parsed from the query text, in which case it SHOULD be the single operation name found in the query.

 **[6]:** The status code returned by the database. Usually it represents an error code, but may also represent partial success, warning, or differentiate between various types of successful outcomes.
 Semantic conventions for individual database systems SHOULD document what `db.response.status_code` means in the context of that system.
@@ -76,18 +98,25 @@ Instrumentations SHOULD document how `error.type` is populated.
 **[11]:** Operations are only considered batches when they contain two or more operations, and so `db.operation.batch.size` SHOULD never be `1`.
 This attribute has stability level RELEASE CANDIDATE.

-**[12]:** For sanitization see [Sanitization of `db.query.text`](../../docs/database/database-spans.md#sanitization-of-dbquerytext).
+**[12]:** `db.query.summary` provides static summary of the query text. It describes a class of database queries and is useful as a grouping key, especially when analyzing telemetry for database calls involving complex queries.
+Summary may be available to the instrumentation through instrumentation hooks or other means. If it is not available, instrumentations that support query parsing SHOULD generate a summary following [Generating query summary](../../docs/database/database-spans.md#generating-a-summary-of-the-quey-text) section.
+This attribute has stability level RELEASE CANDIDATE.
+
+**[13]:** if readily available or if instrumentation supports query summarization.
+
+**[14]:** For sanitization see [Sanitization of `db.query.text`](../../docs/database/database-spans.md#sanitization-of-dbquerytext).
 For batch operations, if the individual operations are known to have the same query text then that query text SHOULD be used, otherwise all of the individual query texts SHOULD be concatenated with separator `; ` or some other database system specific separator if more applicable.
 Even though parameterized query text can potentially have sensitive data, by using a parameterized query the user is giving a strong signal that any sensitive data will be passed as parameter values, and the benefit to observability of capturing the static part of the query text by default outweighs the risk.
 This attribute has stability level RELEASE CANDIDATE.

-**[13]:** SHOULD be collected by default only if there is sanitization that excludes sensitive information. See [Sanitization of `db.query.text`](../../docs/database/database-spans.md#sanitization-of-dbquerytext).
+**[15]:** Non-parameterized query text SHOULD NOT be collected by default unless there is sanitization that excludes sensitive data, e.g. by redacting all literal values present in the query text. See [Sanitization of `db.query.text`](../../docs/database/database-spans.md#sanitization-of-dbquerytext).
+Parameterized query text SHOULD be collected by default (the query parameter values themselves are opt-in, see [`db.query.parameter.<key>`](../../docs/attributes-registry/db.md)).

-**[14]:** If a database operation involved multiple network calls (for example retries), the address of the last contacted node SHOULD be used.
+**[16]:** If a database operation involved multiple network calls (for example retries), the address of the last contacted node SHOULD be used.

-**[15]:** When observed from the client side, and when communicating through an intermediary, `server.address` SHOULD represent the server address behind any intermediaries, for example proxies, if it's available.
+**[17]:** When observed from the client side, and when communicating through an intermediary, `server.address` SHOULD represent the server address behind any intermediaries, for example proxies, if it's available.

-**[16]:** Query parameters should only be captured when `db.query.text` is parameterized with placeholders.
+**[18]:** Query parameters should only be captured when `db.query.text` is parameterized with placeholders.
 If a parameter has no name and instead is referenced only by index, then `<key>` SHOULD be the 0-based index.
 This attribute has stability level RELEASE CANDIDATE.

@@ -97,6 +126,7 @@ and SHOULD be provided **at span creation time** (if provided at all):
 * [`db.collection.name`](/docs/attributes-registry/db.md)
 * [`db.namespace`](/docs/attributes-registry/db.md)
 * [`db.operation.name`](/docs/attributes-registry/db.md)
+* [`db.query.summary`](/docs/attributes-registry/db.md)
 * [`db.query.text`](/docs/attributes-registry/db.md)
 * [`server.address`](/docs/attributes-registry/server.md)
 * [`server.port`](/docs/attributes-registry/server.md)
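
The batch rules in notes [1], [4], [11] and [14] of this diff can be sketched as below. This is a minimal Python illustration under stated assumptions, not part of the commit: `batch_attributes` and the `BatchItem` tuple are hypothetical, and the sketch assumes the instrumentation already knows each statement's operation name, collection name, and (sanitized or parameterized) query text.

```python
from typing import Dict, List, Optional, Tuple, Union

# Hypothetical shape: (operation name, collection name or None, query text)
BatchItem = Tuple[str, Optional[str], str]


def batch_attributes(items: List[BatchItem]) -> Dict[str, Union[str, int]]:
    """Hypothetical helper: derive db.* attributes for a batch of statements."""
    # Note [11]: an operation is only a batch when it has two or more statements.
    if len(items) < 2:
        raise ValueError("a batch contains two or more operations")

    operations = {op for op, _, _ in items}
    collections = {coll for _, coll, _ in items}
    texts = [text for _, _, text in items]

    attrs: Dict[str, Union[str, int]] = {"db.operation.batch.size": len(items)}

    # Note [4]: a shared operation name is prefixed with "BATCH ", otherwise plain "BATCH".
    if len(operations) == 1:
        attrs["db.operation.name"] = "BATCH " + next(iter(operations))
    else:
        attrs["db.operation.name"] = "BATCH"

    # Note [1]: the collection name is recorded only when every statement shares it.
    if len(collections) == 1 and None not in collections:
        attrs["db.collection.name"] = next(iter(collections))

    # Note [14]: identical texts are recorded once, otherwise joined with "; ".
    if len(set(texts)) == 1:
        attrs["db.query.text"] = texts[0]
    else:
        attrs["db.query.text"] = "; ".join(texts)
    return attrs


uniform = batch_attributes([
    ("INSERT", "orders", "INSERT INTO orders (id) VALUES (?)"),
    ("INSERT", "orders", "INSERT INTO orders (id) VALUES (?)"),
])
mixed = batch_attributes([
    ("INSERT", "orders", "INSERT INTO orders (id) VALUES (?)"),
    ("DELETE", "carts", "DELETE FROM carts WHERE id = ?"),
])
```

The `uniform` batch keeps `db.collection.name` and gets `BATCH INSERT` as its operation name. The `mixed` batch drops the collection name and falls back to `BATCH`. A database-specific term or separator could replace `BATCH` and `; ` where one is more applicable, as the notes allow.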
