Metric queries in Loki - Fork My Brain

# [[Metric queries in Loki]]

![[Metric queries.svg]]

In [[Grafana Loki|Loki]], metric queries are [[LogQL]] queries that return numbers rather than log lines, unlike [[Log queries in Loki]]. Loki always stores logs in their raw form and does not store any pre-aggregated metrics, so all processing is done at query time. Metric queries take raw log lines and turn them into numbers using [[Aggregating monitoring data|Aggregation]] operators.

## Structure of a metric query

Metric queries still require a log stream selector, like [[Log queries in Loki]], but then wrap the selector with aggregation functions and operators.

```
aggregation_operator({labels} |= "filter" [range])
```

A metric query usually consists of:
- an aggregation operator
- a label selector
- a filter condition
- a time range

## Output of a metric query

Metric queries usually return [[Vector|vectors]]. A vector is a set of [[Time Series]] (numbers measured over time), but evaluated at *one specific timestamp*.

Metric queries can return:
- Instant vectors
- Range vectors

### Instant vectors

[[Instant vectors]] are a set of time series, each with *one value* at a specific moment in time. They are the default output of metric queries. An instant vector is like a snapshot of the values at a particular time.

Here's a sample query that returns an instant vector:

```logql
rate({job="api"} |= "GET" [5m])
```

The result returned for this query could be an instant vector of two elements:

| Labels                      | Value  |
| --------------------------- | ------ |
| `{job="api", instance="a"}` | `0.15` |
| `{job="api", instance="b"}` | `0.10` |

These two values are the `rate` of each time series at the time that the query was run.

### Range vectors

[[Range vectors|Range vectors]] are a set of time series, each with *multiple values* across a time range. The time range for a range vector is specified within the query. A range vector is like a history over the specified time interval.
Here's an example of a range vector expression:

```logql
{job="api"} |= "error" [5m]
```

The output of this would be multiple time series (one per label set), each holding the log lines from the last 5 minutes. In practice, a range vector like this is passed to a range aggregation such as `count_over_time()` rather than run as a query on its own.

### Scalars

[[LogQL queries don't return scalar values]].

## Metrics functions

### Aggregation operators

Aggregation operators take an instant vector (computed from a [[Log stream|log stream]]) as input and combine its elements into a new vector with fewer elements. LogQL can use the following functions for aggregations:
- `sum()`
- `avg()`
- `min()`
- `max()`
- `stddev()`: [[Standard Deviation]]
- `stdvar()`: [[Standard Variance]]
- `count()`
- `topk()`: largest k elements in vector
- `bottomk()`: smallest k elements in vector

### Instant vector functions

These functions take [[Range vectors in LogQL|range vectors]] as input:
- `rate()`
- `increase()`
- `irate()`
- `idelta()`
- `delta()`
- `resets()`

They always return instant vectors as output.

### Range vector aggregations

[[Range vectors|Range vector]] aggregations create metrics that describe some numerical quality of the logs *over a certain time interval*.

#### Log range aggregations

Log range aggregations are queries that return a subsection of time series data, containing all the data points within a time range.

Log range aggregations apply one of the following functions to the log query:
- `rate()`: number of entries per second
- `count_over_time()`: count of entries within time interval
- `bytes_rate()`: number of bytes per second
- `absent_over_time()`: empty if the input has elements and `1` if it doesn't (useful for alerting)

> [!question]- What's the difference between `rate()` and `count_over_time()`?
> `count_over_time()` returns how many log lines match your query within a given time range. For example, it could answer the question "How many errors occurred in each 5-minute window?"
>
> `rate()`, on the other hand, calculates the per-second average rate of log entries over the given time range.
> It answers the question "On average, how many errors happen per second over each 5-minute window?"
>
> Here's an example:
>
> `count_over_time({job="app"} |= "error" [5m])`
>
> `rate({job="app"} |= "error" [5m])`
>
> The `count_over_time` would be 37 if there were 37 log lines with errors in the last 5 minutes.
>
> The `rate` would be 37/300 ≈ `0.123` (37 errors divided by 300 seconds).

Log range aggregations then specify a duration either after the log stream selector or at the end of the log pipeline:

```
sum by (host) (rate({job="mysql"} |= "error" != "timeout" | json | duration > 10s [1m]))
```

#### Unwrapped range aggregations

Unwrapped range aggregations produce a single number for each output series, calculated from the unwrapped values over the entire time range.

Here's an example that calculates the 99th percentile latency given [[Nginx]] logs:

```
quantile_over_time(0.99, {cluster="ops-tools1",container="ingress-nginx"} | json | __error__ = "" | unwrap request_time [1m]) by (path)
```
[^docs]

The unwrapped range aggregation query result would be a single number for each `path`, for example `323`, that represents the 99th percentile. It does not include the full data points as the log range aggregation does.

These functions are used to carry out unwrapped range aggregations:
- `rate()`: per-second rate of the sum of all values in the interval
- `rate_counter()`
- `sum_over_time()`
- `avg_over_time()`
- `max_over_time()`
- `min_over_time()`
- `first_over_time()`
- `last_over_time()`
- `stdvar_over_time()`
- `stddev_over_time()`
- `quantile_over_time()`
- `absent_over_time(unwrapped-range)`

Values can be converted as they are unwrapped using these functions:
- `duration_seconds(label_identifier)` or its short equivalent `duration()`, which converts a label value in Go duration format (such as `12s30ms`) to seconds
- `bytes(label_identifier)`, which converts a label value with a byte unit (such as `5 MiB`) to raw bytes

### Other functions

LogQL also supports the following functions:
- `vector(s scalar)`: returns the scalar `s` as a vector with no labels
- `approx_topk(k, <vector expression>)`: returns an estimated `topk()` when the data exceeds 1,000 elements (the limit that `topk()` can return)
- `sort()`: returns elements sorted by sample values in ascending order
- `sort_desc()`: like `sort()` but descending

## Steps

A step is the resolution of data points over time, or how often data points are evaluated and returned within a [[Range queries in LogQL|range query]].

Steps can be set:
- in the [[Grafana]] UI
- as a URL parameter when using the [[Loki API]] (`&step=30s`)

Steps are only relevant for metric queries, because [[Log queries in Loki|log queries]] return all the log streams that fulfill the conditions in the query and don't get sampled.

Steps are not the same as intervals. Unlike a step, an interval *can* be set within a LogQL query, and it refers to the range of time that a value should be calculated for. Steps are *how often* the query should be evaluated.

For example, consider the query below:

```logql
rate({job="api"} |= "GET" [5m])
```

`5m` is the interval. Assuming that the step is 1 minute, every minute (the step), the rate of matching log lines over the preceding 5 minutes (the interval) is calculated. The number of data points returned depends on the overall time range of the query divided by the step, not on the interval.
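The relationship between step and interval can be sketched in Python. This is a simplified model, not Loki's implementation: the function name `evaluate_range_query`, the synthetic log timestamps, and the assumption that each window covers `(t - interval, t]` are all illustrative choices.

```python
from datetime import datetime, timedelta, timezone


def evaluate_range_query(log_times, start, end, step, interval):
    """Model how a range query samples a log range aggregation.

    At every step between start and end (inclusive), count the entries whose
    timestamp falls in the window (t - interval, t], then derive the
    per-second rate from that count.
    """
    points = []
    t = start
    while t <= end:
        window_start = t - interval
        # count_over_time equivalent: entries inside this window
        count = sum(1 for ts in log_times if window_start < ts <= t)
        # rate equivalent: the same count divided by the window length
        rate = count / interval.total_seconds()
        points.append((t, count, rate))
        t += step
    return points


# Hypothetical data: one matching log line every 20 seconds for 10 minutes.
origin = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
log_times = [origin + timedelta(seconds=20 * i) for i in range(1, 31)]

points = evaluate_range_query(
    log_times,
    start=origin + timedelta(minutes=5),
    end=origin + timedelta(minutes=10),
    step=timedelta(minutes=1),      # how often the query is evaluated
    interval=timedelta(minutes=5),  # the [5m] inside the query
)

for t, count, rate in points:
    print(f"{t:%H:%M} count_over_time={count} rate={rate:.3f}")
```

With a 5-minute query range and a 1-minute step, the sketch yields six data points (both endpoints included), and each one is computed over its own 5-minute window. Changing the step changes how many points come back; changing the interval changes how much history feeds each point.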
## Related

<iframe width="560" height="315" src="https://www.youtube.com/embed/tKcnQ0Q2E-k" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

[^1]

[^1]: van der Hoeven, N., Clifford, J., Tovena, C. (2025). *How to turn logs into metrics with Grafana Loki (Loki Community Call July 2025)*. Retrieved from [YouTube](https://youtube.com/live/tKcnQ0Q2E-k). [[How to turn logs into metrics with Grafana Loki - Loki Community Call July 2025|My notes]].