GOH 30 - State of the Observability Databases

# [[GOH 30 - State of the Observability Databases]]

[in developer-advocacy](obsidian://open?vault=developer-advocacy&file=projects%2FGrafana%20Office%20Hours%2FGOH%2030%20-%20Direction%20of%20Grafana%20Loki%2C%20Mimir%2C%20Tempo%2C%20and%20Pyroscope)

![[GOH 30 - State of the Observability Databases]]

<iframe width="560" height="315" src="https://www.youtube.com/embed/6Ph5nvicm6M" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

Related:: "[[Dee Kitchen]], [[Jay Clifford]], [[Grafana Mimir]], [[Grafana Loki|Loki]], [[Grafana Pyroscope]], [[Grafana Tempo]]"

## Talking points

- Architectural changes in observability databases
    - [[Cassandra]] had an ingester-distributor architecture, and that's where our databases started too. Distributors hand each piece of data to three different ingesters. This is a replication factor of 3.
    - The industry has consolidated around [[Object storage]] as THE data storage. A lot of cloud-native apps are adopting object storage.
    - The way to ingest is consolidating around [[Apache Kafka|Kafka]], but Kafka has problems:
        - difficult to run
        - difficult to build at scale
    - So people have instead rallied around the Kafka API for buffering and ingestion.
        - Mimir uses [[WarpStream]], owned by [[Confluent]]
    - This decouples the read and write paths in our databases.
        - But we're doing this in phases so that it's not a huge change all at once.
        - Like: initially put Kafka in and have ingesters read from it (reducing the replication factor), but leave the read path for the most recent data going to the ingesters as before, rather than to Kafka.
    - The databases are all adopting it for different reasons
        - Mimir: They sometimes run 500 ingesters, and the ingester-distributor model gets less reliable at scale (it's good for <50 though).
        - Loki: When logs go through a replication factor, they have to get deduplicated. Logs are a volumetric signal: unlike metrics and profiles, which are aggregates, logs and traces explode volumetrically, so the storage requirements are very large.
        - Tempo: To handle volumetric load, like Loki.
        - Pyroscope: To handle burst load and increase developer speed.
- OTel compatibility across the databases
    - Tempo: most mature. Derived from [[Jaeger]], so from day 1, Tempo was a columnar store. It's a structured signal. The value is in the structure.
    - Pyroscope: Also very structured. But OTel hasn't fully adopted profiles. Not much metadata. Once profiles are accepted, Pyroscope should be fine.
    - Mimir and Loki are more tricky.
        - Mimir
            - Prometheus is still widely adopted and is here to stay. Prometheus Remote Write is incredibly efficient. Having to migrate can be costly.
            - In metrics you want timeliness, in part for alerting.
            - We can do most things, we're pretty good.
        - Loki
            - Logs are semi-structured (not totally unstructured): they have a timestamp, an HTTP code.
            - Trend: structured logs (not created by, but definitely championed by [[ClickHouse]]).
            - [[Splunk]] is the king here. Loki is tiny. ClickHouse is tinier.
            - OTel has added pressure for more structured capability in Loki, which was originally built for unstructured/semi-structured data.
            - Plus, the rise of adding business data to logs (like associating a log line with a customer order).
            - Loki is the weakest. It doesn't do everything and the UX is lacking too.
    - OTel imposes a disadvantage on the Kafka-based storage rearchitecture
        - Prometheus Remote Write by default enables 50 concurrent streams of data, but most of the OTel collectors and agents default to a very low concurrency: 1.
        - A batch write in Kafka takes about 500ms, so a small latency is introduced into the architecture. It's mostly imperceptible, but it can add up.
        - But OTel collectors and agents can still be reconfigured to increase the stream concurrency. People just need to make sure they do that.
- Pyroscope decided to build the same type of architecture into Pyroscope itself.
    - Pyroscope is more integrated and is more modular. The benefits they get out of the data streaming storage model are "astonishing".
- Advantage of Kafka: More flexibility. If you're migrating from an old cluster to a new one, then you can just adjust your readers to point to the existing Kafka.
- Columnar storage
    - ClickHouse is a columnar store, which makes it better for analytical use cases.
    - There are two types of queries: OLTP and OLAP.
        - OLTP: T = transactional. Think typical `select` queries that are returning raw rows.
        - OLAP: A = analytical. You're doing sums, group bys, max.
    - Loki was built for the observability use case. That's a transactional query: find the row/log line where something went wrong. But the use of logs has evolved: metrics can now be derived from logs.
    - If your company uses many vendors and you don't have control over how logs are written, then Loki remains the best for that use case. You'll just take a performance hit for analytical use cases.
    - If you have full ownership of your log formats, then something like ClickHouse (a columnar store) is better. But this has a catch too. If you want to do an OLTP query out of an OLAP store, then you're effectively querying columns and restitching them back together, so it's still expensive, just in a different way.
    - Dee: You probably want a mix of both.
    - Loki team is learning from Tempo's [[Parquet]]. Parquet as a data store provides the benefits that columnar storage does. But it's really for file systems, not object storage. So Loki is experimenting with something called a Data Object that is Parquet-derived, but better for object storage.
- Adaptive Telemetry across databases: a way to prioritize the data you keep. Ideally you only keep what you need, but realistically you'll keep a bit more than what you need.
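
To make the replication point from "Architectural changes" concrete, here is a minimal sketch of a distributor writing each log entry to three ingesters and a reader deduplicating the copies afterwards. It is a toy under stated assumptions, not how Loki or Mimir implement it: the function names, the ingester names, and the "hash to a start position, then take the next three" placement are all hypothetical simplifications of a real hash ring.

```python
import hashlib

REPLICATION_FACTOR = 3  # from the notes: each write lands on three ingesters


def pick_ingesters(stream_key, ingesters, rf=REPLICATION_FACTOR):
    """Pick rf distinct ingesters by hashing the stream key to a start position.

    Hypothetical simplification: real systems use a token-based hash ring.
    """
    start = int(hashlib.sha256(stream_key.encode()).hexdigest(), 16) % len(ingesters)
    return [ingesters[(start + i) % len(ingesters)] for i in range(rf)]


def write(stores, stream_key, entry):
    """Distributor side: append the same entry to every chosen ingester."""
    for name in pick_ingesters(stream_key, sorted(stores)):
        stores[name].append((stream_key, entry))


def read_deduplicated(stores, stream_key):
    """Query side: collect entries from all ingesters, dropping replica copies."""
    seen = set()
    result = []
    for name in sorted(stores):
        for key, entry in stores[name]:
            if key == stream_key and entry not in seen:
                seen.add(entry)
                result.append(entry)
    return result


# Five in-memory "ingesters" (hypothetical names), one log entry written once.
stores = {f"ingester-{i}": [] for i in range(5)}
write(stores, 'app="checkout"', (1700000000, "order failed"))

copies = sum(len(entries) for entries in stores.values())
entries = read_deduplicated(stores, 'app="checkout"')
print(f"stored copies: {copies}, entries after dedup: {len(entries)}")
```

Every entry is stored three times and has to be collapsed back to one on read, which is the cost the notes attribute to running a volumetric signal like logs through a replication factor.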
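
The concurrency point under "OTel imposes a disadvantage" can be checked with back-of-envelope arithmetic. The sketch below assumes each stream waits for one batch to be acknowledged before sending the next, and that acknowledgement takes the roughly 500ms mentioned in the notes. The batch size is a made-up number for illustration only.

```python
def max_batches_per_second(concurrency, batch_latency_s):
    """Upper bound on batches/s when every stream waits for an ack per batch."""
    return concurrency / batch_latency_s


BATCH_LATENCY_S = 0.5      # from the notes: a batch write takes about 500ms
SAMPLES_PER_BATCH = 1000   # hypothetical batch size, for illustration only

for label, concurrency in [("concurrency 1", 1), ("concurrency 50", 50)]:
    batches = max_batches_per_second(concurrency, BATCH_LATENCY_S)
    samples = batches * SAMPLES_PER_BATCH
    print(f"{label}: {batches:.0f} batches/s, {samples:.0f} samples/s")
```

Under these assumptions a single stream tops out at 2 batches per second while 50 streams reach 100, which is why raising sender concurrency matters more once the write path has a fixed per-batch latency.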
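
The OLTP/OLAP trade-off under "Columnar storage" can be illustrated with the same three log records held in a row layout and a column layout. This is only a sketch with invented field names and values; it says nothing about how Loki, ClickHouse, or Parquet lay data out on disk.

```python
# Row layout: one record per log line (hypothetical fields and values).
rows = [
    {"ts": 1, "status": 200, "latency_ms": 12, "line": "GET /cart"},
    {"ts": 2, "status": 500, "latency_ms": 340, "line": "POST /order"},
    {"ts": 3, "status": 200, "latency_ms": 18, "line": "GET /cart"},
]

# Column layout: one list per field, built from the same records.
columns = {field: [row[field] for row in rows] for field in rows[0]}

# OLTP-style query on rows: find the log line where something went wrong.
failed_row = next(row for row in rows if row["status"] == 500)

# Same query on columns: locate the position in one column, then restitch
# the record by reading that position from every other column.
idx = columns["status"].index(500)
restitched = {field: values[idx] for field, values in columns.items()}

# OLAP-style query on columns: an aggregate only touches a single column.
max_latency = max(columns["latency_ms"])

print(failed_row == restitched, max_latency)
```

The aggregate reads one list, while the point lookup on the column layout has to touch every column to rebuild the record, matching the "expensive in a different way" remark in the notes.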
## Timestamps

00:00:00 Introduction to Dee Kitchen and our observability databases
00:04:54 Architectural changes to storage across all databases
00:16:07 OTel compatibility across the databases
00:28:18 Q - When will Mimir 3.0 and other databases' rearchitecture be released?
00:36:58 Q - Will our databases move into columnar storage?
00:51:36 Adaptive Telemetry across databases

%%
# Excalidraw Data

## Text Elements
## Embedded Files
29be1fe8fe2c5ef90e811c7d4eefbdd8856482e1: [[GOH 30 - State of Observability Databases.png]]

## Drawing
```json
{
  "type": "excalidraw",
  "version": 2,
  "source": "https://github.com/zsviczian/obsidian-excalidraw-plugin/releases/tag/2.11.1",
  "elements": [
    {
      "id": "LzJRAjUO",
      "type": "image",
      "x": 220.5794893503189,
      "y": 14.740692138671875,
      "width": 500,
      "height": 281.25,
      "angle": 0,
      "strokeColor": "transparent",
      "backgroundColor": "transparent",
      "fillStyle": "hachure",
      "strokeWidth": 1,
      "strokeStyle": "solid",
      "roughness": 1,
      "opacity": 100,
      "roundness": null,
      "seed": 88322,
      "version": 2,
      "versionNonce": 1228872535,
      "updated": 1747122028610,
      "isDeleted": false,
      "groupIds": [],
      "boundElements": [],
      "link": null,
      "locked": false,
      "fileId": "29be1fe8fe2c5ef90e811c7d4eefbdd8856482e1",
      "scale": [
        1,
        1
      ],
      "index": "a0"
    }
  ],
  "appState": {
    "theme": "dark",
    "viewBackgroundColor": "#ffffff",
    "currentItemStrokeColor": "#1e1e1e",
    "currentItemBackgroundColor": "transparent",
    "currentItemFillStyle": "solid",
    "currentItemStrokeWidth": 2,
    "currentItemStrokeStyle": "solid",
    "currentItemRoughness": 1,
    "currentItemOpacity": 100,
    "currentItemFontFamily": 5,
    "currentItemFontSize": 20,
    "currentItemTextAlign": "left",
    "currentItemStartArrowhead": null,
    "currentItemEndArrowhead": "arrow",
    "currentItemArrowType": "round",
    "scrollX": 649.4220581054688,
    "scrollY": 367.1957702636719,
    "zoom": {
      "value": 1
    },
    "currentItemRoundness": "round",
    "gridSize": 20,
    "gridStep": 5,
    "gridModeEnabled": false,
    "gridColor": {
      "Bold": "rgba(217, 217, 217, 0.5)",
      "Regular": "rgba(230, 230, 230, 0.5)"
    },
    "currentStrokeOptions": null,
    "frameRendering": {
      "enabled": true,
      "clip": true,
      "name": true,
      "outline": true
    },
    "objectsSnapModeEnabled": false,
    "activeTool": {
      "type": "selection",
      "customType": null,
      "locked": false,
      "fromSelection": false,
      "lastActiveTool": null
    }
  },
  "files": {}
}
```
%%