Prometheus - Fork My Brain

%% date:: [[2022-09-08]], [[2024-01-09]], [[2024-07-10]], [[2024-07-11]] %%

# [[Prometheus]]

[site](https://prometheus.io)

Prometheus is an [[Open source]] monitoring and alerting toolkit for capturing metrics. It's one of the best monitoring solutions on the market, in part because of its ease of use. It includes a [[Time-series database]]. It is implemented in Go. Prometheus is the de facto standard for cloud-native metric monitoring and beyond, and this includes Prometheus's exposition format. It is the second hosted project of the [[Cloud Native Computing Foundation]], after [[Kubernetes]].

## History

It was originally developed at [[SoundCloud]] and inspired by [[Google]]'s [[Borgmon]] [^prom], which was a way to capture, collect, and analyse metrics very efficiently.

Prometheus got its name from the Greek myth of the Titan Prometheus stealing fire from the gods to give it to mortals. In a similar way, Prometheus took Borgmon from Google and gave it to mortals.

%% Of course, in the story, Prometheus also got chained to a rock for eternity while scavenger birds pecked at his flesh, which some argue might be what it's like to use PromQL. [[2024-07-10 Developer Advocacy Weekly]] %%

## Functions

- scrapes data
- stores data
- allows querying, graphing, and alerting
- provides endpoints to other API consumers like [[Grafana]] [^logz]

## Features

### Tags/labels

Unlike traditional data collection systems like [[Graphite]], Prometheus does not organize metrics in a hierarchical, file-like structure. Instead, it tags time-series data with labels, allowing for richer organization and querying while optimizing performance. The lack of hierarchy is deliberate: labels and [[Key-value pairs]] are a more powerful and lightweight alternative.

### [[PromQL]]

One of Prometheus's key features is PromQL, a powerful query language for retrieving metrics from the database.
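To make the label model and PromQL-style selection a little more concrete, here is a minimal Python sketch. It is a toy in-memory model, not how Prometheus actually stores or queries data; the metric names, label values, and the `Series`, `make_series`, and `select` helpers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Series:
    """One time series: a metric name plus a set of label key-value pairs."""
    name: str
    labels: tuple  # sorted (key, value) pairs, so the identity is hashable
    value: float   # latest sample only; a real TSDB keeps timestamped samples


def make_series(name, value, **labels):
    """Build a Series from keyword-argument labels (illustrative helper)."""
    return Series(name, tuple(sorted(labels.items())), value)


def select(series_list, name, **matchers):
    """Return series with the given name whose labels equal every matcher."""
    wanted = set(matchers.items())
    return [s for s in series_list if s.name == name and wanted <= set(s.labels)]


# Hypothetical sample data: no hierarchy, only names and labels.
SERIES = [
    make_series("http_requests_total", 120.0, method="GET", status="200"),
    make_series("http_requests_total", 7.0, method="GET", status="500"),
    make_series("http_requests_total", 33.0, method="POST", status="200"),
    make_series("process_open_fds", 18.0, job="api"),
]

# Roughly what the PromQL selector http_requests_total{method="GET"} matches.
get_requests = select(SERIES, "http_requests_total", method="GET")

# Roughly what sum(http_requests_total{method="GET"}) computes.
total_get = sum(s.value for s in get_requests)
```

Because any label can be used as a matcher, the same data can be sliced by `method`, by `status`, or by both, without deciding on a fixed hierarchy up front.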
### [[Cloud-native]]

Prometheus is designed to be cloud-native and to run in containers in the cloud.

#### Works well with [[Kubernetes|k8s]]

In the analogy with Google's internal tooling, Kubernetes is Borg and Prometheus is Borgmon. Kubernetes and Prometheus were designed with each other in mind. [^richih]

#### Works well with [[k6 (tool)|k6]]

[[k6 (tool)|k6]] has [an integration with Prometheus](https://k6.io/docs/cloud/integrations/cloud-apm/prometheus-remote-write/#new-relic-setup) that involves adding options to a k6 script.

#### Works well for [[Microservices]]

Prometheus's multi-dimensional data collection and querying make it well suited to monitoring microservices and [[Event-driven architecture|Event-based architecture]]. [^prom] However, Prometheus itself is a [[Monolith]] by design.

### UI

Prometheus has a built-in UI for making queries and visualising data using rudimentary graphs.

### Exemplars

Another key feature is exemplars: individual requests that happened within a specific period and that may help to explain an issue.

### Pull- and push-based data collection

Prometheus is mostly a [[Pull-based monitoring]] system, meaning that it actively pulls data from different servers over HTTP. However, it also has [[Prometheus Remote Write]], which is a way to [[Push-based monitoring|push]] metrics.

### White-box and black-box monitoring

Prometheus can do both white-box monitoring (instrumenting code from the inside) and black-box monitoring (looking at a service from the outside). [^richih]

- White-box: How much time does this subroutine take?
- Black-box: Does the server respond to an HTTP request?

### Has a LOT of exporters

Many services already expose Prometheus metrics by default, meaning that an added Prometheus server can access the data from those services. For services that don't natively expose Prometheus metrics, an exporter can convert their data into a format that Prometheus understands and expose it for retrieval.
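As a rough illustration of what an exporter does, the sketch below takes readings from an imaginary service and renders them in Prometheus's text exposition format (`# HELP` and `# TYPE` comment lines followed by one sample per line). The `queue_stats` data, the metric names, and the `render_metric` helper are assumptions made up for this example; real exporters normally rely on an official client library rather than formatting text by hand, and they serve the result over HTTP for Prometheus to scrape.

```python
def escape_label_value(value):
    """Escape backslashes, double quotes, and newlines in a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metric(name, metric_type, help_text, samples):
    """Render one metric family as text exposition lines.

    samples is a list of (labels_dict, value) pairs (hypothetical shape).
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    for labels, value in samples:
        if labels:
            pairs = ",".join(
                f'{key}="{escape_label_value(val)}"'
                for key, val in sorted(labels.items())
            )
            lines.append(f"{name}{{{pairs}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical readings from a service with no native Prometheus endpoint.
queue_stats = {"depth": 42, "processed": 1027}

# A gauge for the current value and a counter for the running total.
exposition = (
    render_metric("queue_depth", "gauge", "Jobs currently waiting.",
                  [({"queue": "email"}, queue_stats["depth"])])
    + render_metric("queue_jobs_processed_total", "counter",
                    "Jobs processed since start.",
                    [({"queue": "email"}, queue_stats["processed"])])
)
```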
Prometheus has thousands of exporters because it is the standard. Almost anything can be monitored with Prometheus. Prometheus provides client libraries for instrumenting application code. The [Node Exporter](https://prometheus.io/docs/guides/node-exporter/) exposes hardware and kernel-related metrics of servers. There are third-party exporters and integrations for a variety of databases, storage volumes, etc.

### Designed for reliability

Prometheus does not run as an agent on each monitored machine, where data could be lost if that machine goes down. Each Prometheus server is standalone, so it retains its data and keeps collecting data from working components even if some parts of the system are broken.

### Alerting

Prometheus supports [[Alerting]] natively through [[Prometheus Alertmanager]].

### Federation

Prometheus can be federated: every Prometheus instance can itself be a source of data for other Prometheus instances.

## Prometheus metric types

1. Counter - how many times something happened
2. Gauge - current value of a metric
3. Histogram - how long or how large [^techworld]

## What it doesn't do well

### Long-term storage

Prometheus isn't great at long-term storage. For that, there are several options that work well with Prometheus:

- [[Thanos]]: easier to run, but slower
- [[Cortex]]: easy to run
- [[Grafana Mimir|Mimir]]: based on Cortex, maintained by [[Grafana Labs]]

### Visualization

Prometheus does have graphing capabilities, but they are generally considered insufficient. Prometheus is often used with Grafana.

### 100% accuracy

Prometheus's time series only have millisecond resolution, so for use cases like per-request billing, it won't be 100% accurate or detailed.
[^prom]

## Components

- [[Time-series database]] to store metric data
- Data retrieval worker to pull metric data
- HTTP Server to accept PromQL queries [^techworld]
- Alertmanager
- Prometheus Web UI, Grafana, or other visualization tools

![[Prometheus architecture.png]] [^techworld]

## [[Setting up Prometheus]]

## Prometheus compared to other tools

- [[Prometheus vs Graphite]]
- [[Prometheus vs InfluxDB]]
- [[Prometheus vs Victoria Metrics]]

## Standardization

Prometheus's popularity and ubiquity have inspired some projects to adopt and build on its format.

- [[OpenMetrics]] is a project that aims to standardize Prometheus's exposition format and make it a standard for metric collection beyond just Prometheus.
- [[Grafana Loki]] is marketed as "like Prometheus, but for logs". Loki's query language, [[LogQL]], is based on Prometheus's [[PromQL]].

## References

[^richih]: Hartmann, R. _Intro to observability with Prometheus and beyond_. Retrieved in July 2021 from https://grafana.com/go/webinar/intro-to-observability-with-prometheus/
[^logz]: Berman, D. (2020). _Prometheus vs. Graphite: Which should you choose for time series or monitoring?_. Retrieved from https://logz.io/blog/prometheus-vs-graphite/
[^techworld]: Tech World with Nana. (2020). _How Prometheus monitoring works | Prometheus architecture explained_. Retrieved from https://www.youtube.com/watch?v=h4Sl21AKiDg
[^prom]: _Overview Prometheus_. Prometheus. Retrieved from https://prometheus.io/docs/introduction/overview/. [[Overview Prometheus|My highlights]].
[^hartmann]: Hartmann, R. (2024). _Prometheus background and basics_. [[2024-07-10 Developer Advocacy Weekly|Internal meeting]].