Push-based monitoring vs. pull-based monitoring

%%
date:: [[2024-07-11]]
parent::

%%

# [[Push-based monitoring vs. pull-based monitoring]]

![[Push-based monitoring vs. pull-based monitoring.svg]]

The question in [[Monitoring]] of whether to use [[Pull-based monitoring]] or [[Push-based monitoring]] is a hotly debated one. Both approaches can collect the same metrics, but pull-based monitoring is arguably more robust for a few reasons. [^hartmann]

## Why pull-based is better

- There is less overhead on the part of the application component servers, because the agents are only responsible for converting metrics to a format and making them accessible. They don't need to be able to send data somewhere else.
- This approach is less affected by network issues.
- Letting the monitoring server choose when to pull metrics means there is a lower (and configurable) likelihood of network congestion.
- You can run more than one monitoring server to spread the load of pulling metrics.
- Pull-based monitoring also allows server uptime to be monitored. In push-based monitoring, if a monitored server does not send metrics, it could be due to network issues or many other potential problems, so silence is ambiguous. In a pull-based system, not being able to retrieve metrics is a more reliable way to determine whether the component is not responding. This means that the data is "cleaner" by default, in that you can't get data from a server that no longer exists.
- The ability to do [[Service Discovery]] means that pull-based is more resilient, because it crawls the architecture and finds targets, reacting quickly to targets that are missing or named incorrectly.
- Pull-based tends to be more secure and [[Performance|performant]] by design, because it involves less traffic being sent to the central data store, unlike in push-based, where multiple targets send traffic to a single very important endpoint.
- Pull-based helps prevent the [[Thundering herd problem]] by design, because the central data store can control how much data it scrapes and stores.
- Pull-based also tends to be easier to set up initially, especially if open standards are already in place to expose the data to be scraped.
- Push-based tends to use [[Delta temporality]]: it only sends incremental changes in metrics. By comparison, pull-based tends to deal in absolute (cumulative) values, incrementing as needed. This means that a lost push request can lose its increment for good, whereas a missed pull request only leaves a gap that the next successful scrape can correct, because the running total is still there.

## Why push-based is better

However, there are still some situations where push-based monitoring is better.

- Push-based is more useful for [[Serverless computing]] or where there are very ephemeral targets that would be difficult to scrape.
- If the data can be guaranteed to be clean (valid, not stale, and in the right format), push-based tends to be easier to use. [^hartmann]

## Can you combine push-based and pull-based monitoring?

Yes, you can. But you probably shouldn't. [^hartmann]

Data acquired from push-based and pull-based monitoring differ not just in how they were collected; they also tend to differ in their integrity. Pull-based methods by their very nature have a built-in data integrity check, in that no data can be scraped from a target that no longer exists or is invalid. This safety is not present in push-based methods. Mixing push-based and pull-based monitoring in the same system may lead to stale data being mixed in with the clean data, causing issues later.
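
The integrity difference described above can be illustrated with a small simulation. This is a minimal sketch, not how any particular monitoring tool is implemented: `Target`, `PullMonitor`, `PushStore`, and `simulate_target_loss` are hypothetical names invented for this note, and the `up` flag is only loosely modelled on the per-target health signal that pull-based systems such as Prometheus record.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


class Target:
    """Hypothetical monitored component exposing a single metric."""

    def __init__(self, value: float) -> None:
        self.value = value
        self.alive = True

    def read(self) -> float:
        # A dead target cannot be read, which is what a failed scrape looks like.
        if not self.alive:
            raise ConnectionError("target unreachable")
        return self.value


@dataclass
class PullMonitor:
    """Scrapes each target and records whether the scrape succeeded."""

    targets: Dict[str, Callable[[], float]]
    values: Dict[str, Optional[float]] = field(default_factory=dict)
    up: Dict[str, int] = field(default_factory=dict)

    def scrape_all(self) -> None:
        for name, read_metric in self.targets.items():
            try:
                self.values[name] = read_metric()
                self.up[name] = 1
            except ConnectionError:
                # The failed scrape is itself a signal: mark the target down
                # and do not carry the old value forward.
                self.values[name] = None
                self.up[name] = 0


@dataclass
class PushStore:
    """Keeps whatever was last pushed; silence from a target changes nothing."""

    values: Dict[str, float] = field(default_factory=dict)

    def receive(self, name: str, value: float) -> None:
        self.values[name] = value


def simulate_target_loss():
    """Collect one sample both ways, then let the target disappear."""
    target = Target(42.0)
    pull = PullMonitor(targets={"web-1": target.read})
    push = PushStore()

    pull.scrape_all()                     # pull: monitor asks the target
    push.receive("web-1", target.read())  # push: target sends its value

    target.alive = False                  # the target goes away
    pull.scrape_all()                     # the pull side notices immediately
    # The push side receives nothing, so its last value silently goes stale.
    return pull, push


if __name__ == "__main__":
    pull, push = simulate_target_loss()
    print("pull up:", pull.up, "pull values:", pull.values)
    print("push values:", push.values)
```

After the target disappears, the pull side holds an explicit "down" marker and no value, while the push side still holds the last value it received with nothing to indicate that it is stale. A real push-based system can mitigate this with timestamps or expiry, which this sketch deliberately leaves out.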
[^hartmann]: Hartmann, R. (2024). *Prometheus background and basics*. [[2024-07-10 Developer Advocacy Weekly|Internal meeting]].