Cost of downtime - Fork My Brain

%% Last Updated: - [[2021-02-11]] %%

The cost of an outage for an application can be a useful tool for getting management buy-in for forms of [[Operational testing]] that could have prevented the outage. [[Ana Medina]] talks not only about measurable costs, but also about those that can't easily be represented with a single number.

## Quantifiable

### Revenue loss

A significant outage makes an application inaccessible to would-be customers. Depending on the industry, this could be catastrophic. [[Tabcorp]] makes a sizable chunk of its profit from the [[Melbourne Cup]] race, in particular in the few seconds leading up to the race. For most companies, revenue loss is going to be the easiest cost to measure and the most compelling reason to improve testing.

### Employee productivity

Production incidents take priority over all other work, and responding to them is very reactive work for engineers. The work that is put on hold, as well as the decrease in efficiency due to stress from the outage, is a cost that is not often talked about.

### Customer chargebacks (SLA breaches)

Being in breach of uptime or response time SLAs means that companies can expect a certain percentage of clients to request refunds.

## Unquantifiable

### Brand defamation

[[Public Performance Fails]] can considerably damage the reputation of the company. Social media and the [[Immediacy of news]] exacerbate this effect.

### Employee attrition

Loss of faith can occur from within the company as well, with employees becoming disillusioned or embarrassed by an outage, especially if they feel the company as a whole did not react well to it.

## References

- [[In the kitchen - a sprinkle of fire and chaos]]
- [[Workload Modeling - Preparing for Large Events Like the Melbourne Cup]]
- [[Public Performance Fails]]
- [[FOMO and Performance Testing - Why Robinhood went down]]
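
## Sketch: estimating the quantifiable costs

The three quantifiable costs described in this note (revenue loss, employee productivity, and SLA chargebacks) can be added up into a rough figure to bring to management. The following is only a minimal sketch under stated assumptions: the class name, the field names, and every number in it are hypothetical and do not come from any of the referenced talks. It also counts only engineer time spent during the outage itself, so it likely understates the productivity cost of postponed work and stress.

```python
from dataclasses import dataclass


@dataclass
class OutageEstimate:
    """Hypothetical inputs for a rough downtime cost estimate."""

    duration_hours: float        # how long the application was unavailable
    revenue_per_hour: float      # assumed average revenue during that window
    engineers_involved: int      # people pulled off other work to respond
    loaded_hourly_rate: float    # assumed cost of one engineer hour
    sla_credit_total: float = 0  # assumed refunds or credits owed to clients

    def revenue_loss(self) -> float:
        # Revenue that would-be customers could not spend during the outage.
        return self.duration_hours * self.revenue_per_hour

    def productivity_loss(self) -> float:
        # Engineer time diverted to the incident, for its duration only.
        return (
            self.duration_hours
            * self.engineers_involved
            * self.loaded_hourly_rate
        )

    def total(self) -> float:
        # Sum of the three quantifiable cost categories.
        return (
            self.revenue_loss()
            + self.productivity_loss()
            + self.sla_credit_total
        )


# Illustrative values only, not real data.
estimate = OutageEstimate(
    duration_hours=2,
    revenue_per_hour=10_000,
    engineers_involved=5,
    loaded_hourly_rate=100,
    sla_credit_total=3_000,
)
print(f"Revenue loss:      {estimate.revenue_loss():,.0f}")
print(f"Productivity loss: {estimate.productivity_loss():,.0f}")
print(f"Estimated total:   {estimate.total():,.0f}")
```

For an event-driven business, `revenue_per_hour` would need to reflect the peak window rather than a daily average, since the same outage costs far more in the moments before a major race than on a quiet day. The unquantifiable costs are deliberately left out of the sum.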