%%
Related: [[Load Testing]] [[Go is built with performance in mind]]
## The Eureka moment
[[Kuya]] sent me a photo of [[Kyan]] and [[Nico]] using markers to re-ink dice with faces that were hard to read. I joked that the reason he had kids was to create little workers. `#WorkerPlacement101`.

I happened to be going through my notes on [[The Scheduler Saga]] at the time, and I was already thinking about how [[Go Scheduler]] improves work efficiency. It clicked: [[Worker Placement]] Strategy in board games has a lot of overlap with software performance optimization.
date:: [[2022-09-09]], [[2023-02-25]], [[2023-03-10]], [[2023-04-25]], [[2023-12-04]]
%%
# [[Principles of improving work performance]]
There are some general concepts that can improve performance, whether it's about [[Performance|Application Performance]] in [[Tech|Computer science]] or the [[Productivity]] of individuals and teams. In fact, the same optimization concepts can be applied to [[Worker Placement]] board games as well.
In general, there are three things we can do to improve performance:
- optimize the processing time
- control the flow of input or users
- improve the experience of waiting [^disney]
## Increase the number of workers
- [[Concurrency]]
- [[Multitasking]]
### [[Multithreading]]
In computer hardware, a processor that allows multiple OS or [[Kernel thread]]s to work on instructions at the same time often performs faster than one that is single-threaded, even if those threads have to share resources.
Multithreading works on the assumption that most instructions do not require all of a processor's computing power, so increasing concurrency while decreasing a thread's individual access to resources increases performance rather than decreasing it.
In board games, the first thing you do in a worker placement game is to use your workers to buy more workers. Doing it as early as possible is the most profitable, because it maximizes the number of rounds in which those extra workers are working for you.
In load testing, increasing the number of virtual users running your script will usually increase the load on your server, but only if the new users run concurrently with the existing ones.
### [[Parallelism]]
In computer hardware, this means increasing the number of cores in a machine. The more cores, the more processing power, and the more kernel threads can be used concurrently to do work.
In load testing, there comes a point where you can no longer increase the number of virtual users on a load generator without maxing out the machine's resources and effectively creating a performance bottleneck. You need to increase the number of load generators you are executing the script on to proceed further.
In [[Go]] apps, [[Goroutines]] operate concurrently on the order of hundreds of thousands per kernel thread.
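As a rough sketch of how lightweight that is (the count of 100,000 and the throwaway work are made up for illustration):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

func main() {
	var wg sync.WaitGroup
	var completed atomic.Int64

	// Launch 100,000 goroutines; each starts with only a few kilobytes of
	// stack, so this is cheap compared to spawning 100,000 kernel threads.
	for i := 0; i < 100_000; i++ {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			_ = n * n // stand-in for real work
			completed.Add(1)
		}(i)
	}

	wg.Wait()
	fmt.Println("goroutines completed:", completed.Load())
}
```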
### Problems with multitasking
Tasks must first be split up into parts that can be solved simultaneously. [^parallelizing] When a task splits up easily, it is said to be [[Embarrassingly parallel]]. Even then, the act of parallelizing adds [[Organizational overhead]].
Some tasks are non-parallelizable: either they cannot be split up into independent parts, or the parts are so interdependent that they cannot be worked on separately.
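Here's a minimal sketch of that splitting step, using a made-up sum over a slice: the chunking before the loop and the merge after it are exactly the organizational overhead mentioned above.

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	data := make([]int, 1_000_000)
	for i := range data {
		data[i] = i
	}

	const chunks = 8 // arbitrary; tuning this is part of the overhead
	chunkSize := (len(data) + chunks - 1) / chunks
	partial := make([]int, chunks)

	var wg sync.WaitGroup
	for c := 0; c < chunks; c++ {
		start := c * chunkSize
		end := start + chunkSize
		if end > len(data) {
			end = len(data)
		}
		wg.Add(1)
		go func(c, start, end int) {
			defer wg.Done()
			for _, v := range data[start:end] {
				partial[c] += v // each goroutine writes only its own slot
			}
		}(c, start, end)
	}
	wg.Wait()

	// Merging the partial results is more overhead that a
	// non-parallel version wouldn't need.
	total := 0
	for _, p := range partial {
		total += p
	}
	fmt.Println("sum:", total)
}
```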
[[User concurrency is an ambiguous measure of throughput]]: work may be getting done quickly, but what is the quality of that work?
[[Diminishing marginal returns]] applies to increasing concurrency. At first, multithreading or parallelism brings significant performance gains. But just like too many chefs in [[Overcooked]] turn the kitchen from a well-orchestrated system into sheer chaos, there comes a point where adding more workers may actually hurt your bottom line.
## Make existing workers more efficient
Instead of increasing the number of workers, what about training those workers to do what they're already doing better? Upskilling a talented worker can be more productive than hiring a new one.
### [[Specialization yields comparative advantages]]
Train a worker to do one thing extremely well, even if that means they are less efficient at doing another thing.
The principle of [[Comparative advantage]] hinges on capitalizing on the fact that some workers might have a lower [[Marginal cost]] to train in one skill than other workers.
Breaking a long process out into smaller, repeatable tasks is also what the industrial [[Production line]] is based on.
#### [[Cache|Caching]] and indexing
#### Precomputation
Precomputation refers to executing computationally expensive and frequently used processes upfront instead of executing them only when needed. For example, [[Recording rules]] in [[Prometheus]] precompute common queries once that might otherwise be computed multiple times.
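Outside of Prometheus, the same shape looks something like this sketch (the lookup table and the cost function are invented for illustration): compute the expensive thing once up front, then answer every later request with a cheap lookup.

```go
package main

import "fmt"

// expensive is a stand-in for a computation too slow to run on every request.
func expensive(n int) int {
	result := 0
	for i := 0; i <= n; i++ {
		result += i * i
	}
	return result
}

// precomputed is filled once, up front, like a recording rule evaluating
// a query on a schedule instead of at query time.
var precomputed [1000]int

func init() {
	for n := range precomputed {
		precomputed[n] = expensive(n)
	}
}

func main() {
	// At request time this is just an array lookup.
	fmt.Println(precomputed[42])
}
```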
### Distribution of work
#### [[Respect Levels of Abstraction|Abstraction]]
Appoint department heads rather than talking to every individual employee. Create levels of abstraction and hub-and-spoke patterns in architecture. This cuts both ways: lower-level languages run faster, but higher-level languages are easier for humans to write code in.
- interfaces like [[API]]s
#### Outsourcing
Outsourcing work can improve efficiency by reducing scope. In a way, it is related to [[Parallelism]]; however, outsourcing is more about decreasing the work you're responsible for.
In terms of people, outsourcing could refer to hiring specialists to take on parts of a job and realizing efficiency gains either from the reduced cost of hiring the specialists or from the increased quality from their specialist experience.
In computing, outsourcing looks like shifting processing away from the application server and towards [[Fat Clients]] that do much of the processing on the client side; that is, on the user's side.
#### [[Sharding]]
#### [[Distributed runqueue]]
Specialization also leads naturally to workers having distributed runqueues. Instead of all workers taking a mix of work from the same place, it's more efficient for each worker to maintain its own runqueue, which it populates with work from the main runqueue.
You might even end up with groups of workers sharing a runqueue, which is the case in some customer support teams, where tiered support levels emerge. Experienced workers handle the longer, more technical conversations, while newer workers make sure that other conversations still get answered, preventing the system from coming to a standstill.
[[Goroutines]] have their own runqueues for this reason.
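A toy sketch of the idea, with made-up task numbers and batch sizes: each worker keeps a small private queue that it refills from the shared one, then works through its batch without touching the shared queue.

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	global := make(chan int, 100) // the shared "main runqueue"
	for task := 1; task <= 20; task++ {
		global <- task
	}
	close(global)

	const workers = 4
	var wg sync.WaitGroup

	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			local := make([]int, 0, 5) // this worker's private runqueue
			for {
				// Refill the private queue from the global one, then work
				// through the batch without touching the shared queue.
				for len(local) < cap(local) {
					task, ok := <-global
					if !ok {
						break
					}
					local = append(local, task)
				}
				if len(local) == 0 {
					return
				}
				for _, task := range local {
					fmt.Printf("worker %d ran task %d\n", id, task)
				}
				local = local[:0]
			}
		}(w)
	}
	wg.Wait()
}
```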
In [[Load Testing]], this type of distributed queue is best seen in [[Test Data]]. [[Data partitioning across load generators]] works by having separate queues of data, with each one being available only to a certain virtual user. This solves the problem of contention for read/write access and allows virtual users to function independently as long as there is enough data.
### [[Thread reuse]]
If the work is seasonal or irregular, you might be tempted to hire a temporary worker and then let them go when the work is done. This can be a viable option, but only if you account for the financial (and emotional) costs of hiring and firing, which may be higher than you think.
Alternatively, what if you just give different types of work to the same worker? This is on the other end of the spectrum from specializing, but it can also work really well-- especially if you're a polymath like me and thrive on interdisciplinary work.
At [[k6 (Company)]], we have an "interning" process where some of us can volunteer to spend a fixed amount of time in another department, learning how to do another job. It can be anywhere from two weeks to one year, and those who have participated in it are always refreshed by the change of pace.
Part of the beauty of [[Go]] is that kernel threads are reused: threads are assigned a goroutine, and when that goroutine's instructions have been executed and there is no remaining work, the thread is "parked". It's not destroyed-- because that would be quite resource intensive. It's just allowed to idle, waiting for future work.
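A rough sketch of the same reuse pattern at the application level (the pool size and job names are made up): a fixed pool of workers idles on a channel between jobs rather than being spawned and torn down per job.

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	jobs := make(chan string)
	var wg sync.WaitGroup

	// Hire a fixed pool of workers once, instead of spawning (and tearing
	// down) a new worker for every job.
	const poolSize = 3
	for w := 0; w < poolSize; w++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			// Each worker idles on the channel between jobs -- the
			// equivalent of a parked thread waiting for future work.
			for job := range jobs {
				fmt.Printf("worker %d handled %s\n", id, job)
			}
		}(w)
	}

	for i := 1; i <= 10; i++ {
		jobs <- fmt.Sprintf("job-%d", i)
	}
	close(jobs)
	wg.Wait()
}
```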
In load testing, load generators on the cloud are often provisioned at the start of a test. That approach quickly becomes time-consuming when it takes 2-4 minutes to restart virtual machines just because you ran the wrong script. At [[k6 (Company)]], we intelligently reuse load generators: if you start a test in a region where we already have a load generator running, we just reuse it. Since the machine is wiped of data between tests, there are no security issues, and it translates into reduced startup time for testers using our system.
[[Amortizing performance cost]]
### [[Timeboxing]]
Timeboxing is a concept in productivity that has repeatedly proven to be effective. Timeboxing is the concept behind the [[Agile]] sprint. Work is agreed upon and defined in small chunks, and teams are given a fixed amount of time to complete it. Timeboxing allows work to progress while also setting a duration to it, which limits the time and effort that can be spent on work that is misplanned or takes too long.
In [[Go]], long-running (>10ms) [[Goroutines]] are "preempted" by a `sysmon` thread, or system monitor. Preemption means that the offending goroutine is interrupted so that other work can continue.
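This isn't the scheduler's preemption mechanism itself, but a minimal application-level analogue of timeboxing using `context.WithTimeout` (the durations are made up):

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// doWork simulates a job that takes longer than we're willing to wait.
func doWork(ctx context.Context) error {
	select {
	case <-time.After(500 * time.Millisecond): // pretend the work takes 500ms
		return nil
	case <-ctx.Done():
		return ctx.Err() // the timebox expired first
	}
}

func main() {
	// Timebox the work to 100ms; if it overruns, we move on.
	ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
	defer cancel()

	if err := doWork(ctx); err != nil {
		fmt.Println("gave up:", err)
		return
	}
	fmt.Println("finished in time")
}
```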
### [[Thread priority]]
Thread priority involves the prioritization of work (more important work gets done first). [[Linux]] has this by default, and so does [[Java]].
Also see [[Dimming]].
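A minimal sketch of priority ordering, with an invented task list: more important work gets pulled to the front before anything runs.

```go
package main

import (
	"fmt"
	"sort"
)

type task struct {
	name     string
	priority int // higher means more important
}

func main() {
	queue := []task{
		{"rotate logs", 1},
		{"serve customer request", 9},
		{"rebuild search index", 5},
	}

	// More important work gets done first.
	sort.Slice(queue, func(i, j int) bool {
		return queue[i].priority > queue[j].priority
	})

	for _, t := range queue {
		fmt.Println("running:", t.name)
	}
}
```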
### [[Dynamic scaling]]
Virtual machines on the cloud can be used to increase or decrease the size and power of the application servers in response to changes in demand.
### Pull-based vs push-based
- [[Pull-based monitoring]] vs [[Push-based monitoring]] (a sketch contrasting the two follows this list)
- It's more effective and efficient in the long run to let people have a say in what type of work they do, instead of blindly assigning them work. Taking into account their individual proclivities and strengths will ensure everyone is more productive.
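Here's a toy sketch of the pull-based side (the worker speeds are invented): each worker takes the next task only when it's ready, so the faster worker naturally ends up doing more, whereas a push-based dispatcher assigning tasks round robin would keep piling work onto the slow one.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	work := make(chan int)
	var wg sync.WaitGroup

	// Pull-based: each worker takes the next task only when it's ready,
	// so load naturally matches capacity.
	speeds := map[int]time.Duration{0: 10 * time.Millisecond, 1: 50 * time.Millisecond}
	for id, speed := range speeds {
		wg.Add(1)
		go func(id int, speed time.Duration) {
			defer wg.Done()
			for task := range work {
				time.Sleep(speed) // simulate faster and slower workers
				fmt.Printf("worker %d pulled task %d\n", id, task)
			}
		}(id, speed)
	}

	for task := 1; task <= 10; task++ {
		work <- task
	}
	close(work)
	wg.Wait()
}
```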
## Reorganize workers
### Introduce a manager that can run interference
#### [[An external monitor that oversees the flow of work improves general performance]]
- Example: [[Message Broker]]
#### [[Work-stealing]]
#### [[Work hand-off]]
An external monitor (the [[Go Scheduler]], a [[Load Balancer]], or a manager) oversees the work and makes sure it is assigned efficiently.
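A much-simplified sketch of work-stealing (the real [[Go Scheduler]] does this per processor and without a single global lock; this toy uses one mutex and made-up tasks): an idle worker takes work from a busier worker's queue instead of sitting still.

```go
package main

import (
	"fmt"
	"sync"
)

// queues holds each worker's local queue, guarded by one lock for simplicity.
type queues struct {
	mu    sync.Mutex
	local [][]string
}

// next returns work for worker id, stealing from another worker's
// queue when its own queue is empty.
func (q *queues) next(id int) (string, bool) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if len(q.local[id]) > 0 {
		task := q.local[id][0]
		q.local[id] = q.local[id][1:]
		return task, true
	}
	for other := range q.local {
		if other != id && len(q.local[other]) > 0 {
			// Steal from the tail of the other worker's queue.
			last := len(q.local[other]) - 1
			task := q.local[other][last]
			q.local[other] = q.local[other][:last]
			return task, true
		}
	}
	return "", false
}

func main() {
	q := &queues{local: [][]string{
		{"a", "b", "c", "d", "e", "f"}, // worker 0 starts overloaded
		{},                             // worker 1 starts idle
	}}

	var wg sync.WaitGroup
	for id := 0; id < 2; id++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for {
				task, ok := q.next(id)
				if !ok {
					return
				}
				fmt.Printf("worker %d did task %s\n", id, task)
			}
		}(id)
	}
	wg.Wait()
}
```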
### Remove organizational layers
The reverse can also be helpful: removing managerial levels or simplifying organizational structures reduces overall cognitive burden.
- Moving from [[Microservices]] back to [[Monolith|Monoliths]]
### Problems
How do we make sure the monitor knows what's happening (for example, load generators phoning home to central control in a timely way)?
How should work be assigned?
- round robin, but this doesn't take context into account (see the sketch after this list)
- assign work according to ability (support tiers, location of load generator), but this doesn't take into account urgency
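For example, a round-robin assigner is only a few lines (the worker and task names are made up), and the sketch makes the blind spot obvious: it knows nothing about ability or urgency.

```go
package main

import "fmt"

func main() {
	workers := []string{"alice", "bob", "carol"}
	tasks := []string{"ticket-1", "ticket-2", "ticket-3", "ticket-4", "ticket-5"}

	// Round robin: hand out tasks in strict rotation. Fair by count,
	// but blind to each worker's ability and to how urgent a task is.
	for i, task := range tasks {
		assignee := workers[i%len(workers)]
		fmt.Printf("%s -> %s\n", task, assignee)
	}
}
```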
## Measure what's important
You can improve performance by visibly and publicly [[Measurement|tracking]] the outputs you want to encourage. If response time is what matters, graph that continuously. If conversions are what's important, create a dashboard to track that.
In personal productivity, you can also track the outputs you want to encourage, such as content produced. [[Learning in public]] can be thought of as an [[Observability]] practice for productivity.
## Slow down work
Ironically, you can gain increased productivity by purposely slowing down work.
### [[Circuit breaker pattern]]
The circuit breaker pattern institutes a "failsafe" switch that trips and shuts down a component when the work it receives significantly exceeds what it can handle. The circuit breaker performs a dual role: it reduces the scope of an outage to that component, and it gives the component time to recover.
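A toy version of the pattern in Go (the thresholds and cooldown are made up, and a production breaker would also have an explicit half-open state):

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// breaker is a toy circuit breaker: after maxFailures consecutive failures
// it "opens" and rejects calls outright, giving the downstream component
// cooldown time to recover before any more work reaches it.
type breaker struct {
	failures    int
	maxFailures int
	openedAt    time.Time
	cooldown    time.Duration
}

func (b *breaker) call(fn func() error) error {
	if b.failures >= b.maxFailures && time.Since(b.openedAt) < b.cooldown {
		return errors.New("circuit open: not even trying")
	}
	if err := fn(); err != nil {
		b.failures++
		if b.failures == b.maxFailures {
			b.openedAt = time.Now()
		}
		return err
	}
	b.failures = 0 // a success closes the circuit again
	return nil
}

func main() {
	b := &breaker{maxFailures: 3, cooldown: 5 * time.Second}
	flaky := func() error { return errors.New("downstream timed out") }

	for i := 0; i < 5; i++ {
		fmt.Println(b.call(flaky))
	}
}
```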
### [[Ways to fail gracefully|Failing gracefully]]
When performance degrades enough to be noticeable, one strategy is to accept the degradation and give feedback, such as human-readable error messages that nicely ask the user to try again later, or mechanisms that reduce the amount of load the struggling component receives. Planning for the eventuality of failure lets you treat it as an opportunity to improve performance in the future.
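As a sketch of what that feedback might look like in a web server (the port, the in-flight limit, and the wording are all made up): when the server is at capacity, it answers immediately with a polite 503 and a `Retry-After` hint instead of timing out.

```go
package main

import (
	"log"
	"net/http"
)

func main() {
	// inFlight is a semaphore capping how many requests we try to serve
	// at once; the limit of 100 is arbitrary for illustration.
	inFlight := make(chan struct{}, 100)

	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		select {
		case inFlight <- struct{}{}:
			defer func() { <-inFlight }()
			w.Write([]byte("here's your report\n"))
		default:
			// Degrade politely instead of timing out or crashing.
			w.Header().Set("Retry-After", "30")
			http.Error(w, "We're a bit busy right now. Please try again in a moment.", http.StatusServiceUnavailable)
		}
	})

	log.Fatal(http.ListenAndServe(":8080", nil))
}
```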
## Other examples
- [[An external monitor that oversees the flow of work improves general performance]]
- A traffic cop
- A load balancer
- Humans: a manager and employees
- Goroutines and schedulers
[^parallelizing]: [[Ramping Up Production]], brilliant.org
[^disney]: Kinni, T. & The Disney Institute (2011). *Be our guest: Perfecting the art of customer service.*