# Algorithms to Live By

Author:: Brian Christian, Tom Griffiths
## Highlights
> It is the provably optimal solution. We know this because finding an apartment belongs to a class of mathematical problems known as “optimal stopping” problems. The 37% rule defines a simple series of steps—what computer scientists call an “algorithm”—for solving these problems. ([Location 45](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=45))
> At the next level, computer science gives us a vocabulary for understanding the deeper principles at play in each of these domains. As Carl Sagan put it, “Science is a way of thinking much more than it is a body of knowledge.” ([Location 85](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=85))
> As computers become better tuned to real-world problems, they provide not only algorithms that people can borrow for their own lives, but a better standard against which to compare human cognition itself. ([Location 104](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=104))
### 1 Optimal Stopping
> In any optimal stopping problem, the crucial dilemma is not which option to pick, but how many options to even consider. ([Location 162](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=162))
> In your search for a secretary, there are two ways you can fail: stopping early and stopping late. When you stop too early, you leave the best applicant undiscovered. When you stop too late, you hold out for a better applicant who doesn’t exist. ([Location 199](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=199))
> the optimal solution takes the form of what we’ll call the Look-Then-Leap Rule: You set a predetermined amount of time for “looking”—that is, exploring your options, gathering data—in which you categorically don’t choose anyone, no matter how impressive. After that point, you enter the “leap” phase, prepared to instantly commit to anyone who outshines the best applicant you saw in the look phase. ([Location 214](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=214))
> As the applicant pool grows, the exact place to draw the line between looking and leaping settles to 37% of the pool, yielding the 37% Rule: look at the first 37% of the applicants,* choosing none, then be ready to leap for anyone better than all those you’ve seen so far. ([Location 231](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=231))
> If you’re stopping optimally, your chance of finding the single best applicant in a pool of a hundred is 37%. And in a pool of a million, believe it or not, your chance is still 37%. Thus the bigger the applicant pool gets, the more valuable knowing the optimal algorithm becomes. It’s true that you’re unlikely to find the needle the majority of the time, but optimal stopping is your best defense against the haystack, no matter how large. ([Location 244](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=244))
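The Look-Then-Leap Rule is easy to check by simulation. A minimal Python sketch (function names are illustrative, not from the book): shuffle a pool of candidates, skip the first 37% while remembering the best of them, then commit to the first later candidate who beats that benchmark. Run enough trials and the success rate hovers near 37%.

```python
import random

def look_then_leap(ranks, look_fraction=0.37):
    """Apply the Look-Then-Leap Rule to a shuffled list of candidate
    qualities; return the quality of the candidate chosen."""
    n = len(ranks)
    cutoff = int(n * look_fraction)
    best_seen = max(ranks[:cutoff], default=float("-inf"))
    for quality in ranks[cutoff:]:
        if quality > best_seen:  # first candidate to beat the look phase
            return quality
    return ranks[-1]  # no one beat the benchmark: settle for the last

def success_rate(n=100, trials=100_000):
    """Estimate how often the rule lands the single best candidate."""
    wins = 0
    for _ in range(trials):
        ranks = list(range(n))
        random.shuffle(ranks)
        wins += look_then_leap(ranks) == n - 1
    return wins / trials
```

With a pool of 100, `success_rate()` comes out close to 0.37, matching the claim in the highlight.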
> He didn’t know how many women he could expect to meet in his lifetime, but there’s a certain flexibility in the 37% Rule: it can be applied to either the number of applicants or the time over which one is searching. Assuming that his search would run from ages eighteen to forty, the 37% Rule gave age 26.1 years as the point at which to switch from looking to leaping. A number that, as it happened, was exactly Trick’s age at the time. So when he found a woman who was a better match than all those he had dated so far, he knew exactly what to do. He leapt. ([Location 256](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=256))
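The 26.1 figure follows directly from applying the 37% cutoff to the eighteen-to-forty search window:

```python
# 37% Rule applied to time rather than to a count of applicants:
# the leap point sits 37% of the way through the search window.
start, end = 18, 40
leap_age = start + 0.37 * (end - start)
print(round(leap_age, 1))  # 26.1
```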
> We can instead use the Threshold Rule, where we immediately accept an applicant if they are above a certain percentile. We don’t need to look at an initial group of candidates to set this threshold—but we do, however, need to be keenly aware of how much looking remains available. The math shows that when there are a lot of applicants left in the pool, you should pass up even a very good applicant in the hopes of finding someone still better than that—but as your options dwindle, you should be prepared to hire anyone who’s simply better than average. ([Location 319](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=319))
> If it wasn’t above your threshold then, it won’t be above your threshold now. What you’ve paid to keep searching is a sunk cost. Don’t compromise, don’t second-guess. And don’t look back. ([Location 393](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=393))
> Intuitively, we think that rational decision-making means exhaustively enumerating our options, weighing each one carefully, and then selecting the best. But in practice, when the clock—or the ticker—is ticking, few aspects of decision-making (or of thinking more generally) are as important as one: when to stop. ([Location 527](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=527))
### 4 Caching: Forget About It
> you first need to decide what to keep, and second, how to arrange it. ([Location 1526](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=1526))
> Your closet presents much the same challenge that a computer faces when managing its memory: space is limited, and the goal is to save both money and time. ([Location 1535](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=1535))
> Caching plays a critical role in the architecture of memory, and it underlies everything from the layout of processor chips at the millimeter scale to the geography of the global Internet. It offers a new perspective on all the various storage systems and memory banks of human life—not only our machines, but also our closets, our offices, our libraries. And our heads. ([Location 1545](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=1545))
> In 1946, Arthur Burks, Herman Goldstine, and John von Neumann, working at the Institute for Advanced Study in Princeton, laid out a design proposal for what they called an electrical “memory organ.” In an ideal world, they wrote, the machine would of course have limitless quantities of lightning-fast storage, but in practice this wasn’t possible. (It still isn’t.) Instead, the trio proposed what they believed to be the next best thing: “a hierarchy of memories, each of which has greater capacity than the preceding but which is less quickly accessible.” By having effectively a pyramid of different forms of memory—a small, fast memory and a large, slow one—maybe we could somehow get the best of both. ([Location 1556](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=1556))
> this smaller and faster memory wasn’t just a convenient place to work with data before saving it off again. It could also be used to deliberately hold on to pieces of information likely to be needed later, anticipating similar future requests—and dramatically speeding up the operation of the machine. If what you needed was still in the working memory, you wouldn’t have to load it from the drum at all. ([Location 1569](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=1569))
> Computer science’s best defense against hitting that wall has been an ever more elaborate hierarchy: caches for caches for caches, all the way down. Modern consumer laptops, tablets, and smartphones have on the order of a six-layer memory hierarchy, and managing memory smartly has never been as important to computer science as it is today. ([Location 1590](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=1590))
> Wilkes wrote, “Since the [cache] can only be a fraction of the size of the main memory, words cannot be preserved in it indefinitely, and there must be wired into the system an algorithm by which they are progressively overwritten.” These algorithms are known as “replacement policies” or “eviction policies,” or simply as caching algorithms. ([Location 1600](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=1600))
> As it explains, the goal of cache management is to minimize the number of times you can’t find what you’re looking for in the cache and must go to the slower main memory to find it; these are known as “page faults” or “cache misses.” The optimal cache eviction policy—essentially by definition, Bélády wrote—is, when the cache is full, to evict whichever item we’ll need again the longest from now. Of course, knowing exactly when you’ll need something again is easier said than done. The hypothetical all-knowing, prescient algorithm that would look ahead and execute the optimal policy is known today in tribute as Bélády’s Algorithm. ([Location 1609](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=1609))
> We could just try Random Eviction, adding new data to the cache and overwriting old data at random. One of the startling early results in caching theory is that, while far from perfect, this approach is not half bad. As it happens, just having a cache at all makes a system more efficient, regardless of how you maintain it. Items you use often will end up back in the cache soon anyway. Another simple strategy is First-In, First-Out (FIFO), where you evict or overwrite whatever has been sitting in the cache the longest (as in Martha Stewart’s question “How long have I had it?”). A third approach is Least Recently Used (LRU): evicting the item that’s gone the longest untouched (Stewart’s “When was the last time I wore it or used it?”). ([Location 1620](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=1620))
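The LRU policy is straightforward to implement. A minimal sketch using Python's ordered dictionary to track recency (class and method names are illustrative, not from the book):

```python
from collections import OrderedDict

class LRUCache:
    """A fixed-size cache that evicts the least recently used item."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()  # insertion order tracks recency

    def get(self, key):
        if key not in self._data:
            return None  # cache miss: caller must go to slower memory
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the LRU item
```

FIFO would differ only in dropping the two `move_to_end` calls: eviction order would then be set by insertion alone, never refreshed by use.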
> Caching is just as useful when it’s proximity, rather than performance, that’s the scarce resource. This fundamental insight—that in-demand files should be stored near the location where they are used—also translates into purely physical environments. ([Location 1706](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=1706))
> Having a cache is efficient, but having multiple levels of caches—from smallest and fastest to largest and slowest—can be even better. ([Location 1751](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=1751))
> The left-side insertion rule, Noguchi specifies, has to be followed for old files as well as new ones: every time you pull out a file to use its contents, you must put it back as the leftmost file when you return it to the box. And when you search for a file, you always start from the left-hand side as well. The most recently accessed files are thus the fastest to find. ([Location 1773](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=1773))
> if you follow the LRU principle—where you simply always put an item back at the very front of the list—then the total amount of time you spend searching will never be more than twice as long as if you’d known the future. That’s not a guarantee any other algorithm can make. ([Location 1798](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=1798))
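The Noguchi scheme is what computer scientists call a move-to-front list, and it can be sketched in a few lines (class and method names are illustrative):

```python
class NoguchiBox:
    """Self-organizing file box: search from the left, and put every
    file you pull out back at the leftmost position (move-to-front)."""

    def __init__(self, files):
        self.files = list(files)

    def fetch(self, name):
        i = self.files.index(name)  # linear search from the left-hand side
        self.files.insert(0, self.files.pop(i))  # return it as leftmost
        return i + 1  # how many files were examined
```

Repeated accesses get cheaper: after fetching a file once, finding it again costs a single comparison.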
> The need for a computer memory hierarchy, in the form of a cascade of caches, is in large part the result of our inability to afford making the entire memory out of the most expensive type of hardware. The fastest cache on current computers, for instance, is made with what’s called SRAM, which costs roughly a thousand times as much per byte as the flash memory in solid-state drives. But the true motivation for caching goes deeper than that. In fact, even if we could get a bespoke machine that used exclusively the fastest form of memory possible, we’d still need caches. As John Hennessy explains, size alone is enough to impair speed: ([Location 1863](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=1863))
> Unavoidably, the larger a memory is, the more time it takes to search for and extract a piece of information from it. ([Location 1873](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=1873))
> Recent work by a team of psychologists and linguists led by Michael Ramscar at the University of Tübingen has suggested that what we call “cognitive decline”—lags and retrieval errors—may not be about the search process slowing or deteriorating, but (at least partly) an unavoidable consequence of the amount of information we have to navigate getting bigger and bigger. ([Location 1884](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=1884))
> No matter how good your organization scheme is, having to search through more things will inevitably take longer. It’s not that we’re forgetting; it’s that we’re remembering. We’re becoming archives. ([Location 1891](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=1891))
## New highlights added January 18, 2024 at 11:04 AM
> Gittins (albeit many years before the first episode of Deal or No Deal aired) realized that the multi-armed bandit problem is no different. For every slot machine we know little or nothing about, there is some guaranteed payout rate which, if offered to us in lieu of that machine, will make us quite content never to pull its handle again. This number—which Gittins called the “dynamic allocation index,” and which the world now knows as the Gittins index—suggests an obvious strategy on the casino floor: always play the arm with the highest index. ([Location 680](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=680))
> Now, actually calculating the Gittins index for a specific machine, given its track record and our discounting rate, is still fairly involved. But once the Gittins index for a particular set of assumptions is known, it can be used for any problem of that form. Crucially, it doesn’t even matter how many arms are involved, since the index for each arm is calculated separately. ([Location 689](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=689))
> something you have no experience with whatsoever is more attractive than a machine that you know pays out 70% of the time! ([Location 705](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=705))
> the exploration bonus is a powerful force. ([Location 709](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=709))
> Exploration in itself has value, since trying new things increases our chances of finding the best. So taking the future into account, rather than focusing just on the present, drives us toward novelty. ([Location 720](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=720))
> If the Gittins index is too complicated, or if you’re not in a situation well characterized by geometric discounting, then you have another option: focus on regret. When we choose what to eat, who to spend time with, or what city to live in, regret looms large—presented with a set of good options, it is easy to torture ourselves with the consequences of making the wrong choice. These regrets are often about the things we failed to do, the options we never tried. ([Location 740](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=740))
> Starting with Lai and Robbins, researchers in recent decades have set about looking for algorithms that offer the guarantee of minimal regret. Of the ones they’ve discovered, the most popular are known as Upper Confidence Bound algorithms. Visual displays of statistics often include so-called error bars that extend above and below any data point, indicating uncertainty in the measurement; the error bars show the range of plausible values that the quantity being measured could actually have. This range is known as the “confidence interval,” and as we gain more data about something the confidence interval will shrink, reflecting an increasingly accurate assessment. ([Location 770](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=770))
> In a multi-armed bandit problem, an Upper Confidence Bound algorithm says, quite simply, to pick the option for which the top of the confidence interval is highest. ([Location 776](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=776))
> Upper Confidence Bound algorithms implement a principle that has been dubbed “optimism in the face of uncertainty.” Optimism, they show, can be perfectly rational. By focusing on the best that an option could be, given the evidence obtained so far, these algorithms give a boost to possibilities we know less about. As a consequence, they naturally inject a dose of exploration into the decision-making process, leaping at new options with enthusiasm because any one of them could be the next big thing. ([Location 787](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=787))
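The book does not give a specific formula, but the classic UCB1 variant of an Upper Confidence Bound algorithm can be sketched as follows, using the standard mean-plus-√(2 ln t / n) bound (all names are illustrative):

```python
import math
import random

def ucb1(payout_probs, rounds=10_000, rng=random):
    """UCB1 on a multi-armed bandit: always play the arm whose upper
    confidence bound (observed mean + exploration bonus) is highest."""
    n_arms = len(payout_probs)
    pulls = [0] * n_arms
    wins = [0.0] * n_arms
    total = 0
    for arm in range(n_arms):  # initialize: try every arm once
        pulls[arm] += 1
        wins[arm] += rng.random() < payout_probs[arm]
        total += 1
    for _ in range(rounds - n_arms):
        total += 1
        ucb = [wins[a] / pulls[a] + math.sqrt(2 * math.log(total) / pulls[a])
               for a in range(n_arms)]
        arm = max(range(n_arms), key=lambda a: ucb[a])
        pulls[arm] += 1
        wins[arm] += rng.random() < payout_probs[arm]
    return pulls
```

The bonus term shrinks as an arm accumulates pulls, so rarely tried arms keep an inflated score, which is exactly the "optimism in the face of uncertainty" the highlight describes.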
> The success of Upper Confidence Bound algorithms offers a formal justification for the benefit of the doubt. Following the advice of these algorithms, you should be excited to meet new people and try new things—to assume the best about them, in the absence of evidence to the contrary. In the long run, optimism is the best prevention for regret. ([Location 792](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=792))
> We think of the young as stereotypically fickle; the old, stereotypically set in their ways. In fact, both are behaving completely appropriately with respect to their intervals. The deliberate honing of a social network down to the most meaningful relationships is the rational response to having less time to enjoy them. Recognizing that old age is a time of exploitation helps provide new perspectives on some of the classic phenomena of aging. For example, while going to college—a new social environment filled with people you haven’t met—is typically a positive, exciting time, going to a retirement home—a new social environment filled with people you haven’t met—can be painful. And that difference is partly the result of where we are on the explore/exploit continuum at those stages of our lives. ([Location 1026](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=1026))
> What an explorer trades off for knowledge is pleasure. The Gittins index and the Upper Confidence Bound, as we’ve seen, inflate the appeal of lesser-known options beyond what we actually expect, since pleasant surprises can pay off many times over. But at the same time, this means that exploration necessarily leads to being let down on most occasions. Shifting the bulk of one’s attention to one’s favorite things should increase quality of life. And it seems like it does: ([Location 1036](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=1036))
### 3 Sorting: Making Order
## New highlights added January 24, 2024 at 9:54 PM
> We refer to things like Google and Bing as “search engines,” but that is something of a misnomer: they’re really sort engines. What makes Google so dominant as a means of accessing the world’s information is less that it finds our text within hundreds of millions of webpages—its 1990s competitors could generally do that part well enough—but that it sorts those webpages so well, and only shows us the most relevant ten. ([Location 1087](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=1087))
> The more you take on, the worse it gets. This is the first and most fundamental insight of sorting theory. Scale hurts. ([Location 1105](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=1105))
> one of the best preventives against the computational difficulty of sock sorting is just doing your laundry more often. ([Location 1108](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=1108))
> Computer science has developed a shorthand specifically for measuring algorithmic worst-case scenarios: it’s called “Big-O” notation. Big-O notation has a particular quirk, which is that it’s inexact by design. That is, rather than expressing an algorithm’s performance in minutes and seconds, Big-O notation provides a way to talk about the kind of relationship that holds between the size of the problem and the program’s running time. Because Big-O notation deliberately sheds fine details, what emerges is a schema for dividing problems into different broad classes. ([Location 1134](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=1134))
> Imagine you’re hosting a dinner party with n guests. The time required to clean the house for their arrival doesn’t depend on the number of guests at all. This is the rosiest class of problems there is: called “Big-O of one,” written O(1), it is also known as “constant time.” Notably, Big-O notation doesn’t care a whit how long the cleaning actually takes—just that it’s always the same, totally invariant of the guest list. You’ve got the same work to do if you have one guest as if you have ten, a hundred, or any other n. Now, the time required to pass the roast around the table will be “Big-O of n,” written O(n), also known as “linear time”—with twice the guests, you’ll wait twice as long for the dish to come around. And again, Big-O notation couldn’t care less about the number of courses that get served, or whether they go around for second helpings. ([Location 1139](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=1139))
> What if, as the guests arrived, each one hugged the others in greeting? Your first guest hugs you; your second guest has two hugs to give; your third guest, three. How many hugs will there be in total? This turns out to be “Big-O of n-squared,” written O(n²) and also known as “quadratic time.” Here again, we only care about the basic contours of the relationship between n and time. There’s no O(2n²) for two hugs apiece, or O(n² + n) for hugs plus passing the food around, or O(n² + 1) for hugs plus home cleaning. It’s all quadratic time, so O(n²) covers everything. ([Location 1153](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=1153))
> It gets worse from there. There’s “exponential time,” O(2ⁿ), where each additional guest doubles your work. Even worse is “factorial time,” O(n!), a class of problems so truly hellish that computer scientists only talk about it when they’re joking—as we were in imagining shuffling a deck until it’s sorted—or when they really, really wish they were. ([Location 1163](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=1163))
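The hug-counting example can be checked directly: the total for n guests is 1 + 2 + … + n = n(n + 1)/2, which grows quadratically.

```python
def hugs(n):
    """Total greetings when guest k arrives to k hugs (the host plus
    the k - 1 guests already present): 1 + 2 + ... + n = n(n + 1) / 2."""
    return sum(range(1, n + 1))
```

Doubling the guest list roughly quadruples the hugs, the signature of O(n²) growth.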
> Imagine you want to alphabetize your unsorted collection of books. A natural approach would be just to scan across the shelf looking for out-of-order pairs—Wallace followed by Pynchon, for instance—and flipping them around. Put Pynchon ahead of Wallace, then continue your scan, looping around to the beginning of the shelf each time you reach the end. When you make a complete pass without finding any out-of-order pairs on the entire shelf, then you know the job is done. This is Bubble Sort, and it lands us in quadratic time. There are n books out of order, and each scan through the shelf can move each one at most one position. ([Location 1173](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=1173))
> You might take a different tack—pulling all the books off the shelf and putting them back in place one by one. You’d put the first book in the middle of the shelf, then take the second and compare it to the first, inserting it either to the right or to the left. Picking up the third book, you’d run through the books on the shelf from left to right until you found the right spot to tuck it in. Repeating this process, gradually all of the books would end up sorted on the shelf and you’d be done. Computer scientists call this, appropriately enough, Insertion Sort. The good news is that it’s arguably even more intuitive than Bubble Sort and doesn’t have quite the bad reputation. The bad news is that it’s not actually that much faster. You still have to do one insertion for each book. And each insertion still involves moving past about half the books on the shelf, on average, to find the correct place. Although in practice Insertion Sort does run a bit faster than Bubble Sort, again we land squarely, if you will, in quadratic time. Sorting anything more than a single bookshelf is still an unwieldy prospect. ([Location 1184](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=1184))
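Both shelf-sorting strategies translate almost line for line into code. A minimal sketch (function names are illustrative; each is quadratic in the worst case):

```python
def bubble_sort(shelf):
    """Sweep the shelf repeatedly, swapping out-of-order neighbors,
    until a full pass finds nothing to fix. O(n^2)."""
    books = list(shelf)
    swapped = True
    while swapped:
        swapped = False
        for i in range(len(books) - 1):
            if books[i] > books[i + 1]:
                books[i], books[i + 1] = books[i + 1], books[i]
                swapped = True
    return books

def insertion_sort(shelf):
    """Take the books one at a time and scan left to right for the
    right spot to tuck each one in. Also O(n^2)."""
    books = []
    for book in shelf:
        i = 0
        while i < len(books) and books[i] < book:
            i += 1
        books.insert(i, book)
    return books
```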
## New highlights added January 24, 2024 at 11:54 PM
> The program that John von Neumann wrote in 1945 to demonstrate the power of the stored-program computer took the idea of collating to its beautiful and ultimate conclusion. Sorting two cards is simple: just put the smaller one on top. And given a pair of two-card stacks, both of them sorted, you can easily collate them into an ordered stack of four. Repeating this trick a few times, you’d build bigger and bigger stacks, each one of them already sorted. Soon enough, you could collate yourself a perfectly sorted full deck—with a final climactic merge, like a riffle shuffle’s order-creating twin, producing the desired result. This approach is known today as Mergesort, one of the legendary algorithms in computer science. As a 1997 paper put it, “Mergesort is as important in the history of sorting as sorting in the history of computing.” ([Location 1217](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=1217))
> If you’re still strategizing about that bookshelf, the Mergesort solution would be to order a pizza and invite over a few friends. Divide the books evenly, and have each person sort their own stack. Then pair people up and have them merge their stacks. Repeat this process until there are just two stacks left, and merge them one last time onto the shelf. ([Location 1233](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=1233))
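The collate-and-merge idea above, in a minimal Python sketch (names are illustrative): split the stack, sort each half, then collate the two sorted halves.

```python
def merge(left, right):
    """Collate two already-sorted stacks into one sorted stack."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])   # one side is exhausted;
    merged.extend(right[j:])  # the rest of the other is already in order
    return merged

def mergesort(stack):
    """Split, sort each half recursively, then merge. O(n log n)."""
    if len(stack) <= 1:
        return list(stack)
    mid = len(stack) // 2
    return merge(mergesort(stack[:mid]), mergesort(stack[mid:]))
```

Because the stack halves at every level and each level's merges touch each item once, the total work is O(n log n) rather than the quadratic cost of Bubble or Insertion Sort.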
---
Title: Algorithms to Live By
Author: Brian Christian, Tom Griffiths
Tags: readwise, books
date: 2024-01-30
---
### 2 Explore/Exploit: The Latest vs. the Greatest
> Every day we are constantly forced to make decisions between options that differ in a very specific dimension: do we try new things or stick with our favorite ones? ([Location 537](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=537))
> But to a computer scientist, these words have much more specific and neutral meanings. Simply put, exploration is gathering information, and exploitation is using the information you have to get a known good result. ([Location 552](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=552))
> In computer science, the tension between exploration and exploitation takes its most concrete form in a scenario called the “multi-armed bandit problem.” The odd name comes from the colloquial term for a casino slot machine, the “one-armed bandit.” Imagine walking into a casino full of different slot machines, each one with its own odds of a payoff. The rub, of course, is that you aren’t told those odds in advance: ([Location 567](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=567))
> We have the expression “Eat, drink, and be merry, for tomorrow we die,” but perhaps we should also have its inverse: “Start learning a new language or an instrument, and make small talk with a stranger, because life is long, and who knows what joy could blossom over many years’ time.” When balancing favorite experiences and new ones, nothing matters as much as the interval over which we plan to enjoy them. ([Location 593](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=593))
> A sobering property of trying new things is that the value of exploration, of finding a new favorite, can only go down over time, as the remaining opportunities to savor it dwindle. ([Location 602](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=602))
> Robbins specifically considered the case where there are exactly two slot machines, and proposed a solution called the Win-Stay, Lose-Shift algorithm: choose an arm at random, and keep pulling it as long as it keeps paying off. If the arm doesn’t pay off after a particular pull, then switch to the other one. Although this simple strategy is far from a complete solution, Robbins proved in 1952 that it performs reliably better than chance. ([Location 629](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=629))
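Robbins's strategy is simple enough to state in a few lines. The sketch below is my own (function name and the Bernoulli-payoff assumption are not from the book): two arms, each paying off with a fixed unknown probability; stay after a win, switch after a loss.

```python
import random

def win_stay_lose_shift(payoff_probs, pulls, seed=0):
    # Two-armed Bernoulli bandit: keep pulling an arm while it pays off,
    # switch to the other arm after any pull that doesn't.
    rng = random.Random(seed)
    arm = rng.randrange(2)     # start with an arm chosen at random
    wins = 0
    for _ in range(pulls):
        if rng.random() < payoff_probs[arm]:
            wins += 1          # paid off: stay on this arm
        else:
            arm = 1 - arm      # didn't: shift to the other one
    return wins
```

With arms paying off 60% and 20% of the time, picking arms at random would win about 40% of pulls; this strategy spends more of its time on the better arm and does reliably better than that, just as Robbins proved, though it never fully commits to the 60% arm.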
> Gittins (albeit many years before the first episode of Deal or No Deal aired) realized that the multi-armed bandit problem is no different. For every slot machine we know little or nothing about, there is some guaranteed payout rate which, if offered to us in lieu of that machine, will make us quite content never to pull its handle again. This number—which Gittins called the “dynamic allocation index,” and which the world now knows as the Gittins index—suggests an obvious strategy on the casino floor: always play the arm with the highest index. ([Location 680](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=680))
> Now, actually calculating the Gittins index for a specific machine, given its track record and our discounting rate, is still fairly involved. But once the Gittins index for a particular set of assumptions is known, it can be used for any problem of that form. Crucially, it doesn’t even matter how many arms are involved, since the index for each arm is calculated separately. ([Location 689](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=689))
> something you have no experience with whatsoever is more attractive than a machine that you know pays out 70% of the time! ([Location 705](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=705))
> the exploration bonus is a powerful force. ([Location 709](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=709))
> Exploration in itself has value, since trying new things increases our chances of finding the best. So taking the future into account, rather than focusing just on the present, drives us toward novelty. ([Location 720](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=720))
> If the Gittins index is too complicated, or if you’re not in a situation well characterized by geometric discounting, then you have another option: focus on regret. When we choose what to eat, who to spend time with, or what city to live in, regret looms large—presented with a set of good options, it is easy to torture ourselves with the consequences of making the wrong choice. These regrets are often about the things we failed to do, the options we never tried. ([Location 740](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=740))
> Starting with Lai and Robbins, researchers in recent decades have set about looking for algorithms that offer the guarantee of minimal regret. Of the ones they’ve discovered, the most popular are known as Upper Confidence Bound algorithms. Visual displays of statistics often include so-called error bars that extend above and below any data point, indicating uncertainty in the measurement; the error bars show the range of plausible values that the quantity being measured could actually have. This range is known as the “confidence interval,” and as we gain more data about something the confidence interval will shrink, reflecting an increasingly accurate assessment. ([Location 770](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=770))
> In a multi-armed bandit problem, an Upper Confidence Bound algorithm says, quite simply, to pick the option for which the top of the confidence interval is highest. ([Location 776](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=776))
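That rule can be written down directly. The sketch below uses the standard UCB1 confidence bound (the specific formula is from the bandit literature, not from the book), choosing the arm whose estimated payoff plus uncertainty bonus is highest:

```python
import math

def ucb1_choose(counts, rewards):
    # counts[i]  = number of times arm i has been pulled
    # rewards[i] = total payoff received from arm i
    # Pick the arm whose upper confidence bound is highest.
    total = sum(counts)
    best, best_ucb = 0, float("-inf")
    for i, (n, r) in enumerate(zip(counts, rewards)):
        if n == 0:
            return i  # an untried arm has an unbounded upper bound: try it
        ucb = r / n + math.sqrt(2 * math.log(total) / n)
        if ucb > best_ucb:
            best, best_ucb = i, ucb
    return best
```

Note how the bonus term shrinks as an arm accumulates pulls: an arm averaging 50% over just two pulls can outrank one averaging 60% over a hundred, which is the "optimism in the face of uncertainty" the next highlight describes.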
> Upper Confidence Bound algorithms implement a principle that has been dubbed “optimism in the face of uncertainty.” Optimism, they show, can be perfectly rational. By focusing on the best that an option could be, given the evidence obtained so far, these algorithms give a boost to possibilities we know less about. As a consequence, they naturally inject a dose of exploration into the decision-making process, leaping at new options with enthusiasm because any one of them could be the next big thing. ([Location 787](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=787))
> The success of Upper Confidence Bound algorithms offers a formal justification for the benefit of the doubt. Following the advice of these algorithms, you should be excited to meet new people and try new things—to assume the best about them, in the absence of evidence to the contrary. In the long run, optimism is the best prevention for regret. ([Location 792](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=792))
> We think of the young as stereotypically fickle; the old, stereotypically set in their ways. In fact, both are behaving completely appropriately with respect to their intervals. The deliberate honing of a social network down to the most meaningful relationships is the rational response to having less time to enjoy them. Recognizing that old age is a time of exploitation helps provide new perspectives on some of the classic phenomena of aging. For example, while going to college—a new social environment filled with people you haven’t met—is typically a positive, exciting time, going to a retirement home—a new social environment filled with people you haven’t met—can be painful. And that difference is partly the result of where we are on the explore/exploit continuum at those stages of our lives. ([Location 1026](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=1026))
> What an explorer trades off for knowledge is pleasure. The Gittins index and the Upper Confidence Bound, as we’ve seen, inflate the appeal of lesser-known options beyond what we actually expect, since pleasant surprises can pay off many times over. But at the same time, this means that exploration necessarily leads to being let down on most occasions. Shifting the bulk of one’s attention to one’s favorite things should increase quality of life. And it seems like it does: ([Location 1036](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=1036))
### 3 Sorting: Making Order
> We refer to things like Google and Bing as “search engines,” but that is something of a misnomer: they’re really sort engines. What makes Google so dominant as a means of accessing the world’s information is less that it finds our text within hundreds of millions of webpages—its 1990s competitors could generally do that part well enough—but that it sorts those webpages so well, and only shows us the most relevant ten. ([Location 1087](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=1087))
> The more you take on, the worse it gets. This is the first and most fundamental insight of sorting theory. Scale hurts. ([Location 1105](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=1105))
> one of the best preventives against the computational difficulty of sock sorting is just doing your laundry more often. ([Location 1108](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=1108))
> Computer science has developed a shorthand specifically for measuring algorithmic worst-case scenarios: it’s called “Big-O” notation. Big-O notation has a particular quirk, which is that it’s inexact by design. That is, rather than expressing an algorithm’s performance in minutes and seconds, Big-O notation provides a way to talk about the kind of relationship that holds between the size of the problem and the program’s running time. Because Big-O notation deliberately sheds fine details, what emerges is a schema for dividing problems into different broad classes. ([Location 1134](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=1134))
> Imagine you’re hosting a dinner party with n guests. The time required to clean the house for their arrival doesn’t depend on the number of guests at all. This is the rosiest class of problems there is: called “Big-O of one,” written O(1), it is also known as “constant time.” Notably, Big-O notation doesn’t care a whit how long the cleaning actually takes—just that it’s always the same, totally invariant of the guest list. You’ve got the same work to do if you have one guest as if you have ten, a hundred, or any other n. Now, the time required to pass the roast around the table will be “Big-O of n,” written O(n), also known as “linear time”—with twice the guests, you’ll wait twice as long for the dish to come around. And again, Big-O notation couldn’t care less about the number of courses that get served, or whether they go around for second helpings. ([Location 1139](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=1139))
> What if, as the guests arrived, each one hugged the others in greeting? Your first guest hugs you; your second guest has two hugs to give; your third guest, three. How many hugs will there be in total? This turns out to be “Big-O of n-squared,” written O(n²) and also known as “quadratic time.” Here again, we only care about the basic contours of the relationship between n and time. There’s no O(2n²) for two hugs apiece, or O(n² + n) for hugs plus passing the food around, or O(n² + 1) for hugs plus home cleaning. It’s all quadratic time, so O(n²) covers everything. ([Location 1153](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=1153))
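The hug count works out to 1 + 2 + ⋯ + n = n(n + 1)/2, which grows with the square of n. A one-liner (mine, for illustration) makes the arithmetic concrete:

```python
def hugs(n):
    # Guest k hugs the host and the k - 1 guests already there: k hugs each.
    # The total, 1 + 2 + ... + n, equals n * (n + 1) // 2 -- quadratic in n.
    return sum(k for k in range(1, n + 1))
```

Three guests produce 6 hugs; ten produce 55; a hundred produce 5,050. Doubling the guest list roughly quadruples the hugging, which is what O(n²) means.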
> It gets worse from there. There’s “exponential time,” O(2ⁿ), where each additional guest doubles your work. Even worse is “factorial time,” O(n!), a class of problems so truly hellish that computer scientists only talk about it when they’re joking—as we were in imagining shuffling a deck until it’s sorted—or when they really, really wish they were. ([Location 1163](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=1163))
> Imagine you want to alphabetize your unsorted collection of books. A natural approach would be just to scan across the shelf looking for out-of-order pairs—Wallace followed by Pynchon, for instance—and flipping them around. Put Pynchon ahead of Wallace, then continue your scan, looping around to the beginning of the shelf each time you reach the end. When you make a complete pass without finding any out-of-order pairs on the entire shelf, then you know the job is done. This is Bubble Sort, and it lands us in quadratic time. There are n books out of order, and each scan through the shelf can move each one at most one position. ([Location 1173](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=1173))
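The shelf-scanning procedure translates almost line for line into code. A minimal sketch (the function name is mine):

```python
def bubble_sort(shelf):
    # Scan for out-of-order adjacent pairs and flip them; loop back to the
    # start after each pass, and stop after a pass with nothing to flip.
    books = list(shelf)
    swapped = True
    while swapped:
        swapped = False
        for i in range(len(books) - 1):
            if books[i] > books[i + 1]:   # e.g. Wallace before Pynchon
                books[i], books[i + 1] = books[i + 1], books[i]
                swapped = True
    return books
```

Each full pass moves a book at most one position toward its place, so n out-of-order books can require on the order of n passes over n books: quadratic time, as the highlight says.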
> You might take a different tack—pulling all the books off the shelf and putting them back in place one by one. You’d put the first book in the middle of the shelf, then take the second and compare it to the first, inserting it either to the right or to the left. Picking up the third book, you’d run through the books on the shelf from left to right until you found the right spot to tuck it in. Repeating this process, gradually all of the books would end up sorted on the shelf and you’d be done. Computer scientists call this, appropriately enough, Insertion Sort. The good news is that it’s arguably even more intuitive than Bubble Sort and doesn’t have quite the bad reputation. The bad news is that it’s not actually that much faster. You still have to do one insertion for each book. And each insertion still involves moving past about half the books on the shelf, on average, to find the correct place. Although in practice Insertion Sort does run a bit faster than Bubble Sort, again we land squarely, if you will, in quadratic time. Sorting anything more than a single bookshelf is still an unwieldy prospect. ([Location 1184](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=1184))
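The one-book-at-a-time approach looks like this in code (again a sketch of my own, mirroring the shelf description rather than the usual in-place array version):

```python
def insertion_sort(shelf):
    # Place books back one by one, scanning left to right across the
    # already-sorted books to find where each new one tucks in.
    sorted_books = []
    for book in shelf:
        i = 0
        while i < len(sorted_books) and sorted_books[i] < book:
            i += 1                      # walk past smaller books
        sorted_books.insert(i, book)    # tuck the book into place
    return sorted_books
```

Each of the n insertions scans about half the shelf on average, which is why this, too, lands in quadratic time despite feeling more organized than Bubble Sort.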
> The program that John von Neumann wrote in 1945 to demonstrate the power of the stored-program computer took the idea of collating to its beautiful and ultimate conclusion. Sorting two cards is simple: just put the smaller one on top. And given a pair of two-card stacks, both of them sorted, you can easily collate them into an ordered stack of four. Repeating this trick a few times, you’d build bigger and bigger stacks, each one of them already sorted. Soon enough, you could collate yourself a perfectly sorted full deck—with a final climactic merge, like a riffle shuffle’s order-creating twin, producing the desired result. This approach is known today as Mergesort, one of the legendary algorithms in computer science. As a 1997 paper put it, “Mergesort is as important in the history of sorting as sorting in the history of computing.” ([Location 1217](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=1217))
> If you’re still strategizing about that bookshelf, the Mergesort solution would be to order a pizza and invite over a few friends. Divide the books evenly, and have each person sort their own stack. Then pair people up and have them merge their stacks. Repeat this process until there are just two stacks left, and merge them one last time onto the shelf. ([Location 1233](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=1233))
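The divide-and-collate idea sketched above, in code (function names are mine):

```python
def merge(left, right):
    # Collate two already-sorted stacks into one sorted stack.
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]   # one side is empty; append the rest

def mergesort(books):
    # Split the pile in half, sort each half, then merge the sorted halves.
    if len(books) <= 1:
        return books
    mid = len(books) // 2
    return merge(mergesort(books[:mid]), mergesort(books[mid:]))
```

Halving the pile over and over gives about log n rounds of merging, and each round touches every book once, which is how Mergesort escapes quadratic time for O(n log n).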
### 4 Caching: Forget About It
> you first need to decide what to keep, and second, how to arrange it. ([Location 1526](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=1526))
> Your closet presents much the same challenge that a computer faces when managing its memory: space is limited, and the goal is to save both money and time. ([Location 1535](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=1535))
> Caching plays a critical role in the architecture of memory, and it underlies everything from the layout of processor chips at the millimeter scale to the geography of the global Internet. It offers a new perspective on all the various storage systems and memory banks of human life—not only our machines, but also our closets, our offices, our libraries. And our heads. ([Location 1545](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=1545))
> In 1946, Arthur Burks, Herman Goldstine, and John von Neumann, working at the Institute for Advanced Study in Princeton, laid out a design proposal for what they called an electrical “memory organ.” In an ideal world, they wrote, the machine would of course have limitless quantities of lightning-fast storage, but in practice this wasn’t possible. (It still isn’t.) Instead, the trio proposed what they believed to be the next best thing: “a hierarchy of memories, each of which has greater capacity than the preceding but which is less quickly accessible.” By having effectively a pyramid of different forms of memory—a small, fast memory and a large, slow one—maybe we could somehow get the best of both. ([Location 1556](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=1556))
> this smaller and faster memory wasn’t just a convenient place to work with data before saving it off again. It could also be used to deliberately hold on to pieces of information likely to be needed later, anticipating similar future requests—and dramatically speeding up the operation of the machine. If what you needed was still in the working memory, you wouldn’t have to load it from the drum at all. ([Location 1569](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=1569))
> Computer science’s best defense against hitting that wall has been an ever more elaborate hierarchy: caches for caches for caches, all the way down. Modern consumer laptops, tablets, and smartphones have on the order of a six-layer memory hierarchy, and managing memory smartly has never been as important to computer science as it is today. ([Location 1590](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=1590))
> Wilkes wrote, “Since the [cache] can only be a fraction of the size of the main memory, words cannot be preserved in it indefinitely, and there must be wired into the system an algorithm by which they are progressively overwritten.” These algorithms are known as “replacement policies” or “eviction policies,” or simply as caching algorithms. ([Location 1600](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=1600))
> As it explains, the goal of cache management is to minimize the number of times you can’t find what you’re looking for in the cache and must go to the slower main memory to find it; these are known as “page faults” or “cache misses.” The optimal cache eviction policy—essentially by definition, Bélády wrote—is, when the cache is full, to evict whichever item we’ll need again the longest from now. Of course, knowing exactly when you’ll need something again is easier said than done. The hypothetical all-knowing, prescient algorithm that would look ahead and execute the optimal policy is known today in tribute as Bélády’s Algorithm. ([Location 1609](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=1609))
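Bélády's clairvoyant rule is easy to state in code, even though no real cache can run it, since it requires knowing the future request stream. A sketch (mine) of the eviction decision:

```python
def belady_evict(cache, future_requests):
    # Given the items currently cached and the (hypothetically known)
    # sequence of future requests, evict whichever cached item is needed
    # again furthest from now -- or never again.
    def next_use(item):
        try:
            return future_requests.index(item)
        except ValueError:
            return float("inf")   # never needed again: the ideal victim
    return max(cache, key=next_use)
```

Real replacement policies like LRU are judged by how close their miss rate comes to this unattainable ideal.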
> We could just try Random Eviction, adding new data to the cache and overwriting old data at random. One of the startling early results in caching theory is that, while far from perfect, this approach is not half bad. As it happens, just having a cache at all makes a system more efficient, regardless of how you maintain it. Items you use often will end up back in the cache soon anyway. Another simple strategy is First-In, First-Out (FIFO), where you evict or overwrite whatever has been sitting in the cache the longest (as in Martha Stewart’s question “How long have I had it?”). A third approach is Least Recently Used (LRU): evicting the item that’s gone the longest untouched (Stewart’s “When was the last time I wore it or used it?”). ([Location 1620](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=1620))
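Of the three, LRU is the one that tends to perform best in practice, and it fits in a few lines. A minimal sketch (class and method names are mine) using Python's `OrderedDict` to track recency:

```python
from collections import OrderedDict

class LRUCache:
    # Least Recently Used: when full, evict the item untouched the longest.
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()   # ordered least -> most recently used

    def access(self, key):
        hit = key in self.items
        if hit:
            self.items.move_to_end(key)          # touched: most recent now
        else:
            if len(self.items) >= self.capacity:
                self.items.popitem(last=False)   # evict least recently used
            self.items[key] = True
        return hit
```

FIFO differs only in skipping the `move_to_end` call: eviction order is then set once, by arrival, regardless of use (Stewart's "How long have I had it?" rather than "When was the last time I wore it?").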
> Caching is just as useful when it’s proximity, rather than performance, that’s the scarce resource. This fundamental insight—that in-demand files should be stored near the location where they are used—also translates into purely physical environments. ([Location 1706](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=1706))
> Having a cache is efficient, but having multiple levels of caches—from smallest and fastest to largest and slowest—can be even better. ([Location 1751](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=1751))
> The left-side insertion rule, Noguchi specifies, has to be followed for old files as well as new ones: every time you pull out a file to use its contents, you must put it back as the leftmost file when you return it to the box. And when you search for a file, you always start from the left-hand side as well. The most recently accessed files are thus the fastest to find. ([Location 1773](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=1773))
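Noguchi's filing box is a self-organizing list running a move-to-front rule. A sketch (mine) of one search step, counting the comparisons it costs:

```python
def move_to_front_search(files, target):
    # Search from the left-hand side; after pulling the file out,
    # put it back as the leftmost one. Returns the search cost.
    i = files.index(target)        # files examined: i + 1
    files.insert(0, files.pop(i))  # reinsert at the front (left side)
    return i + 1
```

Searching for the same file twice in a row costs the full scan the first time and a single comparison the second, which is exactly why recently accessed files become the fastest to find.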
> if you follow the LRU principle—where you simply always put an item back at the very front of the list—then the total amount of time you spend searching will never be more than twice as long as if you’d known the future. That’s not a guarantee any other algorithm can make. ([Location 1798](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=1798))
> The need for a computer memory hierarchy, in the form of a cascade of caches, is in large part the result of our inability to afford making the entire memory out of the most expensive type of hardware. The fastest cache on current computers, for instance, is made with what’s called SRAM, which costs roughly a thousand times as much per byte as the flash memory in solid-state drives. But the true motivation for caching goes deeper than that. In fact, even if we could get a bespoke machine that used exclusively the fastest form of memory possible, we’d still need caches. As John Hennessy explains, size alone is enough to impair speed: ([Location 1863](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=1863))
> Unavoidably, the larger a memory is, the more time it takes to search for and extract a piece of information from it. ([Location 1873](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=1873))
> Recent work by a team of psychologists and linguists led by Michael Ramscar at the University of Tübingen has suggested that what we call “cognitive decline”—lags and retrieval errors—may not be about the search process slowing or deteriorating, but (at least partly) an unavoidable consequence of the amount of information we have to navigate getting bigger and bigger. ([Location 1884](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=1884))
> No matter how good your organization scheme is, having to search through more things will inevitably take longer. It’s not that we’re forgetting; it’s that we’re remembering. We’re becoming archives. ([Location 1891](https://readwise.io/to_kindle?action=open&asin=B015CKNWJI&location=1891))