Same Stats Different Graphs

- Last Updated: [[2020-12-28]]
- [[Statistics]]
- Source: https://damassets.autodesk.net/content/dam/autodesk/www/autodesk-reasearch/Publications/pdf/same-stats-different-graphs.pdf
- {{pdf: assets/1622579438_248.pdf}}
- # Abstract
- > Datasets which are identical over a number of statistical properties, yet produce dissimilar graphs, are frequently used to illustrate the importance of graphical representations when exploring data. This paper presents a novel method for generating such datasets, along with several examples. Our technique varies from previous approaches in that new datasets are iteratively generated from a seed dataset through random perturbations of individual data points, and can be directed towards a desired outcome through a simulated annealing optimization strategy. Our method has the benefit of being agnostic to the particular statistical properties that are to remain constant between the datasets, and allows for control over the graphical appearance of resulting output.
- ![](assets/1622579439_249.png)
- > The pseudocode for the high-level algorithm is listed below:
- ```
  current_ds ← initial_ds
  for x iterations, do:
      test_ds ← PERTURB(current_ds, temp)
      if ISERROROK(test_ds, initial_ds):
          current_ds ← test_ds

  function PERTURB(ds, temp):
      loop:
          test ← MOVERANDOMPOINTS(ds)
          if FIT(test) > FIT(ds) or temp > RANDOM():
              return test
  ```
- # Literature notes
- [[Anscombe's Quartet]] consists of four datasets that were purposefully constructed to have nearly identical summary statistics (mean, standard deviation, correlation, etc.) but very different graphs. The purpose of the quartet is to illustrate the importance of [[Data Visualization]]. This paper describes a method for generating more datasets of this kind, given a seed dataset, the statistics to hold constant, and a desired graphical appearance.
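- The pseudocode quoted above can be sketched in Python roughly as follows. This is a minimal illustration, not the paper's implementation. The target shape (a circle given by `CENTER` and `RADIUS`), the `TOLERANCE` on each statistic, the step size of 0.1, and the linear cooling from 0.4 to 0.01 are all assumptions chosen for the example. The function names mirror the pseudocode, with `MOVERANDOMPOINTS` inlined as a small Gaussian nudge of one point.

```python
import math
import random

# Hypothetical target shape: a circle the points should drift towards.
CENTER = (50.0, 50.0)
RADIUS = 40.0
TOLERANCE = 0.01  # assumed bound on how far each statistic may drift


def summary(ds):
    """Statistics held constant: means, standard deviations, Pearson r."""
    n = len(ds)
    mx = sum(x for x, _ in ds) / n
    my = sum(y for _, y in ds) / n
    sxx = sum((x - mx) ** 2 for x, _ in ds)
    syy = sum((y - my) ** 2 for _, y in ds)
    sxy = sum((x - mx) * (y - my) for x, y in ds)
    return (mx, my, math.sqrt(sxx / (n - 1)), math.sqrt(syy / (n - 1)),
            sxy / math.sqrt(sxx * syy))


def is_error_ok(test_ds, initial_ds):
    """True if every statistic is still within TOLERANCE of the seed's."""
    return all(abs(a - b) < TOLERANCE
               for a, b in zip(summary(test_ds), summary(initial_ds)))


def fit(ds):
    """Higher is better: negated mean distance from the target circle."""
    return -sum(abs(math.hypot(x - CENTER[0], y - CENTER[1]) - RADIUS)
                for x, y in ds) / len(ds)


def perturb(ds, temp, rng):
    """Nudge one random point; keep the move if it improves the fit,
    or with probability temp even if it does not (the annealing step)."""
    base = fit(ds)
    while True:
        i = rng.randrange(len(ds))
        x, y = ds[i]
        test = list(ds)
        test[i] = (x + rng.gauss(0, 0.1), y + rng.gauss(0, 0.1))
        if fit(test) > base or temp > rng.random():
            return test


def generate(initial_ds, iterations=20000, seed=0):
    rng = random.Random(seed)
    current = list(initial_ds)
    for step in range(iterations):
        # Assumed linear cooling schedule from 0.4 down to 0.01.
        temp = 0.4 - (0.4 - 0.01) * step / iterations
        test = perturb(current, temp, rng)
        if is_error_ok(test, initial_ds):
            current = test
    return current
```

- Calling `generate(points)` on a list of `(x, y)` tuples returns a new list of the same length whose summary statistics stay within `TOLERANCE` of the seed's. Swapping in a different `fit` function changes the shape the points move towards, and swapping in a different `summary` changes which statistics are preserved, which is the sense in which the abstract calls the method agnostic to the statistical properties.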