- [[Statistics]] [[Data science]]
- Date Created: [[2020-09-26]]
- Source:
    - Calculator for sample size: http://www.raosoft.com/samplesize.html
        - Weakness: This calculation assumes a normal distribution.
- The sample size is the number of observations gathered, from which a statistical conclusion will be drawn.
- The sample size is important because a small sample may not be representative of the target population, while a large sample may be infeasible to collect. Ideally, a data scientist strikes a balance between the two, choosing a sample size that is large enough for conclusions drawn from the sample to be applied to the target population __and__ small enough that it can be collected efficiently.
- The size of a good sample also depends on the following factors:
    - Required confidence level for the [[Confidence interval]]
    - Maximum [[Margin of error]] that can be tolerated
    - The size of the [[Target population]]
    - The shape of the [[Probability distribution]] of the target population
        - A [[Normal distribution]] would yield very different probabilities than a [[Poisson Probability Distribution]], so a sample size derived under one may not suit the other.
- [[Sample sizes for load testing]]
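
The factors listed above (confidence level, margin of error, population size) can be combined in a short calculation. The sketch below is an illustration rather than something taken from the note's source: it uses Cochran's formula for a proportion with a finite population correction, which relies on the same normal approximation flagged as a weakness above. The function name `sample_size`, its parameters, and the default `proportion=0.5` (the most conservative choice) are all assumptions made for this example.

```python
import math
from statistics import NormalDist


def sample_size(population=None, margin_of_error=0.05,
                confidence=0.95, proportion=0.5):
    """Estimate the sample size needed to measure a proportion.

    Hypothetical helper: assumes a normal approximation to the
    sampling distribution, so it is not suitable for skewed or
    count-based (e.g. Poisson) data.
    """
    # Two-sided z-score for the requested confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

    # Cochran's formula for an effectively infinite population.
    n0 = z ** 2 * proportion * (1 - proportion) / margin_of_error ** 2

    if population is None:
        return math.ceil(n0)

    # Finite population correction: a small target population
    # needs fewer observations for the same precision.
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


print(sample_size(20000))   # 95% confidence, 5% margin of error
print(sample_size(None))    # same settings, unbounded population
```

Tightening the margin of error or raising the confidence level increases the result, while a smaller target population reduces it, which matches the trade-off described in the note.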