- [[Statistics]] [[Data science]]
- Date Created: [[2020-09-26]]
- Source:
- Calculator for sample size: http://www.raosoft.com/samplesize.html
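- Weakness: the calculator assumes a normal distribution, so its results may not hold for a target population with a different distribution (see the distribution factor below).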
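- A sketch of the normal-approximation formula that calculators like this one are commonly based on. It assumes the goal is to estimate a proportion $p$ under simple random sampling; the exact implementation of the linked calculator is not confirmed here. The normal distribution enters through the critical value $z = \Phi^{-1}(1 - \alpha/2)$ for a confidence level of $1 - \alpha$. $E$ is the tolerated margin of error, $N$ the size of the target population, and $p = 0.5$ is the most conservative choice when $p$ is unknown. The second expression applies a finite population correction to $n_0$.

$$
n_0 = \frac{z^2\,p\,(1-p)}{E^2}, \qquad n = \frac{N\,n_0}{N + n_0 - 1}
$$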
- The sample size is the number of observations gathered from which a statistical conclusion will be made.
- Sample size matters because a small sample may not be representative of the target population, while a large sample may be infeasible to collect. Ideally, a data scientist strikes a balance between the two: a sample large enough that conclusions drawn from it can be generalized to the target population __and__ small enough to collect efficiently.
- The appropriate sample size also depends on the following factors:
- Required confidence level (see [[Confidence interval]])
- Maximum [[Margin of error]] that can be tolerated
- The size of the [[Target population]]
- The shape of the [[Probability distribution]] of the target population
- For example, a [[Normal distribution]] yields very different probabilities from a [[Poisson Probability Distribution]].
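- A minimal sketch of how the confidence level, margin of error and population size feed into the calculation. It assumes a proportion is being estimated under simple random sampling and uses the normal approximation, so it does not capture the distribution-shape factor. The function `sample_size` and its parameter names are hypothetical, chosen only for illustration.

```python
import math
from statistics import NormalDist


def sample_size(confidence, margin_of_error, population=None, proportion=0.5):
    """Hypothetical helper: minimum n to estimate a proportion (normal approximation).

    confidence      -- confidence level, e.g. 0.95
    margin_of_error -- tolerated margin of error as a fraction, e.g. 0.05 for +/-5%
    population      -- size N of the target population; None means "very large"
    proportion      -- expected proportion p; 0.5 is the most conservative choice
    """
    if not 0 < confidence < 1 or not 0 < margin_of_error < 1:
        raise ValueError("confidence and margin_of_error must be in (0, 1)")
    # Two-sided critical value z from the standard normal distribution
    # (this is where the normality assumption comes in).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Required size for an effectively infinite population.
    n = z ** 2 * proportion * (1 - proportion) / margin_of_error ** 2
    if population is not None:
        # Finite population correction: smaller populations need fewer observations.
        n = population * n / (population + n - 1)
    # Round up, since a fractional observation cannot be collected.
    return math.ceil(n)


if __name__ == "__main__":
    print(sample_size(0.95, 0.05))                     # 385
    print(sample_size(0.99, 0.05))                     # 664 (higher confidence)
    print(sample_size(0.95, 0.03))                     # 1068 (smaller margin of error)
    print(sample_size(0.95, 0.05, population=20_000))  # 377 (finite population)
```

Raising the confidence level or shrinking the margin of error increases the required size, while a small target population reduces it. Using `proportion=0.5` maximizes p(1 − p), which is why it is the usual default when nothing is known about the population.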
- [[Sample sizes for load testing]]