# Basic Statistical Analysis

## Definition

#### What Is Statistics?

Statistics is a quantitative, numerical discipline concerned primarily with the collection, analysis, and interpretation of data. It is frequently used to find relationships between data sets, to extract important properties from them, and to visualize them.
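As a small illustration of "extracting important properties" from a data set, the sketch below uses Python's standard `statistics` module to reduce a list of numbers to a few summary values. The `summarize` helper and the sample data are hypothetical, chosen only for demonstration.

```python
import statistics

def summarize(data):
    """Return a few common summary statistics for a list of numbers."""
    return {
        "mean": statistics.mean(data),      # arithmetic average
        "median": statistics.median(data),  # middle value of the sorted data
        "stdev": statistics.pstdev(data),   # population standard deviation (spread)
    }

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements
print(summarize(sample))
```

Three numbers are enough to describe where this data set is centred and how widely it is spread, without listing every value.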

Because statistics is primarily a numerical method, it produces facts only about the numbers themselves, not about the objects under study. A statistical relationship is only as good as the validity of the data being analyzed. A firm understanding of both the statistical method and the interpretation of its numerical results is important in order to get the most out of any statistical analysis.

## How Statistics Work

Statistics makes a number of broad assumptions about any data set under study. The most important are discussed below.

#### Random Variables

Statistics assumes that all measurements are random. A measured value is not a single fixed number but instead follows some probability distribution. In the general case, the normal (Gaussian) distribution is assumed; however, numerous other mathematical distributions are available for comparison.
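A minimal sketch of this idea, assuming each reading is a hypothetical true value plus normally distributed noise. `TRUE_VALUE`, `NOISE_SD`, and `measure` are illustrative names, not part of any standard. Repeated readings scatter around the true value, and their mean and standard deviation estimate the parameters of the assumed distribution.

```python
import random
import statistics

TRUE_VALUE = 9.81   # hypothetical "true" quantity being measured
NOISE_SD = 0.05     # hypothetical measurement noise (standard deviation)

def measure(rng):
    """Simulate one measurement: the true value plus normally distributed noise."""
    return rng.gauss(TRUE_VALUE, NOISE_SD)

rng = random.Random(42)  # fixed seed so the run is reproducible
readings = [measure(rng) for _ in range(1000)]

# The readings scatter around the true value instead of hitting it exactly;
# their mean and spread estimate the parameters of the underlying distribution.
print(f"mean  = {statistics.fmean(readings):.3f}")
print(f"stdev = {statistics.stdev(readings):.3f}")
```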

#### Law of Large Numbers

Statistics are generally collected for a subset (a sample) of the group of objects under study. For example, because it is nearly impossible to collect data on every book ever written, a representative sample can be studied instead. However, a sample will generally not represent the true distribution of the full data set exactly.

The law of large numbers describes the relationship between a sample and the full data set. It states that the larger the sample, the more closely its distribution (in particular, its average) will tend to reflect that of the full set. For example, rolling a six-sided die sixty times will usually produce a distribution closer to the true distribution than rolling the same die only ten times.
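The dice example can be checked with a short simulation. This sketch measures how far the average of `n` rolls lands from the true mean of a fair die (3.5), averaged over many repeated experiments. Using the sample mean as the yardstick is one convenient choice; comparing face frequencies would show the same trend. The helper name `mean_abs_error` is hypothetical.

```python
import random
import statistics

def mean_abs_error(num_rolls, repetitions, rng):
    """Average distance between a sample mean of dice rolls and the true mean (3.5)."""
    true_mean = 3.5  # expected value of a fair six-sided die
    errors = []
    for _ in range(repetitions):
        rolls = [rng.randint(1, 6) for _ in range(num_rolls)]
        errors.append(abs(statistics.fmean(rolls) - true_mean))
    return statistics.fmean(errors)

rng = random.Random(0)  # fixed seed for reproducibility
for n in (10, 60, 600):
    print(n, round(mean_abs_error(n, 500, rng), 3))
```

The typical error shrinks as the number of rolls grows, roughly in proportion to one over the square root of `n`.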

#### The P Value

Because statistics makes use of probabilities, there is always a chance that apparent relationships emerge from data where no relationship actually exists. Most statistical methods therefore produce a P value: the probability of observing a relationship at least as strong as the one in the data if, in fact, no real relationship existed. A low P value means the observed pattern would be unlikely to arise from chance alone, which is taken as evidence of a real relationship; a high P value means the perceived relationship could easily be due to random fluctuations in the data.
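One way to compute a P value directly from its definition is a permutation test, sketched below for a difference between two group means. This is a swapped-in illustrative technique, not the only way P values are produced; it is shown because it needs no distributional assumptions. If group membership is irrelevant, randomly reshuffling the labels should often produce differences as large as the observed one. The function name and sample data are hypothetical.

```python
import random
import statistics

def permutation_p_value(group_a, group_b, num_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Estimates the probability of seeing a difference at least as large as
    the observed one if the group labels were irrelevant (no real effect).
    """
    rng = random.Random(seed)
    observed = abs(statistics.fmean(group_a) - statistics.fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(num_permutations):
        rng.shuffle(pooled)  # randomly reassign the group labels
        diff = abs(statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # The +1 terms count the observed arrangement itself and keep p above zero.
    return (extreme + 1) / (num_permutations + 1)

a = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2]   # hypothetical measurements
b = [12.0, 12.3, 11.8, 12.1, 12.2, 11.9]
print(permutation_p_value(a, b))  # small: this split is unlikely under "no effect"
print(permutation_p_value(a, a))  # 1.0: identical groups show no difference
```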

It is important to choose a significance threshold for the P value (often written α) before performing any statistical analysis. In general, a P value below 5% (0.05) is required; for important or sensitive data, thresholds of 1% (0.01) or lower are common.
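The threshold is, in effect, the false-positive rate you are willing to accept. The sketch below runs many experiments in which both groups come from the same distribution, so no real relationship exists, and counts how often the P value still falls below the threshold. It assumes a two-sided z-test with a known standard deviation of 1; that test, the helper names, and the trial counts are illustrative choices, not a prescribed procedure.

```python
import math
import random
import statistics

def z_test_p_value(group_a, group_b, sigma=1.0):
    """Two-sided z-test P value for a difference in means when sigma is known."""
    n_a, n_b = len(group_a), len(group_b)
    standard_error = sigma * math.sqrt(1 / n_a + 1 / n_b)
    z = (statistics.fmean(group_a) - statistics.fmean(group_b)) / standard_error
    return 2 * (1 - statistics.NormalDist().cdf(abs(z)))

def false_positive_rate(alpha, trials=4000, n=20, seed=1):
    """Fraction of tests on pure noise (no real difference) with p < alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a = [rng.gauss(0, 1) for _ in range(n)]  # both groups are drawn from
        b = [rng.gauss(0, 1) for _ in range(n)]  # the same distribution
        if z_test_p_value(a, b) < alpha:
            hits += 1
    return hits / trials

for alpha in (0.05, 0.01):
    print(alpha, false_positive_rate(alpha))
```

With a 5% threshold, roughly one test in twenty flags a "relationship" in pure noise; tightening the threshold to 1% cuts that to roughly one in a hundred. Choosing the threshold after seeing the results defeats this guarantee.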

#### Interpreting Correlation

A strong statistical relationship between two data sets does not imply that one causes the other. It only implies that one is a good predictor of the other, or that the two values are somehow related. Statistical relationships cannot prove that one object is causally linked to another; they can only show that the two have a relationship that merits further study.
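A common way two variables become correlated without either causing the other is a shared hidden cause. In the sketch below, `x` and `y` are each built from the same unobserved factor plus independent noise; neither is computed from the other, yet their Pearson correlation is high. The `pearson_r` helper is written out by hand for clarity (Python 3.10+ also offers `statistics.correlation`), and all data is simulated.

```python
import random
import statistics

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(7)
hidden = [rng.gauss(0, 1) for _ in range(1000)]  # unobserved common cause
x = [h + rng.gauss(0, 0.5) for h in hidden]      # depends only on the hidden factor
y = [h + rng.gauss(0, 0.5) for h in hidden]      # also depends only on the hidden factor

# x and y are strongly correlated, yet neither one influences the other.
print(round(pearson_r(x, y), 2))
```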