# CIRCA:Basic Statistical Analysis

### From CIRCA

## Revision as of 23:47, 2 April 2013

# Basic Statistical Analysis

## Definition

#### What is Statistics

Statistics involves both the collection of data and the many methods for analyzing and interpreting that data. It is useful for inferring relationships between data sets, extracting properties from data sets, and visualizing patterns in data sets.

Those involved in Statistics should be careful to note that the relationship between statistical fact and facts about the objects under study is only as good as the relationship between the data set and the objects under study. Statistics never produces facts about the objects directly and only analyses the data under study. Statistical analysis is just as biased as the data being analysed, yet this bias is difficult to see from the perspective of someone without a firm understanding of the method of statistical analysis.

## How Statistics Works

Statistics makes a number of very broad assumptions about any data set that is being studied. The most important assumptions will be discussed below.

#### Random Number

Statistics treats every measurement as random. A measured quantity is not modeled as a single fixed value; instead, repeated measurements follow some form of probability distribution. In the general case, the normal distribution is assumed; however, there are numerous other mathematical distributions available when the data do not fit it.
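The idea of a measurement as a draw from a distribution can be illustrated with a small simulation. This is only a sketch: the function name `simulate_measurements`, the "true" value of 10.0, and the noise level of 0.5 are all made up for illustration, and the noise is assumed to be normally distributed.

```python
import random
import statistics


def simulate_measurements(true_value, noise_sd, n, seed=0):
    """Return n simulated readings: the true value plus normal noise.

    Hypothetical helper for illustration; real measurement noise
    need not be normally distributed.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [rng.gauss(true_value, noise_sd) for _ in range(n)]


# Assumed values: a quantity of 10.0 measured with noise of 0.5.
readings = simulate_measurements(true_value=10.0, noise_sd=0.5, n=1000)

# No single reading equals the true value, but the readings as a
# whole are described by their centre and spread.
sample_mean = statistics.mean(readings)
sample_sd = statistics.stdev(readings)
print(f"mean = {sample_mean:.3f}, standard deviation = {sample_sd:.3f}")
```

The mean and standard deviation summarize the distribution the readings were drawn from; neither is a "fact" about any individual reading.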

#### Law of Large Numbers

Statistics are generally collected for a subset (a sample) of the group of objects under study. For example, as it is nearly impossible to collect data on every book ever written, a representative subset can be studied instead. No subset of a larger data set will reproduce the true distribution of the data exactly.

The law of large numbers governs the relationship between subsets and the data sets they are drawn from. It states that the larger a randomly drawn subset is, the more accurately its distribution tends to reflect the larger set. For example, rolling a six-sided die sixty times will usually produce a distribution closer to the actual distribution than rolling the same die only ten times.
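The die example above can be tried out with a short simulation. This is a minimal sketch assuming a fair die; the helper names `face_frequencies` and `max_deviation` are invented here, and the exact numbers printed depend on the random seed.

```python
import random
from collections import Counter


def face_frequencies(n_rolls, seed=0):
    """Roll a simulated fair six-sided die and return the share of each face."""
    rng = random.Random(seed)
    counts = Counter(rng.randint(1, 6) for _ in range(n_rolls))
    return [counts[face] / n_rolls for face in range(1, 7)]


def max_deviation(frequencies):
    """Largest gap between an observed share and the ideal share of 1/6."""
    return max(abs(f - 1 / 6) for f in frequencies)


# More rolls tend to give shares closer to 1/6 for every face.
for n in (10, 60, 100_000):
    print(n, round(max_deviation(face_frequencies(n)), 4))
```

A single run with sixty rolls is not guaranteed to beat a single run with ten, but the gap shrinks reliably as the number of rolls grows.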

#### The P Value

Because statistics makes use of probabilities, there is always a chance that perceived relationships can emerge from data where no relationship actually exists. Most statistical methods will produce a P value, which is the probability of seeing a relationship at least as strong as the one observed if no real relationship existed, that is, by coincidence alone. A low P value indicates that the observed relationship would be unlikely to arise from random fluctuations alone, while a high P value indicates that the perceived relationship could easily be due to random fluctuations in the data. The P value is not the probability that the relationship is real.

It is important to pick a P value threshold (the significance level) prior to performing any statistical analysis. In general a P value less than 5% is desired; however, for important or sensitive data a P value of 1% or lower is generally desired.
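One way to see what a P value measures is a permutation test, sketched below. This is only one of many statistical methods and is chosen here because it needs no formulas: it shuffles the group labels many times and counts how often chance alone gives a difference at least as large as the observed one. The function name `permutation_p_value`, the threshold name `ALPHA`, and the two small data sets are invented for illustration.

```python
import random
import statistics


def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Estimate a two-sided P value for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Shuffling breaks any real link between value and group label.
        rng.shuffle(pooled)
        diff = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # The +1 terms keep the estimate from ever being exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


ALPHA = 0.05  # threshold chosen before looking at the data

# Made-up data: two groups that are clearly separated.
p = permutation_p_value([1, 2, 3, 4, 5, 6, 7, 8], [11, 12, 13, 14, 15, 16, 17, 18])
print(f"P value = {p:.4f}, below threshold: {p < ALPHA}")
```

Comparing the result against a threshold fixed in advance is what prevents the threshold from being adjusted to fit the outcome.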

#### Interpreting Correlation

A strong statistical relationship between two data sets does not imply that one causes the other. It only implies that one is a good predictor of the other, or that these two values are somehow related. Statistical relationships cannot be used to prove that one object is causally linked to another object; only that these two objects have a relationship that merits further study.
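A small simulation can show how a strong relationship arises without either value causing the other. In this sketch, a hidden third quantity drives both measured values; the variable names (`hidden`, `x`, `y`), the noise levels, and the helper `pearson` are assumptions made for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    var_x = sum((a - mean_x) ** 2 for a in xs)
    var_y = sum((b - mean_y) ** 2 for b in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(0)
n = 5000

# A hidden factor influences both x and y; x never influences y.
hidden = [rng.gauss(0, 1) for _ in range(n)]
x = [h + rng.gauss(0, 0.5) for h in hidden]
y = [h + rng.gauss(0, 0.5) for h in hidden]

# x and y are strongly correlated even though neither causes the other.
r_xy = pearson(x, y)

# Once the hidden factor is subtracted out, the relationship vanishes.
r_residual = pearson(
    [a - h for a, h in zip(x, hidden)],
    [b - h for b, h in zip(y, hidden)],
)
print(f"correlation of x and y: {r_xy:.2f}")
print(f"correlation after removing the hidden factor: {r_residual:.2f}")
```

Here `x` is a good predictor of `y`, yet changing `x` would do nothing to `y`; only further study of the hidden factor would reveal why the two move together.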