Basic Statistical Analysis

From CIRCA

[This page is not finished.]

Basic Statistical Analysis

Definition

What is Statistics

Statistics is a quantitative method concerned primarily with collecting and interpreting numerical data. It is frequently used to find relationships between data sets, to extract important properties from data sets, and to produce visualizations.

Because statistics is primarily a numerical method, it produces facts only about the numbers themselves, not about the objects under study. A statistical relationship is only as valid as the data being analyzed. A firm understanding of both the statistical method and the interpretation of its numerical results is needed to get the most out of any statistical analysis.

How Statistics Work

Statistics makes a number of very broad assumptions about any data set that is being studied. The most important assumptions will be discussed below.

Random Number

Statistics are generally collected from a representative sample of a larger group of objects. Statistics treats each measurement as a single randomly generated value that follows a probability distribution associated with the larger group. The most commonly assumed probability distribution is the normal distribution, which assumes that most measurements will be close to the class's mean value; variations from the mean are possible but become rarer as the distance from the mean increases. One of the main goals of statistics is to find this probability distribution and use it to make inferences about objects outside the representative sample.

Numerous other mathematical distributions are also available for study.
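
The idea of recovering a distribution from a sample can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population (heights with a mean of 170 and a standard deviation of 10) is invented, and the helper names `estimate_normal` and `fraction_within` are hypothetical. The sketch draws a random sample from a bell-shaped (normal) distribution, estimates its mean and spread, and checks how the share of measurements grows as the allowed distance from the mean widens.

```python
import random
import statistics

def estimate_normal(sample):
    """Return (mean, standard deviation) estimated from a sample."""
    return statistics.mean(sample), statistics.stdev(sample)

def fraction_within(sample, center, spread, k):
    """Fraction of values lying within k * spread of center."""
    inside = sum(1 for x in sample if abs(x - center) <= k * spread)
    return inside / len(sample)

rng = random.Random(0)  # fixed seed so the run is repeatable
# Invented population for illustration: mean 170, standard deviation 10.
sample = [rng.gauss(170, 10) for _ in range(1000)]

mean, sd = estimate_normal(sample)
print(f"estimated mean = {mean:.1f}, estimated sd = {sd:.1f}")
for k in (1, 2, 3):
    share = fraction_within(sample, mean, sd, k)
    print(f"within {k} sd of the mean: {share:.3f}")
```

Most values fall within one standard deviation of the mean, and values far from the mean are rare, which matches the description above.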

Rule of Large Numbers

The rule of large numbers (more commonly called the law of large numbers) is a simple relationship between the distribution of a sample set and the distribution of the entire class of objects. In general, larger sample sizes better represent the class as a whole.

For example, a die has a uniform probability distribution: in theory, each side has an equal probability of coming up when the die is rolled. If I were to roll a die ten times, the observed outcomes would probably not be evenly distributed, with some faces coming up much more frequently than others. If I were to roll the same die sixty times, the distribution would usually be more even. The rule of large numbers states that the more I roll the die, the closer the observed frequencies will come to the theoretical distribution.
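
The dice example can be simulated directly. The following is a minimal sketch, assuming a fair six-sided die modelled with Python's pseudo-random generator; the function names `roll_frequencies` and `max_deviation` are made up for this illustration. It reports how far the observed share of each face strays from the ideal 1/6 as the number of rolls grows.

```python
import random
from collections import Counter

def roll_frequencies(n_rolls, rng):
    """Roll a fair six-sided die n_rolls times; return the observed share of each face."""
    counts = Counter(rng.randint(1, 6) for _ in range(n_rolls))
    return {face: counts[face] / n_rolls for face in range(1, 7)}

def max_deviation(freqs):
    """Largest gap between an observed share and the ideal 1/6."""
    return max(abs(share - 1 / 6) for share in freqs.values())

rng = random.Random(42)  # fixed seed so the run is repeatable
for n in (10, 60, 600, 60000):
    freqs = roll_frequencies(n, rng)
    print(f"{n:>6} rolls: largest deviation from 1/6 = {max_deviation(freqs):.4f}")
```

The deviation tends to shrink as the number of rolls increases, although any single short run can still look quite uneven.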

The P Value

Because statistics makes use of probabilities, there is always a chance that perceived relationships can emerge from data where no relationship actually exists. Most statistical methods produce a P value, which is effectively the probability that a result at least as strong as the one observed would arise by such a coincidence. A low P value indicates high confidence that the outcome does not represent random fluctuations in the data. A high P value indicates low confidence: the data do not show a relationship that can be distinguished from simple noise.

It is important to pick a threshold for the P value prior to performing any statistical analysis. In general, a P value of less than 5% is desired; however, for important or sensitive data, P values of 1% or lower are generally desired.
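
One way to see where a P value comes from is a permutation test, sketched below. This is a technique chosen here for illustration, not the method behind any particular test named on this page. The two groups of measurements are invented, and `permutation_p_value` is a hypothetical helper. The sketch shuffles the group labels many times and counts how often chance alone produces a difference in means at least as large as the observed one.

```python
import random

def permutation_p_value(group_a, group_b, n_shuffles=10000, seed=0):
    """Estimate a two-sided P value for the difference between two group means.

    The pooled values are shuffled repeatedly; the P value is the share of
    shuffles whose mean difference is at least as large as the observed one.
    """
    def mean(values):
        return sum(values) / len(values)

    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed - 1e-12:  # small tolerance for rounding error
            hits += 1
    return hits / n_shuffles

# Invented measurements, for illustration only.
control = [5.1, 4.9, 5.6, 5.8, 6.0, 5.3, 5.5, 5.2]
treated = [6.3, 6.8, 6.1, 7.0, 6.5, 6.9, 6.4, 6.6]

alpha = 0.05  # threshold chosen before looking at the result
p = permutation_p_value(control, treated)
print(f"P value = {p:.4f}; below the {alpha:.0%} threshold: {p < alpha}")
```

Comparing a group with itself gives an observed difference of zero, so every shuffle matches it and the estimated P value is 1.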

Interpreting Correlation

Statistics never produces a causal model of anything that it analyzes. A strong statistical relationship between two data sets does not imply that one causes the other. It implies only that one is a good predictor of the other, or that the two values are somehow related. Statistical relationships cannot be used to prove that one object is causally linked to another object, only that the two objects are somehow related.
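
A small simulation can make this point concrete. In the sketch below, which uses invented data and a hypothetical helper `pearson_r`, a hidden factor (temperature) drives two other quantities that have no effect on each other. The two quantities still end up strongly correlated, even though neither causes the other.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

rng = random.Random(1)  # fixed seed so the run is repeatable
# Invented model: temperature influences both quantities independently.
temperature = [rng.uniform(10, 35) for _ in range(200)]
ice_cream_sales = [2.0 * t + rng.gauss(0, 5) for t in temperature]
sunburn_cases = [0.5 * t + rng.gauss(0, 2) for t in temperature]

r = pearson_r(ice_cream_sales, sunburn_cases)
print(f"correlation between ice cream sales and sunburn cases: r = {r:.2f}")
```

Ice cream sales are a good predictor of sunburn cases in this model, but only because both depend on temperature.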

Examples

T-Test
Linear Regression

References and Further Readings
