Revision as of 21:02, 10 April 2013

[This page is not finished.]

Basic Statistical Analysis

Definition

What is Statistics

Statistics is a quantitative method concerned primarily with the collection, analysis, and interpretation of numerical data. It is frequently used to find relationships between data sets, to extract important properties from them, and to visualize them.

Because statistics is primarily a numerical method, it produces facts only about the numbers themselves and not about the objects under study. A statistical relationship is only as good as the data being analyzed. A firm understanding of both the statistical method and the interpretation of its numerical results is needed to get the most out of any statistical analysis.

How Statistics Work

Statistics makes a number of very broad assumptions about any data set that is being studied. The most important assumptions will be discussed below.

Random Number

Statistics are generally collected from a representative sample of a larger group of objects. Statistical methods assume that each measurement is a single randomly generated value drawn from a probability distribution associated with the larger group. The most commonly assumed distribution is the normal distribution, in which most measurements lie close to the class's mean value; deviations from the mean are possible but become rarer as the distance from the mean increases. One of the main goals of statistics is to estimate this probability distribution and use it to make inferences about objects outside the representative sample.

There are numerous other mathematical distributions available for study.
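
The idea of measurements as random draws from a distribution can be sketched in a few lines of Python. This is only an illustration, not part of the original page: the mean of 170, the standard deviation of 10, the sample size, and the helper name `fraction_within` are all assumptions chosen for the example.

```python
import random
import statistics

def fraction_within(samples, center, width):
    """Return the share of samples lying within `width` of `center`."""
    return sum(abs(x - center) <= width for x in samples) / len(samples)

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population parameters, invented for illustration.
MEAN, SD = 170.0, 10.0

# Each "measurement" is one random draw from a normal distribution.
samples = [rng.gauss(MEAN, SD) for _ in range(10_000)]

# Values far from the mean become rarer as the distance grows.
for k in (1, 2, 3):
    share = fraction_within(samples, MEAN, k * SD)
    print(f"within {k} standard deviation(s) of the mean: {share:.1%}")

# The sample lets us estimate the parameters of the underlying distribution.
print(f"estimated mean: {statistics.mean(samples):.1f}")
print(f"estimated standard deviation: {statistics.stdev(samples):.1f}")
```

For a normal distribution, roughly 68% of draws should fall within one standard deviation of the mean and roughly 95% within two, so the printed shares should land near those figures.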

Rule of Large Numbers

The rule of large numbers (more commonly called the law of large numbers) describes a simple relationship between the distribution of a sample and the distribution of the entire class of objects. In general, larger samples better represent the class as a whole.

For example, a fair six-sided die has an even probability distribution: in theory, each side has an equal probability of coming up when the die is rolled. If I were to roll the die ten times, the outcomes would probably not come up evenly, with some sides appearing much more frequently than others. If I were to roll the same die sixty times, the distribution would likely be much more even. The rule of large numbers states that the more I roll the die, the closer the observed proportions will come to the theoretical ones.
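
The dice example can be simulated directly. The following is a minimal sketch, assuming a fair six-sided die modeled with Python's `random` module; the function names `roll_shares` and `max_deviation` and the roll counts are hypothetical choices made for illustration.

```python
import random
from collections import Counter

def roll_shares(n_rolls, rng):
    """Roll a fair six-sided die n_rolls times; return each face's share of the rolls."""
    counts = Counter(rng.randint(1, 6) for _ in range(n_rolls))
    return {face: counts[face] / n_rolls for face in range(1, 7)}

def max_deviation(shares):
    """Largest gap between an observed share and the theoretical 1/6."""
    return max(abs(share - 1 / 6) for share in shares.values())

rng = random.Random(2013)  # fixed seed so the run is repeatable
for n in (10, 60, 600, 60_000):
    shares = roll_shares(n, rng)
    print(f"{n:>6} rolls: largest deviation from 1/6 = {max_deviation(shares):.4f}")
```

The deviation is not guaranteed to shrink at every step of a single run, but it tends toward zero as the number of rolls grows.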

The P Value

Because statistics makes use of probabilities, there is always a chance that an apparent relationship will emerge from data where no relationship actually exists. Most statistical methods produce a P value: the probability of seeing a result at least as strong as the one observed if the data contained nothing but random fluctuations. A low P value indicates that the outcome is unlikely to be explained by random fluctuations alone. A high P value means that the result cannot be distinguished from simple noise; it does not prove that no relationship exists.

It is important to pick a threshold for the P value (the significance level) prior to performing any statistical analysis. In general a P value of less than 5% is desired; however, for important or sensitive data, P values of 1% or lower are generally desired.
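
One way to see what a P value measures is a permutation test, sketched below. This is a swapped-in illustration rather than a method the page itself describes: it shuffles the group labels many times and counts how often chance alone produces a difference in means at least as large as the observed one. The data, the function name `permutation_p_value`, and the 5% threshold `ALPHA` are assumptions made for the example.

```python
import random
import statistics

def permutation_p_value(group_a, group_b, n_shuffles=10_000, seed=0):
    """Estimate a two-sided P value for the difference in group means.

    Null hypothesis: the group labels are irrelevant, so reshuffling them
    should often produce a difference as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(statistics.fmean(group_a) - statistics.fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (at_least_as_extreme + 1) / (n_shuffles + 1)

ALPHA = 0.05  # threshold chosen before looking at the data

# Hypothetical measurements, invented purely for illustration.
group_a = [5.1, 4.9, 5.6, 5.8, 6.0, 5.5, 5.3, 5.7]
group_b = [4.2, 4.8, 4.5, 4.1, 4.9, 4.4, 4.6, 4.3]

p = permutation_p_value(group_a, group_b)
print(f"P value = {p:.4f}; below the {ALPHA:.0%} threshold: {p < ALPHA}")
```

Fixing `ALPHA` before running the test mirrors the advice above: the threshold is part of the study design, not something to adjust after seeing the result.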

Interpreting Correlation

Statistics never produces a causal model of anything that it analyses. A strong statistical relationship between two data sets does not imply that one causes the other. It implies only that one is a good predictor of the other, or that the two values are somehow related. Statistical relationships cannot be used to prove that one object is causally linked to another, only that the two objects are somehow related.
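
A small simulation shows how a strong correlation can arise without either variable causing the other. In this sketch, which is an invented scenario and not data from the page, two variables `x` and `y` are each built from the same hidden factor plus independent noise; `pearson_r` is a hand-written helper for the Pearson correlation coefficient.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(7)  # fixed seed so the run is repeatable

# Hypothetical hidden factor that influences both measured variables.
hidden = [rng.gauss(0, 1) for _ in range(1_000)]

# x and y never influence each other; each is the hidden factor plus its own noise.
x = [h + rng.gauss(0, 0.5) for h in hidden]
y = [h + rng.gauss(0, 0.5) for h in hidden]

print(f"correlation between x and y: {pearson_r(x, y):.2f}")
```

The printed correlation is strong even though changing `x` would do nothing to `y`: each is a good predictor of the other only because both depend on the hidden factor.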


CIRCA:Basic Statistical Analysis

From CIRCA

Contents

Basic Statistical Analysis

Definition

What is Statistics

How Statistics Work

Random Number

Rule of Large Numbers

The P Value

Interpreting Correlation

Examples

T - Test

Linear Regression

References and Further Readings
