https://circa.cs.ualberta.ca/index.php?title=CIRCA:Basic_Statistical_Analysis&feed=atom&action=history
CIRCA:Basic Statistical Analysis - Revision history
2024-03-19T08:12:01Z
Revision history for this page on the wiki
MediaWiki 1.15.1
https://circa.cs.ualberta.ca/index.php?title=CIRCA:Basic_Statistical_Analysis&diff=4297&oldid=prev
Recharti: /* Rule of Large Numbers */
2013-04-11T04:27:54Z
<p><span class="autocomment">Rule of Large Numbers</span></p>
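The dice paragraph in this revision can be illustrated with a short simulation. The R sketch below is illustrative only: it assumes a fair six-sided die, and the function name `roll_frequencies`, the seed, and the sample sizes are arbitrary choices that are not part of the original article. For each number of rolls it prints the largest gap between any side's observed frequency and the theoretical 1/6; as the number of rolls grows, that gap tends to shrink, which is what the law of large numbers predicts.

```r
# Simulate rolls of a fair six-sided die and compare the observed
# frequency of each side with the theoretical probability 1/6.
set.seed(42)  # fixed seed so the run is reproducible

# Hypothetical helper: returns the observed frequency of sides 1..6.
roll_frequencies <- function(n_rolls) {
  rolls <- sample(1:6, n_rolls, replace = TRUE)
  # factor() with explicit levels keeps sides that never came up (count 0)
  counts <- table(factor(rolls, levels = 1:6))
  as.numeric(counts) / n_rolls
}

for (n in c(10L, 60L, 10000L)) {
  freqs <- roll_frequencies(n)
  # largest gap between any side's observed frequency and 1/6
  max_gap <- max(abs(freqs - 1/6))
  cat(sprintf("%6d rolls: max deviation from 1/6 = %.3f\n", n, max_gap))
}
```

Returning frequencies rather than raw counts makes runs of different lengths directly comparable.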
<table style="background-color: white; color:black;">
<col class='diff-marker' />
<col class='diff-content' />
<col class='diff-marker' />
<col class='diff-content' />
<tr valign='top'>
<td colspan='2' style="background-color: white; color:black;">← Older revision</td>
<td colspan='2' style="background-color: white; color:black;">Revision as of 04:27, 11 April 2013</td>
</tr>
<tr><td colspan="2" class="diff-lineno">Line 25:</td>
<td colspan="2" class="diff-lineno">Line 25:</td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>For example, a fair die has an even probability distribution: in theory, each side has an equal chance of coming up when it is rolled. If I roll a die ten times, the observed distribution will probably not be even, with some outcomes coming up much more frequently than others. If I roll the same die sixty times, the distribution will usually be much more even. The rule of large numbers states that the more often I roll the die, the closer the observed frequency of each side will come to its theoretical probability of 1/6.</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>For example, a fair die has an even probability distribution: in theory, each side has an equal chance of coming up when it is rolled. If I roll a die ten times, the observed distribution will probably not be even, with some outcomes coming up much more frequently than others. If I roll the same die sixty times, the distribution will usually be much more even. The rule of large numbers states that the more often I roll the die, the closer the observed frequency of each side will come to its theoretical probability of 1/6.</div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;"></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;">[[Image:Dice.png|Dice distribution]]</ins></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>==== The P Value ====</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>==== The P Value ====</div></td></tr>
<!-- diff generator: internal 2024-03-19 08:12:01 -->
</table>
Recharti
https://circa.cs.ualberta.ca/index.php?title=CIRCA:Basic_Statistical_Analysis&diff=4295&oldid=prev
Recharti at 04:24, 11 April 2013
2013-04-11T04:24:00Z
<p></p>
<table style="background-color: white; color:black;">
<col class='diff-marker' />
<col class='diff-content' />
<col class='diff-marker' />
<col class='diff-content' />
<tr valign='top'>
<td colspan='2' style="background-color: white; color:black;">← Older revision</td>
<td colspan='2' style="background-color: white; color:black;">Revision as of 04:24, 11 April 2013</td>
</tr>
<tr><td colspan="2" class="diff-lineno">Line 1:</td>
<td colspan="2" class="diff-lineno">Line 1:</td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div><del style="color: red; font-weight: bold; text-decoration: none;">[This page is not finished.]</del></div></td><td colspan="2"> </td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>= Basic Statistical Analysis =</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>= Basic Statistical Analysis =</div></td></tr>
</table>
Recharti
https://circa.cs.ualberta.ca/index.php?title=CIRCA:Basic_Statistical_Analysis&diff=4294&oldid=prev
Recharti: /* What is Statistics */
2013-04-11T04:19:03Z
<p><span class="autocomment">What is Statistics</span></p>
<table style="background-color: white; color:black;">
<col class='diff-marker' />
<col class='diff-content' />
<col class='diff-marker' />
<col class='diff-content' />
<tr valign='top'>
<td colspan='2' style="background-color: white; color:black;">← Older revision</td>
<td colspan='2' style="background-color: white; color:black;">Revision as of 04:19, 11 April 2013</td>
</tr>
<tr><td colspan="2" class="diff-lineno">Line 5:</td>
<td colspan="2" class="diff-lineno">Line 5:</td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>== Definition ==</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>== Definition ==</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>==== What <del class="diffchange diffchange-inline">is </del>Statistics ====</div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>==== What <ins class="diffchange diffchange-inline">are </ins>Statistics ====</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>Statistics is a quantitative, numerical method that is primarily concerned with the collection and interpretation of data. Statistical methods are frequently used to find relationships between data sets, to extract important properties from data sets, and to produce visualizations.</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>Statistics is a quantitative, numerical method that is primarily concerned with the collection and interpretation of data. Statistical methods are frequently used to find relationships between data sets, to extract important properties from data sets, and to produce visualizations.</div></td></tr>
</table>
Recharti
https://circa.cs.ualberta.ca/index.php?title=CIRCA:Basic_Statistical_Analysis&diff=4293&oldid=prev
Recharti: /* References and Further Readings */
2013-04-11T04:16:25Z
<p><span class="autocomment">References and Further Readings</span></p>
<table style="background-color: white; color:black;">
<col class='diff-marker' />
<col class='diff-content' />
<col class='diff-marker' />
<col class='diff-content' />
<tr valign='top'>
<td colspan='2' style="background-color: white; color:black;">← Older revision</td>
<td colspan='2' style="background-color: white; color:black;">Revision as of 04:16, 11 April 2013</td>
</tr>
<tr><td colspan="2" class="diff-lineno">Line 73:</td>
<td colspan="2" class="diff-lineno">Line 73:</td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>==References and Further Readings==</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>==References and Further Readings==</div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;"></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;">[http://en.wikipedia.org/wiki/Statistics Statistics Wikipedia Article]</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;"></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;">[http://www.r-project.org/ R Project Website]</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;"></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;">[http://www.vassarstats.net/ VassarStats Website]</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;"></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;">[http://www.gutenberg.org/ Project Gutenberg]</ins></div></td></tr>
</table>
Recharti
https://circa.cs.ualberta.ca/index.php?title=CIRCA:Basic_Statistical_Analysis&diff=4292&oldid=prev
Recharti: /* T - Test */
2013-04-11T04:06:23Z
<p><span class="autocomment">T - Test</span></p>
<table style="background-color: white; color:black;">
<col class='diff-marker' />
<col class='diff-content' />
<col class='diff-marker' />
<col class='diff-content' />
<tr valign='top'>
<td colspan='2' style="background-color: white; color:black;">← Older revision</td>
<td colspan='2' style="background-color: white; color:black;">Revision as of 04:06, 11 April 2013</td>
</tr>
<tr><td colspan="2" class="diff-lineno">Line 70:</td>
<td colspan="2" class="diff-lineno">Line 70:</td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div> 125.7389 157.4202 </div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div> 125.7389 157.4202 </div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>The P value in this case is much lower than 1%, so there is a statistically significant difference between the average sentence length of Pride and Prejudice <del class="diffchange diffchange-inline">versus </del>Flatland.</div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>The P value in this case is much lower than 1%, so there is a statistically significant difference between the average sentence length of <ins class="diffchange diffchange-inline">'</ins>Pride and Prejudice<ins class="diffchange diffchange-inline">' and '</ins>Flatland<ins class="diffchange diffchange-inline">'</ins>.</div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div> </div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>==References and Further Readings==</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>==References and Further Readings==</div></td></tr>
</table>
Recharti
https://circa.cs.ualberta.ca/index.php?title=CIRCA:Basic_Statistical_Analysis&diff=4291&oldid=prev
Recharti: /* Linear Regression */
2013-04-11T04:05:31Z
<p><span class="autocomment">Linear Regression</span></p>
<table style="background-color: white; color:black;">
<col class='diff-marker' />
<col class='diff-content' />
<col class='diff-marker' />
<col class='diff-content' />
<tr valign='top'>
<td colspan='2' style="background-color: white; color:black;">← Older revision</td>
<td colspan='2' style="background-color: white; color:black;">Revision as of 04:05, 11 April 2013</td>
</tr>
<tr><td colspan="2" class="diff-lineno">Line 72:</td>
<td colspan="2" class="diff-lineno">Line 72:</td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>The P value in this case is much lower than 1%, so there is a statistically significant difference between the average sentence length of Pride and Prejudice versus Flatland.</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>The P value in this case is much lower than 1%, so there is a statistically significant difference between the average sentence length of Pride and Prejudice versus Flatland.</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div><del style="color: red; font-weight: bold; text-decoration: none;"></del></div></td><td colspan="2"> </td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div><del style="color: red; font-weight: bold; text-decoration: none;">===== Linear Regression =====</del></div></td><td colspan="2"> </td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>==References and Further Readings==</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>==References and Further Readings==</div></td></tr>
</table>
Recharti
https://circa.cs.ualberta.ca/index.php?title=CIRCA:Basic_Statistical_Analysis&diff=4290&oldid=prev
Recharti at 04:05, 11 April 2013
2013-04-11T04:05:12Z
<p></p>
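The P Value paragraph added in this revision can be made concrete with a permutation test, one simple way to estimate a P value (not necessarily the method the article has in mind). The sketch assumes the statistic of interest is the absolute difference between two sample means; the helper `perm_p_value`, the sample sizes, and the Poisson-distributed example data are all hypothetical. Shuffling the group labels simulates a world with no relationship between group and value, so the estimated P value is the fraction of shuffles that produce a difference at least as large as the observed one.

```r
# Hypothetical helper: permutation estimate of a two-sided P value for a
# difference in means. Statistic = absolute difference of sample means.
perm_p_value <- function(x, y, n_perm = 10000) {
  observed <- abs(mean(x) - mean(y))
  pooled <- c(x, y)
  n_x <- length(x)

  # Under "no relationship", group labels are arbitrary: reassign them
  # at random and recompute the statistic many times.
  perm_stats <- replicate(n_perm, {
    idx <- sample(length(pooled), n_x)
    abs(mean(pooled[idx]) - mean(pooled[-idx]))
  })

  # Fraction of shuffles at least as extreme as the observed difference;
  # the +1 terms count the observed labelling itself.
  (sum(perm_stats >= observed) + 1) / (n_perm + 1)
}

set.seed(7)
x <- rpois(30, lambda = 10)  # hypothetical sample 1
y <- rpois(30, lambda = 13)  # hypothetical sample 2
cat("permutation p-value:", perm_p_value(x, y), "\n")
```

The +1 correction keeps the estimate from ever being exactly zero, which would overstate the evidence from a finite number of shuffles.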
<table style="background-color: white; color:black;">
<col class='diff-marker' />
<col class='diff-content' />
<col class='diff-marker' />
<col class='diff-content' />
<tr valign='top'>
<td colspan='2' style="background-color: white; color:black;">← Older revision</td>
<td colspan='2' style="background-color: white; color:black;">Revision as of 04:05, 11 April 2013</td>
</tr>
<tr><td colspan="2" class="diff-lineno">Line 1:</td>
<td colspan="2" class="diff-lineno">Line 1:</td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;">[This page is not finished.]</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;"></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;">= Basic Statistical Analysis =</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;"></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;">== Definition ==</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;"></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;">==== What is Statistics ====</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;"></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;">Statistics is a quantitative, numerical method that is primarily concerned with the collection and interpretation of data. Statistical methods are frequently used to find relationships between data sets, to extract important properties from data sets, and to produce visualizations.</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;"></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;">Because statistics is primarily a numerical method, it produces facts only about the numbers themselves, not about the objects under study. A statistical relationship is only as good as the validity of the data being analyzed. A firm understanding of both the statistical method and how to interpret its numerical results is important in order to get the most out of any statistical analysis.</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;"></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;">== How Statistics Work ==</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;"></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;">Statistics makes a number of broad assumptions about any data set being studied. The most important of these are discussed below.</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;"></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;">==== Random Number ====</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;"></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;">Statistics are generally collected from a representative sample of a larger group of objects. Statistical methods assume that each measurement is a single randomly generated value drawn from a probability distribution associated with the larger group. The most commonly assumed probability distribution is the normal distribution, under which most measurements lie close to the class's mean value; variations from the mean are possible but become rarer as the distance from the mean increases. One of the main goals of statistics is to find this probability distribution and use it to make inferences about objects outside the representative sample.</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;"></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;">There are numerous other mathematical distributions available for study.</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;"></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;">==== Rule of Large Numbers ====</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;"></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;">The rule of large numbers (more commonly called the law of large numbers) describes a simple relationship between the distribution of a sample and the distribution of the entire class of objects: in general, larger samples better represent the class as a whole.</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;"></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;">For example, a fair die has an even probability distribution: in theory, each side has an equal chance of coming up when it is rolled. If I roll a die ten times, the observed distribution will probably not be even, with some outcomes coming up much more frequently than others. If I roll the same die sixty times, the distribution will usually be much more even. The rule of large numbers states that the more often I roll the die, the closer the observed frequency of each side will come to its theoretical probability of 1/6.</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;"></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;">==== The P Value ====</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;"></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;">Because statistics makes use of probabilities, there is always a chance that apparent relationships will emerge from data where no relationship actually exists. Most statistical tests produce a P value: the probability of obtaining a result at least as extreme as the one observed if there were in fact no relationship (the null hypothesis). A low P value indicates that the outcome is unlikely to be explained by random fluctuations in the data alone. A high P value means the data provide no good evidence of a relationship beyond simple noise, although it does not prove that no relationship exists.</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;"></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;">It is important to choose a significance threshold (the P value below which a result will be treated as significant) before performing any statistical analysis. In general, a threshold of 5% is used; for important or sensitive data, thresholds of 1% or lower are generally preferred.</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;"></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;">==== Interpreting Correlation ====</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;"></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;">Statistics alone never produces a causal model of anything it analyzes. A strong statistical relationship between two data sets does not imply that one causes the other; it implies only that one is a good predictor of the other, or that the two values are related in some way. Statistical relationships therefore cannot prove that one object is causally linked to another, only that the two are related.</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;"></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;">== Examples ==</ins></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>===== T - Test =====</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>===== T - Test =====</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td colspan="2" class="diff-lineno">Line 32:</td>
<td colspan="2" class="diff-lineno">Line 71:</td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>The P value in this case is much lower than 1%, so there is a statistically significant difference between the average sentence length of Pride and Prejudice versus Flatland.</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>The P value in this case is much lower than 1%, so there is a statistically significant difference between the average sentence length of Pride and Prejudice versus Flatland.</div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;"></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;"></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;">===== Linear Regression =====</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;"></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;">==References and Further Readings==</ins></div></td></tr>
</table>
Recharti
https://circa.cs.ualberta.ca/index.php?title=CIRCA:Basic_Statistical_Analysis&diff=4289&oldid=prev
Recharti at 04:02, 11 April 2013
2013-04-11T04:02:34Z
<p></p>
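The T-Test paragraph in this revision compares average lengths measured in two texts. The R sketch below shows only the statistical step and skips the text processing: the two vectors are simulated, hypothetical lengths, not measurements from 'Pride and Prejudice' or 'Flatland', and drawing them from a Poisson distribution is an assumption made only to get non-negative whole numbers. `t.test()` is R's built-in function; by default it runs the Welch two-sample test, which does not assume equal variances. The 1% threshold matches the one used in the article's conclusion and is fixed before the test is run.

```r
# Minimal two-sample t-test sketch. The lengths below are SIMULATED
# (hypothetical); they are not taken from either novel.
set.seed(1)
alpha <- 0.01  # significance threshold, chosen before running the test

book_a <- rpois(200, lambda = 125)  # hypothetical lengths, book A
book_b <- rpois(200, lambda = 157)  # hypothetical lengths, book B

result <- t.test(book_a, book_b)  # Welch two-sample t-test (R's default)

print(result$estimate)  # sample means of book_a and book_b
print(result$p.value)

if (result$p.value < alpha) {
  cat("Reject the null hypothesis of equal means at alpha =", alpha, "\n")
} else {
  cat("No significant difference in means at alpha =", alpha, "\n")
}
```

With real data, `book_a` and `book_b` would be replaced by the per-sentence or per-word lengths extracted from each book.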
<table style="background-color: white; color:black;">
<col class='diff-marker' />
<col class='diff-content' />
<col class='diff-marker' />
<col class='diff-content' />
<tr valign='top'>
<td colspan='2' style="background-color: white; color:black;">← Older revision</td>
<td colspan='2' style="background-color: white; color:black;">Revision as of 04:02, 11 April 2013</td>
</tr>
<tr><td colspan="2" class="diff-lineno">Line 1:</td>
<td colspan="2" class="diff-lineno">Line 1:</td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div><del class="diffchange diffchange-inline">[This page is not finished.]</del></div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline">===== T - Test =====</ins></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div><del class="diffchange diffchange-inline">= Basic Statistical Analysis =</del></div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline">One statistic useful in text analysis is the average length of the words used. Similar vocabularies will generally produce similar distributions of word length, measured in characters. A test for authorship, for example, could assume that all works by the same author have a similar vocabulary and therefore a similar average word length. A t-test is a statistical method that compares the means of two samples and asks whether the difference between them is larger than would be expected by chance if both samples came from populations with the same mean. The R example below applies the same test to average sentence length, measured in characters per sentence. </ins></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div><del class="diffchange diffchange-inline">== Definition ==</del></div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline">Using R with the openNLP package (the sentDetect() function below comes from older openNLP releases):</ins></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
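<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>==== <del class="diffchange diffchange-inline">What is Statistics ==</del>==</div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline"> library(openNLP)  # older openNLP releases provide sentDetect()</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline"> # import two books from Project Gutenberg</ins></div></td></tr>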
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline"> pride <- scan(file</ins>=<ins class="diffchange diffchange-inline">"http://www.gutenberg.org/cache/epub/1342/pg1342.txt", what</ins>=<ins class="diffchange diffchange-inline">'char', sep</ins>=<ins class="diffchange diffchange-inline">"\n")</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline"> flat <- scan(file</ins>=<ins class="diffchange diffchange-inline">"http://www.gutenberg.org/cache/epub/97/pg97.txt", what</ins>=<ins class="diffchange diffchange-inline">'char', sep</ins>=<ins class="diffchange diffchange-inline">"\n")</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline"> </ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline"> # split the text into sentences</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline"> pride.sen <- sentDetect(pride)</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline"> flat.sen <- sentDetect(flat)</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline"> </ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline"> # get the character count of each sentence</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline"> pride.nchar <- nchar(pride.sen)</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline"> flat.nchar <- nchar(flat.sen)</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline"> </ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline"> #perform T-test</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline"> t.test(pride.nchar, flat.nchar)</ins></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div><del class="diffchange diffchange-inline">Statistics is a quantitative and numerical method that is primarily involved with the collection and interpretation of statistical data. Statistics are frequently used to find relationships between data sets, extracting important properties from data sets, and visualizations.</del></div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline">This produces the following output</ins>:</div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div> </div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div></div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div><del class="diffchange diffchange-inline">As Stats are primarily a numerical method, they only produce facts about the numbers themselves and not the objects under study. A statistical relationship is only </del>as <del class="diffchange diffchange-inline">good as the validity of the data being analyzed. A firm understanding of both the statistical method and the numerical interpretations are important in order to get the most out of any statistical analysis.</del></div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div></div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div> </div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div></div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div><del class="diffchange diffchange-inline">== How Statistics Work ==</del></div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div></div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div> </div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div></div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div><del class="diffchange diffchange-inline">Statistics makes a number of very broad assumptions about any data set that is being studied. The most important assumptions will be discussed below.</del></div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div></div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div> </div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div></div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div><del class="diffchange diffchange-inline">==== Random Number ====</del></div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div></div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div> </div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div></div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div><del class="diffchange diffchange-inline">Statistics are generally collected from a representative sample of a larger group of object. Statistics always assumes that each measurement is a single randomly generated value that follows a distribution property associated with the larger group. The most commonly assumed probability distribution is the standard distribution which assumes that most measurements will be close to the classes mean value; variations from the mean are possible but get rarer as distance from the mean increases. One of the main goals of statistics is to find this probability distribution and use it to make inferences about objects outside the representative set. </del></div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div></div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div> </div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div></div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div><del class="diffchange diffchange-inline">There are numerous other mathematical distribution available for study. </del></div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div></div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div> </div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div></div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div><del class="diffchange diffchange-inline">==== Rule of Large Numbers ====</del></div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div></div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div> </div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div></div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div><del class="diffchange diffchange-inline">The rule of large numbers is a simple relationship between the distribution of a sample set, and the distribution of the entire class of objects. In general, larger sample sizes better represent the class as a whole.</del></div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div></div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div> </div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div></div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div><del class="diffchange diffchange-inline">For a dice has an even probability distribution. In theory each side of a dice has equal probability of coming up if rolled. If I were to roll ten dice, then the probability distribution will probably not come up even with some outcomes coming up much more frequently then others. If I were to roll the same dice sixty times, the distribution would be much more even. The rule of large numbers stats that the more I roll the dice, the closer the outcomes will be.</del></div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div></div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div> </div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div></div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div><del class="diffchange diffchange-inline">==== The P Value ====</del></div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div></div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div> </div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div></div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div><del class="diffchange diffchange-inline">Because statistics makes use of probabilities, there is always a chance that perceived relationships can emerge from data where no relationships actually exists. Most statistical methods will produce a P-value which is effectively the probability of such a coincidence. A low P value indicates a high confidence that the outcome does not represent random fluctuations in the data. High p-values represent low confidence and indicate no relationship beyond simple noise.</del></div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div></div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div> </div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div></div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div><del class="diffchange diffchange-inline">It is important to pick a P value prior to performing any statistical analysis. In general a P value less then 5% is desired; however, for important or sensitive data P values of 1% or lower are generally desired.</del></div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div></div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div> </div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div></div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div><del class="diffchange diffchange-inline">==== Interpreting Correlation ====</del></div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div></div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div> </div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div></div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div><del class="diffchange diffchange-inline">Statistics never produce a causal model of anything that it analyses. A strong statistical relationship between two data sets does not imply that one causes the other. It only implies that one is a good predictor of the other, or that these two values are somehow related. Statistical relationships cannot be used to prove that one object is causally linked to another object; only that these two objects are somehow related</del>.</div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div></div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div> </div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div></div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div><del class="diffchange diffchange-inline">==Examples ==</del></div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div></div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div> </div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div></div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div><del class="diffchange diffchange-inline">===== T - Test =====</del></div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>===<del class="diffchange diffchange-inline">== Linear Regression =====</del></div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline"> data: pride.nchar and flat.nchar </ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline"> t </ins>= <ins class="diffchange diffchange-inline">-8.1743, df </ins>= <ins class="diffchange diffchange-inline">1713.104, p-value </ins>= <ins class="diffchange diffchange-inline">5.726e-16</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline"> alternative hypothesis: true difference in means is not equal to 0 </ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline"> 95 percent confidence interval:</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline"> -39.28293 -24.07959 </ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline"> sample estimates:</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline"> mean of x mean of y </ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline"> 125.7389 157.4202 </ins></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div><del class="diffchange diffchange-inline">==References </del>and <del class="diffchange diffchange-inline">Further Readings==</del></div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline">The P-value in this case (5.726e-16) is far lower than 1%, so the difference between the average sentence length of Pride </ins>and <ins class="diffchange diffchange-inline">Prejudice and that of Flatland is statistically significant. The sample means put Pride and Prejudice at about 125.7 characters per sentence and Flatland at about 157.4, and the 95 percent confidence interval places Flatland's sentences roughly 24 to 39 characters longer on average.</ins></div></td></tr>
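<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>The opening paragraph of this section motivates the t-test with average word length, while the code above measures characters per sentence. A minimal sketch of the word-length version in base R (no openNLP needed) is given below. The helper name word_lengths(), the definition of a word as a run of letters and apostrophes, and the two toy text samples are hypothetical choices made for illustration, not part of the original analysis.</div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>
```r
# Hypothetical helper: split text into words and return the length of each
# word in characters. Here a "word" is any run of letters or apostrophes.
word_lengths <- function(lines) {
  words <- unlist(strsplit(tolower(lines), "[^a-z']+"))
  words <- words[nzchar(words)]  # drop empty strings left by leading punctuation
  nchar(words)
}

# Two made-up text samples, used only to show the mechanics
short_text <- c("The cat sat on a mat.", "A dog ran to the big red box.")
long_text  <- c("Extraordinary circumstances necessitated considerable deliberation.",
                "Mathematicians investigated multidimensional geometries.")

short_len <- word_lengths(short_text)
long_len  <- word_lengths(long_text)

# Welch two-sample t-test (the default in R) on the per-word character counts
result <- t.test(short_len, long_len)
print(result)
```
</div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>With the pride and flat vectors read by scan() above, the same comparison would be t.test(word_lengths(pride), word_lengths(flat)). The Project Gutenberg files also contain license text at the start and end, which a careful analysis would remove first.</div></td></tr>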
</table>Recharti https://circa.cs.ualberta.ca/index.php?title=CIRCA:Basic_Statistical_Analysis&diff=4288&oldid=prev Recharti: /* How Statistics Work */ 2013-04-11T03:02:53Z<p><span class="autocomment">How Statistics Work</span></p>
<table style="background-color: white; color:black;">
<col class='diff-marker' />
<col class='diff-content' />
<col class='diff-marker' />
<col class='diff-content' />
<tr valign='top'>
<td colspan='2' style="background-color: white; color:black;">← Older revision</td>
<td colspan='2' style="background-color: white; color:black;">Revision as of 03:02, 11 April 2013</td>
</tr>
<tr><td colspan="2" class="diff-lineno">Line 17:</td>
<td colspan="2" class="diff-lineno">Line 17:</td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>==== Random Number ====</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>==== Random Number ====</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>Statistics <del class="diffchange diffchange-inline">assumes that all measurments </del>are <del class="diffchange diffchange-inline">random</del>. <del class="diffchange diffchange-inline">The true </del>measurement <del class="diffchange diffchange-inline">of any numerical value </del>is <del class="diffchange diffchange-inline">not </del>a <del class="diffchange diffchange-inline">specific </del>value <del class="diffchange diffchange-inline">but instead </del>follows <del class="diffchange diffchange-inline">some form of probability </del>distribution. <del class="diffchange diffchange-inline">In the general case, </del>the standard distribution <del class="diffchange diffchange-inline">is assumed</del>; <del class="diffchange diffchange-inline">however, there </del>are numerous other mathematical <del class="diffchange diffchange-inline">distributions </del>available for <del class="diffchange diffchange-inline">comparison</del>. </div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>Statistics are <ins class="diffchange diffchange-inline">generally collected from a representative sample of a larger group of objects</ins>. <ins class="diffchange diffchange-inline">Most statistical methods assume that each </ins>measurement is a <ins class="diffchange diffchange-inline">randomly generated </ins>value <ins class="diffchange diffchange-inline">that </ins>follows <ins class="diffchange diffchange-inline">a probability </ins>distribution <ins class="diffchange diffchange-inline">associated with the larger group</ins>. <ins class="diffchange diffchange-inline">The most commonly assumed probability distribution is </ins>the <ins class="diffchange diffchange-inline">normal (Gaussian) </ins>distribution<ins class="diffchange diffchange-inline">, in which most measurements are close to the group's mean value</ins>; <ins class="diffchange diffchange-inline">variations from the mean are possible but become rarer as the distance from the mean increases. One of the main goals of statistics is to estimate this probability distribution and use it to make inferences about objects outside the sample. </ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div> </div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline">There </ins>are numerous other mathematical <ins class="diffchange diffchange-inline">distributions </ins>available for <ins class="diffchange diffchange-inline">study</ins>. </div></td></tr>
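<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>A short R sketch of the normal-distribution assumption described above: it simulates measurements and checks how many fall within one, two, and three standard deviations of the mean. The mean of 50, standard deviation of 10, sample size, and random seed are arbitrary illustrative values, not taken from any real data set.</div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>
```r
# Simulate 100,000 "measurements" from a normal distribution.
# mean = 50 and sd = 10 are arbitrary illustrative values.
set.seed(1)
mu <- 50
sigma <- 10
x <- rnorm(100000, mean = mu, sd = sigma)

# Fraction of simulated measurements within k standard deviations of the mean
within_k_sd <- sapply(1:3, function(k) mean(abs(x - mu) < k * sigma))
names(within_k_sd) <- c("1 sd", "2 sd", "3 sd")
print(round(within_k_sd, 3))

# Theoretical fractions for a normal distribution, for comparison
theoretical <- pnorm(1:3) - pnorm(-(1:3))
print(round(theoretical, 3))
```
</div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>The simulated fractions should land close to the theoretical values of about 68.3%, 95.4%, and 99.7%, which is the sense in which large deviations from the mean become rarer.</div></td></tr>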
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>==== Rule of Large Numbers ====</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>==== Rule of Large Numbers ====</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div><del class="diffchange diffchange-inline">Statistics are generally collected for </del>a <del class="diffchange diffchange-inline">subset </del>of the <del class="diffchange diffchange-inline">group </del>of objects <del class="diffchange diffchange-inline">under study</del>. <del class="diffchange diffchange-inline">For example</del>, <del class="diffchange diffchange-inline">as it is nearly impossible to collect data on every book ever written, a representative subset can be studied instead. Any subset of a </del>larger <del class="diffchange diffchange-inline">dataset will not accuratly </del>represent the <del class="diffchange diffchange-inline">true distribution of data</del>.</div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline">The rule of large numbers (more commonly known as the law of large numbers) describes </ins>a <ins class="diffchange diffchange-inline">simple relationship between the distribution </ins>of <ins class="diffchange diffchange-inline">a sample and </ins>the <ins class="diffchange diffchange-inline">distribution of the entire class </ins>of objects. <ins class="diffchange diffchange-inline">In general</ins>, larger <ins class="diffchange diffchange-inline">samples better </ins>represent the <ins class="diffchange diffchange-inline">class as a whole</ins>.</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div><del class="diffchange diffchange-inline">The Rule </del>of <del class="diffchange diffchange-inline">large numbers governs the relationship between subsets and data sets</del>. <del class="diffchange diffchange-inline">It states that the larger the subset the more accuratly </del>the distribution <del class="diffchange diffchange-inline">of the subset </del>will <del class="diffchange diffchange-inline">reflect the larger set</del>. <del class="diffchange diffchange-inline">For example, rolling a six sided </del>dice sixty times <del class="diffchange diffchange-inline">will produce a distribution closer to </del>the <del class="diffchange diffchange-inline">actual </del>distribution <del class="diffchange diffchange-inline">then simply rolling </del>the <del class="diffchange diffchange-inline">same </del>dice <del class="diffchange diffchange-inline">ten times</del>.</div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline">A fair die has an even (uniform) probability distribution: in theory each side </ins>of <ins class="diffchange diffchange-inline">a die has an equal probability of coming up when it is rolled</ins>. <ins class="diffchange diffchange-inline">If I were to roll the die ten times, </ins>the <ins class="diffchange diffchange-inline">observed </ins>distribution <ins class="diffchange diffchange-inline">would probably not be even, with some outcomes coming up much more frequently than others</ins>. <ins class="diffchange diffchange-inline">If I were to roll the same die </ins>sixty times<ins class="diffchange diffchange-inline">, </ins>the distribution <ins class="diffchange diffchange-inline">would be much more even. The law of large numbers states that the more times I roll the die, the closer the observed frequency of each outcome will tend to be to its true probability of 1/6</ins>.</div></td></tr>
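<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>The dice example can be checked with a small R simulation. It rolls a simulated fair six-sided die an increasing number of times and reports the largest gap between any face's observed frequency and the expected 1/6. The roll counts, the helper name max_deviation(), and the random seed are arbitrary choices for illustration.</div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>
```r
set.seed(42)

# Hypothetical helper: roll a fair six-sided die n times and return the
# largest absolute gap between any face's observed frequency and 1/6.
max_deviation <- function(n) {
  rolls <- sample(1:6, size = n, replace = TRUE)
  # factor() with fixed levels keeps faces that never came up (count 0)
  freqs <- table(factor(rolls, levels = 1:6)) / n
  max(abs(freqs - 1/6))
}

for (n in c(10L, 60L, 600L, 60000L)) {
  cat(sprintf("%6d rolls: largest deviation from 1/6 = %.4f\n", n, max_deviation(n)))
}
```
</div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>The largest deviation typically shrinks as the number of rolls grows, although a single run is not guaranteed to improve at every step; the law of large numbers describes the long-run behavior.</div></td></tr>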
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>==== The P Value ====</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>==== The P Value ====</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>Because statistics makes use of probabilities, there is <del class="diffchange diffchange-inline">allways </del>a chance that <del class="diffchange diffchange-inline">percived </del>relationships can emerge from data where no <del class="diffchange diffchange-inline">relationship </del>actually exists. Most statistical methods will produce a P value which is <del class="diffchange diffchange-inline">effectivly </del>the probability of such a <del class="diffchange diffchange-inline">coincodence</del>. A low P value indicates a high <del class="diffchange diffchange-inline">probability </del>that <del class="diffchange diffchange-inline">there is real relationship in </del>the <del class="diffchange diffchange-inline">data while a high P value indicates that any percived relationship is probobly due to </del>random fluctuations in the data. </div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>Because statistics makes use of probabilities, there is <ins class="diffchange diffchange-inline">always </ins>a chance that <ins class="diffchange diffchange-inline">perceived </ins>relationships can emerge from data where no <ins class="diffchange diffchange-inline">relationship </ins>actually exists. Most statistical methods will produce a P<ins class="diffchange diffchange-inline">-</ins>value<ins class="diffchange diffchange-inline">,</ins> which is <ins class="diffchange diffchange-inline">the probability, assuming no real relationship exists, of observing a result at least as extreme as the one actually observed</ins>. A low P<ins class="diffchange diffchange-inline">-</ins>value indicates <ins class="diffchange diffchange-inline">that the outcome is unlikely to be explained by </ins>random fluctuations in the data<ins class="diffchange diffchange-inline"> alone. A high P-value means that the data are consistent with simple noise; it does not prove that no relationship exists</ins>.</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>It is important to pick a P value prior to performing any statistical analysis. In general a P value less then 5% is desired; however, for important or <del class="diffchange diffchange-inline">sensative </del>data P values <del class="diffchange diffchange-inline">lower then </del>1% <del class="diffchange diffchange-inline">and </del>lower are generally desired.</div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>It is important to <ins class="diffchange diffchange-inline">choose a significance threshold (the P-value below which a result will be treated as significant) </ins>prior to performing any statistical analysis. In general a <ins class="diffchange diffchange-inline">threshold of </ins>5% is <ins class="diffchange diffchange-inline">used</ins>; however, for important or <ins class="diffchange diffchange-inline">sensitive </ins>data<ins class="diffchange diffchange-inline">, thresholds </ins>of 1% or lower are generally <ins class="diffchange diffchange-inline">preferred</ins>.</div></td></tr>
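<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>To see what a significance threshold means in practice, the R sketch below repeatedly draws two samples from the same normal distribution, so that no real difference exists, and runs a t-test on each pair. The sample size of 30, the 10,000 repetitions, the helper name null_p_value(), and the seed are arbitrary assumptions.</div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>
```r
set.seed(123)

# Hypothetical helper: draw two samples from the SAME distribution, so the
# null hypothesis (no difference in means) is true, and return the p-value.
null_p_value <- function(n = 30) {
  a <- rnorm(n)
  b <- rnorm(n)
  t.test(a, b)$p.value
}

p_values <- replicate(10000, null_p_value())

# Share of "significant" results that are pure coincidence
cat("p < 0.05:", mean(p_values < 0.05), "\n")
cat("p < 0.01:", mean(p_values < 0.01), "\n")
```
</div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>Roughly 5% of these comparisons come out below 0.05 and roughly 1% below 0.01, even though every "relationship" found is a coincidence. The chosen threshold is therefore the rate of such false alarms that one is willing to accept.</div></td></tr>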
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>==== Interpreting <del class="diffchange diffchange-inline">Corelation </del>====</div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>==== Interpreting <ins class="diffchange diffchange-inline">Correlation </ins>====</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>A strong statistical relationship between two data sets does not imply that one causes the other. It only implies that one is a good predictor of the other, or that these two values are somehow related. Statistical relationships cannot be used to prove that one object is causally linked to another object; only that these two objects <del class="diffchange diffchange-inline">have a relationship that merits further study</del>.</div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline">Statistics never produce a causal model of anything that it analyses. </ins>A strong statistical relationship between two data sets does not imply that one causes the other. It only implies that one is a good predictor of the other, or that these two values are somehow related. Statistical relationships cannot be used to prove that one object is causally linked to another object; only that these two objects <ins class="diffchange diffchange-inline">are somehow related</ins>.</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>==Examples ==</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>==Examples ==</div></td></tr>
</table>Rechartihttps://circa.cs.ualberta.ca/index.php?title=CIRCA:Basic_Statistical_Analysis&diff=4287&oldid=prevRecharti: /* What Statistics is Statistics */2013-04-11T02:41:22Z<p><span class="autocomment">What Statistics is Statistics</span></p>
<table style="background-color: white; color:black;">
<col class='diff-marker' />
<col class='diff-content' />
<col class='diff-marker' />
<col class='diff-content' />
<tr valign='top'>
<td colspan='2' style="background-color: white; color:black;">← Older revision</td>
<td colspan='2' style="background-color: white; color:black;">Revision as of 02:41, 11 April 2013</td>
</tr>
<tr><td colspan="2" class="diff-lineno">Line 5:</td>
<td colspan="2" class="diff-lineno">Line 5:</td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>== Definition ==</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>== Definition ==</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>==== What <del class="diffchange diffchange-inline">Statistics </del>is Statistics ====</div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>==== What is Statistics ====</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>Statistics <del class="diffchange diffchange-inline">involve both </del>the collection of <del class="diffchange diffchange-inline">data, and numerous methods for analyzing and interpreting the </del>data. Statistics <del class="diffchange diffchange-inline">is useful for infering relationship </del>between data sets, extracting properties from data sets, and <del class="diffchange diffchange-inline">visualizing patterns in data sets</del>.</div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>Statistics <ins class="diffchange diffchange-inline">is a quantitative and numerical method that is primarily involved with </ins>the collection <ins class="diffchange diffchange-inline">and interpretation </ins>of <ins class="diffchange diffchange-inline">statistical </ins>data. Statistics <ins class="diffchange diffchange-inline">are frequently used to find relationships </ins>between data sets, extracting <ins class="diffchange diffchange-inline">important </ins>properties from data sets, and <ins class="diffchange diffchange-inline">visualizations</ins>.</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div><del class="diffchange diffchange-inline">Those involved in Statistics should be careful to note that the relationship between statistical fact and </del>facts about the objects under study is only as good as the <del class="diffchange diffchange-inline">relationship between </del>the data <del class="diffchange diffchange-inline">set and the objects under study</del>. <del class="diffchange diffchange-inline">Statistics never produces facts about </del>the <del class="diffchange diffchange-inline">objects directly </del>and <del class="diffchange diffchange-inline">only analyses </del>the <del class="diffchange diffchange-inline">data under study. Statistical analysis is just as biased as the data being analysed, yet this bias is difficult </del>to <del class="diffchange diffchange-inline">see from </del>the <del class="diffchange diffchange-inline">perspective of someone without a firm understanding of the method </del>of statistical analysis. </div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline">As Stats are primarily a numerical method, they only produce </ins>facts about <ins class="diffchange diffchange-inline">the numbers themselves and not </ins>the objects under study<ins class="diffchange diffchange-inline">. A statistical relationship </ins>is only as good as the <ins class="diffchange diffchange-inline">validity of </ins>the data <ins class="diffchange diffchange-inline">being analyzed</ins>. 
<ins class="diffchange diffchange-inline">A firm understanding of both </ins>the <ins class="diffchange diffchange-inline">statistical method </ins>and the <ins class="diffchange diffchange-inline">numerical interpretations are important in order </ins>to <ins class="diffchange diffchange-inline">get </ins>the <ins class="diffchange diffchange-inline">most out </ins>of <ins class="diffchange diffchange-inline">any </ins>statistical analysis.</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>== How Statistics Work ==</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>== How Statistics Work ==</div></td></tr>
</table>Recharti