
2010 Biostatistics 08: The Normal Distribution

ORIGIN := 0

The Normal Distribution, also known as the "Gaussian Distribution" or "bell curve", is the most widely employed function relating observations X with probability P(X) in statistics. Many natural populations are approximately normally distributed, as are several important derived quantities even when the original population is not normally distributed.

Properly speaking, the Normal Distribution is a continuous "probability density function", meaning that values of a random variable X may take on any numerical value, not just discrete values. In addition, because the possible values of X are infinite, the "exact" probability P(X) for any single X is zero. Thus, in order to determine probabilities one typically looks at intervals of X, such as X > 2.3 or 1 < X < 2, and so forth. It is interesting to note that because the point probability P(X) = 0, we don't have to worry about correctly interpreting pesky boundaries, as seen in discrete distributions, since X > 2 means the same thing as X ≥ 2, and X < 2 is the same as X ≤ 2.

As described previously, the Normal distribution N(μ, σ²) consists of a family of curves that are specified by supplying values for two parameters: μ = the mean of the Normal population, and σ² = the variance of the same population.

Prototyping the Normal Function using the Gaussian formula, making the plot of N(50, 100):

μ := 50      < specifying mean (μ)
σ² := 100    < specifying variance (σ²)
i := 0 .. 100
X_i := i     < defining a bunch of X's ranging in value from 0 to 100. Remember that the range of X is infinite, but we'll plot 101 points here. That should give us enough points to show the shape of the Gaussian function!

Y1_i := (1/(σ·√(2·π))) · e^(−(X_i − μ)² / (2·σ²))    < formula for the Normal distribution; here we have computed P(X) for each of our X's. Zar 2010 Eq. 6.1, p. 66.

Now, let's compare with MathCad's built-in function:

Y2_i := dnorm(X_i, μ, σ)    < MathCad's function asks us to provide the standard deviation rather than the variance...

[Plot: Y1 and Y2 versus X from 0 to 100 - the two curves coincide.] The two approaches give the same probability function P(X) for X, so this prototype confirms the built-in function.

Prototype in R:
dnorm(x,mu,sigma)
^ R has a nearly identical function; see Lecture Worksheet 07.

What happens when μ or σ² is changed: the location of the mode changes (translation of μ), and the width of the hump changes, showing greater or lesser variance - see 2010 Biostatistics Lecture Worksheet 07.

Simulation of Normally Distributed Data:

μ := 65
σ² := 625    σ := 25
X := rnorm(1000, μ, σ)

Descriptive Statistics for X:
n := length(X)    n = 1000
mean(X) = 63.5061
var(X) = 606.3107
Var(X) = 606.3107
< Note: MathCad has two functions: var(X) = population variance, Var(X) = sample variance.

[Plot: the 1000 simulated values X_i versus index i.]

^ The mean and variance of this sample are close to, but not exactly equal to, N(65, 625). This is to be expected of a sample, as opposed to the entire population.
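For comparison outside MathCad, here is a slightly fuller sketch of the same prototype in R (the grid of 101 X values and the parameters follow the worksheet; the variable names are only illustrative):

#GAUSSIAN FORMULA VS BUILT-IN FUNCTION (sketch):
mu=50
sigma=sqrt(100)        #variance 100, so standard deviation 10
X=0:100                #101 points, as in the worksheet
Y1=(1/(sigma*sqrt(2*pi)))*exp(-(X-mu)^2/(2*sigma^2))   #Zar 2010 Eq. 6.1
Y2=dnorm(X,mu,sigma)   #R's built-in density
max(abs(Y1-Y2))        #effectively 0: the two approaches agree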
Histogram of X:

plot := histogram(50, X)    < 50 bins

[Plot: histogram of X, frequency versus X.]

Prototype in R:
#CREATING A PSEUDORANDOM NORMAL DISTRIBUTION:
X=rnorm(1000,65,25)
hist(X,nclass=50,col="gray",border="red")
< R has a nearly identical function rnorm(n,mu,sigma), where n = number of points desired.

[Plot: Histogram of X from R, frequency versus X.]

Standardizing the Normal Distribution:

In many instances, we have a sample that we may wish to compare with a Normal Distribution. Using computer-based functions, as above, one has little difficulty calculating probabilities P(X) and simulating additional samples from a Normally Distributed population N(μ, σ²). When using published tables, however, it is often useful to compare probabilities with the Standard Normal Distribution ~N(0,1). This is done by standardizing the data: given your X's ~N(μ, σ²), you create a new variable Z ~N(0,1) by means of a linear transformation:

i := 0 .. 999
Z_i := (X_i − μ)/σ    < Z's are now standardized ~N(0,1)

mean(Z) = 0.0598    < sample estimates are close to, but not exactly equal to, N(0,1)
Var(Z) = 0.9701

Histogram of Z:

plot := histogram(50, Z)

[Plot: histogram of Z, frequency versus Z from about −3 to 4.]

Prototype in R:
#STANDARDIZING DATA:
mu=65
sigma=25
Z=(X-mu)/sigma
hist(Z,nclass=50,col="gray",border="red")

[Plot: Histogram of Z from R.]

Note: in both cases here, we had prior knowledge of μ and σ². With real-world data, we will have to estimate these values, usually with Xbar and s².

Calculating Probabilities & Quantiles:

The above graphs display the relationship between X values, or observations (also called quantiles), and the probability that a range (or bin) of X is expected to have, given the assumption of Normal probability for X, indicated as P(X). Most statistical software packages have standard "p" and "q" functions allowing conversion from X to P(X) and vice versa. In the most useful form, the probability function is given as a cumulative probability Φ(X), running from X values of minus infinity up to X. In each case, a specific cumulative probability function requires that one provide specific parameter values for the curve (μ, σ), along with X or Φ(X).
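As a quick sketch of this correspondence (the quantile 1.6449 is illustrative): pnorm() converts a quantile X into a cumulative probability Φ(X), and qnorm() converts Φ(X) back into X:

#P AND Q FUNCTIONS ARE INVERSES (sketch):
mu=0
sigma=1
X=1.6449
PHI=pnorm(X,mu,sigma)   #cumulative phi(X) from X, here about 0.95
qnorm(PHI,mu,sigma)     #recovers X, about 1.6449
dnorm(X,mu,sigma)       #the density P(X) at the same quantile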
Probabilities of the Normal Distribution and Cumulative Normal Distribution N(0,1):

i := 0 .. 100
X_i := (i − 50)/10    < scaling 101 X's to a reasonable range...
μ := 0
σ² := 1    < parameters of the Normal N(0,1) distribution...
Y3_i := dnorm(X_i, μ, σ)    < interval estimate of probability P(X) for each X
Y4_i := pnorm(X_i, μ, σ)    < cumulative probability Φ(X) for each X

[Plot: Normal Distribution and Cumulative Normal Distribution N(0,1) - Y3 and Y4 versus X.]

Prototype in R:
#PQ FUNCTIONS FOR NORMAL DISTRIBUTION:
mu=0
sigma=1
X=1.6449
PHI=0.90
dnorm(X,mu,sigma)     # interval estimate P(X) given X
pnorm(X,mu,sigma)     # cumulative phi(X) given X
qnorm(PHI,mu,sigma)   # X given cumulative phi(X)

Calculating Intervals of the Cumulative Normal Distribution:

μ := 0    σ := 1    < Normal distribution parameters (change these if desired)

Probability that X ranges between −1 and 1:
dnorm(−1, μ, σ) = 0.242      dnorm(1, μ, σ) = 0.242      < P(X)
pnorm(−1, μ, σ) = 0.1587     pnorm(1, μ, σ) = 0.8413     < Φ(X)
^ cumulative value at MIN of interval    ^ cumulative value at MAX of interval
pnorm(1, μ, σ) − pnorm(−1, μ, σ) = 0.6827    < calculating MAX cut-off − MIN cut-off: 68.27%

Probability that X ranges between −2.576 and 2.576:
dnorm(−2.576, μ, σ) = 0.0145     dnorm(2.576, μ, σ) = 0.0145    < P(X)
pnorm(−2.576, μ, σ) = 0.005      pnorm(2.576, μ, σ) = 0.995     < Φ(X)
pnorm(2.576, μ, σ) − pnorm(−2.576, μ, σ) = 0.99    < MAX cut-off − MIN cut-off: 99%

Probability that X ranges between −1.96 and 1.96:
dnorm(−1.96, μ, σ) = 0.0584     dnorm(1.96, μ, σ) = 0.0584    < P(X)
pnorm(−1.96, μ, σ) = 0.025      pnorm(1.96, μ, σ) = 0.975     < Φ(X)
pnorm(1.96, μ, σ) − pnorm(−1.96, μ, σ) = 0.95    < MAX cut-off − MIN cut-off: 95%

Prototype in R:
#EXAMPLE INTERVAL CALCULATIONS:
mu=0
sigma=1
MIN=pnorm(-1,mu,sigma)
MAX=pnorm(1,mu,sigma)
MAX-MIN
MIN=pnorm(-2.576,mu,sigma)
MAX=pnorm(2.576,mu,sigma)
MAX-MIN
MIN=pnorm(-1.96,mu,sigma)
MAX=pnorm(1.96,mu,sigma)
MAX-MIN
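The same functions also run in reverse: rather than verifying fixed cut-offs, one can ask qnorm() for the cut-offs giving any desired central coverage. A short sketch, with α = 0.10 chosen purely for illustration:

#CUT-OFFS FOR A CHOSEN CENTRAL COVERAGE (sketch):
mu=0
sigma=1
alpha=0.10
qnorm(c(alpha/2,1-alpha/2),mu,sigma)             #-1.6449 and 1.6449
pnorm(1.6449,mu,sigma)-pnorm(-1.6449,mu,sigma)   #0.90, confirming 90% coverage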
2010 Biostatistics 09: Assessing Data Normality

ORIGIN := 1

Assessing Normality of sample data is an essential part of statistical analysis. Q-Q plots are one easy way to do this. They are also interesting at this point in our course, since they demonstrate the use of the inverse cumulative probability function for the Normal Distribution.

Q-Q Plots:

Reading Anderson's Iris data:

iris := READPRN("c:/2010BiostatsData/iris.txt")
SL := iris⟨2⟩    < assigning variable SL
n := length(SL)    n = 150    < n = number of observations
i := 1 .. n    < constructing index variable
XbarSL := mean(SL)    XbarSL = 5.8433    < mean of X
SDSL := √Var(SL)    SDSL = 0.8281    < sample standard deviation of X
SESL := SDSL/√n    SESL = 0.0676    < standard error of the sample mean of X

Calculating Cumulative Probability levels Φ(X). We will look at variable SL here. First we sort SL:

SLsort := sort(SL)

[Table: SL and SLsort side by side. The sorted values begin 4.3, 4.4, 4.4, 4.4, 4.5, 4.6, 4.6, 4.6, 4.6, 4.7, 4.7, 4.8, 4.8, 4.8, 4.8, 4.8, ...]

Now we treat each index of SLsort as a quantile, and assign each observed value a Normal cumulative probability Φ(X):

Φ_i := (i − 1/2)/n    ^ the 1/2 here is a correction factor

[Table: the Φ values begin 0.0033, 0.01, 0.0167, 0.0233, 0.03, 0.0367, 0.0433, 0.05, 0.0567, 0.0633, 0.07, 0.0767, 0.0833, 0.09, 0.0967, 0.1033, ...]

From the values of Φ(X), we now convert back to X:

Q_i := qnorm(Φ_i, 0, 1)

Plotting SLsort vs Q:

[Plot: SLsort versus Q. The Q values begin −2.7131, −2.3263, −2.128, −1.9893, −1.8808, −1.7908, −1.7132, −1.6449, −1.5834, −1.5274, −1.4758, −1.4279, −1.383, −1.3408, −1.3008, −1.2628, ...]

If the sample data are distributed close to the Normal distribution, the Q-Q plot should be mostly a straight line in the center, with an overall S-shaped curve towards each end.

Prototype in R:
#READ IRIS TABLE AND ASSIGN VARIABLE SL
K=read.table("c:/2010BiostatsData/iris.txt")
attach(K)
SL=Sepal.Length
#LOAD PACKAGE - choose "lattice" from pop-up list

Note on kurtosis: a value > 3 is leptokurtic - there is a more acute peak at the mean and fatter tails.
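To round out the prototype, here is a minimal base-R sketch of the same manual Q-Q construction (the file path follows the worksheet; the built-in qqnorm() draws the equivalent picture in one call):

#MANUAL Q-Q PLOT (sketch):
K=read.table("c:/2010BiostatsData/iris.txt")
SL=K$Sepal.Length
n=length(SL)          #150
SLsort=sort(SL)
PHI=((1:n)-1/2)/n     #cumulative levels, with the 1/2 correction factor
Q=qnorm(PHI,0,1)      #convert phi(X) back to N(0,1) quantiles
plot(Q,SLsort)        #manual Q-Q plot
qqnorm(SL)            #built-in equivalent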
2010 Biostatistics 10: Repeated Sampling - Distribution of Means and Confidence Intervals

ORIGIN := 0

Given the general setup in statistics between a random variable X and the probability P(X) governed by a probability density function such as the Normal Distribution, one typically uses a specific random sample to estimate the population parameters. Estimation of this sort also involves considering what happens when a population is repeatedly sampled. One is particularly interested in the sampling distribution of repeated estimates, such as the mean, and how these estimates may be related to probability.

For the Normal Distribution, the population parameters are:
μ = population mean
σ² = population variance

From our sample, we have the analogous calculations, termed point estimates:
Xbar = sample mean
s² = sample variance

Different kinds of statistical theory underlie point estimates, generally allowing them to be categorized in one of two ways:
- "minimum variance" estimators, also known as "least squares", "unbiased", or "Normal theory" estimators, and
- "maximum likelihood" estimators.

How to calculate estimators of these two types is beyond the scope of introductory statistics courses. The important thing to remember is that the two methods of estimation often, but not always, yield the same point estimators. The point estimators then feed into specific statistical techniques. Thus, it is sometimes important to know which estimator is associated with a particular technique, so as not to mix approaches. Maximum likelihood estimators, based on newer theory, are often specifically indicated as such (often using "hat" notation). In the case of estimating parameters for the Normal Distribution, Xbar is the point estimate for μ under both estimation theories. However, s², the sum of squares with (n−1) as divisor, is the point estimate of σ² under Normal theory, whereas σ²hat, with the same sum of squares but using (n) as divisor, is the point estimate under maximum likelihood theory. Confusing, yes, but now that you know the difference, not all that bad...

Estimating error on point estimates of the mean:

Although Xbar is our Normal theory estimate of the population parameter μ based on a single sample, one might readily expect Xbar to differ from sample to sample, and it does. Thus, we need to estimate how much Xbar will vary from sample to sample. Multiple sampled means differ from each other much less than individual sample values of X do. The relationship is called the standard variance of the mean, and the square root of the variance of the mean is called the standard error of the mean, or simply the standard error:

Standard Variance of the Mean = sample variance / n = s²/n
Standard Error of the Mean (SEM) = sample standard deviation / √n = s/√n

Central Limit Theorem:

This result is one of the reasons why Normal theory, and the Normal Distribution, underlie much of "parametric" statistics. It says that although the populations from which random variable X is drawn may not necessarily be normally distributed, the population of means derived by replicate sampling will be normally distributed. This result allows us to use the Normal Distribution, with parameters μ and σ² estimated respectively by Xbar and s² (or occasionally σ²hat), to estimate probabilities of means P(X) for various values of X.

Statistics evaluating location of the mean:

Suppose we collect a sample from a population and calculate the mean Xbar. How reliable is Xbar as an estimate of μ? The usual approach is to estimate a difference (also called a distance) between Xbar and μ, scaled to the variability in Xbar encountered from one sample to the next:

Z := (Xbar − μ)/(σ/√n)    < distance divided by the Standard Error of the Mean

If somehow we know the population parameter σ, then we can resort directly to the standardized Normal Distribution ~N(0,1) to calculate probabilities P(Z) or cumulative probabilities Φ(Z). However, in real-life situations σ is not known, and we must estimate σ by s. When we do this, the analogous variable t:

t := (Xbar − μ)/(s/√n)    < the same standardizing approach, but using s instead of σ

is no longer Normally distributed. Instead, we resort to a new probability density function, known as "Student's t", to calculate P(t) or Φ(t) given t. Student's t is a commonly employed statistical function, ranking high in importance along with the chi-square distribution (χ²) and the F distribution. The Student's t distribution looks very much like the Normal distribution in shape, but is leptokurtic. Typically in statistical software, both distributions are supplied with analogous functions; see Lecture Worksheet 07 and the Prototype in R below for them. Although Zar in Chapter 6 prefers to talk only about the Normal distribution, by assuming he/we know σ, I think it may be clearer to talk about both together here. The arguments are identical, with the difference between them related to whether we know σ or whether we estimate σ by s.
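Before the function prototypes, here is a short simulation sketch of the repeated-sampling idea behind the SEM and the Central Limit Theorem (the parameters are illustrative): the standard deviation of replicate sample means should approach σ/√n, and the means themselves should look Normal:

#SAMPLING DISTRIBUTION OF THE MEAN (sketch):
mu=65
sigma=25
n=100
means=replicate(10000,mean(rnorm(n,mu,sigma)))
sd(means)               #close to 2.5
sigma/sqrt(n)           #2.5, the theoretical standard error of the mean
hist(means,nclass=50)   #approximately Normal, centered on mu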
Prototype in R:
#ANALOGOUS FUNCTIONS FOR
#NORMAL AND t DISTRIBUTIONS

#NORMAL DISTRIBUTION
mu=0       #parameter for mean
sigma=1    #parameter for standard deviation
n=1000     #number of randomly generated data points
X=1.96     #quantile X
P=0.95     #cumulative probability phi(X)
rnorm(n,mu,sigma)   #to generate random data points
dnorm(X,mu,sigma)   #P(X) from X
pnorm(X,mu,sigma)   #phi(X) from X
qnorm(P,mu,sigma)   #X from phi(X)

#t DISTRIBUTION
df=5      #degrees of freedom parameter
n=1000    #number of randomly generated data points
X=1.96    #quantile X
P=0.95    #cumulative probability phi(X)
rt(n,df)  #to generate random data points
dt(X,df)  #P(X) from X
pt(X,df)  #phi(X) from X
qt(P,df)  #X from phi(X)

Confidence Interval for the Mean:

A Confidence Interval (CI) for a sample mean of X (or equivalently in Z or t) is the estimated range over which repeated samples of Xbar (or Zbar or tbar) are expected to fall (1−α)×100% of the time. If a hypothesized value for the mean, say μ0, falls within a CI, then we say μ0 is "enclosed" or "captured" by the CI with a confidence of (1−α). Equivalently, for repeated samples, μ0 will be enclosed within the repeated CI's (1−α)×100 percent of the time.

Let's calculate a CI from a pseudo-random example:

X := rnorm(100, 50, 10)    < here, in fact, we know μ = 50 and σ² = 100
n := length(X)    n = 100
μ := 50    σ := √100 = 10    < known population standard deviation σ
Xbar := mean(X)    Xbar = 48.4955
s := √Var(X)    s² = 96.4487
< we can also pretend that we don't know the population parameters and must use the sample mean and variance instead, as one usually would with real data.

Calculation of Confidence Intervals:

α := 0.05    1 − α = 0.95    < we choose a limit probability allowing sample means to differ from μ, α × 100 percent of the time...

^ Since both the Normal and the t probability distributions are symmetrical, there are equal-sized tails above and below the hypothesized or known μ. Each tail therefore has α/2 probability. This is commonly known as the two-tail case...

If μ and σ are known - the Normal Distribution Case:

μ := 50    σ := 10    n := 100
L := qnorm(α/2, 0, 1)        L = −1.96    < lower limit of N(0,1) for α/2 = 0.025
U := qnorm(1 − α/2, 0, 1)    U = 1.96     < upper limit of N(0,1) for 1 − α/2 = 0.975
CI := (μ + L·σ/√n,  μ + U·σ/√n)    CI = (48.04, 51.96)

< calculating the Confidence Interval using population μ and σ. Note here that I calculated each tail explicitly, so I added both L and U to determine the CI. However, since the distribution is symmetrical, one might alternatively use C = the absolute value of L or U. In that case, one subtracts C·σ/√n from the mean for the lower limit and adds C·σ/√n to the mean for the upper limit. Note here that the Error of the Mean is derived from known population parameters.

If μ and σ are unknown - the t Distribution Case:

Parameters μ and σ must be estimated by the sample Xbar and s:

Xbar = 48.4955    s = 9.8208
df := n − 1    df = 99    < the single parameter of Student's t distribution, called "degrees of freedom": df = (n−1), where n is the sample size.
L := qt(α/2, df)        L = −1.9842    < α/2 = 0.025
U := qt(1 − α/2, df)    U = 1.9842     < 1 − α/2 = 0.975
CI := (Xbar + L·s/√n,  Xbar + U·s/√n)

< calculating the Confidence Interval. Note here that I calculated each tail explicitly, so I added both L and U to determine the CI.
Note also that the SEM here is measured by the sample quantity s/√n.

CI = (46.5468, 50.4441)

Prototype in R:
#CONFIDENCE INTERVALS
mu=50
sigma=10
n=100
X=rnorm(100,mu,sigma)
alpha=0.05

#NORMAL DISTRIBUTION
L=qnorm((alpha/2),0,1)
L
U=qnorm((1-alpha/2),0,1)
U
#confidence interval:
mu+L*(sigma/sqrt(n))
mu+U*(sigma/sqrt(n))

#t DISTRIBUTION
df=n-1
Xbar=mean(X)
s=sqrt(var(X))
L=qt((alpha/2),df)
L
U=qt((1-alpha/2),df)
U
#confidence interval, centered on the sample mean as above:
Xbar+L*(s/sqrt(n))
Xbar+U*(s/sqrt(n))

#NOTE: These values don't match MathCad
#because they are based on a different sample!
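As a cross-check, here is a brief sketch of the t case computed by hand and then with the built-in t.test(); since this draws a fresh pseudo-random sample, the numbers will again differ from the MathCad values:

#t-BASED CI, BY HAND AND WITH t.test() (sketch):
X=rnorm(100,50,10)
n=length(X)
Xbar=mean(X)
s=sd(X)
Xbar+qt(c(0.025,0.975),df=n-1)*s/sqrt(n)   #both CI limits by hand
t.test(X,conf.level=0.95)$conf.int         #the same interval from t.test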
2010 Biostatistics 11: Formal Statistical Tests

ORIGIN := 0

The Formal Logic of Statistical Tests

The biological literature is full of scientific research papers in which data that are presumably random samples of larger populations are collected. From these, sample descriptive statistics are calculated and summarized. The authors then proceed to advance one or more hypotheses concerning the problem under study. From this, usually in the Results section or associated tables, these hypotheses or related derived statistics are judged either to be statistically significant or insignificant, and often probability values and/or confidence intervals are reported. All of this, regarding hypotheses, significance, and confidence intervals, falls under the rubric of Inferential Statistics.

As an associate editor of a major journal and frequent reviewer, I very often receive papers to appraise that include inferential statistics. It is depressingly common to see results summarized in an incoherent fashion. Usually, incompletely labeled tables are presented that are strikingly similar to the output of one or another statistical "black box", with significance levels indicated by ** and so forth. However, it remains unclear just what the author(s) had in mind, or just what conclusions they or the reader are supposed to draw from the output. In reading the Materials & Methods section of the paper, these authors are often very precise about the software utilized (e.g., SPSS vers. xxx, such-and-such a procedure with whatever options chosen) but frustratingly vague about WHY a particular technique was chosen given their data, or WHAT their statistical hypotheses might have been, or even HOW the results derived from the "black box" relate to the conclusions they are trying to draw. Sometimes I come to the conclusion that the authors know what they are doing but are simply unclear in their presentation. In other instances, however, the authors are clearly relying too much on the "black box" to do the thinking for them. (As an aside, I tend to have fewer problems of this type with authors who use R. My guess is that in order to use R, one usually has to spend a little more time learning proper statistical technique...)

In conducting inferential statistics in biological research, therefore, it is very important to consider carefully, and be explicit about, the logic of what one is doing, and to provide readers of your papers with sufficient information that they can fill in the gaps where necessary. Most textbooks in statistics present this logic reasonably well, at least the first time it is encountered in the book. Many, including Zar, become a little sloppy thereafter, because they assume that they have already told you how the logic works (which they have) and are subsequently trying to add new issues into the mix along with, perhaps, an intuitive rationale. Also, in the case of Zar, the author is attempting to be comprehensive, necessitating brevity within the extended narrative of the book.

In my opinion, biologists conducting statistical analysis have the following multi-part problem:

1) First, one must state clearly just what biological hypothesis, or hypotheses (one at a time), are the subject of the study. Such hypotheses must be independent of, and preferably stated prior to, data collection.
2) Given the biological hypothesis, one must find an appropriate statistical procedure, or perhaps several, with underlying assumptions that qualify them as most readily applicable.
3) Data must be collected and analyzed in a way that is consistent with all of the assumptions of the chosen statistical procedure(s). The procedures typically follow a specific logic that must be understood and strictly followed.
4) Results then need to be presented in a way that respects the logic of each statistical test and allows for reconstruction of missing steps, when necessary, by potential readers.
5) Finally, and most importantly, there must be an explicit consideration of whether any of the statistical results actually mean anything as far as the original biological hypotheses are concerned.

Logic of Statistical Tests:

Here's an excellent framework to follow in conducting a statistical test. (The example comes from a one-sample t-test of the mean; we'll see this shortly.)

Assumptions:
- Observed values X1, X2, X3, ... Xn are a random sample from ~N(μ, σ²).
- Variance σ² of the population is unknown.

^ Each statistical test is only applicable to specific kinds of samples drawn from a population with specific properties. In this case, the data values X are a properly drawn random sample from a population that has a Normal Distribution with population parameters mean = μ and variance = σ², both unknown. The researcher needs to verify whether the data at hand might be drawn from a Normal Distribution. If so, then one can proceed. If not, the test is formally inapplicable. In many instances, however, tests may be robust to violations of one or more assumptions. For example, the t-test is reasonably robust to the assumption of a normally distributed population, so usually one can proceed as long as the sample isn't wildly non-Normal.

Hypotheses:

H0: μ = μ0    H1: μ ≠ μ0    < μ0 is a specified value for μ; two-sided test
or
H0: μ = μ0    H1: μ < μ0    < μ0 is a specified value for μ; one-sided test

^ Biological hypotheses are restated formally in a statistical test as statistical hypotheses. Statistical hypotheses consist of a matched pair of hypotheses that together comprise all possible events (i.e., outcomes) in the sample space (i.e., the set of all possible outcomes - see Lecture Worksheet 05). In other words, the probability of the union of the hypotheses is exactly 1.0. The pair of hypotheses consists of:
- the null hypothesis H0 - a biologically "uninteresting" hypothesis, often indicating no effect of treatments, random behavior, or otherwise non-biological results, and
- the alternative hypothesis H1 - a biologically "interesting" hypothesis, perhaps indicating a value or difference for a biological treatment, etc.

The general strategy of a statistical test is to use a probability distribution to determine whether H0 is likely or unlikely. If unlikely, we can reject H0 and in turn accept H1. Acceptance of H1 would then be a statistical decision based on the fact that H1 is the only alternative hypothesis presented in the test. Consideration of the biological interpretation of the test, and of multiple possible alternative explanations, comes later.
In some instances, statistical hypotheses are termed "two-sided" if two distinct possibilities are implicit in H1. For instance, in the two-sided statement of hypotheses above, H1 says that μ < μ0 or μ > μ0. By contrast, the one-sided statement of hypotheses above allows for only one possibility: μ < μ0.

Test Statistic:

t := (Xbar − μ0)/(s/√n)    < t is the normalized distance between means Xbar and μ0

^ A test statistic is a number calculated from the sample, used in making a statistical decision between H0 and H1. Test statistics are usually calculated so that one may consult a well-known statistical distribution. In this case, the value of the test statistic t will be compared with the t-distribution to find P(t). Note that the statistic t and the t-distribution are different things.

Sampling Distribution:

If the Assumptions hold and H0 is true, then t ~t(n−1).

^ Test statistics X are carefully chosen to have probabilities P(X) and cumulative probabilities Φ(X) that are understood. In this case, if H0 is true, then the t statistic is distributed according to Student's t-distribution with (n−1) degrees of freedom.

Critical Value of the Test:

α := 0.05    < the probability of Type I error must be explicitly set

^ See below for the definition of "Type I" or "α" error. This is a criterion for how stringent the test will be. Stringency, however, is a tradeoff with "Type II" or "β" error, as described below. Both types of error are dependent on the number of observations n.

C := qt(α, n − 1)

^ A probability P(X) is set above by α. From this, one needs to find the quantile, that is, an X value for which one has P(X) = α under some probability distribution. Standard statistical tables provide a way to find X from P(X), as do explicit functions built into modern statistical software. In both cases, one typically works with the cumulative probability function Φ(X). In the example above, since t ~t(n−1), we use the inverse cumulative t function qt() to find the Critical Value C. Note: C is a quantile - a cut-off value of the test statistic t.

Decision Rule:

IF t > C, THEN REJECT H0, OTHERWISE ACCEPT H0     < one-way case
IF |t| > C, THEN REJECT H0, OTHERWISE ACCEPT H0   < two-way case

^ The decision rule compares the calculated test statistic of the sample with the critical value C. If the rule is determined to be true, then H0 is rejected, and the alternative H1 is accepted for statistical purposes. Of course, upon rejecting H0, deciding whether H1 is the only viable biological hypothesis comes later.

Probability Value:

P := Φt(t)    < the probability of finding the test statistic t, given the Assumptions, if H0 is true

^ Although not part of the formal statistical test, it is common practice to provide a probability value P for the test statistic calculated in the test, assuming H0 to be true. In the case above, since t ~t(n−1) and we have statistic t, we use the cumulative probability function pt(t) to find P(t).

Common attributions for P:
IF P < 0.001: the result is regarded as highly significant.
IF 0.001 ≤ P < 0.01: the result is regarded as very significant.
IF 0.01 ≤ P < 0.05: the result is regarded as significant.
IF P ≥ 0.05: the result is regarded as not significant at the conventional α = 0.05 level.
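To tie the framework together, here is a worked sketch of the one-sample t-test in R, following the steps above. The data are the first ten SL values from the iris example, and μ0 = 5.0 is an illustrative hypothesized mean, not from the worksheet:

#ONE-SAMPLE t-TEST, STEP BY STEP (sketch):
X=c(5.1,4.9,4.7,4.6,5.0,5.4,4.6,5.0,4.4,4.9)
mu0=5.0                             #hypothesized mean under H0
n=length(X)
tstat=(mean(X)-mu0)/(sd(X)/sqrt(n)) #Test Statistic
alpha=0.05                          #Critical Value (two-sided case)
C=qt(1-alpha/2,df=n-1)
abs(tstat)>C                        #Decision Rule: if TRUE, reject H0
P=2*pt(-abs(tstat),df=n-1)          #two-sided Probability Value
t.test(X,mu=mu0)                    #built-in check: same t and P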