Determining Sample Size: The Relationship Between Significance, Power, Sample Size, and Effect Size
Congratulations, your experiment has yielded significant results! You can be sure (well, 95% sure) that the independent variable influenced your dependent variable. I guess all you have left to do is write up your discussion and submit your results to a scholarly journal. Right...?
Obtaining significant results is a tremendous accomplishment in itself, but it does not tell the entire story behind your results. In this article, I want to discuss statistical significance, sample size, statistical power, and effect size, all of which have an enormous impact on how we interpret our results.
Significance (p = 0.05)
First and foremost, let's discuss statistical significance, as it forms the cornerstone of inferential statistics. We'll discuss significance in the context of true experiments, as they are the most relevant and easily understood. A true experiment is used to test one or more specific hypotheses about the causal relationship between variables. Specifically, we hypothesize that one or more variables (i.e., independent variables) produce a change in another variable (i.e., the dependent variable). That change is our inferred causality. If you would like to learn more about the various research design types, visit my article ().
For example, suppose we want to test the hypothesis that an authoritative teaching style will produce higher test scores in students. To test this hypothesis accurately, we randomly select 2 groups of students and randomly assign each group to one of two classrooms. One classroom is taught by an authoritarian teacher and the other by an authoritative teacher. Throughout the semester, we collect all the test scores from both classrooms. At the end of the year, we average all the scores to produce a grand average for each classroom. Let's assume the average test score was 80% for the authoritarian classroom and 88% for the authoritative classroom. It would seem your hypothesis was correct: the students taught by the authoritative teacher scored on average 8 percentage points higher on their tests than the students taught by the authoritarian teacher. However, if we ran this experiment 100 times, each time with different groups of students, do you think we would obtain similar results? What is the likelihood that this effect of teaching style on test scores occurred by chance or was produced by another latent (i.e., unmeasured) variable? Last but not least, is an 8-point increase over 80% “high enough” to be meaningfully different?
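One way to make the “by chance” question concrete is a permutation test: shuffle the classroom labels many times and count how often a gap at least as large as the observed one appears when, by construction, teaching style plays no role. The sketch below is a minimal illustration under stated assumptions: the scores are hypothetical numbers chosen so that the means match the 80% and 88% in the example, and `permutation_p_value` is an illustrative helper, not a library function.

```python
import random
import statistics

def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in group means (illustrative helper)."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(group_b) - statistics.fmean(group_a))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Reassign students to classrooms at random, i.e. assume teaching style has no effect
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[n_a:]) - statistics.fmean(pooled[:n_a]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimate from being exactly zero
    return (extreme + 1) / (n_permutations + 1)

# Hypothetical test scores (%), not data from a real study
authoritarian = [72, 85, 78, 90, 74, 81, 79, 83, 76, 82]
authoritative = [88, 91, 84, 95, 80, 89, 86, 92, 85, 90]

p = permutation_p_value(authoritarian, authoritative)
gap = statistics.fmean(authoritative) - statistics.fmean(authoritarian)
print(f"difference = {gap:.1f} points, permutation p = {p:.4f}")
```

A small result means that random relabeling rarely produces an 8-point gap for these made-up scores; with noisier scores, the same 8-point gap could easily arise by chance.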
Null Hypothesis: The default hypothesis, which states that there is no difference between the groups. In our teaching style example, the null hypothesis predicts no difference in student test scores based on teaching style.
Alternative or Research Hypothesis: Our original hypothesis, which predicts that the authoritative teaching style will produce the highest average student test scores.
Now that we have set the stage, let's define what a p-value is and what it means for your results to be significant.
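The p-value is the probability of obtaining a result at least as extreme as the one observed, assuming the null hypothesis is true. It is not the probability that the null hypothesis is true, and it is not the same thing as alpha (α): alpha is the significance threshold you choose before running the test. Obtaining a significant result simply means the p-value from your statistical test was equal to or less than your alpha, which in most cases is 0.05.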
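As a rough sketch of this decision rule, the snippet below assumes a large-sample, two-sided z-test. It converts a test statistic into a p-value using the standard normal distribution from Python's `statistics` module and compares the result with alpha. The helper names are hypothetical.

```python
from statistics import NormalDist

def two_sided_p_value(z):
    """P(|Z| >= |z|) when the null hypothesis is true and Z ~ N(0, 1)."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

def is_significant(p_value, alpha=0.05):
    # Conventional decision rule: reject the null hypothesis when p <= alpha
    return p_value <= alpha

for z in (1.5, 2.0, 2.8):
    p = two_sided_p_value(z)
    print(f"z = {z}: p = {p:.4f}, "
          f"significant at 0.05: {is_significant(p)}, at 0.01: {is_significant(p, 0.01)}")
```

A statistic of z = 2.0 clears the 0.05 bar but not a stricter 0.01 bar, which is why the choice of alpha has to be made before looking at the data.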
An alpha of 0.05 is a common standard used in many areas of research.
A significant p-value (i.e., 0.05 or less) indicates that, if the null hypothesis were true, a difference at least as large as the one we observed would occur less than 5% of the time. In that case, we reject the null hypothesis in favor of our alternative hypothesis and conclude that the student test scores are significantly different from each other. Notice we didn't say the different teaching styles caused the significant differences in student test scores. The p-value only tells us whether or not the groups differ; we need to make the inferential leap to conclude that teaching style caused the groups to differ.
Another way of looking at alpha: if the null hypothesis were true and we ran this experiment 100 times, we would expect about 5 of those runs to produce a significant result purely by chance.
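To see this in practice, the simulation below runs the experiment 10,000 times with no true difference between classrooms. It is a sketch that assumes normally distributed scores with a known SD of 10 and 30 students per classroom (all hypothetical settings) and uses a two-sided z-test. Roughly 5% of the runs still come out significant at α = 0.05, and every one of them is a false positive.

```python
import random
from statistics import NormalDist

def z_test_p_value(a, b, sigma):
    """Two-sided p-value for a difference in means, assuming a known common SD."""
    se = sigma * (1 / len(a) + 1 / len(b)) ** 0.5
    z = (sum(b) / len(b) - sum(a) / len(a)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

rng = random.Random(42)
n_experiments, n_per_group, sigma, alpha = 10_000, 30, 10.0, 0.05
false_positives = 0
for _ in range(n_experiments):
    # Both classrooms share the same true mean (80), so the null hypothesis is true by construction
    a = [rng.gauss(80, sigma) for _ in range(n_per_group)]
    b = [rng.gauss(80, sigma) for _ in range(n_per_group)]
    if z_test_p_value(a, b, sigma) <= alpha:
        false_positives += 1

rate = false_positives / n_experiments
print(f"Significant in {rate:.1%} of experiments where the null hypothesis is true")
```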
If we set our alpha to 0.01, our p-value would need to be equal to or less than 0.01 (i.e., 1%) for us to consider our results significant. Of course, this imposes a stricter criterion: if the result is significant, then a difference this large would occur less than 1% of the time if the null hypothesis were true.
Statistical Power
The sample size, or the number of participants in your study, has an enormous influence on whether or not your results are significant. The larger the actual difference between the groups (e.g., in student test scores), the smaller the sample we'll need to find a significant difference (i.e., p ≤ 0.05). Theoretically, with a large enough sample we can find a significant difference in most experiments, even when the true difference is very small. However, extremely large samples require expensive studies and are very difficult to obtain.
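The sketch below holds a hypothetical 3-point difference and an SD of 10 fixed and changes only the group size. It again assumes a two-sided z-test with a known SD, so the numbers are illustrative rather than what a t-test on real data would give.

```python
from statistics import NormalDist

def p_value_for(diff, sigma, n_per_group):
    """Two-sided z-test p-value for an observed mean difference (known SD assumed)."""
    se = sigma * (2 / n_per_group) ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(diff) / se))

# Same hypothetical 3-point difference and SD of 10, increasing group sizes
for n in (10, 50, 100, 200, 500):
    print(f"n = {n:>3} per group: p = {p_value_for(3.0, 10.0, n):.4f}")
```

Under these assumptions, the same 3-point difference crosses p ≤ 0.05 somewhere between 50 and 100 participants per group.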
Type I error (α), or false positive: the probability of concluding the groups are significantly different when in reality they are not. With alpha set to 0.05, we are willing to accept a 5% chance of incorrectly rejecting a true null hypothesis.
Type II error (β), or false negative: the probability of concluding the groups are not significantly different when in fact they are different. We can decrease the probability of committing a Type II error by making sure our statistical test has adequate power.
Power is defined as 1 − β, where β is the probability of a Type II error. In other words, it is the probability of detecting a difference between the groups when the difference actually exists (i.e., the probability of correctly rejecting the null hypothesis). Therefore, as we increase the power of a statistical test, we increase its ability to detect a significant (i.e., p ≤ 0.05) difference between the groups.
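Under the same z-test approximation, power has a closed form: the probability that the test statistic lands in either rejection region when the true standardized difference is d. The function below is a sketch under that assumption (normal data, known SD, equal group sizes). Exact t-based tools such as G*Power give slightly lower values for small samples, and `power_two_sample_z` is a hypothetical name.

```python
from statistics import NormalDist

def power_two_sample_z(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test for standardized effect d."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = d * (n_per_group / 2) ** 0.5  # expected z-statistic when the effect is real
    # Probability the statistic falls beyond +z_crit or below -z_crit
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)

for n in (10, 26, 50):
    power = power_two_sample_z(0.8, n)
    print(f"d = 0.8, n = {n} per group: power = {power:.2f}, beta = {1 - power:.2f}")
```

For d = 0.8, this approximation gives a power of about 0.82 with 26 participants per group. With d = 0, the function returns alpha itself, because every rejection is then a Type I error.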
It is generally accepted that we should aim for a power of 0.8 or greater.
With a power of 0.8, we have an 80% chance of finding a statistically significant difference, provided a true difference of the assumed size exists. That said, we still have a 20% chance of failing to detect a real difference between the groups.
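The same idea can be checked by simulation. The sketch below generates many experiments in which a real effect of d = 0.8 exists and counts how often the test detects it. The settings (26 per group, SD of 1, two-sided z-test with known SD) are illustrative assumptions. The share of misses estimates β.

```python
import random
from statistics import NormalDist

rng = random.Random(7)
d, n, alpha, n_experiments = 0.8, 26, 0.05, 5_000  # hypothetical settings
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value, about 1.96
se = (2 / n) ** 0.5  # SD is fixed at 1, so the true mean difference equals d

detected = 0
for _ in range(n_experiments):
    # A real effect exists: the treatment group's true mean is d SDs higher
    control = [rng.gauss(0.0, 1.0) for _ in range(n)]
    treatment = [rng.gauss(d, 1.0) for _ in range(n)]
    z = (sum(treatment) / n - sum(control) / n) / se
    if abs(z) >= z_crit:
        detected += 1

power = detected / n_experiments
beta = 1 - power  # share of experiments that missed the real effect (Type II errors)
print(f"estimated power = {power:.3f}, estimated beta = {beta:.3f}")
```

With these settings, the estimate should land near 0.8, so roughly one experiment in five misses an effect that is really there.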
Effect Size
Recall that in our teaching style example we found significant differences between the two classrooms: the average test score was 80% in the authoritarian classroom and 88% in the authoritative classroom. Effect size tries to answer the question, “Despite being statistically significant, are the differences large enough to be meaningful?”
Effect size addresses the concept of a “minimal important difference”: beyond a certain point, a difference can be statistically significant (i.e., p ≤ 0.05) yet so small that it would not provide any benefit in the real world. Therefore, effect size helps determine whether the 8-point increase in test scores of the authoritative teacher's students over the authoritarian teacher's students is large enough to be considered important. Keep in mind that a small effect is not the same thing as a small p-value.
Another way to look at effect size is as a quantitative measure of how much the independent variable (IV) affected the dependent variable (DV). A large effect size indicates a very important result, because manipulating the IV produced a large change in the DV.
For a difference between two means, effect size is typically expressed as Cohen's d. Cohen described d = 0.2 as a small effect, 0.5 as a medium effect, and 0.8 as a large effect.
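Cohen's d is the difference between the group means divided by a pooled standard deviation. The minimal sketch below assumes two independent groups and uses hypothetical scores whose means are 80 and 88. `cohens_d` and `cohen_label` are illustrative names, and the labels simply apply Cohen's 0.2 / 0.5 / 0.8 benchmarks.

```python
import statistics

def cohens_d(group_a, group_b):
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    pooled_sd = (((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)) ** 0.5
    return (statistics.fmean(group_b) - statistics.fmean(group_a)) / pooled_sd

def cohen_label(d):
    # Cohen's conventional benchmarks: 0.2 small, 0.5 medium, 0.8 large
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "below small"

# Hypothetical test scores (%), not data from a real study
authoritarian = [72, 85, 78, 90, 74, 81, 79, 83, 76, 82]
authoritative = [88, 91, 84, 95, 80, 89, 86, 92, 85, 90]

d = cohens_d(authoritarian, authoritative)
print(f"Cohen's d = {d:.2f} ({cohen_label(d)})")
```

For these made-up scores, the pooled SD is about 4.9, so the 8-point gap corresponds to d ≈ 1.63, a large effect. With more spread-out scores, the same 8-point gap would produce a smaller d.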
Small p-values (0.05 and below) do not by themselves indicate large or important effects, nor do large p-values (above 0.05) imply that an effect is unimportant or small. Given a large enough sample size, even very small effects can produce significant p-values (0.05 and below). In other words, statistical significance tells us how compatible our results are with chance alone, while effect size tells us how important our results are.
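To illustrate the point about sample size, the sketch below assumes that the observed standardized difference is a negligible d = 0.05. It computes the two-sided z-test p-value as the group size grows (known SD assumed; `p_from_effect` is a hypothetical helper).

```python
from statistics import NormalDist

def p_from_effect(d, n_per_group):
    """Two-sided z-test p-value when the observed standardized difference is exactly d."""
    z = d * (n_per_group / 2) ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(z)))

for n in (100, 1_000, 10_000):
    print(f"d = 0.05, n = {n:>6} per group: p = {p_from_effect(0.05, n):.4f}")
```

With 10,000 participants per group, the p-value falls below 0.001 even though the effect is a quarter of what Cohen called small.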
Putting It All Together (Power Analysis)
We can calculate the minimum sample size our experiment needs to achieve a specific statistical power for a given effect size. This analysis should be conducted a priori, that is, before actually running the experiment.
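For two equal-sized groups, a two-tailed test, and a standardized effect size d, a widely used normal-approximation formula for the minimum size of each group is shown below with a worked example. Here z denotes quantiles of the standard normal distribution. This is an illustrative approximation rather than the exact calculation: G*Power's t-based method gives a slightly larger number.

```latex
n_{\text{per group}} \approx \frac{2\left(z_{1-\alpha/2} + z_{1-\beta}\right)^{2}}{d^{2}}
\qquad
\alpha = 0.05,\ 1-\beta = 0.80,\ d = 0.5:\quad
n \approx \frac{2\,(1.960 + 0.842)^{2}}{0.5^{2}} = \frac{2 \times 7.851}{0.25} \approx 62.8 \;\Rightarrow\; 63
```

Because d is squared in the denominator, halving the effect size roughly quadruples the required sample.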
Power analysis is a critical step during the design phase of your study. It gives you a good idea of the number of participants needed in each experimental group (including the control) to find a significant difference, if one exists.
G*Power is a great free program for quickly calculating the required sample size based on your power and effect size parameters.
G*Power
1. Select the “Test Family” appropriate for your analysis
We'll select t-tests.
2. Select the “Statistical Test” you are using for your analysis
We will use “Means: Difference between two independent means (two groups)”.
3. Select the “Type of Power Analysis”
We will select “A priori” to determine the sample size required for the power and effect size you wish to achieve.
4. Select the number of tails
Use one tail only if you wish to test for a significant difference between the groups in one direction. Typically, we select a two-tailed test.
We will select a two-tailed test.
5. Select the desired effect size (“Effect size d”)
We'll go through a range of effect sizes.
6. Select “α err prob”, or alpha: the probability of rejecting the null hypothesis when there is actually no difference between the groups (a Type I error).
We'll use 0.05.
7. Select the power you wish to achieve.
We'll select 0.8 (80% power) and 0.9 (90% power).
8. Select “Allocation Ratio N2/N1”
If you expect an equal number of participants in each group (treatment and control), select 1. If one group will have twice as many participants as the other, select 2.
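The inputs from the steps above (tails, effect size d, alpha, power, and allocation ratio) can be mirrored in a small function. This is a sketch that uses the normal approximation instead of G*Power's exact t-based calculation, so its answers come out slightly lower. `a_priori_sample_size` and its parameter names are hypothetical, and `allocation_ratio` plays the role of N2/N1.

```python
import math
from statistics import NormalDist

def a_priori_sample_size(d, alpha=0.05, power=0.80, tails=2, allocation_ratio=1.0):
    """Normal-approximation group sizes (n1, n2) for comparing two independent means."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / tails)  # one-tailed uses 1 - alpha, two-tailed 1 - alpha/2
    z_beta = z(power)
    k = allocation_ratio  # N2 / N1
    n1 = math.ceil((1 + 1 / k) * (z_alpha + z_beta) ** 2 / d ** 2)
    n2 = math.ceil(k * n1)
    return n1, n2

print(a_priori_sample_size(0.5))
print(a_priori_sample_size(0.5, tails=1))
print(a_priori_sample_size(0.5, allocation_ratio=2))
```

For example, `a_priori_sample_size(0.5)` returns `(63, 63)`, a one-tailed test lowers this to 50 per group, and `allocation_ratio=2` gives 48 and 96. Unequal allocation raises the total, which is why equal groups are the usual default.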
Alpha = 0.05
In general, large effect sizes require smaller samples because they are “obvious” for the analysis to detect. As the effect size decreases, we need larger samples, because smaller effects are harder to detect. This works in our favor: the larger the effect size, the more important our results and the fewer participants we need to recruit for our study.
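To reproduce the general pattern (not the exact table values), the sketch below loops over several effect sizes at 80% and 90% power. It assumes two equal groups, a two-tailed test at α = 0.05, and the normal-approximation formula, so G*Power's exact numbers should come out slightly higher. `n_per_group` is a hypothetical helper.

```python
import math
from statistics import NormalDist

def n_per_group(d, power, alpha=0.05):
    """Normal-approximation size of each of two equal groups (two-tailed test)."""
    z = NormalDist().inv_cdf
    return math.ceil(2 * (z(1 - alpha / 2) + z(power)) ** 2 / d ** 2)

table = {d: (n_per_group(d, 0.80), n_per_group(d, 0.90)) for d in (0.2, 0.5, 0.8, 1.2)}

print("   d   power 0.80   power 0.90")
for d, (n80, n90) in table.items():
    print(f"{d:>4}   {n80:>10}   {n90:>10}")
```

Under these assumptions, moving from a medium effect (d = 0.5) to a small one (d = 0.2) multiplies the required group size by roughly six, and raising power from 80% to 90% adds about a third more participants.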
Last but not least, these are the sample sizes required for each participant group. For example, for an experiment with one IV with 4 groups/levels and one DV, where you wish to detect a large effect size (0.8+) with a power of 80%, you will need a sample size of 52 participants per group, or 208 in total.