NOTE: This page was developed using G*Power version 3.0.10. You can download the current version of G*Power from http://www.psycho.uni-duesseldorf.de/abteilungen/aap/gpower3/. You can also find help files, the manual and the user guide on this website.
Introduction
Power analysis is the name given to the process of determining the sample size for a research study. Technically, power is the probability of detecting a “true” effect when it exists. Many students think that there is a simple formula for determining sample size for every research situation. In reality, many research situations are so complex that they almost defy rational power analysis. In most cases, power analysis involves a number of simplifying assumptions that make the problem tractable, and running the analysis numerous times with different variations to cover all of the contingencies.
In this unit we will try to illustrate the power analysis process using a simple four-group design.
Description of the experiment
We wish to conduct a study in the area of mathematics education involving different teaching methods to improve standardized math scores in local classrooms. The study will include four different teaching methods and use fourth-grade students who are randomly sampled from a large urban school district and then randomly assigned to the four teaching methods.
Here are the four teaching methods that will be examined: 1) the traditional teaching method, where the classroom teacher explains the concepts and assigns homework problems from the textbook; 2) the intensive practice method, in which students fill out additional worksheets both before and after school; 3) the computer-assisted method, in which students learn math concepts and skills using various computer-based math learning programs; and 4) the peer assistance learning method, which pairs each fourth grader with a fifth grader who helps them learn the concepts; the fourth grader then teaches the same material to another student in their group.
Students will stay in their math learning groups for an entire academic year. At the end of the Spring semester all students will take the Multiple Math Proficiency Inventory (MMPI). This standardized test has a mean for fourth graders of 550 with a standard deviation of 80.
The experiment is designed so that each of the four groups will have the same sample size. One of the important questions we need to answer in designing the study is: how many students will be needed in each group?
The power analysis
In order to answer this question, we will need to make some assumptions and some educated guesses about the data. First, we will assume that the standard deviations of the four groups are equal to each other and to the national value of 80. Further, based on prior research, we expect that the traditional teaching group (Group 1) will have the lowest mean score and that the peer assistance group (Group 4) will have the highest mean score on the MMPI. In fact, we expect that Group 1 will have a mean of 550 and that Group 4 will have a mean that is greater by 1.2 standard deviations, i.e., a mean of at least 550 + 1.2 × 80 = 646. For the sake of simplicity, we will assume that the means of the other two groups will be equal to the grand mean.
To begin, the program should be set to the F family of tests, to a one-way ANOVA, and to the ‘A Priori’ type of power analysis, which computes the required sample size. From there we need the following information: the alpha level, the power, the number of groups and the effect size.
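For reference, the quantities G*Power combines in this analysis can be written out as follows. This is a sketch of the standard fixed-effects one-way ANOVA formulation, assuming k groups of equal size (k is the a of the text) with total sample size N, group means μ_i, grand mean μ̄ and common within-group standard deviation σ. The symbols are ours, not G*Power's labels. F' denotes a noncentral F variable with noncentrality λ, and F_{1-α} is the critical value of the central F distribution.

```latex
\[
\sigma_m = \sqrt{\frac{1}{k}\sum_{i=1}^{k}\left(\mu_i - \bar{\mu}\right)^2},
\qquad f = \frac{\sigma_m}{\sigma},
\qquad \lambda = f^2 N
\]
\[
\text{power} = \Pr\!\left[\, F'_{k-1,\,N-k}(\lambda) > F_{1-\alpha;\,k-1,\,N-k} \,\right]
\]
```

With the means 550, 598, 598 and 646 and σ = 80 used below, σ_m = √1152 ≈ 33.94 and f ≈ 0.424. For N = 68 this gives λ = 0.18 × 68 = 12.24 with 3 and 64 degrees of freedom.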
The latter can be determined via the ‘Determine’ button, which calls up a menu requesting the number of groups, their shared standard deviation, and the mean of each group. All of our known values can now be entered. As stated above, there are four groups, a = 4. We will set alpha = 0.05. We already have the mean of 550 for the lowest group and the mean of 646 for the highest group. We will first set the means of the two middle groups equal to the grand mean, g. Then (550 + g + g + 646)/4 = g, so the grand mean is simply the midpoint of the two extremes: g = (550 + 646)/2 = 598. Together with the assumption that the common standard deviation is 80, this completes the input. [Note: “SD σ within each group” is 1 in the image below, but should be set to 80 before hitting “Calculate” to follow this specific analysis.]
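The ‘Determine’ dialog turns the group means and the common standard deviation into G*Power's effect size f. This is the standard deviation of the group means divided by the within-group standard deviation. Below is a minimal sketch of that arithmetic, assuming equal group sizes. The function name `cohens_f` is our own and is not part of G*Power.

```python
import math

def cohens_f(means, sd):
    """Return (grand mean, f) for equal-sized groups sharing a common SD.

    f = sigma_m / sd, where sigma_m is the population SD of the group means.
    """
    k = len(means)
    grand_mean = sum(means) / k
    sigma_m = math.sqrt(sum((m - grand_mean) ** 2 for m in means) / k)
    return grand_mean, sigma_m / sd

# Middle groups set to the grand mean, common SD of 80.
grand_mean, f = cohens_f([550, 598, 598, 646], sd=80)
print(grand_mean, round(f, 4))  # 598.0 0.4243
```

The value of f printed here is what G*Power transfers to the main window as the effect size.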
Let’s set the power to .8 and calculate the corresponding sample size. Clicking ‘Calculate and transfer to main window’, followed by the main window’s ‘Calculate’ button, produces the following result.
A total of 68 students will be required for the study, 17 in each group. To see how sample size affects power, we can click ‘X-Y plot for a range of values’, provide a range of sample sizes, and obtain a graph with power as the dependent variable. Simply plot power as a function of total sample size over an appropriate range, here 40 through 200 students in steps of 10.
So we see that with 100 subjects (25 in each group), we will have a power of .951.
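G*Power does this calculation for us, but it can help to see what happens behind the buttons. Below is a stdlib-only Python sketch of the same kind of computation, assuming equal group sizes and λ = f²N. Power is the probability that a noncentral F variable exceeds the central F critical value. Here the noncentral F CDF is evaluated as a Poisson(λ/2) mixture of regularized incomplete beta functions. All function names are hypothetical helpers written for this illustration. For real work, G*Power or a statistics library (for example `scipy.stats.ncf`) is the better tool. The final loop mimics the ‘X-Y plot for a range of values’ table for N = 40 to 200 in steps of 10.

```python
import math

def _betacf(a, b, x, max_iter=500, eps=1e-13):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        even = m * (b - m) * x / ((qam + m2) * (a + m2))
        odd = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        for aa in (even, odd):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:  # last factor ~1: converged
            break
    return h

def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def f_cdf(x, df1, df2):
    """CDF of the central F distribution."""
    return reg_inc_beta(df1 / 2.0, df2 / 2.0, df1 * x / (df1 * x + df2))

def f_crit(alpha, df1, df2):
    """Upper-alpha critical value of the central F distribution, by bisection."""
    lo, hi = 0.0, 1.0
    while f_cdf(hi, df1, df2) < 1.0 - alpha:
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if f_cdf(mid, df1, df2) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return hi

def anova_power(f, k, n_total, alpha=0.05):
    """Power of the one-way ANOVA F test with k equal groups, total size n_total."""
    df1, df2 = k - 1, n_total - k
    crit = f_crit(alpha, df1, df2)
    y = df1 * crit / (df1 * crit + df2)
    half = f * f * n_total / 2.0          # lambda / 2, with lambda = f^2 * N
    if half == 0.0:
        return alpha
    cdf = 0.0                              # noncentral F CDF at the critical value
    for j in range(int(half + 20.0 * math.sqrt(half)) + 50):
        weight = math.exp(-half + j * math.log(half) - math.lgamma(j + 1))
        cdf += weight * reg_inc_beta(df1 / 2.0 + j, df2 / 2.0, y)
    return 1.0 - cdf

def required_n_per_group(f, k, power=0.80, alpha=0.05):
    """Smallest per-group n (equal groups) whose power reaches the target."""
    lo, hi = 1, 2                          # n = 1 leaves no error df, so it fails
    while anova_power(f, k, k * hi, alpha) < power:
        lo, hi = hi, hi * 2
    while hi - lo > 1:                     # invariant: lo fails, hi succeeds
        mid = (lo + hi) // 2
        if anova_power(f, k, k * mid, alpha) >= power:
            hi = mid
        else:
            lo = mid
    return hi

# Large-effect setup: means 550, 598, 598, 646 with a common SD of 80.
means = [550, 598, 598, 646]
grand = sum(means) / len(means)
f_large = math.sqrt(sum((m - grand) ** 2 for m in means) / len(means)) / 80
print("n per group for power .8:", required_n_per_group(f_large, k=4))
for n_total in range(40, 201, 10):         # same range as the X-Y plot
    print(n_total, round(anova_power(f_large, 4, n_total), 3))
```

The binary search relies on power increasing with N for a fixed f, which holds for this test.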
In the setup above, we arranged for the two middle groups to have means equal to the grand mean. In general, the means of the two middle groups can be anywhere between the extreme values. If you have a good idea of what these means should be, you may want to use this information in your power analysis. Let’s say, for instance, that the means of the two middle groups should be 575 and 635. We will compute the power for a sequence of sample sizes as we did earlier.
Inputting the new effect size into the plot, we get:
So we see that to achieve a power of .8 we need fewer subjects than in the earlier case, where the two middle groups had the grand mean as their means. This is to be expected. The power here is the overall power of the ANOVA F test, and because the middle means are pushed toward the two extremes, the group means are more spread out and the group effect is easier to detect.
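A quick numeric check of this explanation follows. The extreme means stay the same, but moving the middle groups toward the ends increases the spread of the group means and therefore G*Power's effect size f. Because λ = f²N, a larger f reaches the same power with a smaller N. The helper `cohens_f` is our own illustration, assuming equal group sizes.

```python
import math

def cohens_f(means, sd):
    """Effect size f = (population SD of the group means) / (common within-group SD)."""
    grand_mean = sum(means) / len(means)
    sigma_m = math.sqrt(sum((m - grand_mean) ** 2 for m in means) / len(means))
    return sigma_m / sd

f_grand = cohens_f([550, 598, 598, 646], sd=80)      # middle groups at the grand mean
f_polarized = cohens_f([550, 575, 635, 646], sd=80)  # middle groups pushed toward the ends
print(round(f_grand, 4), round(f_polarized, 4))      # 0.4243 0.5022
```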
Effect size
The difference between the means of the lowest and highest groups, divided by the common standard deviation, is a measure of effect size. In the calculation above, we used 550 and 646 with a common standard deviation of 80. This gives an effect size of (646 - 550)/80 = 1.2, which is considered a large effect size. Suppose instead that we have a medium effect size of .75. What does this translate into in terms of group means? We can always use 550 for the lowest group. The mean for the highest group will then be .75*80 + 550 = 610. Let’s assume the two middle groups have means equal to the grand mean, say g. Then (550 + g + g + 610)/4 = g, which gives g = (550 + 610)/2 = 580. Let’s now redo our sample size calculation with this set of means.
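The recipe above is mechanical, so it can be sketched in a few lines. The standardized difference d between the extreme groups sets the highest mean, and the k - 2 middle groups sit at the grand mean. Under this pattern the deviations from the grand mean are ±dσ/2 for the two extreme groups and 0 for the rest. So σ_m = dσ/√(2k), and G*Power's f reduces to d/√(2k), which is d/√8 for four groups. The function names are hypothetical, and the defaults (lowest mean 550, SD 80, four groups) are taken from this example.

```python
import math

def means_for_d(d, low=550.0, sd=80.0, k=4):
    """Group means when the extreme groups differ by d SDs and the
    k - 2 middle groups sit at the grand mean (the midpoint of the extremes)."""
    high = low + d * sd
    middle = (low + high) / 2
    return [low] + [middle] * (k - 2) + [high]

def cohens_f(means, sd):
    """f = (population SD of the group means) / (common within-group SD)."""
    grand_mean = sum(means) / len(means)
    sigma_m = math.sqrt(sum((m - grand_mean) ** 2 for m in means) / len(means))
    return sigma_m / sd

for d in (1.2, 0.75, 0.25):
    means = means_for_d(d)
    # Direct computation and the closed form d / sqrt(2k) should agree.
    print(d, means, round(cohens_f(means, 80.0), 4), round(d / math.sqrt(8), 4))
```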
So we see that at a power of .8, we need a sample size of 160, or 40 for each group.
What about a small effect size, say .25? We can do the same calculation as before. The means of the four groups will then be 550, 560, 560 and 570.
Now the sample size goes way up, to roughly 350 per group.
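To see the whole pattern at once, here is a stdlib-only Python sketch of the noncentral F power calculation applied to all three effect sizes in this unit. It assumes equal group sizes, λ = f²N, alpha = .05, target power .8, and middle groups at the grand mean, so f = d/√8. The helper names are hypothetical, and the output is a cross-check under these assumptions, not a substitute for G*Power.

```python
import math

def _betacf(a, b, x, max_iter=500, eps=1e-13):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        even = m * (b - m) * x / ((a - 1.0 + 2 * m) * (a + 2 * m))
        odd = -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 1.0 + 2 * m))
        for aa in (even, odd):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h

def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def f_cdf(x, df1, df2):
    return reg_inc_beta(df1 / 2.0, df2 / 2.0, df1 * x / (df1 * x + df2))

def f_crit(alpha, df1, df2):
    """Central F critical value by bisection."""
    lo, hi = 0.0, 1.0
    while f_cdf(hi, df1, df2) < 1.0 - alpha:
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        lo, hi = (mid, hi) if f_cdf(mid, df1, df2) < 1.0 - alpha else (lo, mid)
    return hi

def anova_power(f, k, n_total, alpha=0.05):
    """Power via a Poisson(lambda/2) mixture of beta CDFs, lambda = f^2 * N."""
    df1, df2 = k - 1, n_total - k
    crit = f_crit(alpha, df1, df2)
    y = df1 * crit / (df1 * crit + df2)
    half = f * f * n_total / 2.0
    cdf = sum(math.exp(-half + j * math.log(half) - math.lgamma(j + 1))
              * reg_inc_beta(df1 / 2.0 + j, df2 / 2.0, y)
              for j in range(int(half + 20.0 * math.sqrt(half)) + 50))
    return 1.0 - cdf

def required_n_per_group(f, k, power=0.80, alpha=0.05):
    """Smallest per-group n reaching the target power (binary search)."""
    lo, hi = 1, 2
    while anova_power(f, k, k * hi, alpha) < power:
        lo, hi = hi, hi * 2
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if anova_power(f, k, k * mid, alpha) >= power:
            hi = mid
        else:
            lo = mid
    return hi

results = {d: required_n_per_group(d / math.sqrt(8), k=4) for d in (1.2, 0.75, 0.25)}
for d, n in results.items():
    print(f"d = {d}: f = {d / math.sqrt(8):.4f}, n per group = {n}, total N = {4 * n}")
```

Because λ = f²N, halving f roughly quadruples the N needed for the same power. This is why the small effect size drives the sample size up so sharply.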
Discussion
The sample size calculation is based on a number of assumptions. One of these is the normality assumption for each group. We also assume that the groups have the same common variance. As our power analysis is rooted in these assumptions, it is important to remain aware of them.
We have also assumed that we know the magnitude of the effect we are going to detect, which is described in terms of group means. When we are unsure about the group means, we should use more conservative estimates. For example, if we do not have a good idea of the means of the two middle groups, setting them equal to the grand mean is more conservative than setting them to arbitrary values.
Here are the sample sizes per group that we have come up with in our power analysis: 17 (best-case scenario), 40 (medium effect size), and 350 (almost the worst-case scenario). Even though we expect a large effect, we will aim for a sample size of between 40 and 50 per group. This will help ensure that we have enough power in case some of the assumptions mentioned above are not met, or in case we have some incomplete cases (i.e., missing data).