VOLUME 6 TEACHING ISSUES AND EXPERIMENTS IN ECOLOGY
RESEARCH

Assessment of the teaching of evolution by natural selection through a hands-on simulation

Introduction

Science education reform continually occurs through the replacement of current teaching methods with alternatives that are designed to enhance student performance.  Newly adopted teaching strategies incorporate ideas, drawn from the cognitive and educational research literature, that have been shown to be effective in controlled settings (D'Avanzo 2003; Donovan and Bransford 2005).  As the environment in actual classrooms differs greatly from laboratory settings, the efficacy of new teaching strategies should also be validated where they are applied (D'Avanzo 2003; Handelsman et al. 2004). Student-active science, which encourages students to take a more active role in generating, evaluating and reflecting on their own knowledge, is based on such literature and has successfully supplanted traditional teaching methods in several redesigned undergraduate courses (Ebert-May et al. 1997; Nehm and Reilly 2007; Sundberg 1997; Udovic et al. 2002) and many individual lessons and activities at all academic levels.

An important step in evaluating student-active learning strategies is to determine which specific aspects of these methods are generating learning gains.  This is often difficult because of the logistical issues associated with establishing the proper controls in educational studies (Kember 2003).  Our study takes one example of student-active learning, having students collect and analyze their own data, and asks whether the physical aspect of collecting data increases students’ learning of scientific concepts and affects their enjoyment of science activities.   

Many topics taught in introductory biology classes, such as cell division, protein synthesis and natural selection, occur at scales too small or time periods too long to be easily observable in the classroom.  In these cases, alternative representations, such as role playing simulation activities, where students become the phenomena they are studying, can aid understanding (Chinnici et al. 2004; McSharry and Jones 2000; Stencil 1993; Warren 1997).  Simulations also allow students to integrate multiple forms of sensory information as they manipulate materials and move around the room (Duveen 1994; Ridgway et al. 1999; Warren 1997).  When combined with opportunities for discussion and reflection, simulations are a recommended way to incorporate activities more carefully into the science curriculum (Bybee and Van Scotter 2006-2007).

In this study, we tested the hypothesis that physical activity, in the form of a role playing simulation where students generate their own data, will help students understand a complex and abstract concept better than a more passive approach where they are shown materials and given an explanation of the simulation along with sample data. To test their comprehension of the ideas demonstrated by the simulation, we used a content knowledge assessment.  As students may have greater recall of experiences that they enjoy (Cohen and Bradley 1978), a survey of attitudes toward the simulation activity and science activities in general was also administered.  Although active learning strategies can impact student learning in many ways, we chose to primarily assess content knowledge because of the growing emphasis on standardized tests and the steep pressure for schools to increase their performance.

The subject of our investigation was the topic of evolution by natural selection, which is notoriously difficult for students to understand (Alters and Nelson 2002; Anderson et al. 2002; Jensen and Finley 1997; Passmore and Stewart 2002). As biological evolution has been identified by the National Research Council (NRC) and the American Association for the Advancement of Science (AAAS) as one of the specific content areas in the life sciences that are important for all high school students to study (AAAS 1993; NRC 1996), it is imperative to find a successful way to teach this subject. The brief introduction that students receive in high school could be their only opportunity for learning this important concept and addressing common misconceptions (Bishop and Anderson 1990). In our simulation activity, the principles of natural selection are demonstrated by students (equipped with different feeding implements) as they forage for different colored prey on different colored and textured habitats. Although natural selection is taught in this way (Fifield and Fall 1992; Kuhn 1969), it is not known whether student learning outcomes reliably justify the expense of supplies, classroom time, and considerable teacher effort required to perform the simulation.

Methods

Subjects and setting

A total of 101 10th grade students (63 boys and 38 girls) from five classes and instructed by the same biology teacher participated in the study, although not all students completed both the pre-test and post-test.  At the participating school, 71.6% of students were African American, 11.9% White, 2.5% Asian, 14.0% Latino (School District of Philadelphia, http://www.phila.k12.pa.us/). This study was performed with institutional review board (IRB) approval for research to improve science, technology, engineering and math education in Philadelphia schools through the University of Pennsylvania’s NSF GK-12 program, under which school and teacher identity must remain confidential. The teaching of natural selection takes place during a three week long unit on evolution specified by the school district as part of their Biology Core Curriculum.  This curriculum is designed to standardize the instructional content, order and timing of topics taught throughout the district.  The natural selection simulation activity described below is an expanded version of the one prescribed by the curriculum.

Treatment

All students first received one week of direct instruction on the topic of evolution and the mechanism of natural selection from their teacher. They were then given a pre-test, followed by a teacher-led discussion based on a handout describing the concepts of adaptation, fitness, heritability, differential reproduction, and the idea that natural selection takes place over many generations. These concepts were presented through the active interpretation of figures and tables (Resource 1). The next day, students in four of the five classes were divided into two groups, the activity group and the worksheet group. Students in the fifth class were all assigned to the activity group due to the class’s small size (15 students). This design allowed us to separate the effect of our treatment from other potentially confounding factors such as time of day. Group assignments were made so as to approximately equalize the average and range of abilities in each treatment group. Students in each class were ranked using their cumulative biology class grade from the two preceding units (Mendelian genetics and molecular biology) and were divided, every other student, into the two groups. The activity group performed a natural selection simulation (described below) and analyzed the data they collected using a worksheet (Resource 2). In contrast, the worksheet group was taken to a nearby classroom where the simulation was explained to them, and they then analyzed data similar to the expected results of the activity group without actually performing the simulation. The following day, students in both groups reviewed the worksheet with their teacher and answered follow-up questions. One week after the pre-test, both groups were given a post-test and the enjoyment of science activities survey.
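The ranked, every-other-student assignment described above can be sketched in a few lines of Python. This is only an illustration: the function name, the student identifiers, and the grades are hypothetical, and the choice to place the top-ranked student in the activity group is an assumption, since the text does not say which group received the first student.

```python
def assign_groups(grades):
    """Split students into two groups by alternating down a grade ranking.

    grades: dict mapping a student identifier to a cumulative grade.
    Returns (activity, worksheet) lists of identifiers.
    """
    # Rank from highest to lowest grade; ties are broken by identifier
    # so that the result is deterministic.
    ranked = sorted(grades, key=lambda s: (-grades[s], s))
    activity = ranked[0::2]   # 1st, 3rd, 5th, ... ranked students (assumption)
    worksheet = ranked[1::2]  # 2nd, 4th, 6th, ... ranked students
    return activity, worksheet


# Hypothetical class of six students with made-up grades.
example = {"s1": 91, "s2": 78, "s3": 85, "s4": 66, "s5": 73, "s6": 88}
activity, worksheet = assign_groups(example)
print(activity)   # ['s1', 's3', 's5']
print(worksheet)  # ['s6', 's2', 's4']
```

Alternating down the ranking keeps the mean and the range of grades in the two groups close to each other, which is the stated purpose of the procedure.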

Natural selection simulation

The goal of the simulation is to demonstrate that variation in a trait, heritability of the trait, and differential reproductive success with respect to the trait can lead to evolution by natural selection.  In the simulation, students play the roles of predators with within-species variation in mouth part shape (i.e., knife, fork, or spoon) as they capture prey with within-species color variation (i.e., red, black, and white crafting pom poms). This activity was adapted from the University of California, Los Angeles Life Sciences 1 Demonstration Manual (Phelan 2002).  The predators forage for prey in two different color and texture habitats (i.e. black faux fur and red furry fleece).  After each of three 30-second rounds of foraging, remaining prey reproduce and predators suffer differential mortality based on how many pom poms they managed to capture.  Students acting as predators that don't survive are "reborn" as the offspring of those that survive.  Students participate in all aspects of the simulation, including counting out the correct number of new prey items and distributing them across the habitats after each round of foraging. 
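For readers who want to explore the logic of the classroom activity computationally, the following Python sketch mimics its main steps: foraging, prey reproduction, and predator death and "rebirth". It is not the procedure from the demonstration manual. Every number in it (capture skill per implement, the camouflage penalty, attempts per round, population sizes, the rule that the less successful half of the predators dies, and the rule that each surviving prey leaves one offspring) is an assumption chosen only for illustration.

```python
import random
from collections import Counter

# All numbers below are illustrative assumptions, not values from the study.
CAPTURE_SKILL = {"knife": 0.2, "fork": 0.5, "spoon": 0.8}  # per-attempt success
CAMOUFLAGE = 0.25        # multiplier on capture chance when prey matches habitat
ATTEMPTS_PER_ROUND = 10  # stands in for a 30-second foraging round


def forage(implement, prey, habitat, rng):
    """One predator makes several capture attempts; returns the number caught."""
    caught = 0
    for _ in range(ATTEMPTS_PER_ROUND):
        if not prey:
            break
        i = rng.randrange(len(prey))
        chance = CAPTURE_SKILL[implement]
        if prey[i] == habitat:  # prey matching the habitat color is harder to catch
            chance *= CAMOUFLAGE
        if rng.random() < chance:
            prey.pop(i)
            caught += 1
    return caught


def run_round(predators, prey, habitat, rng):
    """Foraging, then prey reproduction, then predator death and rebirth."""
    catches = [(forage(p, prey, habitat, rng), p) for p in predators]
    prey.extend(prey[:])  # assumption: each surviving prey leaves one offspring
    rng.shuffle(catches)  # break ties at random before the stable sort
    catches.sort(key=lambda c: c[0], reverse=True)
    n_survive = max(1, len(predators) // 2)  # assumption: top half survives
    survivors = [p for _, p in catches[:n_survive]]
    # Predators that died are reborn as offspring of the survivors.
    offspring = [survivors[i % n_survive]
                 for i in range(len(predators) - n_survive)]
    return survivors + offspring


def simulate(habitat, rounds=3, seed=0):
    """Run the rounds; return Counters of predator implements and prey colors."""
    rng = random.Random(seed)
    predators = ["knife", "fork", "spoon"] * 4
    prey = ["red", "black", "white"] * 30
    for _ in range(rounds):
        predators = run_round(predators, prey, habitat, rng)
    return Counter(predators), Counter(prey)


predator_counts, prey_counts = simulate(habitat="black")
print(predator_counts)
print(prey_counts)
```

Running the sketch with different habitats and seeds shows how the three ingredients named above (variation, heritability, and differential reproductive success) interact; the particular outcome of any one run depends on the assumed parameters.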

Assessment

To assess students’ understanding of the mechanism of natural selection, we chose four multiple choice questions and one short essay question from a published assessment by Bishop and Anderson (1986). We wrote an alternative version of the assessment so that students answered questions with similar content and phrasing on the pre- and post-tests (Resource 3). The two versions of the test were counterbalanced, or distributed evenly, between the pre- and post-test to avoid test form bias. In the first three multiple choice questions, the students had to discriminate between two statements about an evolutionary scenario and decide which one was the most correct. Their confidence in their answer was recorded on a Likert-type scale where the responses ranged from strongly agree with the statement on the left to strongly agree with the statement on the right. On this scale, the center option was equivocal and indicated that both statements were equally correct. The fourth multiple choice question asked students to choose the best option among four possible answers. Additionally, in the short essay question, students were asked to describe how a biologist would explain the evolution of a specific trait.

After the activity, we also administered a survey published by Shepardson and Pizzini (1993) that is designed to assess students’ perceptions of science activities.  This assessment consisted of five Likert-type items where students agreed or disagreed with statements concerning their enjoyment of science activities.

Data Analyses

Due to routine absences only 90 out of the 101 students completed the multiple choice portion of the pre-test, 88 the post-test, and 77 both tests.  Some students who completed the multiple choice portion of the assessments did not provide a response to the essay question; 87 students completed the pre-test essay, 81 the post-test, and 67 both tests.  Students’ responses to the four multiple choice questions were scored as correct, equivocal (only an option for the first three Likert-type questions), or misconception responses.  The number of each type of response per student was analyzed using a two factor MANOVA with time (pre-test vs. post-test) and treatment (activity vs. worksheet) and their interaction (time x treatment) as fixed factors.  Each type of response was then also individually tested with an ANOVA.  Multiple choice data were analyzed using all available student responses.   
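The scoring of the three Likert-type questions into correct, equivocal, and misconception responses can be illustrated with a short Python sketch. The five-point scale, the numeric coding (1 = strongly agree with the left statement), and the function names are assumptions made for the example; the fourth, four-option question is not covered here.

```python
from collections import Counter


def classify(response, correct_side, scale_points=5):
    """Classify one Likert-type response.

    response: integer from 1 (strongly agree with the left statement)
              to scale_points (strongly agree with the right statement).
    correct_side: "left" or "right", the side of the correct statement.
    Assumes an odd number of scale points so that a center option exists.
    """
    center = (scale_points + 1) // 2
    if response == center:
        return "equivocal"
    chose_left = response < center
    if chose_left == (correct_side == "left"):
        return "correct"
    return "misconception"


def tally(responses, answer_key):
    """Count each response type for one student across the Likert questions."""
    counts = Counter(classify(r, side) for r, side in zip(responses, answer_key))
    return {kind: counts[kind] for kind in ("correct", "equivocal", "misconception")}


# Hypothetical student: three responses, correct statement always on the left.
student_tally = tally([1, 3, 5], ["left", "left", "left"])
print(student_tally)  # {'correct': 1, 'equivocal': 1, 'misconception': 1}
```

Per-student counts of this kind are the response variables that the MANOVA and the follow-up ANOVAs described above take as input.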

Responses to the essay question were scored in five categories: identification of selective pressure, understanding of fitness, relevance of ancestor population, role of mutation, and mode of inheritance. Within each category, there were three ranked responses, which we discriminated among using the rubric in Table 1. We considered the inclusion of an incorrect belief about mutation in a student’s response to be an improvement over not mentioning mutation and equal to simply mentioning mutation, as it demonstrates that the student understands that mutation is the source of variation in a population and is important for natural selection. To determine whether students’ responses within treatment groups changed over time, we performed a test of independence for each scoring category and treatment group combination, resulting in a total of 10 two-row by three-column contingency tables. If one or more cells contained fewer than five responses, we performed the Freeman-Halton extension of the Fisher exact probability test for two-row by three-column contingency tables (Freeman and Halton 1951). If not, we performed G-tests for independence. These tests were done using all available student responses. We then investigated whether individual student performance increased or decreased by comparing the pre- and post-test responses of the 67 students who completed both essays. If the student had a higher ranked response on the post-test, the student was considered to have increased their score, and vice versa.
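As an illustration of the G-test of independence on a two-row by three-column table, the following Python sketch computes the G statistic and its p-value using only the standard library. It is not the JMP analysis used in the study. The function names and the example counts are hypothetical. The p-value uses the fact that a two-by-three table has (2 - 1)(3 - 1) = 2 degrees of freedom, for which the upper tail of the chi-square distribution is exactly exp(-G/2). The Freeman-Halton exact test is not implemented; the helper only flags when it would be required under the rule stated above.

```python
import math


def g_test(table):
    """G statistic for a contingency table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed > 0:  # a zero cell contributes nothing to the sum
                expected = row_totals[i] * col_totals[j] / n
                g += 2.0 * observed * math.log(observed / expected)
    return g


def p_value_df2(g):
    """Upper-tail chi-square probability for 2 degrees of freedom."""
    return math.exp(-g / 2.0)


def needs_exact_test(table, threshold=5):
    """True if any cell count is below the threshold (use an exact test instead)."""
    return any(cell < threshold for row in table for cell in row)


# Hypothetical counts: rows are pre-test and post-test, columns are the
# three ranked response types for one category.
counts = [[40, 10, 8],
          [30, 12, 14]]
if needs_exact_test(counts):
    print("Use an exact test for this table")
else:
    g = g_test(counts)
    print(round(g, 2), round(p_value_df2(g), 2))
```

The same relationship gives a quick check on reported values: a statistic of 1.21 with 2 degrees of freedom corresponds to p = exp(-0.605), or about 0.55, and a statistic of 0.79 corresponds to about 0.67.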

Table 1. Scoring rubric for short essay question on evolution of a trait. Responses are ranked in ascending order.

| Category | Ordinal Response Types |
|---|---|
| Selective Pressure | 1) No mention; 2) Selective pressure mentioned; 3) Selective pressure exerted by the environment |
| Fitness | 1) No mention; 2) Mentioned survival and/or reproduction; 3) Differential survival and/or reproduction |
| Ancestor Population | 1) No mention; 2) Ancestors without variation; 3) Ancestors with variation |
| Mutation | 1) No mention; 2) Mentioned mutation or an incorrect belief about mutation; 3) Mutation as a source of variation |
| Inheritance | 1) Trait acquired by all members of population; 2) Trait inherited by all members of population; 3) Trait inherited through family lines |

The five statements from the enjoyment of science survey were first analyzed together using a MANOVA with treatment as a fixed factor.  Responses to each statement were then also tested individually with an ANOVA.  All analyses were performed in JMP version 7 (SAS 2007); the Fisher exact tests were performed using the Fisher 2 x 3 test in VassarStats (Lowry 2008) (http://faculty.vassar.edu/lowry/fisher2x3.html).
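To make the per-item ANOVA concrete, the sketch below computes a one-way ANOVA F statistic for one survey item with treatment as the only factor. It is a minimal standard-library illustration, not the JMP procedure used in the study; the function name and the Likert scores are hypothetical, and the p-value (which requires the F distribution) is left out.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of scores."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Hypothetical Likert scores (1-5) on one item for the two treatment groups.
activity_scores = [4, 5, 3, 5, 4]
worksheet_scores = [3, 4, 2, 3, 3]
f_value, df_b, df_w = one_way_anova_f([activity_scores, worksheet_scores])
print(round(f_value, 2), df_b, df_w)  # 6.0 1 8
```

With only two groups, as here, the F statistic equals the square of the pooled-variance two-sample t statistic, so the per-item ANOVAs are equivalent to two-sample t-tests between the activity and worksheet groups.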

Results

Multiple Choice
In general, students did not perform well on the multiple choice section of the assessments either before or after the activity; the average number of correct responses per student was 0.66 ± 0.07 (mean ± SE) out of 4 on the pre-test and 0.63 ± 0.09 on the post-test (Fig. 1b). The MANOVA revealed no overall effect of time, treatment, or an interaction of the two on student responses to the multiple choice questions (Table 2). When each type of response was considered individually, there were also no significant interactions or main effects. However, two trends were observed: 1) the activity group had more misconception responses than the worksheet group overall (Fig. 1a) and 2) there were more equivocal responses on the post-test than on the pre-test (Fig. 1b).

Figure 1. a) Number of correct, equivocal, and misconception responses per student in the activity and worksheet groups averaged over both the pre- and post-tests. b) Number of correct, equivocal, and misconception responses per student on pre- and post-tests averaged over both treatment groups. Data are means ± SE. # indicates a p-value of 0.06.

Table 2. MANOVA and ANOVA results for the multiple choice responses.

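| Test | Effect | F-value | p-value |
|---|---|---|---|
| MANOVA | Time | 1.37 | 0.25 |
| MANOVA | Treatment | 1.60 | 0.19 |
| MANOVA | Time x Treatment | 0.21 | 0.89 |
| ANOVA Correct Responses | Time | 0.048 | 0.83 |
| ANOVA Correct Responses | Treatment | 0.074 | 0.79 |
| ANOVA Correct Responses | Time x Treatment | 0.43 | 0.51 |
| ANOVA Equivocal Responses | Time | 3.60 | 0.06 |
| ANOVA Equivocal Responses | Treatment | 1.13 | 0.29 |
| ANOVA Equivocal Responses | Time x Treatment | 0.0003 | 0.99 |
| ANOVA Misconception Responses | Time | 2.46 | 0.12 |
| ANOVA Misconception Responses | Treatment | 3.51 | 0.06 |
| ANOVA Misconception Responses | Time x Treatment | 0.090 | 0.76 |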


Short Essay
Nine of ten tests of independence examining students’ essay responses showed no differences over time (Table 3). The tenth test showed a significant change in the activity group’s responses in the mode of inheritance category. Students in the activity group increased their mention of traits being inherited through family lines; no students mentioned it in the pre-test essay and six students mentioned it in the post-test. There was no similar change in the worksheet group; three students mentioned it in the pre-test and two students mentioned it in the post-test. When looking across treatments, we found that individual student responses changed very little over time, especially in the mutation and mode of inheritance categories (Fig. 2a). Additionally, when students did change their answers, both increases and decreases in performance were common.


For the most part, student essays were not complete descriptions of evolution by natural selection.  Most students did not mention the concepts of fitness, an ancestor population, or mutation (Fig. 2b).  Of the students who did mention mutation, the only two correct usages were found in post-tests.  Mode of inheritance was mentioned by all students, but the majority indicated that traits were inherited by all members of a population equally.  The category that students seemed to understand the best was that the environment exerts a selective pressure on individuals.

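Table 3. Tests for independence of students’ essay responses over time. Reported p-values were calculated from either a G-test of independence (with χ²) or a Fisher’s exact test.

| Treatment | Response Category | χ² | p-value |
|---|---|---|---|
| Activity Group | Selective Pressure | 1.21 | 0.55 |
| Activity Group | Fitness | | 0.11 |
| Activity Group | Ancestor Population | 0.61 | 0.73 |
| Activity Group | Mutation | | 0.27 |
| Activity Group | Inheritance | | 0.03 |
| Worksheet Group | Selective Pressure | 0.79 | 0.67 |
| Worksheet Group | Fitness | | 0.69 |
| Worksheet Group | Ancestor Population | | 0.61 |
| Worksheet Group | Mutation | | 0.11 |
| Worksheet Group | Inheritance | | 0.51 |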

Figure 2. a) Number of students (of the 67 who completed both pre- and post-test essays) whose response increased in rank, decreased in rank, or remained the same in each response category. b) Number of students giving each type of ordinal response (see Table 1). Each pair of columns represents pre- and post-test responses. The data shown are from all students who completed an essay: 87 pre-tests and 81 post-tests. For the first four categories, response 1 indicates that the student did not mention that category in their answer.

Enjoyment Assessment
Students who performed the activity had a more positive assessment of science activities in general and of this one in particular. There was a significant effect of treatment on the average student response to three of the five items (Table 4, Fig. 3). Although the differences were only marginally significant, the first and fifth items also showed a positive student reaction (p = 0.06 and p = 0.10, respectively).

Figure 3. Average student response on a Likert scale (± SE) for each item in the enjoyment of science class survey.  On the Likert scale a response of 1 meant the student strongly disagreed with a statement while a response of 5 meant the student strongly agreed.  * indicates p-value < 0.05.

Table 4. MANOVA and ANOVA results for the enjoyment of science activities survey analyzed by treatment.

| Effect | F-value | p-value |
|---|---|---|
| MANOVA | 24.131 | <0.0001 |
| ANOVA, Item 1 | 3.539 | 0.06 |
| ANOVA, Item 2 | 6.433 | 0.01 |
| ANOVA, Item 3 | 7.780 | 0.007 |
| ANOVA, Item 4 | 9.005 | 0.004 |
| ANOVA, Item 5 | 2.76 | 0.10 |

Discussion

The physical act of having students collect their own data during the natural selection simulation activity did not increase their content knowledge substantially as measured by either student performance on the multiple choice questions or essay.  However, the activity group did show small learning gains through an increase in the number of students who correctly stated that traits are inherited through family lines, not equally by the entire population, in their short essay post-test responses. This difference may be related to the experiences of each group.  Only the activity group actually saw that beneficial traits (feeding implement or color) were passed on from parents to offspring, and the students exhibited obvious preferences for more successful implements. Experiential learning activities may be more successful when they increase the relevance of material or evoke an emotional response (DebBurman 2002; Hamer 2000; McCarthy and Anderson 2000; Stamper 1973).

Though students in the activity group did not show considerable content knowledge gains, they did have a significantly more positive perception of science activities. Doing the simulation activity increased student interest and motivation, consistent with other studies which have found that students prefer hands-on activities (Killerman 1996). While laboratory activities using innovative teaching strategies certainly can increase student performance (Ertepinar and Geban 1996; Sundberg 1997), not all studies have found that laboratory and other hands-on activities substantially increase content knowledge (Burron et al. 1993; Chang and Lederman 1994; Shepardson and Pizzini 1993). Nevertheless, more positive attitudes and increased interest toward science, a widely held goal of science teachers (Diem 2001; Ediger 2005), may be more important than small content knowledge gains, as they could be a determining factor in whether students pursue future scientific careers (Osborne 2003).

It is not definitively known what ancillary or long-term benefits a greater enjoyment of science class may bring for high school students (Hofstein and Lunetta 2004; Osborne 2003).  Students might be more likely to look forward to or take additional science classes, study independently, think about science in their everyday lives, do their homework or simply come to school more often.  This last benefit would be especially important in Philadelphia and similar school districts which have very high levels of absenteeism (18% of students in Philadelphia, approximately 32,000, are absent each day, Grace 2006) and high drop-out rates (10% of students each year, Mezzacappa 2007).  More research into the long-term effects of students’ attitudes, enjoyment and perception of science will help teachers to make informed decisions when choosing activities.  If there are lasting benefits to improving student attitudes, teachers should adopt the use of more activities that balance the learning of content knowledge and enjoyment.

Overall, students in both treatments performed poorly on each type of content knowledge assessment, before and after the treatment. Although we did not find more correct answers after instruction, there was a trend for students in both groups to choose more equivocal answers on the post-test, indicating that they were less confident in their previously wrong answers. The generally poor performance indicates that the 5 days of instruction between pre- and post-assessments and the week of instruction on evolution before the pre-test are not sufficient. Though disheartening, this result is not surprising. Evolution by natural selection is a complex topic, and students at all levels, high school through undergraduate, struggle with it (Alters and Nelson 2002; Bishop and Anderson 1986, 1990; Nehm and Reilly 2007). We feel that the three weeks designated for students to learn evolution as an isolated topic within the Biology Core Curriculum, at the level required by the Pennsylvania State Standards, is probably not an adequate amount of time. We recommend that the teaching of evolution be started in earlier grades (National Research Council 1996), be taught for an extended length of time (Passmore and Stewart 2002; Robbins and Roy 2007), or be linked to topics continually using resources such as Evolution Plug-Ins Across the Curriculum (www.projectdragonfly.org/masters/EPI/index.htm; University of Missouri in Collaboration with Miami University). Repeated exposure to the implications of evolution and the mechanisms by which it occurs will hopefully not only increase students’ understanding but also encourage them to recognize the role of evolution as a powerful and unifying theory in all of biology (Dobzhansky 1973).

Practitioner Reflections

We initially became involved with education at the high school level as graduate fellows in the National Science Foundation’s GK-12 program.  As fellows, one of our many duties was to bring in activities that could be done easily and relatively inexpensively in a typical high school classroom.  We felt that this activity and the accompanying worksheets we provided for teachers had great potential for successfully teaching the process of natural selection, particularly because there was built-in time for students to review and reflect on the activity they had performed and the data they had gathered.  The next step for us was to examine what effect this activity actually had on student learning.

At the onset of this study, we were naïve practitioners of education research and did not realize the difficulties inherent in assessing learning outcomes. There are a variety of published assessments used to examine various aspects of students’ understanding of evolution by natural selection, including preconceptions, misconceptions and conceptual knowledge, so we chose to use an already developed instrument (Bishop and Anderson 1986) with our own modifications.  Due to our familiarity with the habits of Philadelphia high school students, we shortened the instrument to a manageable length (four multiple choice questions and one short essay).

After the activity, both we and the classroom teacher felt the students understood the mechanism of selection, especially as it pertained to prey populations and camouflage, and we were surprised at their poor performance on the post-test. Upon reflection, this apparent discrepancy could be due to at least two factors: 1) difficulties in transferring knowledge from the context in which the activity was conducted to other scenarios, and 2) difficulties in extending the understanding of the mechanism of selection from the evolution of discrete characteristics, such as pom pom color, to the evolution of quantitative traits, such as the length of a giraffe’s neck. Both of these tasks require additional cognitive steps beyond a limited knowledge of the mechanism of natural selection as presented in the scenario. In future assessments, we recommend including questions that incorporate details from the activity scenario, in addition to transfer questions where students have to apply knowledge to a novel scenario. This will allow researchers to more accurately gauge where the teaching intervention is succeeding or failing. For example, in this case, if we had included questions regarding pom pom evolution in addition to real world scenarios, we could have asked: are students grasping the concept of differential reproductive success and having difficulty applying that concept to a real world scenario, or are the students just not understanding the logic of the concept? We feel that the choice of assessment is the key aspect of education research and that assessment should be carefully chosen with both the specific learning objectives and the student population characteristics in mind.

Our time as NSF GK-12 fellows has drastically affected our lives and the way we think about teaching and learning.  Before we were fellows, we were “good TAs”.  We gave thoughtful explanations to our students’ questions during lab and office hours and graded exams and quizzes fairly.  Like many graduate students, we did not think much about how what we were doing could be improved.  Now we spend a considerable amount of time and energy reflecting on our teaching and our students’ learning, and we plan to integrate research on science learning into our careers.

References