Monday, August 25, 2014

Is Education Really a Science? (Hint: No.)
It doesn't take long before one realizes that the only thing constant about public education is change. Students, of course, progress (usually) from grade to grade, and move away from, or into, a school district; teachers and administrators are hired, (sometimes) fired, quit, and retire. Changes in the local population will sometimes necessitate opening new schools, closing old schools, or expanding or changing the mix of students in existing schools.

But I suspect that most people, even parents of school-age children, are unaware of the constant undercurrents in curriculum, teaching materials, testing, and pedagogy. Indeed, spend some time on a school board or at school board meetings, and soon you'll see teachers and administrators asking for funds for some new program, materials, or conference that promises some sort of breakthrough for at least a portion of the student population. They'll provide background materials for the boards that are, with increasing frequency, highly polished, along with personal testimonials. Sometimes, they'll even offer personal statements from students and parents. Often, the supporters will argue that the subject matter for which they seek approval is supported by some sort of "scientific" research.

And one would be hard pressed to refute or question such an argument. After all, it's hard to argue with statistics and published "scientific" research. Many parents and board members lack any experience with the statistical and research methods used in the publications teachers and administrators point to with such confidence. Even when a parent or board member does have relevant education and experience, mounting a challenge is still difficult in the face of parents and board members who are afraid to challenge someone they consider to be an "expert". Often the challenger is branded "arrogant" and "obstructive".

In short then, the appeal to published research and statistics gives teachers and administrators a lot of power over the public and school boards who are awed by their "expertise".

But read any book about the history of education reform in America, or compare your experience in public school to the major reforms being championed, and you begin to wonder. Often it seems that the same sorts of ideas come and go about every two or three decades, about the time it takes for a generation of students and teachers to move through the schools. Some subjects, such as phonics-based or whole-word-based methods for teaching reading, or whether to have students memorize multiplication tables, have been the subject of raging arguments for decades with no resolution.

This raises an important question—If education is supposed to be based on "scientific" research, and if that research can't settle major questions about materials and pedagogy for basic subjects that all students take, then how good can the research be? And if the research is not reliable, then what basis do teachers and administrators have to claim their "expertise"? And if that basis is more narrow than assumed by the public and school board members, then shouldn't there be a more equal discussion about what new ideas and programs are really worth the investment of time and money? Should we spend so much on "professional development" based on unreliable research, or instead let the teachers develop their own styles and methods (and save money)?

This is a vital question in education, as Valerie Strauss, the Washington Post education correspondent, points out in a recent post on her 'blog, "The Answer Sheet", quoting education Professor Robert H. Bauernfeind's 1968 article titled, "The Need for Replication in Education Research", published in the Kappan magazine:
The principle of replication is the cornerstone of scientific inquiry. This principle holds that under similar conditions, one should obtain similar results. Replication has long been an essential aspect of research in the natural sciences, where science findings are not published until their repeatability has been demonstrated. In the natural science, the investigator may repeat his experiment 10 or 20 times, cross-comparing all results, prior to publishing his "findings" ….
Yet the process of replication is much more vital in our field than in the natural sciences or even the biological sciences. The reason, simply, is that more things can go wrong in a behavioral research project than a physical research project. There is a higher probability that the findings of a single behavioral study might be in serious error, or might not be generalizable beyond the specific circumstances of the specific study ….
Strauss refers to Bauernfeind's comments to introduce an "important and cautionary" (her words) new paper, titled "Facts Are More Important Than Novelty: Replication in the Education Sciences", in the current issue of Educational Researcher (August 13, 2014). Its "shocking statistics" (again, her words) shine a very harsh light on the reliability of educational research and throw the whole idea of education being a "science" or "scientific" into serious question. Why such urgent words from Strauss? Consider the findings of the study (citations omitted, emphasis added):
The present study analyzed the publication histories of the education journals with the top 100 five-year impact factors and found 0.13% of education publications were replications, substantially lower than the replication rates of previously analyzed domains. Contrary to previous findings in medical fields, but similar to psychology research, the majority (67.4%) of education replications successfully replicated the original studies. However, replications were significantly less likely to be successful when there was no overlap in authorship between the original and replicating articles. This difference raises questions regarding potential biases in replicating one's own work and may be related to previous findings of questionable research practices in the social sciences.
Given such low replication rates, the need to increase is apparent and permeates all levels of education research. We believe that any finding should be directly replicated before being put in the WWC [What Works Clearinghouse]. We cannot know with sufficient confidence that an intervention works or that an effect exists until it has been directly replicated, preferably by independent researchers.
Think of it this way: only about one out of every 800 education research papers is a replication at all, and only about two-thirds of those replications confirmed the original result, with success less likely when the replicators were independent of the original authors. And we can't rely on any research that hasn't been independently confirmed. One can only wonder if any research in use today or proposed has any value at all. Who then can be considered an "expert" in education? How can education research offer any help in setting educational policies or reforming education? How can our current "data-driven" reforms, like Race to the Top and the Common Core, make any sense?
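
For anyone who wants to check the arithmetic behind a figure like "one out of every 800," here is a rough sketch in Python. It uses only the two percentages quoted from the study above (0.13% and 67.4%); the variable and function names are my own, and multiplying the two rates together is a simplifying assumption for illustration, not a calculation the authors report.

```python
from fractions import Fraction

# Figures quoted from the study above.
REPLICATION_RATE = Fraction(13, 10000)  # 0.13% of publications were replications
SUCCESS_RATE = Fraction(674, 1000)      # 67.4% of replications were successful


def one_in(rate):
    """Express a rate as 'one in N', with N rounded to the nearest whole number."""
    return round(1 / rate)


# How many publications per replication of any kind?
papers_per_replication = one_in(REPLICATION_RATE)

# Assumption for illustration: treat the two rates as simply multiplying.
papers_per_successful_replication = one_in(REPLICATION_RATE * SUCCESS_RATE)

print(papers_per_replication)             # 769
print(papers_per_successful_replication)  # 1141
```

By this rough count, about one paper in 769 is a replication of any kind, and about one in 1,141 is a replication that succeeded.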

The most fundamental aspect of science is reliability as established by replication, defined by the authors as "the purposeful repetition of previous research to corroborate or disconfirm the previous results." The authors in fact state that replication of results exceeds the "gold standard" of experimental design, the randomized controlled trial, which has notable limitations given the difficulty of doing experimental work in education (as in any of the social sciences). The authors note that replication serves the following: "to control for sampling error, to control for artifacts, to control for fraud, to generalize to different/larger populations, or to assess the general hypothesis of a previous study."

In short, replication establishes reliability—the confidence that a reported phenomenon is indeed true, because it has been experienced independently by others. If the same experiment performed by many different experimenters always produces the same result, then those results, and the causal explanation demonstrated by the experiment, show the earmarks of truth by being invariant. Experiments that can't be reproduced are thus suspect, because their results, and therefore causal explanations, aren't reliable. "Replication research can help identify, diagnose, and minimize many of the methodological biases" (see the list above). The authors note that Harry Collins, the Distinguished Research Professor of Sociology and director of the Centre for the Study of Knowledge, Expertise, and Science at Cardiff University, and a fellow of the British Academy, has "gone so far as to call replication the Supreme Court of science."

"Despite the benefits that replication brings to the research table, conducting replications is largely viewed in the social science research community as lacking prestige, originality, or excitement … a bias that is not always shared in the natural sciences." Why the difference? The authors point to a number of well-established biases against replication research in the social sciences (citations omitted):
  • Submission bias. Conducting research and submitting for publication is time-consuming, and investigators may purposefully remove replications from the publication process to focus on other projects or because they believe replications cannot be published.
  • Funding bias. Research, including and especially RCTs [randomized controlled trials], requires resources, making replications difficult to conduct if not funded.
  • Editor/reviewer bias. Journal editors and reviewers may be more likely to reject replications, driven by an implicit (or even explicit) belief that replications are not as prestigious as nonreplication articles.
  • Journal publication policy bias. Journals may have policies against publishing replications.
  • Hiring bias. Institutions may not hire researchers who conduct replications, with Biases 2 and 3 possibly playing a role in these decisions.
  • Promotion bias. Similar to hiring bias, organizations may not value replication research as favorably as new and groundbreaking research within promotion and tenure activities.
  • Journals-analyzed bias. Previous research analyzing replication rates may have selected journals that publish few replications. Because each journal has its own editorial policies, it may be that some journals are more likely to accept replications than others.
  • Novelty equals creativity bias. Editors, reviewers, and researchers value creative contributions, but novelty and creativity are not synonymous. Most definitions of creativity and innovation propose criteria of novelty and utility; a novel result that cannot be replicated is by definition not creative.
And yet, the authors note in puzzlement, "these biases exist even though the call for replications has existed for generations. …. Furchtgott (1984), in a discussion of the need to alter the outlook on publishing replications, stated that 'not only will this have an impact on investigations that are undertaken, but it will reduce the space devoted to the repetitious pleas to replicate experiments.'" Nonetheless, the authors note (citations omitted, emphasis added):
52% of surveyed social science editors reported that being a replication contributes to being rejected for publication. In fact, the only factors associated more strongly with rejection were the paper being published in the proceedings of a national (61%) or regional (53%) conference and an experiment that did not have a control group (54%). With such a high rejection rate, the disincentives to attempt replications are considerable. With the obvious lack of replicability in [medical studies], the concern over the veracity of some bedrock empirical beliefs should be high, making a lack of published replications a major weakness in any empirical field.
In short, there is a strong bias against publishing replications even though everyone knows such work is critical to establishing scientific reliability. As a result, relying on the conclusions from research in the social sciences should be viewed as no better than buying a pig in a poke.

When it comes to research in education, the authors point out that the importance of replicating education research has been recognized for decades. The first paper on the subject, C.C. Peters's "An Example of Replication of an Experiment for Increased Reliability", was published in the Journal of Educational Research nearly 80 years ago (1938). According to the authors, Peters focused on the need for "independent tests to understand the reliability of a particular finding." As Peters concluded "with great prescience" (citation omitted):
It is best not to place much confidence in a mathematically inferred ratio as far as its exact size is concerned but to stop with the assurance that a set of differences prevailing in the same direction indicates greater reliability than that expressed by the ratios of the samples taken singly.
Peters's conclusion is extremely important. We cannot rely on numbers and magnitudes reported in single reports. Instead, we can only rely on trends shown by looking at multiple repetitions of the same study.
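
A toy calculation shows why agreement across repetitions carries so much weight. This is only a sketch under a loud assumption of my own, not Peters's actual method: suppose there is no real effect, so each independent study is equally likely to come out in either direction. The function name is hypothetical.

```python
def chance_all_agree(n_studies):
    """Probability that n independent studies all point in the same direction
    when there is no real effect (assumed: each direction is a 50/50 coin flip).
    """
    # Either all come out "positive" or all come out "negative".
    return 2 * 0.5 ** n_studies


for n in (1, 5, 10):
    print(n, chance_all_agree(n))
```

Under that assumption, a single study "agrees with itself" with certainty, which tells us nothing, while five independent studies would all point the same way by luck only about 6% of the time, and ten only about 0.2% of the time.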

The authors conclude (citations omitted, emphasis added):
Like Campbell and Stanley (1963) noted a half century ago about experimental design, replication is not a panacea. It will not resolve all issues and concerns about rigor, reliability, precision, and validity of education research. However, implicitly or explicitly dismissing replication indicates a value of novelty over truth and a serious misunderstanding of both science and creativity. If education research is to be relied upon to develop sound policy and practice, then conducting replications on important findings is essential to moving toward a more reliable and trustworthy understanding of educational environments. Although potentially beneficial for the individual researcher, an overreliance on large effects from single studies drastically weakens the field as well as the likelihood of effective, evidence-based policy. By helping, as Carl Sagan (1997) noted, winnow deep truths from deep nonsense, direct replication of important educational findings will lead to stronger policy recommendations while also making such recommendations more likely to improve education practice and, ultimately, the lives of children.
To emphasize: Current education research stresses novelty over reliability. Current researchers in education, as a group, misunderstand both science and creativity. Relying on "large effects" in single studies weakens the entire field of education research and its value in evidence-based policy making. We cannot rely on education research until the field drastically changes its attitudes to provide reliable scientific research.

But until that happens, what's the use of "evidence-based" education reform or "data-driven" policies?

