Lessons—Apples, Oranges and Eighth Graders

These pieces originally appeared as a weekly column entitled “Lessons” in The New York Times between 1999 and 2003.


Apples, Oranges and Eighth Graders

By Richard Rothstein

Reports on school performance usually compare test scores of the same grade from one year to the next. For example, most states ask schools to report whether this year’s eighth grade did better than last year’s. A new federal law says that every state must track progress by comparing the same grades year after year.

But this method cannot truly identify effective schools because it compares different groups of students. Last year’s eighth graders are in ninth grade this year.

Eighth-grade scores could be higher this year than last even if eighth-grade teaching did not improve. This year’s eighth graders may have started in the fall with more ability than last year’s, learned more at home or had a better seventh-grade teacher than the previous class did.

William Sanders, a statistician, has proposed a solution to this problem. He has helped Tennessee design its school reporting system; other states may now copy his methods.

Instead of comparing this year’s eighth graders to last year’s, Dr. Sanders examines the same group’s growth from seventh grade to eighth. He properly insists this is a better way to judge schools and teachers.
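To make the contrast concrete, here is a minimal sketch in Python. The scores are invented for illustration and the variable names are hypothetical; nothing here comes from Tennessee's data or from Dr. Sanders's actual software. It shows how a same-grade comparison can report a gain even when the students in question did not move at all.

```python
from statistics import mean

# Invented percentile scores, for illustration only.
last_year_grade8 = [55, 60, 65]  # last year's eighth graders (now in ninth grade)
this_year_grade7 = [60, 65, 70]  # this year's eighth graders, one year earlier
this_year_grade8 = [60, 65, 70]  # the same three students, now in eighth grade

# Conventional report: same grade, different students.
cohort_change = mean(this_year_grade8) - mean(last_year_grade8)

# Growth report: same students, seventh grade to eighth.
per_student_growth = [g8 - g7 for g7, g8 in zip(this_year_grade7, this_year_grade8)]
same_group_growth = mean(per_student_growth)

print("Same-grade comparison:", cohort_change)
print("Same-student growth:", same_group_growth)
```

In this made-up case the same-grade comparison shows a five-point improvement, while the students themselves held the same rank from one year to the next.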

When, instead, conventional analyses compare the same grade from year to year, they must account for different home advantages that influence learning. Distinct scores may be given for minority and low-income students. Or all scores may be statistically reinterpreted as though each school had the same demographic mix. This is called controlling for background.

Dr. Sanders’s idea eliminates the need for such background controls. The way it works sounds complicated but is actually simple. Imagine a child who scores at the 65th percentile in the third through seventh grades – in other words, knows more each year than about two-thirds of his peers. The child’s rank each year is the same because his annual score gain is only average. Then, imagine that in the eighth grade, his score jumps to the 75th percentile.

Dr. Sanders reasons that the earlier consistent 65th percentile results were probably caused by personal advantages like genetic ability or effects of affluence – like having a home computer or going to summer camp. But the extra eighth-grade gain (Dr. Sanders calls it “value added”) must be the result of a more effective teacher.
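The arithmetic of this example can be sketched in a few lines of Python. This is a deliberate simplification, assuming the student's baseline is just the average of his earlier percentile ranks; the function name `value_added` is hypothetical, and Dr. Sanders's actual statistical model is far more elaborate.

```python
from statistics import mean

def value_added(prior_percentiles, current_percentile):
    """Current percentile minus the average of earlier percentiles.

    Simplifying assumption: the student's usual rank stands in for
    all fixed personal and family influences.
    """
    baseline = mean(prior_percentiles)
    return current_percentile - baseline

# Third through seventh grade: steady at the 65th percentile.
prior = [65, 65, 65, 65, 65]

# Eighth grade: a jump to the 75th percentile.
gain = value_added(prior, 75)
print(gain)
```

Under that assumption, the 10 percentile points above the student's usual rank are the portion credited to the eighth-grade teacher.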

Of course, this is not always true, because personal traits are not permanently fixed. Family income can change. A student can mature, developing more motivation, relative to peers, than before.

Yet despite its limitations, the value-added method could help solve a big problem for education research. It is difficult to gather accurate student background data (only race and subsidized lunch eligibility are usually known), so analysts can never be sure how much of a student’s learning is caused by schools and how much by unknown personal traits.

Dr. Sanders concludes that if you collect scores from several years on the same students, the underlying pattern can estimate hidden influences of poverty and other student characteristics. Scores that depart from the pattern may reflect the power of unusual teachers.

Dr. Sanders assumes that fixed personal traits influence initial ability but not the rate at which students make relative gains. Good teachers, he says, can produce greater gains for high- and low-achieving students alike. In other words, if a low-income student goes from the 40th to the 45th percentile while an affluent one goes from the 60th to the 65th, Dr. Sanders considers the teacher to be equally effective with each. This assumption is still unproven.
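As a rough illustration of that assumption, the two students in the example can be compared by gain alone, ignoring where each started. The helper name `percentile_gain` is hypothetical, and the sketch takes the unproven assumption at face value rather than testing it.

```python
def percentile_gain(before, after):
    """Change in percentile rank from one year to the next."""
    return after - before

low_income_gain = percentile_gain(40, 45)  # low-income student in the example
affluent_gain = percentile_gain(60, 65)    # affluent student in the example

# Under the assumption, equal gains mean equal teacher effectiveness,
# regardless of the different starting points.
equally_effective = low_income_gain == affluent_gain
print(low_income_gain, affluent_gain, equally_effective)
```

Both students gain five percentile points, so the teacher would be rated equally effective with each.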

While measuring gains from year to year is a simple idea, the actual analysis is complex because it requires scores for each Tennessee student for each year and subject, linked to a specific teacher for each test. Such gargantuan computations have never been tried in education. Dr. Sanders’s training was in agricultural statistics, in which many environmental and genetic influences must be separated to improve plant and livestock breeding.

Value-added analysis is now familiar to many educators, but misconceptions about it abound. Many observers wrongly assume that because Dr. Sanders does not explicitly account for student race or poverty, these must not matter. But in the case of the student described above who usually scores at the 65th percentile, the above-average rank was mostly owing to continuing personal and demographic traits, not unusual teacher skills. More effective teaching may have caused only the extra gain to the 75th percentile in eighth grade.

Dr. Sanders has shown, perhaps better than others, how to identify the relative power of teachers. Using his technique, researchers can explore how better schools maximize the learning that exceeds students’ normal patterns of growth.
