Jo Boaler, Tracking, Education Research, and Honesty

A few years ago I read Jo Boaler’s book, “Mathematical Mindsets,” and I thought it contained some good ideas. There were a few things that I thought were not realistic or would be difficult to scale, but overall I found the book useful. Our department read it together, and I remember a colleague pointing out that Boaler often cited her own research. That observation made me more skeptical of her work, especially of the citations she provided.

Was she only seeking out data that confirmed what she already believed?

Despite this I would have considered myself a “fan” of her work. In subsequent years I used some of her resources from Youcubed and subscribed to her email newsletter. The content was a mixed bag of resources and opinions, but she was clear in her book and in her work that “tracking,” the education model in which students are separated into courses based on their ability, needed to be eliminated. It was unfair, meant weaker math students weren’t exposed to “rich” mathematics, perpetuated inequities, and the research showed de-tracking was better for all students. If we wanted to fix mathematics education in the United States, according to Boaler, we would need to de-track mathematics, helping teachers to incorporate “low floor, high ceiling” tasks that are accessible to all students at a particular grade level.

My intuition about de-tracking is skeptical, for reasons I might explore in another post. For now I want to focus on what’s happening in California surrounding the state’s proposed mathematics curriculum, the de-tracking program San Francisco Unified School District implemented in 2014, and how advocates, Jo Boaler chief among them, are using misleading (or missing) data to push for policy that has little to no evidence to support its adoption.

This NY Times article gives context regarding the curriculum and the debate around it. From the article:

The California guidelines, which are not binding, could overhaul the way many school districts approach math instruction. The draft rejected the idea of naturally gifted children, recommended against shifting certain students into accelerated courses in middle school and tried to promote high-level math courses that could serve as alternatives to calculus, like data science or statistics.

The draft also suggested that math should not be colorblind and that teachers could use lessons to explore social justice — for example, by looking out for gender stereotypes in word problems, or applying math concepts to topics like immigration or inequality.

Evaluating the Success of SFUSD’s Framework

The evidence that advocates are using to promote this curriculum comes largely from the de-tracking program San Francisco Unified School District implemented in 2014. (“It also promoted something called de-tracking, which keeps students together longer instead of separating high achievers into advanced classes before high school. The San Francisco Unified School District already does something similar.”) The district claims it reached all of its goals with the program and shows progress along many metrics. But a group of data scientists, teachers, lawyers, parents, and students put together a report outlining how SFUSD is either actively hiding data (some California Public Records Act requests, California’s version of FOIA, have been ignored), intentionally misleading the public, or simply incompetent.

The group, Families for San Francisco, put together a report (cited in the NY Times) explaining the problems with SFUSD’s nationwide campaign, which touts the district’s evidence that de-tracking was an overwhelmingly positive decision for its students. Here are a few revelations from the report worth noting. The report goes through each of the three goals outlined by SFUSD, the claims made by SFUSD about progress on those goals, and its own analysis of each.

The first goal was to “Reduce the number of students forced to retake Algebra 1, Geometry, or Algebra 2 by 50% from numbers recorded for 6/2013.” SFUSD claims a “dramatic increase in student comprehension” and a drop in Algebra 1 repeaters from 40% to 7% in a press release from 2017. Here is the analysis of this claim from Families for San Francisco.

Facts: The grade distribution we received from SFUSD showed no improvement at all in Algebra 1 grades. The repeat rate did come down, but only because in 2015 SFUSD eliminated the requirement to pass the Algebra 1 California Standards Test (CST) exit exam as a condition of progressing. The effect of this change was later partially acknowledged by the Math department in the speaker’s notes in one of their presentation slides in 2020: “The drop from 40% of students repeating Algebra 1 to 8% of students repeating Algebra 1, we saw as a one-time major drop due to both the change in course sequence and the change in placement policy.” Finally, in conducting our review of SFUSD’s claims, we were unable to obtain any such “longitudinal data” they refer to nor could we replicate the repeat rate numbers quoted by SFUSD using data obtained via a CPRA request. We have deep concerns that SFUSD is claiming credit for student achievement that is either untrue or unsubstantiated by the data or both.

The second goal was to “Increase the number of students who take and pass 4th year math courses (post- Algebra 2 courses) with a C or better by 10% by 6/2018.” SFUSD claims that “456 more students, or 10.4% more students are taking courses beyond Algebra 2 in 2018-2019 than were in 2017-2018.” Unfortunately, this claim is misleading. Here is the analysis from Families for San Francisco.

Facts: Enrollment in advanced math classes at SFUSD has gone down, not up, and SFUSD has produced no data about pass rates. Advanced math is commonly understood to mean courses beyond Algebra 2, including Precalculus, Statistics, and Calculus; however, SFUSD’s claim that its “Advanced Math” enrollment has increased depends entirely on counting students enrolled in its “compression course” — a third-year course combining Algebra 2 with Precalculus. The problem with this framing is that the University of California (UC) rejected SFUSD’s classification of its compression class as an advanced math course due to its failure to meet UC standards for Precalculus content. Once we exclude the enrollment data for the compression course, the enrollment number for advanced math shows a net decrease (emphasis mine) from 2017-2018 (the final cohort prior to the implementation of the new math course sequence).

The third goal was to “Increase AP Math enrollment & pass rate for Latino & African American students by 20% by 6/2018.” SFUSD claimed that “(a) ‘AP Math enrollment has also increased over a two-year period from 2016-17 to 2018-19’; (b) that ‘AP Statistics enrollment has increased 48.4%’; and (c) that Latinx AP Math enrollment increased 27% over the same period.” Families for San Francisco’s investigation found these claims inconclusive because they were unable to get all the data necessary to verify them. Here’s what they write.

Facts: Whether SFUSD met its original goal to increase Latinx and African American AP Math enrollment by 20% from June 2014 to June 2018 is unknown because in spite of our requests, SFUSD has not produced complete data for this period. For the two-year period from 2016–2017 to 2018–2019, African Americans are not listed among “subgroups who met or exceeded the 10% growth target” and SFUSD has not disclosed any performance outcomes. The five-year data for school years 2016-2017 through 2020-2021 shows that enrollment by African American students has fluctuated from year to year while enrollment by Latinx students has been more or less on the rise. And because SFUSD does not release data on the pass rate for AP Math exams, its success rate is unknowable.

Not only are the pass rates unknown, the enrollment data available shows that the claim of increased AP math enrollment is misleading.

Meanwhile, the claim of increased AP Math enrollment overall is misleading. The number of SFUSD students overall taking AP Calculus is down. The number taking AP Statistics is up but it is concentrated at three specific school sites (Lowell, Ruth Asawa SOTA and Balboa). The other schools showed no significant increase.

It seems clear that SFUSD did not meet its goals and is intentionally spreading misinformation with data “showing” it has. This is frustrating enough, as it demonstrates a clear effort by supporters to confirm their own bias and manipulate the data to mislead the public. But Families for San Francisco also points out that new inequities were introduced by the overhaul of the math program (which, keep in mind, is a model for the new California math sequence). For example, by the end of tenth grade “Algebra 2 enrollments of Black and brown students have declined because most students cannot afford the costly work-arounds afforded by their white and Asian counterparts.” Read their report for a more detailed explanation of why, but essentially it comes down to this: parents who can afford a workaround will use one, and those who can’t likely won’t.

None of this has stopped advocates from pushing the narrative that de-tracking, at least in the approach that SFUSD took, is good for students. Jo Boaler, a researcher in math education, should be able to take a dispassionate look at the evidence and conclude that the experiment failed. But she seems unable or unwilling to do this.

Here’s a tweet from Boaler in 2018 pointing to SFUSD data.

Here’s another one.

For a more detailed argument from Boaler and other math education experts, here’s a piece entitled “Opinion: How one city got math right,” which concludes with the statement, “We congratulate San Francisco Unified on its wisdom in building math sequences that serve all students increasingly well.”

That piece is from 2018, but she doesn’t seem to have revised her opinion, at least not publicly. I’m subscribed to her newsletter and in one she sent out in August entitled “New Evidence Supports De-Tracking” she links to a recent paper by her and David Foster. Throughout this paper she cites her own work to justify claims. It doesn’t appear to be peer-reviewed and is not published in a journal but hosted on her Youcubed website. The research in that paper looks promising, but given everything I’ve laid out above it’s difficult for the average educator to tell if this is real evidence or if the goal posts have been moved. Would a 23-page report by an interested organization yield all the same problems as discovered by Families for San Francisco in their extended report analyzing the data from SFUSD?

I don’t know the answer to that. I do know, however, that this case study represents a problem endemic in education research. I’ve yet to go to a serious professional development session in which research couldn’t be found to support whatever intervention the organizer was promoting. When educators seek out research on different topics in education it is very difficult to find a consensus. Given that, as of 2014, less than 1% of education articles were replication studies and that “replications were significantly less likely to be successful when there was no overlap in authorship between the original and replicating articles,” educators, in my experience, are understandably skeptical of education research. Boaler and other researchers that agree with her on this give us a reason to maintain that skepticism.


No one is immune to confirmation bias. The longer you’ve maintained a viewpoint, I suspect, the harder it is to let go of that viewpoint. However, one of the antidotes to confirmation bias is surrounding yourself with honest people who hold a diversity of viewpoints. Surrounding yourself with people who agree with you, or who disagree but won’t point out the errors in your thinking, means you will remain in error. If you are a person with the ear of a great number of educators, that means those educators who listen to you uncritically will also remain in error. If you are rewriting a curriculum for a state whose standards have an outsized impact on standards adopted throughout the country, then your errors will have an outsized impact on math education in general.

Jonathan Haidt explains how easy it is for humans, and social scientists, to fall into the trap of motivated reasoning in this piece, “Why Universities Must Choose One Telos: Truth or Social Justice.”

A consistent finding about human reasoning: If we WANT to believe X, we ask ourselves: “Can I believe it?” But when we DON’T want to believe a proposition, we ask: “Must I believe it?” This holds for scholars too, with these results:

Scholarship undertaken to support a political agenda almost always “succeeds.”

A scholar rarely believes she was biased.

Motivated scholarship often propagates pleasing falsehoods that cannot be removed from circulation, even after they are debunked.

Damage is contained if we can count on “institutionalized disconfirmation” – the certainty that other scholars, who do not share our motives, will do us the favor of trying to disconfirm our claims.

I’m not claiming that everything Boaler says is incorrect and I’m sure her intentions are good. As I mentioned above, she is indicative of a much wider problem. I’m saying that someone who either knowingly manipulates data or can’t see the error in her analysis of the data shouldn’t go on perpetuating those ideas without criticism from educators who care about math education. In most situations educators don’t have the time or expertise to truly evaluate the claims made by education researchers. In this case Families for San Francisco did the legwork and revealed that the emperor has no clothes.

I encourage math educators who read this to share the report widely, counter claims and proposed changes that are based on SFUSD data, and promote an environment in which leaders in education are concerned about truth and pursue it through viewpoint diversity. Since Boaler is a leader in progressive math education, it’s up to people who support her work to point out how her analysis is flawed. We must call on her to stop pushing for policies that don’t help, and may actually harm, students in the name of falsely vindicating her ideas. It may be easy to dismiss critiques from the “other side,” as they will always have critiques. It’s much harder to dismiss a careful critique from within one’s tribe.

If this ordeal shows us anything it’s that we must be careful who we valorize – and that we must keep our eyes open and be ready to criticize their ideas when appropriate.

4 thoughts on “Jo Boaler, Tracking, Education Research, and Honesty”

  1. This is a really interesting and well-written post. I’m a math teacher who was trained in research and data analysis. I do some data analysis on the side right now for UCSF. You sound informed about the rigor of peer review and quoting your own work and posting your own research on your own website. I appreciate your challenge to the claims of the SFUSD math de-tracking success story.

    My view of de-tracking is more about the fact that I often see white faces in the honors and advanced courses and brown faces in the regular or remedial courses. That seems to me to be perpetuating some systemic racism. I’d love to hear your thoughts on this aspect of course placement. I’ve taught Algebra II forever and think the expectations of that course are very demanding and make it complicated to teach in a way that allows for content coverage and to differentiate lessons and activities to meet the needs of all learners. And, we have tracking at our school. Sometimes I think that if we had those honors kids in the room, rich discussions would ensue amongst students.

    I’m trying to be brief, but I think this is a dissertation topic. My main question is, what do you think about well-intentioned academic tracking and the resulting racial make-up of the classes? Have you ever noticed this?

    • Thanks for taking the time to comment! Sorry it’s taken me a while to get back to you. I have a few thoughts but no real strong beliefs.

      I teach AP calculus and I have noticed the pattern in honors classes as well, to a certain degree. Although, it does vary from year to year and it seems to track more with socioeconomic status, familial status, and trauma history than it does with race. That is, students in calculus are more likely to come from more affluent households, have parents that are married and/or educated beyond high school, and don’t have a history of trauma in the family (a low ACE score).
      But I know those disparities exist in many places, even when controlling for some of the factors I just outlined. So, what do I make of that?

      This may seem like a dodge, but I don’t know for sure. My default stance is that when there’s a disparity we should look very closely at what might be causing it. Is it some sort of racism in the system? If it is, then can we mitigate or eliminate it? And if it is, we have to be specific about what we mean by racism, because a disparity does not imply that racism is the cause. (I add that to make clear that I’m using a narrower definition of racism than what might be popular today.) There are likely many factors that contribute, so I would be surprised if there was only one cause. I can imagine things like teacher recommendations being a place where racism/bias could seep in and reduce the number of kids of color in honors tracks. It seems like the solution to that would be as-objective-as-possible tests. However, there is a current running against that in many places.

      I can imagine other reasons that are cultural. For example, my family and peer group wasn’t interested in playing music. No one ever encouraged me to play an instrument and only a couple friends of friends played instruments – and only in band class. I barely knew they played one. Because of this I never really had any interest in learning to play one. I know that this was a cultural influence because when I went to college I picked up guitar and fell in love with it (maybe “culture” isn’t the right word, but I’m sure you see my point). If I had been born into a family of musicians I’m certain I would’ve picked it up much earlier in life.

      The tragedy, as I’m sure I don’t need to explain to you, is that there are brilliant kids of color who for myriad reasons don’t make it to honors-level math and therefore have fewer opportunities in post-secondary schools and later on in life. I think we need to work very hard to solve that problem. In solving it, assuming it can be solved, we have to always be mindful of second-order effects. If we focus too much on one metric we’ll likely create disaster in our wake, the whole time thinking we’re doing good (like telling ourselves we’ve reduced the Algebra 1 failure rate 75% when it was only because we got rid of the test).

  2. I’ve followed the debate over Boaler’s Railside study, which found that a discovery math curriculum (College Preparatory Math, or CPM) led to rapid achievement gains. Several UC math professors all but accused her of dishonesty, which spiraled down into the usual back-and-forth. But they did have what seemed like valid points: her study didn’t adequately control for differences in the student bodies between schools, and the math test she devised wasn’t shown to be representative of other tests of math skills. In general, I think the standard of evidence in education research papers is lower than in the fields in which I personally work (econ and public policy), especially if the papers confirm the priors of the Ed research establishment. FWIW: My own school district in rural Oregon adopted the CPM curriculum Boaler’s study touted and scores on state exams didn’t improve; Eugene, OR, a much larger district, has slightly lower scores after adopting Boaler’s preferred curriculum. So call me skeptical…

    • Thanks for taking the time to comment, Andrew!

      I hadn’t heard of the debate around her Railside study, so thanks for bringing that to my attention. I don’t doubt that education research is difficult given all that there is to control for, however I also suspect the standard of evidence is lower than other fields, as you mentioned. And I’m sure it suffers from the same publishing bias problems that every field suffers from (that studies that don’t yield a result don’t get published). I also suspect, although I can’t point to hard evidence for this, that the field of education selects for a certain personality type that is more agreeable. A field that skews towards agreeableness is unlikely to consistently level critiques at colleagues. This is especially true when those colleagues have similar worldviews. Compound that with an environment in which playing devil’s advocate or expressing a view that runs against the zeitgeist can be seen as being against equity or inclusion, and you have an environment in which flawed research can flourish.

      That’s really interesting about your experience with her curriculum. I suppose she might argue that those tests are not good assessments of what the curriculum is aiming to cultivate in students. Maybe there’s some truth in that. But also, isn’t the idea that students learn the content at a deeper level and should therefore be able to answer the more rote-style questions on standardized tests?

      Anyway, I suspect there’s a balance we need to strike. Should all mathematics be drill work on math facts? Of course not. Should all of it be “explore this pattern and talk about what you notice with a partner”? Probably not. But we won’t be able to strike that balance if only one approach is seen as Right.
