Brief FAQ on “Genetic Associations with Mathematics Tracking and Persistence”

Feb 7, 2020

This FAQ provides information about “Genetic Associations with Mathematics Tracking and Persistence in Secondary School,” a scientific paper that was published in npj Science of Learning. This paper was led by Dr. Paige Harden (University of Texas at Austin) and Dr. Ben Domingue (Stanford University), in collaboration with Drs. Dan Belsky, Jason Boardman, Rob Crosnoe, Margherita Malanchini, Michel Nivard, Elliot Tucker-Drob, and Kathie Harris.

Questions and comments about the paper or this FAQ should be sent to harden@utexas.edu. For the particularly interested reader, this FAQ contains links to articles from the scholarly literature, which can be inaccessible to people without a university library subscription. Please contact Dr. Harden if you need assistance accessing one of the cited papers. This FAQ might be updated in response to feedback from the scholarly community, journalists, and members of the public.

What did this study do?

This study uses results from a type of study called a “genome-wide association study,” or GWAS (pronounced JEE-wahs). Your genome is the complete sequence of DNA that you have in all of the cells in your body. A DNA strand is made up of a unique sequence of DNA “letters,” abbreviated as G, C, T, and A. People can differ in single DNA letters. For example, at a particular spot in the genome, you might have an A, whereas I have a C. These differences in single DNA letters are called “single nucleotide polymorphisms,” or SNPs (pronounced “snips”). SNPs are one type of genetic variant, or DNA difference between people. A GWAS measures hundreds of thousands of SNPs and correlates each one with the characteristic that is being studied. If, for example, one is studying diabetes, a GWAS estimates which SNPs are more common in people with diabetes than in people who don’t have diabetes. A GWAS is a correlational study, which means that it does not necessarily identify SNPs that cause the outcome that is being studied. It only identifies SNPs that are more common in one group of people versus another.
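For readers who find code easier to follow than prose, here is a minimal sketch of the "correlate each SNP with the outcome" idea. Everything in it is an assumption made for illustration: the SNP names, the genotypes, and the outcome values are invented, and the correlations are far larger than anything seen in real data. An actual GWAS uses regression models with statistical controls across hundreds of thousands of SNPs and very large samples; this toy version only shows the basic logic.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Invented toy data: 6 people, 3 hypothetical SNPs.
# Each genotype is the count (0, 1, or 2) of one particular DNA letter at that spot.
genotypes = {
    "snp_a": [0, 1, 2, 0, 1, 2],
    "snp_b": [2, 2, 1, 1, 0, 0],
    "snp_c": [1, 0, 1, 0, 1, 0],
}
# Invented outcome for the same 6 people (for example, years of schooling).
phenotype = [12, 13, 16, 11, 14, 18]

# One correlation per SNP: this is the "association" part of a GWAS.
associations = {snp: pearson(counts, phenotype) for snp, counts in genotypes.items()}

for snp, r in associations.items():
    print(f"{snp}: r = {r:+.2f}")
```

In this toy data, snp_a is positively correlated with the outcome, snp_b is negatively correlated, and snp_c is uncorrelated. None of these numbers says anything about why a SNP is correlated with the outcome, which is the sense in which a GWAS is a correlational study.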

A previous study conducted a GWAS of educational attainment. The authors of that paper wrote an extensive FAQ, and people are encouraged to read their FAQ in its entirety. Our study uses the results of that educational attainment GWAS as a starting point, and any limitations of that study also apply to our study.

In the previous study, educational attainment is defined as the number of years of formal schooling that a person completes. You can review the original paper and previous FAQ for a discussion of how educational attainment was measured and how the researchers handled differences across countries and across historical time in the structure of educational systems.

The GWAS of educational attainment found very many SNPs that each had a very small association with educational attainment. These tiny associations can be added up to create what are called polygenic scores. A polygenic score is a single number that reflects researchers’ best estimate of how likely someone is to show a particular outcome (or phenotype) based on their DNA. Most polygenic scores, including, importantly, the one that we calculate for educational attainment, are not very accurate estimates of an individual person’s outcome!

In our study, we used a polygenic score created from the GWAS of educational attainment in a sample of U.S. students who were in high school in 1994–1995. All the students in the sample had European genetic ancestry and identified as being White. (See “Who are the people in this study, and why does that matter?” below.) We tested whether the polygenic score predicted students’ progress through the math curriculum while they were in high school: which math class did they take, or did they drop out of math entirely?

What did this study find?

We found three main things.

Students with higher polygenic scores were tracked to more advanced math classes in the 9th grade.

This was true even when we statistically controlled for family socioeconomic status (SES) and school SES. This was also true when we compared students who attended the same high school.

Students with higher polygenic scores persisted in math for more years.

In the mid-1990s, it was not mandatory in most U.S. states for students to take math for all 4 years of high school. Students with higher polygenic scores were more likely to enroll in math classes every year. This was true even when we statistically controlled for family SES and school SES, and when we compared students who attended the same high school.

Students with lower polygenic scores were less likely to drop out of math if they attended advantaged schools.

We measured school advantage by calculating whether the parents of students in a school had a high school diploma. In schools where most students had educated parents, even students who had low polygenic scores persisted in math. In contrast, in schools where most students had parents who didn’t finish high school, students with low polygenic scores were at high risk for dropping out of math.

This result is telling us about inequality of opportunity in America. A student with a certain polygenic score who attends a disadvantaged school is not going as far in their math education as another student who has the same polygenic score but who attends an advantaged school.

Information about genetics can be used to shine a spotlight on unequal environmental opportunities to learn.

Wait … what is a polygenic score again?

Polygenic scores are easy to misinterpret. A polygenic score is a single number that adds up information from a person’s entire genome, in order to reflect scientists’ best estimate of how likely a person is to show a particular phenotype, based solely on their DNA. It is important to realize what polygenic scores are NOT:

Polygenic scores are NOT “fortune-tellers.”

Polygenic scores typically account for only a few percentage points of the total amount of variation in an outcome. This is not a strong enough effect to be certain about what is going to happen in the life of an individual person. As we explained in a previous blog post, “polygenic scores are useful for social science researchers who are interested in average trends, but specific predictions about an individual human life will be wildly uncertain.”

Instead, a polygenic score is a risk factor. By way of analogy, having high cholesterol makes it more likely that you’ll have a heart attack, but it doesn’t determine that outcome — lots of people have high cholesterol but don’t have a heart attack, and you can take steps to prevent a heart attack if you are at high risk. Similarly, a high polygenic score “for” educational attainment means you have a slightly higher probability of going far in school if you are also experiencing similar environmental conditions as people in the original study. But that higher probability does not mean destiny!
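To put a rough number on why "a few percentage points" of variation leaves individual predictions so uncertain, here is a small back-of-the-envelope sketch. It assumes the simplest textbook case of a linear prediction, and the value of 3% is a hypothetical stand-in for "a few percentage points," not a result from our paper. The function name is also just for illustration.

```python
from math import sqrt

def remaining_spread(r_squared):
    """Fraction of the original standard deviation that is left over after
    using a predictor that explains `r_squared` of the variance
    (simple linear-prediction case)."""
    return sqrt(1.0 - r_squared)

# Hypothetical value for illustration only, not a finding from the study.
r_squared = 0.03
fraction_left = remaining_spread(r_squared)

print(f"Spread remaining: {fraction_left:.1%}")  # prints "Spread remaining: 98.5%"
```

In other words, under these assumptions, knowing someone's score would shrink the range of plausible outcomes for that person by less than 2%. That is enough to see average trends in a large group of people, but it is nowhere near enough to tell an individual's fortune.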

Polygenic scores are NOT free of environmental or social processes.

There are two components to creating a polygenic score. The first component is information about a person’s SNPs, which are fixed at conception and do not change over the course of their life. The second component is a set of weights that reflect the correlation between each SNP and a target phenotype, like educational attainment. These weights can carry information about the social environment.

For example, let’s imagine a society where tall children are favored by society, whereas short children are discriminated against. Tall children are called on more often by teachers, sent to better schools, are read to more by their parents, etc. In this example, a genetically-influenced characteristic (height) is acted on by the social environment, producing a correlation between height-influencing genes and how successful a student is in school. If, then, someone conducted a GWAS in this society, and used the results to create a polygenic score “for” educational attainment, then children with height-increasing SNPs would have higher polygenic scores. The polygenic score, then, would capture some of the social biases that create disparities in educational outcomes.
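The two components described above can be sketched in a few lines of code. This is only an illustration under simplifying assumptions: the SNP names, allele counts, and weights are all invented, and real polygenic scores add up an enormous number of SNPs using specialized software. The second set of weights is a hypothetical stand-in for a GWAS run in a society with different social rules.

```python
def polygenic_score(allele_counts, weights):
    """Weighted sum of a person's allele counts.
    SNPs that have no weight are skipped."""
    return sum(
        weights[snp] * count
        for snp, count in allele_counts.items()
        if snp in weights
    )

# Component 1: one person's SNPs (invented), fixed at conception.
person = {"snp_a": 2, "snp_b": 1, "snp_c": 0}

# Component 2: weights from a GWAS (invented numbers).
weights_society_1 = {"snp_a": 0.02, "snp_b": -0.01, "snp_c": 0.005}
# Hypothetical weights from a society where whatever snp_a influences
# is no longer rewarded in school.
weights_society_2 = {"snp_a": 0.0, "snp_b": -0.01, "snp_c": 0.005}

score_1 = polygenic_score(person, weights_society_1)
score_2 = polygenic_score(person, weights_society_2)

print(f"Score with society 1 weights: {score_1:.3f}")  # prints 0.030
print(f"Score with society 2 weights: {score_2:.3f}")  # prints -0.010
```

The person's DNA is identical in both calculations. Only the weights change, and the weights are where information about the social environment can enter the score.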

Polygenic scores are NOT a measure of a child’s “innate” or “inborn” potential to succeed in school.

We know that educational systems differ around the world and have differed throughout human history — and could be modified in the future. If an educational system changes, then the characteristics of students who do well in that system might also change, which would change the results of genetic analysis.

For example, if schools typically start in the early morning, then students who tend to be more awake and alert in the morning will tend, on average, to do better in school. (This doesn’t mean, of course, that all morning “larks” do great in school or that all evening “owls” do poorly. We are always talking about average trends.) But, imagine that the school day was changed to begin at 1 PM and end at 8 PM. Now, night owls might have the advantage! The results of a genetic study can tell you about what genetically-influenced characteristics are selected for, as students are sifted through many years of formal education. Genetic research cannot tell you what could be if the educational system were different. And, genetic research cannot tell you how an educational system should be structured.

The word “innate” implies that there is something about a certain child’s biology that pre-determines them to be successful in school no matter what. But nothing about a child’s polygenic score tells you about what could be true about that child if society or our educational system were different.

Who are the people in this study, and why does that matter?

This study builds on a previous genome-wide association study of educational attainment that used data from 1.1 million people, all of whom were of “European” genetic ancestry. Genetic ancestry is measured by an analysis of patterns seen in genetic data, often in comparison to a reference panel of global genetic variation. People in the U.S. who have European ancestry will most often self-identify their race as “White” or “Caucasian.” Also, all of the participants completed their education in Europe, New Zealand, or the U.S. in the 20th or 21st centuries.

There are important reasons for conducting a GWAS within a group of people who are all somewhat genetically similar by virtue of their relatively recent ancestors coming from the same part of the world (in this case, people with relatively recent European ancestors). The focus on only European-ancestry people, however, has several important implications. First, the results of this study might not generalize to people with different genetic ancestry. If, as hoped, genetic research is ultimately useful for improving people’s lives, it is problematic if those benefits are not available to minority populations that are already marginalized. Second, as a result, any attempts to compare racial or ethnic groups on their polygenic scores are scientifically meaningless.

The results of the study do not and cannot tell us anything about the sources of educational disparities between different racial or ethnic groups.

By pooling people who lived at different times and places (e.g., people who finished secondary school in mid-1990s America versus in mid-1950s Britain), the previous GWAS of educational attainment was picking up on genetic “signal” that is relatively consistent across those times and places. Genetically-influenced characteristics that have different associations with going far in school in different environmental contexts are less likely to be detected. If, for example, being aggressive was favored by educators in one school, but was punished by educators in another school, then SNPs associated with being more aggressive would be less likely to be detected in an analysis of success in school that pooled data across these different educational contexts.

In our study, we also pooled together data from men and women. Men and women have historically faced different structural barriers to advancing in their education. And, the same personality characteristics or behaviors might elicit different responses from teachers and other socially-significant others, depending on whether the person displaying that characteristic or behavior is male or female. Previous research using polygenic scores has shown that their association with educational attainment differs by gender and historical period, supporting the idea that “genetic influence must be understood through the lens of historical change, the life course, and social structures like gender.” Research on the genetics of education always needs to consider historical and social structures.

What does this study not mean?

Genetic research has a long history of being misinterpreted and misused to argue that social inequality is inevitable, that social programs designed to improve people’s lives are bound to fail, and that some people are “naturally” inferior to other people. We wholeheartedly reject these claims on both scientific and moral grounds.

Furthermore, genetic associations with educational outcomes do not mean that the environment does not make a difference. New research has shown that many polygenic scores capture what most people would think of as environmental effects. For example, parents with particular genetic variants could be more likely to go on to have high levels of educational attainment, and educated parents could have more access to high-quality pre-school programs that boost their children’s later academic achievement. This means that children’s polygenic scores will be correlated with their academic achievement for environmental reasons. The role of these “indirect” genetic effects should be investigated in future studies.

Genetic associations with mathematics coursetaking do not mean that interventions or policy reforms designed to boost mathematics achievement are bound to fail. To use a classic example, just because you might be genetically predisposed to poor eyesight doesn’t mean that your eyeglasses won’t work. We already know that high-quality early childhood programs can have lifelong positive effects. More recent research has found that a brief growth mindset intervention increases enrollment in advanced math classes.

Nothing about the current study undermines support for investments in bettering human lives.

The existence of genetic associations within a group of people (see “Who are the people in this study, and why does that matter?”, above) does not tell you anything about whether there are average differences between racial or ethnic groups, or why such differences, if they are observed, occur. This is an important point because racist and classist ideas about the allegedly “inferior” character of people of color and the poor have been used to justify eugenic policies. Nothing about this study gives any sort of empirical support to these ideas.

Should polygenic scores be used to select students for certain types of educational opportunities?

Several researchers and policymakers have suggested that polygenic scores could be used in an educational context to select students for certain curricular tracks or to personalize education. These suggestions are misguided, for five reasons. First, polygenic scores do not predict outcomes for individual students with any certainty, particularly when you consider the other sources of information that might be available about an individual student.

Second, polygenic scores might be correlated with a student’s outcomes because the score is telling you something about the environmental opportunities provided by a student’s parents. As a result, using a polygenic score for individual selection might lead to discriminating against students based on their parents’ characteristics.

Third, polygenic scores predict average levels of an outcome but do not necessarily carry information about whether students would benefit more or less from any particular intervention or curriculum.

Fourth, suggestions to personalize education or track students based on genetics overlook the fact that personalization and tracking are themselves controversial among educators and educational policymakers.

Fifth, most polygenic scores are currently valid only for students of European ancestry. (Again, by “valid,” we mean that the polygenic score predicts average trends, not individual outcomes.) In the U.S., only about half of students in public K-12 schools are expected to be of European ancestry. Deploying a tool that applies to only half of the public school population is, in our view, not consistent with the goals of diversity, equity, and inclusion.

So… are there any policy implications of this study?

We think that our genetic study is policy-relevant because it reveals inequalities of environmental opportunity. Closing these inequalities, so that every child in America enjoys the opportunity to learn, has long been an important goal for policymakers and educators.

One challenge to understanding which schools are providing children with the best opportunities to learn is the fact that schools differ in the composition of their student bodies. For example, comparing the average test scores of a school with many rich students to the average test scores of a school with many poor students is a misleading, apples-to-oranges comparison: The latter school might have lower average test scores not because it is a “worse” school, but because it is serving more high-needs students. Researchers and policymakers already incorporate data about students’ family backgrounds in order to make more informative, apples-to-apples comparisons: Among schools that primarily serve poor students, which schools do the best job of maximizing students’ learning outcomes? Our study shows that genetic data, like data on socioeconomic status, can be used to tell us something about which schools are maximizing students’ learning outcomes. This information is relevant to the policy goal of improving educational systems.

The European Commission recently published a report reviewing some policy areas that might be affected by genetic research, including insurance markets, labor markets, personalized medicine, and reproductive technology. This report can be accessed here. Generally, we think it will be important to remain vigilant against the possibility that genetic data will be used in ways that introduce or exacerbate inequalities in the distribution of freedoms, resources, or welfare. Additionally, we as scientists hope to contribute to discussions about how this research can be used to illuminate sources of injustice and to maximize the unique potential of each child.