Investigating the Genetic Architecture of Non-Cognitive Skills Using GWAS-by-Subtraction

Frequently Asked Questions

Paige Harden
Jan 14, 2020

This FAQ provides information about “Investigating the Genetic Architecture of Non-Cognitive Skills Using GWAS-by-Subtraction,” a scientific paper that was posted to bioRxiv. This paper was led by Dr. Daniel Belsky (Columbia University), Dr. Paige Harden (University of Texas at Austin), and Dr. Michel Nivard (VU Amsterdam). The paper has not yet been peer-reviewed. The paper and this FAQ are expected to change in response to feedback from the scholarly community, journalists, and members of the public. Questions and comments about the paper or this FAQ should be sent to Dr. Paige Harden. For the particularly interested reader, this FAQ contains links to articles from the scholarly literature, which can be inaccessible to people without a university library subscription. Please contact Dr. Harden if you need assistance accessing one of the cited papers.

What are non-cognitive skills?

This study uses genetic data to investigate what economists have called “non-cognitive skills.” Non-cognitive skills are defined by what they aren’t: they are behaviors and abilities that are not measured by traditional IQ tests but are thought to help people be more successful in school, in their jobs, and in life generally. Non-cognitive skills have also been called “soft skills,” “character,” or “social-emotional learning.”

As you might guess from a vague name like “non-cognitive skills,” there is a lot of debate about what non-cognitive skills are and how best to measure them. Motivation, persistence, grit, curiosity, self-control, growth mindset — these are just a few of the things that people have suggested are important non-cognitive skills.

What does it mean to say that non-cognitive skills are “heritable”?

Previous studies have measured different types of non-cognitive skills (e.g., grit, growth mindset) in identical and fraternal twins, and found that identical twins are more similar in their non-cognitive skills than fraternal twins are. This result tells us that non-cognitive skills are heritable. Because we now know that virtually all aspects of behavior and personality that differ between people are at least partly heritable, this result is not surprising. The heritability of non-cognitive skills should not be taken to mean that non-cognitive skills are especially “genetic” in some way.
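To see roughly how a comparison of identical and fraternal twins turns into a heritability number, the classic back-of-the-envelope version is Falconer's formula, which doubles the difference between the identical-twin and fraternal-twin correlations. The sketch below is only that textbook approximation (modern twin studies usually fit structural equation models instead), and the twin correlations in it are hypothetical.

```python
def falconer_heritability(r_mz: float, r_dz: float) -> float:
    """Falconer's approximation: h^2 = 2 * (r_MZ - r_DZ).

    Assumes identical (MZ) twins share all, and fraternal (DZ) twins about half,
    of their segregating genetic variation, and that both kinds of twins share
    their environments to the same degree (the "equal environments" assumption).
    """
    if not (-1.0 <= r_mz <= 1.0 and -1.0 <= r_dz <= 1.0):
        raise ValueError("correlations must lie in [-1, 1]")
    return 2.0 * (r_mz - r_dz)


# Hypothetical twin correlations for a questionnaire measure such as grit.
h2 = falconer_heritability(r_mz=0.50, r_dz=0.30)
print(f"estimated heritability: {h2:.2f}")  # prints: estimated heritability: 0.40
```

A higher correlation among identical twins than among fraternal twins is what pushes the estimate above zero; it says nothing about how changeable the trait is.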

“Heritable” is a confusing word that is often misinterpreted to mean that heritable characteristics are determined by biology. This is not true. As we describe below, heritable characteristics, including non-cognitive skills, can be influenced by the environment, can be changed by interventions or policy reforms, and are not destined to develop in a particular way based just on a person’s genes. “Heritable” means that, within the sample of people being studied, those who are more genetically different from each other tend to show more different characteristics. In this case, twin studies of non-cognitive skills show that, on average, people who are more genetically different from each other tend to differ more in their curiosity, mindsets, self-control, etc. We return to this point (that heritable characteristics are not biologically determined) in the sections “Who are the people in this study, and why does that matter?” and “What does this study not mean?”

Why study the genetics of non-cognitive skills?

Knowing about the genes associated with non-cognitive skills has the potential to help future research in multiple ways, by giving researchers new tools to study:

  • How parents can help (or fail to help) develop the non-cognitive skills of their children through their parenting and other environmental mechanisms. For example, previous research has used insights from genetics to study the importance of mothers’ cognitively stimulating parenting behavior for their children’s educational outcomes.
  • How different school contexts can maximize (or fail to maximize) the performance of their students. For example, previous research has used insights from genetics to show how students’ progress through the high school math curriculum is impeded in disadvantaged schools.
  • Whether an intervention or policy reform is successful in helping people who are at the highest risk for poor outcomes. For example, previous research has used insights from genetics to show that an educational reform successfully reduced obesity, particularly among people at high genetic risk for high body weight.

What is the method for studying genetics used in this paper?

This study uses a method called a “genome-wide association study,” or GWAS (pronounced JEE-wahs). Your genome is the complete sequence of DNA that you have in all of the cells in your body. A DNA strand is made up of a unique sequence of DNA “letters,” abbreviated as G, C, T, and A. People can differ in single DNA letters. For example, at a particular spot in the genome, you might have an A, whereas I have a C. These differences in single DNA letters are called “single nucleotide polymorphisms,” or SNPs (pronounced “snips”). SNPs are one type of genetic variant, or DNA difference between people. A GWAS measures hundreds of thousands of SNPs and correlates each one with the characteristic that is being studied. If, for example, one is studying diabetes, a GWAS estimates which SNPs are more common in people with diabetes compared to people who don’t have diabetes. A GWAS is a correlational study, which means that it does not necessarily identify SNPs that cause the outcome that is being studied; it only identifies SNPs that are more common in one group of people versus another.
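A minimal sketch of the core GWAS computation, assuming simulated data: each SNP is coded as the number (0, 1, or 2) of copies of one particular DNA letter a person carries, and the outcome is regressed on one SNP at a time. The data, the number of SNPs, and the helper name `gwas_scan` are all hypothetical; a real GWAS also adjusts for covariates such as ancestry principal components, tests far more SNPs, and reports a p-value for each one.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Hypothetical toy data: 2,000 people x 500 SNPs. Each genotype is the count
# (0, 1, or 2) of one chosen DNA letter at that spot in the genome.
n_people, n_snps = 2_000, 500
allele_freqs = rng.uniform(0.1, 0.9, size=n_snps)
genotypes = rng.binomial(2, allele_freqs, size=(n_people, n_snps)).astype(float)

# Simulated outcome: SNP 0 truly shifts the outcome; the other SNPs have no effect.
phenotype = 0.3 * genotypes[:, 0] + rng.normal(size=n_people)


def gwas_scan(genotypes: np.ndarray, phenotype: np.ndarray) -> np.ndarray:
    """Return the simple-regression slope of the outcome on each SNP, one SNP at a time."""
    g = genotypes - genotypes.mean(axis=0)  # center each SNP column
    y = phenotype - phenotype.mean()
    # slope_j = cov(g_j, y) / var(g_j), computed for every SNP at once
    return (g * y[:, None]).sum(axis=0) / (g**2).sum(axis=0)


betas = gwas_scan(genotypes, phenotype)
print("SNP with largest |association|:", int(np.argmax(np.abs(betas))))
```

Because every SNP is tested separately, many null SNPs will show small nonzero slopes by chance, which is why real GWASs use very strict significance thresholds.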

What did this study do?

This study builds on results from a previous study that conducted GWASs of educational attainment and cognitive test performance. The authors of that paper wrote an extensive FAQ, and people are encouraged to read their FAQ in its entirety.

In the previous study, as well as in our new paper, educational attainment is defined as the number of years of formal schooling that a person completes. You can review the original paper and previous FAQ for a discussion of how educational attainment was measured and how the researchers handled differences across countries and across historical time in the structure of educational systems. In the previous study and our new paper, cognitive ability (also referred to as cognitive performance) is measured as performance on an IQ test. Historically, the IQ test has been misinterpreted as a measure of a person’s worth, and IQ scores have been misused to justify oppression, such as when they were used in decisions about involuntary sterilization. In contrast, the modern science of intelligence focuses on how environmental and genetic factors combine throughout development to shape various aspects of cognition. For more information about the relationship between intelligence test performance and school performance, job performance, longevity, and other life outcomes, we recommend the book Intelligence by Stuart Ritchie.

The original strategy of economists who were studying non-cognitive skills was to study people who differed in their educational attainment but who were similar in their performance on tests of cognitive ability. That is, after you take out cognitive ability from educational attainment, what is left over? We borrowed this strategy and asked what SNPs are associated with differences in how far people go in school (higher educational attainment), above and beyond their association with cognitive test performance. We were able to conduct this type of analysis because of a new method called Genomic Structural Equation Modeling, which is a way of combining data from multiple GWASs at the same time.
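One way to picture "subtracting" cognitive performance is a Cholesky decomposition of the genetic covariance between the two GWAS traits: a first latent factor (Cog) accounts for all of the genetic variance in cognitive performance and part of educational attainment, and a second, uncorrelated factor (NonCog) captures the remaining genetic variance in educational attainment. The sketch below is a simplified algebraic illustration of that kind of two-factor model, with invented numbers and a hypothetical helper `subtract_snp`; it is not the Genomic SEM software, which estimates the model from GWAS summary statistics and also propagates standard errors.

```python
import numpy as np

# Hypothetical 2x2 genetic covariance matrix, ordered [cognitive performance (CP),
# educational attainment (EA)]. Illustrative values only, not from the paper.
S = np.array([[0.20, 0.10],
              [0.10, 0.15]])

# Lower-triangular Cholesky factor L with L @ L.T == S.
# Column 0 = "Cog" factor (loads on CP and EA); column 1 = "NonCog" factor,
# which by construction has zero loading on CP.
L = np.linalg.cholesky(S)


def subtract_snp(beta_cp: float, beta_ea: float) -> np.ndarray:
    """Split one SNP's (CP, EA) effects into (Cog, NonCog) factor effects.

    Solves L @ [b_cog, b_noncog] = [beta_cp, beta_ea]. Ignores standard errors,
    sample overlap, and SNP-specific residual paths that a full model would handle.
    """
    return np.linalg.solve(L, np.array([beta_cp, beta_ea]))


# Hypothetical per-SNP effects on the two traits.
b_cog, b_noncog = subtract_snp(beta_cp=0.010, beta_ea=0.020)
print(f"Cog effect: {b_cog:.4f}, NonCog effect: {b_noncog:.4f}")
```

The zero in the upper-right corner of `L` is what encodes "independent of cognitive test performance": whatever part of a SNP's association with educational attainment the Cog factor cannot explain is assigned to NonCog.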

The result of using Genomic Structural Equation Modeling in this way is a list of SNPs and how strongly correlated each one is with going further in school independent of cognitive test performance. Each SNP has a very small association, but we can add up these tiny associations and create what are called polygenic scores. A polygenic score is a single number that reflects researchers’ best estimate of how likely someone is to show a particular outcome (or phenotype) based on their DNA. Most polygenic scores — including, importantly, the one that we calculate for non-cognitive skills — are not very accurate estimates of an individual person’s outcome! As we explained in a previous blog post, “polygenic scores are useful for social science researchers who are interested in average trends, but specific predictions about an individual human life will be wildly uncertain.”
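Adding up tiny SNP associations can be sketched as a weighted sum: each person's allele count at each SNP is multiplied by that SNP's GWAS estimate, and the products are summed. This is a simplified illustration with made-up numbers and a hypothetical `polygenic_score` helper; in practice, researchers also adjust the weights for correlations between nearby SNPs (linkage disequilibrium) before summing.

```python
import numpy as np


def polygenic_score(genotypes: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Weighted sum of allele counts: PGS_i = sum_j weights_j * genotype_ij.

    genotypes: (n_people, n_snps) allele counts (0, 1, 2)
    weights:   (n_snps,) per-SNP association estimates from a GWAS
    """
    return genotypes @ weights


# Hypothetical: 3 people, 4 SNPs, tiny GWAS weights.
genotypes = np.array([[0, 1, 2, 1],
                      [2, 2, 0, 1],
                      [1, 0, 1, 0]], dtype=float)
weights = np.array([0.010, -0.004, 0.007, 0.002])

scores = polygenic_score(genotypes, weights)
# Scores are usually standardized within a sample before analysis.
z_scores = (scores - scores.mean()) / scores.std()
print(scores, z_scores)
```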

Although polygenic scores are generally poor predictors of individual outcomes, we were able to use them in our research to explore the characteristics that are seen, on average, in people who have higher polygenic scores for non-cognitive skills. In our paper, we created polygenic scores in six data sets from four countries on three continents: the Netherlands, the U.S., the U.K., and New Zealand. People in these data sets were also born at different times throughout the 20th century. We also calculated genetic correlations between non-cognitive skills and other phenotypes that have been the focus of large GWASs, such as obesity, smoking, and psychiatric diseases. Finally, we used tools from the field of bioinformatics to explore which parts of the body and brain are most important for the genetics of non-cognitive skills.
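For reference, a genetic correlation rescales the genetic covariance between two traits by their SNP heritabilities, in the same way that an ordinary correlation rescales a covariance by two standard deviations, so in principle it lies between -1 and 1. The inputs below are hypothetical; in practice they are estimated from GWAS summary statistics with methods such as LD score regression rather than plugged in by hand.

```python
import math


def genetic_correlation(cov_g: float, h2_a: float, h2_b: float) -> float:
    """r_g = genetic covariance / sqrt(SNP heritability of A * SNP heritability of B)."""
    if h2_a <= 0 or h2_b <= 0:
        raise ValueError("heritabilities must be positive")
    return cov_g / math.sqrt(h2_a * h2_b)


# Hypothetical inputs for two traits.
print(round(genetic_correlation(cov_g=0.02, h2_a=0.10, h2_b=0.16), 3))  # prints: 0.158
```

Like any correlation, a genetic correlation describes shared signal between two traits; it does not say that one trait causes the other.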

What did this study find?

We found five important results. All of these results must be interpreted in light of the limitations and caveats that we describe in the following two sections (“Who are the people in this study, and why does that matter?” and “What does this study not mean?”).

  1. The genetics of non-cognitive skills are correlated with important life outcomes. By design, we are identifying SNPs that are associated with educational attainment. We also see that these SNPs are correlated with how much money a person makes as an adult, whether they live in a poor or rich neighborhood, and how long they live. In many cases, these genetic correlations of life outcomes with non-cognitive skills are as strong as, or stronger than, the corresponding correlations with cognitive performance. This result is further evidence for the idea that heritable individual differences in things other than cognitive ability are important for understanding differences in people’s life outcomes.
  2. The genetics of non-cognitive skills are correlated with personality traits that have been described by psychologists as part of their “Big Five” theory of personality. The strongest correlation is with a personality trait called Openness to Experience, which captures being curious, eager to learn, and open to novel experiences. Smaller correlations were found with being Conscientious (industrious and orderly), Extraverted (enthusiastic and assertive), and Agreeable (polite and compassionate), and with not being Neurotic (not being emotionally volatile).
  3. The genetics of non-cognitive skills are correlated with the ability to defer gratification, as measured by people’s preferences for larger, later rewards over smaller, immediate rewards. The genetics of non-cognitive skills are also generally correlated with less risk-taking behavior, with later fertility, and with less obesity, but they are positively correlated with cannabis use. Together with result #2, these results tell us that many different genetically-associated traits and behaviors contribute to going further in school.
  4. The SNPs correlated with non-cognitive skills are also correlated with higher risk for several mental disorders, including schizophrenia, bipolar disorder, anorexia nervosa, and obsessive-compulsive disorder. This is an example of what geneticists call pleiotropy. Modern, industrialized economies value success in formal education, and our educational systems are set up to select students who have certain traits, abilities, and characteristics. This result warns us against viewing the genetic variants that are associated with going further in current systems of formal education as always being associated with “good” things. The same genetic variant that predisposes someone to go further in school might also elevate their risk of developing schizophrenia or another serious mental disorder.
  5. The genes associated with non-cognitive skills are primarily active in the brain, rather than in other parts of the body. When comparing the genetics of non-cognitive skills with the genetics of cognitive ability, the biology of the two traits looks very similar, involving the same types of cells in the brain. This result shows us that “non-cognitive” is a bad name for what we are studying. Non-cognitive skills involve biological processes similar to those underlying the cognitive skills that are tested by traditional IQ tests. This result also suggests that genetic studies of educational attainment are largely not picking up on heritable physical characteristics, such as physical attractiveness.

Who are the people in this study, and why does that matter?

This study builds on a previous GWAS of educational attainment that used data from 1.1 million people, all of whom were of “European” genetic ancestry. Genetic ancestry is measured by an analysis of patterns seen in genetic data, often in comparison to a reference panel of global genetic variation. People in the U.S. who have European ancestry will most often self-identify their race as “White” or “Caucasian.” Also, all of the participants completed their education in Europe, New Zealand, or the U.S. in the 20th or 21st centuries.

By pooling people who lived at different times and places (e.g., people who finished secondary school in mid-1990s America versus in mid-1950s Britain), the analysis is picking up on genetic “signal” that is consistent across those times and places. Genetically influenced characteristics that have different associations with going far in school in different environmental contexts are less likely to be detected. If, for example, being aggressive was favored by educators in one school, but was punished by educators in another school, then SNPs associated with being more aggressive would be less likely to be detected in an analysis of success in school that pooled data across these different educational contexts.
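A small simulation can make the pooling argument concrete. It assumes, hypothetically, a trait (standing in for aggressiveness, or for a SNP that influences it) that raises educational attainment in one school context and lowers it by the same amount in another. Each context shows a clear association on its own, but the pooled estimate is close to zero. All numbers and the `slope` helper are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=7)
n = 20_000  # hypothetical number of people per school context


def slope(x: np.ndarray, y: np.ndarray) -> float:
    """Ordinary least-squares slope of y on x."""
    x_c = x - x.mean()
    return float((x_c * (y - y.mean())).sum() / (x_c**2).sum())


# The same trait distribution in two contexts with opposite effects on schooling.
trait_a = rng.normal(size=n)
trait_b = rng.normal(size=n)
school_a = 0.5 * trait_a + rng.normal(size=n)   # context A rewards the trait
school_b = -0.5 * trait_b + rng.normal(size=n)  # context B punishes it

pooled = slope(np.concatenate([trait_a, trait_b]),
               np.concatenate([school_a, school_b]))
print(f"A: {slope(trait_a, school_a):+.2f}  "
      f"B: {slope(trait_b, school_b):+.2f}  pooled: {pooled:+.2f}")
```

With these settings the within-context slopes should come out near +0.5 and -0.5, and the pooled slope near 0.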

At the same time, educational systems differ around the world, have differed throughout human history, and could be modified in the future. If an educational system changes, then the characteristics of students who do well in that system might also change, which would change the results of genetic analysis. For example, we found a slight genetic correlation between non-cognitive skills and chronotype, i.e., whether or not you are a morning person. If schools typically start in the early morning, then students who tend to be more awake and alert in the morning will tend, on average, to do better in school. (This doesn’t mean, of course, that all morning “larks” do great in school or that all evening “owls” do poorly. We are always talking about average trends.) But imagine that the school day was changed to begin at 1 PM and end at 8 PM. Now, night owls might have the advantage! The results of a genetic study like the current paper can tell you about what is selected for, as students are sifted through many years of formal education. But it can’t tell you what could be if the system were different. And it can’t tell you how an educational system should be structured.

This study also pooled together data from men and women. Men and women have historically faced different structural barriers to advancing in their education. And the same personality characteristics or behaviors might elicit different responses from teachers and other socially significant people, depending on whether the person displaying that characteristic or behavior is male or female. Previous research using polygenic scores has shown that their association with educational attainment differs by gender and historical period, supporting the idea that “genetic influence must be understood through the lens of historical change, the life course, and social structures like gender.” We encourage future research on the genetics of non-cognitive skills to similarly consider historical and social structures.

There are important reasons for conducting a GWAS within a group of people who are all somewhat genetically similar by virtue of their relatively recent ancestors coming from the same part of the world (in this case, people with relatively recent European ancestors). The focus on only European-ancestry people, however, has two important implications. First, the results of this study might not generalize to people with different genetic ancestry. If, as hoped, genetic research is ultimately useful for improving people’s lives, it is problematic if those benefits are not available to minority populations that are already marginalized. Second, the results of the study do not and cannot tell us anything about the sources of educational disparities between different racial or ethnic groups. Any attempt to compare racial or ethnic groups on their non-cognitive skills polygenic scores is scientifically meaningless.

What does this study not mean?

Genetic research has a long history of being misinterpreted and misused to argue that social inequality is inevitable, that social programs designed to improve people’s lives are bound to fail, and that some people are “naturally” inferior to other people. We wholeheartedly reject these claims on both scientific and moral grounds.

A high or low polygenic score should not be interpreted to mean that someone is destined or determined to show a particular characteristic. It is not a “fortune teller.” It is not a pure measure of someone’s genetic “endowment.” It is a risk factor. By way of analogy, having high cholesterol makes it more likely that you’ll have a heart attack, but it doesn’t determine that outcome: lots of people have high cholesterol but don’t have a heart attack, and you can take steps to prevent a heart attack if you are at high risk. Similarly, a high polygenic score means you have a slightly higher probability of developing high non-cognitive skills (if you are also experiencing environmental conditions similar to those of the people in the original study), but that higher risk doesn’t mean destiny!

Genetic associations with non-cognitive skills do not mean that the environment does not make a difference. New research has shown that many polygenic scores capture what most people would think of as environmental effects. If, for example, parents with particular genetic variants are more likely to go on to have high levels of educational attainment, and if educated parents have more access to high-quality pre-school programs that boost their children’s non-cognitive skills, then, because children inherit their genetic variants from their parents, the children’s polygenic scores for educational attainment will be correlated with their non-cognitive skills, even if the children’s own variants have no direct effect. The role of these “indirect” genetic effects, also known as “genetic nurture,” should be investigated in future studies.
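The genetic nurture scenario can be sketched as a simulation under simple, hypothetical assumptions: parents' polygenic scores shape the child's environment, the child inherits half of each parent's variants, and the child's outcome depends only on that environment, with no direct effect of the child's own DNA. The numbers are invented; the point is only that the child's polygenic score still ends up correlated with the child's outcome.

```python
import numpy as np

rng = np.random.default_rng(seed=3)
n = 50_000  # hypothetical number of families

# Parents' polygenic scores (standardized).
mother = rng.normal(size=n)
father = rng.normal(size=n)
midparent = 0.5 * (mother + father)

# The child inherits half of each parent's variants; the random segregation
# term (variance 0.5) keeps the child's score at variance 1.
child = midparent + rng.normal(scale=np.sqrt(0.5), size=n)

# Parents' genotypes shape the environment (e.g., access to pre-school) ...
environment = midparent + rng.normal(size=n)
# ... and the child's outcome depends ONLY on that environment.
outcome = environment + rng.normal(size=n)

r = np.corrcoef(child, outcome)[0, 1]
print(f"corr(child PGS, child outcome) = {r:.2f}")
```

With these settings the correlation should land near 0.32 (a covariance of 0.5 divided by the square root of the outcome variance, 2.5), even though the child's own genotype has no causal effect in the simulation.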

Genetic associations with non-cognitive skills do not mean that interventions or policy reforms designed to boost non-cognitive skills are bound to fail. To use a classic example, just because you might be genetically predisposed to poor eyesight doesn’t mean that your eyeglasses won’t work. We already know that high-quality early childhood programs can boost non-cognitive skills, with lifelong effects. Some “light touch” interventions later in development also show promise. Nothing about the current study undermines support for these investments in bettering human lives. Instead, this research highlights potentially important targets of intervention, such as decision-making about delayed rewards. It also gives researchers a new tool for studying which social contexts maximize the development of non-cognitive skills.

The existence of genetic associations with non-cognitive skills within a group of people (see “Who are the people in this study, and why does that matter?” above) does not tell you anything about whether there are average differences between racial or ethnic groups, or why such differences, if they are observed, occur. This is an important point because racist and classist ideas about the allegedly “inferior” character of people of color and the poor have been used to justify eugenic policies. Nothing about this study gives any sort of empirical support to these ideas.

Should polygenic scores for non-cognitive skills be used in an educational context?

Several researchers and policymakers have suggested that polygenic scores could be used in an educational context to select students for certain curricular tracks or to personalize education. These suggestions are misguided, for five reasons. First, in this paper, the non-cognitive polygenic score predicted only ~2% of the variation in students’ academic achievement in reading and math. This is not a strong enough association to be able to predict outcomes for an individual with any certainty, particularly when you consider the other sources of information that might be available about an individual student.

Second, polygenic scores might be correlated with a student’s outcomes because the score is telling you something about the environmental opportunities provided by a student’s parents. As a result, using a polygenic score for individual selection might lead to discrimination against students based on their parents’ characteristics.

Third, polygenic scores predict average levels of an outcome but do not necessarily carry information about whether students would benefit more or less from any particular intervention or curriculum.

Fourth, suggestions to personalize education or track students based on genetics overlook the fact that personalization and tracking are themselves controversial among educators and educational policymakers.

Fifth, most polygenic scores are currently valid only for students of European ancestry. (Again, by “valid,” we mean that the polygenic score predicts average trends, not individual outcomes.) In the U.S., only about half of students in public K-12 schools are expected to be of European ancestry. Deploying a tool that applies to only half of the public school population is, in our view, not consistent with the goals of diversity, equity, and inclusion.
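To put the ~2% figure from the first reason in perspective, here is the arithmetic, assuming a simple linear relationship between the score and achievement: explaining 2% of the variance corresponds to a correlation of about 0.14, and the spread of prediction errors is still about 99% of the spread of achievement itself.

```python
import math

r_squared = 0.02  # ~2% of the variation in achievement explained by the score
r = math.sqrt(r_squared)                      # correlation between score and outcome
residual_sd_ratio = math.sqrt(1 - r_squared)  # prediction-error SD relative to the outcome's SD

print(f"correlation = {r:.3f}")                                           # prints: correlation = 0.141
print(f"prediction error SD = {residual_sd_ratio:.3f} x the outcome SD")  # prints: ... 0.990 x ...
```

In other words, knowing a student's score narrows the range of plausible outcomes for that student by only about 1%.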

Are there any policy implications for this research?

As more research like this study is conducted, and researchers know more about how to predict people’s behaviors and life outcomes from their DNA (even when those predictions are made with uncertainty), the number of potential commercial, health, reproductive, and forensic applications multiplies. The potential number of policy implications increases accordingly, and some of these implications might be difficult to foresee. The European Commission published a report reviewing some of these policy areas, including insurance markets, labor markets, personalized medicine, and reproductive technology. Generally, we think it will be important to remain vigilant against the possibility that genetic data will be used in ways that introduce or exacerbate inequalities in the distribution of freedoms, resources, or welfare. Additionally, we as scientists hope to contribute to discussions about how this research can be used to illuminate sources of injustice and to maximize the unique potential of each child.



Paige Harden

Professor of Psychology at the University of Texas at Austin. My book on genetics and social inequality will be out in 2021.