Rare variants galore
Two new papers from teams led by Josh Akey and John Novembre, and a brief theory paper from Alon Keinan and the prolific Andy Clark, highlight a treasure trove of rare genetic variants in our genomes—and point out why they matter.
The new data, from more than 80 million copies of individual human genes, bolster a growing catalog of such rare variants that clarifies how many ancestors we had, looking back in time. More importantly, however, many of those rare variants likely figure centrally in our health.
In gathering the data for the new papers, Akey’s and Novembre’s groups scoured every letter of many genes in thousands of people, and found a bumper crop of spelling variants that are each found in just one or a few of those people. The third paper summarized what such findings suggest about precisely how big the human population has been over time, and roughly what they mean for efforts to understand disease.
Their findings shed light on our origins and health because, under some simple assumptions, geneticists can predict how often variants that do (or don’t) greatly alter proteins should pop up in a given proportion of people, if our ancestors were steady in number, and if proteins weren’t especially important for health. Strikingly, the data highlight that real patterns of such variant frequencies in our genomes strongly defy those simple expectations. More specifically, they underscore two broad insights:
• Our population has skyrocketed, but just for the past few millennia—a trend that’s left a strong signature of many young, rare spelling variants in our genomes.
• Many of those rare variants may be making us sick.
A tippy tree, laden with rare fruit
The findings in the new papers hinge on a simple insight: the more widely common a genetic variant is, the older it likely is. This is because old variants have typically been carried down many branches of the growing human family tree, spreading far and wide on the planet. By contrast, variants that just arose recently are typically confined to recently sprouted, geographically narrow branches of the tree.
Much genetic and ancillary historical evidence suggests that our population has grown extremely fast in the past few millennia. Geneticists can use data from our genomes to probe the overall shape of the human family tree. In effect, the recent population growth is stretching that tree at its tips, making its young twigs look longer than we’d otherwise expect, given how long the trunk and inner branches are. In particular, because new genetic variants pop up roughly randomly (by mutation) on the branches as they grow, the long, fast-growing tips of the tree harbor much more of its total load of mutations than they would have, had the tree grown at a constant rate.
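The "constant rate" baseline can be made concrete with a little arithmetic. The sketch below assumes the textbook neutral model with a steady population size, which is not necessarily the exact model used in the new papers. Under that model, the expected number of variant sites carried by exactly i of n sampled gene copies is proportional to 1/i. The function names are made up for illustration.

```python
from fractions import Fraction

def expected_sfs_fractions(n):
    """Expected share of variant sites carried by exactly i of n sampled
    gene copies (i = 1 .. n-1), assuming a constant-size neutral model in
    which the expected count of such sites is proportional to 1/i."""
    weights = [Fraction(1, i) for i in range(1, n)]
    total = sum(weights)
    return [w / total for w in weights]

def singleton_fraction(n):
    """Expected share of variant sites seen in just one sampled copy."""
    return expected_sfs_fractions(n)[0]

if __name__ == "__main__":
    # The baseline share of singletons shrinks slowly as the sample grows.
    for n in (10, 100, 1000):
        print(n, round(float(singleton_fraction(n)), 3))
```

If real data show a clearly larger share of one-off variants than this baseline predicts, that excess is the kind of signature that recent, rapid population growth is expected to leave.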
In pondering this tree, picture one that the late, great Dr. Seuss might have drawn, with each mutation as an odd, never-before-seen kind of fruit, confined to the branch (big or small, and including its sub-branches) where the mutation struck. Many of the rare variants that abound in our genomes can thus be thought of as distinctive fruits, each confined to just one or a few twigs amid a great, bushy tree.
In this light, the new papers affirm what’s become clear over the past few years, as we sequence more and more people’s whole genomes: we’ll still be finding new human genetic variants for a long time, even after having sequenced many more of us.1
And, as long as our population continues to balloon, the tree will continue to loosely resemble an inflationary universe, its many branches speeding apart from each other via new mutations. In this analogy, the genetic counterpart of the red-shift that signals cosmic expansion is, roughly speaking, the overall skew in frequency, toward rarity, of our genetic variants.
Rare variants in disease
The image of the human family tree, tips bent toward our probing grasp by newfound fruit, may recall the myth of another tree. Fitting, then, that the second key insight from the new papers is that the rare variants in our genomes may conceal many secrets about human mortality.
Rare variants are thought to figure centrally in disease for two related reasons. As we’ve seen, most such variants are rare because they arose recently, so they haven’t had time to spread widely among people. And young variants, by definition, haven’t undergone natural selection for long.
Such selection, assisted by chance, tends to keep harmful variants rare, or purge them from the population altogether. Non-harmful rare variants, by contrast, are in principle free to get more common (though chance often strikes them down too).
That is, over time, consistently harmful variants tend to vanish, especially if the population is big enough to stably harbor a rich variety of alternative variants; meanwhile, variants that happen not to harm their carriers are free to spread, whether by chance or, in rare cases, by helping their carriers have more kids than others do.
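To see how steadily selection can wear down a harmful variant, here is a minimal sketch under loud simplifying assumptions: a haploid model, a fixed fitness cost s per generation, and no chance effects at all (so it shows only the selection half of the story, not the role of luck). The names and the example numbers are illustrative, not taken from the papers.

```python
def next_frequency(p, s):
    """One generation of selection in a simple haploid model: carriers of
    the variant leave (1 - s) offspring for each one left by non-carriers."""
    return p * (1 - s) / (1 - s * p)

def trajectory(p0, s, generations):
    """Variant frequency at each generation, starting from p0."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_frequency(freqs[-1], s))
    return freqs

if __name__ == "__main__":
    harmful = trajectory(0.01, 0.05, 100)  # assumed 5% fitness cost
    harmless = trajectory(0.01, 0.0, 100)  # no cost: frequency stays put
    print(harmful[-1], harmless[-1])
```

In this toy model the odds p / (1 - p) of carrying the harmful variant shrink by a factor of (1 - s) every generation, while a harmless variant simply holds its frequency. In real populations, chance nudges both kinds up or down.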
Together, these trends mean that a snapshot of the rare variants we carry today, like a minute’s worth of the world’s newest tweets, is likely enriched for items that will soon be either gone2 or, in a few cases, more common.
And they help explain why surveys of the common genetic variants covered by fast, cheap SNP chip screens rarely offer clear insight into disease risk. For a given stretch of the genome, such common variants do distinguish big branches of the human family tree from each other, making them quite informative of ancestry. But a consensus has emerged that the long tail of human genetic diversity—all those rare variants—is where we’ll find much of the genetic contribution to disease risk.
However, spotting which rare variants harm us turns out to be tough.
Proof of burden
Take the extreme case of a variant found in just one woman, among everyone on earth. If we split humanity into those who get a given disease in life, and those who don’t, our chosen woman must fall into one group or the other. And if we look at enough diseases, she’ll eventually fall into the sick group for at least one of them.
But it’s clearly too far a leap to infer that the unique variant she carries made her sick. That is, the variant’s distribution among people with and without the disease simply can’t be statistically significant, given how rare it is overall.
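The statistical dead end can be checked directly. The sketch below computes a one-sided tail probability in the style of Fisher's exact test, asking how likely it is that a variant's carriers would land among the sick by luck alone. The study sizes (1,000 cases and 1,000 controls) and the function name are assumptions chosen for illustration.

```python
from math import comb

def one_sided_p(carriers_in_cases, carriers_total, n_cases, n_controls):
    """Probability, if the variant is unrelated to disease, that at least
    `carriers_in_cases` of its `carriers_total` carriers fall among the
    cases (a hypergeometric tail, as in a one-sided Fisher's exact test)."""
    n = n_cases + n_controls
    upper = min(carriers_total, n_cases)
    tail = sum(
        comb(n_cases, k) * comb(n_controls, carriers_total - k)
        for k in range(carriers_in_cases, upper + 1)
    )
    return tail / comb(n, carriers_total)

if __name__ == "__main__":
    # A variant seen once, in a sick person: no better than a coin flip.
    print(one_sided_p(1, 1, 1000, 1000))
    # Even five carriers, all of them sick, give only modest evidence.
    print(one_sided_p(5, 5, 1000, 1000))
```

With equal numbers of cases and controls, a one-off variant has an even chance of turning up in a sick person, so seeing it there proves nothing on its own.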
To meet this inherent challenge to squarely implicating a given rare variant in a given disease, geneticists look to leverage other insights. If the variant really is too rare to show up on further screening of more sick or healthy people—and that’s a place where the new data are already helping us here at Knome too.
1 In the end, you likely carry a dozen or so brand new genetic variants that arose by mutation only in you. But you also likely harbor plenty of other very rare variants that, until we sequence your genome, will have never been spotted in anyone else.
2 Note that this doesn’t mean that no one with harmful variants has kids. After all, everyone carries some such variants, and people are breeding just fine. Rather, because a given variant can be inherited independently of other variants in the same genome, and may wreak harm only in combination with another copy of itself (or some other variant), people simply tend to have more kids who inherit more copies of healthier alternative variants than kids who inherit more copies of harmful ones. Moreover, much of the natural selection in question likely happens beyond our view, before pregnancy begins, when unhealthy early embryos fail to implant and thrive in the womb.