Saturday, June 17, 2017

The GWAS hoax....or was it a hoax? Is it a hoax?

A long time ago, in 2000, in Nature Genetics, Joe Terwilliger and I critiqued the idea then being pushed by the powers-that-be, that the genomewide mapping of complex diseases was going to be straightforward, because of the 'theory' (that is, rationale) then being proposed that common variants caused common disease.  At one point, the idea was that only about 50,000 markers would be needed to map any such trait in any global population.  I and collaborators can claim that in several papers in prominent journals, in a 1992 Cambridge Press book, Genetic Variation and Human Disease, and many times on this blog we have pointed out numerous reasons, based on what we know about evolution, why this was going to be a largely empty promise.  It has been inconvenient for this message to be heard, much less heeded, for reasons we've also discussed in many blog posts.

Before we get into that, it's important to note that unlike me, Joe has moved on to other things, like helping Dennis Rodman's diplomatic efforts in North Korea (here, Joe's shaking hands as he arrives on his most recent trip).  Well, I'm more boring by far, so I guess I'll carry on with my message for today.....




There's now a new paper, coining a new catch-word (omnigenic), to proclaim the major finding that complex traits are genetically complex.  The paper seems solid and clearly worthy of note.  The authors examine the chromosomal distribution of sites that seem to affect a trait, in various ways including chromosomal conformation.  They argue, convincingly, that mapping shows that complex traits are affected by sites strewn across the genome, and they provide a discussion of the pattern and findings.

The authors claim an 'expanded' view of complex traits, and as far as that goes it is justified in detail. What they are adding to the current picture is the idea that mapped traits are affected by 'core' genes but that other regions spread across the genome also contribute. In my view the idea of core genes is largely either obvious (as a toy example, the levels of insulin will relate to the insulin gene) or the concept will be shown to be unclear.  I say this because one can probably always retroactively identify mapped locations and proclaim 'core' elements, but why should any genome region that affects a trait be considered 'non-core'?

In any case, that would be just a semantic point if it were not predictably the phrase that launched a thousand grant applications.  I think neither the basic claim of conceptual novelty nor the breathless, exploitive treatment of it by the news media is warranted: we've known these basic facts about genomic complexity for a long time, even if the new analysis provides other ways to find or characterize the multiplicity of contributing genome regions.  This assumes that mapping markers are close enough to functionally relevant sites that the latter can be found, that the unmappable fraction of the heritability isn't leading to over-interpretation of what is 'mapped' (reached significance), and that what isn't mapped won't change the picture.

However, I think the first thing we really need to do is understand the futility of thinking of complex traits as genetic in the 'precision genomic medicine' sense, and the last thing we need is yet another slogan by which hands can remain clasped around billions of dollars for Big Data resting on false promises.  Yet even the new paper itself ends with the ritual ploy, the assertion of the essential need for more information--this time, on gene regulatory networks.  I think it's already safe to assure any reader that these, too, will prove to be as obvious and as elusively ephemeral as genome wide association studies (GWAS) have been.

So was GWAS a hoax on the public?
No!  We've had a theory of complex (quantitative) traits since the early 1900s.  Other authors argued similarly, but RA Fisher's famous 1918 paper is the typical landmark paper.  His theory was, simply put, that infinitely many genome sites contribute to quantitative (what we now call polygenic) traits.  The general model has jibed with the age-old experience of breeders who have used empirical strategies to improve crop or pet species.  Since association mapping (GWAS) became practicable, they have used mapping-related genotypes to help select animals for breeding; but genomic causation is so complex and changeable that they've recognized even this will have to be regularly updated.

But when genomewide mapping of complex traits was first really done (a prime example being BRCA genes and breast cancer) it seemed that apparently complex traits might, after all, have mappable genetic causes. BRCA1 was found by linkage mapping in multiply affected families (an important point!), in which a strong-effect allele was segregating.  The use of association mapping  was a tool of convenience: it used random samples (like cases vs controls) because one could hardly get sufficient multiply affected families for every trait one wanted to study.  GWAS rested on the assumption that genetic variants were identical by descent from common ancestral mutations, so that a current-day sample captured the latest descendants of an implied deep family: quite a conceptual coup based on the ability to identify association marker alleles across the genome identical by descent from the un-studied shared remote ancestors.

Until it was tried, we really didn't know how tractable such mapping of complex traits might be. Perhaps heritability estimates based on quantitative statistical models were hiding what really could be enumerable, replicable causes, in which case mapping could lead us to functionally relevant genes. It was certainly worth a try!

But it was quickly clear that this was in important ways a fool's errand.  Yes, some good things were to be found here and there, but the hoped-for miracle findings generally weren't there to be found. This, however, was a success not a failure!  It showed us what the genomic causal landscape looked like, in real data rather than just Fisher's theoretical imagination.  It was real science.  It was in the public interest.

But that was then.  It taught us its lessons, in clear terms (of which the new paper provides some detailed aspects).  But it long ago reached the point of diminishing returns.  In that sense, it's time to move on.

So, then, is GWAS a hoax?
Here, the answer must now be 'yes'!  Once the lesson is learned, bluntly speaking, continuing on is more a matter of keeping the funds flowing than profound new insights.  Anyone paying attention should by now know very well what the GWAS etc. lessons have been: complex traits are not genetic in the usual sense of being due to tractable, replicable genetic causation.  Omnigenic traits, the new catchword, will prove the same.

There may not literally be infinitely many contributing sites as in the original statistical models, be they core or peripheral, but infinitely many isn't so far off.  Hundreds or thousands of sites, and accounting for only a fraction of the heritability means essentially infinitely many contributors, for any practical purposes.  This is particularly so since the set is not a closed one:  new mutations are always arising and current variants dying away, and along with somatic mutation, the number of contributing sites is open ended, and not enumerable within or among samples.
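
To put rough numbers on 'essentially infinitely many', here is a minimal sketch. It assumes a purely additive model with independent biallelic loci in Hardy-Weinberg proportions, which is a textbook simplification and not the analysis of any paper discussed here. Under that assumption, a locus with allele frequency p and per-allele effect a contributes variance 2p(1-p)a^2. The function names and the example values are hypothetical.

```python
def locus_variance(p, a):
    """Additive variance from one biallelic locus: 2 * p * (1 - p) * a**2."""
    return 2.0 * p * (1.0 - p) * a * a


def variance_shares(freqs, effects):
    """Fraction of total additive genetic variance contributed by each locus."""
    variances = [locus_variance(p, a) for p, a in zip(freqs, effects)]
    total = sum(variances)
    return [v / total for v in variances]


if __name__ == "__main__":
    # Hypothetical example: equal frequencies and equal effects at every locus.
    for n_loci in (10, 100, 1000):
        shares = variance_shares([0.5] * n_loci, [1.0] * n_loci)
        print(n_loci, "loci -> largest single-locus share:", max(shares))
```

With equal effects, each of n loci carries 1/n of the genetic variance, so the more contributors there are, the less any one of them can tell you. Real effect sizes are unequal, but the same dilution applies to all but the few largest.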

The problem is actually worse.  All these data are retrospective statistical fits to samples of past outcomes (e.g., sampled individuals' blood pressures, or cases' vs controls' genotypes).  Past experience is not an automatic prediction of future risk.  Future mutations are not predictable, not even in principle.  Future environments and lifestyles, including major climatic dislocations, wars, epidemics and the like are not predictable, not even in principle.  Future somatic mutations are not predictable, not even in principle.

GWAS almost uniformly have found (1) different mapping results in different samples or populations, (2) only a fraction of heritability is accounted for by tens, hundreds, or even thousands of genome locations and (3) even relatively replicable 'major' contributors, themselves usually (though not always) small in their absolute effect, have widely varying risk effects among samples.

These facts are all entirely expectable based on evolutionary considerations, and they have long been known, both in principle, indirectly, and from detailed mapping of complex traits.  There are other well-known reasons why, based on evolutionary considerations, among other things, this kind of picture should be expected.  They involve the blatantly obvious redundancy in genetic causation, which is the result of the origin of genes by duplication and the highly complex pathways to our traits, among other things.  We've written about them here in the past.  So, given what we now know, more of this kind of Big Data is a hoax, and as such, a drain on public resources and, perhaps worse, on the public trust in science.

What 'omnigenic' might really mean is interesting.  It could mean that we're pressing up ever more intensely against the log-jam of understanding based on an enumerative gestalt about genetics.  Ever more detail, always promising that if we enumerate and catalog just a bit more (in this case, the authors say we need to study gene regulatory networks) we'll understand.  But that is a failure to ask the right question: why and how could every trait be affected by every part of the genome?  Until someone starts looking at the deeper mysteries we've been identifying, we won't have the transformative insight that seems to be called for, in my view.

To use Kuhn's term, this really is normal science pressing up against a conceptual barrier, in my view. The authors work the details, but there's scant hint they recognize we need something more than more of the same.  What is called for, I think, is young people who haven't already been propagandized about the current way of thinking, the current grantsmanship path to careers.

Perhaps more importantly, I think the situation is at present an especially cruel hoax, because there are real health problems, and real, tragic, truly genetic diseases that a major shift in public funding could enable real science to address.

11 comments:

Ken Weiss said...

I am awful at Twitter (far too prolix)! So I'll mention here that it was tweeted that this post was about disease but for 'some' adaptive traits GWAS works. I'm sure that makes sense, for some such traits--and I would like to know of examples just to know about them. For a trait that is strongly or recently adaptive, and close to the product of one or a small number of genes, that could certainly be true.

However, it depends a bit on what is meant by 'adaptive'. GWAS works to find variation in causally relevant genome regions, so the 'adaptive' trait has still to be polymorphic enough for the signal to be found. If the trait is largely due to a single locus, one would expect this if there were 'unadaptive' variants still around, or if there were a balanced polymorphism (perhaps?), or in recently admixed populations.

But it can't be generally true, and/or we're into semantics territory. First, a lot of mapping is done for traits that aren't disease, like stature. Secondly, because of phenogenetic drift, adaptive traits can have multiple redundant causes (isn't blood pressure 'adaptive'?). Indeed, this is a central problem for GWAS. If one can't map the trait, do we define it as non-adaptive?

But the relevant point is to avoid semantics and ask what one wants to find and under what conditions can it be found this way, and under what conditions (which I think are pretty general) is this not a good approach. That's not what's been going on for the last ~20 years, where a one-suit-fits-all promise has been, and is being, made.

Ken Weiss said...

.....I would add that if you have a controlled situation, or a situation like a major experimental cross (like Mendel did), and you just want to find the major genetic factors that affect the variation, then mapping can work. You can then see what the implicated genes (or genome regions) do. But even Mendel himself knew that this isn't general, and only applies to some traits that 'segregate' (as we would say now). There, of course, a well-controlled method works. In breeding like this, you cross very different internally consistent strains, which sets up a sensibly mappable situation. But that is not, in my opinion, the general situation in nature or in evolution.

Ken Weiss said...

I want to add something else to help make the point (and this is based on experience with quantitative traits in an intercross between two inbred strains of mice). It was a collaborative project, so all I do here is reflect on our general experience.

If there were only, say, 10 totally unlinked loci affecting the trait and only 2 alleles at each (one from each founder strain), then there would be 3^10 = 59,049, or almost 60,000, different possible genotypes. The founder alleles start with equal frequencies of 0.5, but they'll drift by chance over generations, so that the single-locus, and hence overall, genotype frequencies will vary greatly. Each instance of a single-locus genotype will be in very different genomic background contexts. The individual-locus genotypic effects will have to be pretty strong to be consistently detected in mapping studies. Any practicable sample will contain only some of the possible genotypes, and of course in mice the samples are perforce very, very small. Here we don't include new mutations and the many other factors that apply in reality, but they don't change the basic picture. And this is in a highly controlled situation.
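
The arithmetic in this comment can be sketched in a few lines. This is only an illustration under simple assumptions: random mating, unlinked loci, and plain Wright-Fisher drift. The population size, number of generations, and sample size are hypothetical values chosen for the example, not those of the mouse study, and the function names are made up here.

```python
import random


def genotype_count(n_loci, n_alleles=2):
    """Number of distinct multilocus diploid genotypes for unlinked loci."""
    per_locus = n_alleles * (n_alleles + 1) // 2  # 3 genotypes for 2 alleles
    return per_locus ** n_loci


def drift(n_loci, pop_size, generations, rng):
    """Wright-Fisher drift: resample 2N allele copies per locus each generation."""
    copies = 2 * pop_size
    freqs = [0.5] * n_loci  # both founder alleles start at frequency 0.5
    for _ in range(generations):
        freqs = [sum(rng.random() < p for _ in range(copies)) / copies
                 for p in freqs]
    return freqs


def distinct_genotypes_sampled(freqs, sample_size, rng):
    """Count distinct multilocus genotypes seen in a random-mating sample."""
    seen = set()
    for _ in range(sample_size):
        # Each individual carries 0, 1 or 2 copies of the tracked allele per locus.
        genotype = tuple((rng.random() < p) + (rng.random() < p) for p in freqs)
        seen.add(genotype)
    return len(seen)


if __name__ == "__main__":
    rng = random.Random(2017)  # fixed seed so a rerun repeats the same draw
    freqs = drift(n_loci=10, pop_size=50, generations=10, rng=rng)
    print("possible genotypes:", genotype_count(10))
    print("allele frequencies after drift:", [round(p, 2) for p in freqs])
    print("distinct genotypes in a sample of 500:",
          distinct_genotypes_sampled(freqs, 500, rng))
```

A sample can never contain more distinct genotypes than it has individuals, so even a sample of 500 sees under 1% of the 59,049 possibilities, and each single-locus genotype turns up against a different background nearly every time.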

In my opinion, until we take these facts, which I'm not inventing, seriously, and change how we do business, we'll keep on spinning expensive wheels.

Anonymous said...

I completely disagree with the answer to this: "So, then, is GWAS a hoax?
"Here, the answer must now be 'yes'!" This answer is misleading.

A couple of points based on my experience:
1. GWAS is not a HOAX but one of the approaches that can resolve, or lead us close to resolving, the genetics associated with complex diseases. There are several examples that define the underlying mechanisms of a GWAS-identified SNP/locus as it relates to the disease. GWAS is contributing significantly to our understanding of complex disease traits. Discovery of IFNL4; APOBEC3B mutagenesis in bladder cancer; and many more examples....
2. Merely finding GWAS signals is not going to solve the mystery behind complex traits. Understanding their functionality is key. And yet we have not explored it.
3. Development of new approaches is critical for understanding missing heritability. Advancement of genomic technology fosters new ideas. I would say calling off GWAS will not be healthy.....

David Enard, Petrov Lab said...

About adaptive traits with large effects, I think a good example with strong quantitative evidence (not just anecdotal examples) and theory (Red Queen, etc...) is resistance against pathogens. From studies of divergence between species it is clear that many adaptive mutations fixed in proteins and other types of sequences in response to pathogens. My own work has shown that in humans 30% of protein adaptation could have been driven by viruses. With such a massive effect you have a strong prior on a specific trait. We know that those mutations fixed, which means that their effect on phenotypes was strong enough that they rapidly increased in frequency. I am currently working on the effect of viruses on more recent human evolution, and long story short the effect has been quantitatively very strong, with many recent selective sweeps being due to viruses and plausibly other pathogens. So in the case of adaptive traits the picture of human evolution that is emerging is one where you have a mixture of polygenic adaptation AND strong sweeps at genes with large effects on specific traits. In the case of adaptive variants, I do not see any reason to oppose the two.

Ken Weiss said...

I did not mean the word 'hoax' literally. GWAS is being misrepresented to the public, at great cost, by people who know better but are acting as if the objective is the funding. There are more honorable, less misrepresentative ways.

Any technique will have its uses, and you may be involved in some of them. But it is being promoted as a panacea right, left, and center, not just for focused questions but for disease generally. That is what is objectionable. Where it's useful, it can be justified.

Pathogens may very well be good examples because they act and spread quickly and strongly. Whether they will select just a few very clear genetic resistance elements is a question to be asked. Many recent, rapid lifestyle changes did not do that and are not mappable.

As to your comment, Anonymous, again where something is warranted it's warranted. But throwing massive amounts of funding in that direction, when there are many well-defined and focused (and genetic) health problems, is not justified in my view. I have seen too many claims by too many people over too many years that sound like yours but didn't pan out, and maybe the evidence was never very strong that they would. If your instances pan out, that's great. But if they work for, say, some forms of cancer, the question is what on a public health basis might be done with greater benefits if the same resources were spent in a different way. There are always such choices to be made, and they are for society to make. In my view, we are persisting far too much with the Big Data thrust, when we know of, as I said, many dramatic, tragic, simple genetic diseases, usually ones that devastate life from birth on. I personally say: invest in fixing them, then look to the vague, variable, late onset and often lifestyle-preventable diseases.

Ken Weiss said...

I want to add that Dr Collins and others have been boasting right and left and making wild promises about 'precision' medicine, and silver bullets for disease etc etc for many years, among other hustling activities. Common variants for common disease--that was the slogan many years ago. More measured promises, and more measured and limited funds, and termination of this approach when it's clearly not going anywhere and is biologically naive, is an appropriate thing to suggest.

Ken Weiss said...

This is of course a contentious area. It is so largely, in my view, because of the amount of funding, and the funding inertia, behind long-term Big Data projects falsely promising 'precision' (and now, probably, 'omnigenic-omics'!), which everyone knows are largely about funding commitments.

Tools are tools. If you have a hammer, the saying goes, everything looks like a nail. But, of course, there _are_ nails! Genomewide mapping to find major genes responsible for strongly segregating trait variants in a controlled laboratory cross (for example) is fine, if you recognize that this will lead you to some knowledge, and that the truths in natural populations, or among populations, may be more complex. Controlled laboratory crosses in plants or mice may work for such things. But samples don't need to be massive, since strong signals can be picked up.

But even this is limited, as I know from my own experience with mouse cross experiments.

Magnus Nordborg said...

Hi Ken,

Long time -- and you'll be happy to know that I often use your paper with Joe when teaching, usually likening your effort to stopping the Iraq War... (thereby hopelessly dating myself)

So, apropos my tweet, there are indeed examples where GWAS works "as advertised", with meaningful prediction, based on a manageable number of loci, being possible. We see this in Arabidopsis, and also for skin color in humans. When I say adaptive traits, I mean that the VARIATION is adaptive, i.e. that selection works to maintain differences. At least sometimes, this leads to reasonably tractable genetics. Remains to be determined how often, and what determines this.

Traits under stabilizing or purifying selection, on the other hand, are extremely unlikely to be tractable in this sense, and I would (and did) argue that this was a priori obvious based on basic population genetics theory. But I guess the experiment had to be made... (one senior population geneticist whom you know very well referred to it as "the world's most expensive test of the mutation-selection balance model").

Ironically, prediction clearly works in a statistical sense, so it may be useful for insurance companies after all. A 3% better diagnosis would help the bottom line even if it is useless for "personalized medicine". Incidentally, as a graduate student I once heard Eric Lander predict this, and argue that this would inevitably lead to a single-payer health care system.

You neglect to mention two incidental consequences of the GWAS experiment. First, it drove down sequencing costs, enabling lots of really cool biology. Second (and much less importantly), it made the careers of a large number of theoretical population geneticists. I'm too old to have benefitted. I recall having lunch with David Botstein as a graduate student, and being asked what I worked on. His response: "Ah yes, population genetics -- you know, I've always wondered why so many smart people chose to waste their lives on utterly useless things!". Less than a decade later, anyone who could convincingly explain what linkage disequilibrium was couldn't fail to get a job.

Finally, I broadly agree with your diagnosis of the present. I fear the field has reached "uncritical mass" — although we badly need to take a step back and reconsider the value of the whole effort, this is unlikely to happen because people are rarely as candid as David Botstein was about their own work. The number of utterly unimaginative proposals in this area I have seen is depressing.

Ken Weiss said...

Magnus,
I basically agree. I have worked on mapping myself (in a mouse intercross context). Mapping was good for genetics in the kinds of ways you say, because it enabled genetics to have a 'say' in complex trait epidemiology. But when something becomes basically a self-perpetuating industry and there are other ways to invest resources, then it is time to change. I think anyone should agree that when a tool is important, it's important.

The side benefits you mention are debatable in the sense that if the same funds were invested in other ways, it would have fostered other sorts of careers. I've heard many argue that mapping (in the sense we're talking about) may be as we're saying, but other benefits will come from it; but serendipity can be expected in any sort of effort.

Anyway, of course my use of the word 'hoax' mainly was intended for the purpose of drawing attention to the element of false promises being made to the public (and legislators) to justify locking up so much funding. I still do population genetics (even though retired), by computer simulation. I tried to credit the GWAS effort in the first part of the post, because it taught us how, and that, Fisher et al were basically right, as was the modern synthesis. But it also taught us that an enumerative approach of the current kind may no longer be the best way to do genetics.

One can debate about consequences (like single-payer) but Eric argues whatever argument will seem to work at a given time. We should have better ways of supporting science and scientists. I would say a truly national health care system and data base in a given country would provide the raw material for many 'big data' kinds of studies and also make for better health care. But if we were having a beer I would spend the time to argue why I think even that would be in many ways false promises regarding genetics. But this message is already way too long!

Finally, I have sympathy for people entering the field: The average age of first grants, the number of post-docs before 'real' jobs, the cog-in-a-wheel nature of most jobs in medical schools, the dependency of faculty salaries on grants, and of universities on overhead, and many other things that are happening are in part the result of the 'Big Data' era. Something has to come along to trigger a new 'gestalt', in my view. It won't be omnigenomics.

Ken Weiss said...

An after-after-after thought, Magnus: I did say that GWAS was in order (not a 'hoax') at the beginning, and it proved its point by showing us what the landscape is like.