Monday, June 19, 2017

More thoughts, and just plain provocateuring, on genomic causal complexity. . . .

Here are some follow-up reflections on my recent post about GWAS and kindred methods and claims.  I know I'm being contentious, but science has always been contentious.  However, socioeconomic issues (careers, salaries, etc.) also enter the picture in a way that is relevant to the inertial nature of our profession.  Readers who haven't read Ludwik Fleck's 1930s volume on 'thought collectives', one preceding Kuhn's 'normal science/paradigm' discussion, should do that, because it's relevant to where we stand now.

The causal complexity of genetic control of quantitative traits was in principle understood by Fisher and others almost exactly a century ago.  The development of mapping tools opened the door to seeing what that meant more specifically, at the genome level.

Some key facts about this, I think, are that when there is a strong single signal, we see segregation in families (when there are enough families, as there were in Utah for BRCA mapping), or some other indicator (a detectable chromosomal deletion in Wilms' tumor and perhaps something similar in retinoblastoma).  Those were family studies, and the traits were mainly monogenic in the Mendelian sense (that is, like the traits Mendel carefully chose to study for their simple states).

But for BRCA and, I think for different reasons, retinitis pigmentosa, mapping by association rather than in families doesn't find these genes 'for' the trait.  They're individually strong, but relatively minor in a population and hence association-mapping sense.  And, in nearly all cases, even with 'single locus' diseases, once the gene is known, we see genotypic complexity, including often very low 'penetrance' (showing that 'the' gene isn't a single-locus cause by itself).
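
To put rough numbers on the idea that an allele can be individually strong yet minor at the population level, here is a minimal sketch using Levin's population attributable fraction, PAF = p(RR - 1) / (1 + p(RR - 1)).  The carrier frequencies and relative risks below are made-up values chosen only for illustration; they are not estimates for BRCA or any real gene, and the function name is my own.

```python
def attributable_fraction(carrier_freq: float, relative_risk: float) -> float:
    """Levin's population attributable fraction for a binary risk factor.

    carrier_freq  : fraction of the population carrying the allele
    relative_risk : disease risk in carriers divided by risk in non-carriers
    """
    excess = carrier_freq * (relative_risk - 1.0)
    return excess / (1.0 + excess)


# Hypothetical, illustrative values only (not taken from any study).
rare_strong = attributable_fraction(carrier_freq=0.003, relative_risk=10.0)
common_weak = attributable_fraction(carrier_freq=0.30, relative_risk=1.1)

print(f"rare, strong allele : {rare_strong:.3f} of cases attributable")
print(f"common, weak allele : {common_weak:.3f} of cases attributable")
```

Under these assumed numbers, an allele that raises a carrier's risk tenfold accounts for only a few percent of cases in the population, about the same share as a common allele with a barely detectable effect.  That is one way to see why a family study can light up a gene that an association study across the whole population barely registers.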

BRCA-associated breast cancer risk, once the gene was known and could specifically be typed, is very different even among women carrying known high-risk BRCA1/2 alleles, depending on the cohort and the study.  The purported single-locus hemochromatosis gene (HFE) mutations were associated with high risk in the original sample, in Utah if my recollection is correct, but the mutation does not cause the disease in other samples.  Even the classic PKU is not always caused by PAH alleles, and not all pathogenic PAH alleles cause PKU.  Ditto for CF and the CFTR gene.  In some cases, at least, it is likely other interacting genes that in particular populations lead the target gene to seem causal in a Mendelian-like sense.

And of course there is now a substantial literature showing that individuals carrying dead (non-activated) disease-causing genes are walking around without the disease.  I think estimates have shown that each of us carries many (100 or so?) such genes, at least some if not all of which are diploid-negative.  If this doesn't suggest pervasive redundancy and the mappability problems I and others have written about, what does it suggest?

I will once again utter the apparently off-color factor that few want to acknowledge or say in mixed company: somatic mutation. Enough said on that black-box subject.

And while invoking the Truth's name in vain, I'll just whisper here another off-color word: environment.  Enough said on that black-box subject, too.

And there is the non-reductionistic 4th dimension of genetic causation in cells, which is being studied by chromosomal conformation methods (3C and its variants).  What this will lead to is unclear, to me anyway, but clearly there are extensive trans phenomena that sequence-parsing and enumerating methods, par for the course now for many decades, are not solving.  If they were working, we wouldn't need a plethora of new terms, and gilded promises from on high (i.e., NIH).

I've often mentioned that much of what we do relies on statistical inference.  That's been getting a well-deserved bad name, but rest assured that the SAS and SPSS people will guarantee you that their packages or use-instructions have been fixed so they won't lead you astray any further.  Nonetheless, there is this third little secret: statistical methods in this arena assume various aspects of replicability while adaptive evolution is fundamentally about non-replicability.

In any case, estimating risk-factor (causal SNP) effects retrospectively is data-fitting and not, in itself, related to cause or prediction, much less doing so with 'precision'.  Such extrapolation rests on the assumption that past fractions mean future probability, which is critical here (especially when sampling, environment, mutation, somatic mutation etc. are inherently unpredictable and essentially non-replicable).

And is it too identity-political to mention the unseemly fact that most of this intensive mapping work has been done on Europeans, for the sometimes even openly acknowledged rationale that Europeans have the moolah to pay for the gene-targeted drugs that Pharma has been promising for decades of the genome era?  In any case, that's mere racism relative to the deeper genetic-causal issues themselves.  Even restrictive sampling doesn't guarantee replicability, a point I won't mention again lest I be accused of being as repetitive as someone doing GWAS on obesity.

These are just the simple issues one can conjure up without even doing any PubMed searching.  What amount of hammering does it take to get the message to sink in?  By sinking in I mean not just being noted, briefly and in passing, but forcing some change of approach beyond enumerating, random sampling, and cachet marketing words (like gene regulatory networks, precision genomic medicine, omnigenic, and all the 'omics'-du-jour, etc.).

I would want to be clear even for those who wish to trash all my thoughts: Go ahead!  But at least acknowledge, as I acknowledged in my previous post, that the mapping era did do us great service by providing, for the first time, some specific sense of the genomic details underlying life's causal complexity and showing that in a general sense the original polygenic model was basically right.  Family studies are better when some really meaningful single strong factor is at work, but the use of IBD assumptions to do association mapping cast, like a flashlight in the dark, light upon what had perforce remained dark to our understanding.  But it's now been quite a while that we have had the understanding we need to know that we should think of different ways to approach the subject of life's causation.  The flashlight's batteries are fading.

And here's my bit of sympathy for what is going on.  Complementing the complexity landscape that is the obvious reality are the key facts underlying all of this: scientists are people and, including yours truly, have limited abilities and can't just facilely be slammed for their not accounting for everything perfectly and immediately.  We're people who, mainly, need salaries, facilities in which to work, and employers like universities who these days have to operate in the black.  These are the deeply socioeconomic underlying problems that serve to encourage or even to force safe science, big science, and oversold science.  That the news media and other vested interests compound the felony is simply one of the problems of our type of imperfect society.

Moving the Big Money that has been locked up by the current haves, to redistribute to more important-payoff kinds of research, would inevitably meet resistance, including from NIH's head office, which has been a sloganeering center that makes PT Barnum look like an amateur.  Whether, how, or how much funding will be redirected, which is what's actually at the unstated core of much of the controversy, is obviously not predictable.  But the importance of trying is what motivates my perhaps too-often and too-cranky posts: Somebody has to speak of the Emperor's clothes!

Unless we fix these underlying issues, they, and whatever mess our current thrust is embedding us in, will persist until some lucky day when an actually better idea stumbles upon the stage.

1 comment:

Anonymous said...

1. Given the 'omnigenic' nature of genetic causation of complex traits, it seems that environmental factors - usually regarded as a hopelessly entangled black box of mystery - may be less multifarious than genes, and hence their effects more amenable to parsing.

2. The layers of information crammed into the genome - alternative splicing, chromatin marks, mechanical folding etc. - bring to mind an experiment of transistor design using evolutionary algorithms. The end result was barely understandable. We want the genome to work like a rational engineer designed it based on elegant, substrate-independent principles but the 'incidental' mechanistic properties of its constituent molecules are intrinsic to its function.
https://www.damninteresting.com/on-the-origin-of-circuits/