
Mechanistic Interpretability and Genetics: How AI research may benefit from the epistemological toolkit of statistical genetics

AI models and biological systems are both messy, and as a result we don’t understand a lot of what goes on inside them. In this post, I’m looking at the parallels between the approaches biologists and AI researchers take to understand the systems they study. This post is a continuation of my previous post on mechanistic interpretability.

A lot of biological research is based on correlations. For example, we may observe that in individuals with a disease, a gene of interest produces more mRNA than in healthy individuals. This may point to higher gene expression causing the disease. Alternatively, the disease may cause the elevated gene expression. There may also be some third factor that causes both the change in gene expression and the disease. Finally, are we really sure that expression of the gene is elevated in those with the disease, or are we just looking at a statistical artifact?

My field of human genetics can provide an answer that goes beyond correlations. That’s because germline variants can cause changes in gene expression and disease, but not vice versa. We can therefore use genetics to study causality in complex biological systems. For example, we know that elevated low-density lipoprotein (LDL) causes coronary heart disease (CHD) because the genetic variants that are associated with high LDL are also associated with higher CHD risk. We even know how much a given increase in LDL increases the risk of developing CHD. This approach is referred to as Mendelian Randomization (MR).
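The logic of MR can be made concrete with a small simulation. The sketch below is purely illustrative: all variable names and effect sizes (the 0.5 and 0.4 coefficients, the confounder `u`) are made up for the example, not real LDL/CHD estimates. A naive regression of outcome on exposure is biased by the unmeasured confounder, while the Wald ratio (the variant–outcome association divided by the variant–exposure association) recovers the true causal effect, because the genotype is assigned at conception and cannot be caused by the disease.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

# Hypothetical toy model: a genetic variant g raises an exposure ("LDL"),
# which raises an outcome ("CHD" liability). Effect sizes are invented.
g = rng.binomial(2, 0.3, n)                  # genotype: 0/1/2 copies of an allele
u = rng.normal(0, 1, n)                      # unmeasured confounder
ldl = 0.5 * g + u + rng.normal(0, 1, n)      # variant -> exposure (plus confounding)
chd = 0.4 * ldl + u + rng.normal(0, 1, n)    # true causal effect of exposure is 0.4

# Naive regression of outcome on exposure is biased upward by u.
naive = np.polyfit(ldl, chd, 1)[0]

# Wald ratio: (g -> chd association) / (g -> ldl association).
beta_g_ldl = np.polyfit(g, ldl, 1)[0]
beta_g_chd = np.polyfit(g, chd, 1)[0]
wald = beta_g_chd / beta_g_ldl               # ≈ 0.4, the true causal effect

print(f"naive regression: {naive:.2f}, MR (Wald ratio): {wald:.2f}")
```

Real MR analyses combine many variants and must check the instrument assumptions (no pleiotropy, variant strongly associated with the exposure), but the single-variant Wald ratio above is the core estimator.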

Like biological systems, neural network AI models have a large number of active features. Most of the time, it's unclear how these features relate to function. Both in genetics and in AI models, adjacent features are often co-activated or co-associated, but it's unclear which ones drive the output. In both cases, the hard work is going from "these features correlate with the outcome" to "this specific feature causes the outcome through this specific pathway."

In human genetics, gene knockouts are relatively straightforward to interpret: loss-of-function variants in humans tell you what happens when a gene is broken. This is directly analogous to ablation and clamping in interpretability, where AI researchers suppress or pin a feature and measure the effect on the model's output.
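The ablation/clamping idea can be sketched in a few lines. This is a minimal toy stand-in, not a real interpretability pipeline: the two-layer network, its random weights, and the `forward` helper are all invented for illustration. Ablating a hidden feature means zeroing its activation; clamping means pinning it to a chosen value; the "effect" of the intervention is the change in the model's output.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a trained network: one hidden layer with made-up weights.
W1 = rng.normal(size=(8, 4))   # input -> 8 hidden features
W2 = rng.normal(size=(1, 8))   # hidden features -> scalar output

def forward(x, ablate=None, clamp=None):
    h = np.maximum(W1 @ x, 0)          # ReLU hidden activations ("features")
    if ablate is not None:
        h[ablate] = 0.0                # ablation: suppress the feature entirely
    if clamp is not None:
        idx, value = clamp
        h[idx] = value                 # clamping: pin the feature to a value
    return (W2 @ h).item()

x = rng.normal(size=4)
baseline = forward(x)
for i in range(8):
    effect = forward(x, ablate=i) - baseline
    print(f"feature {i}: ablation effect {effect:+.3f}")
```

The analogy to a knockout is direct: compare the system's behavior with the component intact versus removed, and attribute the difference to that component, with the usual caveats about redundancy and downstream compensation in both settings.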

Statistical genetics has spent decades wrestling with the same correlation-versus-causation problem that AI interpretability research is now facing. The field has developed a disciplined epistemological toolkit (MR is one example) around what counts as sufficient evidence for causality. Mechanistic interpretability as a field is younger and doesn't yet have equivalent standards, but it seems to be moving in the same direction as genetics did.

While the two fields face similar problems, the more important similarity may be what they're trying to achieve. For most pharmaceutical and biotech companies, human genetics is a central part of drug discovery. Most drug development projects fail, but those without genetic evidence fail more often than others: drugs whose mechanism of action is supported by human genetics are much more likely to show efficacy in clinical trials, and the stronger the genetic evidence, the more likely they are to make it. The message is clear: understanding mechanisms, whether biological or computational, is essential to designing actionable interventions.