Sean Hackett

Decoding Virtual Cell Foundation Models II: Cross-Model Attention and Molecular Interactions

23 minute read

The central question motivating this series is whether the attention patterns learned by single-cell foundation models reflect genuine molecular circuits, or whether they are merely recapitulating pairwise co-expression. The distinction matters: co-expression is observational, collapsing direct regulation, shared upstream control, and correlated noise into a single undifferentiated signal. If attention is instead tracking direct regulatory relationships, it becomes causally interpretable, a map of which genes are actually controlling which, grounded in mechanisms that can be perturbed and tested experimentally.

In Part 1, I built a common framework for extracting and comparing attention patterns across four single-cell foundation model families (scGPT, scFoundation, scPRINT, and AIDO.Cell) spanning nearly two orders of magnitude in parameter count. A key finding was that later layers actively suppress the gene-pair structure that early layers establish; early and late attention patterns are often strongly anticorrelated within a model.

Here, I extend the analysis in two directions:

Cross-model attention consistency: Do different models converge on the same high-attention gene pairs, even if their overall layer structure differs? I compare the top-K attention pairs across all model × layer combinations to test whether a shared set of pairs persists despite differences in architecture and training.
Validation against molecular interaction networks: I compare each model’s high-attention pairs to the Napistu Octopus network (50K vertices, 8M edges) and to a GNN trained on self-supervised edge prediction (edge_prediction_mlp_256e), asking whether attention-highlighted gene pairs are enriched for known regulatory interactions.
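
To make the two comparisons above concrete, here is a minimal sketch in plain Python. It is not the pipeline used in the post: the 4-gene attention matrices, the reference edge set, and the helper names (top_k_pairs, jaccard, hypergeom_sf) are all hypothetical, and averaging the two attention directions into an unordered pair score is an assumption made only for illustration.

```python
from math import comb

def top_k_pairs(attention, genes, k):
    """Return the k highest-scoring unordered gene pairs as a set of frozensets."""
    scored = []
    for i in range(len(genes)):
        for j in range(i + 1, len(genes)):
            # Assumption: attention is directional, so average both directions.
            score = (attention[i][j] + attention[j][i]) / 2
            scored.append((score, genes[i], genes[j]))
    scored.sort(key=lambda t: t[0], reverse=True)
    return {frozenset((a, b)) for _, a, b in scored[:k]}

def jaccard(a, b):
    """Overlap between two sets of pairs (1.0 = identical, 0.0 = disjoint)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def hypergeom_sf(overlap, n_total, n_known, n_drawn):
    """P(X >= overlap) when n_drawn pairs are sampled from n_total, n_known of them known."""
    upper = min(n_known, n_drawn)
    hits = sum(
        comb(n_known, x) * comb(n_total - n_known, n_drawn - x)
        for x in range(overlap, upper + 1)
    )
    return hits / comb(n_total, n_drawn)

# Toy data (hypothetical): one attention matrix per model, rows attend to columns.
genes = ["A", "B", "C", "D"]
attn_model_1 = [[0.0, 0.9, 0.1, 0.2],
                [0.7, 0.0, 0.3, 0.1],
                [0.1, 0.1, 0.0, 0.8],
                [0.2, 0.1, 0.6, 0.0]]
attn_model_2 = [[0.0, 0.8, 0.5, 0.1],
                [0.6, 0.0, 0.2, 0.1],
                [0.7, 0.2, 0.0, 0.3],
                [0.1, 0.1, 0.1, 0.0]]
# Stand-in for edges from a reference interaction network.
known_edges = {frozenset(("A", "B")), frozenset(("C", "D"))}

top_1 = top_k_pairs(attn_model_1, genes, k=2)
top_2 = top_k_pairs(attn_model_2, genes, k=2)
n_pairs = comb(len(genes), 2)  # all possible unordered gene pairs

cross_model_overlap = jaccard(top_1, top_2)
p_enrich_1 = hypergeom_sf(len(top_1 & known_edges), n_pairs, len(known_edges), len(top_1))
print(f"Jaccard overlap: {cross_model_overlap:.3f}; enrichment p (model 1): {p_enrich_1:.3f}")
```

At real scale the pair universe is far too large for nested loops, so the same logic would need sparse or vectorized top-K selection; the statistics themselves carry over unchanged.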

Decoding Virtual Cell Foundation Models I: Architecture and Layer-wise Attention

23 minute read

Foundation models trained on large corpora of single-cell RNA-seq data have emerged as one of the most promising frontier technologies in biotech. These models embed each cell's expression profile in a latent space, then use a transformer architecture to attend over the expression embeddings. In doing so they learn functional coexpression patterns that can be used to predict diverse properties such as cell type or denoised expression.

These models are built on massive datasets provided by the Chan Zuckerberg Initiative (CZI), Arc Institute, Tahoe Therapeutics and others (CellxGene, Tahoe 100M, and the announced Arc/BioHub/Tahoe collaboration). Emerging datasets bring additional cellular contexts, making more predictions in-distribution rather than out-of-distribution, and causal grounding with genetic perturbations provides the potential to move beyond coexpression. This causal signal is now being fruitfully mined, with impressive results from Altos Labs' Cleopatra model in the Arc Institute's 2025 Virtual Cell Challenge.

Foundation models themselves are continually evolving as academic and industry groups experiment with model architectures, inductive biases, and new tasks. A diverse array of models is available through platforms like CZI models and NVIDIA BioNeMo, covering single-cell RNA-seq, sequence analysis, and structural prediction.

The excitement for foundation models and their evolution into multi-scale multi-modal Virtual Cell models is palpable; they form a cornerstone in the CZI’s aspirational goal to “cure, prevent, or manage all diseases by the end of this century.” (Incidentally, this would require stopping biological aging!)

Distinguishing Activation from Inhibition with Relation-Aware Graph Neural Networks

25 minute read

In my last post, I discussed self-supervised edge prediction as a way of embedding genes using a gene-regulatory network.

This approach allows genes, metabolites, drugs and other vertices to be connected based on shared network topology. However, to date I’ve only discussed edge prediction using a dot-product head, where a vertex-pair’s edge support is a direct readout of their similarity in embedding space (𝐚 · 𝐛). While surprisingly powerful, this head has limitations when vertices are heterogeneous or interact in qualitatively different ways — particularly when we want to distinguish between activation and inhibition.
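
The limitation described above can be seen in a small sketch. The embeddings and weight matrices below are hand-set, hypothetical values (nothing here is learned or taken from the post's models), and the per-relation bilinear form is just one common example of a relation-aware score, used here only as a contrast to the dot product.

```python
def dot(a, b):
    """Dot-product head: edge support is embedding similarity, a . b."""
    return sum(x * y for x, y in zip(a, b))

def bilinear(a, W, b):
    """Bilinear head: a^T W b, with a separate W per relation type."""
    return sum(
        a[i] * W[i][j] * b[j]
        for i in range(len(a))
        for j in range(len(b))
    )

# Hypothetical 2-d embeddings for two vertices.
a = [1.0, 0.5]
b = [0.5, 1.0]

# Hypothetical (hand-set, not learned) relation-specific weight matrices.
relation_weights = {
    "activation": [[0.0, 1.0],
                   [0.0, 0.0]],
    "inhibition": [[0.0, 0.0],
                   [1.0, 0.0]],
}

# The dot product is symmetric and has no notion of relation type.
print("dot a->b:", dot(a, b), "| dot b->a:", dot(b, a))

# The bilinear score depends on both direction and relation.
scores = {}
for relation, W in relation_weights.items():
    scores[relation] = {"a->b": bilinear(a, W, b), "b->a": bilinear(b, W, a)}
    print(relation, scores[relation])
```

Because a · b equals b · a, the dot-product head assigns the same support to A → B and B → A and cannot separate activation from inhibition; a relation-specific matrix breaks both symmetries at the cost of extra parameters per edge type.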

Here, I explore more expressive approaches for learning A → B mappings by evaluating both general edge prediction heads (like MLPs) and “relation-aware” heads that can learn distinct mappings for different edge types. The post will cover:

Data model and training changes enabling relation-specific predictions
Geometric analysis revealing how relation-aware heads encode regulatory semantics
PerturbSeq validation demonstrating successful prediction of signed regulatory interactions
Pre-trained models available on HuggingFace

Napistu meets PyTorch Geometric - Predicting Regulatory Interactions with Graph Neural Networks

34 minute read

Biological applications of graph neural networks (GNNs) typically work with either small curated networks (100s-1,000s of nodes) or aggressively filtered subsets of large databases like STRING. The Octopus graph — which I introduced in my previous post — occupies a different space entirely. By integrating eight complementary pathway databases, it creates a genome-scale network with ~50K proteins, metabolites, and complexes spanning ~10M edges, all while preserving rich metadata about edge provenance, confidence scores, and mechanistic detail that filtered approaches discard.

This puts the Octopus in uncharted territory: large enough to capture genome-scale complexity, yet structured enough to preserve the biological interpretability that makes network analysis valuable. GNNs scale well beyond genome-scale requirements (100M+ nodes in social networks), but remain unexplored for comprehensive biological networks that integrate regulatory, metabolic, and interaction data. Bridging this gap requires infrastructure that handles both the biological complexity of multi-source networks and the engineering complexity of training GNNs at scale.

In this post, I’ll introduce Napistu-Torch — the infrastructure that finally makes this space navigable. Available from PyPI and indexed by the Napistu MCP server, Napistu-Torch provides a modular, reproducible framework for training GNNs on comprehensive biological networks. I’ll demonstrate that it’s feasible to train graph convolutional networks on the complete Octopus network using just a laptop (albeit with 2 days of training time for the full suite of models). But the real contribution is the ecosystem: the data structures, pipelines, and evaluation strategies that unlock far more sophisticated analyses.

Napistu’s Octopus: An 8-source human consensus pathway model

20 minute read

Introducing the Octopus: Napistu’s eight-source Human Consensus Pathway Model that unites the breadth of protein-protein interaction networks with the depth of regulatory databases and metabolic models. The result is a genome-scale directed graph that is both densely connected and mechanistically precise. In this post, I will:

Provide an overview of the Octopus model and its construction
Show side-by-side summaries of individual data sources highlighting their complementarity
Demonstrate that the model successfully merges results, creating a dense network covering the complete cellular repertoire of genes, metabolites, drugs, and complexes
Illustrate how source-level information can be carried forward to the Octopus’s graphical network to augment its vertex and edge features