Teaching AI to Predict What Cells Will Look Like Before Running Any Experiments

This is a sponsored article brought to you by MBZUAI.

If you've ever tried to guess how a cell will change shape after a drug treatment or a gene edit, you know it's part science, part art, and mostly expensive trial and error. Acquiring images of thousands of perturbed states is slow; exploring millions is impossible.

A new article in Nature Communications suggests another way: model these cellular “after” images directly from molecular readouts, so you can preview the morphology before picking up a pipette. The team calls its model MorphDiff; it is a diffusion model driven by the transcriptome, the pattern of genes turned up or down after a perturbation.

At a high level, the idea changes the workflow. High-throughput imaging is a proven way to discover a compound's mechanism or characterize its biological activity, but profiling every potential drug or CRISPR target is impossible. MorphDiff learns from cases where both gene expression and cell morphology are measured, and then uses only the L1000 gene expression profile as the condition to create realistic post-perturbation images, either from scratch or by transforming a control image into its perturbed counterpart. The authors report competitive accuracy on out-of-distribution (unseen) perturbations across large drug and genetic datasets, as well as gains in mechanism-of-action (MOA) retrieval that approach what real images achieve.


This study is led by MBZUAI. The researchers start with a biological observation: gene expression ultimately controls the proteins and pathways that shape how a cell looks under a microscope. The mapping is not straightforward, but the signal is strong enough to learn from. Transcriptome conditioning also offers a practical bonus: there is far more publicly available L1000 data than paired morphology data, making it easier to cover a wide range of perturbations. In other words, when a new compound appears, you are likely to find its gene signature, which MorphDiff can then use.

Essentially, MorphDiff combines two parts. First, a morphological variational autoencoder (MVAE) compresses five-channel microscopy images into a compact latent space and learns to reconstruct them with high perceptual fidelity. Second, a latent diffusion model learns to denoise samples in this latent space, guiding each denoising step with the L1000 vector via attention.
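To make the two-stage design concrete, here is a minimal PyTorch-style sketch of the overall shape, not the authors' code: the class names, layer sizes, and single-token conditioning are illustrative assumptions, loosely following the paper's description of an MVAE plus a latent diffusion model conditioned on the L1000 vector via attention.

```python
# Illustrative sketch only: an image autoencoder plus a conditional latent denoiser.
import torch
import torch.nn as nn

class MorphVAE(nn.Module):
    """Compress 5-channel images into a compact latent grid and reconstruct them."""
    def __init__(self, in_ch=5, latent_ch=4):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, 64, 4, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(128, 2 * latent_ch, 3, padding=1),  # mean and log-variance
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(latent_ch, 128, 4, stride=2, padding=1), nn.SiLU(),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(64, in_ch, 3, padding=1),
        )

    def encode(self, x):
        mu, logvar = self.encoder(x).chunk(2, dim=1)
        return mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterized sample

    def decode(self, z):
        return self.decoder(z)

class ConditionalDenoiser(nn.Module):
    """Predict the noise in a latent, attending to the L1000 expression vector.
    (Timestep embedding and the full U-Net backbone are omitted for brevity.)"""
    def __init__(self, latent_ch=4, d_model=256, n_genes=978):
        super().__init__()
        self.to_tokens = nn.Conv2d(latent_ch, d_model, 1)
        self.gene_proj = nn.Linear(n_genes, d_model)        # transcriptome -> condition token
        self.cross_attn = nn.MultiheadAttention(d_model, num_heads=8, batch_first=True)
        self.to_latent = nn.Conv2d(d_model, latent_ch, 1)

    def forward(self, z_t, t, gene_expr):
        b, _, h, w = z_t.shape
        tokens = self.to_tokens(z_t).flatten(2).transpose(1, 2)   # (B, H*W, D)
        cond = self.gene_proj(gene_expr).unsqueeze(1)             # (B, 1, D)
        attended, _ = self.cross_attn(tokens, cond, cond)         # condition every position
        out = (tokens + attended).transpose(1, 2).reshape(b, -1, h, w)
        return self.to_latent(out)
```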

Diagram depicting the cell-staining analysis pipeline, including dataset curation and perturbation modeling. Wang et al., Nature Communications (2025), CC BY 4.0

Diffusion is a good fit here: it is robust to noise, and operating in the latent space keeps training efficient while preserving image detail. The team implements both gene-to-image (G2I) generation (starting from noise, conditioned on the transcriptome) and image-to-image (I2I) transformation (moving a reference image toward its perturbed state, conditioned on the same transcriptome). The latter needs no retraining thanks to an SDEdit-style procedure, which is handy when you want to visualize changes relative to a control.
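A rough sketch of the two sampling modes, assuming a trained denoiser and VAE like the ones above and a precomputed DDPM-style noise schedule; the deterministic DDIM-like update and the `strength` parameter are illustrative choices, not the paper's exact sampler.

```python
# Illustrative sketch: G2I starts from pure noise; I2I partially noises a control
# image's latent and denoises it under the perturbed transcriptome (SDEdit-style).
import torch

def gene_to_image(denoiser, vae, gene_expr, shape, alphas_cumprod, n_steps=1000):
    """G2I: denoise from pure noise, conditioned on the transcriptome."""
    z = torch.randn(shape)
    for t in reversed(range(n_steps)):
        eps = denoiser(z, torch.tensor([t]), gene_expr)
        a_t = alphas_cumprod[t]
        a_prev = alphas_cumprod[t - 1] if t > 0 else torch.tensor(1.0)
        z0 = (z - (1 - a_t).sqrt() * eps) / a_t.sqrt()        # predicted clean latent
        z = a_prev.sqrt() * z0 + (1 - a_prev).sqrt() * eps    # deterministic update
    return vae.decode(z)

def image_to_image(denoiser, vae, ref_image, gene_expr, alphas_cumprod,
                   n_steps=1000, strength=0.6):
    """I2I: noise the control latent part-way (strength in (0, 1)), then denoise
    toward the perturbed state; no retraining needed."""
    t0 = int(strength * n_steps)                  # how far toward pure noise to go
    z_ref = vae.encode(ref_image)
    a_t0 = alphas_cumprod[t0]
    z = a_t0.sqrt() * z_ref + (1 - a_t0).sqrt() * torch.randn_like(z_ref)
    for t in reversed(range(t0 + 1)):
        eps = denoiser(z, torch.tensor([t]), gene_expr)
        a_t = alphas_cumprod[t]
        a_prev = alphas_cumprod[t - 1] if t > 0 else torch.tensor(1.0)
        z0 = (z - (1 - a_t).sqrt() * eps) / a_t.sqrt()
        z = a_prev.sqrt() * z0 + (1 - a_prev).sqrt() * eps
    return vae.decode(z)
```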

It's one thing to create photogenic images; it's another to generate biologically faithful ones. The paper covers both aspects. On the generative side, MorphDiff is compared to GAN and diffusion baselines using standard metrics such as FID, Inception Score, coverage, density, and the CLIP-based CMMD. On the JUMP (genetic) and CDRP/LINCS (drug) test sets, the two MorphDiff modes typically rank first and second, with significance tests run over multiple random seeds or independent control plates. The result is consistent: better fidelity and diversity, especially under out-of-distribution (OOD) perturbations, where the practical value lies.
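As a reference point, FID (one of the metrics listed above) compares Gaussian fits to feature embeddings of real and generated images. A minimal numpy/scipy version, assuming you already have the two embedding matrices (e.g. Inception features), looks like this:

```python
# Fréchet Inception Distance from precomputed feature embeddings.
import numpy as np
from scipy import linalg

def frechet_distance(feats_real, feats_gen):
    """feats_*: (n_images, n_features) arrays of image embeddings."""
    mu_r, mu_g = feats_real.mean(0), feats_gen.mean(0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_g = np.cov(feats_gen, rowvar=False)
    covmean = linalg.sqrtm(cov_r @ cov_g)
    if np.iscomplexobj(covmean):        # numerical noise can add tiny imaginary parts
        covmean = covmean.real
    return float(((mu_r - mu_g) ** 2).sum() + np.trace(cov_r + cov_g - 2 * covmean))
```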


More interesting for biologists is that the authors go beyond image aesthetics and examine morphological features. They extract hundreds of CellProfiler features (textures, intensities, granularity, cross-channel correlations) and ask whether the generated distributions match the ground truth.

In side-by-side comparisons, MorphDiff's feature clouds match real data more closely than baselines such as IMPA. Statistical tests show that more than 70 percent of the generated feature distributions are indistinguishable from the real ones, and feature scatter plots show that the model correctly captures differences from control for the most strongly perturbed features. Importantly, the model also preserves the correlation structure between gene expression and morphological features, with higher agreement with ground truth than previous methods, suggesting that it models more than surface style.
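To illustrate the kind of per-feature check described above, here is a sketch that asks, for each CellProfiler feature, whether generated and real values could plausibly come from the same distribution. The two-sample Kolmogorov-Smirnov test and the 0.05 threshold are assumptions for illustration, not necessarily the paper's exact protocol.

```python
# Fraction of features whose generated distribution is statistically
# indistinguishable from the real one.
import numpy as np
from scipy import stats

def fraction_indistinguishable(real_feats, gen_feats, alpha=0.05):
    """real_feats, gen_feats: (n_cells, n_features) CellProfiler-style matrices."""
    n_features = real_feats.shape[1]
    indistinguishable = 0
    for j in range(n_features):
        _, p = stats.ks_2samp(real_feats[:, j], gen_feats[:, j])
        if p > alpha:                   # no evidence the distributions differ
            indistinguishable += 1
    return indistinguishable / n_features
```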

Graphs and images comparing different computational methods for analyzing biological data. Wang et al., Nature Communications (2025), CC BY 4.0

The drug results scale this story to thousands of perturbations. Using DeepProfiler embeddings as a compact morphological fingerprint, the team demonstrates that MorphDiff's generated profiles are discriminative: classifiers trained on real embeddings also separate the generated profiles by perturbation, and pairwise distances between drug effects are preserved.
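The two checks in that paragraph (do generated profiles stay separable by perturbation, and are relative distances between drug effects preserved?) could be sketched roughly as follows, assuming DeepProfiler-style embedding matrices with one row per well; the logistic-regression classifier and the Spearman correlation on mean-profile distances are illustrative choices, not the paper's exact protocol.

```python
# Illustrative checks on generated morphological profiles.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr
from sklearn.linear_model import LogisticRegression

def generated_profiles_are_discriminative(real_emb, real_labels, gen_emb, gen_labels):
    """Train on real embeddings, test whether generated ones separate by perturbation."""
    clf = LogisticRegression(max_iter=1000).fit(real_emb, real_labels)
    return clf.score(gen_emb, gen_labels)           # accuracy on generated profiles

def distance_structure_preserved(real_emb, gen_emb, labels):
    """Compare pairwise distances between per-perturbation mean profiles."""
    labels = np.asarray(labels)
    perts = sorted(set(labels))
    real_means = np.stack([real_emb[labels == p].mean(0) for p in perts])
    gen_means = np.stack([gen_emb[labels == p].mean(0) for p in perts])
    rho, _ = spearmanr(pdist(real_means), pdist(gen_means))
    return rho                                      # high rho: relative drug effects preserved
```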

Charts comparing the accuracy of different image generation methods across four panels. Wang et al., Nature Communications (2025), CC BY 4.0

This matters for the task everyone cares about: MOA retrieval. Given a query profile, can you find reference drugs with the same mechanism of action? Morphologies generated by MorphDiff not only outperform previous image generation baselines, but also outperform searches using gene expression alone, and they come close to the accuracy you get from real images. In the top-k retrieval experiments, the average improvement is 16.9 percent over the strongest baseline and 8.0 percent over the transcriptome alone, with robustness shown across multiple values of k and across metrics such as mean average precision and fold enrichment. This is a strong signal that the generated morphology carries information beyond chemical structure and transcriptomics, enough to help find similar mechanisms even when the molecules themselves are dissimilar.
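For a flavor of the retrieval setup, here is a toy sketch of top-k MOA retrieval with fold enrichment: rank reference drugs by similarity of their morphological profiles (real or generated) to a query, then ask how over-represented the query's MOA is among the top k. Cosine similarity and this particular enrichment definition are assumptions for illustration.

```python
# Toy top-k MOA retrieval with fold enrichment over the chance level.
import numpy as np

def topk_fold_enrichment(query_profile, query_moa, ref_profiles, ref_moas, k=10):
    """ref_profiles: (n_refs, d) morphology profiles; ref_moas: their MOA labels."""
    sims = ref_profiles @ query_profile / (
        np.linalg.norm(ref_profiles, axis=1) * np.linalg.norm(query_profile) + 1e-8)
    top_k = np.argsort(-sims)[:k]                             # most similar references
    hit_rate_topk = np.mean([ref_moas[i] == query_moa for i in top_k])
    base_rate = np.mean([m == query_moa for m in ref_moas])   # chance level
    return hit_rate_topk / base_rate                          # >1 means enrichment over chance
```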


The paper also lists some current limitations that point to future improvements. Diffusion sampling remains relatively slow; the authors suggest plugging in newer samplers to speed up generation. Time and concentration (two factors biologists care about) are not explicitly encoded because of data limitations; the architecture could take them as additional conditions when suitable datasets become available. And because MorphDiff depends on perturbed gene expression as input, it cannot predict morphological changes for perturbations that lack transcriptome measurements; a natural extension is to pair it with models that predict gene expression for previously unseen drugs (the paper cites GEARS as an example). Finally, generalization inevitably weakens as you move away from the training distribution; larger, more consistent multimodal datasets will help, as will conditioning on more modalities such as structures, textual descriptions, or chromatin accessibility.

What does this mean in practice? Imagine a screening team with a large L1000 library but a smaller imaging budget. MorphDiff becomes a phenotypic co-pilot: generate predicted morphologies for novel compounds, group them by similarity to known mechanisms, and prioritize imaging for confirmation. Because the model also outputs interpretable feature changes, researchers can look under the hood. Did ER texture and mitochondrial intensity change the way we would expect from an EGFR inhibitor? Did two structurally unrelated molecules end up in the same phenotypic neighborhood? Those are exactly the hypotheses that accelerate mechanism discovery and drug repurposing.

More broadly, generative AI has reached a level of fidelity where in-silico microscopy can stand in for first-pass experiments. We've already seen the explosion of text-to-image models in consumer spaces; here, a transcriptome-to-morphology model shows that the same diffusion machinery can do scientifically useful work: capturing subtle multichannel phenotypes and preserving the relationships that make these images more than pleasing to the eye. It will not replace the microscope. But if it reduces the number of plates you have to image to find the ones that matter, that's time and money you can spend testing the hits.
