Merlin: a computed tomography vision–language foundation model and dataset - Nature

Researchers have developed Merlin, a vision–language model for automated analysis of abdominal CT scans. This 3D model addresses limitations of existing medical models by learning directly from volumetric CT scans, electronic health records, and radiology reports, without requiring extensive manual annotation. Merlin was trained on a dataset of over 6 million images from more than 15,000 CT scans and was validated on a range of diagnostic tasks, showing high performance on data from multiple institutions. The model is intended to reduce radiologists' workload and to support disease risk assessment and biomarker discovery. The complete dataset and trained models are publicly available for further research.

Wed, 04 Mar 2026 22:59:49 GMT | Nature