NanoG — Code release

NanoG — pushing the limits of biology through AI: a cancer foundation model that simulates outcomes inside its own reasoning.

A two-model family of multimodal cancer foundation models (NanoG0, a ~95M-parameter scout model, and NanoG1, a ~1B-parameter production model), built on the single-primitive Quatrix routing architecture (Q-Compass / SAVO / MH-QVC). Its signature capability is mid-chain-of-thought (mid-CoT) hypothetical simulation: during its own reasoning, the model emits structured <simulate> blocks that predict drug-response curves, histopathology appearance, 3D tumour evolution, protein conformations, and spatial-transcriptomics (spatial-tx) states, and then conditions its downstream reasoning on those simulated outcomes. All of this happens in a single autoregressive pass, with no external tools.
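This README does not specify the internal syntax of a <simulate> block. As a minimal sketch, assuming blocks are delimited by literal `<simulate>` and `</simulate>` tags and are never nested, a generated trace could be split into ordinary reasoning text and simulation payloads for inspection as follows. The function name `split_trace` and the example payload are hypothetical, not part of this repo.

```python
import re
from typing import List, Tuple

# Assumption: non-nested, literal <simulate> ... </simulate> delimiters.
SIM_RE = re.compile(r"<simulate>(.*?)</simulate>", re.DOTALL)


def split_trace(trace: str) -> List[Tuple[str, str]]:
    """Split a trace into ordered ("text", ...) and ("simulate", ...) segments."""
    segments: List[Tuple[str, str]] = []
    pos = 0
    for m in SIM_RE.finditer(trace):
        if m.start() > pos:  # reasoning text before this block
            segments.append(("text", trace[pos:m.start()]))
        segments.append(("simulate", m.group(1).strip()))
        pos = m.end()
    if pos < len(trace):  # trailing reasoning after the last block
        segments.append(("text", trace[pos:]))
    return segments


print(split_trace("Step 1. <simulate>drug_response: drug=X</simulate> Step 2."))
```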

Companion dataset: Abd0r/nanog-cancer-data. It contains TCGA mutation, expression, and clinical data; COSMIC SBS96 mutational signatures; Reactome pathways; AlphaFold structures; a PubMed oncology corpus; H&E whole-slide-image (WSI) tiles; TCIA volumetric scans; Visium spatial-transcriptomics data; and the 100K mid-CoT trace corpus.
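COSMIC SBS96 classifies each single-base substitution by its pyrimidine-reference change (C>A, C>G, C>T, T>A, T>C, T>G) and its 5' and 3' neighbouring bases, giving 6 × 4 × 4 = 96 channels. The sketch below shows that standard mapping; it is not the code in `bio/`, and the function name `sbs96_channel` is hypothetical. Channels are ordered here by substitution type, then 5' base, then 3' base, which is one common convention; other tools may order them differently.

```python
SUBS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]
BASES = "ACGT"
COMP = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sbs96_channel(five: str, ref: str, alt: str, three: str):
    """Map a substitution in its trinucleotide context to (index, label)."""
    five, ref, alt, three = (b.upper() for b in (five, ref, alt, three))
    if any(b not in COMP for b in (five, ref, alt, three)):
        raise ValueError("bases must be A, C, G or T")
    if ref == alt:
        raise ValueError("reference and alternate bases are identical")
    if ref in "AG":
        # Purine reference: re-express on the reverse-complement strand,
        # which also swaps the 5' and 3' flanks.
        five, ref, alt, three = COMP[three], COMP[ref], COMP[alt], COMP[five]
    sub = f"{ref}>{alt}"
    index = SUBS.index(sub) * 16 + BASES.index(five) * 4 + BASES.index(three)
    return index, f"{five}[{sub}]{three}"


print(sbs96_channel("T", "G", "T", "C"))  # (8, 'G[C>A]A')
```

The purine case shows why the reverse complement matters: TGC with G>T on the forward strand is the same event as GCA with C>A on the opposite strand.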

What's in this repo

Path	What
`quatrix/`	Q-Compass primitive, encoders, world model, MH-QVC blocks, mHC, FiLM modulation, QACC adaptive depth, MTP head, Muon optimiser
`flash-qcompass/`	Triton kernel for MH-QVC routing (fp16/bf16 fast path; 6× faster than the eager implementation)
`bio/`	TCGA + COSMIC + Reactome + AlphaFold + IDC + Visium loaders + tokenisers
`paper/NanoG1/code/`	`nanog1_model.py` (model + decoders), `train_nanog1.py` (4-phase trainer), `multimodal_trace_gen.py` (CoT trace synthesis), `train_bpe_tokeniser.py`, `multimodal_tokeniser.py`, `eval_nanog1.py`, `opd_train.py`, `visualise.py`
`paper/NanoG1/data/`	`multimodal_cot_traces_100k.jsonl` (100K synthetic reasoning traces, 5.9% multimodal-grounded)
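The `quatrix/` row lists FiLM modulation without saying what conditions it in NanoG. FiLM (feature-wise linear modulation) scales and shifts each hidden feature using parameters predicted from a conditioning vector c: h' = γ(c) ⊙ h + β(c). Below is a dependency-free sketch. It assumes γ and β are plain linear maps of c; what c is in NanoG (for example, a modality or task embedding) is an assumption, and `film` and `linear` are hypothetical names, not the `quatrix` API.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def linear(W: Matrix, b: Sequence[float], x: Sequence[float]) -> List[float]:
    """Dense layer y = W x + b, with W stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def film(h: Sequence[float], cond: Sequence[float],
         W_gamma: Matrix, b_gamma: Sequence[float],
         W_beta: Matrix, b_beta: Sequence[float]) -> List[float]:
    """Scale and shift each feature of h with gamma(cond) and beta(cond)."""
    gamma = linear(W_gamma, b_gamma, cond)
    beta = linear(W_beta, b_beta, cond)
    if not (len(gamma) == len(beta) == len(h)):
        raise ValueError("gamma/beta width must match the hidden size")
    return [g * hi + bt for g, hi, bt in zip(gamma, h, beta)]


print(film([1.0, 2.0], [1.0], [[2.0], [0.5]], [0.0, 0.0], [[0.0], [1.0]], [0.5, 0.0]))
# [2.5, 2.0]
```

With γ fixed at 1 and β at 0, FiLM reduces to the identity, which is a common initialisation choice so that conditioning starts as a no-op.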

Quickstart

```bash
hf download Abd0r/nanog-cancer-code     --local-dir ./nanog-code
hf download Abd0r/nanog-cancer-data --repo-type dataset --local-dir ./nanog-data

cd nanog-code
pip install -e ./quatrix
pip install -e ./flash-qcompass

# Sanity-check the architecture
python3 paper/NanoG1/code/nanog1_model.py --preset nanog1_150m   # NanoG0
python3 paper/NanoG1/code/nanog1_model.py --preset nanog1_1b     # NanoG1

# Train (1-epoch unified pretraining over the full corpus)
python3 paper/NanoG1/code/train_nanog1.py \
    --phase 1 --steps 1975000 \
    --preset nanog1_150m \
    --batch 2 --grad_accum 8 --workers 4 \
    --traces_jsonl paper/NanoG1/data/unified_pretrain.jsonl \
    --out_dir bio/cancer_checkpoints/nanog0
```
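The trainer's exact flag semantics are not documented here. Assuming, as is conventional, that `--batch` is the per-GPU micro-batch and `--grad_accum` is the number of micro-batches accumulated before each optimiser update, the command above updates the weights once every 2 × 8 = 16 sequences. The calculator below is a sketch: `n_devices` and `seq_len` are inputs you supply, and whether `--steps` counts micro-steps or optimiser updates should be confirmed in `train_nanog1.py`.

```python
def effective_batch(micro_batch: int, grad_accum: int, n_devices: int = 1) -> int:
    """Number of sequences that contribute to one optimiser update."""
    return micro_batch * grad_accum * n_devices


def tokens_per_update(micro_batch: int, grad_accum: int, seq_len: int,
                      n_devices: int = 1) -> int:
    """Upper bound on tokens per update, assuming every sequence fills seq_len."""
    return effective_batch(micro_batch, grad_accum, n_devices) * seq_len


# Quickstart values (--batch 2 --grad_accum 8), single GPU assumed.
print(effective_batch(2, 8))          # 16
print(tokens_per_update(2, 8, 8192))  # 131072 if sequences fill NanoG0's 8 192-token context
```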

See paper/NanoG1/README.md for the full architecture description, the G1–G16 evaluation gates, and the data plan. See paper/NanoG1/PROJECT_DETAIL_README.md for in-depth technical details.

Model family

	NanoG0	NanoG1
Params	~95M	~1B
Hidden size / layers / heads	512 / 20 / 8	1280 / 56 / 20
Context length (tokens)	8 192	32 768
Training data	~2 B tokens (Chinchilla-optimal)	~20–40 B tokens
Training compute	Vast.ai RTX 4090, ~$60, 8–9 days	Vast.ai RTX 4090, ~$200, 10–15 days
Inference target	RTX 4050 (6 GB) at INT4/INT8	RTX 4050 (6 GB) at INT4/INT8
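Two of the table's figures can be checked with back-of-envelope arithmetic. The "Chinchilla-optimal" label matches the rule of thumb from Hoffmann et al. (2022) of roughly 20 training tokens per parameter. The INT4/INT8 target can be checked by counting weight bytes. This is a sketch under stated assumptions: the weight estimate covers weights only and ignores activations, any cache, and quantisation scales or zero-points.

```python
TOKENS_PER_PARAM = 20  # Chinchilla rule of thumb (approximate)


def chinchilla_tokens(n_params: float) -> float:
    """Approximate compute-optimal training tokens for a model of n_params."""
    return TOKENS_PER_PARAM * n_params


def weight_memory_gib(n_params: float, bits_per_weight: int) -> float:
    """Weights-only memory footprint in GiB at a given precision."""
    return n_params * bits_per_weight / 8 / 2**30


for name, n in [("NanoG0", 95e6), ("NanoG1", 1e9)]:
    print(f"{name}: ~{chinchilla_tokens(n) / 1e9:.1f}B tokens; "
          f"INT8 {weight_memory_gib(n, 8):.2f} GiB, INT4 {weight_memory_gib(n, 4):.2f} GiB")
```

This gives ~1.9B tokens for NanoG0, consistent with the table's ~2 B, and ~20B for NanoG1, the low end of its range. NanoG1's weights come to about 0.93 GiB at INT8 and 0.47 GiB at INT4, which leaves most of a 6 GB card for runtime memory.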

License

Code: MIT
Weights + synthetic CoT traces: OpenRAIL-M (use-based behavioural restrictions)
All upstream data sources have been verified as license-compatible.

Cite

```bibtex
@article{nanog1_2026,
  title   = {NanoG: A Unified Quatrix Cancer Foundation Model with Mid-CoT Hypothetical Simulation},
  author  = {Ali, Syed Abdur Rehman},
  journal = {Nature Biotechnology (under review)},
  year    = {2026}
}

@misc{quatrix_2026,
  title   = {Quatrix: An Empirical Evaluation of Q-Compass and SAVO on Multimodal Sequence Modeling},
  author  = {Ali, Syed Abdur Rehman},
  year    = {2026},
  doi     = {10.5281/zenodo.19839718}
}
```
