NanoG: Code release

NanoG pushes the limits of biology through AI: it is a cancer foundation model that simulates outcomes inside its own reasoning.

A two-model multimodal cancer foundation-model family (NanoG0, a ~95M-parameter scout, and NanoG1, a ~1B-parameter production model), built on the single-primitive Quatrix routing architecture (Q-Compass / SAVO / MH-QVC). The signature capability is mid-CoT hypothetical simulation: during its own chain-of-thought reasoning, the model emits structured <simulate> blocks that predict drug-response curves, histopathology appearances, 3D tumour evolution, protein conformations, and spatial-transcriptomics states, then conditions downstream reasoning on those simulated outcomes. Everything happens in one autoregressive pass, with no external tools.
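
To make the idea of simulation blocks interleaved with reasoning concrete, here is a minimal sketch of how a generated trace could be split into reasoning and simulation segments. It rests on assumptions: the README only names the opening <simulate> tag, so the closing </simulate> tag, the helper name split_trace, and the example trace text are all hypothetical and not taken from the NanoG code.

```python
import re

# ASSUMPTION: blocks are closed with </simulate>. The README only shows the
# opening tag, so the real trace format may differ.
SIMULATE_RE = re.compile(r"<simulate>(.*?)</simulate>", re.DOTALL)


def split_trace(trace: str) -> list[tuple[str, str]]:
    """Return ("reasoning", text) and ("simulate", text) segments in order."""
    segments = []
    cursor = 0
    for match in SIMULATE_RE.finditer(trace):
        before = trace[cursor:match.start()].strip()
        if before:
            segments.append(("reasoning", before))
        segments.append(("simulate", match.group(1).strip()))
        cursor = match.end()
    tail = trace[cursor:].strip()
    if tail:
        segments.append(("reasoning", tail))
    return segments


# Hypothetical trace text, written only to exercise the parser.
example = (
    "Initial reasoning about the case. "
    "<simulate>drug_response: placeholder</simulate> "
    "Reasoning that uses the simulated outcome."
)

for kind, text in split_trace(example):
    print(f"{kind}: {text}")
```

Keeping the segments in order preserves the property described above: any reasoning that follows a simulation block can be read as conditioned on it.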

Companion dataset: Abd0r/nanog-cancer-data. It contains TCGA mutations, expression, and clinical data, COSMIC SBS96, Reactome pathways, AlphaFold structures, a PubMed oncology corpus, WSI H&E tiles, TCIA volumetric scans, Visium spatial-transcriptomics data, and the 100K mid-CoT trace corpus.

What's in this repo

Path                 What
quatrix/             Q-Compass primitive, encoders, world model, MH-QVC blocks, mHC, FiLM modulation, QACC adaptive depth, MTP head, Muon optimiser
flash-qcompass/      Triton kernel for MH-QVC routing (fp16/bf16 fast path, 6× faster than eager execution)
bio/                 TCGA, COSMIC, Reactome, AlphaFold, IDC, and Visium loaders and tokenisers
paper/NanoG1/code/   nanog1_model.py (model + decoders), train_nanog1.py (4-phase trainer), multimodal_trace_gen.py (CoT trace synthesis), train_bpe_tokeniser.py, multimodal_tokeniser.py, eval_nanog1.py, opd_train.py, visualise.py
paper/NanoG1/data/   multimodal_cot_traces_100k.jsonl (100K synthetic reasoning traces, 5.9% multimodal-grounded)
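
The trace corpus is a JSONL file, so it can be streamed one record at a time. The sketch below counts what fraction of records satisfy a caller-supplied predicate, which is one way to check a figure such as the 5.9% multimodal-grounded share (about 5,900 of 100,000 traces). The record schema is not documented in this README, so the helper names and the "modalities" field in the usage comment are assumptions for illustration only.

```python
import json
from typing import Callable, Iterator


def iter_traces(path: str) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of a JSONL file."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)


def share_matching(path: str, predicate: Callable[[dict], bool]) -> float:
    """Fraction of records for which predicate(record) is true."""
    total = 0
    matched = 0
    for record in iter_traces(path):
        total += 1
        if predicate(record):
            matched += 1
    return matched / total if total else 0.0


# Usage (hypothetical field name "modalities"; inspect a real record first):
# share = share_matching(
#     "paper/NanoG1/data/multimodal_cot_traces_100k.jsonl",
#     lambda record: bool(record.get("modalities")),
# )
```

Streaming keeps memory use flat regardless of corpus size, which matters more for the larger pretraining files than for the 100K trace set.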

Quickstart

hf download Abd0r/nanog-cancer-code     --local-dir ./nanog-code
hf download Abd0r/nanog-cancer-data --repo-type dataset --local-dir ./nanog-data

cd nanog-code
pip install -e ./quatrix
pip install -e ./flash-qcompass

# Sanity-check the architecture
python3 paper/NanoG1/code/nanog1_model.py --preset nanog1_150m   # NanoG0
python3 paper/NanoG1/code/nanog1_model.py --preset nanog1_1b     # NanoG1

# Train (1-epoch unified pretraining over the full corpus)
python3 paper/NanoG1/code/train_nanog1.py \
    --phase 1 --steps 1975000 \
    --preset nanog1_150m \
    --batch 2 --grad_accum 8 --workers 4 \
    --traces_jsonl paper/NanoG1/data/unified_pretrain.jsonl \
    --out_dir bio/cancer_checkpoints/nanog0
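
As a rough guide to what the training command above implies, the following sketch works out the effective batch size and the number of sequences processed. It assumes a single GPU and treats --batch as a per-device micro-batch. The README does not say whether --steps counts optimiser steps or micro-batch steps, so both readings are computed. The function names are illustrative and not part of train_nanog1.py.

```python
def effective_batch(batch: int, grad_accum: int, num_gpus: int = 1) -> int:
    """Sequences contributing to one optimiser update."""
    return batch * grad_accum * num_gpus


def sequences_seen(steps: int, batch: int, grad_accum: int,
                   steps_are_optimizer_steps: bool, num_gpus: int = 1) -> int:
    """Total sequences processed under either interpretation of --steps."""
    if steps_are_optimizer_steps:
        per_step = effective_batch(batch, grad_accum, num_gpus)
    else:
        # ASSUMPTION: each counted step is a single micro-batch.
        per_step = batch * num_gpus
    return steps * per_step


# Values taken from the command above: --batch 2 --grad_accum 8 --steps 1975000
print(effective_batch(2, 8))                                          # 16
print(sequences_seen(1_975_000, 2, 8, steps_are_optimizer_steps=True))   # 31600000
print(sequences_seen(1_975_000, 2, 8, steps_are_optimizer_steps=False))  # 3950000
```

Dividing the corpus token count by either figure gives the implied average sequence length, which is a quick way to check that --steps matches a single epoch.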

See paper/NanoG1/README.md for the full architecture description, eval gates G1–G16, and data plan. See paper/NanoG1/PROJECT_DETAIL_README.md for deep technical details.

Model family

Model                    NanoG0                           NanoG1
Params                   ~95M                             ~1B
Hidden / layers / heads  512 / 20 / 8                     1280 / 56 / 20
Context (tokens)         8,192                            32,768
Train data               ~2B tokens (Chinchilla-optimal)  ~20–40B tokens
Compute                  Vast.ai 4090, ~$60, 8–9 days     Vast.ai 4090, ~$200, 10–15 days
Inference target         RTX 4050 (6 GB) at INT4/INT8     RTX 4050 (6 GB) at INT4/INT8
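
Two rows of the table can be sanity-checked with back-of-envelope arithmetic. The sketch below uses the commonly cited Chinchilla rule of thumb of roughly 20 training tokens per parameter (an approximation, not a figure from this repo) and computes the raw weight footprint at INT8 and INT4. The weight estimate ignores activations, the KV cache, and quantisation metadata such as scales, so actual memory use will be higher.

```python
def chinchilla_tokens(params: int, tokens_per_param: int = 20) -> int:
    """Approximate compute-optimal token budget (rule of thumb: ~20 tokens/param)."""
    return params * tokens_per_param


def weight_bytes(params: int, bits: int) -> int:
    """Raw storage for the weights alone at a given bit width."""
    return params * bits // 8


for name, params in [("NanoG0", 95_000_000), ("NanoG1", 1_000_000_000)]:
    tokens_b = chinchilla_tokens(params) / 1e9
    int8_gb = weight_bytes(params, 8) / 1e9
    int4_gb = weight_bytes(params, 4) / 1e9
    print(f"{name}: ~{tokens_b:.1f}B tokens, "
          f"INT8 ~{int8_gb:.3f} GB, INT4 ~{int4_gb:.3f} GB")
```

Under these assumptions NanoG0 comes out at about 1.9B tokens, in line with the ~2B listed, and NanoG1 at about 20B, the lower end of the ~20–40B range. The ~1B model needs about 1 GB of weights at INT8 and 0.5 GB at INT4, which leaves headroom on a 6 GB card for activations and the KV cache.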

License

  • Code: MIT
  • Weights + synthetic CoT traces: OpenRAIL-M (use-based behavioural restrictions)
  • All upstream data sources verified as license-compatible

Cite

@article{nanog1_2026,
  title   = {NanoG: A Unified Quatrix Cancer Foundation Model with Mid-CoT Hypothetical Simulation},
  author  = {Ali, Syed Abdur Rehman},
  journal = {Nature Biotechnology (under review)},
  year    = {2026}
}

@misc{quatrix_2026,
  title   = {Quatrix: An Empirical Evaluation of Q-Compass and SAVO on Multimodal Sequence Modeling},
  author  = {Ali, Syed Abdur Rehman},
  year    = {2026},
  doi     = {10.5281/zenodo.19839718}
}