AI & ML interests

Frontier alignment research to ensure the safe development and deployment of advanced AI systems.

Recent Activity

chrisjcundy  updated a dataset about 12 hours ago
AlignmentResearch/deceptive-followup-v15
chrisjcundy  published a dataset about 12 hours ago
AlignmentResearch/deceptive-followup-v15
taufeeque  updated a collection 5 days ago
Diverse Deception Probes
View all activity

AlignmentResearch 's collections 4

The Obfuscation Atlas
Obfuscated Policy, Obfuscated Activations, Blatant Deception, and Honest models trained in the Obfuscation Atlas paper.
The Obfuscation Altas
Obfuscated Policy, Obfuscated Activations, Blatant Deception, and Honest models trained in the Obfuscation Atlas paper