Portfolio item number 1
Short description of portfolio item number 1
Short description of portfolio item number 2
Published in NeurIPS ML Safety Workshop, 2022
We introduce Search for Natural Adversarial Features Using Embeddings (SNAFUE), which offers a fully automated method for finding copy/paste attacks.
Download here
Published in NeurIPS, 2023
We benchmark feature synthesis tools on their ability to discover vulnerabilities in deep neural networks.
Download here
Published in NeurIPS ATTRIB and SoLaR Workshops, 2023
We use mechanistic interpretability tools to try to understand how LLMs reconcile conflicting objectives.
Download here
Published:
This is a description of your talk, which is a markdown file that can be markdown-ified like any other post. Yay markdown!
Published:
This is a description of your conference proceedings talk. Note the different field in type. You can put anything in this field.
Undergraduate course, Center for AI Safety, 2022