Posts by Collection

portfolio

Short description of portfolio item number 1

Short description of portfolio item number 2

Published in NeurIPS ML Safety Workshop, 2022

We introduce Search for Natural Adversarial Features Using Embeddings (SNAFUE), a fully automated method for finding copy/paste attacks.

Download here

Published in NeurIPS, 2023

We benchmark feature synthesis tools on their ability to discover vulnerabilities in deep neural networks.

Download here

Published in NeurIPS ATTRIB and SoLaR Workshops, 2023

We use mechanistic interpretability tools to try to understand how LLMs reconcile conflicting objectives.

Download here

Published: March 01, 2012

This is a description of your talk, which is a markdown file that can be markdown-ified like any other post. Yay markdown!

Published: March 01, 2013

Published: February 01, 2014

Published: March 01, 2014

This is a description of your conference proceedings talk; note the different value in the type field. You can put anything in this field.

Undergraduate course, Center for AI Safety, 2022