About me

Hi! I’m a final-year undergraduate in courses 18 and 6-4 (Mathematics and AI) at MIT. I’ll be a master’s student in the EECS department at MIT next year. I’m interested in building safer, more robust AI systems.

My Research

I’m interested in characterizing how ML systems behave, particularly in ‘edge cases’ such as optimized or out-of-distribution inputs. I think this kind of understanding is important for ensuring that models are safe and aligned.

I’m currently working on better measurements for understanding adversarial examples, particularly in the context of the Features not Bugs theory, as well as on statistical signatures of memorization. I’m also interested in developing better safety evaluations for ML models, though I have yet to pursue this line of work.

In the past, I’ve worked on adversarial patches as debugging tools, and on mechanistic interpretability, both in toy models (Transformers trained on finite automata) and in LLMs (understanding competing objectives in Llama models).