Research
- Regret-Based Defense in Adversarial Reinforcement Learning (link to demo) AAMAS 2024 (Arxiv) By optimizing a novel form of regret, we train RL agents that are more robust than previous robustly trained value-optimizing agents. Our regret notion, CCER, provides a scalable, transferrable way to compute adversarial cumulative regret for actions across time steps.
- Probabilistic Perspectives on Error Minimization in Adversarial Reinforcement Learning (under review)Preprint We progress the formulation of observation-adversarial RL by recognizing its true structure—a POMDP. Leveraging this fact, our proposed methods achieve SOTA performance across all adversarial RL benchmarks.
- Automated Benchmarking to Red-Team Large Language Models Work in progress In cooperation with Singapore's Infocomm and Media Development Authority (IMDA), we are developing novel ways to benchmark production LLMs in an automated fashion, using RL as a baseline framework to search the perturbation space.