Adversarial Counterfactual Error

A central challenge in adversarial RL is the observability disadvantage that the agent faces against an adversary. Compounding the problem, partially observable Markov decision processes (POMDPs) are known to be intractable. In pursuit of a robust and applicable RL model, we present Adversarial Counterfactual Error (ACoE), which estimates belief probabilities for the underlying MDP based on reward differences and known adversarial paradigms. At the time of publication, this is the first method in adversarial RL to explicitly model belief.


Adversary-Attack-Aware Belief (A3B)

In our paper, we argue that a belief about how the adversary behaves is a necessary part of a robust defense. Our primary belief construction, A3B, is based on the idea that an adversary acts in the way that most benefits its goal: it provides perturbed observations chosen to minimize the target agent's rewards. From this, we also reason about what the adversary would not do. If state \(s_1\) is a feasible true state for the observation \(o_1\), but the optimal perturbation for \(s_1\) is a different perturbed observation, \(s^*\), then the belief that the adversary perturbed \(s_1 \rightarrow o_1\) should be low. Conversely, if the optimal perturbation for \(s_1\) is close or equal to \(o_1\), then the likelihood that \(s_1\) is the actual ground-truth state should be high.

The belief that a state \(s\) is the ground-truth state is the softmax of the A3B score \(z(s)\), taken over a neighborhood of perturbations around the observation \(o\), \(s \in N(o)\). The score is the ratio between the policy divergences \(D_{KL}(\pi(o)\|\pi(s))\) and \(D_{KL}(\pi(s^{*})\|\pi(s))\), where \(s^{*}\) is the optimal perturbation for \(s\). Rigorous definitions are found in the ICLR publication.

\[b(s) = \frac{e^{z(s)}}{\sum_{s' \in N(o)} e^{z(s')}} \quad \text{where} \quad z(s) = \frac{D_{KL}(\pi(o)\,\|\,\pi(s))}{D_{KL}(\pi(s^{*})\,\|\,\pi(s))}\]
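To make the formula concrete, here is a minimal Python sketch of the belief computation for discrete action distributions. It is an illustration under stated assumptions, not the implementation from the paper: the function names `kl_divergence` and `a3b_belief` are hypothetical, the policy outputs \(\pi(o)\), \(\pi(s)\), and \(\pi(s^{*})\) are assumed to be precomputed for each candidate \(s \in N(o)\), and the small `eps` terms are our own addition to avoid division by zero.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """D_KL(p || q) for two discrete distributions of equal length."""
    # eps is an assumed smoothing term so that zeros in q do not cause errors.
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)

def a3b_belief(pi_o, candidates, eps=1e-8):
    """Softmax of A3B scores over the candidate states in N(o).

    pi_o:       action distribution pi(o) at the observed (possibly perturbed) o
    candidates: list of (pi_s, pi_s_star) pairs, one per candidate state s,
                where pi_s_star is the policy at the optimal perturbation of s
                (assumed to be computed elsewhere)
    """
    scores = []
    for pi_s, pi_s_star in candidates:
        numerator = kl_divergence(pi_o, pi_s)
        denominator = kl_divergence(pi_s_star, pi_s)
        scores.append(numerator / (denominator + eps))  # z(s)

    # Subtract the maximum score before exponentiating for numerical stability.
    z_max = max(scores)
    exps = [math.exp(z - z_max) for z in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Toy example with three actions and two candidate states.
pi_o = [0.7, 0.2, 0.1]
candidates = [
    # s_a: its optimal perturbation leads to a policy far from pi(o)
    ([0.6, 0.3, 0.1], [0.1, 0.2, 0.7]),
    # s_b: its optimal perturbation is exactly the observed o
    ([0.2, 0.5, 0.3], [0.7, 0.2, 0.1]),
]
beliefs = a3b_belief(pi_o, candidates)
```

In this toy example the second candidate receives the larger belief, since its optimal perturbation coincides with the observation, which matches the intuition described above.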

Get in touch

If you want to collaborate, have a question, or just want to say hi, you can contact me through the links below.