You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
{{ message }}
This repository was archived by the owner on Apr 19, 2026. It is now read-only.
When evaluating whether inputs are adversarial, the framework first checks whether the classification of the input matches the groundtruth label. If it does not, then it uses the detection mechanism to reject/ignore inputs. Only when both conditions are satisfied does the framework consider the input adversarial.
returnFalse, "Adversarial example rejected by detector with score {}.".format(adv_detector)
returnTrue, None
My expectation was that correctly classified inputs also ought to be rejected if they trip the detector, but because L223 returns early this can never happen. This is particularly pronounced in the transform defense where a non-trivial majority of the benign inputs would be rejected by the "stable prediction" detector. Is this intentional? It’s a little weird to force the attacker to defeat some objective that the defender can almost never achieve.
When evaluating whether inputs are adversarial, the framework first checks whether the classification of the input matches the groundtruth label. If it does not, then it uses the detection mechanism to reject/ignore inputs. Only when both conditions are satisfied does the framework consider the input adversarial.
selfstudy-adversarial-robustness/common/framework.py
Lines 217 to 234 in 15d1c01
My expectation was that correctly classified inputs also ought to be rejected if they trip the detector, but because L223 returns early this can never happen. This is particularly pronounced in the transform defense where a non-trivial majority of the benign inputs would be rejected by the "stable prediction" detector. Is this intentional? It’s a little weird to force the attacker to defeat some objective that the defender can almost never achieve.