@ethz-spylab

SPY Lab

Secure and Private AI research at ETH Zürich

SPY Lab (ETH Zurich)

The Secure and Private AI (SPY) Lab conducts research on the security, privacy and trustworthiness of machine learning systems. We often approach these problems from an adversarial perspective, by designing attacks that probe the worst-case performance of a system to ultimately understand and improve its safety.

💡 Learn more about our work and read our publications on our website.

🖥️ Check the code for our projects in this repository.

Popular repositories

  1. agentdojo Public

    A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents.

    Python, 490 stars, 123 forks

  2. rlhf_trojan_competition Public

    Finding trojans in aligned LLMs. Official repository for the competition hosted at SaTML 2024.

    Python, 116 stars, 9 forks

  3. rlhf-poisoning Public

    Code for paper "Universal Jailbreak Backdoors from Poisoned Human Feedback"

    Python, 66 stars, 9 forks

  4. robust-style-mimicry Public

    Python, 49 stars, 2 forks

  5. diffusion_denoised_smoothing Public

    Certified robustness "for free" using off-the-shelf diffusion models and classifiers

    Python, 44 stars, 4 forks

  6. autoadvexbench Public

    Python, 36 stars, 3 forks

Repositories

Showing 10 of 29 repositories
  • agentdojo Public

    A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents.

    Python 490 MIT 123 9 9 Updated Mar 12, 2026
  • jailbreak-tax Public
    Python 24 0 0 0 Updated Feb 17, 2026
  • modal-aphasia Public
    Jupyter Notebook 4 0 0 0 Updated Feb 13, 2026
  • infoseclab_25 Public
    Python 4 0 0 0 Updated Oct 13, 2025
  • hallucinated-citations Public

    Check for probably-hallucinated references in arxiv papers

    Python 3 MIT 0 0 0 Updated Sep 5, 2025
  • RealMath Public
    Python 19 0 0 0 Updated May 22, 2025
  • autoadvexbench Public
    Python 36 3 1 0 Updated May 21, 2025
  • agentdojo-core Public

    Core code for AgentDojo

    Python 0 MIT 0 0 0 Updated May 14, 2025
  • llm_lab Public
    Python 0 0 0 0 Updated Apr 15, 2025
  • Blind-MIA Public

    Official code for "Blind Baselines Beat Membership Inference Attacks for Foundation Models"

    Python 2 0 1 0 Updated Mar 29, 2025
