Publications
You can also find my articles on my Google Scholar profile.
2025
Breaking Agents: Compromising Autonomous LLM Agents Through Malfunction Amplification
Boyang Zhang, Yicong Tan, Yun Shen, Ahmed Salem, Michael Backes, Savvas Zannettou, Yang Zhang; EMNLP 2025 [pdf] [arxiv]
Defeating Cerberus: Concept-Guided Privacy-Leakage Mitigation in Multimodal Language Models
Boyang Zhang, Istemi Ekin Akkus, Ruichuan Chen, Alice Dethise, Klaus Satzke, Ivica Rimac, Yang Zhang; arXiv [pdf] [arxiv]
2024
The Death and Life of Great Prompts: Analyzing the Evolution of LLM Prompts from the Structural Perspective
Yihan Ma, Xinyue Shen, Yixin Wu, Boyang Zhang, Michael Backes, Yang Zhang; EMNLP 2024 [pdf]
SecurityNet: Assessing Machine Learning Vulnerabilities on Public Models
Boyang Zhang, Zheng Li, Ziqing Yang, Xinlei He, Michael Backes, Mario Fritz, Yang Zhang; USENIX Security 2024 [pdf] [arxiv] [code]
Comprehensive Assessment of Toxicity in ChatGPT
Boyang Zhang, Xinyue Shen, Wai Man Si, Zeyang Sha, Zeyuan Chen, Ahmed Salem, Yun Shen, Michael Backes, Yang Zhang; arXiv [pdf] [arxiv]
2023
Boyang Zhang, Xinlei He, Yun Shen, Tianhao Wang, Yang Zhang; USENIX Security 2023 [pdf] [arxiv] [code]