Authors

Haibo Jin

University of Illinois Urbana-Champaign

Leyang Hu

Brown University

Xinuo Li

University of Michigan Ann Arbor

Peiyan Zhang

Hong Kong University of Science and Technology

Chonghan Chen

Carnegie Mellon University

Jun Zhuang

Boise State University

Haohan Wang*

University of Illinois Urbana-Champaign

Abstract

The rapid evolution of artificial intelligence (AI) through developments in Large Language Models (LLMs) and Vision-Language Models (VLMs) has brought significant advancements across various technological domains. While these models enhance capabilities in natural language processing and vision-language tasks, their growing adoption raises critical concerns regarding security and ethical alignment. This survey provides an extensive review of the emerging field of jailbreaking—deliberately circumventing the ethical and operational boundaries of LLMs and VLMs—and the consequent development of defense mechanisms. Our study categorizes jailbreaks into seven distinct types and elaborates on defense strategies that address these vulnerabilities. Through this comprehensive examination, we identify research gaps and propose directions for future studies to enhance the security frameworks of LLMs and VLMs. Our findings underscore the necessity for a unified perspective that integrates both jailbreak strategies and defensive solutions to foster a robust, secure, and reliable environment for the next generation of language models.

Overview

[Interactive overview diagram: taxonomy of jailbreak strategies and defense mechanisms. Click nodes for details.]

Get Involved

reading-group"
UIUC DREAM Lab

Developing Reliable and Efficient AI for Medicine

Suggest Additional Papers

Help us improve the survey by suggesting additional papers.

reading-group"
Join The Community

Join our community on Slack to discuss ideas, ask questions, and collaborate with new friends.

Trustworthy ML Initiative

The Trustworthy ML Initiative (TrustML) addresses challenges in responsible ML by providing resources, showcasing early career researchers, fostering discussions and building a community.

Examples

An adversarial suffix generated using GUARD. The demo is built on top of the GCG demo.
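For readers unfamiliar with suffix-style attacks, below is a minimal, self-contained sketch of the greedy coordinate-search idea behind GCG-style suffix optimization. It is not GUARD's actual implementation: toy_score is a hypothetical stand-in for the victim model's loss on an affirmative target response, and the vocabulary is a toy character set rather than a real tokenizer's vocabulary.

    # Minimal sketch of a GCG-style greedy coordinate search for an adversarial
    # suffix. NOT GUARD's implementation; `toy_score` is a hypothetical stand-in
    # for the victim model's loss on an affirmative target response (a real
    # attack would query the target LLM's logits instead).
    import random

    VOCAB = list("abcdefghijklmnopqrstuvwxyz !?.")  # toy token vocabulary

    def toy_score(suffix: str) -> float:
        # Toy objective (lower is better): character mismatches against a fixed
        # string, standing in for -log p("Sure, here is ..." | prompt + suffix).
        target = "suresuresure"
        return sum(a != b for a, b in zip(suffix, target))

    def greedy_suffix_search(length: int = 12, iters: int = 300, seed: int = 0) -> str:
        rng = random.Random(seed)
        suffix = [rng.choice(VOCAB) for _ in range(length)]
        best = toy_score("".join(suffix))
        for _ in range(iters):
            pos = rng.randrange(length)    # pick one coordinate (token slot)
            cand = suffix.copy()
            cand[pos] = rng.choice(VOCAB)  # propose a single-token substitution
            s = toy_score("".join(cand))
            if s <= best:                  # keep the change if loss does not worsen
                suffix, best = cand, s
        return "".join(suffix)

    if __name__ == "__main__":
        adv = greedy_suffix_search()
        print("optimized suffix:", repr(adv))
        print("full attack prompt:", "[harmful request] " + adv)

Real attacks replace the random single-token proposals with gradient-guided candidate selection over the model's full token vocabulary, which is what makes GCG-style optimization practical at scale.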

Citation Guide

If you find our work useful, please consider citing us:

@misc{jin2024jailbreakzoosurveylandscapeshorizons,
    title={JailbreakZoo: Survey, Landscapes, and Horizons in Jailbreaking Large Language and Vision-Language Models}, 
    author={Haibo Jin and Leyang Hu and Xinuo Li and Peiyan Zhang and Chonghan Chen and Jun Zhuang and Haohan Wang},
    year={2024},
    eprint={2407.01599},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={https://arxiv.org/abs/2407.01599}, 
}