
GUARD: Role-playing to Generate Natural-language Jailbreakings to Test Guideline Adherence of Large Language Models



"Large Language Models"The discovery of ‘jailbreaks’ to bypass safety filters of Large Language Models (LLMs) and harmful responses have encouraged the community to implement safety measures.



