Posts Tagged ‘gpt-oss:20b’
gpt-oss-20b is not always faithful in its chain-of-thought
In the previous blog post, we showcased a Role in Prompt (RiP) attack in which malicious user input and tool output can cause prompt injection and bypass the alignment safeguards in OpenAI’s newest gpt-oss-20b model. This finding was uncovered as part of the Caesar Creek Software team’s research… » Read More
We participated in a Kaggle competition to red team the OpenAI gpt-oss-20b model. The following series is a detailed look at the thinking behind our submission. Work conducted by Danny L., Huy Chi Dai, Cole L., John H., and Zack B.
Bypassing Instruction Hierarchy and Policies in gpt-oss-20b
Introduction
At the start of August… » Read More
