Google’s Gemini large language model (LLM) is susceptible to security threats that could cause it to divulge system prompts, generate harmful content, and carry out indirect injection attacks.
The findings come from HiddenLayer, which said the issues impact consumers using Gemini Advanced with Google Workspace as well as companies using the LLM API.
The first vulnerability involves getting around security guardrails to leak the system prompts (or a system message), which are designed to set conversation-wide instructions to the LLM to help it generate more useful responses, by asking the model to output its “foundational instructions” in a markdown block.
A second class of vulnerabilities relates to using “crafty jailbreaking” techniques to make the Gemini models generate misinformation surrounding topics like elections as well as output potentially illegal and dangerous information (e.g., hot-wiring a car) using a prompt that asks it to enter into a fictional state.
Also identified by HiddenLayer is a third shortcoming that could cause the LLM to leak information in the system prompt by passing repeated uncommon tokens as input.
“By creating a line of nonsensical tokens, we can fool the LLM into believing it is time for it to respond and cause it to output a confirmation message, usually including the information in the prompt.”
“To help protect our users from vulnerabilities, we consistently run red-teaming exercises and train our models to defend against adversarial behaviors like prompt injection, jailbreaking, and more complex attacks,” a Google spokesperson told The Hacker News. “We’ve also built safeguards to prevent harmful or misleading responses, which we are continuously improving.”
The company also said it’s restricting responses to election-based queries out of an abundance of caution. The policy is expected to be enforced against prompts regarding candidates, political parties, election results, voting information, and notable office holders.
Source: https://thehackernews.com/