Tech Giants Use Red Team Hackers to Protect AI Models
Red team hackers are at the forefront of ensuring the security of AI models, but the trade-off between safety and usefulness poses an ongoing challenge.
Tech giants like Google, Nvidia, and Facebook have turned to red team hackers to ensure the safety and security of their AI models. Red teamers, a group of external experts, think like adversaries and aim to break the AI systems to uncover potential blind spots and risks. With the rapid development and deployment of generative AI tools, the role of in-house AI red teams has become crucial in ensuring the models are safe for public use.
Controversies surrounding AI models, such as OpenAI’s GPT-3.5 and GPT-4, have emphasized the need for red teaming. These models were found to generate harmful, biased, and incorrect responses, including promoting stereotypes against Africans and Muslims. The red teamers identified such issues by injecting prompts and commands that led to problematic outputs, prompting the companies to address and rectify these flaws.
While ensuring safety, red teamers face the challenge of balancing the usefulness and relevance of AI models. A model that is overly secure but offers limited functionality becomes useless. Cristian Canton, head of Facebook’s AI red team, emphasizes this trade-off, stating that there is a fine line between safety and usefulness.
“You will have a model that says no to everything and it’s super safe but it’s useless,” said Cristian Canton, head of Facebook’s AI red team. “There’s a trade off. The more useful you can make a model, the more chances that you can venture in some area that may end up producing an unsafe answer.”
To safeguard AI systems, red teams employ various tactics, including extracting training data that reveals personally identifiable information and poisoning datasets to test the models’ vulnerability to different attacks. However, securing AI models differs from traditional security practices due to their vast training data and complexity.
The field of red teaming is relatively young, and professionals skilled in gaming AI systems are scarce. Therefore, a close-knit community of red teamers share their findings and collaborate on tackling the challenges. Google’s red team has published research on novel ways to attack AI models, while Microsoft’s red team has open-sourced tools like Counterfit, helping other businesses evaluate the safety and security risks of algorithms.
“We were developing these janky scripts that we were using to accelerate our own red teaming,” said Ram Shankar Siva Kumar, who started the team five years ago. “We wanted to make this available to all security professionals in a framework that they know and that they understand.”
As the demand for AI continues to grow, the collaboration between tech giants and red team hackers becomes crucial in building trustworthy and safe AI models. By uncovering vulnerabilities and blind spots, red teamers play a pivotal role in ensuring AI technology benefits the masses without compromising safety and security.