Top Threats to LLMs: Insights from Nvidia's AI Security Architect

Nvidia’s AI security architect talked about what they’ve learned from red-teaming large language models for a year. They talked about problems like unsafe plugins and indirect prompt injections, and they pushed for strong application security practices.

At Black Hat USA 2024 on Wednesday, the lead AI security architect at Nvidia shared his insights from a year spent red-teaming LLMs. Richard Harang, Nvidia’s main AI and machine learning security architect, led the session, titled “Practical LLM Security: Takeaways From a Year in the Trenches.”

Harang talked about the Nvidia AI Red Team’s research into the most common types of attacks, the most damaging ones, and how to check the security of large language models (LLMs) in your environment during the Black Hat session.

Harang told the Editorial ahead of the session that indirect prompt injections have been one of the hardest attacks to stop. This is a type of attack in which an LLM reads and follows an instruction from a third-party source.

LLMs can produce more accurate and up-to-date data by using tools like the retrieval-augmented generation framework. However, attackers can also use this feature to add their data to an LLM’s database.

“Indirect prompt injection means I stick something into this document database, and then it comes back,” he said. In the future, another user of this LLM will open that document, which will cause that prompt injection to happen.

It is not comparable to jailbreaking, in which I am the one communicating with the device and obtaining the desired response. Now that this third party can add content, they have some control over what I, as the user, see.

Harang said that the second biggest problem he sees with LLM implementation is plugins, which are pieces of third-party code that make a model more useful.

Harang said that because LLMs can’t provide up-to-date information and depend on training data that could be weeks, months, or even years old, an up-to-date weather plugin can help a model get more accurate results for certain queries.

He stated that the issue with plugins is their potential for unsafe construction. There is a chance that attackers could use plugins to gain access to the model itself. “Sometimes, especially in the presence of indirect prompt injection, people have different ways that they can inject input into the system,” he noted.

To deal with and protect against these problems, Harang pushed for “old-fashioned application security.”

This includes setting up security and trust boundaries, making sure users can only access the documents and parts of an LLM they are supposed to, and making sure the right access controls are in place.

With plugins, Harang advised companies to make them secure enough to put online. This means that organizations should keep authentication information separate from the LLM, and plugins separate from information about who they are and what they can do.

“You want the plugin itself to be parameterized to have all of those parameters validated and for it to send back information in a sanitized, validated, parameterized format so that at each step, you are reducing the ability of an attacker to get either their malformed inputs into these plugins or databases or reducing the attacker’s ability to have their inputs then proceed back into another iteration of this LLM loop,” he added.

Nvidia’s Rapid Growth and Exciting Challenges

Nvidia has grown a lot as a tech company over the past few years, especially this year. Previously, Nvidia was mostly known for its GPUs.

However, its work on AI-capable data centre chips has made it much more valuable. When asked what it was like to work as a security architect for Nvidia, a big company that is getting even bigger, he said it was “a lot of fun.”

I’ve been working on projects for a while now that use machine learning and deal with privacy and security issues. A lot of these projects have involved using ML to solve security issues.” “Now that we’re looking into the security of AI applications in more depth, it’s an interesting and exciting time to be here,” Harang said.

In my opinion, Nvidia is a great place to work because we make so many models and build so many AI-powered apps. You have the opportunity to observe ongoing developments and potentially contribute to the improvement of the industry. It’s been very exciting, and it goes by very quickly. All in all, though, it’s been fun. I’ve had a wonderful time.”

InAI Security Architect, attacks, databases, Exciting Challenges, Harang, jailbreaking, LLM, Machine Learning, Nvidia, Nvidia's Rapid Growth, plugins, Takeaways, Threats to LLMs, Trenches

US Targets AI Supremacy with $500 Billion Infrastructure Push

Trump Repeals Biden’s AI Risk Executive Order

OpenAI’s PhD-Level AI Super-Agent: Revolutionizing Work and Economy by 2025

OpenAI’s ‘o3 Mini’ AI Model: Revolutionizing Reasoning, Launching Soon

OpenAI’s Sam Altman Sees Long-Term AGI Revolution Beyond AI Hype

OpenAI Rival Zhipu Faces US Ban Over Military Ties to China

Nvidia to Build Advanced AI Data Center in Israel with Blackwell Chips

Microsoft Unveils Copilot Chat: AI-Powered Tool for Business Efficiency

OpenAI Unveils ‘Tasks’ for ChatGPT to Compete with Siri and Alexa

Top Threats to LLMs: Insights from Nvidia’s AI Security Architect