close
close
Hundreds of LLM servers expose corporate, health and other online data

Hundreds of open source servers for building large language models (LLMs) and dozens of vector databases expose highly sensitive information to the open Internet.

As companies rush to integrate AI into their business operations, they sometimes do not pay enough attention to how to protect these tools and the information they entrust to them. In a new report, Legit security researcher Naphtali Deutsch has proven this by scanning the internet for two types of potentially vulnerable open source (OSS) AI services: vector databases – which store data for AI tools – and LLM application creators – in particular the open source program Flowise. The investigation brought a variety of sensitive personal and company dataunknowingly exposed by organizations that fail to participate in the generative AI revolution.

“Many programmers see these tools on the Internet and then try to set them up in their environment,” says Deutsch. But these same programmers ignore security aspects.

Hundreds of unpatched Flowise servers

Flowise is a low-code tool for building all kinds of LLM applications. It is backed by Y Combinator and has tens of thousands of stars on GitHub.

Whether it’s a customer support bot or a tool for generating and extracting data for downstream programming and other tasks, the programs developers build with Flowise typically access and manage large amounts of data. So it’s no surprise that most Flowise servers are password protected.

However, a password is not sufficient security. Earlier this year, a researcher in India discovered an authentication bypass vulnerability in Flowise versions 1.6.2 and earlier, which can be triggered by simply capitalizing some characters in the program’s API endpoints. The issue was tracked as CVE-2024-31621 and received a “high” score of 7.6 on the CVSS version 3 scale.

By exploiting CVE-2024-31621, Legit’s Deutsch cracked 438 Flowise servers. They contained GitHub access tokens, OpenAI API KeyFlowise passwords and API keys in plain text, configurations and prompts related to Flowise apps, and more.

“With a GitHub API token, you can access private repositories,” Deutsch points out, and that’s just one example of the kind of follow-on attacks that such data can enable. “We also found API keys to other vector databases, like Pinecone, a very popular SaaS platform. You could use these to break into a database and dump any data you find – potentially private and confidential data.”

Dozens of unprotected vector databases

Vector databases store virtually any type of data an AI app needs to retrieve, and those that are accessible over the internet are directly vulnerable to attack.

Using scanning tools, Deutsch discovered around 30 vector database servers on the internet that had no authentication checks whatsoever and contained obviously sensitive information: private email conversations from an engineering services provider, documents from a fashion company, personal customer data and financial information from an industrial equipment manufacturer, and more. Other databases contained real estate data, product documentation and data sheets, and patient information used by a medical chatbot.

Leaky vector databases are even more dangerous than leaky LLM builders because they can be manipulated in ways that go unnoticed by users of the AI ​​tools that rely on them. For example, instead of simply stealing information from an exposed vector database, a hacker can delete or corrupt its data to manipulate the results. One could also inject malware into a vector database so that when an LLM program queries it, it picks up the malware.

To reduce the risk of exposed AI tools, Deutsch recommends that organizations restrict access to the AI ​​services they use, monitor and log activities associated with those services, protect sensitive data transmitted by LLM apps, and always apply software updates whenever possible.

“(These tools) are new and people don’t know as much about how to set them up,” he warns. “And it’s also getting easier — with a lot of these vector databases, it’s two clicks to set them up in your Docker or in your AWS Azure environment.” Security is more cumbersome and can lag behind.

By Bronte

Leave a Reply

Your email address will not be published. Required fields are marked *