Evan Schuman
Contributor

Nearly 10% of employee gen AI prompts include sensitive data

News Analysis
10 Feb 2025 | 6 mins
Data and Information Security | Generative AI

Enterprise users are leaking sensitive corporate data through use of unauthorized and authorized generative AI apps at alarming rates. Plugging the leaks is vital to reduce risk exposure.


Gen AI data leaks from employees are an enterprise nightmare in the making.

According to a recent report on gen AI data leakage from Harmonic, 8.5% of employee prompts to popular LLMs included sensitive data, presenting security, compliance, privacy, and legal concerns. 

Harmonic, which analyzed tens of thousands of prompts to ChatGPT, Copilot, Gemini, Claude, and Perplexity during Q4 2024, found that customer data, including billing information and authentication data, accounted for the largest share of leaked data at 46%. Harmonic singled out insurance claims as a prime example: reports rife with customer data that employees frequently paste into gen AI tools to save time in processing.

Employee data, including payroll data and personally identifiable information (PII), accounted for 27% of sensitive prompts, followed by legal and finance data at 15%.

“Security-related information, comprising 6.88% of sensitive prompts, is particularly concerning,” according to the report. “Examples include penetration test results, network configurations, and incident reports. Such data could provide attackers with a blueprint for exploiting vulnerabilities.”

Out from the shadows

Generative AI data leakage is a challenging problem — and a key reason why enterprise gen AI strategies are putting CISOs in a stressful bind.

Enterprise LLM use falls into three broad categories: sanctioned deployments, including licensed and in-house developed implementations; shadow AI, typically comprising free consumer-grade apps forbidden by the enterprise for good reason; and semi-shadow gen AI.

Unauthorized shadow AI is a primary issue for CISOs, but semi-shadow gen AI is a growing problem that may be the hardest to control. Typically initiated by business unit chiefs, it can include paid gen AI apps that have not received IT approval, enlisted for experimentation, expediency, or productivity enhancement. In such cases, the executive is the one engaging in shadow IT; the line-of-business employees are not, because management has told them to use the tools as part of its AI strategy.

Shadow or semi-shadow, free generative AI apps are the most problematic, as their license terms usually allow for training on every query. According to Harmonic’s research, free-tier AI use commands the lion’s share of sensitive data leakage. For example, 54% of sensitive prompts were entered on ChatGPT’s free tier.

But most data specialists also discourage CISOs from putting too much trust in the contractual promises of paid gen AI vendors, most of which prohibit training on user queries in their enterprise versions.

Robert Taylor, an attorney with intellectual property law firm Carstens, Allen & Gourley, gives the example of trade secrets. Various legal protections, especially trade secret protections, can be lost if an employee asks a generative AI system a question that reveals the trade secret, he said. Lawyers protecting IP, he added, often have team members query a wide range of AI apps about a trade secret to see whether prohibited data surfaces; if it does, they know someone leaked it.

If a competitor learns of the leak, it can argue in court that the leak invalidates the trade secret’s legal protections. According to Taylor, the IP owner’s lawyers must then prove the enterprise deployed a wide range of mechanisms to protect the secret. Relying on the provisions of a contract that promises no training on generative AI queries “is not a sufficient level of reasonable effort,” Taylor said.

“It would be a totality-of-circumstances situation,” he said. Enterprises must deploy and strictly enforce “policies that constrain your employees on use of that data.”

Data-conscious practices

CISOs should work with business leaders to ensure employees are trained on ways to get the same results from LLMs without using protected data, said Jeff Pollard, a VP and principal analyst at Forrester. Doing so requires more finesse with prompts, but it protects sensitive information without diluting the effectiveness of the AI’s generated answer.

“You really don’t have to reveal sensitive information in order to get a positive benefit out of the system, but we do have to train users to understand query phrasing” strategies, Pollard said. 
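To make Pollard's point concrete, here is a minimal, hypothetical sketch, in Python, of how a prompt might be scrubbed of obvious identifiers before it reaches a gen AI tool; the handful of regex patterns are made up for illustration and stand in for a real DLP engine or named-entity recognizer. The question keeps its shape, but the raw customer data stays behind.

```python
import re

# Hypothetical redaction patterns for illustration only; a real deployment
# would rely on a DLP engine or entity recognizer, not a few regexes.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "CARD": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def redact(prompt: str) -> str:
    """Replace likely-sensitive substrings with labeled placeholders
    so the prompt keeps its structure but drops the raw data."""
    for label, pattern in PATTERNS.items():
        prompt = pattern.sub(f"[{label}]", prompt)
    return prompt

if __name__ == "__main__":
    raw = ("Summarize this claim: policyholder jane.doe@example.com, "
           "SSN 123-45-6789, disputes a $1,200 charge.")
    print(redact(raw))
    # -> Summarize this claim: policyholder [EMAIL], SSN [SSN], disputes a $1,200 charge.
```

In practice, this kind of filter would more likely sit in a gateway or browser plug-in in front of the sanctioned LLMs than in a script employees run themselves.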

When it comes to employee use of free AI tools rather than locked-down corporate-paid apps, “cracking down on employees is the most obvious thing to do, but the core question is: ‘Why are employees doing it?’” asked Arun Chandrasekaran, a distinguished VP and analyst at Gartner.

“Employees are doing it because IT is not providing them the tools they need,” he argued.

CISOs should point this out to their C-suite counterparts to help ensure that enterprise-wide AI tools are “truly usable,” he said.

Unfortunately, with generative AI, the genie is already out of the bottle, according to Kaz Hassan, senior community and partner marketing manager at software vendor Unily.

“AI use by employees has outrun the ability of IT teams to catch up,” he said. “IT teams know the situation isn’t great but aren’t able to crack the comms, culture, or strategy part of the equation to make an impact.”

Hassan added: “A new blueprint is needed, and organizations need clear AI strategies now to reduce risk, and they need to follow up with AI woven into the employee tech stack imminently.”

Typical monitoring and control apps miss the point of the data leakage, he claimed. 

“Power users are processing sensitive data through unauthorized AI tools not because they can’t be controlled, but because they won’t be slowed down. The old playbook of restrict-and-protect isn’t just failing — it’s actively pushing AI innovation into the shadows,” Hassan said. “CISOs need to face this reality: either lead the AI transformation or watch their security perimeter dissolve.”

Hassan pointed out that the data problem from generative AI runs in two directions: sensitive data leaving via queries, and flawed data, whether from hallucinations or from models trained on incorrect information, coming into the enterprise via generative AI answers that teams rely on for corporate analysis.

“Today’s CISOs shouldn’t just worry about sensitive data getting out,” Hassan said. “They should also be concerned about bad data getting in.”

Evan Schuman

Evan Schuman has covered IT issues for a lot longer than he'll ever admit. The founding editor of retail technology site StorefrontBacktalk, he's been a columnist for CBSNews.com, RetailWeek, Computerworld and eWeek and his byline has appeared in titles ranging from BusinessWeek, VentureBeat and Fortune to The New York Times, USA Today, Reuters, The Philadelphia Inquirer, The Baltimore Sun, The Detroit News and The Atlanta Journal-Constitution. Evan can be reached at eschuman@thecontentfirm.com and he can be followed at http://www.linkedin.com/in/schumanevan/. Look for his blog twice a week.

The opinions expressed in this blog are those of Evan Schuman and do not necessarily represent those of IDG Communications, Inc., its parent, subsidiary or affiliated companies.
