{"id":26043,"date":"2025-05-21T15:21:32","date_gmt":"2025-05-21T14:21:32","guid":{"rendered":"https:\/\/www.riskinsight-wavestone.com\/?p=26043"},"modified":"2025-05-21T15:37:10","modified_gmt":"2025-05-21T14:37:10","slug":"leaking-minds-how-your-data-could-slip-through-ai-chatbots","status":"publish","type":"post","link":"https:\/\/www.riskinsight-wavestone.com\/en\/2025\/05\/leaking-minds-how-your-data-could-slip-through-ai-chatbots\/","title":{"rendered":"Leaking Minds: How Your Data Could Slip Through AI Chatbots\u00a0"},"content":{"rendered":"\n<p style=\"text-align: justify;\"><span data-contrast=\"auto\">OpenAI\u2019s flagship ChatGPT was all over the news 18 months ago for accidentally leaking a CEO\u2019s personal information after being asked to repeat a word forever. This is among the many exploits that have been discovered in recent years.\u202f<\/span><span data-ccp-props=\"{&quot;335551550&quot;:6,&quot;335551620&quot;:6}\">\u00a0<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-26024 size-full\" src=\"https:\/\/www.riskinsight-wavestone.com\/wp-content\/uploads\/2025\/05\/Diapositive1-e1747818653646.jpg\" alt=\"Example of the PII Leaking exploit found in ChatGPT in December 2023\" width=\"1280\" height=\"720\" \/><\/p>\n<p style=\"text-align: center;\"><em>Figure 1 : Example of the PII leaking exploit found in ChatGPT in December 2023\u00a0<\/em><\/p>\n<p>\u00a0<\/p>\n<p style=\"text-align: justify;\"><span data-contrast=\"auto\">Scandals like these highlight a deeper truth: the core architecture of Large Language Models (LLMs) such as GPT and Google\u2019s Gemini is inherently prone to data leakage. This leakage can involve Personally Identifiable Information (PII) or confidential company data. 
While the techniques used by attackers will continue to evolve in response to improved defenses from tech giants, the underlying vectors remain unchanged.<\/span><span data-ccp-props=\"{&quot;335551550&quot;:6,&quot;335551620&quot;:6}\">\u00a0<\/span><\/p>\n<p style=\"text-align: justify;\"><span data-contrast=\"auto\">Today, three main vectors exist through which PII or other sensitive data might be exposed to such attacks:\u00a0<\/span><span data-ccp-props=\"{&quot;335551550&quot;:6,&quot;335551620&quot;:6}\">\u00a0<\/span><\/p>\n<ul>\n<li><span data-contrast=\"auto\">The use of publicly available web content in training datasets<\/span><span data-ccp-props=\"{&quot;335551550&quot;:1,&quot;335551620&quot;:1}\">\u00a0<\/span><\/li>\n<li><span data-contrast=\"auto\">The continuous re-training of models using user prompts and conversations<\/span><span data-ccp-props=\"{&quot;335551550&quot;:1,&quot;335551620&quot;:1}\">\u00a0<\/span><\/li>\n<li><span data-contrast=\"auto\">The introduction of persistent memory features in chatbots<\/span>\u00a0<br \/><span data-ccp-props=\"{&quot;335551550&quot;:1,&quot;335551620&quot;:1}\">\u00a0<\/span><\/li>\n<\/ul>\n<h2 style=\"text-align: justify;\"><b><span data-contrast=\"none\">LLM Pre-Training Data Leakage\u202f<\/span><\/b><span data-ccp-props=\"{&quot;335551550&quot;:6,&quot;335551620&quot;:6}\">\u00a0<\/span><\/h2>\n<p style=\"text-align: justify;\">\u00a0<\/p>\n<p style=\"text-align: justify;\"><span data-contrast=\"auto\">Most models available right now are transformer models, specifically GPTs, or Generative Pre-Trained Transformers. The \u201cPre-Trained\u201d in GPT refers to the initial training phase, where the model is exposed to a massive, diverse corpus of data unrelated to its final application. This helps the model learn foundational knowledge such as grammar, vocabulary, and factual information. 
When GPTs were first released, companies were transparent about where this training data came from, but today the largest models rely on datasets so large and diverse that their exact composition is often kept confidential.\u202f<\/span><span data-ccp-props=\"{&quot;335551550&quot;:6,&quot;335551620&quot;:6}\">\u00a0<\/span><\/p>\n<p style=\"text-align: justify;\"><span data-contrast=\"auto\">A major source of the data used in GPT pre-training is online forums such as Reddit (for Google\u2019s models), Stack Overflow, and other social media platforms. This poses a significant risk since these social media forums often contain PII. Although companies claim to filter out PII during training, there have been many instances where LLMs have leaked personal data from their pre-training data corpus to users after some prompt engineering and jailbreaking. This danger will become ever more present as companies race to gather more data through web scraping to train larger and more sophisticated models.\u202f<\/span><span data-ccp-props=\"{&quot;335551550&quot;:6,&quot;335551620&quot;:6}\">\u00a0<\/span><\/p>\n<p style=\"text-align: justify;\"><span data-contrast=\"auto\">Known leaks of this type are mostly uncovered by researchers who develop more and more creative methods to bypass the defenses of chatbots. The example mentioned earlier is one such case. When prompted to repeat a word forever, the chatbot &#8220;forgets&#8221; its task and begins to exhibit a behavior known as memorization. In this state, the chatbot regurgitates data from its training set. 
While this attack has been patched, new prompting techniques that alter the chatbot\u2019s behavior continue to be found.<\/span><\/p>\n<p style=\"text-align: justify;\"><span data-ccp-props=\"{&quot;335551550&quot;:6,&quot;335551620&quot;:6}\">\u00a0<\/span><\/p>\n<h2 style=\"text-align: justify;\"><b><span data-contrast=\"none\">User Input Re-Usage and Re-Training\u202f<\/span><\/b><span data-ccp-props=\"{&quot;335551550&quot;:6,&quot;335551620&quot;:6}\">\u00a0<\/span><\/h2>\n<p style=\"text-align: justify;\">\u00a0<\/p>\n<p style=\"text-align: justify;\"><span data-contrast=\"auto\">User input re-training is the process of continuously improving the LLM by training it on user inputs. This can be done in several ways, the most popular of which is RLHF, or Reinforcement Learning from Human Feedback.\u202f\u202f<\/span><span data-ccp-props=\"{&quot;335551550&quot;:6,&quot;335551620&quot;:6}\">\u00a0<\/span><\/p>\n<p style=\"text-align: center;\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-26026 size-full aligncenter\" src=\"https:\/\/www.riskinsight-wavestone.com\/wp-content\/uploads\/2025\/05\/Diapositive2-e1747818997148.jpg\" alt=\"The feedback buttons used for RLHF in ChatGPT\" width=\"1280\" height=\"720\" \/><em>Figure 3 : The feedback buttons used for RLHF in ChatGPT\u00a0<\/em><\/p>\n<p>\u00a0<\/p>\n<p style=\"text-align: justify;\"><span data-contrast=\"auto\">This method is built on collecting user feedback on the LLM\u2019s output. Many users of LLMs might have seen the \u201cThumbs Up\u201d or \u201cThumbs Down\u201d buttons in ChatGPT or other LLM platforms.\u202f<\/span><span data-ccp-props=\"{&quot;335551550&quot;:6,&quot;335551620&quot;:6}\">\u00a0<\/span><\/p>\n<p style=\"text-align: justify;\"><span data-contrast=\"auto\">These buttons collect feedback from the user and use the feedback to re-train the model. 
If the user marks the response as positive, the platform takes the user input \/ model output pair and encourages the model to replicate the behavior. Similarly, if the user indicates that the model performed poorly, the user input \/ model output pair will be used to discourage the model from replicating the behavior.\u202f<\/span><span data-ccp-props=\"{&quot;335551550&quot;:6,&quot;335551620&quot;:6}\">\u00a0<\/span><\/p>\n<p style=\"text-align: justify;\"><span data-contrast=\"auto\">However, continuous re-training can also occur without any user interaction. Models may occasionally use user input \/ model output pairs to re-train in seemingly random ways. The lack of transparency from model providers and developers makes it difficult to pinpoint exactly how this happens. However, many users across the internet have reported models gaining new knowledge through re-training on other users\u2019 chats as far back as 2022. For example, OpenAI\u2019s GPT-3.5 should not know any information past September 2021, its training cut-off date. Yet, ask it about more recent events, such as Elon Musk\u2019s appointment as CEO of Twitter (now X), and it confidently and accurately answers the question.\u202f\u202f<\/span><span data-ccp-props=\"{&quot;335551550&quot;:6,&quot;335551620&quot;:6}\">\u00a0<\/span><\/p>\n<p style=\"text-align: justify;\"><span data-contrast=\"auto\">For end-users, this means their chats are not kept confidential: any information given to the LLM through internal documents, meeting minutes, or development codebases may show up in other users\u2019 chats, thereby leaking it.\u202fThis poses significant privacy risks not only for individuals but also for companies, many of which have already taken action, like Samsung. 
In April 2023, Samsung banned the use of ChatGPT and similar chatbots after a group of employees used the tool for coding assistance and summarizing meeting notes. Although Samsung had no concrete evidence that the data was used by OpenAI, the potential risk was deemed too high to allow employees to continue using the tool. This is a classic example of Shadow AI, where unauthorized use of AI tools leads to the possible leakage of confidential or proprietary information.<\/span><span data-ccp-props=\"{&quot;335551550&quot;:6,&quot;335551620&quot;:6}\">\u00a0<\/span><\/p>\n<p style=\"text-align: justify;\"><span data-contrast=\"auto\">Many companies globally are waiting for stricter AI and data regulations before adopting LLMs for commercial use. Certain industries, such as consulting, are opening up, but at an incredibly slow pace. Other companies, however, are tightening their control over internal LLM use to avoid leaking confidential data and client information.\u202f<\/span><span data-ccp-props=\"{&quot;335551550&quot;:6,&quot;335551620&quot;:6}\">\u00a0<\/span><\/p>\n<p style=\"text-align: justify;\">\u00a0<\/p>\n<h2 style=\"text-align: justify;\"><b><span data-contrast=\"none\">Memory Persistence<\/span><\/b><span data-ccp-props=\"{&quot;335551550&quot;:6,&quot;335551620&quot;:6}\">\u00a0<\/span><\/h2>\n<p style=\"text-align: justify;\">\u00a0<\/p>\n<p style=\"text-align: justify;\"><span data-contrast=\"auto\">While the two preceding risks have been recognized for a few years, a new threat emerged with a feature introduced in ChatGPT in September 2024. This feature enables the model to retain long-term memory of user conversations. 
The idea is to reduce redundancy by allowing the chatbot to remember user preferences, context, and previous interactions, thereby improving the relevance and personalization of responses.\u00a0<\/span><span data-ccp-props=\"{&quot;335551550&quot;:6,&quot;335551620&quot;:6}\">\u00a0<\/span><\/p>\n<p style=\"text-align: justify;\"><span data-contrast=\"auto\">However, this convenience comes at a significant security cost. Unlike earlier cases, where leaked information was more or less random, persistent memory introduces account-level targeting. Now, attackers could potentially exploit this memory to extract specific details from a particular user\u2019s history, significantly raising the stakes.<\/span><span data-ccp-props=\"{&quot;335551550&quot;:6,&quot;335551620&quot;:6}\">\u00a0<\/span><\/p>\n<p style=\"text-align: justify;\"><span data-contrast=\"auto\">Security researcher Johann Rehberger demonstrated how this vulnerability could be exploited through a technique known as context poisoning. In his proof-of-concept, he crafted a site hosting a malicious image with embedded instructions. Once the targeted chatbot processes the URL, its persistent memory is poisoned. This covert instruction allows the chatbot to be manipulated into extracting sensitive information from the victim\u2019s conversation history and transmitting it to an external URL.<\/span><span data-ccp-props=\"{&quot;335551550&quot;:6,&quot;335551620&quot;:6}\">\u00a0<\/span><\/p>\n<p style=\"text-align: justify;\"><span data-contrast=\"auto\">This attack is particularly dangerous because it combines persistence and stealth. Once it infiltrates the chatbot, it remains active indefinitely, continuously exfiltrating user data until the memory is cleared. 
At the same time, it is subtle enough to go unnoticed, and detecting it requires careful human analysis of the memory.<\/span><span data-ccp-props=\"{&quot;335551550&quot;:6,&quot;335551620&quot;:6}\">\u00a0<\/span><\/p>\n<p style=\"text-align: justify;\"><span data-ccp-props=\"{&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559685&quot;:0}\">\u00a0<\/span><\/p>\n<h2 style=\"text-align: justify;\"><b><span data-contrast=\"none\">LLM Data Privacy and Mitigation\u00a0<\/span><\/b><span data-ccp-props=\"{&quot;335551550&quot;:6,&quot;335551620&quot;:6}\">\u00a0<\/span><\/h2>\n<p>\u00a0<\/p>\n<p style=\"text-align: justify;\"><span data-contrast=\"auto\">LLM developers often intentionally make it hard to disable re-training since it benefits their LLM development. If your personal information is already out in public, it has probably been scraped and used for pre-training an LLM. Additionally, if you gave ChatGPT or another LLM a confidential document in your prompt (without manually turning re-training OFF), it has most probably been used for re-training.\u202f<\/span><span data-ccp-props=\"{&quot;335551550&quot;:6,&quot;335551620&quot;:6}\">\u00a0<\/span><\/p>\n<p style=\"text-align: justify;\"><span data-contrast=\"auto\">Currently, there is no reliable technique that allows an individual to request the deletion of their data once it has been used for model training. Addressing this challenge is the goal of an emerging research area known as Machine Unlearning. This field focuses on developing methods to selectively remove the influence of specific data points from a trained model, effectively deleting that data from the model\u2019s memory. The field is evolving rapidly, particularly in response to GDPR regulations that enforce the right to erasure. 
For this reason, it is important to minimize these risks going forward by controlling what data individuals and organizations put out on the internet and what information employees add to their prompts.\u202f<\/span><span data-ccp-props=\"{&quot;335551550&quot;:6,&quot;335551620&quot;:6}\">\u00a0<\/span><\/p>\n<p style=\"text-align: justify;\"><span data-contrast=\"auto\">It is vital for many business operations to stay confidential. However, the productivity boost that LLMs add to employee workflows cannot be overlooked. For this reason, we have constructed a three-step framework to ensure that organizations can harness the power of LLMs without losing control over their data.\u202f<\/span><span data-ccp-props=\"{&quot;335551550&quot;:6,&quot;335551620&quot;:6}\">\u00a0<\/span><\/p>\n<p>\u00a0<\/p>\n<h3 style=\"text-align: justify;\"><strong>Choose the optimal model, environment, and configuration\u202f\u00a0<\/strong><\/h3>\n<p style=\"text-align: justify;\"><span data-contrast=\"auto\">Ensure that the environment and model you are using are well-secured. Review the model\u2019s data retention period and the provider\u2019s policy on re-training on user conversations. 
Where available, ensure that \u201cAuto-delete\u201d is turned ON and that \u201cChat History\u201d is turned OFF.\u202f\u202f<\/span><span data-ccp-props=\"{&quot;335551550&quot;:6,&quot;335551620&quot;:6}\">\u00a0<\/span><\/p>\n<p style=\"text-align: justify;\"><span data-contrast=\"auto\">At Wavestone, we built a <\/span><a href=\"https:\/\/digiplace.sharepoint.com\/:x:\/s\/WOOHK-HONGKONGOFFICE\/EcyjrooJw_hPlkQBjpuYod4Brkuf8-pVV1uKtb5ejJfQLQ?e=i7KITB\"><span data-contrast=\"none\">tool<\/span><\/a><span data-contrast=\"auto\"> that compares the top three closed-source and open-source models in terms of pricing, data retention period, guardrails, and confidentiality to empower organizations in their AI journey.\u202f<\/span><\/p>\n<p style=\"text-align: justify;\"><span data-ccp-props=\"{&quot;335551550&quot;:6,&quot;335551620&quot;:6}\">\u00a0<\/span><\/p>\n<h3 style=\"text-align: justify;\"><strong>Raise employee awareness of best practices when using LLMs\u202f\u00a0<\/strong><\/h3>\n<p style=\"text-align: justify;\"><span data-contrast=\"auto\">Ensure that your employees understand the dangers of providing confidential and client information to LLMs and know how to avoid including corporate or personal information in an LLM\u2019s pre-training and re-training data corpus.\u202f<\/span><span data-ccp-props=\"{&quot;335551550&quot;:6,&quot;335551620&quot;:6}\">\u00a0<\/span><\/p>\n<p>\u00a0<\/p>\n<h3 style=\"text-align: justify;\"><strong>Implement a robust AI policy\u202f\u202f\u00a0<\/strong><\/h3>\n<p style=\"text-align: justify;\"><span data-contrast=\"auto\">Forward-looking companies should implement a robust internal AI policy that specifies:\u202f<\/span><span data-ccp-props=\"{&quot;335551550&quot;:6,&quot;335551620&quot;:6}\">\u00a0<\/span><\/p>\n<ul style=\"text-align: justify;\">\n<li><span data-contrast=\"auto\">What information can and can\u2019t be shared with LLMs internally\u202f<\/span><span 
data-ccp-props=\"{&quot;335551550&quot;:6,&quot;335551620&quot;:6}\">\u00a0<\/span><\/li>\n<li><span data-contrast=\"auto\">Monitoring of AI behavior\u202f<\/span><span data-ccp-props=\"{&quot;335551550&quot;:6,&quot;335551620&quot;:6}\">\u00a0<\/span><\/li>\n<li><span data-contrast=\"auto\">Limiting the organization\u2019s online presence\u202f<\/span><span data-ccp-props=\"{&quot;335551550&quot;:6,&quot;335551620&quot;:6}\">\u00a0<\/span><\/li>\n<li><span data-contrast=\"auto\">Anonymization of prompt data\u202f<\/span><span data-ccp-props=\"{&quot;335551550&quot;:6,&quot;335551620&quot;:6}\">\u00a0<\/span><\/li>\n<li><span data-contrast=\"auto\">Limiting use to secure AI tools only\u202f<\/span><span data-ccp-props=\"{&quot;335551550&quot;:6,&quot;335551620&quot;:6}\">\u00a0<\/span><\/li>\n<\/ul>\n<p style=\"text-align: justify;\"><span data-contrast=\"auto\">By following these steps, organizations can minimize the digital risk of using the latest GenAI tools while still benefiting from the productivity gains they bring.\u202f<\/span><span data-ccp-props=\"{&quot;335551550&quot;:6,&quot;335551620&quot;:6}\">\u00a0<\/span><\/p>\n<p style=\"text-align: justify;\">\u00a0<\/p>\n<h2 style=\"text-align: justify;\"><b><span data-contrast=\"none\">Moving Forward\u202f<\/span><\/b><span data-ccp-props=\"{&quot;335551550&quot;:6,&quot;335551620&quot;:6}\">\u00a0<\/span><\/h2>\n<p style=\"text-align: justify;\">\u00a0<\/p>\n<p style=\"text-align: justify;\"><span data-contrast=\"auto\">Although the data privacy vulnerabilities mentioned in this article impact individuals like you and me, their root cause is the LLM developers\u2019 greed for data. 
This greed produces higher-quality end products but at the cost of data privacy and autonomy.\u00a0<\/span><span data-ccp-props=\"{&quot;335551550&quot;:6,&quot;335551620&quot;:6}\">\u00a0<\/span><\/p>\n<p style=\"text-align: justify;\"><span data-contrast=\"auto\">New regulations and frameworks have emerged to combat this issue, such as the EU AI Act and the OWASP Top 10 for LLM Applications. However, relying solely on responsible governance is not enough. Individuals and organizations must actively recognize the critical role PII plays in today&#8217;s digital landscape and take proactive steps to protect it. This is especially important as we move toward more agentic AI systems, which autonomously interact with multiple third-party services. Not only will these systems process an increasing amount of personal and sensitive data, but this data will also be transmitted and handled by numerous different services, complicating oversight and control.<\/span><span data-ccp-props=\"{&quot;335551550&quot;:6,&quot;335551620&quot;:6}\">\u00a0<\/span><\/p>\n<p style=\"text-align: justify;\">\u00a0<\/p>\n<h2 style=\"text-align: justify;\"><span class=\"TextRun SCXW172884042 BCX8\" lang=\"EN-US\" xml:lang=\"EN-US\" data-contrast=\"auto\"><span class=\"NormalTextRun SCXW172884042 BCX8\">References and Further Reading\u202f<\/span><\/span><span class=\"EOP SCXW172884042 BCX8\" data-ccp-props=\"{&quot;335551550&quot;:6,&quot;335551620&quot;:6}\">\u00a0<\/span><\/h2>\n<p style=\"text-align: justify;\">\u00a0<\/p>\n<p style=\"text-align: justify;\">[1] D. Goodin, \u201cOpenAI says mysterious chat histories resulted from account takeover,\u201d Ars Technica, https:\/\/arstechnica.com\/security\/2024\/01\/ars-reader-reports-chatgpt-is-sending-him-conversations-from-unrelated-ai-users\/ (accessed Jul. 13, 2024).\u00a0<\/p>\n<p style=\"text-align: justify;\">[2] M. Nasr et al., \u201cExtracting Training Data from ChatGPT,\u201d not-just-memorization, Nov. 28, 2023. 
Available: <a href=\"https:\/\/not-just-memorization.github.io\/extracting-training-data-from-chatgpt.html\">https:\/\/not-just-memorization.github.io\/extracting-training-data-from-chatgpt.html<\/a>\u00a0<\/p>\n<p style=\"text-align: justify;\">[3] \u201cWhat Is Confidential Computing? Defined and Explained,\u201d Fortinet. Available: <a href=\"https:\/\/www.fortinet.com\/resources\/cyberglossary\/confidential-computing#:~:text=Confidential%20computing%20refers%20to%20cloud\">https:\/\/www.fortinet.com\/resources\/cyberglossary\/confidential-computing#:~:text=Confidential%20computing%20refers%20to%20cloud<\/a>\u00a0<\/p>\n<p style=\"text-align: justify;\">[4] S. Wilson, \u201cOWASP Top 10 for Large Language Model Applications | OWASP Foundation,\u201d owasp.org, Oct. 18, 2023. Available: <a href=\"https:\/\/owasp.org\/www-project-top-10-for-large-language-model-applications\/\">https:\/\/owasp.org\/www-project-top-10-for-large-language-model-applications\/<\/a>\u00a0<\/p>\n<p style=\"text-align: justify;\">[5] \u201cExplaining the Einstein Trust Layer,\u201d Salesforce. Available: https:\/\/www.salesforce.com\/news\/stories\/video\/explaining-the-einstein-gpt-trust-layer\/\u00a0<\/p>\n<p style=\"text-align: justify;\">[6] \u201cHacker plants false memories in ChatGPT to steal user data in perpetuity,\u201d Ars Technica, Sept. 24, 
2024 Available: <a href=\"https:\/\/arstechnica.com\/security\/2024\/09\/false-memories-planted-in-chatgpt-give-hacker-persistent-exfiltration-channel\/\">https:\/\/arstechnica.com\/security\/2024\/09\/false-memories-planted-in-chatgpt-give-hacker-persistent-exfiltration-channel\/<\/a><\/p>\n<p style=\"text-align: justify;\">[7] \u201cWhy we\u2019re teaching LLMs to forget things\u201d IBM, 07 Oct 2024 Available: https:\/\/research.ibm.com\/blog\/llm-unlearning<\/p>\n<p style=\"text-align: justify;\">\u00a0<\/p>\n\n\n","protected":false},"excerpt":{"rendered":"<p>OpenAI\u2019s flagship ChatGPT was over the news 18 months ago for accidentally leaking a CEO\u2019s personal information after being asked to repeat a word forever. This is among the many\u00a0 exploits that have been discovered in recent months.\u202f\u00a0 Figure 1&#8230;<\/p>\n","protected":false},"author":1516,"featured_media":26075,"comment_status":"open","ping_status":"closed","sticky":true,"template":"page-templates\/tmpl-one.php","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[3266,3977],"tags":[4083,4655,2772,2817,4642,4297,3128,2796,2878],"coauthors":[4503,4656],"class_list":["post-26043","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-cloud-next-gen-it-security-en","category-focus","tag-ai","tag-chatbots","tag-cybersecurity","tag-data-protection","tag-genai-2","tag-llm-2","tag-machine-learning-en","tag-risk","tag-vulnerabilities"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.0 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Leaking Minds: How Your Data Could Slip Through AI Chatbots\u00a0 - RiskInsight<\/title>\n<meta name=\"description\" content=\"Scandals like these highlight a deeper truth: the core architecture of Large Language Models (LLMs) such as GPT and Google\u2019s Gemini is inherently prone to data leakage. 
This leakage can involve Personally Identifiable Information (PII) or confidential company data. The techniques used by attackers will continue to evolve in response to improved defenses from tech giants, the underlying vectors remain unchanged.\u00a0Today, three main vectors exist through which PIIs (Personally Identifiable Information) or sensitive data might be exposed to such attacks:\u00a0\u00a0The use of publicly available web content in training datasets\u00a0The continuous re-training of models using user prompts and conversations\u00a0The introduction of persistent memory features in chatbots\u00a0\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.riskinsight-wavestone.com\/en\/2025\/05\/leaking-minds-how-your-data-could-slip-through-ai-chatbots\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Leaking Minds: How Your Data Could Slip Through AI Chatbots\u00a0 - RiskInsight\" \/>\n<meta property=\"og:description\" content=\"Scandals like these highlight a deeper truth: the core architecture of Large Language Models (LLMs) such as GPT and Google\u2019s Gemini is inherently prone to data leakage. This leakage can involve Personally Identifiable Information (PII) or confidential company data. 
The techniques used by attackers will continue to evolve in response to improved defenses from tech giants, the underlying vectors remain unchanged.\u00a0Today, three main vectors exist through which PIIs (Personally Identifiable Information) or sensitive data might be exposed to such attacks:\u00a0\u00a0The use of publicly available web content in training datasets\u00a0The continuous re-training of models using user prompts and conversations\u00a0The introduction of persistent memory features in chatbots\u00a0\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.riskinsight-wavestone.com\/en\/2025\/05\/leaking-minds-how-your-data-could-slip-through-ai-chatbots\/\" \/>\n<meta property=\"og:site_name\" content=\"RiskInsight\" \/>\n<meta property=\"article:published_time\" content=\"2025-05-21T14:21:32+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-05-21T14:37:10+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.riskinsight-wavestone.com\/wp-content\/uploads\/2025\/05\/767-scaled.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"2560\" \/>\n\t<meta property=\"og:image:height\" content=\"1920\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Jeanne PIGASSOU, Rayan BEN TALEB\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Jeanne PIGASSOU, Rayan BEN TALEB\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"10 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/www.riskinsight-wavestone.com\/en\/2025\/05\/leaking-minds-how-your-data-could-slip-through-ai-chatbots\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.riskinsight-wavestone.com\/en\/2025\/05\/leaking-minds-how-your-data-could-slip-through-ai-chatbots\/\"},\"author\":{\"name\":\"Jeanne PIGASSOU\",\"@id\":\"https:\/\/www.riskinsight-wavestone.com\/en\/#\/schema\/person\/452b07e38a831a2a62bc945ae0972d8b\"},\"headline\":\"Leaking Minds: How Your Data Could Slip Through AI Chatbots\u00a0\",\"datePublished\":\"2025-05-21T14:21:32+00:00\",\"dateModified\":\"2025-05-21T14:37:10+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.riskinsight-wavestone.com\/en\/2025\/05\/leaking-minds-how-your-data-could-slip-through-ai-chatbots\/\"},\"wordCount\":1863,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/www.riskinsight-wavestone.com\/en\/#organization\"},\"image\":{\"@id\":\"https:\/\/www.riskinsight-wavestone.com\/en\/2025\/05\/leaking-minds-how-your-data-could-slip-through-ai-chatbots\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.riskinsight-wavestone.com\/wp-content\/uploads\/2025\/05\/767-scaled.jpg\",\"keywords\":[\"AI\",\"Chatbots\",\"cybersecurity\",\"data protection\",\"genai\",\"LLM\",\"Machine learning\",\"risk\",\"Vulnerabilities\"],\"articleSection\":[\"Cloud &amp; Next-Gen IT 
Security\",\"Focus\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/www.riskinsight-wavestone.com\/en\/2025\/05\/leaking-minds-how-your-data-could-slip-through-ai-chatbots\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.riskinsight-wavestone.com\/en\/2025\/05\/leaking-minds-how-your-data-could-slip-through-ai-chatbots\/\",\"url\":\"https:\/\/www.riskinsight-wavestone.com\/en\/2025\/05\/leaking-minds-how-your-data-could-slip-through-ai-chatbots\/\",\"name\":\"Leaking Minds: How Your Data Could Slip Through AI Chatbots\u00a0 - RiskInsight\",\"isPartOf\":{\"@id\":\"https:\/\/www.riskinsight-wavestone.com\/en\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.riskinsight-wavestone.com\/en\/2025\/05\/leaking-minds-how-your-data-could-slip-through-ai-chatbots\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.riskinsight-wavestone.com\/en\/2025\/05\/leaking-minds-how-your-data-could-slip-through-ai-chatbots\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.riskinsight-wavestone.com\/wp-content\/uploads\/2025\/05\/767-scaled.jpg\",\"datePublished\":\"2025-05-21T14:21:32+00:00\",\"dateModified\":\"2025-05-21T14:37:10+00:00\",\"description\":\"Scandals like these highlight a deeper truth: the core architecture of Large Language Models (LLMs) such as GPT and Google\u2019s Gemini is inherently prone to data leakage. This leakage can involve Personally Identifiable Information (PII) or confidential company data. 
The techniques used by attackers will continue to evolve in response to improved defenses from tech giants, the underlying vectors remain unchanged.\u00a0Today, three main vectors exist through which PIIs (Personally Identifiable Information) or sensitive data might be exposed to such attacks:\u00a0\u00a0The use of publicly available web content in training datasets\u00a0The continuous re-training of models using user prompts and conversations\u00a0The introduction of persistent memory features in chatbots\u00a0\",\"breadcrumb\":{\"@id\":\"https:\/\/www.riskinsight-wavestone.com\/en\/2025\/05\/leaking-minds-how-your-data-could-slip-through-ai-chatbots\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.riskinsight-wavestone.com\/en\/2025\/05\/leaking-minds-how-your-data-could-slip-through-ai-chatbots\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.riskinsight-wavestone.com\/en\/2025\/05\/leaking-minds-how-your-data-could-slip-through-ai-chatbots\/#primaryimage\",\"url\":\"https:\/\/www.riskinsight-wavestone.com\/wp-content\/uploads\/2025\/05\/767-scaled.jpg\",\"contentUrl\":\"https:\/\/www.riskinsight-wavestone.com\/wp-content\/uploads\/2025\/05\/767-scaled.jpg\",\"width\":2560,\"height\":1920,\"caption\":\"Artificial Intelligence, isometric ai robot on mobile phone screen, chatbot app vector neon dark\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.riskinsight-wavestone.com\/en\/2025\/05\/leaking-minds-how-your-data-could-slip-through-ai-chatbots\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Accueil\",\"item\":\"https:\/\/www.riskinsight-wavestone.com\/en\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Leaking Minds: How Your Data Could Slip Through AI 
Chatbots\u00a0\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.riskinsight-wavestone.com\/en\/#website\",\"url\":\"https:\/\/www.riskinsight-wavestone.com\/en\/\",\"name\":\"RiskInsight\",\"description\":\"The cybersecurity &amp; digital trust blog by Wavestone&#039;s consultants\",\"publisher\":{\"@id\":\"https:\/\/www.riskinsight-wavestone.com\/en\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.riskinsight-wavestone.com\/en\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.riskinsight-wavestone.com\/en\/#organization\",\"name\":\"Wavestone\",\"url\":\"https:\/\/www.riskinsight-wavestone.com\/en\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.riskinsight-wavestone.com\/en\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.riskinsight-wavestone.com\/wp-content\/uploads\/2021\/08\/Monogramme\u2013W\u2013NEGA-RGB-50x50-1.png\",\"contentUrl\":\"https:\/\/www.riskinsight-wavestone.com\/wp-content\/uploads\/2021\/08\/Monogramme\u2013W\u2013NEGA-RGB-50x50-1.png\",\"width\":50,\"height\":50,\"caption\":\"Wavestone\"},\"image\":{\"@id\":\"https:\/\/www.riskinsight-wavestone.com\/en\/#\/schema\/logo\/image\/\"}},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.riskinsight-wavestone.com\/en\/#\/schema\/person\/452b07e38a831a2a62bc945ae0972d8b\",\"name\":\"Jeanne PIGASSOU\",\"url\":\"https:\/\/www.riskinsight-wavestone.com\/en\/author\/jeanne-pigassou\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Leaking Minds: How Your Data Could Slip Through AI Chatbots\u00a0 - RiskInsight","description":"Scandals like these highlight a deeper truth: the core architecture of Large Language Models (LLMs) such as GPT and Google\u2019s Gemini is inherently prone to data leakage. This leakage can involve Personally Identifiable Information (PII) or confidential company data. The techniques used by attackers will continue to evolve in response to improved defenses from tech giants, the underlying vectors remain unchanged.\u00a0Today, three main vectors exist through which PIIs (Personally Identifiable Information) or sensitive data might be exposed to such attacks:\u00a0\u00a0The use of publicly available web content in training datasets\u00a0The continuous re-training of models using user prompts and conversations\u00a0The introduction of persistent memory features in chatbots\u00a0","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.riskinsight-wavestone.com\/en\/2025\/05\/leaking-minds-how-your-data-could-slip-through-ai-chatbots\/","og_locale":"en_US","og_type":"article","og_title":"Leaking Minds: How Your Data Could Slip Through AI Chatbots\u00a0 - RiskInsight","og_description":"Scandals like these highlight a deeper truth: the core architecture of Large Language Models (LLMs) such as GPT and Google\u2019s Gemini is inherently prone to data leakage. This leakage can involve Personally Identifiable Information (PII) or confidential company data. 
The techniques used by attackers will continue to evolve in response to improved defenses from tech giants, the underlying vectors remain unchanged.\u00a0Today, three main vectors exist through which PIIs (Personally Identifiable Information) or sensitive data might be exposed to such attacks:\u00a0\u00a0The use of publicly available web content in training datasets\u00a0The continuous re-training of models using user prompts and conversations\u00a0The introduction of persistent memory features in chatbots\u00a0","og_url":"https:\/\/www.riskinsight-wavestone.com\/en\/2025\/05\/leaking-minds-how-your-data-could-slip-through-ai-chatbots\/","og_site_name":"RiskInsight","article_published_time":"2025-05-21T14:21:32+00:00","article_modified_time":"2025-05-21T14:37:10+00:00","og_image":[{"width":2560,"height":1920,"url":"https:\/\/www.riskinsight-wavestone.com\/wp-content\/uploads\/2025\/05\/767-scaled.jpg","type":"image\/jpeg"}],"author":"Jeanne PIGASSOU, Rayan BEN TALEB","twitter_misc":{"Written by":"Jeanne PIGASSOU, Rayan BEN TALEB","Est. 
reading time":"10 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.riskinsight-wavestone.com\/en\/2025\/05\/leaking-minds-how-your-data-could-slip-through-ai-chatbots\/#article","isPartOf":{"@id":"https:\/\/www.riskinsight-wavestone.com\/en\/2025\/05\/leaking-minds-how-your-data-could-slip-through-ai-chatbots\/"},"author":{"name":"Jeanne PIGASSOU","@id":"https:\/\/www.riskinsight-wavestone.com\/en\/#\/schema\/person\/452b07e38a831a2a62bc945ae0972d8b"},"headline":"Leaking Minds: How Your Data Could Slip Through AI Chatbots\u00a0","datePublished":"2025-05-21T14:21:32+00:00","dateModified":"2025-05-21T14:37:10+00:00","mainEntityOfPage":{"@id":"https:\/\/www.riskinsight-wavestone.com\/en\/2025\/05\/leaking-minds-how-your-data-could-slip-through-ai-chatbots\/"},"wordCount":1863,"commentCount":0,"publisher":{"@id":"https:\/\/www.riskinsight-wavestone.com\/en\/#organization"},"image":{"@id":"https:\/\/www.riskinsight-wavestone.com\/en\/2025\/05\/leaking-minds-how-your-data-could-slip-through-ai-chatbots\/#primaryimage"},"thumbnailUrl":"https:\/\/www.riskinsight-wavestone.com\/wp-content\/uploads\/2025\/05\/767-scaled.jpg","keywords":["AI","Chatbots","cybersecurity","data protection","genai","LLM","Machine learning","risk","Vulnerabilities"],"articleSection":["Cloud &amp; Next-Gen IT Security","Focus"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.riskinsight-wavestone.com\/en\/2025\/05\/leaking-minds-how-your-data-could-slip-through-ai-chatbots\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.riskinsight-wavestone.com\/en\/2025\/05\/leaking-minds-how-your-data-could-slip-through-ai-chatbots\/","url":"https:\/\/www.riskinsight-wavestone.com\/en\/2025\/05\/leaking-minds-how-your-data-could-slip-through-ai-chatbots\/","name":"Leaking Minds: How Your Data Could Slip Through AI Chatbots\u00a0 - 
RiskInsight","isPartOf":{"@id":"https:\/\/www.riskinsight-wavestone.com\/en\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.riskinsight-wavestone.com\/en\/2025\/05\/leaking-minds-how-your-data-could-slip-through-ai-chatbots\/#primaryimage"},"image":{"@id":"https:\/\/www.riskinsight-wavestone.com\/en\/2025\/05\/leaking-minds-how-your-data-could-slip-through-ai-chatbots\/#primaryimage"},"thumbnailUrl":"https:\/\/www.riskinsight-wavestone.com\/wp-content\/uploads\/2025\/05\/767-scaled.jpg","datePublished":"2025-05-21T14:21:32+00:00","dateModified":"2025-05-21T14:37:10+00:00","description":"Scandals like these highlight a deeper truth: the core architecture of Large Language Models (LLMs) such as GPT and Google\u2019s Gemini is inherently prone to data leakage. This leakage can involve Personally Identifiable Information (PII) or confidential company data. The techniques used by attackers will continue to evolve in response to improved defenses from tech giants, the underlying vectors remain unchanged.\u00a0Today, three main vectors exist through which PIIs (Personally Identifiable Information) or sensitive data might be exposed to such attacks:\u00a0\u00a0The use of publicly available web content in training datasets\u00a0The continuous re-training of models using user prompts and conversations\u00a0The introduction of persistent memory features in 
chatbots\u00a0","breadcrumb":{"@id":"https:\/\/www.riskinsight-wavestone.com\/en\/2025\/05\/leaking-minds-how-your-data-could-slip-through-ai-chatbots\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.riskinsight-wavestone.com\/en\/2025\/05\/leaking-minds-how-your-data-could-slip-through-ai-chatbots\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.riskinsight-wavestone.com\/en\/2025\/05\/leaking-minds-how-your-data-could-slip-through-ai-chatbots\/#primaryimage","url":"https:\/\/www.riskinsight-wavestone.com\/wp-content\/uploads\/2025\/05\/767-scaled.jpg","contentUrl":"https:\/\/www.riskinsight-wavestone.com\/wp-content\/uploads\/2025\/05\/767-scaled.jpg","width":2560,"height":1920,"caption":"Artificial Intelligence, isometric ai robot on mobile phone screen, chatbot app vector neon dark"},{"@type":"BreadcrumbList","@id":"https:\/\/www.riskinsight-wavestone.com\/en\/2025\/05\/leaking-minds-how-your-data-could-slip-through-ai-chatbots\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Accueil","item":"https:\/\/www.riskinsight-wavestone.com\/en\/"},{"@type":"ListItem","position":2,"name":"Leaking Minds: How Your Data Could Slip Through AI Chatbots\u00a0"}]},{"@type":"WebSite","@id":"https:\/\/www.riskinsight-wavestone.com\/en\/#website","url":"https:\/\/www.riskinsight-wavestone.com\/en\/","name":"RiskInsight","description":"The cybersecurity &amp; digital trust blog by Wavestone&#039;s 
consultants","publisher":{"@id":"https:\/\/www.riskinsight-wavestone.com\/en\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.riskinsight-wavestone.com\/en\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.riskinsight-wavestone.com\/en\/#organization","name":"Wavestone","url":"https:\/\/www.riskinsight-wavestone.com\/en\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.riskinsight-wavestone.com\/en\/#\/schema\/logo\/image\/","url":"https:\/\/www.riskinsight-wavestone.com\/wp-content\/uploads\/2021\/08\/Monogramme\u2013W\u2013NEGA-RGB-50x50-1.png","contentUrl":"https:\/\/www.riskinsight-wavestone.com\/wp-content\/uploads\/2021\/08\/Monogramme\u2013W\u2013NEGA-RGB-50x50-1.png","width":50,"height":50,"caption":"Wavestone"},"image":{"@id":"https:\/\/www.riskinsight-wavestone.com\/en\/#\/schema\/logo\/image\/"}},{"@type":"Person","@id":"https:\/\/www.riskinsight-wavestone.com\/en\/#\/schema\/person\/452b07e38a831a2a62bc945ae0972d8b","name":"Jeanne 
PIGASSOU","url":"https:\/\/www.riskinsight-wavestone.com\/en\/author\/jeanne-pigassou\/"}]}},"_links":{"self":[{"href":"https:\/\/www.riskinsight-wavestone.com\/en\/wp-json\/wp\/v2\/posts\/26043","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.riskinsight-wavestone.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.riskinsight-wavestone.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.riskinsight-wavestone.com\/en\/wp-json\/wp\/v2\/users\/1516"}],"replies":[{"embeddable":true,"href":"https:\/\/www.riskinsight-wavestone.com\/en\/wp-json\/wp\/v2\/comments?post=26043"}],"version-history":[{"count":7,"href":"https:\/\/www.riskinsight-wavestone.com\/en\/wp-json\/wp\/v2\/posts\/26043\/revisions"}],"predecessor-version":[{"id":26077,"href":"https:\/\/www.riskinsight-wavestone.com\/en\/wp-json\/wp\/v2\/posts\/26043\/revisions\/26077"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.riskinsight-wavestone.com\/en\/wp-json\/wp\/v2\/media\/26075"}],"wp:attachment":[{"href":"https:\/\/www.riskinsight-wavestone.com\/en\/wp-json\/wp\/v2\/media?parent=26043"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.riskinsight-wavestone.com\/en\/wp-json\/wp\/v2\/categories?post=26043"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.riskinsight-wavestone.com\/en\/wp-json\/wp\/v2\/tags?post=26043"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.riskinsight-wavestone.com\/en\/wp-json\/wp\/v2\/coauthors?post=26043"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}