Is ChatGPT a Data Privacy Nightmare?

If you've ever posted something online, you should be concerned...

March 7, 2023

ChatGPT is the media’s darling, with people on both the left and the right having something to say about it. The technology powering this innovative chat system is fantastic, no doubt about it, but it also brings a set of concerns that go beyond student cheating.

One concern is privacy, as ChatGPT has been fed information from all across the Internet. That means that if you’ve ever posted something online, chances are that ChatGPT knows about it. Heck, if you’re knowledgeable on some subject and have shared that knowledge publicly, the algorithm has likely fed on the information you made available to everyone.

That in itself may not be a problem. The real problem is that it also uses people’s personal data.

300 billion words and counting

OpenAI, the company behind ChatGPT, said that its tool had scraped some 300 billion words from the internet, including books, articles, websites, and posts, as well as personal information obtained without consent. And as we’ve noted, if you’ve ever written something online, ChatGPT is probably using that data without having asked you about it.

This massive data collection is problematic for several reasons, such as:

1. No one asked us for permission to use our data. This would be unthinkable in the “physical world,” yet online it’s a given. Algorithms constantly scrape data without asking anyone. With ChatGPT, however, that data could be handed to anyone who asks about it.

2. OpenAI doesn’t specify what it does with personal information, although doing so is mandatory in many places worldwide. For instance, the European Union has the General Data Protection Regulation (GDPR), which governs how companies collect and process users’ information. Alas, it remains unclear whether ChatGPT complies with GDPR requirements. Among these requirements is the “right to be forgotten,” which we’re not sure works properly with ChatGPT.

3. What about proprietary or copyrighted materials? ChatGPT has regularly quoted passages from books, and we doubt that every author or publisher consented to this. If you’re an author, you may want to test whether ChatGPT can quote paragraphs from your book.

4. OpenAI didn’t pay anyone to use their data. To be fair, neither has Google, though Google points users to the original source, since it’s a search engine, not a chatbot. ChatGPT, in contrast, is meant to keep users engaged on its own page. There are no ads yet, but OpenAI does offer a paid service and has raised billions in venture funding, so it should be able to pay authors in some way.

5. User prompts may include personal information, which could then be provided to other users. For instance, a developer may ask ChatGPT to review proprietary code, or a lawyer may ask it to check a contract. We can only hope that this information won’t be shared with other parties, but can we actually be sure?

And there’s more…

Like any other website, OpenAI also collects the data that users’ browsers leave behind, such as IP addresses, which can reveal your location to the company, at least if you’re NOT using a VPN.

As if all that weren’t enough, OpenAI also states that it may share users’ personal information with third parties, without informing them, to meet its business objectives.

To be fair, OpenAI is not the only one doing this, and since ChatGPT is such a novel technology, it is hard to settle on the “proper” stance about its use. If OpenAI had to ask for permission for every bit of data it analyzes, it would never have been able to launch ChatGPT and its other products.

However, now that ChatGPT is here, all of us have to pause for a moment and think about how to regulate its use. There are obvious copyright concerns as well as privacy ones. From what we can tell, this kind of technology could enable even better (or, rather, worse) user profiling than what Google currently does. There’s also Facebook, but there, I guess, we willingly hand over our personal information.

Nevertheless, we would love to see the authorities step in to rein it in and open a public debate. The technology shouldn’t be stopped, but it must be channeled so that it actually works for people, instead of once again turning them into products. We don’t need ChatGPT for that; we already have Google and Facebook.