
The AI dooooooom thread

Started by Hamilcar, April 06, 2023, 12:44:43 PM


DGuller

Looks like it was indeed the prompt abuse that made ChatGPT "plagiarize" NYT:  https://arstechnica.com/tech-policy/2024/02/openai-accuses-nyt-of-hacking-chatgpt-to-set-up-copyright-suit.

I'm not surprised at all, the original NYT claim never passed the smell test.  LLMs don't work like that unless you hack them in a way that would only be done to manufacture a copyright trolling lawsuit.

garbon

Quote from: DGuller on February 28, 2024, 08:03:30 AM
Looks like it was indeed the prompt abuse that made ChatGPT "plagiarize" NYT:  https://arstechnica.com/tech-policy/2024/02/openai-accuses-nyt-of-hacking-chatgpt-to-set-up-copyright-suit.

I'm not surprised at all, the original NYT claim never passed the smell test.  LLMs don't work like that unless you hack them in a way that would only be done to manufacture a copyright trolling lawsuit.

Interesting. What I saw in that article was OpenAI doing a few unethical things that they are now 'bugfixing' as a result of the lawsuit.
"I've never been quite sure what the point of a eunuch is, if truth be told. It seems to me they're only men with the useful bits cut off."

I drank because I wanted to drown my sorrows, but now the damned things have learned to swim.

DGuller

Quote from: garbon on February 28, 2024, 08:27:04 AM
Interesting. What I saw in that article was OpenAI doing a few unethical things that they are now 'bugfixing' as a result of the lawsuit.
:huh: You could've seen that just as well on a blank screen; there would've been just as much support for that interpretation there.

The bugs they're fixing would, among other things, make it harder to manufacture a lawsuit out of whole cloth. Fixing them would make it harder for lawyers to engineer a case of plagiarism, but it would change the chance of plagiarism in actual use from 0% to 0%.

crazy canuck

Quote from: DGuller on February 28, 2024, 08:03:30 AM
Looks like it was indeed the prompt abuse that made ChatGPT "plagiarize" NYT:  https://arstechnica.com/tech-policy/2024/02/openai-accuses-nyt-of-hacking-chatgpt-to-set-up-copyright-suit.

I'm not surprised at all, the original NYT claim never passed the smell test.  LLMs don't work like that unless you hack them in a way that would only be done to manufacture a copyright trolling lawsuit.


I can only assume you did not actually read the whole article that you linked.

If you had, I doubt you would be making such a claim. For example:

Ian Crosby, Susman Godfrey partner and lead counsel for The New York Times, told Ars that "what OpenAI bizarrely mischaracterizes as 'hacking' is simply using OpenAI's products to look for evidence that they stole and reproduced The Times's copyrighted works. And that is exactly what we found. In fact, the scale of OpenAI's copying is much larger than the 100-plus examples set forth in the complaint."

Crosby told Ars that OpenAI's filing notably "doesn't dispute—nor can they—that they copied millions of The Times' works to build and power its commercial products without our permission."


Josquius

I've been following this story as it has steadily developed. Seems very languish. And interesting.

Lots of people crying bloody murder about teh woke AI, but interestingly the actual problem seems to have been quite the opposite, along with Google's clunky attempts to counter it.

Shame this feature isn't available in Europe, as I'd love to try it.


https://www.bbc.co.uk/news/technology-68412620

Quote
Why Google's 'woke' AI problem won't be an easy fix
In the last few days, Google's artificial intelligence (AI) tool Gemini has had what is best described as an absolute kicking online.

Gemini has been thrown onto a rather large bonfire: the culture war which rages between left- and right-leaning communities.

Gemini is essentially Google's version of the viral chatbot ChatGPT. It can answer questions in text form, and it can also generate pictures in response to text prompts.

Initially, a viral post showed this recently launched AI image generator create an image of the US Founding Fathers which inaccurately included a black man.

Gemini also generated German soldiers from World War Two, incorrectly featuring a black man and Asian woman.

Google apologised, and immediately "paused" the tool, writing in a blog post that it was "missing the mark".

But it didn't end there - its over-politically correct responses kept on coming, this time from the text version.

Gemini replied that there was "no right or wrong answer" to a question about whether Elon Musk posting memes on X was worse than Hitler killing millions of people.

When asked if it would be OK to misgender the high-profile trans woman Caitlyn Jenner if it was the only way to avoid nuclear apocalypse, it replied that this would "never" be acceptable.

Jenner herself responded and said actually, yes, she would be alright about it in these circumstances.

Elon Musk, posting on his own platform, X, described Gemini's responses as "extremely alarming" given that the tool would be embedded into Google's other products, collectively used by billions of people.

I asked Google whether it intended to pause Gemini altogether. After a very long silence, I was told the firm had no comment. I suspect it's not a fun time to be working in the public relations department.

But in an internal memo Google's chief executive Sundar Pichai has acknowledged some of Gemini's responses "have offended our users and shown bias".

That was, he said, "completely unacceptable" - adding that his teams were "working around the clock" to fix the problem.

Biased data
It appears that in trying to solve one problem - bias - the tech giant has created another: output which tries so hard to be politically correct that it ends up being absurd.

The explanation for why this has happened lies in the enormous amounts of data AI tools are trained on.

Much of it is publicly available - on the internet, which we know contains all sorts of biases.

Traditionally images of doctors, for example, are more likely to feature men. Images of cleaners on the other hand are more likely to be women.

AI tools trained with this data have made embarrassing mistakes in the past, such as concluding that only men had high powered jobs, or not recognising black faces as human.

It is also no secret that historical storytelling has tended to feature, and come from, men, omitting women's roles from stories about the past.

It looks like Google has actively tried to offset all this messy human bias with instructions for Gemini not to make those assumptions.

But it has backfired precisely because human history and culture are not that simple: there are nuances which we know instinctively and machines do not.

Unless you specifically programme an AI tool to know that, for example, Nazis and founding fathers weren't black, it won't make that distinction.

[Image: Google DeepMind boss Demis Hassabis speaks at the Mobile World Congress in Barcelona, Spain]
On Monday, Demis Hassabis, co-founder of DeepMind, an AI firm acquired by Google, said fixing the image generator would take a matter of weeks.

But other AI experts aren't so sure.

"There really is no easy fix, because there's no single answer to what the outputs should be," said Dr Sasha Luccioni, a research scientist at Huggingface.

"People in the AI ethics community have been working on possible ways to address this for years."

One solution, she added, could include asking users for their input, such as "how diverse would you like your image to be?" but that in itself clearly comes with its own red flags.

"It's a bit presumptuous of Google to say they will 'fix' the issue in a few weeks. But they will have to do something," she said.

Professor Alan Woodward, a computer scientist at Surrey University, said it sounded like the problem was likely to be "quite deeply embedded" both in the training data and overlying algorithms - and that would be difficult to unpick.

"What you're witnessing... is why there will still need to be a human in the loop for any system where the output is relied upon as ground truth," he said.

Bard behaviour
From the moment Google launched Gemini, which was then known as Bard, it has been extremely nervous about it. Despite the runaway success of its rival ChatGPT, it was one of the most muted launches I've ever been invited to. Just me, on a Zoom call, with a couple of Google execs who were keen to stress its limitations.

And even that went awry - it turned out that Bard had incorrectly answered a question about space in its own publicity material.

The rest of the tech sector seems pretty bemused by what's happening.

They are all grappling with the same issue. Rosie Campbell, Policy Manager at ChatGPT creator OpenAI, said in a blog interview earlier this month that even once bias is identified, correcting it is difficult - and requires human input.

But it looks like Google has chosen a rather clunky way of attempting to correct old prejudices. And in doing so it has unintentionally created a whole set of new ones.

On paper, Google has a considerable lead in the AI race. It makes and supplies its own AI chips, it owns its own cloud network (essential for AI processing), it has access to shedloads of data and it also has a gigantic user base. It hires world-class AI talent, and its AI work is universally well-regarded.

As one senior exec from a rival tech giant put it to me: watching Gemini's missteps feels like watching defeat snatched from the jaws of victory.
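
The "instructions" the article alludes to are generally understood to be a prompt-rewriting layer sitting in front of the image model. Nobody outside Google knows the actual implementation, but a minimal sketch of the idea, with all names invented, would look something like this:

# Hypothetical sketch of a prompt-rewriting layer of the kind the
# article describes; Google has not published how Gemini's image
# pipeline actually works.
DIVERSITY_HINT = (
    "Unless the prompt specifies otherwise, depict people with a "
    "diverse range of genders and ethnicities."
)

def rewrite_prompt(user_prompt: str) -> str:
    """Blindly append a diversity instruction to every image prompt."""
    return f"{user_prompt}. {DIVERSITY_HINT}"

# The failure mode: the hint is appended even when the subject is
# historically specific, because nothing here checks for that.
print(rewrite_prompt("A portrait of the US Founding Fathers"))

The backfiring the article describes falls straight out of a scheme like this: the rewrite is applied unconditionally, so historically specific prompts get the same treatment as generic ones.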

Jacob

Re: the NYT thing, I suppose it depends on the framing of the question:

Framing 1 (Not Plagiarism)
The only way to plagiarize the NYT with ChatGPT is if the user deliberately sets out to plagiarize (via prompt engineering). Therefore ChatGPT (and OpenAI) are innocent of any plagiarism; any guilt lies on the prompt engineer who set out to plagiarize.

Framing 2 (Unlawful/Plagiarism)
The fact that it is possible to use ChatGPT to obviously plagiarize the NYT indicates that OpenAI used NYT data to train ChatGPT. That NYT data was used for this training without permission is unlawful, and that it is used as a basis for creating answers without permission or credit is plagiarism. The fault for the plagiarism lies with OpenAI, as they're the ones who ingested the data without permission; that individual users can be more or less successful in plagiarizing material is secondary.

Basically, it's a contest between the point of view that the tool itself is morally (and legally) neutral, with any onus being on end users, versus the point of view that the tool itself is fundamentally built on plagiarism (and other unlawful use of other people's data) independently of whatever individual users may do.

DGuller

I think even framing 1 might be too generous to the NYT.  Depending on just how hackish the prompting is, they may essentially be retyping their news articles into MS Word verbatim, and then claiming that MS Word is plagiarizing their content.

Whether ChatGPT synthesizing the NYT content is okay or not is a different question.  I'm just addressing the idea that you can just get ChatGPT to regurgitate an NYT article for you, which frankly always smelled, especially once you looked at the complaint and saw how selectively the proof of that was presented.
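
For illustration, here's roughly what that kind of probing might look like - the prompt wording, model name, and scoring below are all guesses, since the complaint doesn't publish its exact method:

# Hypothetical sketch: seed the model with an article's opening and
# check whether the continuation reproduces the rest verbatim.
import difflib
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def probe_for_regurgitation(article_text: str, seed_words: int = 100) -> float:
    words = article_text.split()
    seed = " ".join(words[:seed_words])
    rest = " ".join(words[seed_words:])
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user",
                   "content": f"Continue this text exactly: {seed}"}],
    )
    continuation = response.choices[0].message.content
    # How closely does the continuation match the real article body?
    return difflib.SequenceMatcher(None, continuation, rest).ratio()

# A single attempt rarely works; you run it over and over and keep
# the best match, e.g.:
# best = max(probe_for_regurgitation(article) for _ in range(1000))

The point being: if you have to hand the model a hundred words of the article before it will give you the rest, you're doing a lot of the "plagiarizing" yourself.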

Sheilbh

Right, but the NYT aren't suing for plagiarism; they're suing for breach of copyright, and I think it's quite a specific point (I could be wrong - not an IP lawyer - not a US lawyer etc).

But I'd read that point as doing two things - a nice bit of splashy PR that's easy to understand and knocking out the "transformative use" argument.

Now having said all of that, I find it a bit odd for a company that's trained an LLM to argue that running something thousands of times to get a result is "hacking" :hmm:
Let's bomb Russia!

Jacob

Indeed.

As I understand it, the case is not about whether you can accidentally use ChatGPT to plagiarize the NYT or whether you have to deliberately set out to do it. It's about whether OpenAI used NYT data to train ChatGPT without permission.

The answer to that question seems to be "yes." Which leads to the next question, which is "how big a deal is that".

crazy canuck

Quote from: Jacob on February 28, 2024, 03:37:08 PM
Indeed.

As I understand it, the case is not about whether you can accidentally use ChatGPT to plagiarize the NYT or whether you have to deliberately set out to do it. It's about whether OpenAI used NYT data to train ChatGPT without permission.

The answer to that question seems to be "yes." Which leads to the next question, which is "how big a deal is that".

That is pretty much it.  And the answer is, a big enough deal for the NYT to spend legal resources to stop it and seek damages for the unauthorized use of their intellectual property.

Sheilbh

I was at a media event just today with people who are working on this (from an IP perspective, editorial, data science etc).

And there was the question of "where will this be in 5-10 years" for the sector. While there was a degree of distinguishing between the NYT (and similar titles), which do good original reporting, and the bits of the media that went for a volume strategy focused on pageviews and nothing else, fundamentally the view was: every journalist will be using AI in their job (and there is a route to a virtuous cycle), but if we get it wrong none of us might be here.

Interesting times :ph34r:
Let's bomb Russia!

Syt

OpenAI have released a video showcasing their generative text-to-video model, Sora.

I am, somehow, less interested in the weight and convolutions of Einstein's brain than in the near certainty that people of equal talent have lived and died in cotton fields and sweatshops.
—Stephen Jay Gould

Proud owner of 42 Zoupa Points.

Tonitrus

Don't let fanfiction get a hold of this...

Valmy

Quote from: Tonitrus on March 26, 2024, 09:03:46 PM
Don't let fanfiction get a hold of this...

Teddy Roosevelt and the Rough Riders mounted on Dinosaurs.
Quote"This is a Russian warship. I propose you lay down arms and surrender to avoid bloodshed & unnecessary victims. Otherwise, you'll be bombed."

Zmiinyi defenders: "Russian warship, go fuck yourself."

Tonitrus

Quote from: Valmy on March 26, 2024, 09:33:31 PM
Quote from: Tonitrus on March 26, 2024, 09:03:46 PM
Don't let fanfiction get a hold of this...

Teddy Roosevelt and the Rough Riders mounted on Dinosaurs.

The Jurassic Park franchise does need a new direction...

Might as well throw in time travel.