
The AI dooooooom thread

Started by Hamilcar, April 06, 2023, 12:44:43 PM


crazy canuck

Quote from: frunk on August 27, 2025, 10:39:05 AM
Quote from: Josquius on August 27, 2025, 09:50:48 AMChatGPT does use regular search and provides sources, though it says nothing about the quality of those sources and will often make something up unsourced.

That's using the LLM itself to generate sources, which as with the rest of the model isn't an authenticated attribution and shouldn't be used as such.

I have repeated that very thing many times to Jos; I am not sure why he has still not understood the point.  Generative AI is trained on data; it does not search for accurate data or sources.
Awarded 17 Zoupa points

In several surveys, the overwhelming first choice for what makes Canada unique is multiculturalism. This, in a world collapsing into stupid, impoverishing hatreds, is the distinctly Canadian national project.

crazy canuck

#541
Quote from: Josquius on August 27, 2025, 10:50:13 AM
Quote from: frunk on August 27, 2025, 10:39:05 AM
Quote from: Josquius on August 27, 2025, 09:50:48 AMChatGPT does use regular search and provides sources, though it says nothing about the quality of those sources and will often make something up unsourced.

That's using the LLM itself to generate sources, which as with the rest of the model isn't an authenticated attribution and shouldn't be used as such.

I'm not sure what you mean.
If it says something is a fact and then links you off to a journal where you see it says precisely that, isn't that a relevant source?
It's like Wikipedia: anyone can edit it and it's a rubbish source in itself, but using it as a shortcut to valid sources can be OK.

Let me give you a concrete example from the work I do - a figure in a scholarly paper submitted for publication to a journal was caught by peer review as probably being drafted by AI (which was both not disclosed and resulted in the paper being essentially nonsensical).  It was not hard to spot the problems. The paper cited a source that was 10 years old (one of the hallmarks of AI-generated research, because the model is just trained on data; it does not go out and find up-to-date data). But I digress. The figure the AI generated had no relationship to the data in the cited source.  It was a complete fabrication.

Zoupa

I thought AI was gonna do my laundry and dishes, not take my job away. Right now I fail to see any societal benefits and many societal detriments.

Tamas

One aspect is, I suspect, that there is a looming economic downturn, and AI is as good an excuse as it gets for companies to get rid of people. "Look, we are not struggling to keep growing at all! We are in fact growing so much that we are going to replace highly skilled people with this glorified chatbot right here."

crazy canuck

Quote from: Tamas on August 27, 2025, 01:33:41 PMOne aspect is, I suspect, that there is a looming economic downturn, and AI is as good an excuse as it gets for companies to get rid of people. "Look, we are not struggling to keep growing at all! We are in fact growing so much that we are going to replace highly skilled people with this glorified chatbot right here."

I think there is a lot to your observation.  Add to that the economic incentive to produce short-term results, and we get a perfect storm.

Josquius

#545
Quote from: crazy canuck on August 27, 2025, 12:07:05 PM
Quote from: Josquius on August 27, 2025, 10:50:13 AM
Quote from: frunk on August 27, 2025, 10:39:05 AM
Quote from: Josquius on August 27, 2025, 09:50:48 AMChatGPT does use regular search and provides sources, though it says nothing about the quality of those sources and will often make something up unsourced.

That's using the LLM itself to generate sources, which as with the rest of the model isn't an authenticated attribution and shouldn't be used as such.

I'm not sure what you mean.
If it says something is a fact and then links you off to a journal where you see it says precisely that, isn't that a relevant source?
It's like Wikipedia: anyone can edit it and it's a rubbish source in itself, but using it as a shortcut to valid sources can be OK.

Let me give you a concrete example from the work I do - a figure in a scholarly paper submitted for publication to a journal was caught by peer review as probably being drafted by AI (which was both not disclosed and resulted in the paper being essentially nonsensical).  It was not hard to spot the problems. The paper cited a source that was 10 years old (one of the hallmarks of AI-generated research, because the model is just trained on data; it does not go out and find up-to-date data). But I digress. The figure the AI generated had no relationship to the data in the cited source.  It was a complete fabrication.


OK?
So this time it gave a wrong number and lied about a source.
You checked that, saw it, and so wouldn't use it.
Sometimes, however, the check shows it is giving things correctly.
In that case it is still a valid source, no matter how you found it.

QuoteI have repeated that very thing many times to Jos; I am not sure why he has still not understood the point.  Generative AI is trained on data; it does not search for accurate data or sources
Except when it does.
What you're saying here is a generalisation that doesn't reflect how ChatGPT, at least, works in practice.

Sheilbh

I can only speak to news a little bit. An AI model is built by training on data up to a time limit. But those models are then applied to other data as well.

OpenAI, Amazon, Google etc (basically all the AI companies) have all been doing big licensing deals with many news publishers. This is why I think the agencies will do well - I think AP News were the first publisher to do a deal with OpenAI. But others have followed: NewsCorp, Axel Springer, AFP, Reuters etc.

So the underlying model will be based on a point in time. But - depending on what you're looking at - in generating outputs it will be applied to fresher data (this is really important for news publishers because if it was frozen in time they wouldn't be able to flow through retractions, corrections etc). If it's Google generative search then the data Google is applying the Gemini model to is the same data it uses to index the web for search.

That same data will also be used for building the next model.
Let's bomb Russia!
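The point above - a model frozen at a training cutoff still producing current answers when fresher documents are supplied at query time - can be sketched roughly like this. Everything here (the names, the fact store, the keyword match standing in for real retrieval) is invented for illustration; this is not how any real product is implemented.

```python
# Toy sketch: a "frozen" model with stale knowledge, plus fresh documents
# supplied at query time (roughly the retrieval-augmented pattern).
# All names and data are hypothetical.

TRAINING_CUTOFF = "2024-06"

# Stale fact baked into the frozen "model" at training time.
FROZEN_KNOWLEDGE = {"acme ceo": "Alice"}

def generate(question: str, fresh_docs: list[str]) -> str:
    """Prefer facts found in freshly retrieved documents over frozen knowledge."""
    key = "ceo"  # toy relevance check standing in for real retrieval
    for doc in fresh_docs:
        if key in doc.lower():
            return f"from fresh data: {doc}"
    return f"from training data (cutoff {TRAINING_CUTOFF}): {FROZEN_KNOWLEDGE['acme ceo']}"
```

With fresh documents the stale training-time answer is overridden; without them the model can only repeat what it was frozen with - which is why publishers care about corrections flowing through.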

crazy canuck

Quote from: Josquius on August 27, 2025, 04:56:58 PM
Quote from: crazy canuck on August 27, 2025, 12:07:05 PM
Quote from: Josquius on August 27, 2025, 10:50:13 AM
Quote from: frunk on August 27, 2025, 10:39:05 AM
Quote from: Josquius on August 27, 2025, 09:50:48 AMChatGPT does use regular search and provides sources, though it says nothing about the quality of those sources and will often make something up unsourced.

That's using the LLM itself to generate sources, which as with the rest of the model isn't an authenticated attribution and shouldn't be used as such.

I'm not sure what you mean.
If it says something is a fact and then links you off to a journal where you see it says precisely that, isn't that a relevant source?
It's like Wikipedia: anyone can edit it and it's a rubbish source in itself, but using it as a shortcut to valid sources can be OK.

Let me give you a concrete example from the work I do - a figure in a scholarly paper submitted for publication to a journal was caught by peer review as probably being drafted by AI (which was both not disclosed and resulted in the paper being essentially nonsensical).  It was not hard to spot the problems. The paper cited a source that was 10 years old (one of the hallmarks of AI-generated research, because the model is just trained on data; it does not go out and find up-to-date data). But I digress. The figure the AI generated had no relationship to the data in the cited source.  It was a complete fabrication.


OK?
So this time it gave a wrong number and lied about a source.
You checked that, saw it, and so wouldn't use it.
Sometimes, however, the check shows it is giving things correctly.
In that case it is still a valid source, no matter how you found it.

QuoteI have repeated that very thing many times to Jos; I am not sure why he has still not understood the point.  Generative AI is trained on data; it does not search for accurate data or sources
Except when it does.
What you're saying here is a generalisation that doesn't reflect how ChatGPT, at least, works in practice.


You are missing the point.  The paper was submitted for publication by the authors of the paper.  It was caught by peer reviewers. 

But more importantly, generative AI does not work the way you keep saying it does.

crazy canuck

Quote from: Sheilbh on August 27, 2025, 05:13:43 PMI can only speak to news a little bit. An AI model is built by training on data up to a time limit. But those models are then applied to other data as well.

OpenAI, Amazon, Google etc (basically all the AI companies) have all been doing big licensing deals with many news publishers. This is why I think the agencies will do well - I think AP News were the first publisher to do a deal with OpenAI. But others have followed: NewsCorp, Axel Springer, AFP, Reuters etc.

So the underlying model will be based on a point in time. But - depending on what you're looking at - in generating outputs it will be applied to fresher data (this is really important for news publishers because if it was frozen in time they wouldn't be able to flow through retractions, corrections etc). If it's Google generative search then the data Google is applying the Gemini model to is the same data it uses to index the web for search.

That same data will also be used for building the next model.

Yes, the currency of some of the information on which generative AI is trained will improve for some things and get worse for others (scholarly journals, for example, will not be used to train AI).

So think about the ramifications of just that sentence.  News reports that give poor-quality accounts of the findings of scholarly works are going to train generative AI applications.  Soon it will be common knowledge that the scholarly literature agrees that vaccines cause autism  :P

Sheilbh

So first of all it's not totally true. Major journal publishers have also been doing big licensing deals with the AI companies. Wiley, Taylor & Francis, the New England Journal of Medicine and major university presses have all done deals, or announced they're in negotiations, covering both journal articles and full books. So they will increasingly be part of the training data.

Given the way Google routinely abuse their market position - for example in the early days you could not opt out of your content being part of their AI building efforts unless you opted out of Google search indexing - I would be astonished if they're not using that power to get any content available on Google Scholar into their models.

I think that since AI companies have, largely, accepted that they need to pay for licenses for content, these deals are only going to become more common; it will primarily be a question of commercials, royalties, and use/liability limitations. The big challenge for news publishers and, I'd argue, academic presses is to make sure they extract enough money out of the AI companies, because they didn't in internet round one - and their content, which is well-written, edited, and legaled, is very valuable for companies building LLMs.

I'd also add that your last sentence - and the slight disdain for the press, which is essential to a democratic society - is why I think generative search is winning and why I worry an AI product will win. People don't trust the press and understand its biases, but they think the solution is not engaged, critical reading; rather, that if you just put together facts and collections of the right data you will get a less biased, less humanly flawed report out of it. Why read the news article about a scholarly work when you can upload the PDF and do a Q&A with ChatGPT about it?

But I think that whole conversation is slightly separate from the example you posted. To my mind the simple point there is don't use an AI to write a paper.

crazy canuck

#550
Wiley is actually one I have in mind - its editorial policy specifically prohibits scholars from giving an AI provider any rights to use their data in any way.

Also, I don't have disdain for real press, I have disdain for the things that are posing as news sources online now.


Sheilbh

I think that's more to do with Wiley's business model - and pitch.

In 2024 they earned $45 million from licensing content for AI. They do not allow authors to have an "opt-out". However, they position this as not allowing specific authors to "opt in or opt out": their argument is basically that author-level choice would buttress the claim made by some AI companies that licensing isn't feasible. Individual authors can't opt out, but publishers can license collectively in order to get more money, which then results in more/better royalties to authors than they'd get on a case-by-case basis.

While I can kind of see their point, I think there should be a right for creators to opt-out of AI regardless of their contractual arrangements.

garbon

Quote from: DGuller on August 27, 2025, 08:20:23 AMSure.  Just for clarity, my question was:  "when does that mechanism of predicting the next word result in something that isn't too functionally different from real intelligence."

My question is the fundamental one you need answered to know when to use AI, when to use AI very carefully, or when not to use AI at all. Of course it's a dangerous question to get wrong; that's what makes it an important question.

Thanks for expanding on that and yes, I agree entirely. It would be good to have such introductory articles highlight that as it does appear currently that many are saying it should be used all the time.

Quote from: DGuller on August 27, 2025, 08:20:23 AMThis is the question for which it helps to understand how LLMs work.  For example, they work really well in programming because programming is by design something from which you can infer patterns without actually knowing for sure.  If I were to go into some new language and have to write a conditional statement, I wouldn't know for a fact that it would be an "if" statement, but that would be a pretty good guess.  Knowledge that can be guessed can be effectively compressed and synthesized.

For the same reason, the legal field is where LLMs can be very dangerous, at least general-purpose LLMs.  To use an obsolete example, there was a case in 1973 called Roe vs. Wade that legalized abortion in many cases.  Would you be able to use your general knowledge to guess that one party was Roe, the other party was Wade, and that it was decided in 1973?  No, this is something that you just have to know.  A random happening here or there, and it could've been Smith v. Miller decided in 1975 that legalized abortion.  All that means is that it's very dangerous to generalize about laws, and generalization is what intelligence is about.  Even an intelligent human being who's not educated about the law can be very dangerous if he doesn't understand the limitations of his intelligence when it comes to law and tries to reason out legal questions based on general knowledge.

I would add that it is also a shame how companies like Google help reinforce misperceptions with their AI mode* / AI overview above search results. It gives the impression that their AI is just a superpowered version of the search engine, even though what it displays has a tendency to be built on erroneous or irrelevant information. It sometimes feels like the early days of popular search engines, when people just trusted the first hit.


*I'm perhaps making an unwarranted assumption on its AI mode as I have not used it. -_-
"I've never been quite sure what the point of a eunuch is, if truth be told. It seems to me they're only men with the useful bits cut off."
I drank because I wanted to drown my sorrows, but now the damned things have learned to swim.

Syt

I plopped some SQL queries from our developers into ChatGPT 5 yesterday, and it broke them down for me in a way that made sense and helped me understand them. As I don't know SQL, I checked with a colleague who does, and he said it was a good summary - it even figured out the functions and context from the names of the data fields. It provided me with a flow diagram and even let me test out the whole thing by giving it values for the various fields.

For a layperson like me that's very valuable, though of course, for now, I would rely on colleagues to verify the accuracy of the analysis.
We are born dying, but we are compelled to fancy our chances.
- hbomberguy

Proud owner of 42 Zoupa Points.

Josquius

#554
Quote from: garbon on Today at 02:44:01 AMI would add that it is also a shame how companies like Google help reinforce misperceptions with their AI mode* / AI overview above search results. It gives the impression that their AI is just a superpowered version of the search engine, even though what it displays has a tendency to be built on erroneous or irrelevant information. It sometimes feels like the early days of popular search engines, when people just trusted the first hit.


*I'm perhaps making an unwarranted assumption on its AI mode as I have not used it. -_-

For sure there's the AI= better buzz.
Google's AI result definitely is drawn from search results... though I find it basically summarises the top one for you, no matter how wrong it is. Sometimes, even if the result is actually right, it interprets it terribly and makes it sound wrong.

Quote from: crazy canuck on August 27, 2025, 06:20:16 PMYou are missing the point.  The paper was submitted for publication by the authors of the paper.  It was caught by peer reviewers. 

So there's two extremes to using AI.

1: This way - what I'd call the lazy student method, though it's done by more than students. Ask AI something, get an answer, copy and paste it as-is. Done.
Any organisation that adopts this way of working is really going to screw itself over as the errors build up.

2: Refuse to have anything to do with AI. Absolute purist - it can't be used at all.
But as desirable as a world without AI may be, if you're not using it your competitors will be, and there are efficiency gains to be had. Even if you're having to check the AI's working, it at least gives you a pointer and can save time.
Organisations following this path are similarly in trouble.

The best way of working with it is to recognise what it can do and what its common mistakes are.

QuoteBut more importantly, generative AI does not work the way you keep saying it does.
Yes it does.
Go use ChatGPT today and you'll see that's what it does.
Ask it, for instance, what's the capital of Finland - this is an easy one. Its training data is full of this fact. It can say with confidence, based on just guessing the next most likely word, that Helsinki is the capital of Finland.

Ask it something more complicated though, for instance when Helsinki's first bookshop opened... then, using its training data and most-likely-word guessing, it could tell you a broad range or maybe take a stab at a date - but here it would be hallucinating something that sounds vaguely right, based off knowledge (i.e. words that follow other words in sources that have been fed into it) it does have about related topics.
ChatGPT at the moment seems to have a high degree of caution coded in around such things. If it's got a <70% (I guess) likelihood of being right based on training data, it will then use search.

Go try it yourself.
Ask it about Roe vs. Wade - world famous, much discussed - and it should be pretty fast with writing something up for you.
Ask it about Smith vs. Gupta (I just made that up) and it probably won't jump straight to hallucination; it will take a bit longer and search.
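The behaviour described above - answer from memory when confident, otherwise fall back to search - can be sketched as below. The 0.7 threshold, the toy knowledge store, and the confidence scores are all invented for illustration; real products do not use a lookup table like this.

```python
# Toy sketch of confidence-gated search fallback. The threshold and the
# "parametric memory" below are hypothetical stand-ins for a real model.

SEARCH_THRESHOLD = 0.7

# Stand-in for facts the model saw many times in training,
# each with a made-up confidence score.
PARAMETRIC_MEMORY = {
    "capital of finland": ("Helsinki", 0.99),
    "helsinki first bookshop": ("sometime in the 1800s?", 0.30),
}

def web_search(query: str) -> str:
    # Placeholder for a real retrieval step (e.g. a search API call).
    return f"[searched the web for: {query}]"

def answer(query: str) -> str:
    """Answer from memory if confident enough, otherwise fall back to search."""
    guess, confidence = PARAMETRIC_MEMORY.get(query.lower(), ("", 0.0))
    if confidence >= SEARCH_THRESHOLD:
        return guess
    return web_search(query)
```

A well-known query returns straight from "memory"; an obscure or unknown one (like the made-up case above) triggers the search branch. The real gating happens inside the model rather than via an explicit table, so the sketch only shows the shape of the decision.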