ChatGPT thread

Started by DGuller, July 11, 2023, 02:01:53 AM

DGuller

A tangent conversation in the other thread made me think that we need to have a serious thread about large language models.  I don't mean the thread about the fun uses of it, but rather the thread about how useful they are today at doing serious things.  I was a little surprised at how skeptical pretty much everyone else was that ChatGPT is a serious tool with serious uses, as in my professional life LLMs have been universally accepted as revolutionary.

One thing that always gets brought up is hallucination.  While it's an issue, I think it's an overblown one.  In my professional life, the hard part is getting any kind of idea; verifying it or discarding it is a much more manageable problem than having it occur to you in the first place.  However, another factor could be the difference between the free version of ChatGPT and the paid version.  The free version uses GPT-3.5, while the paid version gives you the option of using GPT-4, a model with much higher capacity (reportedly far more parameters, though OpenAI hasn't published the exact figures).
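
For anyone who wants to run the same kind of side-by-side comparison through the API rather than the web interface, here is a minimal sketch.  It assumes the 2023-era openai Python package (the 0.x interface) and an API key in the environment; the model names and client interface are assumptions that may well have changed since.

# Rough sketch: ask the same question of GPT-3.5 and GPT-4 and compare the answers.
# Assumes the 2023-era openai Python package (0.x) and OPENAI_API_KEY in the environment.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

question = "Your obscure factual question goes here."

for model in ("gpt-3.5-turbo", "gpt-4"):
    response = openai.ChatCompletion.create(
        model=model,
        messages=[{"role": "user", "content": question}],
        temperature=0,  # keep the comparison as repeatable as possible
    )
    print(f"--- {model} ---")
    print(response["choices"][0]["message"]["content"])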

I decided to test the two models on the first somewhat obscure factual question that popped into my head.  For reasons I cannot explain, that question was "What was the fate of the original five marshals of the Soviet Union?"

This is the free ChatGPT output:

Quote
The original five marshals of the Soviet Union were a group of high-ranking military officers who played significant roles during World War II and in the post-war period. Here's information on the fate of each of the original five marshals:

Georgy Zhukov: Georgy Zhukov was the most prominent and renowned marshal of the Soviet Union. He played a crucial role in the defense of Moscow, the Battle of Stalingrad, and the Battle of Berlin. After the war, Zhukov served as Deputy Minister of Defense and then as Minister of Defense of the Soviet Union. However, his influence began to decline in the 1950s due to political disagreements with the Soviet leadership. In 1957, he was removed from his position and sent to various diplomatic posts. He was rehabilitated in 1961 and held various positions in the government until his death in 1974.

Aleksandr Vasilevsky: Aleksandr Vasilevsky was another key figure in the Soviet military during World War II. He served as the Chief of the General Staff and later as the Deputy Minister of Defense. After the war, Vasilevsky continued to hold important military positions, including being the Soviet representative to the Allied Control Council in Germany. He later served as the Minister of Defense and was promoted to Marshal of the Soviet Union in 1949. Vasilevsky retired in 1957 and lived until 1977.

Ivan Konev: Ivan Konev played a significant role in several major battles during World War II, including the Battle of Stalingrad and the Battle of Berlin. After the war, he served as the commander of Soviet forces in East Germany and held various military and political positions. Konev was promoted to Marshal of the Soviet Union in 1955. In the 1960s, he was appointed as the Commander-in-Chief of the Warsaw Pact forces. Konev retired in 1965 and passed away in 1973.

Konstantin Rokossovsky: Konstantin Rokossovsky was a prominent military strategist and commander during World War II. After the war, he held various high-ranking military and political positions. Rokossovsky served as the Minister of Defense of Poland from 1949 to 1956. He returned to the Soviet Union and became the Deputy Minister of Defense. Rokossovsky was promoted to Marshal of the Soviet Union in 1955. He retired from active service in 1962 and passed away in 1978.

Semyon Budyonny: Semyon Budyonny was a cavalry officer who played a significant role during the Russian Civil War and later became one of the original marshals of the Soviet Union. After World War II, Budyonny served as the commander of the North Caucasus Military District and held various military and political positions. However, his influence and military career gradually diminished in the 1950s. Budyonny was removed from his positions and retired in 1956. He lived until 1973.

It's important to note that the careers and fates of these marshals were influenced by the complex political dynamics of the Soviet Union, and their fortunes varied over time.

Ugh, not looking good there.  It got one out of the five marshals correct.  While all the others eventually became marshals, they weren't part of the original five.

Here is what the paid ChatGPT gives you when you select GPT-4:

Quote
The original five Marshals of the Soviet Union were:

Kliment Voroshilov: Kliment Voroshilov was an important military and political leader in the Soviet Union. After holding various posts in the government, including that of Defense Commissar (Minister of Defense), he was promoted to the rank of Marshal of the Soviet Union in 1935, one of the first five to receive this honor. He later served as Chairman of the Presidium of the Supreme Soviet, a position equivalent to President. He died of heart failure in 1969.

Semyon Budyonny: Semyon Budyonny was a military leader known for his command during the Russian Civil War and later in the early stages of World War II. He was also among the first five men appointed as Marshals of the Soviet Union in 1935. He held various military and political posts until his death in 1973.

Vasily Blyukher: Vasily Blyukher was another of the first five Marshals, appointed in 1935. He had an active military career in the Soviet Union, participating in the Russian Civil War and the early stages of the Soviet-Japanese Border Wars. He was a victim of Stalin's purges and was arrested in 1938. He died later the same year, under circumstances that remain unclear. Officially, he died of a heart attack, but it's believed he may have been executed.

Alexander Yegorov: Alexander Yegorov was a high-ranking Soviet military leader who was one of the first five individuals to receive the rank of Marshal of the Soviet Union in 1935. He played a key role in several military operations, including the Russian Civil War and the defense of the Soviet Union during the early years of World War II. However, he fell out of favor during Stalin's purges, was arrested in 1938, and executed in 1939.

Mikhail Tukhachevsky: Mikhail Tukhachevsky was an influential military leader and theorist who contributed significantly to the modernization of the Red Army. Appointed a Marshal of the Soviet Union in 1935, Tukhachevsky was falsely accused of treason during Stalin's Great Purge in 1937. He was arrested, tried, and executed in June of the same year.

Much better.  Got all of them correct.

In general, I can't recall the last time I got a hallucination out of GPT-4 without deliberately drilling it with really obscure factual questions.  I'm sure it still happens, but it's far harder to make it hallucinate.  Having that much more capacity sure helps.

DGuller

Here is another example of what GPT-4 can do, in the form of a question and an answer.  While there is some unfortunate hallucination at the very end of the answer, I would hazard a guess that many skeptics would be surprised at the capabilities of the model.

Question:
Quote
Can you evaluate the quoted statement: "This paragraph is an example of why I said what I said earlier, about you not being qualified to identify whether the different words are describing the same concept, or a different concept.  It's clear to me that you have not grasped the meaning behind the technical description of how large language models work.

It is true that ChatGPT hallucinates, but your "because" does not follow.  Yes, you are correct that the way ChatGPT writes is by predicting one word at a time, but it seems like you've taken that knowledge in without really understanding the concept or the context.  As I'm writing this reply, I'm also writing one word at a time, and I think most people likewise write one word at a time.

Even though I'm writing one word at a time, I have a general idea in my mind what I want to convey, and that general idea is a big part in determining what the next word I'm typing is going to be.  I still care what words I already typed, because obviously I want the next word to play nice with everything I've already written before it, but that doesn't mean that I'm just writing something that strings together.  You seem to think that predicting which word to write one at a time somehow precludes the possibility that a larger ideation of what is being written can possibly exist.

I don't know what these experts said on the podcast.  More likely than not, they said something insightful and correct.  That doesn't necessarily mean that what you got out of listening to them is anywhere near what they wanted to convey.  Clearly some other communication about LLMs didn't land the way it was intended, because no expert would want you to believe that LLMs hallucinate because they write one word at a time."

Answer:
Quote
The person who wrote this statement is challenging the listener's understanding of how large language models (LLMs), such as GPT, work. They're explaining that while it's true that GPT generates responses by predicting one word at a time, it doesn't mean the system lacks a larger guiding concept behind its generated responses.

The writer is making an analogy to human writing, explaining that while we humans write one word at a time, we have an overarching idea or goal for what we're communicating. The same applies to GPT, they argue, even though GPT doesn't "understand" in the same way humans do.

The key disagreement between the writer and the listener seems to be the listener's belief that GPT's "hallucinations" (i.e., its generation of creative or unanticipated responses) occur simply because the model writes one word at a time. The writer suggests that this belief oversimplifies the complex nature of language modeling, and that it fails to grasp how these models use patterns in their training data to generate coherent and contextually relevant responses.

In the final part of the statement, the writer seems to caution the listener against misunderstanding expert commentary, implying that the listener may have misconstrued the experts' statements on a podcast. The writer emphasizes the importance of a deep understanding of a subject before forming conclusions, which in this case is the functionality of LLMs.

To evaluate the statement, it shows a deep understanding of how LLMs function and underlines the nuances involved. The analogy to human writing helps to clarify how LLMs generate coherent responses, despite their one-word-at-a-time approach. However, the writer's tone may come off as condescending or dismissive, which could deter a productive conversation and limit the listener's openness to understanding and learning.
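
To make the "one word at a time" point concrete, here is a toy sketch of autoregressive generation, using a small open model (GPT-2 via the transformers library) as a stand-in for ChatGPT.  The thing to notice is that every next token is predicted from the entire context so far, which is how a "larger idea" in the prompt shapes each individual word.  GPT-2 will happily hallucinate here too; the point is only the mechanism, not the quality of the answer.

# Toy illustration of autoregressive (one-token-at-a-time) generation.
# GPT-2 via Hugging Face transformers stands in for ChatGPT here.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "The original five Marshals of the Soviet Union were"
ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(30):
        logits = model(ids).logits        # scores over the vocabulary at every position
        next_id = logits[0, -1].argmax()  # greedy pick of the single next token,
                                          # conditioned on the entire context so far
        ids = torch.cat([ids, next_id.view(1, 1)], dim=1)

print(tokenizer.decode(ids[0]))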

Sheilbh

I work in media and hallucination is a big issue in terms of the use of AI in the sector (and possibly a worry for other uses). It's probably the biggest obstacle to overcome (although the IP issues are also a factor).

There was a story from The Guardian about a researcher looking into strategic litigation/SLAPPs. They emailed the Guardian asking about a list of articles relating to litigation from one company that they were having trouble accessing, and whether, if the articles weren't available because they'd been withdrawn or legalled, it would be possible to talk.

The Guardian couldn't find any of them. They looked in the online archive, print archive, withdrawn articles and all through the legal files. After a bit of back and forth they realised the researcher had used ChatGPT to do some research on sources.

The list identified not just plausible articles, but also the Guardian journalist who had done lots of reporting on that company - to such an extent that when the journalist was shown the list, they weren't sure whether they'd written those articles or not.

Now, in terms of putting any of that tech into the production of journalism, there are clearly big risks. In addition, from my understanding, the benefit is likely to come from fully automating/relying on it, not from having it in bits and pieces. Bild are moving strongly to AI, so we'll see what goes on there.

The other point, though, is that we're likely to move to AI search pretty soon, and from a media company's perspective our nightmare is that the search AI gives an untrue answer, referring to a non-existent article that it says we've published (especially if it's libellous :lol: :bleeding:). If that risk is there, then it feels like the search AI needs a big disclaimer next to it saying "this may not tell the truth", which I think undermines the point of it.

From what I understand, it's basically an open question whether hallucination is a feature or a bug. I've listened to some technical people who understand this far better than I do, and some of them basically think hallucination is a problem that can be solved by refining and fixing the models we have. Others say that because the AI is predicting the words in a response without a conceptual or symbolic understanding (so it doesn't spot contradictions or inaccuracies, because that's not what it's been taught), hallucination is a feature of this model of AI, and it might take the next generation of models to fix it. If it is designed to predict a sequence of words in response to a question without understanding the underlying concepts or the relationships between them, then hallucinations may just be a part of this generation of AI.

I'm in no way qualified. From my data lawyer perspective there's loads that is legally interesting and uncertain. One very geeky one is whether an answer/output about an individual is their personal data. Is a string of predicted text truly relating to an actual individual? But it's going to be used - so it's working out what are sensible guard rails around it.

DGuller

Quote from: Sheilbh on July 11, 2023, 05:21:19 AM
From what I understand, it's basically an open question whether hallucination is a feature or a bug. I've listened to some technical people who understand this far better than I do, and some of them basically think hallucination is a problem that can be solved by refining and fixing the models we have. Others say that because the AI is predicting the words in a response without a conceptual or symbolic understanding (so it doesn't spot contradictions or inaccuracies, because that's not what it's been taught), hallucination is a feature of this model of AI, and it might take the next generation of models to fix it. If it is designed to predict a sequence of words in response to a question without understanding the underlying concepts or the relationships between them, then hallucinations may just be a part of this generation of AI.
IMO, it would be wise to move away from the assumption that LLMs can't conceptualize the underlying concepts the way humans do when they write responses.  For us to be sure that they can't conceptualize like humans, we first have to understand how humans conceptualize, and we also have to have a way to measure "conceptualization".  Yes, LLMs are literally just trillions of parameters interacting with each other to "make sense" of the communication, but how do we know for sure that human brains work materially differently when they do the same thing?

DGuller

I kept probing further on the original topic of purged Soviet military leaders, and went into ever more obscure territory to see at which point hallucination becomes noticeable.  I started to notice it when I was asking about the families of purged officers.  Some details were correct, but sometimes the first names were mixed up, and sometimes the same person was correctly identified as the wife the first time, but became the daughter the next time.

If you think about it from the point of view of a human being, it's not that strange that you'll get the details wrong once you get onto less traveled paths of knowledge.  As a human, you would probably have been exposed to that information only once in your life.  It's possible that humans would be better at identifying when they're not sure, and would opt to say nothing rather than mix up the recollection, but it's not like humans are perfect at knowing what they don't know either.

I think a big part of the solution to hallucination on the user side is to treat ChatGPT as a human with memory, not as a database.  If the stakes are high enough, you're not going to put total trust in one human's recollection; you'll verify what they have to say, but a conversation with even fallible humans can really get you started in the right direction.

Sheilbh

Quote from: DGuller on July 11, 2023, 12:12:01 PM
I think a big part of the solution to hallucination on the user side is to treat ChatGPT as a human with memory, not as a database.  If the stakes are high enough, you're not going to put total trust in one human's recollection; you'll verify what they have to say, but a conversation with even fallible humans can really get you started in the right direction.
But I think that gets to the point of the risk with generative AI: where and how it is being introduced. For example, integrating it into search as Google are planning - I don't think we currently treat search as a fallible human; rather, it is an indexing tool - and obviously that highlights the risk that if you've not found something on Google search now, turning to Bard is not going to help (though that seems to be a common thing people try).

Similarly, I know we've done some experiments on it generating headlines (on post-publication stories), as summarising is something ChatGPT is pretty good at. It generally does a decent job, especially if you give it a fixed format - so for example an interview, where you normally want a quote and the name of the interviewee in the headline. But there have still been cases where it's made up a quote that doesn't appear in the copy.
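
One cheap guard rail for the made-up-quote problem, at least as a sketch: after the model suggests a headline, mechanically check that any quoted span in it actually appears verbatim in the copy before a human ever looks at it. The model call would look like the comparison sketch earlier in the thread; the checking part is just string matching, and it obviously only catches verbatim fabrications, not paraphrases. The headline and copy below are made-up placeholders.

# Sketch of a post-generation sanity check: reject suggested headlines whose
# quoted text isn't actually present in the article copy.
import re

def quotes_in_copy(headline: str, copy: str) -> bool:
    """True only if every double-quoted span in the headline appears verbatim in the copy."""
    quoted_spans = re.findall(r'"([^"]+)"', headline)
    return all(span in copy for span in quoted_spans)

article_copy = "...full text of the interview..."           # placeholder
suggested = 'Minister says "we never discussed the deal"'   # hypothetical model output

if not quotes_in_copy(suggested, article_copy):
    print("Rejected: headline contains a quote not found in the copy.")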

One really interesting use - which is a bit in line with your "think of it as a human" point - is apparently giving it an article in quite a technical area, for example by a science or environment reporter, and basically asking if there are any bits it doesn't understand or finds confusing. Apparently it's generally pretty good at flagging things. Sometimes the reporter disagrees and doesn't change what they've written; other times it has pointed out an area that could be clearer for a general audience.

DGuller

Quote from: Sheilbh on July 11, 2023, 12:47:44 PM
Quote from: DGuller on July 11, 2023, 12:12:01 PM
I think a big part of the solution to hallucination on the user side is to treat ChatGPT as a human with memory, not as a database.  If the stakes are high enough, you're not going to put total trust in one human's recollection; you'll verify what they have to say, but a conversation with even fallible humans can really get you started in the right direction.
But I think that gets to the point of the risk with generative AI: where and how it is being introduced. For example, integrating it into search as Google are planning - I don't think we currently treat search as a fallible human; rather, it is an indexing tool - and obviously that highlights the risk that if you've not found something on Google search now, turning to Bard is not going to help (though that seems to be a common thing people try).

Similarly, I know we've done some experiments on it generating headlines (on post-publication stories), as summarising is something ChatGPT is pretty good at. It generally does a decent job, especially if you give it a fixed format - so for example an interview, where you normally want a quote and the name of the interviewee in the headline. But there have still been cases where it's made up a quote that doesn't appear in the copy.

One really interesting use - which is a bit in line with your "think of it as a human" point - is apparently giving it an article in quite a technical area, for example by a science or environment reporter, and basically asking if there are any bits it doesn't understand or finds confusing. Apparently it's generally pretty good at flagging things. Sometimes the reporter disagrees and doesn't change what they've written; other times it has pointed out an area that could be clearer for a general audience.
Regarding the first point, I think it's fair to say that humans currently have insufficient appreciation for uncertainty; I even know some data scientists who are very bad at truly grasping its implications.  Understanding uncertainty properly is very unnatural for humans; it's one of those "software hotfixes" that today you can only get through deliberate education followed by practice.  However, it's going to be the way of life going forward, as every AI model has at least some level of uncertainty, and AI models will only get more prevalent with time.

As for your second paragraph about generating headlines, I'm curious: why can't you treat it as a human the same way you do in your third paragraph?  What if you treated ChatGPT as an inexperienced assistant you asked to suggest a headline?  You're not just going to blindly take what your assistant gives you; you'll probably check it and hope that you won't have to waste a lot of time intervening.  Or am I being naive about how headlines are generated?

Sheilbh

Quote from: DGuller on July 11, 2023, 01:06:11 PM
Regarding the first point, I think it's fair to say that humans currently have insufficient appreciation for uncertainty; I even know some data scientists who are very bad at truly grasping its implications.  Understanding uncertainty properly is very unnatural for humans; it's one of those "software hotfixes" that today you can only get through deliberate education followed by practice.  However, it's going to be the way of life going forward, as every AI model has at least some level of uncertainty, and AI models will only get more prevalent with time.
Yeah, although I think it also requires a shift in how people understand or use search tools. I think it's less about uncertainty than about the shift from something that indexes results responding to your terms - where you may simply not be able to find something - to a tool that is like a fallible human. It's moving from an incomplete library index to a librarian, in that sense.

I think there's also a specific fear, from a media perspective, of it hallucinating something and attributing it to an article we didn't publish, because the reputational hit would be on us and it's something we wouldn't necessarily even know about when it's happening.

Quote
As for your second paragraph about generating headlines, I'm curious: why can't you treat it as a human the same way you do in your third paragraph?  What if you treated ChatGPT as an inexperienced assistant you asked to suggest a headline?  You're not just going to blindly take what your assistant gives you; you'll probably check it and hope that you won't have to waste a lot of time intervening.  Or am I being naive about how headlines are generated?
I think partly there's a range of mistakes you can expect an inexperienced human sub-editor to make in a newsroom. Something like making up a quote for the headline is well outside of that and is in immediate-termination territory. This may be quite media-specific, but one of the core parts of being a journalist (even a very inexperienced one) is not making things up, and that is, unfortunately, one of the problems at the minute. More acceptable mistakes, for example missing the point or choosing the wrong angle to lead on, are understandable - they may prompt a bit of a re-write, as in the third paragraph.

Obviously we're still playing around with it, and it will vary by sector - for example, for marketing or PR I can see immediate uses in terms of generating headlines/subject lines etc. I think it's going to be useful in the workplace (I'm looking forward to seeing reports back from people who've signed up for Google Workspace's AI).

But at the minute, in this sector, I think the hallucination issue is the big obstacle, and because the IP position is so unclear it's also just not possible to put content in before it's been published. The area where I think it could be immediately useful is possibly doing a live summary of a liveblog. At the minute those recaps are written by hand every 3-4 hours; having an AI do that could be easily labelled, and it would free up the human journalists' time for updating the actual liveblog/reporting (plus, by definition, it's post-publication stuff, so a bit more relaxed IP/risk-wise). The same goes for the clarifying use above (but that's of limited use because of IP and risk).

The sense I get - with a few exceptions - is a lot of playing it by ear, in part because an inexperienced assistant isn't necessarily a great pitch. It's an astonishing achievement, and I'm always very impressed seeing people use it or playing around with it myself. I suspect it's going to improve rapidly, but as a tool for this sector I think it's probably not there yet.

Also on headlines, there's a style point that would need to be developed, but I think that's probably the easiest thing to fix. At the moment it just summarises rather than coming up with a pun or a gag or something else to attract attention. I also suspect AI would struggle to compose a New York Times headline.

DGuller

I think that's a good point:  generally we think of assistants as capable of making mistakes, but we expect them to be honest.  Lack of trust is often a fatal flaw in such a working relationship.  With ChatGPT we have it backwards.

I guess that is going to require a different approach, though as I said, the obvious one is that you shouldn't expect it to be honest.  Maybe in some cases you can live with that and adjust to it, but in journalism that may just be a hard no for some tasks.  Some applications simply don't tolerate a stochastic element.

Josquius

I wonder whether they couldn't code honesty into it.
Make it not speak with full authority when it hasn't got a clue - make it say "I think X, but that's based on one not-great source."
I've seen that Bing's AI lets you switch between safe and more out-there thinking; I've not played with it much myself, but could that be something similar?
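
There's no honesty switch as such, but one signal you can get at - at least with open models, and via logprobs on some APIs - is how confident the model was in each token it produced, which could be used to flag low-confidence spans instead of letting them be stated with full authority. A rough sketch with GPT-2 below; whether token probability is a good proxy for factual reliability is itself an open question, and the 0.05 threshold is an arbitrary assumption.

# Sketch: surface per-token probabilities as a rough "how sure was the model" signal.
# Low-probability tokens could be flagged instead of being stated with full authority.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

text = "Vasily Blyukher was arrested in 1938."
ids = tokenizer(text, return_tensors="pt").input_ids

with torch.no_grad():
    logits = model(ids).logits

# Probability the model assigned to each actual token, given everything before it.
probs = torch.softmax(logits[0, :-1], dim=-1)
token_probs = probs[torch.arange(ids.shape[1] - 1), ids[0, 1:]]

for tok_id, p in zip(ids[0, 1:].tolist(), token_probs.tolist()):
    flag = "  <-- low confidence" if p < 0.05 else ""
    print(f"{tokenizer.decode([tok_id])!r:>12}  p={p:.3f}{flag}")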

HVC

ChatGPT getting dumber

At least GPT-4 is. Not sure why they're running multiple versions, but I guess there's a good reason.

DGuller

The original GPT-4 was probably too heavy on computing resources to keep using.  The number of parameters in these LLMs can be just mind-numbing, and at some point the amount of energy needed just to run the model becomes prohibitively expensive and isn't covered by the $20/month fee.  I find that the current one is still plenty smart, however.

viper37

Quote from: DGuller on July 20, 2023, 02:23:06 AM
The original GPT-4 was probably too heavy on computing resources to keep using.  The number of parameters in these LLMs can be just mind-numbing, and at some point the amount of energy needed just to run the model becomes prohibitively expensive and isn't covered by the $20/month fee.  I find that the current one is still plenty smart, however.

There's a free competing product, Llama 2.  It's made by Meta AI and it's open source.

Haven't tried it yet; it seems a tad more complicated than ChatGPT, but it's apparently better, at least compared to the free version.
https://ai.meta.com/llama/

Josquius

I've certainly heard talk that it can be useful for data crunching, though I'm not sure I see how, given the input limits.
Also, we've been explicitly told at work not to do this, given that it saves its input and there are huge GDPR issues with releasing this stuff into the public like that.
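
On the input limits: the usual workaround is just to chunk the data so each piece fits in the context window, run the chunks through separately, and then combine the results. A minimal sketch of the chunking step (tiktoken is OpenAI's tokenizer library; the token budget and file name are arbitrary assumptions). None of this helps with the GDPR point, of course, which is about where the data goes once it's sent.

# Sketch: split a long text into chunks that each fit under a token budget,
# so they can be sent to the model one at a time.  The budget is an arbitrary assumption.
import tiktoken

def chunk_text(text: str, max_tokens: int = 3000) -> list[str]:
    enc = tiktoken.get_encoding("cl100k_base")  # tokenizer used by the GPT-3.5/GPT-4 chat models
    tokens = enc.encode(text)
    return [
        enc.decode(tokens[i:i + max_tokens])
        for i in range(0, len(tokens), max_tokens)
    ]

chunks = chunk_text(open("big_report.txt").read())  # hypothetical input file
print(f"{len(chunks)} chunks to process separately")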

DGuller

Quote from: Josquius on July 22, 2023, 10:14:17 PM
I've certainly heard talk that it can be useful for data crunching, though I'm not sure I see how, given the input limits.
Also, we've been explicitly told at work not to do this, given that it saves its input and there are huge GDPR issues with releasing this stuff into the public like that.
It's a fair concern, and one of my biggest fears is an intelligent Google accidentally figuring something out based on a pattern of searches, but being too inhuman to realize that it should keep the conclusions to itself.  Even many years ago, much dumber recommendation algorithms were already accidentally outing pregnant teenage daughters to their parents because of the kinds of items they searched for.

That said, my own company explicitly allows ChatGPT usage, but with a strong caution that everyone is responsible for not putting any sensitive information into it.  The thinking is that this is a tool that can significantly improve productivity, and the genie is out of the bottle.  I'm sure that sooner rather than later there will be enterprise solutions specifically designed to comply with all privacy policies.
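
Until those enterprise options arrive, the "don't put sensitive information into it" rule can at least be partially backstopped by a crude scrub before anything is sent.  The sketch below is only an illustration of the idea, not a real privacy solution: the patterns are assumptions, and it obviously won't catch names or anything subtle.

# Very crude sketch: redact obvious identifiers before text is sent to an external model.
# The patterns here are illustrative assumptions, not a complete privacy solution.
import re

REDACTIONS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "[EMAIL]"),
    (re.compile(r"\+?\d[\d\s().-]{7,}\d"), "[PHONE]"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
]

def scrub(text: str) -> str:
    for pattern, placeholder in REDACTIONS:
        text = pattern.sub(placeholder, text)
    return text

prompt = scrub("Customer Jane Roe (jane.roe@example.com, +1 555 867 5309) reported...")
print(prompt)  # identifiers replaced before the prompt leaves the building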