this post was submitted on 01 Oct 2024

88 points (81.9% liked)

Why are we training AIs on reddit posts instead of Research Papers? We could be saving the world! (lemmy.dbzer0.com)

submitted 1 month ago by Melatonin@lemmy.dbzer0.com to c/asklemmy@lemmy.ml

[–] TheOubliette@lemmy.ml 23 points 1 month ago (21 children)

"AI" is a parlor trick. Very impressive at first, then you realize there isn't much to it that is actually meaningful. It regurgitates language patterns, patterns in images, etc. It can make a great Markov chain. But if you want to create an "AI" that just mines research papers, it will be unable to do useful things like synthesize information or describe the state of a research field. It is incapable of critical or analytical approaches. It will only be able to answer simple questions with dubious accuracy and to summarize texts (also with dubious accuracy).

Let's say you want to understand research on sugar and obesity using only a corpus from peer reviewed articles. You want to ask something like, "what is the relationship between sugar and obesity?". What will LLMs do when you ask this question? Well, they will just attempt to do associations and to construct reasonable-sounding sentences based on their set of research articles. They might even just take an actual sentence from an article and reframe it a little, just like a high schooler trying to get away with plagiarism. But they won't be able to actually explain the underlying mechanisms and will fall flat on their face when trying to discern nonsense funded by food lobbies from critical research. LLMs do not think or criticize. If they do produce an answer that suggests controversy it will be because they either recognized diversity in the papers or, more likely, their corpus contains review articles that criticize articles funded by the food industry. But they will be unable to actually criticize the poor work or provide a summary of the relationship between sugar and obesity based on any actual understanding that questions, for example, whether this is even a valid question to ask in the first place (bodies are not simple!). They can only copy and mimic.

[–] howrar@lemmy.ca 2 points 1 month ago* (last edited 1 month ago) (12 children)

Why does everyone keep calling them Markov chains? They're missing ~~all the required properties, including~~ the eponymous Markovian property. Wouldn't it be more correct to call them stochastic processes?

Edit: Correction, turns out the only difference between a stochastic process and a Markov process is the Markovian property. It's literally defined as "stochastic process but Markovian".
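For reference, the Markovian property for a discrete-time process is usually written as below. The symbols $X_n$ and $x_n$ are introduced here for illustration and do not appear elsewhere in the thread.

```latex
\[
P(X_{n+1} = x \mid X_n = x_n, X_{n-1} = x_{n-1}, \dots, X_0 = x_0)
  = P(X_{n+1} = x \mid X_n = x_n)
\]
```

In words: given the current state, the earlier history adds no information about the next state. A general stochastic process need not satisfy this.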

[–] TheOubliette@lemmy.ml 2 points 1 month ago (11 children)

Because it's close enough. Turn off beam and redefine your state space and the property holds.

[–] howrar@lemmy.ca 4 points 1 month ago (1 children)

Why settle for good enough when you have a term that is both actually correct and more widely understood?

[–] TheOubliette@lemmy.ml 0 points 1 month ago (1 children)

What term is that?

[–] howrar@lemmy.ca 2 points 1 month ago (1 children)

Stochastic process

[–] TheOubliette@lemmy.ml 1 points 1 month ago (1 children)

But that's so vague. Molecules semi-randomly smashing into each other is a stochastic process.

[–] howrar@lemmy.ca 2 points 1 month ago (1 children)

That's basically like saying that typical smartphones are square because it's close enough to rectangle and rectangle is too vague of a term. The point of more specific terms is to narrow down the set of possibilities. If you use "square" to mean the set of rectangles, then you lose the ability to do that and now both words are equally vague.

[–] TheOubliette@lemmy.ml 1 points 1 month ago (1 children)

Is this referring to what I said about Markov chains or stochastic processes? If it's the former, the only discriminating factor is beam and not all LLMs use that. If it's the latter, then I don't know what you mean. Molecular diffusion is a classic stochastic process; I am 100% correct in my example.

[–] howrar@lemmy.ca 1 points 1 month ago (1 children)

It's in reference to your complaint about the imprecision of "stochastic process". I'm not disagreeing that molecular diffusion is a stochastic process. I'm saying that if you want to use "Markov process" to describe a non-Markovian stochastic process, then you no longer have the precision you're looking for and now molecular diffusion also falls under your new definition of Markov process.

[–] TheOubliette@lemmy.ml 0 points 1 month ago (1 children)

Okay so both of those ideas are incorrect.

As I said, many are literally Markovian and the main discriminator is beam, which does not really matter for helping people understand my meaning nor should it confuse anyone that understands this topic. I will repeat: there are examples that are literally Markovian. In your example, it would be me saying there are rectangular phones but you step in to say, "but look those ones are curved! You should call it a shape, not a rectangle." I'm not really wrong and your point is a nitpick that makes communication worse.

In terms of stochastic processes, no, that is incredibly vague just like calling a phone a "shape" would not be more descriptive or communicate better. So many things follow stochastic processes that are nothing like a Markov chain, whereas LLMs are like Markov Chains, either literally being them or being a modified version that uses derived tree representations.

[–] howrar@lemmy.ca 0 points 1 month ago (1 children)

I'm not familiar with the term "beam" in the context of LLMs, so that's not factored into my argument in any way. LLMs generate text based on the history of tokens generated thus far, not just the last token. That is by definition non-Markovian. You can argue that an augmented state space would make it Markovian, but you can say that about any stochastic process. Once you start doing that, both become mathematically equivalent. Thinking about this a bit more, I don't think it really makes sense to talk about a process being Markovian or not without a wider context, so I'll let this one go.
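To make the augmented-state point concrete, here is a minimal sketch. It assumes a toy generator (not an LLM) whose next token depends on the last two tokens; all names (`next_token_probs`, `step`, `transition_row`, `generate`) and the probabilities are hypothetical. Over single tokens the process is not first-order Markov, but over the pair of last two tokens it is.

```python
import random

# Hypothetical toy "model": the next token depends on the last TWO tokens,
# so the process over single tokens is not first-order Markov.
def next_token_probs(history):
    a, b = history[-2], history[-1]
    other = "B" if b == "A" else "A"
    if a == b:
        # After a repeated token, strongly prefer switching.
        return {other: 0.9, b: 0.1}
    # Otherwise, mildly prefer repeating the last token.
    return {b: 0.6, other: 0.4}

# Augmented state: the tuple of the last two tokens. The next state is drawn
# from a distribution that depends only on the current state, so the chain
# over augmented states is Markov by construction.
def step(state, rng):
    probs = next_token_probs(state)
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    nxt = rng.choices(tokens, weights=weights, k=1)[0]
    return (state[1], nxt)

def transition_row(state):
    """Exact transition distribution out of one augmented state."""
    return {(state[1], tok): p for tok, p in next_token_probs(state).items()}

def generate(n, seed=0):
    """Sample n further tokens, starting from the augmented state ("A", "B")."""
    rng = random.Random(seed)
    state = ("A", "B")
    out = list(state)
    for _ in range(n):
        state = step(state, rng)
        out.append(state[1])
    return "".join(out)

# Same last token "A", different earlier token, different next-token odds:
# this is what breaks the Markov property over single tokens.
p_after_AA = next_token_probs(("A", "A"))["A"]
p_after_BA = next_token_probs(("B", "A"))["A"]
```

The same construction applies, at least in principle, to a model with a bounded context window if the state is taken to be the whole window, which is also why the trick works for nearly any process and says little on its own.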

> nitpick that makes communication worse

How many readers do you think know what "Markov" means? How many would know what "stochastic" or "random" means? I'm willing to bet that the former is a strict subset of the latter.

[–] TheOubliette@lemmy.ml 0 points 1 month ago (1 children)

The very first response I gave said you just have to reframe state.

This is getting repetitive and I think it is because you aren't really trying to understand what I am saying. Please let me know when you are ready to have an actual conversation.

[–] howrar@lemmy.ca 1 points 1 month ago

> The very first response I gave said you just have to reframe state.

And I said "an augmented state space would make it Markovian". Is that not what you meant by reframing the state? If not, then apologies for the misunderstanding. I do my best, but I understand that falls short sometimes.
