zappy

joined 1 year ago
[–] zappy@lemmy.ca 1 points 1 year ago

That's kind of the point, and it's where the model differs from a human. A human is going to weight local/recent contextual information as much more relevant to the conversation because they're actively learning and storing the information (our brains work on more of an associative memory basis than a temporal one). With our current models, however, that's simulated by decaying weights over the data stream. So when you get conflicts between contextually correct and "globally" correct output, global has a tendency to win out, and more obviously so. Remember that you can't actually make changes to the model as a user without active learning. Thus the model will always eventually return to its original behaviour, as long as you can fill up the memory.

[–] zappy@lemmy.ca 2 points 1 year ago (2 children)

I'm trying to tell you limited context is a feature, not a bug; other bots like Replika do the same thing. Even when all past data is stored server-side and available, it won't matter, because you need to reduce the weighting of older data or you prevent significant change in output values (and the change gets smaller as the history grows larger). Time decay of information is important to making these systems useful.
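A minimal sketch of the weighting argument, assuming a plain weighted average stands in for "output values". The function names and the `decay` value are made up for illustration; this is not how any particular chatbot is implemented.

```python
def influence_uniform(n_history: int) -> float:
    """Share of a plain running mean taken by one new observation
    when n_history older observations are already stored."""
    return 1.0 / (n_history + 1)


def influence_decayed(n_history: int, decay: float = 0.9) -> float:
    """Share taken by the newest observation when each older one
    is down-weighted by decay**age (age 0 is the newest)."""
    weights = [decay ** age for age in range(n_history + 1)]
    return weights[0] / sum(weights)


# Compare how much a single new message can move the average
# as the stored history grows.
for n in (10, 100, 1000):
    print(n, influence_uniform(n), influence_decayed(n))
```

With uniform weights the newest message's share shrinks toward zero as the history grows, while with decay it levels off near `1 - decay`, so new input can still change the output.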

[–] zappy@lemmy.ca 1 points 1 year ago

I hear this from Americans a lot. Here everything is pretty much online nowadays (although a friend of mine had her identity stolen, so she has to go in person, which is her biggest complaint about the whole thing).

[–] zappy@lemmy.ca 3 points 1 year ago (4 children)

The problem isn't the memory capacity; even though the LLM can store the information, it's about prioritization/weighting. For example, if I tell ChatGPT not to include a word (say, apple) in its responses, then ask it some other questions, and then ask it what the popular fruit-based pies are, it will tend to pick the "better" answer that includes apple pie rather than follow the rule I gave it a while ago about not using the word apple. We do want decaying weights on memory because most of the time old information isn't as relevant, but it's one of those things that needs optimization. Imo we're going to get to the point where the optimal parameters for maximizing "usefulness" to the average user are different enough from what's needed to pass someone intentionally testing the AI, mostly bc we know from other AI (like Siri) that people don't actually need that much context saved to find them helpful.
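A toy sketch of the apple example, assuming the rule's weight decays geometrically each turn while the "global" preference stays fixed. The function name and all numbers are invented for illustration and say nothing about ChatGPT's actual internals.

```python
def first_turn_rule_is_overridden(rule_weight: float,
                                  global_preference: float,
                                  decay: float) -> int:
    """Return the first turn at which the decayed rule weight
    falls below the fixed global preference."""
    if not 0.0 < decay < 1.0:
        raise ValueError("decay must be between 0 and 1")
    if global_preference <= 0.0:
        raise ValueError("global_preference must be positive")
    turn = 0
    weight = rule_weight
    while weight >= global_preference:
        turn += 1
        weight *= decay  # the old instruction matters a bit less each turn
    return turn


# Hypothetical values: the "no apple" rule starts at 1.0 and the
# preference for the usual answer is 0.5.
print(first_turn_rule_is_overridden(1.0, 0.5, decay=0.8))
print(first_turn_rule_is_overridden(1.0, 0.5, decay=0.95))
```

A slower decay keeps the rule in force for more turns, which is the parameter trade-off between everyday usefulness and surviving a deliberate test.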

[–] zappy@lemmy.ca 2 points 1 year ago

You don't get to complain about people being condescending to you when you are going around literally copying and pasting Wikipedia. Also, you're not right: major progress in this field started in the 80s. Although the concepts were published earlier, they were basically ignored by researchers. You're making it sound like the NNs we're using now are the same as in the 60s, when in reality our architectures, and even just how we approach the problem, have changed significantly. It's not until the 90s-00s that we started getting decent results that could even match older ML techniques like SVM or kNN.

[–] zappy@lemmy.ca 2 points 1 year ago (1 children)

Last time I talked about this with the other TAs, we ended up coming to the conclusion that most papers that were decent were close to the max word count or above it (I don't think the students were really treating it as a max, more like a target). Like 50% of the word count really wasn't enough to actually complete the assignment

[–] zappy@lemmy.ca 0 points 1 year ago (2 children)

The idea of NN or the basis itself is not AI. If you had actually read D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning Internal Representations by Error Propagation.” Sep. 01, 1985. then you would understand this bc that paper is about a machine learning technique, not AI. If you had done your research properly instead of just reading Wikipedia, then you would have also come across autoassociative memory, which is the precursor to autoencoders and generative autoencoders, which are the foundation of a lot of what we now think of as AI models. H. Abdi, “A Generalized Approach For Connectionist Auto-Associative Memories: Interpretation, Implication Illustration For Face Processing,” in J. Demongeot (Ed.) Artificial, University Press, 1988, pp. 151–164.

[–] zappy@lemmy.ca 3 points 1 year ago* (last edited 1 year ago)

Over-enthusiastic English teachers... and Skynet (cue dramatic music)

[–] zappy@lemmy.ca 1 points 1 year ago (4 children)

Not the specific models unless I've been missing out on some key papers. The 90s models were a lot smaller. A "deep" NN used to be 3 or more layers and that's nothing today. Data is a huge component too

[–] zappy@lemmy.ca 13 points 1 year ago

So I'm a researcher in this field and you're not wrong, there is a load of hype. The area that's been getting the most attention lately is specifically generative machine learning techniques. The techniques are not exactly new (some date back to the 80s/90s) and they aren't actually that good at learning. By that I mean they need a lot of data and computation time to get good results, two things that have gotten easier to access recently. However, it isn't always a requirement to have such a complex system. Even ELIZA, a chatbot made back in 1966, gives responses surprisingly similar to those of some therapy chatbots today without using any machine learning. You should try it and see for yourself; I've seen people fooled by it and the code is really simple. Also, people think things like Kalman filters are "smart" but it's just straightforward math, so I guess the conclusion is people have biased opinions.
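To give a feel for how simple that kind of chatbot can be, here is a minimal ELIZA-style sketch. The rules below are hypothetical examples, not the original 1966 script, and the real program also did things like swapping pronouns, which this leaves out.

```python
import re

# Hypothetical pattern/reply rules for illustration only.
RULES = [
    (re.compile(r"\bI need (.+)", re.IGNORECASE), "Why do you need {0}?"),
    (re.compile(r"\bI am (.+)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"\bmy (mother|father)\b", re.IGNORECASE),
     "Tell me more about your {0}."),
]
DEFAULT_REPLY = "Please go on."


def respond(message: str) -> str:
    """Reply using the first rule whose pattern matches, echoing back
    the captured fragment; fall back to a generic prompt otherwise."""
    for pattern, template in RULES:
        match = pattern.search(message)
        if match:
            fragment = match.group(1).rstrip(".!?")
            return template.format(fragment)
    return DEFAULT_REPLY


print(respond("I need a holiday."))
print(respond("The weather is nice"))
```

There is no learning anywhere in this: it is pattern matching plus canned templates, which is the whole point of the comparison.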

[–] zappy@lemmy.ca 3 points 1 year ago (1 children)

Do you ever pretend to be a robot just to mess with people?

[–] zappy@lemmy.ca 3 points 1 year ago

That's true. Also, at some point the human will go "that's too much work, I'm not going to answer that", but the AI will always try to give you its best response. Like, I could look up the unicode characters you're using but I'd never actually take the time to do that.
