this post was submitted on 07 Aug 2024
834 points (96.7% liked)
Technology
Is this llamafile?
The thing about LLMs is that hardly anyone knows how to write the ultra-low-level optimizations/runtimes, so projects port them from each other (llamafile largely borrows from llama.cpp AFAIK, albeit with some major contributions from their own devs).
Performance is insanely critical because they're so darn hard to run, and new models/features come out weekly which no sane dev can keep up with without massive critical mass (like HF Transformers, mainly, with llama.cpp barely keeping up with some major jank).
So... I'm not sure what Mozilla is thinking here. They don't have many devs of that kind, they don't have a GPU farm, they're not even contributing to promising WebAssembly projects like mlc-llm. They're just one of a bazillion companies that were ordered to get into AI with no real purpose or advantage. And while Gemma 2B may be the "first" model that's kinda OK on average PCs, we're still a long way away from easy mass local deployment.
Anyway, what I'm getting at is that I'm a local LLM tinkerer, and I've never touched or even looked at anything from Mozilla. The community would have if anything of theirs was super useful.
From what I've heard the general idea is to run AI search on your browsing history, which is a very useful feature. I'm not deep into AI tech at all, but to me it looks like that would involve local finetuning, since ingesting all that history during inference sounds like a bad idea. It also wouldn't need to generate stuff, only answer "Can you find that article about how nature makes blue feathers" by spitting out previously-read links that match that kind of thing. Also, tl;dr-bot it.
Oh and there's already AI, as in ML, in Firefox, in the form of machine translation. Language detection seems to be built-in; translating requires downloading a model per language pair, 16M parameters each. Trained on workstations with 8 GPUs. Which is all to say: You don't need gigantic GPU farms if you aren't training gazillion-parameter models on the whole internet.
It shouldn't be finetuning; if anything it should be RAG with an embeddings model + regular inference.
This is kinda cool, but it still doesn't seem to justify bogging down a machine with a huge LLM. And I am speaking as a massive local LLM enthusiast who uses them every day.
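To make the RAG suggestion concrete, here's a rough sketch of just the retrieval half over a browsing history. Big caveats: `embed` is a toy bag-of-words stand-in for a real embeddings model, and `HISTORY`, `search_history`, and the example.com URLs are all made up for illustration. This is not how Mozilla (or anyone) actually does it, only the general shape of the technique.

```python
import math
import re
from collections import Counter

# Hypothetical browsing history; the URLs and text are invented for this sketch.
HISTORY = [
    {"url": "https://example.com/blue-feathers",
     "title": "How nature makes blue feathers",
     "text": "Structural color in bird feathers scatters blue light."},
    {"url": "https://example.com/gpu-prices",
     "title": "GPU prices this quarter",
     "text": "Graphics card prices keep climbing."},
    {"url": "https://example.com/sourdough",
     "title": "Sourdough starter guide",
     "text": "Feed the starter flour and water daily."},
]

def embed(text):
    """Toy stand-in for an embeddings model: a word-count vector.
    A real setup would call a small neural embedding model here."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def search_history(query, history, k=2):
    """Return up to k URLs whose page content best matches the query."""
    q = embed(query)
    scored = [(cosine(q, embed(p["title"] + " " + p["text"])), p["url"])
              for p in history]
    scored.sort(reverse=True)
    # Drop pages with zero overlap so unrelated links are never returned.
    return [url for score, url in scored[:k] if score > 0]

results = search_history(
    "that article about how nature makes blue feathers", HISTORY)
print(results)
```

In a full RAG setup the retrieved pages would then be pasted into the prompt for regular inference (e.g. to summarize them), but for plain "find that link again" queries the retrieval step alone already does the job, with no finetuning and no generation.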