this post was submitted on 21 Feb 2024
506 points (97.7% liked)

Technology

[–] avidamoeba@lemmy.ca 102 points 8 months ago (1 children)

Perhaps it's becoming clear that search needs to become a common cooperatively managed infrastructure similar to Wikipedia. That this is in the best interest of everyone but advertisers and spammers.

[–] Bizarroland@kbin.social 53 points 8 months ago (2 children)

Too bad the Mozilla Foundation didn't pivot to that instead of whatever the hell they're doing with AI.

[–] Damage@feddit.it 80 points 8 months ago

They can't. Google is their main source of income.

[–] avidamoeba@lemmy.ca 19 points 8 months ago (2 children)

Truly. I wonder if ActivityPub could be used to create a resilient search engine that shares the cost among federated instances. We already have something like that in Lemmy and Mastodon, where federated data can be searched from any instance. If the data were pages crawled by an automatic crawler and then federated across instances, which in turn let users search through it, it might resemble a search engine. Page ranking beyond text matching could even be done by people's up/down votes instead of some arbitrary algorithm, similar to how voting works on StackExchange or Lemmy. 🤔 I'm sure someone is thinking about this.

[–] deur@feddit.nl 26 points 8 months ago (2 children)

The answer to your question is no, federation is not an appropriate model for internet scale search.

[–] Sigh_Bafanada@lemmy.world 4 points 8 months ago (2 children)

Yeah I think you need a centralized system with decentralized ownership, so that no single party can fuck it up by themselves

[–] ben_dover@lemmy.world 2 points 8 months ago (1 children)
[–] Sigh_Bafanada@lemmy.world 1 points 8 months ago

I mean yeah exactly

[–] avidamoeba@lemmy.ca 1 points 8 months ago

Yeah, decentralized ownership or democratic ownership would be another way to achieve this. A federated system, even if possible, would almost certainly be less efficient resource-wise.

[–] avidamoeba@lemmy.ca 1 points 8 months ago* (last edited 8 months ago)

Just to be clear, what I'm referring to here is that a search would occur on a single instance. E.g. searches on lemmy.world occur on the lemmy.world instance and load lemmy.world's servers. The federated part is in building the database on lemmy.world. E.g. a crawler or a user on lemmy.ca adds a new web site and that record is federated to lemmy.world to add to its database. Another user on feddit.de upvotes a search result and that upvote is federated to lemmy.world, so that the search result shows higher for users searching on lemmy.world. In this kind of model individual search instances could in fact be very large, depending on their usage. If there's no limit to what's federated, every instance has to hold the complete index, which puts a lower bound on the size of instances. If there's a limit (something dumb like federate only search records for *.fr domains), then that would allow for smaller instances that don't have the compute and storage for the complete index.
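To make that a bit more concrete, here's a rough Python sketch of the model described above. It's a toy under loud assumptions: everything lives in one process, "federation" is just a method call on peer objects rather than real ActivityPub delivery, and the names `SearchInstance`, `PageRecord`, `domain_suffix` and `small.example` are made up for illustration. Ranking is plain substring matching ordered by net votes.

```python
from dataclasses import dataclass, field


@dataclass
class PageRecord:
    url: str
    text: str
    votes: int = 0  # net up/down votes received via federation


@dataclass
class SearchInstance:
    name: str
    # Hypothetical federation limit: only keep records whose host ends
    # with this suffix. The empty string accepts everything.
    domain_suffix: str = ""
    index: dict = field(default_factory=dict)
    peers: list = field(default_factory=list)

    def accepts(self, url):
        host = url.split("//", 1)[-1].split("/", 1)[0]
        return host.endswith(self.domain_suffix)

    def receive_page(self, url, text):
        # Store a federated (or local) record if it passes the limit.
        if self.accepts(url) and url not in self.index:
            self.index[url] = PageRecord(url, text)

    def receive_vote(self, url, delta):
        # Votes for pages this instance does not index are ignored.
        if url in self.index:
            self.index[url].votes += delta

    def add_page(self, url, text):
        # A crawler or user adds a page here; the record goes to peers.
        self.receive_page(url, text)
        for peer in self.peers:
            peer.receive_page(url, text)

    def vote(self, url, delta):
        # A user votes here; the vote goes to peers.
        self.receive_vote(url, delta)
        for peer in self.peers:
            peer.receive_vote(url, delta)

    def search(self, query):
        # The search itself runs only against this instance's own index.
        terms = query.lower().split()
        hits = [r for r in self.index.values()
                if all(t in r.text.lower() for t in terms)]
        return [r.url for r in sorted(hits, key=lambda r: r.votes, reverse=True)]


world = SearchInstance("lemmy.world")
fr_only = SearchInstance("small.example", domain_suffix=".fr")
ca = SearchInstance("lemmy.ca", peers=[world, fr_only])
de = SearchInstance("feddit.de", peers=[world])

ca.add_page("https://a.example/rust", "rust search engine tutorial")
ca.add_page("https://b.example.fr/rust", "rust search engine notes")
de.vote("https://b.example.fr/rust", +1)

print(world.search("rust search"))    # full index, upvoted page first
print(fr_only.search("rust search"))  # only the *.fr record was kept
```

The point of the sketch is just the split: writes (new records, votes) fan out between instances, reads (searches) never leave the instance you're on. A real system would also need deduplication, signed activities, and some defence against vote spam, none of which is shown here.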

[–] umbrella@lemmy.ml 1 points 8 months ago (1 children)

the biggest question would be how to defend it from spammers and corporations with potentially much more money.

[–] avidamoeba@lemmy.ca 2 points 8 months ago* (last edited 8 months ago) (1 children)

One answer that's proven to work is to involve a lot of people's labor in the editorial/curation process, similar to how posting/commenting/voting/moderation work on Lemmy, and how it's worked on Reddit and other human-driven platforms. Corporations have shown on multiple occasions that paying for this labor is not feasible, so a system that depends on it should be corpo-resistant or capital-resistant.

[–] umbrella@lemmy.ml 2 points 8 months ago* (last edited 8 months ago)

well reddit did that and it was full of shills, bots, vote manipulation, and more. this approach completely failed for them.

and they do put a lot of money into it.