this post was submitted on 09 Jun 2023
471 points (98.8% liked)

Memes

[–] argv_minus_one@beehaw.org 1 points 1 year ago (1 children)

That problem is already solved. Google and Microsoft are already fetching every single page on Reddit for search engine indexing.

[–] sealneaward@lemmy.ml 1 points 1 year ago (1 children)

Could they be doing that only because Reddit's API is still open, which is about to change? It seems easier for them right now, and I'd expect it to get tougher once the API changes are implemented.

[–] argv_minus_one@beehaw.org 1 points 1 year ago* (last edited 1 year ago)

No. Search engines fetch pages using plain old HTTP GET requests, the same way browsers do, not through the API. There is some difficulty in parsing the HTML and extracting meaningful content, but it's too late: the HTML is already stored on Google/Microsoft servers, ready for extraction, and there's nothing Reddit can do to stop them.
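To make that concrete, here's a rough sketch in Python (standard library only) of what "fetch with GET, then pull the text out of the HTML" could look like. It's an illustration, not how Google or Bing actually do it: the names `fetch_html`, `extract_text`, `TextExtractor` and the `example-crawler/0.1` user agent are all made up for this example, and a real crawler would also handle robots.txt, rate limiting, and much messier markup.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_text(html):
    """Return the visible text of an HTML document as one string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def fetch_html(url, user_agent="example-crawler/0.1"):
    """Fetch a page with a plain HTTP GET (hypothetical user agent)."""
    # A Request with no body is sent as GET, just like a browser page load.
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Something like `extract_text(fetch_html(url))` would give you the page text for any publicly reachable URL. Nothing here touches an API or needs a key, which is the point: anything a logged-out browser can see, a crawler can fetch the same way.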

Reddit can make future content harder to extract, but not without also making it invisible to search engines, which would cause Reddit to disappear from Google Search and Bing. That would destroy Reddit even faster than a mass moderator exodus would.

That's why I say trying to charge money for AI training data is a fool's errand. These facts make it impossible. That doesn't mean Spez won't try, but it does mean he won't succeed.