this post was submitted on 29 Jun 2023
73 points (98.7% liked)

Reddit

twelve@sh.itjust.works 18 points 1 year ago (3 children)

I still find it astonishing that TechCrunch buys the ML model training argument.

No one in their right mind would use the API (which has always been rate limited) to fetch data for text generation. People would use plain HTTP or, even better, archives of Reddit.

Why? Because the rate limit is more generous or absent, there is no need to write anything (only read), and it will stay free 🙂 Also, very fresh data is not especially useful (except in specific corner cases where something in the news changes the way we talk).

Hotzilla@sopuli.xyz 7 points 1 year ago* (last edited 1 year ago)

Web crawling has always worked through raw HTTP requests and HTML parsing. Why write site-specific API calls that require authentication and are throttled?

This excuse is pure bullshit.
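
To illustrate the "raw HTTP/HTML parsing" point, here is a minimal sketch that pulls comment text out of an already-downloaded page using only Python's standard library. The `comment` class name and the sample markup are assumptions made up for the example, not Reddit's actual HTML, and the names `CommentTextParser` and `extract_comments` are hypothetical.

```python
from html.parser import HTMLParser


class CommentTextParser(HTMLParser):
    """Collects the text inside <div class="comment"> elements.

    The class name "comment" is an invented example, not any real site's markup.
    """

    def __init__(self):
        super().__init__()
        self.depth = 0      # nesting depth of <div> tags inside a comment
        self.comments = []  # finished comment texts
        self._parts = []    # text fragments of the comment being read

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth > 0:
            # A nested <div> inside a comment we are already reading.
            self.depth += 1
        elif ("class", "comment") in attrs:
            # Start of a new comment block.
            self.depth = 1
            self._parts = []

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                # Comment closed: join fragments and collapse whitespace.
                self.comments.append(" ".join("".join(self._parts).split()))

    def handle_data(self, data):
        if self.depth > 0:
            self._parts.append(data)


def extract_comments(html):
    """Return the text of every comment block found in the HTML string."""
    parser = CommentTextParser()
    parser.feed(html)
    parser.close()
    return parser.comments
```

Nothing here needs an API key or an authenticated session. The page itself would be fetched with an ordinary HTTP GET, which is the whole point of the comment above.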

AstralJaeger@lemmy.ml 5 points 1 year ago

Considering the Reddit API has a hilariously low limit, I fully understand why the AI bros will use a scraping approach instead. I've built small Discord bots that had a difficult time keeping up through the API because so few requests were available! I was in the process of building an event-driven system that used multiple API tokens in order to keep up with multiple feeds. It's just terrible.
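
For illustration only (this is not the commenter's actual code), the multiple-token idea can be sketched as a small pool that tracks a sliding-window request count per token. The `TokenPool` name, the token strings, and the limit and window values are all hypothetical and do not reflect Reddit's real quotas.

```python
import time
from collections import deque


class TokenPool:
    """Hands out API tokens while keeping each one under its own rate limit."""

    def __init__(self, tokens, limit, window, clock=time.monotonic):
        self.limit = limit    # max requests per token per window (assumed value)
        self.window = window  # window length in seconds (assumed value)
        self.clock = clock    # injectable clock, handy for testing
        # For each token, the timestamps of its recent requests.
        self.history = {token: deque() for token in tokens}

    def acquire(self):
        """Return a token usable right now, or None if all are exhausted."""
        now = self.clock()
        for token, stamps in self.history.items():
            # Forget requests that have left the sliding window.
            while stamps and now - stamps[0] >= self.window:
                stamps.popleft()
            if len(stamps) < self.limit:
                stamps.append(now)
                return token
        return None
```

A caller would ask the pool for a token before each request and back off when it gets `None`, for example `pool = TokenPool(["token-a", "token-b"], limit=60, window=60.0)` followed by `pool.acquire()`. Having to juggle tokens like this just to follow a few feeds is the kind of overhead the comment is complaining about.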

olizet@lemmy.works 3 points 1 year ago

Another proof of Reddit's incompetence.