Technology

Why wordfreq will not be updated - AI spam (github.com)

submitted 1 month ago by Templa@beehaw.org to c/technology@beehaw.org

[–] tal@lemmy.today 21 points 1 month ago (7 children)

wordfreq is not just concerned with formal printed words. It collected more conversational language usage from two sources in particular: Twitter and Reddit.

Now Twitter is gone anyway, its public APIs have shut down,

Reddit also stopped providing public data archives, and now they sell their archives at a price that only OpenAI will pay.

There's still the Fediverse.

I mean, that doesn't solve the LLM pollution problem, but...

[–] Melody@lemmy.one 9 points 1 month ago (6 children)

I'm going to be bold enough to say we don't have as wide of an AI/LLM issue on the Fediverse as the other platforms will have.

I'm certain that if someone did collect data from the Fediverse, it would become a hot topic, and it might not be enough data anyway, as the Fediverse is not mainstream enough. So the data and language collected here might skew in a few imaginable ways that one might find undesirable for a general model of word frequencies.

There's also the fact that people might not appreciate that data being collected. Let's be real. It's too soon for such a project to begin. The AI TREND MUST DIE as it currently lives, and its corpse must be rotted away completely. Now, in internet time that may not be all that long...a few to several years...the memory of the internet can be short-lived at times. It must, however, fade from the public consciousness into some obscurity first.

Once the technology no longer lies in greedy hands, new development can begin anew.

[–] Danterious@lemmy.dbzer0.com 5 points 1 month ago (1 children)

I’m going to be bold enough to say we don’t have as wide of an AI/LLM issue on the Fediverse as the other platforms will have.

Why do you think that? I don't think that there is anything systemic in how the fediverse operates that will stop LLMs polluting the discourse here too. Actually I already think that they are polluting the discourse here.

Anti Commercial-AI license (CC BY-NC-SA 4.0)

[–] Melody@lemmy.one 2 points 1 month ago (1 children)

The filtration capabilities available to most users are pretty robust, depending on what you use to interact with the Fediverse. I think it would be possible to filter out problematic bots, users and even whole domain sources with the right kind of software.
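As a rough illustration of that kind of filtering, here is a minimal Python sketch. It assumes posts have already been fetched into simple records; the `Post` fields, the `filter_posts` helper, and the example handles and domains are all hypothetical and do not correspond to any real Fediverse client or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    """Hypothetical record for one fetched post (not a real API type)."""
    author: str   # full handle, e.g. "alice@good.example"
    is_bot: bool  # assumed: the account declares itself a bot
    text: str


def instance_of(author: str) -> str:
    """Return the lowercased domain part of a user@domain handle."""
    return author.rpartition("@")[2].lower()


def filter_posts(posts, blocked_users=(), blocked_domains=(), drop_bots=True):
    """Keep only posts that pass the user, domain, and bot filters."""
    users = {u.lower() for u in blocked_users}
    domains = {d.lower() for d in blocked_domains}
    kept = []
    for post in posts:
        if drop_bots and post.is_bot:
            continue  # self-declared bot account
        if post.author.lower() in users:
            continue  # individually blocked user
        if instance_of(post.author) in domains:
            continue  # whole instance blocked
        kept.append(post)
    return kept
```

This only catches accounts that declare themselves as bots or that you have already decided to block; it does nothing about undeclared LLM-generated text.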

[–] Danterious@lemmy.dbzer0.com 1 points 1 month ago (1 children)

Good point. I have been a lot more active in tailoring my experience here compared to other social media. I wish there were more tools for deciding whether or not you want to block someone though. Sometimes it's not as simple as just looking at their post history. Also, as an aside, I wish it was possible to block votes as well, so the ranking of the content could also be personalized.

Anti Commercial-AI license (CC BY-NC-SA 4.0)

[–] Melody@lemmy.one 2 points 1 month ago (1 children)

Such a system might be constructed for one's own scraping needs by taking any one of the current frontends or backends and customizing its behavior so that it mitigates issues and ingests or ignores data based on your own inputs, such that your model could be "riding along on a human surfboard with human guidance".

[–] Danterious@lemmy.dbzer0.com 1 points 1 month ago

such that your model could be “riding along on a human surfboard with human guidance”

Sorry I don't really understand what you're saying here.

Anti Commercial-AI license (CC BY-NC-SA 4.0)
