cross-posted from: https://fanaticus.social/post/1955
Hi all, just wanted to get the discussion around mod tools and a pushshift for lemmy started. Sorry if this is a duplicate but I haven't been able to find any discussion about this topic.
If there is one thing we learned from reddit and its third-party API, it is that mod tools are of the utmost importance for developing a thriving community. Pushshift is a powerful tool that allows its users to query aggregated data in their workflows.
The data lemmy users create (posts and comments) is valuable. Moderators use it to make informed decisions and improve the experience of their communities; researchers use it in their studies; LLMs are trained on it; internet searchers use it to find answers and opinions written by real people.
I think as admins we need to be clear up-front about the licensing of the content created on our site. I plan on specifying a Creative Commons license for my instance and would like to get some opinions on which would be best for the community.
Once properly licensed, I think it would be in the lemmy community's best interest to provide our community's data in aggregate (scrubbed of PII of course) for all those that need access to it to build tools for the community. People interested in our data will attempt to retrieve it anyway, whether through scraping or direct API access, so it is not only beneficial for our communities to make this data more easily accessible, but also for our servers.
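To make the "scrubbed of PII" step a bit more concrete, here is a rough Python sketch of what a scrubbing pass over exported posts and comments could look like. Everything in it is an assumption for illustration: the field names (`author`, `email`, `ip_address`, `body`), the salt handling, and the helper names are hypothetical and not taken from Lemmy's actual schema or API. A salted hash only pseudonymizes usernames, and the regex only catches email addresses, so a real pipeline would need more than this.

```python
import hashlib
import re

# Hypothetical field names; a real Lemmy export may be shaped differently.
DROP_FIELDS = {"email", "ip_address"}

# Deliberately simple pattern: catches common email addresses only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def pseudonymize(author: str, salt: str) -> str:
    """Replace a username with a truncated, salted SHA-256 digest.

    The same author and salt always map to the same token, so threads
    stay linkable without exposing the username itself.
    """
    digest = hashlib.sha256((salt + author).encode("utf-8")).hexdigest()
    return digest[:16]


def scrub_record(record: dict, salt: str) -> dict:
    """Return a copy of one post/comment record with obvious PII removed."""
    # Drop fields that should never leave the instance.
    clean = {k: v for k, v in record.items() if k not in DROP_FIELDS}
    # Swap the username for a stable pseudonym.
    if "author" in clean:
        clean["author"] = pseudonymize(clean["author"], salt)
    # Mask email addresses that users typed into the text itself.
    if "body" in clean:
        clean["body"] = EMAIL_RE.sub("[email removed]", clean["body"])
    return clean
```

An instance admin would run something like `scrub_record(row, salt)` over each row before writing the bulk dump, keeping the salt private so the pseudonyms cannot be trivially reversed by hashing known usernames.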
Finally, once we establish our best practices for aggregating our data, we should begin work on building/forking/integrating with pushshift for lemmy. That will allow developers to build the mod tools our communities need to thrive.
TL;DR: establish an open license for our content, provide access to PII-scrubbed data in bulk, build pushshift for lemmy, create better mod tools, (don't) profit.