this post was submitted on 21 Jul 2023
24 points (92.9% liked)
Piracy: ꜱᴀɪʟ ᴛʜᴇ ʜɪɢʜ ꜱᴇᴀꜱ
54500 readers
363 users here now
⚓ Dedicated to the discussion of digital piracy, including ethical problems and legal advancements.
Rules • Full Version
1. Posts must be related to the discussion of digital piracy
2. Don't request invites, trade, sell, or self-promote
3. Don't request or link to specific pirated titles, including DMs
4. Don't submit low-quality posts, be entitled, or harass others
Loot, Pillage, & Plunder
📜 c/Piracy Wiki (Community Edition):
💰 Please help cover server costs.
Ko-fi | Liberapay |
founded 1 year ago
MODERATORS
you are viewing a single comment's thread
view the rest of the comments
view the rest of the comments
Unless you have an account there's no easy way to get access to the content on the page. Once you have an account there's technically nothing stopping you from just saving the HTML file to your computer.
Something else you can try though, assuming you don't have an account, is to just turn off JavaScript. If the site lets you partially load the content and then asks you to create an account to read more, they usually just block the content by having JavaScript add an opaque overlay. With JavaScript disabled, obviously it's not there to add the overlay and you're able to keep reading.
I have an account, so that's not a problem. The problem is how to automate going into every little content page and downloading the content, including the hi-res files.
I'm on a Mac and use SiteSucker so I know that's not super helpful but for windows you could try wGet or WebCopy? https://www.cyotek.com/cyotek-webcopy / https://gnuwin32.sourceforge.net/packages/wget.htm
Webcopy looks promising if I can get the crawler part of it to work with this site's authentication...
edit: I couldn't get Webcopy's spider to authenticate correctly.
Webcopy uses the deprecated version of Internet Explorer in Windows 10 as a module, and I can log into the website using the Capture Forms browser dialog, but the cookies or whatever else don't translate over to the spider.