this post was submitted on 02 Jul 2023

86 points (98.9% liked)

How to remove tracking metadata from my PDF (reddthat.com)

submitted 1 year ago by SpookyBanana@reddthat.com to c/piracy@lemmy.dbzer0.com


I have some good PDF ebooks I'm willing to share, but I suspect the seller embeds tracking data in them to link them to my account: every time I download them from the official website they have a different hash while being visually identical. The same happens when I check them against the copies a friend bought from the same seller. Since I don't wanna get banned, can you recommend a way to remove that stuff?
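For anyone wanting to reproduce the hash check described above, here is a minimal sketch using only Python's standard library. The helper names `sha256_of` and `same_bytes` are made up for illustration and are not part of any tool mentioned in this thread. It only tells you whether two files are byte-for-byte identical, not what differs or where.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of the file at `path` (hypothetical helper)."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in chunks so large PDFs are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_bytes(path_a, path_b):
    """True if both files hash to the same SHA-256 digest (hypothetical helper)."""
    return sha256_of(path_a) == sha256_of(path_b)
```

Calling `same_bytes("first.pdf", "second.pdf")` with two downloads of the same title (file names are placeholders) returns `False` whenever even a single byte differs, which matches the behaviour described in the post.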


[–] LunchEnjoyer@lemmy.world 27 points 1 year ago (1 children)

Could look into using exiftool, qpdf or pdftk, if you are comfortable with the terminal ✨

[–] 0x4E4F@vlemmy.net 12 points 1 year ago

qpdf is very powerful. If OP is comfortable with the terminal, I'd recommend qpdf.

[–] ruination@discuss.tchncs.de 20 points 1 year ago

There's Dangerzone by the Freedom of the Press Foundation.

[–] Shizu@lemmy.world 19 points 1 year ago (1 children)

I would try reprinting the PDFs and comparing the hashes afterwards. That should remove any metadata in the headers as new headers are created.

[–] bionicjoey@lemmy.ca 9 points 1 year ago (1 children)

That wouldn't work for something like Pathfinder PDFs from the Paizo website. They add a text watermark with the name and email associated with your account on their site to each page of the document. It's not metadata, it's actual data

[–] Shizu@lemmy.world 2 points 1 year ago (1 children)

Why would the checksum differ between downloads if the watermark only contained user-identifiable data?

[–] bionicjoey@lemmy.ca 4 points 1 year ago* (last edited 1 year ago) (1 children)

Just checked one of my Paizo PDFs, and in addition to my account name and email address, the watermark also contains the datetime at which I downloaded the PDF. Presumably that's because they append the file creation time when the PDF is being signed.

[–] Shizu@lemmy.world 0 points 1 year ago (1 children)

Fair, then reprinting won't help. I'd go ahead and come up with some Python script that exports all pages as PNG, edits that specific portion of every image and recompiles them into a PDF. I'm not sure if there is a tool that could already do that out of the box.

[–] bionicjoey@lemmy.ca 2 points 1 year ago (1 children)

Unfortunately, then you lose things like text and links. I think the only real solution for my specific example (which, to be clear, might not be OP's dilemma) is to crack and directly edit the binary data of the PDF file.

[–] SpookyBanana@reddthat.com 1 points 1 year ago

What do you mean by crack and directly edit?

[–] daranto@feddit.de 10 points 1 year ago

Maybe print the book via print to pdf and check again.

[–] gh0stkey@lemmy.world 9 points 1 year ago

Wow… The amount of information already being shared here is outstanding! Keep on rowing/patching, mates.

[–] thumbman@lemm.ee 6 points 1 year ago (1 children)

Okay, hear me out... physically print the documents, then, using a high-resolution scanner, make a digital copy and finally use a raster-to-vector converter.

I know this is probably dumb, but I just wanted to throw this out there.

[–] 0x4E4F@vlemmy.net 9 points 1 year ago* (last edited 1 year ago)

Why not just print it to PDF? It doesn't lose any data, plus it doesn't take ages like scanning the books would.

[–] CanOpener@sh.itjust.works 4 points 1 year ago

If you're on Linux, Metadata Cleaner might work. https://flathub.org/apps/fr.romainvigier.MetadataCleaner

[–] bbbhltz@beehaw.org 2 points 1 year ago

ExifTool can remove metadata. There might even be websites that can handle this.

[–] tubbadu@lemmy.kde.social 1 points 10 months ago

Perhaps printing to PDF may work.
