Reminder that this is made by Ben Zhao, the University of Chicago professor who stole open source code for his last data poisoning scheme.
Pardon my ignorance but how do you steal code if it's open source?
He took GPLv3-licensed code. GPLv3 is a copyleft license that requires you to share your source code and license your project under the same terms as the code you used. You also can't distribute your project as binary-only or proprietary software. When pressed, they only released the code for their front end, so they remain in violation of GPLv3.
And as I said there, it is utterly hypocritical for him to sell snake oil to artists, allegedly to help them fight copyright violations, while committing actual copyright violations.
Is there a similar tool that will "poison" my personal tracked data? Like, I know I'm going to be tracked and have a profile built on me by nearly everywhere online. Is there a tool that I can use to muddy that profile so it doesn't know if I'm a trans Brazilian pet store owner, a Nigerian bowling alley systems engineer, or a Beverly Hills sanitation worker who moonlights as a practice subject for budding proctologists?
The only way to taint your behavioral data so that you don’t get lumped into a targetable cohort is to behave like a maniac. As I’ve said in a past comment here, when you fill out forms, pretend your gender, race, and age are fluid. Also, pretend you’re nomadic. Then behave erratically as fuck when shopping online - pay for bibles, butt plugs, taxidermy, and PETA donations.
Your data will be absolute trash. You’ll also be miserable because you’re going to be visiting the Amazon drop off center with gag balls and porcelain Jesus figurines to return every week.
Then behave erratically as fuck when shopping online - pay for bibles, butt plugs, taxidermy, and PETA donations.
...in the same transaction. It all needs to be bought and then shipped together. Not only to fuck with the algorithm, but also to fuck with the delivery guy. Because we usually know what you ordered. Especially when it's in the soft bag packaging. Might as well make everyone outside your personal circle think you're a bit psychologically disturbed, just to be safe.
How? Aren't most items in boxes even in the bags? It's not like they just toss a butt plug into a bag and ship it...right?
The browser add-on "AdNauseam" can help with that, although it's not a complete solution.
Is there a similar tool that will “poison” my personal tracked data? Like, I know I’m going to be tracked and have a profile built on me by nearly everywhere online. Is there a tool that I can use to muddy that profile so it doesn’t know if I’m a trans Brazilian pet store owner, a Nigerian bowling alley systems engineer, or a Beverly Hills sanitation worker who moonlights as a practice subject for budding proctologists?
Have you considered just being utterly incoherent, and not making sense as a person? That could work.
According to my exes, yes.
The tool's creators want to force AI model developers to pay artists in order to train on uncorrupted versions of their work.
That's not something a technical solution can achieve. We need copyright laws to be updated.
You should check out this article by Kit Walsh, a senior staff attorney at the EFF. The EFF is a digital rights group who recently won a historic case: border guards now need a warrant to search your phone.
A few quotes:
and
This doesn't work outside of laboratory conditions.
It's the equivalent of "doctors find cure for cancer (in mice)."
I like that example. Every time you hear about some discovery that x kills 100% of cancer cells in a petri dish, you have to think: so does bleach.
Explanation of how this works.
These "AI models" (meaning the free and open Stable Diffusion in particular) consist of different parts. The important parts here are the VAE and the actual "image maker" (U-Net).
A VAE (Variational AutoEncoder) is a kind of AI that can be used to compress data. In image generators, a VAE is used to compress the images. The actual image AI only works on the smaller, compressed image (the latent representation), which means it takes a less powerful computer (and uses less energy). It’s that which makes it possible to run Stable Diffusion at home.
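To put a rough number on that compression: a back-of-the-envelope sketch, assuming the Stable Diffusion 1.x setup (512x512 RGB input, 8x spatial downscaling, 4 latent channels). The function name is just made up for illustration.

```python
def latent_compression(height=512, width=512, channels=3,
                       downscale=8, latent_channels=4):
    """Return (pixel_values, latent_values, ratio) for one image.

    Defaults assume the Stable Diffusion 1.x VAE configuration.
    """
    pixel_values = height * width * channels
    latent_values = (height // downscale) * (width // downscale) * latent_channels
    return pixel_values, latent_values, pixel_values / latent_values


pixels, latents, ratio = latent_compression()
print(pixels, latents, ratio)  # 786432 16384 48.0
```

So the image maker works on about 48 times fewer numbers per image than it would in pixel space.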
This attack targets the VAE. The image is altered so that the latent representation is that of a very different image, but still roughly the same to humans. Say, you take images of a cat and of a dog. You put both of them through the VAE to get the latent representation. Now you alter the image of the cat until its latent representation is similar to that of the dog. You alter it only in small ways and use methods to check that it still looks similar for humans. So, what the actual image maker AI "sees" is very different from the image the human sees.
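Here is a toy sketch of that idea, not the actual Nightshade code (which isn't public). Assumptions: the "VAE encoder" is just a random linear map, images are flat vectors in [0, 1], and "still looks similar to humans" is crudely approximated by capping the change per pixel value at `eps`. All names are hypothetical.

```python
import numpy as np


def poison(image, target, encode_matrix, eps=0.05, steps=200):
    """Nudge `image` so its toy latent moves toward the latent of `target`.

    Uses projected gradient descent: each step lowers the squared latent
    distance, then clips the perturbation back into the allowed range.
    """
    W = encode_matrix
    target_latent = W @ target
    # Safe step size for this quadratic loss (1 / Lipschitz constant).
    lr = 1.0 / (2.0 * np.linalg.norm(W, 2) ** 2)
    # Keep each change within +/- eps and the result within [0, 1].
    lower = np.maximum(-eps, -image)
    upper = np.minimum(eps, 1.0 - image)
    delta = np.zeros_like(image)
    for _ in range(steps):
        residual = W @ (image + delta) - target_latent
        grad = 2.0 * W.T @ residual  # gradient of ||residual||^2 w.r.t. delta
        delta = np.clip(delta - lr * grad, lower, upper)
    return image + delta


rng = np.random.default_rng(0)
W = rng.normal(size=(16, 256)) / 16.0   # toy stand-in for the VAE encoder
cat = rng.uniform(size=256)             # toy "cat" image
dog = rng.uniform(size=256)             # toy "dog" image

poisoned = poison(cat, dog, W)
before = np.linalg.norm(W @ cat - W @ dog)
after = np.linalg.norm(W @ poisoned - W @ dog)
print("latent distance before:", before, "after:", after)
print("largest pixel change:", np.abs(poisoned - cat).max())
```

A real attack would use the actual encoder network and a perceptual similarity measure instead of a plain per-pixel cap, but the loop has the same shape.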
Obviously, this only works if you have access to the VAE used by the image generator. So, it only works against open source AI; basically only Stable Diffusion at this point. Companies that use a closed source VAE cannot be attacked in this way.
I guess it makes sense if your ideology is that information must be owned and everything should make money for someone. I guess some people see cyberpunk dystopia as a desirable future. I wonder if it bothers them that all the tools they used are free (e.g. the method to check whether images look similar to humans).
It doesn’t seem to be a very effective attack but it may have some long-term PR effect. Training an AI costs a fair amount of money. People who give that away for free probably still have some ulterior motive, such as being liked. If instead you get the full hate of a few anarcho-capitalists that threaten digital vandalism, you may be deterred. Well, my two cents.
So, it only works against open source AI; basically only Stable Diffusion at this point.
I very much doubt it even works against the multitude of VAEs out there. There aren't just the ones derived from Stability AI's models, but also ones simply intended to be faster (at a loss of quality): TAESD can also encode and has a completely different architecture, so it is very unlikely to be fooled by the same attack vector. Failing that, you can use a simple affine transformation to convert between latent and RGB space (that's what "latent2rgb" is) and compare outputs to know whether the big VAE model got fooled into generating something unrelated. That thing just doesn't have any attack surface; it has several orders of magnitude too few weights.
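A rough sketch of that latent2rgb check, under toy assumptions: the "encoder" here is 8x8 average pooling followed by a made-up 3-to-4 channel mix, and the latent-to-RGB map is its pseudo-inverse. Real latent2rgb coefficients are only an approximation of the real VAE, and every name and threshold below is hypothetical.

```python
import numpy as np


def downscale(image, factor=8):
    """Average-pool an (H, W, 3) image by `factor` in both directions."""
    h, w, c = image.shape
    return image.reshape(h // factor, factor, w // factor, factor, c).mean(axis=(1, 3))


def vae_was_fooled(image, latent, latent_to_rgb, threshold=0.1, factor=8):
    """Compare a cheap linear RGB preview of `latent` with the real image."""
    preview = latent @ latent_to_rgb          # (h, w, 4) -> (h, w, 3)
    error = np.abs(preview - downscale(image, factor)).mean()
    return bool(error > threshold)


rng = np.random.default_rng(0)
mix = rng.normal(size=(3, 4))                 # toy RGB -> latent channel mix
latent_to_rgb = np.linalg.pinv(mix)           # toy stand-in for latent2rgb


def toy_encode(image):
    """Toy stand-in for the big VAE encoder."""
    return downscale(image) @ mix


cat = rng.uniform(0.0, 0.3, size=(64, 64, 3))   # dark toy image
dog = rng.uniform(0.7, 1.0, size=(64, 64, 3))   # bright toy image

honest = vae_was_fooled(cat, toy_encode(cat), latent_to_rgb)
# Simulate a successful attack: the cat image ends up with the dog's latent.
fooled = vae_was_fooled(cat, toy_encode(dog), latent_to_rgb)
print(honest, fooled)  # False True
```

The point is that the check needs only a handful of weights and a downscale, so there is very little for an adversarial perturbation to exploit.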
Which means that there's an undefeatable way to detect that the VAE was defeated. Which means it's only a matter of processing power until Nightshade is defeated, no human input needed. They'll of course again train and try to fool the now hardened VAE, starting another round, ultimately achieving nothing but making the VAE harder and harder to defeat.
It's like with Russia: They've already lost the war but they haven't noticed, yet -- though I wouldn't be too sure that Nightshade devs themselves aren't aware of that: What they're doing is a powerful way to grift a lot of money from artists without a technical bone in their body.
Fascinating that they develop this tool and then only release Windows and MacOS versions.
To be fair, Windows and macOS are the two biggest desktop operating systems in the world. It makes a lot more sense to focus on building tools for people using the biggest platforms than on people using something with a user base fragmented across multiple versions of the same OS.
Though I do agree a version for Linux would be nice. Even though we have Darling, the Mac equivalent of Wine, I don't know enough about it to say whether it's up to the task or not.
It's simple math. 97% of the population uses those two operating systems.
There isn't much incentive to go after the 3% of Linux users. You know, the population that loves free and open source software and isn't exactly known for dropping a bunch of cash on software. Not to mention it's a fragmented 3%. Even the Flatpaks, Snaps, and AppImages of the world that were supposed to make devs' lives easier are fragmented across distros.
It's not FOSS and I don't see a way to review if what they claim is actually true.
It may just be a way to help differentiate legitimate human-made work from machine-generated work, thus helping AI training.
Can't demonstrate that either, because its license expressly forbids adapting the software for other uses.
Edit, alter, modify, adapt, translate or otherwise change the whole or any part of the Software nor permit the whole or any part of the Software to be combined with or become incorporated in any other software, nor decompile, disassemble or reverse engineer the Software or attempt to do any such things
The EULA also prohibits using Nightshade "for any commercial purpose", so arguably if you make money from your art—in any way—you're not allowed to use Nightshade to "poison" it.
I read the article enough to find that the Nightshade tool is under EULA... :(
Because it definitely is not FOSS, use it with caution, preferably on a system not connected to the internet.
Apparently people who specialize in AI/ML have a very hard time trying to replicate the desired results when training models with 'poisoned' data. Is that true?
I've only heard that running images through a VAE just once seems to break the Nightshade effect, but no one's really published anything yet.
You can finetune models on known bad and incoherent images to help it to output better images if the trained embedding is used in the negative prompt. So there's a chance that making a lot of purposefully bad data could actually make models better by helping the model recognize bad output and avoid it.
Until they come up with some preprocessing step, or some better feature extractors, etc. This is an arms race, like many others.
Begun, the AI Wars have.
Excited to see the guys that made Nightshade get sued in a Silicon Valley district court, because they're something something mumble mumble intellectual property national security.
They already stole GPLv3 code for their last data poisoning scheme and remain in violation of that license. They're just grifters.
I bet that before the end of this year this tool will be one of the things that helped improve the performance and quality of AI.
Oily snakes slither such that back and forth looks like production..
Ironic that they used an AI picture for the article...
Ah, another arms race has begun. Just be wary, what one person creates another will circumvent.
They credit AI with making the thumbnail... The same people did nothing more than ask ChatGPT to make a picture to represent an article about a tool that poisons AI models, a tool meant to protect people who make pictures for a living from having ChatGPT use their work to make, say, a picture to represent an article about a tool that poisons AI models...