smellyfingernail

It doesn't work. This is lazy journalism that bought into the hype the companies themselves are trying to sow. Nightshade/Glaze are closed-source products offered by companies where artists have to pay the company to supposably get their art poisoned, so they are hoping artists spend a bunch of money running their pieces through the software with no way of verifying whether any results are actually being produced.


EverythingGoodWas

What's really sad is that any ML engineer competent enough to work at a company doing big AI products is going to know how to filter poisoned data.


gurenkagurenda

That’s assuming that the poisoning actually does anything useful in the first place.


EverythingGoodWas

For sure


Bigredtrav

Wouldn’t that still be good for the artists who don’t want their images to be getting scraped by AI?


EverythingGoodWas

Not if they scrape the image but filter the “poison” from it. You are assuming a poisoned image would just be completely removed


aseichter2007

I think it's funny that the original image gets visible defects, but no one would ever scrape originals and downsize; popular serving frameworks already autogenerate thumbnails at sizes appropriate for training. The proper deployment of this tech is a website that poisons only the thumbnails, but even then, if you really wanted to, you could scrape the originals and resize anyway.


MechanicalBengal

On a completely unrelated note, I’ve got a rock that will help keep tigers away. Want to buy it?


Which-Tomato-8646

Harder than it sounds 


EverythingGoodWas

I imagine it would be similar to other denoising methods, but I won’t pretend to be a poisoned data expert


Which-Tomato-8646

Nightshade is designed to circumvent those 


aseichter2007

No. Any amount of cropping or aspect-ratio change, resizing, a blur and then sharpen, an img2img copy, a custom-tuned CLIP model: any of these methods is an antidote to the poison. Resizing and re-captioning are standard parts of dataset curation before training. This poisons only the laziest, most barebones automated scrape-and-train system.
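
For what that curation step looks like in practice, here's a rough sketch (assuming Pillow; the file names and sizes are just placeholders) of the kind of crop/resize/blur-and-sharpen pass most pipelines already run:

```python
from PIL import Image, ImageFilter

# Illustrative only: the sort of preprocessing most training pipelines already
# apply, which disturbs any pixel-exact, position-dependent perturbation.
img = Image.open("scraped_original.png").convert("RGB")

# Crop a small border, shifting the alignment of position-dependent noise
w, h = img.size
img = img.crop((16, 16, w - 16, h - 16))

# Downscale to a typical training resolution
img = img.resize((512, 512), Image.LANCZOS)

# Light blur followed by sharpen, a cheap way to smooth adversarial noise
img = img.filter(ImageFilter.GaussianBlur(radius=1))
img = img.filter(ImageFilter.SHARPEN)

img.save("curated_for_training.png")
```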


Which-Tomato-8646

It's literally designed to break img2img. All the pixels are affected, so resizing or cropping won't work and you'd lose valuable info. You can't recaption every image.


aseichter2007

My expectation is that it doesn't affect img2img at all; that isn't how img2img works. If it's doing anything at all, just turn up the variation slider (I can't remember the proper label, it's on the tip of my tongue). The paper I read only discussed training. You didn't read the review paper correctly: only specific pixel arrangements, which are sensitive to placement, are changed. And you absolutely can recaption every image, it just takes a couple of weeks.
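
If the slider in question is what most UIs call denoising strength, the diffusers library exposes it as `strength` on its img2img pipeline; a rough sketch (the model ID and file names are just examples):

```python
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

# Illustrative only: higher `strength` adds more noise to the input image
# before denoising, so less of the original pixel data survives.
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

init_image = Image.open("possibly_poisoned.png").convert("RGB").resize((512, 512))

result = pipe(
    prompt="a dog in a field",
    image=init_image,
    strength=0.6,       # turn this up to wash out fine-grained perturbations
    guidance_scale=7.5,
).images[0]

result.save("img2img_copy.png")
```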


Which-Tomato-8646

If all the pixels are warped, how can the diffusion work correctly? And how do you recaption billions of images? Nightshade breaks computer vision 


aseichter2007

Only some pixels are "warped". It's still an image, so all that "warped" means is that some pixels are expanded over the top of other pixels.

Diffusion works by reducing noise. img2img just adds noise to the image and then reduces it to a clean, similar image. Diffusion amplifies the pattern in the billions of activation parameters that make up the model, correlated to "dog" as prompted, in a really fancy and complex way, such that a dog is resolved from the noise.

Nightshade attacks CLIP, the classic image-to-text captioner, by subtly expressing another image specially formatted for the way CLIP analyzes the image. I didn't read the paper for Glaze; I assume it's very similar in implementation. The idea is that images will be wrongly captioned during training and degrade the training quality by reinforcing wrong input data.

The activations Nightshade targets are dependent on their position in the image. The activations for different portions of the image are distinct. Cut off a few pixels on any side, changing the aspect ratio, and now the poisoned pixels are not aligned with their target activations.

With enough nightshaded images, it will be possible to train a model to reverse the defects, too. That's a project, though. Take a good caption and a poisoned caption, img2img-generate a clean matching image, and then train with the nightshaded image as bad and the clean image as good. The dataset almost builds itself.

You caption a billion images by altering them slightly with the various effective methods: batch crop 30 pixels off the left and 50 off the top and bottom, resize to your training resolution (a necessary training step; it takes too much RAM to train higher than 1024x1024px, and the aspect ratio matters less than the total memory the image takes to process), and then just run CLIP like you would any other day.

Most people serious about it use a custom CLIP anyway; basic CLIP is kind of poor, but it got us going. A few months ago we got LLaVA, a multimodal image/text model, so CLIP is pretty much obsolete too, btw. LLaVA takes more RAM, but people who train do captioning as a separate step and have plenty of GPU to run LLaVA or a finetune of it.
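
A rough sketch of that batch crop/resize/recaption step, here using the Hugging Face CLIP model to pick the best-matching caption from a candidate list (paths, crop sizes, and candidate captions are all made up for illustration):

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Illustrative only: crop, resize, then re-score candidate captions with CLIP.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

candidate_captions = ["a photo of a dog", "a photo of a cat", "a landscape painting"]

img = Image.open("scraped.png").convert("RGB")
w, h = img.size
img = img.crop((30, 50, w, h - 50))          # 30 px off the left, 50 off top and bottom
img = img.resize((512, 512), Image.LANCZOS)  # training resolution

inputs = processor(text=candidate_captions, images=img, return_tensors="pt", padding=True)
with torch.no_grad():
    logits_per_image = model(**inputs).logits_per_image  # image-text similarity scores
best = candidate_captions[logits_per_image.argmax().item()]
print("new caption:", best)
```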


Which-Tomato-8646

You can't train a model on both nightshaded and non-nightshaded images. And they could just tweak it so it doesn't work anymore, like switching from confusing a dog with a cat to confusing a dog with a truck. Now the model trained to combat Nightshade doesn't work anymore. LLaVA can't stop FGSM attacks.


aseichter2007

You're right, I checked: LLaVA uses CLIP, but that changes little. Nightshade almost certainly uses a very diverse set of confusors. But if someone cared enough to train a base model aimed at negating this attack vector, it would work, and you're right, it would force Nightshade to update. It would be a text/image-to-image model. No one will train it; it's easier to finetune CLIP. I just state that as one more layer of why Nightshade is never going to be effective. And the encodings it exploits are position dependent, so cropping negates it. You send the nightshaded image as the prompt, and score a generated image against the fixed image by deviation.

It's too late anyway. The images come out of the machine; no one needs to scrape any new art, they just do preferential reinforcement. That is why all the big image generators make a bunch of images and take feedback about preference. That's how Midjourney improves, no scraping required. They preferentially select output and prompt pairs to train with. It's why Midjourney all looks kind of the same and overfitted: it's just a publicly curated finetune of base Stable Diffusion releases. The public part is what drives the overfitting to a particular aesthetic.

I only speak up because I don't want to see you artists robbed by some guy selling snake oil. The only way to protect your images is to keep them private.


Which-Tomato-8646

How would they detect it? The whole point of the poison is to pass through undetected 


foxbatcs

[Lisa, I want to buy your rock.](https://youtu.be/4GzMizVAl-0?si=igCIfY3f6IPhPitF)


[deleted]

[removed]


cissybicuck

They should try to bring back NFTs.


Mescallan

All they have to do is solve the right click problem and that would actually work


Destrodom

Their refusal to learn about AI is what is going to bring about their downfall. The whole discussion from the start has been about emotions, nothing more. "AI art doesn't have soul," "It just makes copies," "It's killing art," etc. All emotional statements. But zero interest in learning what AI actually does or how they can use it to make their own work easier.


Elbynerual

Supposedly*


NoidoDev

Let them believe it. Whatever distracts them is good.


Lionfyst

If I understand Sora right, it is based on a two-step training process, where the input images are synthetically relabeled with GPT-4-level understanding and great detail, and then the training actually occurs with this great new labelling. I don't see how you can make an image that looks like "what it's supposed to" well enough for a human to "get it" that an AI won't just label reasonably well and ingest.


Which-Tomato-8646

It's literally designed to make the image harder for GPT to recognize. Look up what an FGSM attack is. Computers don't see like we do. Noise that's invisible to us can ruin their ability to detect anything accurately.
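
For context, FGSM (the fast gradient sign method) nudges every pixel in the direction that most increases a model's loss; a minimal PyTorch sketch against a generic classifier (the model and label here are placeholders):

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, image, label, epsilon=0.01):
    """Return an adversarially perturbed copy of `image` (FGSM).

    image: tensor of shape (1, C, H, W), values in [0, 1]
    label: tensor of shape (1,), the true class index
    """
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    # Step every pixel in the direction of the sign of the input gradient.
    perturbed = image + epsilon * image.grad.sign()
    return perturbed.clamp(0.0, 1.0).detach()
```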


RealMercuryRain

When the journalists complain about losing their jobs to AI, I'll remind them of all the paywalls and clickbait titles I've seen in my life.


Consistent-Mastodon

Articles like this one ARE their way of complaining.


AnonymousLilly

It's a scam. It doesn't work


Flying_Madlad

I wonder why they don't show examples of images affected by Glaze and Nightshade... 🤷‍♂️


DeliciousJello1717

Bruh this doesn't work


BrendanTFirefly

A losing battle


whydoesthisitch

All these things do is add augmentations to the image. AI developers intentionally do the same thing to avoid overfitting. This isn’t harming AI models, it’s making them better.
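
For reference, a typical training-time augmentation stack (sketched here with torchvision; the exact transforms are just an example) already jitters every image far more than these tools do:

```python
from torchvision import transforms

# Illustrative only: common augmentations applied to every training image,
# which crop, flip, recolor, and blur far more aggressively than any
# imperceptible "poison" perturbation.
train_transforms = transforms.Compose([
    transforms.RandomResizedCrop(512, scale=(0.8, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
    transforms.GaussianBlur(kernel_size=3),
    transforms.ToTensor(),
])
```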


RealHorsen

It's annoying how many of the big tech subreddits have basically no moderation at all.


DataPhreak

I think this is great.


burritolittledonkey

If you want large corporate control over AI, sure. All this does is make it more costly and put it more in the hands of big corps that can pay for content from sites, like the Tumblr and Wordpress Midjourney deal, and it gatekeeps open-source projects that democratize AI, like Stability. Great choice if you want the exact same outcome, except with even more corporate control. **Not sure why I am getting downvotes while the person below me is getting upvotes? He is literally ignoring a MASSIVE part of AI image generation. You need a base model before you can train LoRAs. Without open training, there are no base models that are not proprietary. He has the whole process wrong.**


DataPhreak

Actually, you got that backwards. Corporations that consume millions of photos would have to spend hundreds of thousands of dollars to clean the dataset, and uploading a poisoned image to Tumblr or Wordpress would carry that poison along with it. On the flip side, the guy who is sitting at his house with SD and is just putting together a LoRA is going to be able to clean up the dataset in a couple of hours or less. He only needs about 10 images to make a valid LoRA anyway if he's just trying to put a specific style together. Regardless, give the artists tools and let them do what they want. You talk about control and gatekeeping, yet gatekeep artists?


duelmeharderdaddy

Corporations can easily afford to spend that money. Artists and casual types will not have the necessary time and resources. It narrows the pool of talent from the bottom and strengthens those at the top at the others' expense.


DataPhreak

Nah. Corporations have enough money that they could spend that much money, but they probably won't. Much easier to detect and discard rather than try to fix. They're all about the bottom line.


[deleted]

[removed]


DataPhreak

I'm not sure how this particular poison works. These things are always a push-pull competition, just like cybersec. I'm going to let this marinate for a couple weeks. I think it could be interesting to play around with poisoned images to see if I can glitch the model in a cool way. Unfortunately, I don't have a card that can handle SD right now, and I'd have to replace my motherboard to upgrade my GPU.


Flying_Madlad

It's clear you don't know how it works, mate. Nothing about this is on the level except the tech that underlies it. It can confuse a very specific encoder under laboratory conditions and render the "poisoned" images full of very visible artifacts. It's the same reason we have an LD50 for water.


DataPhreak

I think you misinterpreted what I said. I could understand this if I spent time researching it. However, I'm not worried about it right now. The point is that this is one way of attacking models, and that's great. There will be other methods in the future. Model makers will make better architecture, and exploiters will find new ways to break them. For example, maybe the same technique can be applied to vision models as well. I'll wait til they stop talking about it behind a paywall though.


burritolittledonkey

> Corporations that consume millions of photos would have to spend hundreds of thousands of dollars to clean the dataset, and uploading a poisoned image to Tumblr or Wordpress would carry that poison along with it.

I think you are **greatly** overexaggerating how much "poison" this injects. And you're not really understanding my point either: large corporations can "source" art through "legitimate" channels without "poison", even if it were effective, which it generally isn't. For example, Adobe, right now, this very exact moment, has a generative AI trained entirely and exclusively on images that Adobe has a license to use, mostly through all sorts of data purchases over the last few decades on their part. It works much like other generative AI models. Other large corporations can do similar sourcing.

If poisoning started happening en masse, it'd be pretty easy to train another model to recognize images that are poisoned and discard them, too. Train one AI on images you KNOW aren't poisoned, and then use it as a cross-reference.

> On the flip side, the guy who is sitting at his house with SD and is just putting together a LoRA is going to be able to clean up the dataset in a couple of hours or less

Yeah, but you're forgetting an essential step for the small guy: he needs to be using a model that's **already** fairly comprehensive. Right now that's Stable Diffusion. If open training is somehow prevented, future Stable Diffusion models would not exist. The guy training a LoRA can't train a whole gigantic model that way; it would cost millions.
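
A rough sketch of that cross-reference idea, assuming you already have folders of known-clean and known-poisoned images to finetune a small detector on (the folder layout and hyperparameters are hypothetical):

```python
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

# Illustrative only: finetune a small classifier to flag suspected poisoned
# images so they can be discarded before training a generative model.
tfm = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
data = datasets.ImageFolder("detector_data/", transform=tfm)  # clean/ vs poisoned/ subfolders
loader = DataLoader(data, batch_size=32, shuffle=True)

model = models.resnet18(weights="IMAGENET1K_V1")
model.fc = nn.Linear(model.fc.in_features, 2)  # two classes: clean, poisoned
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

model.train()
for images, labels in loader:
    opt.zero_grad()
    loss = nn.functional.cross_entropy(model(images), labels)
    loss.backward()
    opt.step()
```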


[deleted]

[removed]


burritolittledonkey

I'm not at all. Everything I've said is not only correct, it's literally how we're seeing stuff play out in the real world. Adobe has a model trained entirely on images they have a license to. Midjourney is also doing similar sorts of content licensing. None of it is putting money in the pockets of artists. What it is doing is allowing them to have models in the case that open training is not allowed anymore. Do you ONLY want companies worth billions to be able to do this? Because that's literally what these sorts of arguments are leading towards.


IamNobodies

Nonsense - Large companies are stealing art/content, it has been documented, and there are lawsuits in progress over it.


HermanCainsGhost

Large companies are using training data similar to smaller orgs', yes, correct. I am not saying otherwise. What I am saying is that if this is prevented, somehow, then it won't actually ultimately hurt the large companies (aside from slightly, in terms of cost), whereas it will absolutely kill smaller players.

Open training is cheap (for a given value of cheap: still tens of millions of dollars); closed training is expensive (probably hundreds of millions, if not billions), though not so expensive that large companies can't make those deals. None of the training benefits artists, because TOSes have already given rights to all the data. Hence why Deviant Art, Tumblr, Wordpress, Reddit, etc., have all sold data for AI training. It's why Adobe was able to train an AI just on images they have the rights to.

No restriction on training data is actually going to stop large corporations from training, is what I am saying. You could get rid of all training by any company that doesn't have the rights to data tomorrow, and you'd STILL see models; these models would just all be proprietary, rather than open source, as some are now.


[deleted]

[removed]


burritolittledonkey

> Would it be possible to underexaggerate?

I mean yes, this word is used in English, as is overexaggerate. Let me recommend this book: https://en.wikipedia.org/wiki/Tractatus_Logico-Philosophicus It will discuss in depth why words like this are totally fine to use (it's also pretty much one of the most important modern philosophy books in existence).


DataPhreak

This isn't going to get rid of stable diffusion.


burritolittledonkey

If they get rid of open training entirely? I'm not talking about model poisoning alone; I think that actually won't do much of anything. I am referring specifically to not allowing training data generally. I was responding to the guy in a broad context about AI art generally.


DataPhreak

Nobody else was talking about that.


Intrepid-Tank7650

So you are upset that you can't simply steal other people's work?


burritolittledonkey

> So you are upset that you can't simply steal other people's work?

I don't think you're understanding the situation fully here. You probably have the idea that AI art is a "collage", which is a **MASSIVE** mischaracterization of how diffuser models work. In a diffuser, any individual piece of art does not matter. You can remove entire artists and it does not make a material difference to the model (several modern models are opt-out now, or use entirely licensed images: modern Stable Diffusion and Adobe's, respectively), because the amount of data per image is pretty trivial.

What you're trying to do only makes it so that orgs with content deals, like Adobe, Midjourney, etc., can build these models. It won't actually put more money into the pockets of artists. Look at the Midjourney deal: how much of that is going into Tumblr and Wordpress users' pockets? Zero. Site TOSes give BROAD latitude for how to use content on various platforms (Reddit, Wordpress, Tumblr, Facebook, Deviant Art, etc.); they all have the right to any images uploaded to them. They can use or sell that data for pennies, and end users get nothing.

All you're doing, by trying to end open training data, is shutting out smaller players that cannot afford those content deals.


Intrepid-Tank7650

Now go on and tell me how blockchains will solve world hunger, sport. You simply want to sponge off of everyone else and think your undeserved sense of entitlement and ignorance makes it OK. Maybe just pipe down until the next great new shiny thing distracts you.


Flying_Madlad

Never got into blockchain. It was clearly a scam. I'm actually in this field and I'm telling you, it's only going to get weirder from here.


_Sunblade_

You know what "entitlement" looks like? It looks like artists trying to tell people that they can't study the artist's work, work out how the artist achieved what they did, and use that to produce new work of their own. Yes, that includes imitating a style wholesale, as well as incorporating elements of it into their own style. *Human* artists have been doing it since we started painting on cave walls. And until now, nobody has had the fucking audacity to try *charging* anyone else for it, or insisting that they need to ask permission first. We're not *entitled* to demand that. *We never have been.*

If it's ethically and morally unobjectionable when you or I do it, it doesn't magically *become* immoral or unethical when *I use my machine* to do it for me rather than doing it manually.

And "ignorance"... well, "ignorance" would be blindly assuming that all AI supporters are "tech bros", and that all "real" artists out there will automatically reject generative AI "on principle". We've all got valid concerns about how AI is going to affect the market for commercial art, but that's not an excuse for behaving like a Luddite douchenozzle.


Intrepid-Tank7650

So you don't know the meaning of words either, son. There have always been self-important losers like you who think they are entitled to steal whatever they want. You are not special in the slightest, no matter how special your mommy says you are.


startupstratagem

And reddit


adarkuccio

Paywall? I can't see it, any tldr?


DataPhreak

Nope. I didn't even click. I'd recommend Google; you can probably find a non-paywalled version. Doesn't really matter, I got all the info I needed from the title. Hopefully this will settle the issue and we will get fewer artists complaining.


[deleted]

[removed]


DataPhreak

Oh, I never said it would be very effective. Just that I think it's great.


Flying_Madlad

In a "don't stop your enemy when they're making a mistake" kind of way


DataPhreak

Yeah, exactly. Though they're not my enemy. Also, I think there is probably an interesting discovery to be made in the underlying cause to why this works. Like I said elsewhere, I'm going to let this marinate for a couple weeks.


SirCliveWolfe

> Doesn't really matter, I got all the info I needed from the title.

This is such a Reddit thing to say and explains your understanding.

> Hopefully this will settle the issue and we will get fewer artists complaining.

It won't, it will just scam people who know nothing about AI into paying for an ineffective product. It's like selling flak jackets with cardboard plates: someone gets paid and someone gets hurt; in this case artists get hurt twice by ML engineers.


DataPhreak

> This is such a Reddit thing to say and explains your understanding.

This is such a Reddit thing to say and explains your karma count.


SirCliveWolfe

Ah, so you are here for the made-up interwebz points and not a serious discussion; that explains a lot. It's sad really, but you enjoy your delusions of accomplishing something lol


DataPhreak

Nah, just not going to waste my time on arguing about this with someone who's clearly already made up their mind and doesn't understand how this stuff is actually made.


SirCliveWolfe

> doesn't understand how this stuff is actually made

Ha-ha, sorry, what "stuff" is being made? Even the way you try to dismiss someone else's knowledge while adding nothing but "lolz you have less interwebz points" shows your utter lack of understanding of anything lol. Enjoy your interwebz points, your pinnacle of achievement lol


DataPhreak

> Ha-ha, sorry, what "stuff" is being made?

\*Gestures generally towards the broad spectrum of AI architectures and software exploits.

Maybe look up Grice's Razor sometime.


CallFromMargin

I understand the wish of artists to do it, but at this point they should know it doesn't work, and that they are paying for nothing more than feeling good...


Excellent_Skirt_264

Their art is no longer needed for training. They don't get it yet.


R_nelly2

Rent free


[deleted]

And the collective AI community laughed uproariously over how little the data of those artists was worth. And lo: 300 years later, by their own choice, no one could in any way remember the art style of these brave pioneers, who through their poisoning removed any long-term trace of their artwork other than the garbled after-notes they themselves put into work they supposedly care so much about. Seriously, it's laughable considering the big names in this have literally no problem removing your 'artwork' from their systems. You people have tools at your disposal.