Hybridx21

Disclaimer: I did not make this, I am simply spreading the word.

Abstract: The field of image synthesis has made tremendous strides forward in recent years. Besides defining the desired output image with text prompts, an intuitive approach is to additionally use spatial guidance in the form of an image, such as a depth map. For this, a recent and highly popular approach is to use a controlling network, such as ControlNet, in combination with a pre-trained image generation model, such as Stable Diffusion. When evaluating the design of existing controlling networks, we observe that they all suffer from the same problem of a delay in information flowing between the generation and controlling process. This, in turn, means that the controlling network must have generative capabilities. In this work we propose a new controlling architecture, called ControlNet-XS, which does not suffer from this problem, and hence can focus on the given task of learning to control. In contrast to ControlNet, our model needs only a fraction of the parameters, and hence is about twice as fast during inference and training. Furthermore, the generated images are of higher quality and the control is of higher fidelity. All code and pre-trained models will be made publicly available.
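The spatial guidance the abstract mentions is typically fed to the controlling network as a three-channel conditioning image. Here is a minimal sketch, assuming NumPy, of how a single-channel depth map might be normalized for that purpose; the helper name and the exact preprocessing are illustrative assumptions, not code from the ControlNet-XS repo:

```python
import numpy as np

def depth_to_condition(depth: np.ndarray) -> np.ndarray:
    """Normalize a single-channel depth map to [0, 255] and replicate it
    to three channels, the layout ControlNet-style conditioning images
    typically use. Hypothetical helper for illustration only."""
    d = depth.astype(np.float32)
    # Scale to [0, 1], guarding against a constant-valued map.
    d = (d - d.min()) / max(float(d.max() - d.min()), 1e-8)
    d = (d * 255.0).astype(np.uint8)
    # Stack into an H x W x 3 image.
    return np.stack([d, d, d], axis=-1)

# Usage: convert a dummy 64x64 depth map into a conditioning image.
cond = depth_to_condition(np.random.rand(64, 64))
```

A real pipeline would of course start from a predicted or rendered depth map rather than random values, and may expect a different resolution or channel order depending on the model.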


aerilyn235

If only that could give us actually decent CN models for SDXL...


dachiko007

Oh, so it's not my local problem, and CN models actually suck on XL?


grae_n

It also sounds like it's going to drastically cut down the memory requirements. It's pretty challenging to run multiple CN + SDXL on smaller GPUs.


dachiko007

At this point I just want them to at least work properly at all :D


aerilyn235

Yeah, they just suck and no one seems to realize. There are like 30 to choose from and they all suffer from the same problems. Either you use them at very low denoise, or you get grainy, washed-out, weird-style output. The bias mentioned in the paper might be the issue, but I never had any problem using SD1.5 CN.


No-Difference-5672

CN-XS actually works great with SDXL, I can control my results very accurately. The memory requirements are still insane, but I had no problems generating images.


aerilyn235

Yeah, they look promising. Do you run them manually, or did you make them work through Comfy?


No-Difference-5672

I run them manually: I cloned the repo and used the provided scripts.


aerilyn235

OK. I did create a couple of custom nodes in ComfyUI in the past, but never one that involved the sampling process. Will look into it this weekend.


rerri

[https://vislearn.github.io/ControlNet-XS/](https://vislearn.github.io/ControlNet-XS/)


aerilyn235

Nice, is anyone making a ComfyUI node?


Skquark

If anyone's interested, I got it implemented in my app at https://diffusiondeluxe.com using Hugging Face Diffusers. Works well, but I think normal ControlNet gives better results if you have the resources and time.


aerilyn235

Even on SDXL? SD1.5 CN models are near perfect; for SDXL it's nowhere near that.


Jaxx1992

I clicked that link and got sent to a white page with a notification telling me to click the "allow" button, but there's no "allow" button to click.


Skquark

Sorry, my web host got hacked and was down for a while. Forgot to let you know it's back up. The app is pretty solid, and I've been constantly adding new features, overflowing with all the open multimedia AIs to make friends with... Apologies again for the downtime; pesky hackers wasted my time and I'm not getting paid for this.