Using the SD 1.5 model to generate a few pics takes only a few seconds. The SDXL base model performs significantly better than the previous variants, and the base model combined with the refinement module achieves the best overall performance. Going from ~30 seconds per SD 1.5 image to 4 full SDXL images in under 10 seconds is just huge.

Setup is still the sticking point, though. A recurring request is a simple set of instructions for where to put the SDXL files and how to run the thing; at this point it's a UX problem. My advice is to download Python 3.10.

Test system: GPU: AMD RX 7900 XTX; CPU: Ryzen 9 7950X3D (with the iGPU disabled in the BIOS); OS: Windows 11; SDXL 1.0-RC. With TensorRT, static engines use the least amount of VRAM, but this only works with the checkpoint library. If you want to use more checkpoints, download more to the drive or paste the link / select them in the library section.

SDXL (ComfyUI) iterations/sec on Apple Silicon (MPS): I'm currently in need of mass-producing certain images for a work project using Stable Diffusion, so I'm naturally looking into SDXL. On PCIe bandwidth, the RTX 3090 will definitely see a higher bottleneck, especially once next-gen consoles have all AAA games moving data between SSD, RAM, and GPU at very high rates.

🧨 Diffusers SDXL GPU benchmarks for GeForce graphics cards. How to install and use Stable Diffusion XL (SDXL): the release went mostly under the radar because the generative image AI buzz has cooled, but gaming benchmark enthusiasts may be surprised by the findings. First, let's start with a simple art composition using default parameters. For our tests, we'll use an RTX 4060 Ti 16 GB, an RTX 3080 10 GB, and an RTX 3060 12 GB graphics card.
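The timing claim above is easy to sanity-check with plain arithmetic. A minimal sketch; the 30 s and 10 s figures are the ones quoted above, not fresh measurements:

```python
def seconds_per_image(total_seconds: float, num_images: int) -> float:
    """Average wall-clock seconds per generated image."""
    return total_seconds / num_images

sd15_time = seconds_per_image(30.0, 1)  # ~30 s for one SD 1.5 image
sdxl_time = seconds_per_image(10.0, 4)  # 4 SDXL images in under 10 s
speedup = sd15_time / sdxl_time         # per-image speedup factor
print(f"{speedup:.1f}x faster per image")
```

On those figures the newer setup is 12x faster per image; real numbers depend on the GPU, resolution, sampler, and step count.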
These settings balance speed and memory efficiency. This is the Stable Diffusion web UI wiki. Comparing the model families head-to-head on style is strange, because SDXL prioritizes stylized art while SD 1 and 2 lean toward realism.

One user report: A1111 took forever to generate an image even without the refiner, the UI was very laggy, and generation always got stuck at 98%, even after removing all extensions.

The current benchmarks are based on the current version, SDXL 0.9, building upon the success of the beta release of Stable Diffusion XL in April.

Quick start for SHARK Stable Diffusion for Windows 10/11 users: select the .safetensors file from the Checkpoint dropdown. This setup only uses the base and refiner model. You can use Stable Diffusion locally with less VRAM, but you have to set the image resolution output pretty small (around 400x400 px) and use additional parameters to counter the low VRAM. We are proud to host the TensorRT versions of SDXL and make the open ONNX weights available to users of SDXL globally. Note that you cannot generate an animation from txt2img. The images generated were of salads in the style of famous artists/painters. Apple has also published code to get started with deploying to Apple Silicon devices.
Previously VRAM was a big limit, as was the time it takes to generate; Nvidia isn't pushing the feature because it doesn't make a large difference today. SDXL is superior at keeping to the prompt. As much as I want to build a new PC, I should wait a couple of years until components are more optimized for AI workloads in consumer hardware. It's an excellent result for a $95 processor, though aesthetics are very subjective, so some will prefer SD 1.5. The architecture of SDXL 1.0 is still in development.

The drivers after that introduced the RAM + VRAM sharing tech, but it can slow generation sharply once VRAM overflows. It is important to note that while this result is statistically significant, we must also take into account the inherent biases introduced by the human element and the inherent randomness of generative models. You can also vote for which image is better. When finished, close down the CMD window and the browser UI.

Much like a writer staring at a blank page or a sculptor facing a block of marble, the initial step can often be the most daunting. See also: Installing ControlNet for Stable Diffusion XL on Google Colab, and the Sytan SDXL workflow. One reported failure mode is "NansException: A tensor with all NaNs was produced in Unet." SDXL 1.0 is particularly well-tuned for vibrant and accurate colors, with better contrast, lighting, and shadows than its predecessor, all in native 1024×1024 resolution.
SD 1.5 will likely continue to be the standard, with this new SDXL being an equal or slightly lesser alternative. (Prompt: SD v1.5.) These numbers were taken with the latest Nvidia drivers at the time of writing. Thus far I didn't bother looking into optimizing performance beyond the --xformers parameter for AUTOMATIC1111; this thread might be a good way to find out that I'm missing something easy and crucial with high impact. SDXL is ready to turn heads. There has definitely been some great progress in bringing out more performance from the 40xx GPUs, but it's still a manual process and a bit of trial and error. I don't think it will be long before that performance improvement comes with AUTOMATIC1111 right out of the box.

This repository comprises python_coreml_stable_diffusion, a Python package for converting PyTorch models to Core ML format and performing image generation with Hugging Face diffusers in Python. A separate repository hosts the TensorRT versions of Stable Diffusion XL 1.0.

Question | Help: I recently put together a new PC with an ASRock Z790 Taichi Carrara and an i7-13700K, but reusing my older (barely used) GTX 1070. This is an aspect of the speed reduction: there is less storage to traverse in computation, less memory used per item, and so on. LoRA training can be done with 12 GB of GPU memory. A brand-new model called SDXL is now in the training phase. Then I'll go back to SDXL, and the same setting that took 30 to 40 s will take something like 5 minutes.

Figure 1: Images generated with the prompts "a high quality photo of an astronaut riding a (horse/dragon) in space" using Stable Diffusion and Core ML + diffusers. For users with GPUs that have less than 3 GB of VRAM, ComfyUI offers a low-VRAM mode.
This is a benchmark parser I wrote a few months ago to parse through the benchmarks and produce a whiskers-and-bar plot for the different GPUs, filtered by the different settings. (I was trying to find out which settings and packages were most impactful for GPU performance; that was when I found that running at half precision with xformers made the biggest difference.)

Originally I got ComfyUI working with SD 1.5 and SDXL. That's still quite slow, but not minutes-per-image slow. See the usage instructions for how to run the SDXL pipeline with the ONNX files hosted in this repository. I have tried putting the base safetensors file in the regular models/Stable-diffusion folder. Unfortunately, it is not well-optimized for the AUTOMATIC1111 WebUI. 10 in parallel: ≈ 8 seconds at an average speed of about 3 it/s.

AI art using SDXL running in SD.Next. I'm looking to upgrade to a new card that'll significantly improve performance but not break the bank. Another report: I tried SDXL in A1111, but even after updating the UI the images take a very long time and don't finish; they stop at 99% every time. It's a single GPU with full access to all 24 GB of VRAM. Since SDXL is not yet mature, there are relatively few models and plugins available for it, and its hardware requirements are higher still.

Below we highlight two key factors: JAX just-in-time (jit) compilation and XLA compiler-driven parallelism with JAX pmap. Compare SD 2.1 at 1024x1024, which consumes about the same at a batch size of 4. Results: base workflow results. I also looked at the tensor's weight values directly, which confirmed my suspicions. We cannot use any of the pre-existing benchmarking utilities to benchmark end-to-end Stable Diffusion performance, because the top-level StableDiffusionPipeline cannot be serialized into a single TorchScript object.

The SDXL model represents a significant improvement in the realm of AI-generated images, with its ability to produce more detailed, photorealistic images, excelling even in challenging areas.
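The core of a parser like the one described above is small: group raw benchmark records by GPU and settings, then compute the five-number summary a box-and-whiskers plot needs. A minimal sketch; the record fields (gpu, precision, xformers, it_per_s) are illustrative, not the real benchmark's schema:

```python
from collections import defaultdict
from statistics import quantiles

def whisker_stats(records):
    """Five-number summary of it/s per (gpu, precision, xformers) group.

    Each record is a dict; each group needs at least two samples for
    statistics.quantiles to work.
    """
    groups = defaultdict(list)
    for r in records:
        groups[(r["gpu"], r["precision"], r["xformers"])].append(r["it_per_s"])
    summary = {}
    for key, vals in groups.items():
        vals.sort()
        q1, med, q3 = quantiles(vals, n=4)  # exclusive-method quartiles
        summary[key] = {"min": vals[0], "q1": q1, "median": med,
                        "q3": q3, "max": vals[-1]}
    return summary
```

Feeding the summary into a plotting library then gives the whiskers plot per GPU/setting combination.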
I'm able to generate at 640x768 and then upscale 2-3x on a GTX 970 with 4 GB of VRAM (while running other software). Dubbed SDXL v0.9. OS = Windows. When fine-tuning SDXL at 256x256, it consumes about 57 GiB of VRAM at a batch size of 4.

(6) Hands are a big issue, albeit a different one than in earlier SD versions. One way to make major improvements would be to push tokenization (and prompt use) of specific hand poses, as they have more fixed morphology; i.e., a fist has a fixed shape that can be "inferred" from the prompt. A 5700 XT sees small bottlenecks (think 3-5%) right now without PCIe 4.0.

For hires. fix upscalers I have tried many: latents, ESRGAN-4x, 4x-UltraSharp, Lollypop. I was training the SDXL UNET base model with the diffusers library, which was going great until around step 210k, when the weights suddenly turned back to their original values and stayed that way.

Stable Diffusion XL (SDXL) is the latest open-source text-to-image model from Stability AI, building on the original Stable Diffusion architecture. A well-trained style model can produce outputs very similar to the source content (Arcane) when you prompt "Arcane Style," but flawlessly outputs normal images when you leave off that prompt text, with no model burning at all. We haven't tested SDXL yet, mostly because the memory demands, and getting it running properly, tend to be even higher than for 768x768 image generation. Originally posted to Hugging Face and shared here with permission from Stability AI. I have seen many comparisons of this new model with SD 1.x.

(PS: I noticed that the units of performance echoed change between s/it and it/s depending on the speed.)
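That PS about units is worth automating away: A1111-style UIs print "it/s" when generation runs faster than one step per second and "s/it" when slower, so logged readings need normalizing before they can be compared. A small sketch (the unit-switching behavior is as observed in the logs, not taken from the UI source):

```python
def to_it_per_s(value: float, unit: str) -> float:
    """Normalize a progress-bar reading to iterations per second."""
    if unit == "it/s":
        return value
    if unit == "s/it":
        return 1.0 / value  # seconds per iteration -> invert
    raise ValueError(f"unknown unit: {unit}")
```

With this, a reading of "2.00s/it" and one of "0.5it/s" correctly compare as equal.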
Base workflow options: inputs are only the prompt and negative words. Currently ROCm is just a little bit faster than CPU on SDXL, but it will save you more RAM, especially with the --lowvram flag. In the past I was training SD 1.5 models and remembered they, too, were more flexible than mere LoRAs.

The answer from our Stable Diffusion XL (SDXL) benchmark: a resounding yes. SDXL's performance has been compared with previous versions of Stable Diffusion, such as SD 1.x and SD 2.x. SDXL 0.9 is able to run on a fairly standard PC, needing only a Windows 10 or 11 or Linux operating system, 16 GB of RAM, and an Nvidia GeForce RTX 20-series graphics card (or higher standard) equipped with a minimum of 8 GB of VRAM. First, let's start with a simple art composition using default parameters.

SDXL consists of a two-step pipeline for latent diffusion: first, we use a base model to generate latents of the desired output size; in the second step, we use a specialized refinement model to improve those latents. The SDXL 1.0 mixture-of-experts pipeline thus includes both a base model and a refinement model. AUTO1111 on WSL2 Ubuntu with xformers reaches roughly 3 it/s.

SDXL-VAE-FP16-Fix was created by finetuning the SDXL-VAE to (1) keep the final output the same, but (2) make the internal activation values smaller, by (3) scaling down weights and biases within the network.

Following up from our Whisper-large-v2 benchmark, we recently benchmarked Stable Diffusion XL (SDXL) on consumer GPUs. Or drop $4k on a 4090 build now: 16 GB of VRAM can guarantee you comfortable 1024×1024 image generation using the SDXL model with the refiner. SDXL GPU benchmarks for GeForce graphics cards. Note: performance is measured as iterations per second for different batch sizes (1, 2, 4, 8). If you would like to access these models for your research, please apply using one of the following links: SDXL-base-0.9 and SDXL-refiner-0.9.
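Since the benchmark reports iterations per second at batch sizes 1, 2, 4, and 8, converting a reading into end-to-end throughput is straightforward. A sketch that assumes one "iteration" denoises the whole batch (which is how these UIs report it) and ignores VAE-decode and model-load overhead:

```python
def images_per_minute(it_per_s: float, steps: int, batch_size: int) -> float:
    """End-to-end images/minute from an it/s reading.

    steps: sampler steps per batch; one iteration covers the whole batch.
    """
    seconds_per_batch = steps / it_per_s
    return 60.0 / seconds_per_batch * batch_size
```

For example, 2 it/s at 30 steps is 4 images per minute at batch size 1; if the same it/s holds at batch size 4 (it usually drops somewhat), throughput quadruples.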
Image: Stable Diffusion benchmark results showing a comparison of image generation time. The LCM update brings SDXL and SSD-1B to the game (the base is SDXL plus a secret ingredient). Clip Skip results in a change to the text encoder.

If you would like to make image creation even easier, the Stability AI SDXL 1.0 release stands at the forefront of this evolution. Funny, I've been running 892x1156 native renders in A1111 with SDXL for the last few days. In contrast, the SDXL results seem to have no relation to the prompt at all apart from the word "goth"; the fact that the faces are (a bit) more coherent is worthless because these images are simply not reflective of the prompt.

Let's try increasing the size and see whether the raw power of the RTX 3080 can win this test; we will use Real-ESRGAN (Real Enhanced Super-Resolution Generative Adversarial Networks). Note that the model in question is stable-diffusion-xl-base-1.0. One Redditor demonstrated how a Ryzen 5 4600G retailing for $95 can tackle different AI workloads. I have always wanted to try SDXL, so when it was released I loaded it up and, surprise, 4-6 minutes per image at about 11 s/it. This model runs on Nvidia A40 (Large) GPU hardware.

AMD RX 6600 XT, SD 1.5. The model underwent rigorous evaluation on various datasets, including ImageNet, COCO, and LSUN. This time we bring you Stable Diffusion AI image generation performance tests for 17 graphics cards, from the RTX 2060 Super to the RTX 4090. Big comparison of LoRA training settings: 8 GB VRAM, Kohya-ss. Ever since SDXL came out, and the first tutorials on how to train LoRAs appeared, I tried my luck at getting a likeness of myself out of it.

Stable Diffusion XL (SDXL) was proposed in "SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis" by Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, et al.
Instead, Nvidia will leave it up to developers to natively support SLI inside their games for older cards, the RTX 3090, and "future SLI-capable GPUs," which more or less means the end of the road. (I'll see myself out.)

With SDXL 1.0, Stability AI once again reaffirms its commitment to pushing the boundaries of AI-powered image generation, establishing a new benchmark for competitors while continuing to innovate and refine its models. This means that you can apply for either of the two links, and if you are granted access, you can access both. The stable-diffusion-xl-base-1.0 checkpoint should be placed in its own directory; this checkpoint recommends a VAE, so download it and place it in the VAE folder. For a direct comparison, every element should be in the right place, which makes it easier to compare. Stability AI API and DreamStudio customers will be able to access the model this Monday.

The most recent version is SDXL 0.9; there are also example images in the SDXL 0.9 article. One user: I don't know why A1111 is so slow and doesn't work; maybe something with the VAE. In a notable speed comparison, SSD-1B achieves speeds up to 60% faster than the foundational SDXL model, a performance benchmark observed on A100 80GB and RTX 4090 GPUs. To gauge the speed difference we are talking about, generating a single 1024x1024 image on an M1 Mac with SDXL (base) takes about a minute.

Achieve the best performance on NVIDIA accelerated infrastructure and streamline the transition to production AI with NVIDIA AI Foundation Models. Best settings for SDXL 1.0: the base SDXL model and refiner without any LoRA. Full training with the UNet and both text encoders needs a 24 GB GPU. SDXL benchmark with batch sizes 1, 2, and 4 (it/s).
With SD 1.5 I could generate an image in a dozen seconds. ComfyUI can generate large images with SDXL. Installing ControlNet for Stable Diffusion XL on Windows or Mac.

The problem is a giant gorilla in our tiny little AI world called Midjourney. ComfyUI is great if you're a developer, because it exposes the whole pipeline as a node graph. I have no idea what the ROCm mode is, but in GPU mode my RTX 2060 6 GB can crank out a picture in 38 seconds with those specs using ComfyUI, cfg 8.

🧨 Diffusers. Step 1: make these changes to the launch script (git 2023-08-31, hash 5ef669de). Specs and numbers: Nvidia RTX 2070 (8 GiB VRAM). I find the results interesting. But these improvements do come at a cost: SDXL 1.0 is significantly larger, with a 3.5B-parameter base model and a 6.6B-parameter refiner model, making it one of the largest open image generators today. I'm getting really low iterations per second on my RTX 4080 16 GB. Create models using simpler yet accurate prompts that can help you produce complex and detailed images. Besides the benchmark, I also made a Colab for anyone to try SDXL 1.0.

Step 1: Update AUTOMATIC1111; download the stable release. SD.Next supports two main backends, Original and Diffusers, which can be switched on the fly. Original is based on the LDM reference implementation and significantly expanded on by A1111. What does matter for speed, and isn't measured by the benchmark, is the ability to run larger batches; it shows that the 4060 Ti 16 GB will be faster than a 4070 Ti when you generate a very big image.
I solved the problem. Consider that there will be future versions after SDXL, which will probably need even more VRAM; it seems wise to get a card with more VRAM. It supports SD 1.x and SDXL, which opens up new possibilities for generating diverse and high-quality images.

This benchmark was conducted by Apple and Hugging Face using public beta versions of iOS 17. However, ComfyUI can run the model very well. Without it, batches larger than one actually run slower than generating images consecutively, because RAM is used too often in place of VRAM, which is devastating for performance. Access algorithms, models, and ML solutions with Amazon SageMaker JumpStart. After searching around for a bit, I heard that the default settings were the culprit. A 4070 uses less power, performance is similar, and it has 12 GB of VRAM.

Figure 14 in the paper shows additional results for the output comparison. You can learn how to use it from the Quick Start section. Performance against state-of-the-art black-box models: training T2I-Adapter-SDXL involved 3 million high-resolution image-text pairs from LAION-Aesthetics V2, with training settings specifying 20,000-35,000 steps, a batch size of 128 (data-parallel with a single-GPU batch size of 16), a constant learning rate of 1e-5, and mixed precision (fp16). (Cloud: Kaggle, free.) Install Python and Git.

SDXL performance optimizations: the improvements don't stop there. The latest result of this work was the release of SDXL, a very advanced latent diffusion model designed for text-to-image synthesis. SDXL 1.0, while slightly more complex, offers two methods for generating images: the Stable Diffusion WebUI and the Stability AI API. To use SDXL with SD.Next, follow its quick-start; right now it's slow in ComfyUI and Automatic1111.
I tried ComfyUI and it takes about 30 s to generate 768x1048 images (I have an RTX 2060, 6 GB VRAM). Midjourney operates through a bot, where users can simply send a direct message with a text prompt to generate an image. The results were okay-ish: not good, not bad, but also not satisfying. Use the optimized version, or edit the code a little to use the model directly.

Normally you should leave batch size at 1 for SDXL and only increase batch count, since batch size increases VRAM usage, and if it starts using system RAM instead of VRAM because VRAM is full, it will slow down, and SDXL is very VRAM-heavy. I use around 25 iterations with SDXL, with the SDXL refiner enabled at default settings. Disclaimer: if SDXL is slow, try downgrading your graphics drivers.

SD 1.5 has developed to a quite mature stage, and unless there is a breakthrough technology it is unlikely to see a significant performance improvement. SDXL 0.9, meanwhile, sets a new benchmark by delivering vastly enhanced image quality and composition intricacy compared to its predecessor. For example, in #21 SDXL is the only one showing the fireflies.

The benchmark's refiner workflow, per image:
1. Generate the base image (guidance scale 5, 50 inference steps).
2. Offload the base pipeline to CPU and load the refiner pipeline on the GPU.
3. Refine the image at 1024x1024 with a 0.5 negative aesthetic score.
4. Send the refiner to CPU and load the upscaler to the GPU.
5. Upscale 2x using GFPGAN.

The 4K RTSR benchmark provides a unique test set comprising ultra-high-resolution images from various sources, setting it apart from traditional super-resolution benchmarks. SDXL 1.0 is a text-to-image generation tool with improved image quality and a user-friendly interface. Vanilla Diffusers with xformers reaches roughly 4 it/s. Finally got around to finishing up and releasing SDXL training on Auto1111/SD.Next. Run the .exe and you should have the UI in the browser.
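The batch-size advice above can be captured in a tiny helper: batch count multiplies the number of images without touching VRAM, while batch size does raise VRAM use. A sketch; the 16 GB threshold and batch size of 4 are assumptions for SDXL at 1024x1024, not measured limits:

```python
import math

def plan_batches(total_images: int, vram_gb: int) -> tuple[int, int]:
    """Pick (batch_size, batch_count): keep batch size at 1 unless VRAM
    is plentiful, then raise batch count to cover the whole job."""
    batch_size = 4 if vram_gb >= 16 else 1  # assumed VRAM threshold
    batch_count = math.ceil(total_images / batch_size)
    return batch_size, batch_count
```

So 8 images on a 12 GB card become 8 batches of 1, while 10 images on a 24 GB card become 3 batches of 4 (the last batch over-generates slightly).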
I'd recommend 8+ GB of VRAM; however, if you have less than that, you can lower the performance settings inside the settings. Roughly, a 20% power cut costs a 3-4% performance cut, a 30% power cut an 8-10% performance cut, and so forth.

Our Stable Diffusion XL (SDXL) benchmark shows consumer GPUs can serve SDXL inference at scale: in this SDXL benchmark, we generated 60.6k hi-res images with randomized prompts on 39 nodes equipped with RTX 3090 and RTX 4090 GPUs. This might seem like a dumb question, but I've started trying to run SDXL locally to see what my computer is able to achieve. It'll be faster than 12 GB of VRAM, and if you generate in batches, it'll be even better. The first invocation produces TensorRT plan files in the engine directory.

For reference, the RTX 4060 Ti: up to 2.5 GHz, 8 GB of memory, a 128-bit memory bus, 24 3rd-gen RT cores, 96 4th-gen Tensor cores, DLSS 3 (with frame generation), a TDP of 115 W, and a launch price of $300 USD. See also: Understanding Classifier-Free Diffusion Guidance.

Python code demo with Segmind SSD-1B: I ran several tests generating a 1024x1024 image. IP-Adapter can be generalized not only to other custom models fine-tuned from the same base model, but also to controllable generation using existing controllable tools. To install Python and Git on Windows and macOS, please follow the instructions below. Here is one 1024x1024 benchmark; hopefully it will be of some use. Note that this value is unaware of other benchmark workers that may be running.
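The power-limit figures quoted above imply a real efficiency win, which is easy to quantify: throughput per watt rises when a 20% power cut only costs a few percent of performance. A sketch using those rough numbers (they are the article's ballpark figures, not a new measurement):

```python
def efficiency_gain(power_cut: float, perf_cut: float) -> float:
    """Relative gain in images-per-joule from power limiting.

    power_cut / perf_cut are fractions, e.g. 0.20 = 20% power limit.
    """
    return ((1 - perf_cut) / (1 - power_cut)) - 1

gain = efficiency_gain(0.20, 0.035)  # 20% less power, ~3.5% less speed
print(f"~{gain:.0%} more images per joule")
```

A ~20% efficiency gain for a ~3.5% speed loss is why power-limiting is popular for long SDXL batch jobs.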
You'll need a macOS computer with Apple Silicon (M1/M2) hardware, running macOS 12.6 or later (13.0 or later recommended). (*do-not-batch-cond-uncond)

LoRA is a type of parameter-efficient fine-tuning (PEFT) that is much cheaper to accomplish than full model fine-tuning. In general, SDXL seems to deliver more accurate and higher-quality results, especially in the area of photorealism (Dhanshree Shripad Shenwai). More detailed instructions for installation and use are here. With SDXL 1.0, anyone can now create almost any image easily. There are slight discrepancies between the output of SDXL-VAE-FP16-Fix and SDXL-VAE, but the decoded images should be close. Copy across any models from other folders (or previous installations) and restart with the shortcut.

Over the past few weeks, the Diffusers team and the T2I-Adapter authors have been working closely to add T2I-Adapter support for Stable Diffusion XL (SDXL) to the diffusers library. This is an order of magnitude faster, and not having to wait for results is a game-changer. It can generate novel images from text, and the high-end price/performance is actually good now.

The abstract from the paper: "We present SDXL, a latent diffusion model for text-to-image synthesis." SDXL 1.0 introduces denoising_start and denoising_end options, giving you finer control over the denoising process. The key to this success is the integration of NVIDIA TensorRT, a high-performance, state-of-the-art optimization framework. Thanks to specific command-line arguments, I can handle larger resolutions, like 1024x1024, and still use ControlNet smoothly. In addition, the OpenVINO script does not fully support hires. fix, LoRA, and some extensions.

Stable Diffusion XL (SDXL) benchmark: I can't find an efficiency benchmark against previous SD models, and it's not my computer that is the benchmark. You should be good to go; enjoy the huge performance boost using SD-XL.
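The denoising_start/denoising_end options mentioned above split the schedule between the base model and the refiner. A simple proportional model of that step split; diffusers' exact mapping works on noise levels rather than step indices, so real counts may differ by a step:

```python
def split_steps(total_steps: int, denoising_end: float) -> tuple[int, int]:
    """(base_steps, refiner_steps) when the base pipeline gets
    denoising_end and the refiner gets denoising_start at the same value."""
    base = round(total_steps * denoising_end)
    return base, total_steps - base
```

With 50 steps and denoising_end=0.8, the base runs ~40 steps and the refiner finishes the last ~10, which is the commonly suggested 80/20 split.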
Performance benchmarks have already shown that the NVIDIA TensorRT-optimized model outperforms the baseline (non-optimized) model on A10 and A100 GPUs, among others. SDXL does not, however, achieve better FID scores than the previous SD versions. The Stability AI team takes great pride in introducing SDXL 1.0, which runs in SD.Next, ComfyUI, and Automatic1111, and there are setups optimized for maximum performance to run SDXL on the free Colab tier.

One common speed trick: replace the .dll files in stable-diffusion-webui\venv\Lib\site-packages\torch\lib with the ones from the cudnn-windows-x86_64-8.x download. Test rig: NVIDIA GeForce RTX 4070 Ti (compute capability 8.9), CUDA 11.8, cuDNN 8800, driver 537.x.

SD 1.5 was trained on 512x512 images. To see the great variety of images SDXL is capable of, check out Civitai's collection of selected entries from the SDXL image contest. (Image created by Decrypt using AI.)