{"id":565,"date":"2023-04-02T07:00:40","date_gmt":"2023-04-02T07:00:40","guid":{"rendered":"https:\/\/gammonrants.org\/?page_id=565"},"modified":"2023-04-02T07:00:41","modified_gmt":"2023-04-02T07:00:41","slug":"upscaling-in-stable-diffusion-compared-in-detail","status":"publish","type":"page","link":"https:\/\/gammonrants.org\/index.php\/upscaling-in-stable-diffusion-compared-in-detail\/","title":{"rendered":"Upscaling in Stable Diffusion compared. In Detail."},"content":{"rendered":"\n<p>Models are usually trained at a fairly low resolution, something like 512&#215;512. At this resolution they produce the best output. Therefore my SD workflow, and probably yours as well, is to upscale images to a resolution where you can use them as backdrops, posters, calendar images, and all the other stuff you want to use photos for. Now SD (I&#8217;m using Automatic1111) has a huge number of upscaling options, going from blindingly fast to gruesomely slow. Which one to pick? Do the slow ones justify the time investment?<\/p>\n\n\n\n<p>I mention the time to wait for an upscaler to do its job, so here&#8217;s some context: my computer is a laptop with an NVIDIA GeForce RTX 3080 Ti. Not totally high end anymore but fairly good (it outperforms my Mac&#8217;s M1 graphics 10:1 or so).<\/p>\n\n\n\n<p>My workflow for nice backdrop images is: render at 640&#215;400, upscale 4x to the native resolution of my Mac &#8211; 2560&#215;1600. 
This is what I&#8217;m going to do here.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Extras &#8211; Upscaling<\/h2>\n\n\n\n<p>Automatic1111&#8217;s UI offers an &#8220;extras&#8221; tab that contains an option to upscale any image with a good number of upscalers.<\/p>\n\n\n\n<p>Let&#8217;s have a look at what they do.<\/p>\n\n\n\n<p>Here is a 100&#215;100 part of the original image, scaled 4x with GIMP, with a scaler that does absolutely nothing but turn each pixel into a little 4&#215;4 square of its color.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"400\" height=\"400\" src=\"https:\/\/gammonrants.org\/wp-content\/uploads\/2023\/04\/orig_scaled_4x.png\" alt=\"\" class=\"wp-image-567\" srcset=\"https:\/\/gammonrants.org\/wp-content\/uploads\/2023\/04\/orig_scaled_4x.png 400w, https:\/\/gammonrants.org\/wp-content\/uploads\/2023\/04\/orig_scaled_4x-300x300.png 300w, https:\/\/gammonrants.org\/wp-content\/uploads\/2023\/04\/orig_scaled_4x-150x150.png 150w\" sizes=\"auto, (max-width: 400px) 100vw, 400px\" \/><\/figure>\n\n\n\n<p>Not very beautiful. <\/p>\n\n\n\n<p>Now, as a starting point, let&#8217;s look at GIMP&#8217;s traditional cubic scaler.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"400\" height=\"400\" src=\"https:\/\/gammonrants.org\/wp-content\/uploads\/2023\/04\/orig_scaled_4x_cubic.png\" alt=\"\" class=\"wp-image-568\" srcset=\"https:\/\/gammonrants.org\/wp-content\/uploads\/2023\/04\/orig_scaled_4x_cubic.png 400w, https:\/\/gammonrants.org\/wp-content\/uploads\/2023\/04\/orig_scaled_4x_cubic-300x300.png 300w, https:\/\/gammonrants.org\/wp-content\/uploads\/2023\/04\/orig_scaled_4x_cubic-150x150.png 150w\" sizes=\"auto, (max-width: 400px) 100vw, 400px\" \/><\/figure>\n\n\n\n<p>I must confess I had never tried this, so now I&#8217;m shocked: the cubic scaler produces a worse image. 
It doesn&#8217;t look as low-res; instead it looks blurry and adds weird artifacts &#8211; look closely at the corners of the building.<\/p>\n\n\n\n<p>Off to Automatic1111&#8217;s upscalers.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">NEAREST<\/h2>\n\n\n\n<p>Blindingly fast it is: it took well under a second to render the image. Faster than GIMP&#8217;s scalers, actually. Here&#8217;s its output.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"400\" height=\"400\" src=\"https:\/\/gammonrants.org\/wp-content\/uploads\/2023\/04\/t-nearest_crop.png\" alt=\"\" class=\"wp-image-569\" srcset=\"https:\/\/gammonrants.org\/wp-content\/uploads\/2023\/04\/t-nearest_crop.png 400w, https:\/\/gammonrants.org\/wp-content\/uploads\/2023\/04\/t-nearest_crop-300x300.png 300w, https:\/\/gammonrants.org\/wp-content\/uploads\/2023\/04\/t-nearest_crop-150x150.png 150w\" sizes=\"auto, (max-width: 400px) 100vw, 400px\" \/><\/figure>\n\n\n\n<p>Doesn&#8217;t look much better, does it? A bit of a mixture of the two above &#8211; certainly less blurry than cubic, but plenty of artifacts and generally a mess you can&#8217;t use for e.g. a poster-size photo.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">ESRGAN-4x<\/h2>\n\n\n\n<p>One of the scalers that are recommended.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"400\" height=\"400\" src=\"https:\/\/gammonrants.org\/wp-content\/uploads\/2023\/04\/t-emargan-crop.png\" alt=\"\" class=\"wp-image-570\" srcset=\"https:\/\/gammonrants.org\/wp-content\/uploads\/2023\/04\/t-emargan-crop.png 400w, https:\/\/gammonrants.org\/wp-content\/uploads\/2023\/04\/t-emargan-crop-300x300.png 300w, https:\/\/gammonrants.org\/wp-content\/uploads\/2023\/04\/t-emargan-crop-150x150.png 150w\" sizes=\"auto, (max-width: 400px) 100vw, 400px\" \/><\/figure>\n\n\n\n<p>ESRGAN produces a good upscale: instead of a blurry mass you get good structure in the building. 
It produces quite a few artifacts, though.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">R-ESRGAN-4x+<\/h2>\n\n\n\n<p>The scaler with the unpronounceable name is my favorite one that I&#8217;ve been using constantly. Supposedly it works similarly to ESRGAN but produces a smoother output (with less detail).<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"400\" height=\"400\" src=\"https:\/\/gammonrants.org\/wp-content\/uploads\/2023\/04\/t-r-emargan4xplus-crop.png\" alt=\"\" class=\"wp-image-571\" srcset=\"https:\/\/gammonrants.org\/wp-content\/uploads\/2023\/04\/t-r-emargan4xplus-crop.png 400w, https:\/\/gammonrants.org\/wp-content\/uploads\/2023\/04\/t-r-emargan4xplus-crop-300x300.png 300w, https:\/\/gammonrants.org\/wp-content\/uploads\/2023\/04\/t-r-emargan4xplus-crop-150x150.png 150w\" sizes=\"auto, (max-width: 400px) 100vw, 400px\" \/><\/figure>\n\n\n\n<p>&#8230; and it does. Very cool output, in my opinion beating all the others by a wide margin. Nearly no artifacts, well scaled structures with detail and little noise. Yes, the structures are probably a bit too clean, but this is great.<\/p>\n\n\n\n<p>Both ESRGAN scalers run in a few seconds (5?) on my machine, so they work well for upscaling many images.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">SwinIR 4x<\/h2>\n\n\n\n<p>I had never tried this one. 
Not recommended by &#8220;stable diffusion art&#8221;, at the end of the dropdown as well \ud83d\ude42<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"400\" height=\"400\" src=\"https:\/\/gammonrants.org\/wp-content\/uploads\/2023\/04\/t-SwinIR4x-crop.png\" alt=\"\" class=\"wp-image-572\" srcset=\"https:\/\/gammonrants.org\/wp-content\/uploads\/2023\/04\/t-SwinIR4x-crop.png 400w, https:\/\/gammonrants.org\/wp-content\/uploads\/2023\/04\/t-SwinIR4x-crop-300x300.png 300w, https:\/\/gammonrants.org\/wp-content\/uploads\/2023\/04\/t-SwinIR4x-crop-150x150.png 150w\" sizes=\"auto, (max-width: 400px) 100vw, 400px\" \/><\/figure>\n\n\n\n<p>And undeservedly so. Produces a fairly clean and fairly artifact-free render. I had to look hard for differences to the previous one. In reality it probably doesn&#8217;t matter if you use this one or the R-ESRGAN.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">LDSR<\/h2>\n\n\n\n<p>This upscaler is SLOW. Supposedly (stable diffusion art &#8211; a great site &#8211; explains this) it came as the scaler with SD 1.4, and has its own neural network model (that it downloads &#8211; 1 gig or so &#8211; the first time you use it). It&#8217;s terribly slow, something like 20 times slower than the others.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"400\" height=\"400\" src=\"https:\/\/gammonrants.org\/wp-content\/uploads\/2023\/04\/t-ldsr-crop.png\" alt=\"\" class=\"wp-image-573\" srcset=\"https:\/\/gammonrants.org\/wp-content\/uploads\/2023\/04\/t-ldsr-crop.png 400w, https:\/\/gammonrants.org\/wp-content\/uploads\/2023\/04\/t-ldsr-crop-300x300.png 300w, https:\/\/gammonrants.org\/wp-content\/uploads\/2023\/04\/t-ldsr-crop-150x150.png 150w\" sizes=\"auto, (max-width: 400px) 100vw, 400px\" \/><\/figure>\n\n\n\n<p>And the result is not that much better. 
It preserves a bit more detail than the two above, at the expense of looking less smooth.<\/p>\n\n\n\n<p>And finally:<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">LANCZOS<\/h2>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"400\" height=\"400\" src=\"https:\/\/gammonrants.org\/wp-content\/uploads\/2023\/04\/t-lanczos-crop.png\" alt=\"\" class=\"wp-image-574\" srcset=\"https:\/\/gammonrants.org\/wp-content\/uploads\/2023\/04\/t-lanczos-crop.png 400w, https:\/\/gammonrants.org\/wp-content\/uploads\/2023\/04\/t-lanczos-crop-300x300.png 300w, https:\/\/gammonrants.org\/wp-content\/uploads\/2023\/04\/t-lanczos-crop-150x150.png 150w\" sizes=\"auto, (max-width: 400px) 100vw, 400px\" \/><\/figure>\n\n\n\n<p>Frankly, I don&#8217;t like this one. Compared to the three above, it looks blurry and has more artifacts.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">A first conclusion<\/h2>\n\n\n\n<p>Judging only from this one crop, the upscalers of choice are R-ESRGAN-4x+ and SwinIR 4x, and maybe LDSR if you have time to waste.<\/p>\n\n\n\n<p>However, in my experiments I found that overall the LDSR images look better. LDSR seems to recover details in dark places and find a nicer overall tone for the image. Here are two 640&#215;480 crops from another image:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"640\" height=\"480\" src=\"https:\/\/gammonrants.org\/wp-content\/uploads\/2023\/04\/r-esrgan_4xplus-crop.png\" alt=\"\" class=\"wp-image-575\" srcset=\"https:\/\/gammonrants.org\/wp-content\/uploads\/2023\/04\/r-esrgan_4xplus-crop.png 640w, https:\/\/gammonrants.org\/wp-content\/uploads\/2023\/04\/r-esrgan_4xplus-crop-300x225.png 300w\" sizes=\"auto, (max-width: 640px) 100vw, 640px\" \/><\/figure>\n\n\n\n<p>R-ESRGAN-4x+ does a fine job here: great tree detail, no artifacts, wonderful. 
<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"640\" height=\"480\" src=\"https:\/\/gammonrants.org\/wp-content\/uploads\/2023\/04\/ldsr-crop.png\" alt=\"\" class=\"wp-image-576\" srcset=\"https:\/\/gammonrants.org\/wp-content\/uploads\/2023\/04\/ldsr-crop.png 640w, https:\/\/gammonrants.org\/wp-content\/uploads\/2023\/04\/ldsr-crop-300x225.png 300w\" sizes=\"auto, (max-width: 640px) 100vw, 640px\" \/><\/figure>\n\n\n\n<p>But LDSR clearly outperforms it with much more detail of the person in front and an overall slightly better coloring. This is just perfect.<\/p>\n\n\n\n<p>So overall, if you&#8217;re in a hurry and\/or want to create many upscales, go for R-ESRGAN-4x+. For a picture you want to have perfect, use R-ESRGAN-4x+ and LDSR, and compare the two.<\/p>\n\n\n\n<h1 class=\"wp-block-heading\">And what to do with Hires.fix?<\/h1>\n\n\n\n<p>Apart from the extras upscalers, Automatic1111 also offers &#8220;hires.fix&#8221; as part of txt2img. I&#8217;m struggling a lot with this option and don&#8217;t use it often, because it pretty much kills my workflow: when I&#8217;m creating something with stable diffusion, I start with a good prompt, let it create a batch of 6 images (which takes me some 10 seconds), and refine until I&#8217;m happy with the output. Then I create 15 batches of 6 images and look for the few best images that I then inpaint and then upscale. I&#8217;m fairly happy with this workflow.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<p>Hires.fix supposedly adds more detail and is better than a &#8220;post mortem upscaler&#8221;. However, it is SLOW and a real MEMORY HOG. I can&#8217;t upscale to my final 2560&#215;1600 due to out-of-memory situations. So the best I can do is go for a 2x scale with Hires.fix and another 2x scale with an extras scaler afterwards.<\/p>\n\n\n\n<p>Also, this is terribly slow. 
While I can generate 90 images in half an hour with my normal workflow (with 6-image batches), it takes me hours this way (with 2-image batches and a SLOW upscaler).<\/p>\n\n\n\n<p>So let&#8217;s try this, with R-ESRGAN-4x+ as the Hires.fix upscaler to go from 640&#215;400 to 1280&#215;800 and LDSR for the final scaling to 2560&#215;1600.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"400\" height=\"400\" src=\"https:\/\/gammonrants.org\/wp-content\/uploads\/2023\/04\/preupscale.png\" alt=\"\" class=\"wp-image-577\" srcset=\"https:\/\/gammonrants.org\/wp-content\/uploads\/2023\/04\/preupscale.png 400w, https:\/\/gammonrants.org\/wp-content\/uploads\/2023\/04\/preupscale-300x300.png 300w, https:\/\/gammonrants.org\/wp-content\/uploads\/2023\/04\/preupscale-150x150.png 150w\" sizes=\"auto, (max-width: 400px) 100vw, 400px\" \/><\/figure>\n\n\n\n<p>Now this is a different image. The overall building is way more broken (yes, the prompt included post-apocalyptic, so no surprise there). But unlike the &#8220;extras&#8221; upscalers, the upscaling process of stable diffusion with Hires.fix significantly alters details of the image.<\/p>\n\n\n\n<p>(And yes, I double-checked &#8211; I reused prompt and seed, and if I turn off Hires.fix I get exactly the same image as before.)<\/p>\n\n\n\n<p>The quality is excellent, though: no artifacts, no noise, no blurriness.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Conclusion (again)<\/h2>\n\n\n\n<p>Hires.fix is NOT an upscaler. It is a way to get a good image at a higher resolution by sacrificing a large amount of time in the workflow. The image is significantly different from the image you would have got without it. I&#8217;m not going to use it because I like having 100 images to choose from for my prompt, and the extras upscalers do their job well.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Models are usually trained at a fairly low resolution, something like 512&#215;512. 
At this resolution they produce the best output. Therefore my SD workflow, and probably yours as well, is to upscale images to a resolution where you can use them as backdrops, posters, calendar images, and all the other stuff you want to use photos for. [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"om_disable_all_campaigns":false,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"class_list":["post-565","page","type-page","status-publish","hentry"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/gammonrants.org\/index.php\/wp-json\/wp\/v2\/pages\/565","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/gammonrants.org\/index.php\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/gammonrants.org\/index.php\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/gammonrants.org\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/gammonrants.org\/index.php\/wp-json\/wp\/v2\/comments?post=565"}],"version-history":[{"count":1,"href":"https:\/\/gammonrants.org\/index.php\/wp-json\/wp\/v2\/pages\/565\/revisions"}],"predecessor-version":[{"id":578,"href":"https:\/\/gammonrants.org\/index.php\/wp-json\/wp\/v2\/pages\/565\/revisions\/578"}],"wp:attachment":[{"href":"https:\/\/gammonrants.org\/index.php\/wp-json\/wp\/v2\/media?parent=565"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}