Into the Imaginarium: Our Second Style Guide


APRIL 2022 | OUR SECOND STYLE GUIDE

PIXELZINE

INTO THE IMAGINARIUM
A COLLABORATION BY CH AND GLOU


02 Editor's Note
03 Quick Looks
05 First, What Are Your Options? (By DeepestFellOpen)
07 The Settings Are the Adventure! (By Kanard)
11 Diffusion Only Parameters
13 Less Useful Things to Know

Zine edited by Bot Ross Blue
Cover Artwork by Adam B. Levine - Number 48 in the "Abstract Ink" Series


EDITOR'S NOTE

Dear Pixelmind Community,

I am very pleased to present our latest iteration of the style guide - an undertaking that would not have been possible without the fastidious work of Glou and Ch (otherwise known as our new CTO!). In this issue, we walk you through the advanced parameters in Imaginarium - yes, even the ones we are still scratching our heads at...

While this guide attempts to give you a better sense of the parameters you're using, it will not replace the hard-won knowledge that comes from experimentation. With that, we invite you to play around with them! These definitions are in flux, and sometimes contradictory depending on the context in which a setting is used.

Further, the learning is never over. We are constantly understanding more about these tools, and the wider frameworks for applying AI to art creation. Our style guides will continue to build on the knowledge created in the Pixelmind Discord and beyond. Let's keep learning!

Bot Ross Blue

Words and Artwork by Bot Ross Blue


Quick Look: VQGAN (Imagine)

[Infographic: a one-page map of VQGAN settings and tips with their page references - "text, watermarks, royalty free" (Page 13), "typically 6-12 range" (Page 9), the 'Turbulence' (Page 8), "don't go past 64" (Page 8), and "The Randomness Factor" (Page 14), plus pointers to Pages 5, 7, 9, 10, and 14.]


Quick Look: Diffusion

[Infographic: a one-page map of Diffusion settings and tips with their page references - "text, watermarks, royalty free" (Page 13), "more than 64 cuts breaks Pixelcat; not super useful for Diffusion" (Page 8), "The 'Randomness Factor'" (Page 10), "not a huge impact, stick to either 0 or 500" (Page 11), "defaults are your friends" (Page 11), "start with default and experiment from there" (Page 11), "no higher than 2-3" (Page 11), "this is useful" (Page 11), and a few marked "don't worry about it" (Pages 11-14), plus pointers to Pages 5, 7, 8, 9, and 12.]


FIRST, WHAT ARE YOUR OPTIONS?

Curated by Glou

Editor's Note: First, let's take a moment to describe how Imagine and Diffusion work. At this point, the difference in results between Imagine and Diffusion is likely intuitive for many Pixelmind artists, but it is fun to learn a bit more of the 'why'. I'll leave that work to Glou:

VQGAN (aka Imagine)

If you've ever seen a cloud shaped like a rabbit drift and transform into a face, that's like what a VQGAN generator model is doing. It classifies the discrete modalities of what it "sees" and shapes the image to match a text prompt. However, because Pixelcat is "sculpting clouds," the result is not as high fidelity as the .diffusion generator. Typically, it works very well for transforming existing images, rather than working from scratch (though it can also do that very well when trained).


Yes, these are clouds we transformed into rabbits using VQGAN.

Here is a very comprehensive blog post about VQGAN that is fun and accessible, with a lovely illustrated guide! More about VQGAN basic applications in our First Style Guide.
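To make the "cloud sculpting" a bit more concrete, here is a minimal, runnable sketch of the VQGAN+CLIP loop in Python. Everything here - TinyDecoder, TinyCLIP, the random "text embedding" - is a toy stand-in, not Pixelmind's actual code; the point is only the shape of the loop: decode latents, score against the prompt, nudge the latents.

```python
# A conceptual sketch of the VQGAN+CLIP loop. TinyDecoder and TinyCLIP
# are toy stand-ins for the real VQGAN decoder and CLIP encoders, so
# the script runs end to end without any model checkpoints.
import torch
import torch.nn as nn

class TinyDecoder(nn.Module):          # stand-in for the VQGAN decoder
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(8, 3, 3, padding=1)
    def forward(self, z):
        return torch.sigmoid(self.net(z))   # latent grid -> RGB image

class TinyCLIP(nn.Module):             # stand-in for CLIP's image encoder
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.AdaptiveAvgPool2d(8), nn.Flatten(),
                                 nn.Linear(3 * 8 * 8, 64))
    def forward(self, img):
        e = self.net(img)
        return e / e.norm(dim=-1, keepdim=True)  # unit-length embedding

decoder, clip_img = TinyDecoder(), TinyCLIP()
text_embed = torch.randn(1, 64)                  # pretend CLIP text embedding
text_embed = text_embed / text_embed.norm(dim=-1, keepdim=True)

# The "cloud sculpting": we never train the decoder; we only nudge the
# latent grid z so the decoded image drifts toward the text embedding.
z = torch.randn(1, 8, 32, 32, requires_grad=True)
opt = torch.optim.Adam([z], lr=0.1)              # the default optimizer (see page 14)

for step in range(100):
    image = decoder(z)
    loss = 1 - (clip_img(image) * text_embed).sum()  # cosine-distance loss
    opt.zero_grad(); loss.backward(); opt.step()
print("final loss:", loss.item())
```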


Diffusion

The Diffusion generator is like the spirit of Karate Kid for Pixelcat, because it will "wax on, wax off": starting from a blur, it clarifies, clarifies, clarifies until it achieves its goal. The goal of 'wax on' for Pixelcat is to add "Gaussian noise," and then during 'wax off' it creates super-resolution results. Here's a handy blog post if you want to go deeper into the dojo.

As an example, we used the prompt ".diffusion a rabbit transforming into a human face, trending on artstation" to demonstrate this 'wax on, wax off' approach. In the first image, early in Pixelcat's generation, you can see a figure emerge from the blur, and in sequence two, a dog's head leaps into mid-focus. Finally, the image is clarified into a 'higher fidelity form' - the dark outline of a rabbit's ears and head, inset with two abstract rabbit profiles. This image is much crisper than the rabbit generation on the previous page.

[Process Examples: Early Sequence | Mid Generation | Final Image]
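If you'd like to see the 'wax on' half in code, here is a small sketch using the standard DDPM closed form for noising an image. The schedule and variable names are textbook conventions, not Pixelcat's internals.

```python
# The 'wax on' step: adding Gaussian noise to an image. This is the
# standard DDPM closed form x_t = sqrt(abar_t)*x0 + sqrt(1-abar_t)*eps.
import numpy as np

rng = np.random.default_rng(0)
x0 = rng.random((64, 64, 3))               # pretend clean image in [0, 1]

T = 1000
betas = np.linspace(1e-4, 0.02, T)         # a common linear noise schedule
abar = np.cumprod(1.0 - betas)             # cumulative "how much signal is left"

def noisy(x0, t):
    eps = rng.standard_normal(x0.shape)    # the Gaussian-noise "wax"
    return np.sqrt(abar[t]) * x0 + np.sqrt(1.0 - abar[t]) * eps

# Early in the schedule the rabbit is still visible; late, it's nearly all blur.
for t in (10, 300, 900):
    xt = noisy(x0, t)
    print(f"t={t:4d}  signal kept: {np.sqrt(abar[t]):.2f}")
```

The 'wax off' half is the learned reverse of this process: the model repeatedly estimates and removes the noise until the clarified image remains.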


THE SETTINGS ARE THE ADVENTURE!

Useful Parameters for Both VQGAN and Diffusion

This section is all about the parameters and prompts that can be adjusted with the "Advanced Mode" toggle on in the "Additional Options" section in the Imaginarium. Members of the Pixelmind community have contributed to this section via Discord channels and DMs. Thank you!

As we described in the opening, sometimes knowing what a setting means isn't particularly useful - often, what is more useful is the knowledge that comes from experimentation. Still, we've tried to lay out some of the settings that might help you make sense of what you're playing around with, and to suggest a few contexts where they will work! As one of Pixelmind's brilliant developers, Alex, says: "The settings are the adventure!"

Negative Text

The negative text field in the Imaginarium subtracts anything you don't want from the final image. Sometimes the AI generates text or watermarks; to lower the odds of this happening, you could add "text, watermarks, royalty free" to the negative text to reduce the chance of them appearing in your final image. - @Will
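One way to picture how a negative prompt might enter the process: treat it as a second text embedding the image is pushed away from. This is a hedged sketch with random stand-in embeddings; the weighting is illustrative, not Imaginarium's actual formula.

```python
# Sketch: the image embedding is pulled toward the positive prompt and
# pushed away from the negative one. Embeddings are random stand-ins
# for CLIP outputs; the neg_weight knob is our assumption.
import numpy as np

rng = np.random.default_rng(1)
unit = lambda v: v / np.linalg.norm(v)

img    = unit(rng.standard_normal(64))   # current image embedding
prompt = unit(rng.standard_normal(64))   # e.g. "a misty mountain at dawn"
neg    = unit(rng.standard_normal(64))   # e.g. "text, watermarks, royalty free"

neg_weight = 0.5                         # illustrative, not a real Imaginarium value
loss = (1 - img @ prompt) + neg_weight * (img @ neg)
print("guidance loss:", round(float(loss), 3))
```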



Cuts

Think of cuts like puzzle pieces that create the "complete" image when combined. The higher the cut number, the more Pixelcat is asked to pay attention to the detail of the pieces when creating the overall image, and subsequently the output is higher fidelity. The lower the cut number, the 'easier' the puzzle is for Pixelcat to assemble. Pixelcat loves puzzles, but once the puzzle exceeds 64 pieces, Pixelcat starts to forget where it left its car keys. As an example, giving the same prompt of "Everest | 3d render" using VQGAN and only changing the cuts variable from 64 to 24 produces these two outputs:

[Left: 64 cuts | Right: 24 cuts]
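For the curious, here's a toy version of what a cut routine might look like: carve random squares out of the canvas and resize them so CLIP can score each piece. The sampling details are our assumption, not Pixelcat's exact code.

```python
# A minimal sketch of "cuts": random square crops that CLIP scores
# individually, like puzzle pieces of the full canvas.
import torch

def make_cutouts(image, cutn, cut_size=32):
    """image: (1, 3, H, W) tensor. Returns (cutn, 3, cut_size, cut_size)."""
    _, _, h, w = image.shape
    cuts = []
    for _ in range(cutn):
        size = torch.randint(cut_size, min(h, w) + 1, ()).item()
        y = torch.randint(0, h - size + 1, ()).item()
        x = torch.randint(0, w - size + 1, ()).item()
        piece = image[:, :, y:y + size, x:x + size]
        cuts.append(torch.nn.functional.interpolate(
            piece, size=(cut_size, cut_size), mode="bilinear",
            align_corners=False))
    return torch.cat(cuts)

canvas = torch.rand(1, 3, 256, 256)
print(make_cutouts(canvas, cutn=64).shape)   # 64 pieces: Pixelcat's comfort limit
```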

Init Image Noise

Experiment with the value! Below are two images of a mountain - one with no image noise and the other set to 50. In the second image you can see how the 'turbulence' is increased.

[Left: No Image Noise | Right: Image Noise of 50]
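A plausible sketch of what the slider could be doing under the hood: blending random pixels into your upload before generation starts. The 0-100 scale and the blend formula here are our assumption, purely for illustration.

```python
# Sketch of init image noise as 'turbulence': blend random pixels into
# the starting image. Scale and formula are assumptions, not internals.
import numpy as np

rng = np.random.default_rng(2)
init = rng.random((64, 64, 3))                 # stand-in for your uploaded image

def add_init_noise(image, amount):
    """amount: 0 (untouched) to 100 (pure noise)."""
    noise = rng.random(image.shape)
    t = amount / 100.0
    return (1 - t) * image + t * noise

calm      = add_init_noise(init, 0)    # 'No Image Noise'
turbulent = add_init_noise(init, 50)   # 'Image Noise of 50'
print(np.abs(turbulent - init).mean() > np.abs(calm - init).mean())  # True
```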


Seed "The seed will determine the map of noise that VQGAN will use as its initial image - similar to how the concept of a seed works in Minecraft. Setting the value of the seed to -1 will generate a random image every time. Using any positive integer will generate the same sheet of noise each time, allowing for comparisons in style and tone between different images." This helpful guide provided this definition: https://heystacks.org/doc/935/introduction-tovqganclip.

Init Weight

If you are starting from an initial image upload and you want Pixelcat to adhere to the original image's subject and composition, a range of 6-12 is recommended for Init Weight: below 5 drifts further away from the image, while above 10 stays much closer to it. Init Weight is more useful in VQGAN, whereas Skip Timesteps achieves similar effects in Diffusion (see page 11). We're including it here as it is still an option for Diffusion.
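Mechanically, an init weight usually acts as a multiplier on a loss term that pulls the evolving image back toward the upload. A hedged sketch; the exact loss inside Imaginarium may differ:

```python
# Init weight as an anchor term: higher weight, stronger pull toward
# the uploaded image. A simplified mean-squared formulation.
import torch

init_image = torch.rand(1, 3, 64, 64)
current    = torch.rand(1, 3, 64, 64, requires_grad=True)

def total_loss(clip_loss, init_weight):
    anchor = torch.nn.functional.mse_loss(current, init_image)
    return clip_loss + init_weight * anchor

fake_clip_loss = torch.tensor(0.8)                 # stand-in prompt loss
print(total_loss(fake_clip_loss, init_weight=6))   # low end of the 6-12 range
print(total_loss(fake_clip_loss, init_weight=12))  # high end: hugs the upload
```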



Step Size (VQGAN ONLY)

[Three "mountain" generations: Step Size 0.1 (default in Advanced Mode) | Step Size 1 | Step Size 30]

Here is the best analogy for Step Size, from Alex: imagine you are in a dark room and can't see anything. Someone calls your name (they are in trouble, and you're trying to get to them before the time runs out, aka the generation ends). You can sort of figure out which direction they are calling from, so you take a step... If you can only take small steps, it might take you a very long time to reach the person, and you may never reach them. If you take steps that are too large, you may overshoot far beyond them and, again, never find them. But with just the right step, you reach them in time to save them - that is finding a very good step size.

Above, we typed "mountain" into Imaginarium and used the Advanced Settings to change the step size, leaving every other setting the same. The results show the effect of smaller or bigger steps. The first uses the default setting for Advanced Mode - a step size of 0.1. Moving quickly, Pixelcat produced a relatively simple and flat depiction of a mountain. The step size in the final image was 30, and the processing time to complete this request was immense. Clearly there is a 'mountain shape,' but it is nowhere near as good as the Step Size of 1 in the middle.

Important note: Diffusion does not use the step size variable at all, nor does it use the iterations value. "The number of diffusion steps is currently hardcoded (for a few reasons), and considering the way diffusion works (always being 'done' at the end of a predetermined sampling schedule), changing the iterations is what would effectively change the learn rate, so they're both effectively hardcoded right now." - @BoneAmputee
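You can watch Alex's analogy play out numerically: minimize a simple bowl-shaped loss with steps that are too small, just right, and too large.

```python
# A tiny numerical version of the dark-room analogy: three step sizes
# chasing the same target, with very different endings.
import numpy as np

def descend(step_size, iters=60):
    x = 10.0                       # start far from the target at x = 0
    for _ in range(iters):
        grad = 2 * x               # gradient of loss(x) = x**2
        x -= step_size * grad
    return x

for lr in (0.001, 0.1, 1.1):
    print(f"step size {lr:>5}: ended at {descend(lr):+.3f}")
# 0.001 barely moves (tiny steps), 0.1 lands near 0 (just right),
# and 1.1 overshoots back and forth, ending far from the target.
```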


Diffusion Only Parameters

Clip Guidance Scale

Controls how much the image should look like your text prompt. "If you go too high it starts failing, and 10,000 is the max." - @Will

Skip Timesteps

If you are uploading a start image and you'd like to keep the image output more like the original, increase your Skip Timesteps. 65-75 is typically Pixelcat's sweet spot.

[Comparison: Skip Timesteps 65 | Skip Timesteps 85]

TV Scale

Controls the smoothness of the final output. Sort of the opposite of VQGAN 'Init Image Noise.' It doesn't seem to have much effect, but experiment with either 0 or 500. - @Will

Range Scale

Controls how far out of range RGB (red, green, blue) values are allowed to be.

Clamp Max

"Clamp_max is kind of like a step size or learn rate: if it's too high it will be very messy, if it's too low it won't find the optimal image. Start at 0.05 and then work your way up (7 is typically where it ends up)." - @Will

Cutn Batches

First, think of Cuts like puzzle pieces that individually create your full final picture. If you 'batch' them, Pixelcat can focus on a region of your puzzle a bit easier. Technically, you are telling Pixelcat to accumulate the CLIP gradient from multiple batches of cuts... if that makes sense, great. If not, think of it this way: 2 batches of 16 would be the same as 1 batch of 32, or 4 batches of 8. Using a higher batch count gets you greater detail/results. But don't go too high or Pixelcat will OD on Ritalin and pass out. "2-3 is good, too much and you get out of memory errors." - @Will
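To see how several of these knobs could fit together, here is a hedged sketch loosely modeled on public CLIP-guided diffusion notebooks: each scale multiplies its own loss term, gradients accumulate across cut batches, and clamp_max caps the result. The loss helpers are simplified stand-ins and the constants are illustrative; Imaginarium's internals may differ.

```python
# Sketch of one guidance step combining the diffusion-only knobs above.
import torch

def tv_loss(img):      # total variation: rougher pixels, bigger penalty (TV Scale)
    return ((img[..., 1:, :] - img[..., :-1, :]).abs().mean()
            + (img[..., :, 1:] - img[..., :, :-1]).abs().mean())

def range_loss(img):   # penalizes RGB values straying outside [-1, 1] (Range Scale)
    return (img - img.clamp(-1, 1)).abs().mean()

def fake_clip_loss(img):  # stand-in for scoring cutouts against your prompt
    return img.square().mean()

img = torch.randn(1, 3, 64, 64, requires_grad=True)

# Illustrative settings echoing the tips above: guidance well under its
# 10,000 max, tv_scale at 0, clamp_max at Will's 0.05 starting point,
# 2 cut batches; range_scale is an arbitrary value for the sketch.
clip_guidance_scale, tv_scale, range_scale = 5000, 0, 150
cutn_batches, clamp_max = 2, 0.05

grad = torch.zeros_like(img)
for _ in range(cutn_batches):             # accumulate CLIP gradient per batch
    loss = (clip_guidance_scale * fake_clip_loss(img)
            + tv_scale * tv_loss(img)
            + range_scale * range_loss(img))
    grad += torch.autograd.grad(loss, img)[0] / cutn_batches
grad = grad.clamp(-clamp_max, clamp_max)  # clamp_max keeps the nudge from going wild
print("guidance gradient ready:", tuple(grad.shape))
```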


Note: Typically, when starting from an image, Pixelheads use the VQGAN Generator and adjust Init Weight. More on how to transform images in our first Style Guide.

Init Scale

Init Scale enhances the effect of the initial (init) image. What do we mean by "enhance"? Think of seeing a blossoming flower once per day over a week: there is a certain point in time where you think 'wow, that is beautiful...' and then it begins to degrade over time. But even a bit of decay is fascinating! It's all a bit subjective, so see @ch's examples below and experiment yourself:

[@ch's Init Scale examples]



Less Important Things to Know: VQGAN Models

OK, so you want to know about image processing models? Are you sure? It's not too late to turn back... Without going into the evolution of graphics processing units (GPUs) around 2012, we'll just outline the types of models and their core features. This could help you tune your work and use a specific type of model based on its strengths... maybe... this is all so new, no one really knows what's going on!

Imagenet16K

ImageNet is the OG dataset of image recognition. Humans annotated 14 million images, and since its creation in 2009, ImageNet's exploration of object categorization has been fundamental to pioneering 'machine vision.' Learn more here: https://www.image-net.org/index.php.

COCO

Have you ever had to prove to a robot that you are a human using a CAPTCHA? Well, that's like what COCO is doing: it's looking for large objects to identify. COCO stands for Common Objects in Context and is designed to be a large-scale object detection, segmentation, and captioning dataset. Here's an example of the type of data COCO is trained to recognize:

[Dataset Example]

Read more about the specific features of COCO and/or download the white paper here: https://cocodataset.org/#home

WikiArt

The WikiArt model trains VQGAN on a dataset derived from WikiArt[dot]org. If you want to learn more, below is a link to a post by a professor at Tufts University using WikiArt to create generative content; it includes a handy component diagram of how these models (regardless of source material) work. Source: https://towardsdatascience.com/ganshare-creating-and-curating-art-with-ai-for-fun-and-profit-1b3b4dcd7376


Less Important Parameters to Know

Init Noise (Currently Only for VQGAN)

Perhaps the best way to describe Init_noise is 'turbulence intensity.' The image will be disrupted and added texture will appear. The options in Imaginarium are: None, Pixels, and Gradient. "Selecting 'Pixels' gives the AI a scaffolding to start from instead of an empty pit. 'Gradient' allows the AI to discover larger structures. It is soon to be deterministic, so if you find a gradient you like, you'll be able to always get it using the seed variable." - @BoneAmputee
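A speculative sketch of the difference between the two flavors of turbulence: independent per-pixel values versus larger smooth ramps. How Imaginarium actually builds its gradients isn't documented, so treat this as illustration only.

```python
# 'Pixels' scatters independent random values; 'Gradient' lays down
# larger smooth ramps the AI can read as structure. Both are guesses
# at the idea, not Imaginarium's implementation.
import numpy as np

rng = np.random.default_rng(3)

def pixel_noise(h, w):
    return rng.random((h, w, 3))                   # per-pixel scaffolding

def gradient_noise(h, w):
    start, stop = rng.random(3), rng.random(3)     # one smooth ramp per channel
    ramp = np.linspace(0, 1, w)[None, :, None]     # left-to-right blend
    return np.tile(start + ramp * (stop - start), (h, 1, 1))

print("pixels std:  ", round(float(pixel_noise(64, 64).std()), 3))
print("gradient std:", round(float(gradient_noise(64, 64).std()), 3))
```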

Cut Power

Cutpow skews the square pieces that Pixelcat sees to be smaller on average, which changes the composition of the image. Higher cut power generally creates a less focused or "soft" image.
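One common way generators implement this (and our assumption here) is to draw each piece's size as a random fraction raised to the cut power, which skews sizes smaller as the power rises:

```python
# Sketch: cut sizes drawn as rand()**cut_pow. The exact sampling rule
# inside Imaginarium is an assumption.
import numpy as np

rng = np.random.default_rng(4)

def cut_sizes(cut_pow, max_size=256, min_size=32, n=1000):
    frac = rng.random(n) ** cut_pow          # higher power -> smaller fractions
    return min_size + frac * (max_size - min_size)

for p in (0.5, 1.0, 2.0):
    print(f"cut_pow {p}: average piece {cut_sizes(p).mean():.0f}px")
# Averages shrink as cut_pow rises: smaller pieces, softer composition.
```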

Prompt Optimizer

Optimizers... um, well - they optimize. They are used in neural networks to reduce errors (loss function) and maximize efficiency. That's the short answer. If you want to understand each one, you can start with the default, Adam (unfortunately, not named after Mr. B. Levine): https://arxiv.org/abs/1412.6980v9.
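If you want to feel what "optimizing" means in one screenful, here is a tiny loop where Adam nudges a stand-in latent toward a target; swapping in a different torch optimizer is a one-line change.

```python
# Adam reducing a simple error. The tensor here is a stand-in, not
# Imaginarium's actual latent grid.
import torch

z = torch.randn(16, requires_grad=True)
target = torch.zeros(16)

opt = torch.optim.Adam([z], lr=0.05)      # try torch.optim.AdamW, Adagrad, etc.
for _ in range(200):
    loss = (z - target).square().mean()   # the "error" the optimizer reduces
    opt.zero_grad(); loss.backward(); opt.step()
print("loss after optimizing:", round(loss.item(), 5))
```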



Artwork by Wxll

