FEB 2022 | STYLE GUIDE ONE
PIXELZINE
ISSUE 8
OUR FIRST STYLE GUIDE: A COLLABORATION BY CH AND GLOU

02 Editor's Note
03 Primer on Pixelmind Tools
05 Beginner Imagine Tips on Discord
08 Beginner Diffusion Tips on Discord
10 Understanding GPT-J Technology by Pzoro

Zine edited by Bot Ross Blue
Cover and table of contents artwork by BRB, a series inspired by "Hive of Creation, Buzzing with Artificial Intelligence"

Pixelzine | 1
Artwork by Wxll
EDITOR'S NOTE

Dear Pixelmind community,

Welcome to the first issue of our Style Guide! What follows is the result of much community collaboration - particularly led by members glou and ch. We tip our hats to them for their generative art deep dives you will see so well documented in these pages - and in issues to come!

This style guide is dedicated to building a foundation of what tools are available, the basics of using them, and some background on their underlying technology. The second issue is already underway with more focus on the nitty-gritty of applying various parameters, particularly when it comes to Diffusion and beta.pixelmind.ai - expect that very soon!

In these pages you will find a walkthrough of every tool at your disposal by Robert C Stevens II, and then collectively sourced Imagine and Diffusion tips, stylistic shorthands and more. Plus, we have a great overview of GPT-J technology by Pzoro - the AI that (in part) powers this thing!

Happy reading (and creating!),
Bot Ross Blue

Pixelzine | 2
PRIMER ON PIXELMIND'S TOOLS
Words by Robert C Stevens II
This is a brief primer on the tools available for you to use in your journey of art creation. It will help familiarize you with the basics to get started.
Tools on Discord:
On the main Creation Cat channel and other creation channels there are two tools:
.imagine
.diffusion
To use either of these tools, go to any channel that allows creation (currently Creation-Cat-Members, Art-Quest Creations, and WAGMI, all available only to holders). More later on how to use these tools.

Caption-Cat Channel:
Caption Cat can provide a description of any image that you upload to it. It will also provide a fitting emoji and a couple of tag words for the image. It may be able to create a title for the image in the future.

Doorways to Anywhere Series
Pixelzine | 3
Art by Adam B. Levine in Pixelmind Beta
Skull Series
Lyrical Postcard Series
Tools on beta.pixelmind.ai:

Create (Collections)
Create (Collections) is unique among the other tools. It is a curated experience where you will be able to co-create with the best talent from the Pixelmind community, and with already well-established, acclaimed artists. You'll also be able to mint some of your creations in a digital collectible NFT format in the near future. Put simply, it is a finely tuned experience that will produce similar results each time, with a unique style based on what you type in the prompt.

Collections is ideal for beginners, as it requires no technical knowledge of the tool. Be aware that some expectations await you here. The curator artist will usually ask you to complete a sentence (for instance, in the Skull series: "This skull is made of ..."). Alternatively, they will give a description of what your prompt should include (in the Lyrical Postcard series, you are asked to provide lines from a song or poem).

Imaginarium
Imaginarium allows you to simply type in any prompt in plain English and watch your creation come to life. Its settings have been tuned for general-purpose creation. You may also select "Advanced Mode", which allows you to use a starting image and lets you tinker with many of the settings. Adjusting these settings is secondary to coming up with a good descriptive prompt, but there are guides available for how to adjust them if you are so inclined (more on these advanced settings in our next issue).

Notes on the Website Experience
Images created on the website will typically be higher resolution and more evocative of your prompt.

Final Word
A balanced use of the Website and Discord Server will have you well on your way to greatness. Good luck and WAGMI!
Pixelzine | 4
Robert C Stevens II

Editor's note: Before our next edition with more info on creating in beta.pixelmind.ai, check out this video tutorial: https://www.loom.com/share/c71f5aa987c14afa9c55041c01ed15c2
IMAGINE TIPS ON DISCORD
Collaboration by ch, Bot Ross Blue and glou
Source 1
.imagine is an AI that looks for places in an image where it can get its hooks into and tug into the ideal configuration to meet the prompt. .imagine is easier to use, but results aren't usually as good as .diffusion. Repeating the same prompt usually gets you similar output.

Basic Prompt for Imagine
.imagine TEXT DESCRIPTION, STYLE
e.g. .imagine landscape, watercolor style

Change Image Prompt
On Discord, you upload one image you would like to change with a prompt:
e.g. .imagine landscape, trending on artstation --init_weight 7

Initial Image
Result
Pixelzine | 5
Square Image Prompt
Below is a shorthand for working with a square image, typically a pfp:
.imagine TEXT DESCRIPTION --init_weight 6 --learn_rate 0.09 --steps 75 --w 800 --h 800
Below is how the prompt was modified to create the pixelmix result:
.imagine Lavender painted in Impressionist style --init_weight 6 --learn_rate 0.09 --steps 75 --w 800 --h 800
If you wanted to make this image appear more colorful or distinct, you could add in style words like "multicolor," "bold lines," "dramatic," etc. A popular way to modify a pfp is to use the frame "made of [object]," such as smoke, sunset, bubblegum - you name it!
Beginner Imagine Commands
Here are a few more input commands you can play around with:
--init_weight # how tightly it'll stick to the original uploaded image, subject and composition. With .imagine, you typically want to stick within a range of 6-12 (default: 1). As a test, see how the same prompt above, with a changed init weight, affects the outcome. This init weight was brought to 15 (with every other value the same), and you can see how the result stayed closer to the original, including in the background color and the distinct shapes of the crown.
Init weight of 15
The next image is with an init weight of 1. You can tell by how the subject in the center is starting to blur into the background, and the stronger lines are replaced by more 'impressionistic' brushstrokes. Though both examples are very similar, they show that changing the init weight determines how much of the original image is expressed.
Pixelzine | 6
Init weight of 1
--steps # number of iterations to work (default: 250). *In Imaginarium it's called "iteration value." In the last example, you can see the step number was much lower (75), as we wanted to do fewer iterations on the initial image. Our earlier example, .imagine landscape, watercolor style, depicted on the left, used the default step size of 250. We used 400 steps to show you the subtle way more steps can affect the result, including adding more detailed elements and dimensions. Careful though: more steps doesn't always mean better results!
Steps 250

--learn_rate # Creation Cat will interpret --lr, --learn_rate and --step_size to all mean the same thing. It's usually a number with a decimal. If you want the robot to work faster (in fewer steps) and more sloppily, you increase that number. 0.01 is more controlled, as if you were to focus on a task harder and be slower at it as a result (default: 0.1). You can see above in the square pfp image example, we used a learn rate of 0.09. More on learn rate in the next issue - for now, default settings are good.

--width # output image width in pixels (default: 512)
Steps 400
--height # output image height in pixels (default: 512)

BONUS! CLIP VQGAN: Keyword Comparison
Visit this helpful resource to view how certain keywords determine outcomes in .imagine: https://imgur.com/a/SnSIQRu

Pixelzine | 7
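To tie the settings above together, here is a toy Python sketch of the kind of CLIP-guided optimization loop that .imagine (CLIP + VQGAN) is built on. This is not Pixelmind's actual code and every name in it is illustrative; it only shows how the knobs interact: the image is nudged toward the prompt for --steps iterations, at a speed set by --learn_rate, while --init_weight keeps pulling it back toward the starting image.

import numpy as np

steps = 75          # --steps: how many iterations to run
learn_rate = 0.09   # --learn_rate: how big each update is
init_weight = 6.0   # --init_weight: how strongly to stay near the uploaded image

init_image = np.random.rand(8, 8)      # stand-in for the uploaded image
prompt_target = np.random.rand(8, 8)   # stand-in for "what CLIP wants to see"
image = init_image.copy()

for step in range(steps):
    # Gradient of: ||image - prompt_target||^2 + init_weight * ||image - init_image||^2
    grad = 2 * (image - prompt_target) + 2 * init_weight * (image - init_image)
    image -= learn_rate * grad          # a higher learn rate is faster but sloppier

# With a large init_weight the result stays close to init_image;
# with a small one it drifts toward the prompt target.
print(np.abs(image - init_image).mean(), np.abs(image - prompt_target).mean())

Run it with different init_weight values and watch the two distances trade off - the same push and pull you saw in the crown examples above.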
DIFFUSION TIPS ON DISCORD
Collaboration by ch, Bot Ross Blue and glou
Source 1
.diffusion is high fidelity and achieves its aims by basically starting from a blur and then clarifying, clarifying, clarifying. Repeating the same prompt usually gets you a different output.
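As a rough illustration of that "start from a blur, then clarify" idea, here is a toy Python sketch. It is not Pixelmind's code nor a real diffusion model: the "denoiser" here is just a pull toward a stand-in target, where a real model would use a learned neural network guided by your prompt. It does show why repeating the same prompt gives a different output each time: fresh randomness is injected at every step.

import numpy as np

rng = np.random.default_rng()
target = rng.random((8, 8))        # stand-in for "the image the prompt describes"
image = rng.normal(size=(8, 8))    # start from pure noise (the initial "blur")

for t in range(50):
    denoised_guess = target                      # a real model predicts this from the prompt
    image = 0.9 * image + 0.1 * denoised_guess   # clarify a little each step
    image += 0.05 * rng.normal(size=(8, 8))      # re-inject a bit of randomness

print(np.abs(image - target).mean())   # ends up near the target, but differs every run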
Basic Prompt for Diffusion
.diffusion OBJECT/TEXT DESCRIPTION, STYLE
e.g. .diffusion landscape, trending on artstation

More Complex Prompt
If you want a main object and then secondary objects, you can test something out like this, a hack originated by @Architect. Repeating the main object will make the AI take notice:
.diffusion MAIN OBJECT, SECONDARY OBJECT | MAIN OBJECT | STYLE
e.g. .diffusion rabbit, landscape | rabbit | trending on artstation --w 800 --h 600
Quick Tip! Using trending on artstation as a style prompt is a quick way to tell the bot to produce images that are colorful, stylized, and rendered like digital art. More aesthetic shorthands listed on the next page.
Pixelzine | 8
Testing Styles with Diffusion
Words and artwork by ch
I have tested all the styles I could remember and ran .diffusion landscape, {style} on each of them; here is the result: https://ch.gallery/pixelmind/styles. It's a great way to visually find a style you want, and this could always be updated with suggestions from the community. Artist styles by @remi_durant: https://remidurant.com/artists/#
Stained Glass
Ukiyo-e
Pixelzine | 9
Flat Shading
Willy Art
Charcoal Drawing
8K Resolution
UNDERSTANDING THE MAGIC OF GPT-J
Words by Pzoro, Artwork by Wxll

GPT-J-6B is an open source, autoregressive language model created by EleutherAI, a collective of researchers working to open source AI research.

That was quite a handful to read. Let's unpack that statement a bit by understanding what a language model is - which will lead us to understanding what GPT-J is.

What is a language model?
We all have used smart keyboards on our phones. As we type sentences, the keyboard offers us a few choices of words it thinks we might want to type next. Given a bit more thought, it seems strange that a phone can even predict what we want to say. So how does it know which words to offer us? The answer is language models.

Essentially, language models are models that try to learn what "language" looks like by reading a lot of words. 📖

By reading A LOT of words, the model can learn a probability distribution over words and word sequences. Overall, this helps the model predict what words or phrases would make sense next to each other, or produce a valid sequence after a given sequence of words. Validity here refers not to grammar, but to how often someone would use this combination of words in speech or text, based on the data the model has consumed.

For example, if you type "How", the model can suggest "are", "is", "to", etc., simply because these words often occur after the word "How" in the English language. Now that we understand this simple scenario, what else can language models do?
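If you want to see that "probability distribution over words" idea in miniature, here is a tiny Python sketch. This is not how GPT-J actually works - GPT-J uses a neural network, not a lookup table of counts - but it shows how reading text lets a model estimate which words tend to follow which. The toy corpus is made up for illustration.

from collections import Counter, defaultdict

corpus = "how are you . how is it going . how to draw a skull".split()

# Count how often each word follows each other word (bigram counts).
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def next_word_distribution(word):
    counts = following[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

print(next_word_distribution("how"))
# {'are': 0.33..., 'is': 0.33..., 'to': 0.33...} - "are", "is" and "to" all follow "how"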
What can language models do?
These models are not only trained to learn the nuance of language; they can also be trained to perform many different tasks. We can train models to translate languages, summarize large texts, or even answer questions.
Pixelzine | 10
After all, the answers to a question can just be thought of as predicting what words should come after the words in the question. We can even train the model to code, since a programming language is also a "language". Exciting, isn't it?

So can the model in our smart keyboards do all these tasks, since it is a language model? Not quite. The models that can perform these tasks are much bigger and consume much more data and compute to learn. GPT-J-6B is one of those large models.

What is GPT-J-6B?
"GPT" is short for generative pre-trained transformer. Let's break that down. 🔨

"Generative" means that the model was trained to predict or "generate" the next token (word) in a sequence of tokens. "Pre-trained" refers to the fact that a trained model can be considered entirely trained for any language task and does not need to be re-trained for specific tasks individually (with some caveats). Essentially, the model has understood language well enough to perform many tasks automatically. That said, fine-tuning the model for specific tasks can help boost performance and accuracy.

"Transformer" refers to the popular model architecture in deep learning. One way to think about model architecture is that it defines how the information learned by a model is organized inside it. Learn more about it here in a very well written and easy-to-understand article about transformers.

Let's continue. "J" distinguishes this model from other GPT variants and is likely due to the model being trained with the popular Python library "JAX", which helps programmers build models without manually writing out all the mathematical operations underneath.

"6B" represents the 6 billion trainable parameters. Parameters can be thought of as information-storing units in neural network models. When a model is given an input, it runs through a combination of these parameters to give us the result.
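If you are curious to poke at GPT-J-6B yourself, the checkpoint EleutherAI released is publicly available. Below is a minimal sketch using the Hugging Face transformers library - not an official Pixelmind example, and note the full model needs a lot of RAM or a large GPU to run.

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")

# Autoregressive generation: the model predicts one next token at a time.
inputs = tokenizer("This skull is made of", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=20, do_sample=True)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))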
Pixelzine | 11
Quality of a language model has been found to continue to improve as the number of parameters increases. More parameters mean the model can successfully consume and store more data.

For example, GPT-3 from OpenAI has 175 billion parameters (almost 30x larger than GPT-J-6B). The most recent state-of-the-art language model, Megatron-Turing by Microsoft and NVIDIA, has 530 billion parameters (almost a crazy 90x bigger 😲).

So what is GPT-J-6B trained on, and why does it matter?
As we have learnt, the model not only learns words individually but also learns combinations of words and phrases. In learning this, the model also learns the biases and behaviors of the original dataset, which can be very dangerous. Some extreme examples here. The dataset plays a crucial part in the model's learning.

For GPT-J-6B, the research group compiled an 825 gigabyte (GB) dataset called The Pile, curated from a set of datasets including arXiv, StackExchange, GitHub, Wikipedia, HackerNews, etc. The model was trained on 400 billion tokens from this dataset.

Now you might be thinking: "That's all good to know, but what does this have to do with images?" 🤔 You'll have to bear with me, as this is where my understanding still has its gaps, but let's try to reason about it.

Language models + images + Pixelmind?
Now, we established earlier that language models were good at learning what words/phrases appeared often around other words/phrases.

Now imagine if, instead of showing the model a whole bunch of text in training, we showed the model an image (or rather pixel data) along with some text that described that image.

Would the model be similarly able to learn what text often appears with what kind of pixels? Would the model be able to get a rough idea of what pixel values often occur next to each other in an image when the text description contains the word "corgi"?

You see where I am going with this. If we scale this kind of thinking, we can imagine that the model can learn the relationship between words and parts of an image, the location of objects in an image, the concept of foreground/background, styles, and even what "Trending on artstation" looks like. It will be able to start predicting pixels in the same way that it was able to predict words 🤯

There you have it, folks - that's my idea of how a Pixelmind-like tool could use language models for more than just language. Now, to be very clear, there are way more advanced methodologies for achieving image generation from text that cannot be captured in this simplistic example. Pixelmind leverages those methods for us in the tools we love, but I hope this text gives you a tiny glimpse into how Artificial Intelligence is making the magical world of generating wild art from our imaginations possible. 🧠

Understanding GPT-J 🧠

Pixelzine | 12