FEB 2022 | STYLE GUIDE ONE
PIXELZINE
ISSUE 8
OUR FIRST STYLE GUIDE: A COLLABORATION BY CH AND GLOU

02 Editor's Note
03 Primer on Pixelmind Tools
05 Beginner Imagine Tips on Discord
08 Beginner Diffusion Tips on Discord
10 Understanding GPT-J Technology by Pzoro

Zine edited by Bot Ross Blue
Cover and table of contents artwork by BRB, a series inspired by "Hive of Creation, Buzzing with Artificial Intelligence"

Pixelzine | 1
Artwork by Wxll
EDITOR'S NOTE

Dear Pixelmind community,

Welcome to the first issue of our Style Guide! What follows is the result of much community collaboration - particularly led by members glou and ch. We tip our hats to them for their generative art deep dives you will see so well documented in these pages - and in issues to come!

This style guide is dedicated to building a foundation of what tools are available, the basics of using them, and some background on their underlying technology. The second issue is already underway with more focus on the nitty-gritty of applying various parameters, particularly when it comes to Diffusion and beta.pixelmind.ai - expect that very soon!

In these pages you will find a walkthrough of every tool at your disposal by Robert C Stevens II, and then collectively sourced Imagine and Diffusion tips, stylistic shorthands and more. Plus, we have a great overview of GPT-J technology by Pzoro - the AI that (in part) powers this thing!

Happy reading (and creating!),
Bot Ross Blue

Pixelzine | 2
PRIMER ON PIXELMIND'S TOOLS
Words by Robert C Stevens II
This is a brief primer on the tools available for you to use in your journey of art creation. It will help familiarize you with the basics to get started.
Tools on Discord:
On the main Creation Cat channel and other creation channels there are two tools:
.imagine
.diffusion
To use either of these tools, go to any channel that allows creation (currently Creation-Cat-Members, Art-Quest Creations, and WAGMI, all available only to holders). More later on how to use these tools.

Caption-Cat Channel:
Caption Cat can provide a description of any image that you upload to it. It will also provide a fitting emoji and a couple of tag words for the image. It may be able to create a title for the image in the future.

Doorways to Anywhere Series
Pixelzine | 3
Art by Adam B. Levine in Pixelmind Beta
Skull Series
Lyrical Postcard Series
Tools on beta.pixelmind.ai:

Create (Collections)
Create (Collections) is unique among the other tools. It is a curated experience where you will be able to co-create with the best talent from the Pixelmind community, and with already well-established, acclaimed artists. You'll also be able to mint some of your creations in a digital collectible NFT format in the near future. Put simply, it is a finely tuned experience that will produce similar results each time, with a unique style based on what you type in the prompt.

Collections is ideal for beginners, as it requires no technical knowledge of the tool. Be aware that some expectations await you here. The curator artist will usually ask you to complete a sentence (for instance, in the Skull series: "This skull is made of ..."). Alternatively, they will give a description of what your prompt should include (in the Lyrical Postcard series, you are asked to provide lines from a song or poem).

Imaginarium
Imaginarium allows you to simply type in any prompt in plain English and watch your creation come to life. Its settings have been tuned for general-purpose creation. You may also select "Advanced Mode", which allows you to use a starting image and lets you tinker with many of the settings. Adjusting these settings is secondary to coming up with a good descriptive prompt, but there are guides available for how to adjust them if you are so inclined (more on these advanced settings in our next issue).

Notes on the Website Experience
Images created on the website will typically be higher resolution and more evocative of your prompt.

Final Word
A balanced use of the Website and Discord Server will have you well on your way to greatness. Good luck and WAGMI!
Pixelzine | 4
Robert C Stevens II

Editor's note: Before our next edition with more info on creating in beta.pixelmind.ai, check out this video tutorial: https://www.loom.com/share/c71f5aa987c14afa9c55041c01ed15c2
IMAGINE TIPS ON DISCORD
Collaboration by ch, Bot Ross Blue and glou
Source 1
.imagine is an AI that looks for places in an image where it can get its hooks into and tug into the ideal configuration to meet the prompt. .imagine is easier to use, but results aren't usually as good as .diffusion. Repeating the same prompt usually gets you similar output.

Basic Prompt for Imagine
.imagine TEXT DESCRIPTION, STYLE
e.g. .imagine landscape, watercolor style

Change Image Prompt
On Discord, you upload one image you would like to change with a prompt:
e.g. .imagine landscape, trending on artstation --init_weight 7

Initial Image
Result
Pixelzine | 5
Square Image Prompt
Below is a shorthand for working with a square image, typically a pfp:
.imagine TEXT DESCRIPTION --init_weight 6 --learn_rate 0.09 --steps 75 --w 800 --h 800
Below is how the prompt was modified to create the pixelmix result:
.imagine Lavender painted in Impressionist style --init_weight 6 --learn_rate 0.09 --steps 75 --w 800 --h 800
If you wanted to make this image appear more colorful or distinct, you could add in style words like "multicolor," "bold lines," "dramatic," etc. A popular way to modify a pfp is to use the frame "made of [object]," such as smoke, sunset, bubblegum - you name it!
Beginner Imagine Commands
Here are a few more input commands you can play around with:
--init_weight # how tightly it'll stick to the original uploaded image, subject and composition. With .imagine, you typically want to stick within a range of 6-12 (default: 1). As a test, see how the same prompt above, with a changed init weight, affects the outcome. This init weight was brought to 15 (with every other value the same), and you can see how the result stayed closer to the original, including in the background color and the distinct shapes of the crown.
Init weight of 15
The next image is with an init weight of 1. You can tell by how the subject in the center is starting to blur into the background, and the stronger lines are replaced by more 'impressionistic' brushstrokes. Though both examples are very similar, they show that changing the init weight determines how much of the original image is expressed.
Pixelzine | 6
Init weight of 1
--steps # number of iterations to work (default: 250). *In Imaginarium it's called "iteration value." In the last example, you can see the step number was much lower (75), as we wanted to do fewer iterations on the initial image. Our earlier example, .imagine landscape, watercolor style, depicted on the left, used the default step size of 250. We used 400 steps to show you the subtle way more steps can affect the result, including adding more detailed elements and dimensions. Careful though: more steps doesn't always mean better results!
Steps 250

--learn_rate # Creation Cat will interpret --lr, --learn_rate and --step_size to all mean the same thing. It's usually a number with a decimal. If you want the robot to work faster (in fewer steps) and more sloppily, you increase that number. 0.01 is more controlled, as if you were to focus on a task harder and be slower at it as a result (default: 0.1). You can see above in the square pfp image example, we used a learn rate of 0.09. More on learn rate in the next issue - for now, default settings are good.

--width # output image width in pixels (default: 512)
Steps 400
--height # output image height in pixels (default: 512)

BONUS! CLIP VQGAN: Keyword Comparison
Visit this helpful resource to view how certain keywords determine outcomes in .imagine: https://imgur.com/a/SnSIQRu

Pixelzine | 7
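To tie the settings above together, here is a toy Python sketch of the kind of CLIP-guided optimization loop that .imagine (CLIP + VQGAN) is built on. This is not Pixelmind's actual code and every name in it is illustrative; it only shows how the knobs interact: the image is nudged toward the prompt for --steps iterations, at a speed set by --learn_rate, while --init_weight keeps pulling it back toward the starting image.

import numpy as np

steps = 75          # --steps: how many iterations to run
learn_rate = 0.09   # --learn_rate: how big each update is
init_weight = 6.0   # --init_weight: how strongly to stay near the uploaded image

init_image = np.random.rand(8, 8)      # stand-in for the uploaded image
prompt_target = np.random.rand(8, 8)   # stand-in for "what CLIP wants to see"
image = init_image.copy()

for step in range(steps):
    # Gradient of: ||image - prompt_target||^2 + init_weight * ||image - init_image||^2
    grad = 2 * (image - prompt_target) + 2 * init_weight * (image - init_image)
    image -= learn_rate * grad          # a higher learn rate is faster but sloppier

# With a large init_weight the result stays close to init_image;
# with a small one it drifts toward the prompt target.
print(np.abs(image - init_image).mean(), np.abs(image - prompt_target).mean())

Run it with different init_weight values and watch the two distances trade off - the same push and pull you saw in the crown examples above.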
DIFFUSION TIPS ON DISCORD
Collaboration by ch, Bot Ross Blue and glou
Source 1
.diffusion is high fidelity and achieves its aims by basically starting from a blur and then clarifying, clarifying, clarifying. Repeating the same prompt usually gets you a different output.
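As a rough illustration of that "start from a blur, then clarify" idea, here is a toy Python sketch. It is not Pixelmind's code nor a real diffusion model: the "denoiser" here is just a pull toward a stand-in target, where a real model would use a learned neural network guided by your prompt. It does show why repeating the same prompt gives a different output each time: fresh randomness is injected at every step.

import numpy as np

rng = np.random.default_rng()
target = rng.random((8, 8))        # stand-in for "the image the prompt describes"
image = rng.normal(size=(8, 8))    # start from pure noise (the initial "blur")

for t in range(50):
    denoised_guess = target                      # a real model predicts this from the prompt
    image = 0.9 * image + 0.1 * denoised_guess   # clarify a little each step
    image += 0.05 * rng.normal(size=(8, 8))      # re-inject a bit of randomness

print(np.abs(image - target).mean())   # ends up near the target, but differs every run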
Basic Prompt for Diffusion
.diffusion OBJECT/TEXT DESCRIPTION, STYLE
e.g. .diffusion landscape, trending on artstation

More Complex Prompt
If you want a main object and then secondary objects, you can test something out like this, a hack originated by @Architect. Repeating the main object will make the AI take notice:
.diffusion MAIN OBJECT, SECONDARY OBJECT | MAIN OBJECT | STYLE
e.g. .diffusion rabbit, landscape | rabbit | trending on artstation --w 800 --h 600
Quick Tip! Using trending on artstation as a style prompt is a quick way to tell the bot to produce images that are colorful, stylized, and rendered like digital art. More aesthetic shorthands listed on the next page.
Pixelzine | 8
Testing Styles with Diffusion
Words and artwork by ch
I have tested all the styles I could remember and ran .diffusion landscape, {style} on each of them; here is the result: https://ch.gallery/pixelmind/styles. It's a great way to visually find a style you want, and this could always be updated with suggestions from the community. Artist styles by @remi_durant: https://remidurant.com/artists/#
Stained Glass
Ukiyo-e
Pixelzine | 9
Flat Shading
Willy Art
Charcoal Drawing
8K Resolution
UNDERSTANDING THE MAGIC OF GPT-J
Words by Pzoro, Artwork by Wxll

GPT-J-6B is an open source, autoregressive language model created by EleutherAI, a collective of researchers working to open source AI research.

That was quite a handful to read. Let's unpack that statement a bit by understanding what a language model is - which will lead us to understanding what GPT-J is.

What is a language model?
We all have used smart keyboards on our phones. As we type sentences, the keyboard offers us a few choices of words it thinks we might want to type next. Given a bit more thought, it seems strange that a phone can even predict what we want to say. So how does it know which words to offer us? The answer is language models.

Essentially, language models are models that try to learn what "language" looks like by reading a lot of words. 📖

By reading A LOT of words, the model can learn a probability distribution over words and word sequences. Overall, this helps the model predict what words or phrases would make sense next to each other, or produce a valid sequence after a given sequence of words. Validity here refers not to grammar, but to how often someone would use this combination of words in speech or text, based on the data the model has consumed.

For example, if you type "How", the model can suggest "are", "is", "to", etc., simply because these words often occur after the word "How" in the English language. Now that we understand this simple scenario, what else can language models do?
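If you want to see that "probability distribution over words" idea in miniature, here is a tiny Python sketch. This is not how GPT-J actually works - GPT-J uses a neural network, not a lookup table of counts - but it shows how reading text lets a model estimate which words tend to follow which. The toy corpus is made up for illustration.

from collections import Counter, defaultdict

corpus = "how are you . how is it going . how to draw a skull".split()

# Count how often each word follows each other word (bigram counts).
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def next_word_distribution(word):
    counts = following[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

print(next_word_distribution("how"))
# {'are': 0.33..., 'is': 0.33..., 'to': 0.33...} - "are", "is" and "to" all follow "how"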
What can language models do?
These models are not only trained to learn the nuance of language; they can also be trained to perform many different tasks. We can train models to translate languages, summarize large texts, or even answer questions.
Pixelzine | 10
After all, the answers to a question can just be thought of as predicting what words should come after the words in the question. We can even train the model to code, since a programming language is also a "language". Exciting, isn't it?

So can the model in our smart keyboards do all these tasks, since it is a language model? Not quite. The models that can perform these tasks are much bigger and consume much more data and compute to learn. GPT-J-6B is one of those large models.

What is GPT-J-6B?
"GPT" is short for generative pre-trained transformer. Let's break that down. 🔨

"Generative" means that the model was trained to predict or "generate" the next token (word) in a sequence of tokens. "Pre-trained" refers to the fact that a trained model can be considered entirely trained for any language task and does not need to be re-trained for specific tasks individually (with some caveats). Essentially, the model has understood language well enough to perform many tasks automatically. That said, fine-tuning the model for specific tasks can help boost performance and accuracy.

"Transformer" refers to the popular model architecture in deep learning. One way to think about model architecture is that it defines how the information learned by a model is organized inside it. Learn more about it here in a very well written and easy-to-understand article about transformers.

Let's continue. "J" distinguishes this model from other GPT variants and is likely due to the model being trained with the popular Python library "JAX", which helps programmers build models without manually writing out all the mathematical operations underneath.

"6B" represents the 6 billion trainable parameters. Parameters can be thought of as information-storing units in neural network models. When a model is given an input, it runs through a combination of these parameters to give us the result.
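If you are curious to poke at GPT-J-6B yourself, the checkpoint EleutherAI released is publicly available. Below is a minimal sketch using the Hugging Face transformers library - not an official Pixelmind example, and note the full model needs a lot of RAM or a large GPU to run.

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")

# Autoregressive generation: the model predicts one next token at a time.
inputs = tokenizer("This skull is made of", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=20, do_sample=True)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))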
Pixelzine | 11
Quality of a language model has been found to continue to improve as the number of parameters increases. More parameters mean the model can successfully consume and store more data.

For example, GPT-3 from OpenAI has 175 billion parameters (almost 30x larger than GPT-J-6B). The most recent state-of-the-art language model, Megatron-Turing by Microsoft and NVIDIA, has 530 billion parameters (almost a crazy 90x bigger 😲).

So what is GPT-J-6B trained on, and why does it matter?
As we have learnt, the model not only learns words individually but also learns combinations of words and phrases. In learning this, the model also learns the biases and behaviors of the original dataset, which can be very dangerous. Some extreme examples here. The dataset plays a crucial part in the model's learning.

For GPT-J-6B, the research group compiled an 825 gigabyte (GB) dataset called The Pile, curated from a set of datasets including arXiv, StackExchange, GitHub, Wikipedia, HackerNews, etc. The model was trained on 400 billion tokens from this dataset.

Now you might be thinking: "That's all good to know, but what does this have to do with images?" 🤔 You'll have to bear with me, as this is where my understanding still has its gaps, but let's try to reason about it.

Language models + images + Pixelmind?
Now, we established earlier that language models were good at learning what words/phrases appeared often around other words/phrases.

Now imagine if, instead of showing the model a whole bunch of text in training, we showed the model an image (or rather pixel data) along with some text that described that image.

Would the model be similarly able to learn what text often appears with what kind of pixels? Would the model be able to get a rough idea of what pixel values often occur next to each other in an image when the text description contains the word "corgi"?

You see where I am going with this. If we scale this kind of thinking, we can imagine that the model can learn the relationship between words and parts of an image, the location of objects in an image, the concept of foreground/background, styles, and even what "Trending on artstation" looks like. It will be able to start predicting pixels in the same way that it was able to predict words 🤯

There you have it, folks - that's my idea of how a Pixelmind-like tool could use language models for more than just language. Now, to be very clear, there are way more advanced methodologies for achieving image generation from text that cannot be captured in this simplistic example. Pixelmind leverages those methods for us in the tools we love, but I hope this text gives you a tiny glimpse into how Artificial Intelligence is making the magical world of generating wild art from our imaginations possible. 🧠

Understanding GPT-J 🧠

Pixelzine | 12