Article Six: My text-to-img Journey - Prompt Basics and Prompt Matrix

This is the sixth post in a series of articles I have been writing about text-to-image software. In this series, I will talk about how the technology works, ways you can use it, how to set up your own system to batch out images, technical advice on hardware and software, advice on how you can get the images you want, and touch on some of the legal, ethical, and cultural challenges that we are seeing around this technology. My goal is to keep it practical so that anyone with a little basic computer knowledge can understand and use these articles to add another tool to their art toolbox, as cheaply and practically as possible. In this sixth post, we will discuss some ways to prompt Automatic1111, and how to use the prompt matrix.

Interface Basics

We covered some of the Automatic1111 interface basics in the last article, but they are worth repeating here:
  1. Stable Diffusion Checkpoint: In the upper left of your window. Select the model you want to use. We, surprise, are using our 8Buff_GEN_FP16_v1.safetensor for this demo.
  2. The top text box is for positive prompts, the things you want in your image.
  3. The bottom text box is for negative prompts, the things we want to avoid in our image. In the screenshot, it is preloaded with the negative prompts we commonly use with the 8Buff_Gen model.
  4. Sampling method: Stable Diffusion builds an image by starting from random noise and removing that noise over a series of steps. The sampling method (sampler) is the algorithm that performs those denoising steps. It is separate from the variational autoencoder (VAE), the neural network that turns the finished result into the final pixels. You can select different sampling methods to refine your image; they differ in speed and in how the details develop. See here for more information.
  5. Sampling Steps: How many times the image is iteratively refined. Higher numbers take longer and usually improve the result, although the gains shrink as the step count climbs.
  6. Width & Height: These settings control the size of the output image. Most models are trained at a certain pixel size. The default for many models is 512x512, meaning the image is 512 pixels wide and 512 pixels high. Newer models default to 768x768. You can generate other sizes with most models, but you will get the most consistent and best images at the size the model was trained with, so we recommend 512x512 for initial testing. Expand the size after you have baselined the model.
  7. CFG Scale: Controls how closely the image generation process follows the text prompt. The higher the value, the more closely the image sticks to the prompt.
  8. Batch count: Controls the number of images generated each time. We usually set this to 4 so we can see whether our prompt needs more tweaking or whether we are getting a one-off image.
  9. Seed: The number used to initialize the random noise that your image starts from. The same seed with the same prompt and settings reproduces the same image, while -1 picks a new random seed each time.
  10. Image viewer is where the image is displayed during and after generation.
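
The settings above map fairly directly onto a request for the web UI's optional HTTP API. The sketch below only builds the request body and does not send it. It assumes the web UI was started with the --api flag and that the /sdapi/v1/txt2img endpoint uses the field names shown; check the /docs page of your own install before relying on them. build_txt2img_payload is a hypothetical helper name, and the values are taken from the PNG info at the end of this post.

```python
import json


def build_txt2img_payload(prompt, negative_prompt="", seed=-1):
    """Collect the interface settings into one dictionary (hypothetical helper)."""
    return {
        "prompt": prompt,                    # top text box
        "negative_prompt": negative_prompt,  # bottom text box
        "sampler_name": "Euler a",           # Sampling method
        "steps": 100,                        # Sampling Steps
        "width": 512,                        # Width
        "height": 512,                       # Height
        "cfg_scale": 8,                      # CFG Scale
        "n_iter": 4,                         # Batch count (assumed field name)
        "seed": seed,                        # -1 asks for a random seed
    }


payload = build_txt2img_payload("friends at a park,", negative_prompt="blurry, bad faces,")
print(json.dumps(payload, indent=2))
```

Keeping the settings in one place like this makes it easier to change a single value, such as the seed, while everything else stays fixed.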


Using Styles

A useful thing to know when you are using the Automatic1111 UI is that you can save and load prompts with the “Styles” functionality.

To do this, write a prompt as you usually would and then click the file disk icon on the right (under the generate button):

Then just choose a name for the style and save it.

Now you can call that prompt by selecting it in the styles box under the icons. If you want to paste the style into your active prompt, click the clipboard to the left of the file disk icon.

Learn the Checkpoint

The first thing we need to understand when we write a prompt for a specific Stable Diffusion checkpoint is how the checkpoint has been trained.

For example, the standard Stable Diffusion model has been trained to work with natural language prompts, while other checkpoints work better with keywords separated by commas.

When you download a custom checkpoint, always check if there are any hints or trigger words about how the model can be prompted in the description. Another quick way to understand how to correctly use the model is to check some example prompts made by the creator of the checkpoint or by the community.

We are going to use our 8buff_Gen model for these examples. We have trained and mixed this model to be prompt heavy, meaning the more detail in the prompt, the better the picture. The 8Buff_Gen model can be downloaded from Hugging Face (https://huggingface.co/EightBuff/8Buff_Gen) or Civitai (https://civitai.com/models/41928/8buffgen), where details on the model and prompts are provided.

Note: All positive prompts below are in italics. The full PNG info is stored in each image or you can find it at the end of this post.
 
Let's start with a very simple prompt.

Friends at a park,

We generated 4 images. Not bad, but not interesting either. Let's take the seed, 2613755730, from the first image and expand on it.
 

Copy the seed number from the PNG info under the picture and replace the -1 under Seed with 2613755730 so we can keep using that seed as our starting point. Now let's add to the prompt.
 
photo of friends at a park,


More friends, but they are walking away. Let's try to correct this.

photo of friends at a park, looking at camera,


 
Now they are looking at us, but the image is somewhat blurry and the eyes don't look quite right. Again, let's see if we can address this.

photo of friends at a park, looking at camera, perfect faces,


Better. The eyes need some work, but let's first try to get away from the monoculture.

photo of America friends at a park, looking at camera, perfect faces,


That didn't work as well as I hoped. Let's emphasize some of the keywords I want and adjust the prompt to be more clear. First let's cover emphasis and attention.

Attention/emphasis

You can add weight to the words in your prompt to increase or decrease their influence. Using () in the prompt increases the model's attention to the enclosed words, and [] decreases it. You can combine multiple modifiers:

Cheat sheet:
a (word) - increase attention to word by a factor of 1.1
a ((word)) - increase attention to word by a factor of 1.21 (= 1.1 * 1.1)
a [word] - decrease attention to word by a factor of 1.1
a (word:1.5) - increase attention to word by a factor of 1.5
a (word:0.25) - decrease attention to word by a factor of 4 (= 1 / 0.25)
a \(word\) - use literal () characters in prompt

With (), a weight can be specified like this: (text:1.4). If the weight is not specified, it is assumed to be 1.1. Specifying weight only works with () not with []. If you want to use any of the literal ()[] characters in the prompt, use the backslash to escape them: anime_\(character\).
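
The cheat sheet arithmetic can be checked with a few lines of Python. This is a simplified sketch, not the web UI's actual prompt parser: it assumes a single fragment wrapped in balanced () or [] and treats anything else, including escaped brackets, as weight 1.0. attention_weight is a hypothetical name used only for this illustration.

```python
import re

# Matches an explicit weight such as (word:1.5)
EXPLICIT = re.compile(r"\(([^()\[\]]+):([0-9]*\.?[0-9]+)\)")


def attention_weight(fragment):
    """Return the attention multiplier implied by one bracketed fragment."""
    text = fragment.strip()
    weight = 1.0
    while len(text) >= 2:
        explicit = EXPLICIT.fullmatch(text)
        if explicit:
            # (word:1.5) sets the factor directly
            return weight * float(explicit.group(2))
        if text.startswith("(") and text.endswith(")"):
            weight *= 1.1   # each () layer multiplies by 1.1
        elif text.startswith("[") and text.endswith("]"):
            weight /= 1.1   # each [] layer divides by 1.1
        else:
            break           # plain or escaped text: stop peeling
        text = text[1:-1]
    return weight


for sample in ["(word)", "((word))", "[word]", "(word:1.5)", "(word:0.25)", r"\(word\)"]:
    print(sample, round(attention_weight(sample), 4))
```

Stacking brackets compounds quickly, so an explicit weight such as (word:1.3) is usually easier to read than three or four nested layers.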

Going back to our example, let's change the word America to American and add some attention to it. We will also add keywords asking for different faces, clothes, and hair.

photo of (American) friends at a park, looking at camera, perfect faces, different faces, different clothes, different hair,
 

Better, but they don’t look like they know what to do with their faces. Let's make them happy.

photo of (American) friends at a park, looking at camera, happy, perfect faces, different faces, different clothes, different hair,

 

Good. An important thing to remember when prompting is to use words that can impact the overall quality of your picture. The images used to train a Stable Diffusion model are usually accompanied by adjectives that describe the image. These can be positive (happy, beautiful, detailed, masterpiece) or negative (sad, bad, awful, deformed). Using these in your prompts can drastically change the quality of your picture.

To show this, I am going to walk through how to use the prompt matrix, why prompt order matters, and how quality words affect the overall picture.

First, let's reorder the last positive prompt we used and move the prompt word happy to the end.

photo of (American) friends at a park, looking at camera, perfect faces, different faces, different clothes, different hair, happy,


 
Notice that moving just that one word to the end of the prompt string altered the image. Order matters, so for consistent images, change your prompt order as little as possible.

Now, let's see how the overall image is changed with different quality adjectives. Instead of generating these as separate images, we are going to use a script to create a prompt matrix. See image for reference on setting up a prompt matrix.

Below the “ControlNet” panel on the lower left, open the “Script” drop-down menu and choose “Prompt matrix”. Under “Select prompt”, make sure “positive” is selected. Under “Select joining char”, make sure “comma” is selected.

Under “Grid margins (px.)” select 6 to keep our grid clean.

To use the prompt matrix, add the options you want to test to the positive prompt, separated by pipes | with no spaces. Each option can be switched on or off independently, so four options give 2 x 2 x 2 x 2 = 16 combinations. Let's also move the batch count to 16 so we create enough images, and replace happy, with |happy|angry|excited|sad, so that the positive prompt looks like this.

photo of (American) friends at a park, looking at camera, perfect faces, different faces, different clothes, different hair, |happy|angry|excited|sad,

After all the images are generated, they are assembled into the prompt matrix grid.

As you can see, each prompt changes the image, and you can combine the prompts for much bigger changes to how the image looks.
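
To see where the 16 images come from, here is a minimal sketch of how a pipe-separated prompt can be expanded into every on/off combination of its options. It is an illustration written for this article, not the script's own code. expand_prompt_matrix is a hypothetical name, and the sketch assumes the comma joining character selected above.

```python
from typing import List


def expand_prompt_matrix(prompt: str, joiner: str = ", ") -> List[str]:
    """Expand 'base|a|b' into every combination of the optional parts."""
    # The text before the first pipe is always kept; the rest are options.
    parts = [part.strip().strip(",").strip() for part in prompt.split("|")]
    base, options = parts[0], parts[1:]

    prompts = []
    # Each bit of the counter switches one option on or off.
    for mask in range(2 ** len(options)):
        chosen = [opt for i, opt in enumerate(options) if mask & (1 << i)]
        prompts.append(joiner.join([base] + chosen))
    return prompts


matrix = expand_prompt_matrix(
    "photo of (American) friends at a park, looking at camera, |happy|angry|excited|sad,"
)
print(len(matrix))
for line in matrix:
    print(line)
```

The first prompt in the list has none of the options and the last has all four, which is why the grid includes both a plain image and a combined happy, angry, excited, sad image.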

To summarize: learn your models and use the right one for the right image. Order, attention, emphasis, and adjectives in prompts can all change the way your images look, in subtle or huge ways. Don’t be afraid to test different things and run a model through some testing before committing to it on large projects. Most importantly, be creative with your prompts. You never know what images the computer may provide that inspire you in new and different ways.
 
Full first image PNG info:
friends at a park,
Negative prompt: (((worst quality, distorted face, censor, jpeg artifacts, signature, watermark, text, username))), blurry, bad faces, bad eyes, bad anatomy, bad hands, extra limbs, missing fingers, extra digit, fewer digits, cropped, same hair, same face, duplicate,
Steps: 100, Sampler: Euler a, CFG scale: 8, Seed: 2613755730, Size: 512x512, Model hash: 431208a773, Model: 8buff_GEN_FP16_v1
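
If you would rather not copy the seed by hand, the settings line above is regular enough to split with a short script. This is a rough sketch that assumes the "Key: value, Key: value" layout shown here and would break if a value itself contained a comma followed by a space. parse_generation_settings is a hypothetical name.

```python
def parse_generation_settings(settings_line):
    """Split 'Key: value, Key: value' text into a dictionary of strings."""
    settings = {}
    for item in settings_line.split(", "):
        key, _, value = item.partition(": ")
        settings[key.strip()] = value.strip()
    return settings


line = ("Steps: 100, Sampler: Euler a, CFG scale: 8, Seed: 2613755730, "
        "Size: 512x512, Model hash: 431208a773, Model: 8buff_GEN_FP16_v1")
settings = parse_generation_settings(line)

seed = int(settings["Seed"])
width, height = (int(n) for n in settings["Size"].split("x"))
print(seed, width, height)
```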





