
Foreword

Hello dear readers,
you may ask: what is this? The answer would be “a summary of my personal experiences playing with AI to generate images, so far”. I want to share my experiences, the discoveries I made, and the little tricks I came upon. My main goal is to get people new to the topic up to speed, and maybe some old veterans can also learn a thing or two.
That said, English is only my second language. If you see a mistake or a passage that just reads weirdly, please tell me and I will correct it. You are also invited to share your own experiences and so on.

The Content

Chapter 1 - What you can see

Let’s start with a simple but powerful one: if you get an idea for an image, just stop and think for a moment before you prompt. Why? Let’s say you want Fluttershy on a bed. Easy! pony, feral pony, my little pony, close up, fluttershy, indoors, bedroom, on bed, lying on side. Boom, done. And you get a wardrobe between the bed and the wall where there realistically isn’t enough space to fit one. Bummer. But why? The problem here is bedroom, believe it or not. While training a model, pictures of bedrooms are tagged as such and the AI learns the concept. What do most bedroom pictures have in them? A nightstand, wardrobes, maybe some pictures on the wall, etc. You get the idea. Nice of the AI to add all that automatically, but if we want a picture of just the bed with a pony on it, as in only part of a room, the AI will still try to add these things. We have two options to avoid this: we could leave bedroom out and add things like wooden wall, window with curtains, pictures to the prompt, OR we prompt out the stuff we don’t want as negatives: wardrobe, nightstand with a high enough weight.
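If you drive your model from Python instead of a UI, the second option looks like this. A minimal sketch with the Hugging Face diffusers library, assuming an SDXL/Pony-style checkpoint; the model id and file name are placeholders, not recommendations:

```python
import torch
from diffusers import StableDiffusionXLPipeline

# Placeholder model id -- substitute the Pony/SDXL checkpoint you actually use.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "your/pony-model", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    prompt=("pony, feral pony, my little pony, close up, fluttershy, "
            "indoors, on bed, lying on side, wooden wall, window with curtains"),
    negative_prompt="wardrobe, nightstand",  # prompt out the unwanted furniture
).images[0]
image.save("fluttershy_on_bed.png")
```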
This is one example, of course. Let’s say you want to prompt a character only visible from behind. Something like solo, pony, feral pony, bat pony, grey fur, spread bat wings, yellow eyes, blue mane. This will most likely result in a bat pony seen at an angle from behind, or a pony looking over their shoulder. This time, yellow eyes is the culprit. The AI will always try to put whatever you prompt into the picture. If you want the pony to look away from the viewer, you can add looking away with a high weight, but it would be way easier to just leave the eyes out.
And this is what I mean: only put things in the prompt that you actually want to be visible in the picture, and mind the “implications”. Anything like xy-room, for example, always carries implications. So stop for a moment and think about what should truly be visible in your scenario. And now the curve ball, and my favorite example: castle bedroom
If you have been paying attention, you know what could happen here. We get a luxurious room with stone walls, a big window, and outside of it? A castle! Go figure. The AI will generate what you give it, and castle was definitely in there. luxurious bedroom would be a good alternative to use here, for example. A trick that sometimes seems to work, depending on the model, is using “-” or “_”, like castle_bedroom. It’s no cure-all, but it helps. Same principle, different example: let’s say you want Sweetie Belle without her cutie mark. You put cutie mark in the negative, but it still doesn’t work? The term blank flank is an option, but you probably guessed it: cutiemark, cutie_mark, cutie-mark. And to give the curve ball a curve ball of its own, different models used different tags in training. An “Applejack” or “Apple Jack” kind of scenario. Good luck finding that out.

Chapter 2 - The Power of Paint

In chapter 1 we talked about prompting, but sometimes a prompt can only get you so far. AI can’t pick up new concepts without being retrained, so we as end users have to make do until a new version is out that knows the concept. Like a character from a recent anime. Sure, Loras, as little DLCs, tide us over until PonyV8 - Electric Boogaloo comes out and can generate the whole universe at once, but using too many of them is like Skyrim modding: it gets unstable really fast. So normally you want as few Loras as possible. Inpainting is a really powerful tool that not many people seem to use, so here is a quick example to make my point:
The picture you see on the left? I turned something like that into the right one. You still don’t need any artistic skills, just a general idea of what you want. I used a simple “technique” called blobbing. And it is as dumb as it sounds. Just open Paint (or any other digital drawing application), set it to the resolution you want, and drop in simple blobs until they loosely resemble your idea. Then write the matching prompt and denoise your “sketch” at 0.75 to 0.9 (so it’s nearly gone). It will then work as guidance for the AI.
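In code, the blobbing pass is a plain img2img run where strength is the denoise value from above. Again a sketch with diffusers; the model id and file names are placeholders:

```python
import torch
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image

pipe = AutoPipelineForImage2Image.from_pretrained(
    "your/pony-model", torch_dtype=torch.float16
).to("cuda")

blobs = load_image("blob_sketch.png")  # the crude Paint sketch
image = pipe(
    prompt="solo, pony, feral pony, pegasus, fluttershy, lying on bed",
    image=blobs,
    strength=0.85,  # 0.75-0.9: the sketch is nearly repainted and only guides layout
).images[0]
image.save("from_blobs.png")
```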
I used the same technique here, except I drew into the generated image, then masked the area and let the AI redraw only that part. This was, at one point, a one-pony picture. (1 cookie if you can tell which one was the original.)
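The masked redraw works the same way, just with an inpainting pipeline that only regenerates the white area of the mask. A sketch under the same assumptions; model id and paths are placeholders:

```python
import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image

pipe = AutoPipelineForInpainting.from_pretrained(
    "your/pony-inpaint-model", torch_dtype=torch.float16
).to("cuda")

image = load_image("one_pony.png")         # the finished generation
mask = load_image("second_pony_mask.png")  # white where the new pony was blobbed in

result = pipe(
    prompt="two ponies, feral pony, lying on bed",
    image=image,
    mask_image=mask,  # only this region gets redrawn
    strength=0.8,
).images[0]
result.save("two_ponies.png")
```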

Interlude

Something you need to be aware of in general is prompt “bleed”. Let’s say you want a picture of Twilight Sparkle taking a sun bath. Most likely the result will be a dusk picture. The problem is Twilight. As mentioned in chapter 1, the AI takes what is there: it uses the twilight as a basis for a dusk setting, like in the castle example. It’s a general issue most models have. Fern, from the popular anime, is a good second example: if I generate her, I also get a lot of ferns, the plant. (Frieren and Fern are actually German words and mean “freezing” and “distance” respectively. The Japanese love to use German words in their fantasy writing.)
This can be countered by adding the time of day we don’t want to the negative prompt, or fern plant in the other case. Just another thing to look out for in general.

Chapter 3 - Fantasy is Dead

So far, I have talked about having a general idea, or a picture in mind. What if you don’t? What if you just have the basic need of “need cute pone pictures”? This chapter is for you if you don’t get why grown men get excited over some sticks on the internet. Jokes aside, sometimes we just have a vague inkling or are simply bored. That’s what the cfg scale is for. Most models give users a range like 3-7, but what does it actually do? It’s basically a scale for how “obedient” the AI is to your prompt. The higher the scale, the less it does on its own. Think of the bleed thing I talked about, just on purpose. A low value also means parts of your prompt are more likely to be ignored. I bet you have generated a pony before, said yellow mane, blue eyes, and got the exact opposite. That could be a too low cfg. I personally consider 3 the lowest, regardless of model. The highest I ever used was 11, back in the SD1.5 days. For SDXL (that includes Pony, Illustrious, Noob, etc.) it is around 7 to 8. While it can go up to 20, you will never be happy with those results, trust me bro.
So now that we have established that we are unimaginative bellends, can we still make pictures? Yes, that’s what AI is for, after all. You want to go with a very simple prompt for this, like solo, pony, feral pony, unicorn, twilight sparkle, lying in tall grass, and a cfg of 3 (don’t forget the usual quality words). And hit run. That’s it. This is the monkey-with-a-typewriter approach to generation: generate a random number of pictures, and eventually one will be Hamlet.
(two example images)
For the second one, I left the grass part out. But as you can see, we still get pictures with absolutely minimal input and let the computer do most of the work. (Or even more than it already does.)
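Scripted, the typewriter approach is just a loop over seeds with a low guidance_scale (the cfg). A sketch with diffusers; the model id is a placeholder, and the quality words depend on your model:

```python
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "your/pony-model", torch_dtype=torch.float16
).to("cuda")

prompt = ("score_9, score_8_up, score_7_up, solo, pony, feral pony, "
          "unicorn, twilight sparkle, lying in tall grass")

for seed in range(20):  # generate a pile, keep the Hamlets
    generator = torch.Generator("cuda").manual_seed(seed)
    image = pipe(prompt=prompt, guidance_scale=3.0, generator=generator).images[0]
    image.save(f"lucky_dip_{seed:02d}.png")
```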
I personally only use this approach if I am absolutely out of ideas. Sometimes I strike gold, sometimes something in the results gives me an idea and I prompt with more purpose (cfg 4-6). But this is what most AI haters think making “good AI pictures” amounts to. The downside: they are somewhat right. These pictures will most likely be bland and simple. But what if we could spice this up? What if… Lora? And here comes the fun part: throwing Loras for special effects at this often gives very interesting results.

Chapter 4 - Fusion

Let’s go from simple to complicated. Or to be more precise, detailed. Detailed backgrounds are difficult in the sense that the AI has no concept of depth, space, or consistency. That’s why you get two beds in a bedroom, or five hundred lamps, and so on. The AI doesn’t remember what it put in; it just guesses what should be there. And its biggest enemy? Characters. They disrupt the generation: it starts over between the legs, or the right side looks different from the left because the character splits the picture in half. That’s why most backgrounds “suck” in generated images. But there is a way around it, and all you need is a free image editing tool (Gimp, Krita, Photopea, etc.) and an extra 10-20 minutes of your time.
And now hold onto your horses, because the trick is: we generate the background and the character separately. I know, mind blown, take a minute to come back to reality, I will wait. But jokes aside, it’s not as hard as it sounds. We need 3 prompts for this little stunt: one for the character, one for the background, and one that is the combination of the two. Then we get to generating. For the character, just look for a fitting angle and make sure it has the pose you want. We ignore the background here. (Also lighting and so on.)
Once we have what we want, we generate the background. Prompts like no humans, scenery, wide shot are our friends here. This is where you set the mood and tone: night time, day time, stuff like that. The AI is good at generating just a background, since there is no interruption by unshapely creatures.
Now comes the human part, aka you. Once we have both pictures we want to “marry”, we open the character in our editing tool of choice and use the lasso tool. Just cut her out like Sunset did Twilight in the Equestria Girls movie. It doesn’t need to be pixel perfect or anything. Then open the background and slap that bad boy in. Play a little with size, angle, lighting and so on if you want to (and know how), then save the image and your part is done.
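If you prefer to script the marriage step, Pillow can do the paste. This assumes you have already cut the character out and saved it with a transparent background; file names, size, and position are placeholders to eyeball:

```python
from PIL import Image

background = Image.open("background.png").convert("RGBA")
character = Image.open("character_cutout.png").convert("RGBA")

character = character.resize((512, 640))            # eyeball the size
background.paste(character, (350, 280), character)  # third argument = alpha mask
background.convert("RGB").save("combined_rough.png")
```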
Remember chapter 2? Well, we do that now, just with a really low denoise, around 0.3 to 0.4 this time, and our combined third prompt. Inpainting will clean up our sloppy mess and make everything fit together. And if it doesn’t on the first try, do it again, at 0.2 to 0.3 this time. And then we have a picture with a detailed background (that makes sense) and a character or two in it. Or, TLDR: photobashing is still a thing.
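As a sketch, the cleanup can be a low-strength img2img pass over the whole composite with the combined prompt (you can also mask just the seams and inpaint, as in chapter 2); model id and prompt here are placeholders:

```python
import torch
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image

pipe = AutoPipelineForImage2Image.from_pretrained(
    "your/pony-model", torch_dtype=torch.float16
).to("cuda")

rough = load_image("combined_rough.png")
clean = pipe(
    prompt="solo, pony, feral pony, unicorn, indoors, luxurious bedroom, cinematic lighting",
    image=rough,
    strength=0.35,  # 0.3-0.4: melt the seams, keep the composition
).images[0]
clean.save("combined_clean.png")
```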

Chapter 5 - Let it be Light

Lighting can have a huge impact on any given scene, regardless of whether it is film or a picture. This is also true for AI generated images. There are various forms of lighting, but I will keep it to the most used ones. I learned about this stuff when I started getting into AI, and it can help make a picture better. But what do I mean by “lighting”? Imagine you want to take a picture of your friends at an outing. If they are facing the sun, they are properly lit, but they will squint to shield their eyes. If they face away, chances are there isn’t enough light for a good picture, especially with a smartphone camera. So you stand them at an angle to the sun, so they don’t look directly into it but still get enough light for the picture. And now remember: we can control the sun in this case. And we do it with the right prompts:
natural lighting: uses sunlight or ambient light from the environment.
dramatic lighting: creates strong contrast between light and shadow.
cinematic lighting: mimics film lighting with controlled shadows and highlights.
soft lighting: diffused and even light with gentle shadows.
hard lighting: sharp, strong light that creates crisp shadows.
rim lighting: creates a border of light around edges or persons.
volumetric lighting: also known as light rays or god rays.
(Thanks to ChatGPT for the short descriptions of the different techniques.)
There are more, of course, but these are the most common ones. If you want an outdoor scene, go for natural. If you have an action scene, dramatic or cinematic, and so on. The right light makes a big difference, and the AI knows these terms.
You can go even further into detail. Let’s say we have a pony walking through a forest on a warm summer day. Our prompt could look like this: solo, pony, feral pony, full body, unicorn, rarity, walking towards viewer, outdoors, forest, tall grass, natural lighting, volumetric lighting, dappled sunlight, thick leaf canopy
(example image)

Chapter 6 - Designing a prompt

We have come far, learned some basic techniques, and even some “pro haxxor moves”. Now it’s time to talk about the absolute basics. There are people making money with “prompt design”, and I have never heard anything sillier. It’s not a science, just basic logic and a bit of knowledge about how AI works. Here is a basic graphic of how it works in our case. (Source: @Witty-Designer7316)
The first bit of info we need is the base model we are using. Let’s say our model is named “Perfect Pony XL”. The name gives it away, but the description on civitai also states that it is based on ponyv6. So it should take the same “quality words” as ponyv6. And after a quick check of the sample images: yes, it does. So now we can put our prompt together:
The most important rule: the more important something is, the closer to the beginning of the prompt it should be. That’s why quality words come first. So, depending on the model, our first words should look something like this:
score_9, score_8_up, score_7_up, masterpiece, best quality, amazing quality
The next thing is what we want. Let’s say we want Fluttershy with spread wings standing in a lake.
So something like: solo, pony, feral pony, full body, pegasus, Fluttershy, spread wings, partly submerged, wet fur
Now the background: outdoors, forest, lake, tall grass, flowers
And last, additional stuff like lighting: natural lighting, dappled sunlight, dusk
This gives us a final prompt:
score_9, score_8_up, score_7_up, masterpiece, best quality, amazing quality, solo, pony, feral pony, full body, pegasus, Fluttershy, spread wings, partly submerged, wet fur, outdoors, forest, lake, tall grass, flowers, natural lighting, dappled sunlight, dusk
But we are not done. Now we come to weights! They make it possible to mark parts of the prompt as more important than others, to control how much of what we want. The program I use takes “+” and “-” as weights, so for me it would look something like this:
score_9, score_8_up, score_7_up, masterpiece, best quality, amazing quality, solo, pony, feral pony, (full body)-, pegasus, Fluttershy, spread wings, partly submerged, (wet fur)+, outdoors, forest, lake, tall grass, (flowers)-, natural lighting, (dappled sunlight, dusk)++
Most other programs use a syntax like this:
score_9, score_8_up, score_7_up, masterpiece, best quality, amazing quality, solo, pony, feral pony, (full body:0.8), pegasus, Fluttershy, spread wings, partly submerged, (wet fur:1.2), outdoors, forest, lake, tall grass, (flowers:0.8), natural lighting, (dappled sunlight:1.5), (dusk:1.5)
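If you assemble prompts in code, a tiny helper keeps the weights readable. This is a hypothetical snippet of my own, not part of any tool; it just formats the common (term:1.2) syntax shown above:

```python
def build_prompt(parts):
    """parts: list of (tag, weight) tuples -> a comma-separated prompt string."""
    chunks = []
    for tag, weight in parts:
        # Weight 1.0 terms stay bare; everything else gets the (term:weight) form.
        chunks.append(tag if weight == 1.0 else f"({tag}:{weight})")
    return ", ".join(chunks)

prompt = build_prompt([
    ("score_9, score_8_up, score_7_up", 1.0),
    ("full body", 0.8),
    ("pegasus, Fluttershy, spread wings, partly submerged", 1.0),
    ("wet fur", 1.2),
    ("dappled sunlight", 1.5),
    ("dusk", 1.5),
])
print(prompt)
# score_9, score_8_up, score_7_up, (full body:0.8), pegasus, Fluttershy, ...
```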
BUT WAIT, THERE IS MORE!
This is of course only 50% of what we need. We still need to tell the AI what we don’t want. So first we check the base model page again and get the default negative quality words: score_4, score_3, score_2, score_1
Then the model page of our merge, Perfect Pony XL: ugly, worst quality, bad quality, jpeg artifacts
And normally that would be enough, but I will add a few personal favorites of mine today:
long body, realistic, monochrome, greyscale, artist name, signature, watermark
Giving us the final negative prompt:
score_4, score_3, score_2, score_1, ugly, worst quality, bad quality, jpeg artifacts, long body, realistic, monochrome, greyscale, artist name, signature, watermark
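Put together as a script, the whole design is one call with the positive and the negative prompt. A sketch as before; note that plain diffusers ignores (term:weight) markup, so this uses the unweighted version, and the model id stands in for your ponyv6-based merge:

```python
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "your/perfect-pony-xl", torch_dtype=torch.float16
).to("cuda")

positive = ("score_9, score_8_up, score_7_up, masterpiece, best quality, "
            "amazing quality, solo, pony, feral pony, full body, pegasus, "
            "Fluttershy, spread wings, partly submerged, wet fur, outdoors, "
            "forest, lake, tall grass, flowers, natural lighting, "
            "dappled sunlight, dusk")
negative = ("score_4, score_3, score_2, score_1, ugly, worst quality, "
            "bad quality, jpeg artifacts, long body, realistic, monochrome, "
            "greyscale, artist name, signature, watermark")

image = pipe(prompt=positive, negative_prompt=negative, guidance_scale=7.0).images[0]
image.save("fluttershy_lake.png")
```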
We could add weights here too, but let’s try it as-is first (I use tpony here, another ponyv6-based model):
(example image)
Okay, the general idea is there, but it’s not quite right. And now the hard truth: ponyv6 is a little outdated, as is the tpony model I used here. So I exchanged the quality words to fit a popular illustrious model and got this:
(example image)
Better, but still off. I want Fluttershy to look at the viewer, so we add looking at viewer and try again. Or better yet: you try. Happy prompting!

To be continued…
