Text-to-image prompting

Thoryn

Hello all, I’m hoping people can share their tips and tricks for text-to-image prompting, as I am hitting a few hurdles that seem too silly to be problems, yet they are.

My most basic issue at the moment is that I’ve been trying to make some Christmas artwork. The composition in my head is a full view of the tree, with packages and a pony at the bottom of it. Sounds easy and very doable.

The biggest issue I’m having is that no matter what I prompt, it always crops the tree and only shows the very bottom of it. In the rare instance it gives me a seed with a full tree, it’s pot-plant sized.

I have tried putting “christmas tree” at the start of the prompt, as well as “majestic christmas tree”, “full view of christmas tree”, “big christmas tree”, etc., and putting things like “zoomed in” and “cropped” in the negative prompt. I’ve also tried multiple aspect ratios (1:1, 16:9 and 9:16).

Is there something obvious with my prompting that I’m doing incorrectly? Do I really need to get a LoRA for such a common thing?

Model: ponyDiffusionV6XL_v6StartWithThisOne

Posted a day ago

tyto4tme4l

Something of an artist

At this point, I would just put the tree in manually and img2img/inpaint.

Posted a day ago

Thoryn

I remember trying those two features a year or two ago and not getting them to work. Guess I’ll just have to try again.

Another question though, when prompting do you guys use natural language describing what you envision, or comma,separated,keywords,like,this?
A mixture? Varies by model?
And to “save on tokens”, do you use more concise but less common words, or do you spend more tokens on more words in case the model doesn’t understand the less common ones?

Posted about 23 hours ago

Lord Waite

It does seem rather temperamental. I generally find it helps to mention things that the view has to widen to accommodate. I had some success with this prompt on pony v6:
score_8, score_8_up, score_7_up, score_6_up, score_5_up, score_4_up,
source_pony, rating_safe,
cute Pegasus Derpy sleeping under a tree, Christmas, presents, ornaments, star on top of tree, window, ceiling,
god rays,

“star on top of tree” and “ceiling” helped clue it in that it should be a further away shot.

(Generally, the format I use for pony v6 models is score tags, source, rating, a general description, then tags.)
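As a rough illustration of that ordering, here is a minimal Python sketch that assembles a prompt from its parts. The function name `build_pony_prompt` and its parameters are made up for this example and are not part of any UI or library; the tag values are taken from the prompts in this thread.

```python
def build_pony_prompt(score_tags, source, rating, description, tags):
    """Join prompt parts in the order: score tags, source, rating, description, tags."""
    lines = [
        ", ".join(score_tags) + ",",            # quality/score tags first
        f"{source}, {rating},",                 # then source and rating
        ", ".join([description, *tags]) + ",",  # then the description and extra tags
    ]
    return "\n".join(lines)


# Hypothetical usage with values from the thread
prompt = build_pony_prompt(
    score_tags=["score_9", "score_8_up", "score_7_up",
                "score_6_up", "score_5_up", "score_4_up"],
    source="source_pony",
    rating="rating_safe",
    description="cute Pegasus Derpy sleeping under a tree",
    tags=["Christmas", "presents", "ornaments", "star on top of tree",
          "window", "ceiling", "god rays"],
)
print(prompt)
```

Keeping the parts separate like this makes it easy to swap the description or drop a tag while leaving the score, source and rating prefix untouched.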

Posted about 16 hours ago

Thoryn

@Lord Waite
Still not seeing any success, no matter how much I emphasize the look or size of the room.
Is it possible to attach images here? Can only see URLs to enter, but I have nowhere to host them.

Posted about 9 hours ago

Lord Waite

@Thoryn
I don’t know of any way to actually attach images. I generally upload them to https://postimages.org/ and then use the urls from there.

Posted about 9 hours ago

Thoryn

@Lord Waite
Thanks for the tip.
I basically only get things like this, where the prompt acts like a toddler and doesn’t listen at all.

parameters

score_9,score_8_up,score_7_up,score_6_up,score_5_up,score_4_up,
Panoramic view of a spacious room with high ceiling,large tainted glass windows,Christmas decorations on walls,Flurry Heart_(Mlp),lying under Christmas tree,presents,<lora:Flurry Heart-Mlp-PonyXL:0.7>,pony,filly,cute,

Negative prompt: anthro,closeup,wip,sketch,blurry,disfigured,bad_hands,badly_drawn,bad_anatomy,watercolor,e621_p_low,thicc,thick,wide_hips,chubby,poofy,hyper,watermark,missing_tail,pillow,bed,couch,sofa,mattress,smiling,standing,sitting,scared,afraid,zoomed_in,cropped,signature,

Steps: 23, Sampler: DPM++ 2S a, Schedule type: Karras, CFG scale: 7, Seed: 582442736, Size: 512x512, Model hash: 67ab2fd8ec, Model: ponyDiffusionV6XL_v6StartWithThisOne, VAE hash: 95f26a5ab0, VAE: sdxl_vae.safetensors, Lora hashes: “Flurry Heart-Mlp-PonyXL: e75f8a2d04d3”, Version: v1.10.1

Posted about 8 hours ago

Lord Waite

No problem. And I can definitely see a couple potential issues.

First, 512x512 is not going to get good results with pony v6. It’s an XL based model, so generally speaking, we’re talking 1024x1024. Other good resolutions are 1152 x 896, 896 x 1152, 1216 x 832, 832 x 1216, 1344 x 768, 768 x 1344, 1536 x 640, & 640 x 1536. That’s basically what XL was trained on.
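To map a wanted aspect ratio (say 16:9) onto one of those sizes, something like the following Python sketch could be used. The helper name `closest_resolution` is hypothetical; the resolution list is simply the one given above, and each entry is roughly one megapixel.

```python
import math

# Resolutions listed above, as (width, height)
SDXL_RESOLUTIONS = [
    (1024, 1024), (1152, 896), (896, 1152), (1216, 832), (832, 1216),
    (1344, 768), (768, 1344), (1536, 640), (640, 1536),
]


def closest_resolution(aspect_w, aspect_h):
    """Return the listed (width, height) whose aspect ratio is nearest the request."""
    # Compare ratios on a log scale so wide and tall shapes are treated symmetrically
    target = math.log(aspect_w / aspect_h)
    return min(SDXL_RESOLUTIONS,
               key=lambda wh: abs(math.log(wh[0] / wh[1]) - target))


for ratio in [(1, 1), (16, 9), (9, 16)]:
    print(ratio, "->", closest_resolution(*ratio))
```

For the three ratios mentioned at the start of the thread, this picks 1024x1024, 1344x768 and 768x1344 respectively.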

That negative prompt has way too much in it. “e621_p_low” isn’t actually a tag v6 knows. That was for older versions of pony, and was a precursor to the score tags. Usually, I’d say to start out with a very minimal negative prompt, and add things as needed, and also remove them if they aren’t working. Negative prompts that aren’t needed can actually make an image worse.

You do usually want to have a source tag and a rating tag after the scores. source_pony, source_furry, source_anime, and source_cartoon are the big source tags, then rating_safe, rating_questionable, and rating_explicit.

Not sure the lora is needed, either. You could probably remove it and just say “Flurry Heart”. Also, the point of “star on top of tree” was to try and get it to put the star at the top of the tree in the picture, and by extension, the rest of the tree.

I’d personally try something more like:
score_9, score_8_up, score_7_up, score_6_up, score_5_up, score_4_up,
source_pony, rating_safe,
Flurry Heart lying under Christmas tree, ceiling, window, Christmas decorations on walls, presents, pony, filly, cute, star on top of tree,

with no negative prompt and no lora, and go from there. Definitely keep in mind that the longer a prompt is, the less anything in it actually is weighted. The tokenizer can handle 75 (well, 77, but the other two are used internally) tokens, then after that, the prompt gets broken into 75 token chunks, and that’s the point they start meaning less individually.
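A simplified Python sketch of that chunking idea, for illustration only: it splits a flat list of tokens into groups of at most 75. This is not the actual A1111 code (real UIs may pick chunk boundaries differently, for example near commas, and pad short chunks), and the tokens here are stand-ins, since real token counts come from the model's tokenizer rather than from counting words.

```python
CHUNK_SIZE = 75  # usable tokens per chunk, as described above


def split_into_chunks(tokens, chunk_size=CHUNK_SIZE):
    """Split a flat token list into consecutive chunks of at most chunk_size."""
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]


# Stand-in tokens; a real prompt would be tokenized by the model's tokenizer
fake_tokens = [f"tok{i}" for i in range(160)]
chunks = split_into_chunks(fake_tokens)
print([len(c) for c in chunks])  # [75, 75, 10]
```

A prompt that stays within one chunk avoids the splitting altogether, which is one more reason to trim tags that are not pulling their weight.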

(Looks like you’re using something A1111 based, so there might be a token count at the top of the prompt entry box?)

Posted about 7 hours ago
Edited about 7 hours ago because: Missed a number.

Thoryn

@Lord Waite
Thanks for the tips. I’ve used 512x512 because I have basically the bare minimum of capable hardware (at that res, with 75-token prompt chunks, it usually takes me almost 5 minutes for ~23-25 steps, and 10 minutes for 32-35), and you’re saying I need to quadruple the res, oof..
(I promised myself not to throw more money at expensive GPUs as I am broke and have stopped gaming, but a 5090 starts to look more appealing the more I mess around with AI).

You’re correct that I’m using Automatic1111 by the way.
(Have pondered alternatives, as the cmd window spews errors left and right even on a fresh and up-to-date version, but it’s the devil I know right now..)

I will avoid LoRAs, clean out the negative prompts (and add things only as needed), up the res to 1024x1024 and do some testing.
Thanks again for all the input, really appreciate it.

Posted about 6 hours ago
Edited about 6 hours ago

Lord Waite

@Thoryn
No problem.

512x512 was fine for 1.5-based models; it’s just that XL and newer models have moved beyond that. Pony v5 was 2.1 based, so 768x768, and v4 and earlier were 1.5 based, IIRC?

What I’ve got is a 3060 12GB, btw, which is probably about as low as you can go and still have 12GB. Though an 8GB card would work as well…

The lora does also add to the amount of memory used. While I don’t think the Flurry Heart one is needed, I will note that my first post was using the Wholesome MLP lora, which is a rather nice art style lora.

You could try changing the sampler to Uni_pc and the scheduler to normal, and lowering the steps to, say, 12-14, and see if that speeds things up a bit for you.

Posted about 6 hours ago

tyto4tme4l

Something of an artist

@Thoryn
If you have a weak GPU, then how about trying out Forge WebUI? It looks almost exactly like A1111, but it should be much faster, especially on a weak GPU. I don’t know about newer versions, but I’m using a release from 02.2024 and it’s working great. I have GeForce 3060Ti with 8GB VRAM and I can generate four 1024x1024 pictures in slightly above one minute. Pretty much no OOM, errors or crashes.
https://github.com/lllyasviel/stable-diffusion-webui-forge/releases

There are also other UIs like ReForge or ComfyUI, I’d recommend testing different options to see what suits you best. Stability Matrix is great for installing and maintaining multiple UIs.
https://github.com/LykosAI/StabilityMatrix

Posted about 6 hours ago

Thoryn

@Lord Waite
For the record, I’m using an RTX 3070 Ti with 8GB of video memory.

Good point on the fact that LoRAs also use some memory.. best to avoid if possible.

I don’t have a sampler named Uni_pc on my setup, so I kept it on DPM++ 2S a paired with Karras. When I ran a test with all the samplers (same prompt, seed, etc.), that was one of the fastest. Euler A was slightly faster, and it’s what I used when experimenting with Automatic a couple of times before, but lately I have seen more instances of DPM++ 2S a in the wild, so I figured I’d give it a try.

I copied your prompt exactly, and it actually gave a decent composition this time!
At 1024x1024 and 15 steps, it took 13 minutes… maybe this is passable for getting the composition, then I can use img2img to flesh things out? I should start experimenting with the pipeline soon, like only loading one or two LoRAs at a time (especially for kinks and concepts I know the model can’t do well or at all), figuring out grid-editing/prompting, inpainting, etc.

score_9,score_8_up,score_7_up,score_6_up,score_5_up,score_4_up,
source_pony,rating_safe,
Flurry Heart lying under Christmas tree,ceiling,window,Christmas decorations on walls,presents,pony,filly,cute,star on top of tree,

Steps: 15, Sampler: DPM++ 2S a, Schedule type: Karras, CFG scale: 6, Seed: 40318147, Size: 1024x1024, Model hash: 67ab2fd8ec, Model: ponyDiffusionV6XL_v6StartWithThisOne, VAE hash: 95f26a5ab0, VAE: sdxl_vae.safetensors, Version: v1.10.1
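For a rough sense of the slowdown, the timings quoted in this thread can be turned into seconds per step. This is back-of-the-envelope arithmetic only: it assumes the earlier “almost 5” figure means about 5 minutes for about 24 steps at 512x512, and the two runs also differed in prompt length and LoRA use, so it is not a clean comparison.

```python
def seconds_per_step(minutes, steps):
    """Average wall-clock seconds spent per sampling step."""
    return minutes * 60 / steps


small = seconds_per_step(5, 24)    # 512x512: ~5 minutes (assumed) for ~23-25 steps
large = seconds_per_step(13, 15)   # 1024x1024: 13 minutes for 15 steps
pixel_ratio = (1024 * 1024) / (512 * 512)

print(f"{small:.1f} s/step vs {large:.1f} s/step, "
      f"slowdown {large / small:.2f}x, pixels {pixel_ratio:.0f}x")
```

Under those assumptions the per-step cost grows about as much as the pixel count does (roughly fourfold), so on this setup lowering the step count is the main lever left at 1024x1024.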

Posted about 5 hours ago
Edited about 5 hours ago because: sampled to sampler

Lord Waite

@tyto4tme4l
Yeah, IIRC, one of the points behind Forge was that it reworked things on A1111 to use code from ComfyUI for some of the backend, because ComfyUI is faster and better with memory.

I’ve got it, but I got used to ComfyUI, and while it might have a steep learning curve, it’s a lot more flexible for things once you know it. (Though, alright, inpainting is still going to be easier on a different UI.)

Posted about 5 hours ago