We investigate the mechanisms of negative prompts in text-to-image diffusion models and find that they behave differently from positive prompts. In the following figure, the first and third rows display the cross-attention heat maps of words in the positive prompt, while the second and fourth rows show those of the negative prompt. We observe that the negative prompt exhibits a delayed response: it usually attends to the target position later than the positive prompt does.
The concept of negative prompts, emerging from conditional generation models like Stable Diffusion, allows users to specify what to exclude from the generated images. Despite the widespread use of negative prompts, their intrinsic mechanisms remain largely unexplored. This paper presents the first comprehensive study to uncover how and when negative prompts take effect. Our extensive empirical analysis identifies two primary behaviors of negative prompts. Delayed Effect: The impact of negative prompts is observed after positive prompts render corresponding content. Deletion Through Neutralization: Negative prompts delete concepts from the generated image through a mutual cancellation effect in latent space with positive prompts. These insights point to significant real-world applications; for example, we demonstrate that negative prompts can facilitate object inpainting with minimal alterations to the background via a simple adaptive algorithm. We believe our findings will offer valuable insights for the community in capitalizing on the potential of negative prompts.
When do negative prompts take effect?
We define a metric called the Negative Prompt Strength (Ratio) to quantify the timing of negative prompts; it is based on the mean value of the cross-attention map. The higher the value, the more effective the negative prompt. The following figure shows the ratios for both adjectives and nouns.
For noun-based negative prompts, we find a peak at the 5th step. Before that step the target object has not fully formed; after it, the object has been removed. For adjective-based negative prompts, we observe a plateau around the 10th step. After this critical step, as the object becomes clear, the negative prompt accurately focuses on the intended area and maintains its influence.
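As a rough illustration of how such a ratio could be tracked over denoising steps, the sketch below averages a toy cross-attention map per step and looks for the step with the largest value. This page only says the metric is based on the mean of the cross-attention map, so dividing by the positive-prompt mean is an assumption made here for illustration; the function names (`map_mean`, `negative_prompt_strength`, `critical_step`) and the toy maps are hypothetical. The paper gives the exact definition.

```python
from statistics import fmean

def map_mean(attn_map):
    """Mean over all spatial positions of one 2-D attention map (a list of rows)."""
    return fmean(v for row in attn_map for v in row)

def negative_prompt_strength(neg_maps, pos_maps, eps=1e-8):
    """One ratio per denoising step.

    ASSUMPTION: the ratio is mean negative-token attention divided by mean
    positive-token attention; the exact normalisation is defined in the paper.
    """
    return [map_mean(n) / (map_mean(p) + eps) for n, p in zip(neg_maps, pos_maps)]

def critical_step(ratios):
    """Index of the step at which the ratio is largest."""
    return max(range(len(ratios)), key=ratios.__getitem__)

# Toy data (not measured): 10 steps of 2x2 maps. Positive attention is flat,
# negative attention rises until step 5 and then decays.
pos_maps = [[[1.0, 1.0], [1.0, 1.0]] for _ in range(10)]
neg_maps = [[[0.1 * (5 - abs(t - 5))] * 2 for _ in range(2)] for t in range(10)]

ratios = negative_prompt_strength(neg_maps, pos_maps)
print(critical_step(ratios))
```

In a real pipeline the maps would come from the cross-attention layers of the denoising network, averaged over heads and layers in whatever way the analysis calls for.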
How do negative prompts take effect?
We hypothesize that the negative prompts initially generate a target object at a specific location within the image, which neutralizes the positive noise through a subtractive process, effectively erasing the object.
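The subtractive process can be pictured with the classifier-free guidance combination that common Stable Diffusion implementations use, where the negative-prompt prediction takes the place of the unconditional one. The sketch below is a toy under stated assumptions: the three-dimensional "latent", its axis meanings, and the numeric values are invented for illustration and are not experimental data.

```python
def guided_noise(eps_pos, eps_neg, scale):
    """Classifier-free guidance with a negative prompt in the unconditional slot:
    eps_neg + scale * (eps_pos - eps_neg), applied element-wise."""
    return [n + scale * (p - n) for p, n in zip(eps_pos, eps_neg)]

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

# Hypothetical axes: 0 = background, 1 = object, 2 = everything else.
object_dir = [0.0, 1.0, 0.0]
eps_pos = [1.0, 0.8, 0.2]  # positive prompt: background plus object
eps_neg = [0.0, 0.8, 0.0]  # negative prompt: the same object at the same place

# The guidance term is the difference of the two predictions, so a component
# shared by both cancels out of it.
guidance = [p - n for p, n in zip(eps_pos, eps_neg)]
object_push = dot(guidance, object_dir)  # 0.0 in this toy setup

guided = guided_noise(eps_pos, eps_neg, scale=7.5)
print(object_push, guided)
```

In this toy setup the background component is amplified by the guidance scale while the object component receives no extra push, which is one way to read the mutual cancellation described above.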
During our experiments, we discovered an intriguing phenomenon called the Reverse Effect, which may be linked to the underlying mechanism. As shown in the figure, each column displays an image generated by applying negative prompts at specific steps indicated at the top. In these examples, the diffusion process without a negative prompt does not produce the object mentioned in the negative prompt. However, introducing a negative prompt in the early stages results in the generation of the specified object, highlighted with a red bounding box.
To explain the Reverse Effect, we need to consider the energy function in image generation dynamics. This function is designed to assign lower energy levels to images that are more 'likely' or 'natural' according to the model's training data. To synthesize a specific object, like a tower, from scratch, the diffusion process must traverse an intermediary barrier that represents a blurry outline of the object.
The following figure illustrates the inducing effect: the negative prompt triggers the positive noise (red arrow) in a direction that represents the context of the negative prompt. The strong negative prompt (brown arrow) induces the real-world distribution guidance (purple arrow) to point toward the target object region and increases its strength.
The figure illustrates the Momentum Effect, where noise preserves its direction during the diffusion process. For detailed experiments verifying the Momentum Effect, please refer to the paper.
Finally, we can explain the Reverse Effect. Without the negative prompt, the implicit guidance is insufficient to generate the intended object. The application of a negative prompt intensifies the distribution guidance towards the object, which prevents the object from materializing. If we remove the negative prompt after several steps, the real-world distribution guidance will maintain a large component towards the object's direction. Such a momentum effect finally facilitates the object's emergence.
Application: Enhanced Controllable Inpainting
Our findings on the critical step of using negative prompts can significantly enhance object inpainting with minimal background alteration. The key insight is to apply negative prompts after the critical step. Prior to this step, the object is not fully formed, so introducing negative prompts at this stage would only introduce noise to the original background without having a removal effect. The following figure illustrates the inpainting results achieved by our method.
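The timing rule above can be sketched as a per-step schedule for the prompt fed to the unconditional branch. This is a minimal sketch and not the authors' adaptive algorithm: the function name `negative_prompt_schedule`, the choice to start exactly at the critical step index, and the fixed `critical_step` value are assumptions for illustration.

```python
def negative_prompt_schedule(num_steps, critical_step, negative_prompt, empty_prompt=""):
    """Prompt to use for the unconditional branch at each denoising step.

    ASSUMPTION: steps are 0-indexed and the negative prompt is switched on at
    `critical_step` itself; earlier steps fall back to the empty prompt.
    """
    return [
        negative_prompt if t >= critical_step else empty_prompt
        for t in range(num_steps)
    ]

# Hypothetical example: remove a tower, switching the negative prompt on at step 5.
schedule = negative_prompt_schedule(num_steps=10, critical_step=5, negative_prompt="tower")
for t, prompt in enumerate(schedule):
    print(t, repr(prompt))
```

A denoising loop would then encode `schedule[t]` as the negative embedding at step `t`. The critical step could also be chosen per image from an attention-based signal rather than fixed in advance.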
@article{ban2024understanding,
title={Understanding the Impact of Negative Prompts: When and How Do They Take Effect?},
author={Ban, Yuanhao and Wang, Ruochen and Zhou, Tianyi and Cheng, Minhao and Gong, Boqing and Hsieh, Cho-Jui},
journal={arXiv preprint arXiv:2406.02965},
year={2024}
}