# Quickstart
Latent blending enables video transitions with incredible smoothness between prompts, computed within seconds. Powered by [Stable Diffusion XL](https://stability.ai/stable-diffusion), the method mixes intermediate latent representations in a specific way to create a seamless transition, and users can fully customize the transition directly in high resolution.
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1I77--5PS6C-sAskl9OggS1zR0HLKdq1M?usp=sharing)
```python
import torch
from diffusers import DiffusionPipeline
# LatentBlending and DiffusersHolder ship with this repository (adjust the imports to your checkout)
from latent_blending import LatentBlending
from diffusers_holder import DiffusersHolder

pretrained_model_name_or_path = "stabilityai/stable-diffusion-xl-base-1.0"
pipe = DiffusionPipeline.from_pretrained(pretrained_model_name_or_path, torch_dtype=torch.float16).to('cuda')
dh = DiffusersHolder(pipe)
lb = LatentBlending(dh)
lb.set_prompt1('photo of my first prompt')
lb.set_prompt2('photo of my second prompt')
depth_strength = 0.6  # How deep the first branching happens
num_inference_steps = 30  # Number of diffusion steps per generated image
t_compute_max_allowed = 10  # How much compute time we give to the transition
imgs_transition = lb.run_transition(
    depth_strength=depth_strength,
    num_inference_steps=num_inference_steps,
    t_compute_max_allowed=t_compute_max_allowed)
```
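The transition comes back as a list of image frames (`imgs_transition` above), so any video writer can turn it into a clip. A minimal sketch using `imageio` (not part of the latent blending API; file name and frame rate are arbitrary choices):

```python
import numpy as np
import imageio

# Write the transition frames to an mp4 (requires the imageio-ffmpeg backend).
writer = imageio.get_writer('transition.mp4', fps=30)
for img in imgs_transition:
    writer.append_data(np.asarray(img))
writer.close()
```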
## Gradio UI
Coming soon again :)
## Example 1: Simple transition
![](example1.jpg)
To run a simple transition between two prompts, run `example1_standard.py`.

## Example 2: Multi transition
To run multiple transitions between K prompts, resulting in a stitched video, run `example2_multitrans.py`.
[View a longer example video here.](https://vimeo.com/789052336/80dcb545b2)
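The script is the reference implementation; the snippet below is only a sketch of the stitching idea, reusing `lb` and the settings from the quickstart above and simply concatenating the per-pair transitions (the example script additionally handles details such as keeping consecutive segments consistent):

```python
# Chain transitions across K prompts and collect all frames in order.
list_prompts = [
    "photo of a beach at sunrise",
    "photo of a snowy mountain ridge",
    "photo of a rainforest waterfall",
]

frames = []
for prompt_from, prompt_to in zip(list_prompts[:-1], list_prompts[1:]):
    lb.set_prompt1(prompt_from)
    lb.set_prompt2(prompt_to)
    frames += lb.run_transition(
        depth_strength=depth_strength,
        num_inference_steps=num_inference_steps,
        t_compute_max_allowed=t_compute_max_allowed)
```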
# Customization
You can find the [most relevant parameters here.](parameters.md)
### Change the height/width
```python
size_output = (1024, 768)
lb.set_dimensions(size_output)
```
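Both values should be divisible by 8, the latent downscaling factor of the VAE. SDXL was trained at roughly one megapixel (1024x1024), so resolutions in that range tend to give the best results.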
### Change guidance scale
```python
lb.set_guidance_scale(5.0)
```
# Installation
```commandline
pip install -r requirements.txt
```
# How does latent blending work?
## Method
![](animation.gif)
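The method relies on mixing intermediate latent representations of the two prompts (see the intro above). As a rough illustration of one such mixing step, here is a self-contained spherical interpolation between two latent tensors; this is a sketch of the idea, not the library's internal blending code:

```python
import torch

def slerp_latents(p0: torch.Tensor, p1: torch.Tensor, fract: float) -> torch.Tensor:
    """Spherically interpolate two latents; fract=0 returns p0, fract=1 returns p1."""
    a, b = p0.flatten().float(), p1.flatten().float()
    cos_omega = torch.clamp(torch.dot(a, b) / (a.norm() * b.norm()), -1.0, 1.0)
    omega = torch.arccos(cos_omega)
    if omega.abs() < 1e-6:
        # Nearly parallel latents: spherical and linear interpolation coincide.
        return (1.0 - fract) * p0 + fract * p1
    s0 = torch.sin((1.0 - fract) * omega) / torch.sin(omega)
    s1 = torch.sin(fract * omega) / torch.sin(omega)
    return (s0 * p0 + s1 * p1).to(p0.dtype)
```

Conceptually, mixes like this are injected partway through the denoising trajectory (this is what `depth_strength` controls), rather than blending the finished images.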
## Perception
With latent blending, we can create transitions that appear to defy the laws of nature, yet look completely natural and believable. The key is to suppress processing in our [dorsal visual stream](https://en.wikipedia.org/wiki/Two-streams_hypothesis#Dorsal_stream), which is achieved by avoiding motion in the transition. Without motion, our visual system has difficulty detecting the transition, leaving viewers with the illusion of a single, continuous image; see [change blindness](https://en.wikipedia.org/wiki/Change_blindness). Once motion is introduced, however, the visual system detects the transition and the viewer becomes aware of it, which is jarring. The best results are therefore achieved by optimizing the transition parameters, particularly the crossfeeding parameters and the depth of the first injection.
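Tuning these on the `lb` object is straightforward; a hedged sketch follows. The setter signature `lb.set_parental_crossfeed(crossfeed_power, crossfeed_range, crossfeed_decay)` is from this project, while the values and the comments below are illustrative assumptions (see [parameters.md](parameters.md) for the authoritative description):

```python
# Illustrative starting values; the meanings given in the comments are assumptions.
crossfeed_power = 0.3   # assumed: how strongly the parent branch feeds into new branches
crossfeed_range = 0.6   # assumed: portion of the trajectory that receives crossfeed
crossfeed_decay = 0.9   # assumed: how quickly the crossfeed fades out
lb.set_parental_crossfeed(crossfeed_power, crossfeed_range, crossfeed_decay)

imgs_transition = lb.run_transition(
    depth_strength=0.35,  # depth of the first injection, as in the quickstart
    t_compute_max_allowed=20)
```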
# Changelog
* SD XL support
* Diffusers backend, greatly simplifying installation and use (bring your own pipe)
* New blending engine with cross-feeding capabilities, enabling structure-preserving transitions
* LPIPS image similarity for finding the next best injection branch, resulting in smoother transitions
* Time-based computation: instead of specifying how many frames your transition has, you can specify a compute budget and get a transition within that budget.
* Inpaint support dropped (as it only makes sense for a single transition)

# Coming soon...
- [ ] Gradio interface
- [ ] Huggingface Space
- [ ] Controlnet
- [ ] IP-Adapter
- [ ] Latent Consistency
Stay tuned on twitter: `@j_stelzer`