
Latent blending creates fast, smooth video transitions between prompts. Powered by [Stable Diffusion 2.1](https://stability.ai/blog/stablediffusion2-1-release7-dec-2022), it mixes intermediate latent representations to produce a seamless transition. You can fully customize the transition and optionally run high-resolution upscaling.

# Quickstart
```python
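# Module names are assumed from the repository's example scripts; adjust if your checkout differs
from latent_blending import LatentBlending
from stable_diffusion_holder import StableDiffusionHolder

fp_ckpt = 'path_to_SD2.ckpt'
fp_config = 'path_to_config.yaml'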

sdh = StableDiffusionHolder(fp_ckpt, fp_config, 'cuda')
lb = LatentBlending(sdh)

lb.load_branching_profile(quality='medium', depth_strength=0.4)
lb.set_prompt1('photo of my first prompt')
lb.set_prompt2('photo of my second prompt')

imgs_transition = lb.run_transition()
```
## Gradio UI
To run the UI on your local machine, run `gradio_ui.py`

## Example 1: Simple transition
![](example1.jpg)
To run a simple transition between two prompts, run `example1_standard.py`

## Example 2: Inpainting transition
![](example2.jpg)
To run a transition between two prompts where you want some part of the image to remain static, run `example2_inpaint.py`

## Example 3: Multi transition
To run multiple transitions between K prompts, resulting in a stitched video, run `example3_multitrans.py`

## Example 4: High-resolution with upscaling
![](example4.jpg)
You can run a high-res transition using the x4 upscaling model in a two-stage procedure; see `example4_upscaling.py`

# Customization

## Most relevant parameters

### Change the height/width
```python 
lb.set_height(512)
lb.set_width(1024)
```
### Change guidance scale
```python 
lb.set_guidance_scale(5.0)
```
### depth_strength / list_injection_strength
The strength determines how early in the diffusion process blending starts. Values close to zero start blending early and give more inventive results, whereas values close to one approach a simple alpha blend.
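
For comparison, the limiting case mentioned above, plain alpha blending, can be sketched in a few lines. This is a minimal illustration in image space using lists of pixel values. `alpha_blend` is a hypothetical helper and not part of the library, which mixes intermediate latents instead.

```python
def alpha_blend(pixels_a, pixels_b, alpha):
    """Cross-fade two equally sized pixel sequences; alpha=0 gives a, alpha=1 gives b."""
    if len(pixels_a) != len(pixels_b):
        raise ValueError("both images must have the same number of pixels")
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(pixels_a, pixels_b)]

# Halfway frame of a cross-fade between two tiny two-pixel "images"
frame_mid = alpha_blend([0.0, 1.0], [1.0, 0.0], 0.5)
```

A strength near one leaves few diffusion steps after the mix, so the result is expected to look close to this kind of cross-fade.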


## Set up the branching structure

There are three ways to change the branching structure.
### Presets
```python 
quality = 'medium'  # choose from lowest, low, medium, high, ultra
depth_strength = 0.5 # see above (Most relevant parameters)

lb.load_branching_profile(quality, depth_strength)
```

### Automatic tree setup
```python 
depth_strength = 0.5 # see above (Most relevant parameters)
num_inference_steps = 30 # the number of diffusion steps
nmb_branches_final = 20 # how many diffusion images will be generated for the transition

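# Pass the values defined above; the keyword names are assumed to match the variable names
lb.autosetup_branching(depth_strength=depth_strength, num_inference_steps=num_inference_steps, nmb_branches_final=nmb_branches_final)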
```

### Fully manual
```python 
num_inference_steps = 30 # the number of diffusion steps
list_nmb_branches = [2, 4, 8, 20]
list_injection_strength = [0.0, 0.3, 0.5, 0.9]

lb.setup_branching(num_inference_steps, list_nmb_branches, list_injection_strength=list_injection_strength)
```
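
Before passing manual lists to `setup_branching`, it can help to sanity-check them. The sketch below is a hypothetical helper, not the library's own validation, and the constraints it enforces (equal lengths, strengths in [0, 1), both lists non-decreasing) are assumptions inferred from the examples in this README.

```python
def check_branching(list_nmb_branches, list_injection_strength):
    """Hypothetical sanity checks; not the library's own validation."""
    if len(list_nmb_branches) != len(list_injection_strength):
        raise ValueError("both lists need one entry per injection level")
    if any(not 0.0 <= s < 1.0 for s in list_injection_strength):
        raise ValueError("injection strengths are assumed to lie in [0, 1)")
    if list(list_injection_strength) != sorted(list_injection_strength):
        raise ValueError("injection strengths are assumed to be non-decreasing")
    if list(list_nmb_branches) != sorted(list_nmb_branches):
        raise ValueError("branch counts are assumed to grow with depth")
    return True

check_branching([2, 4, 8, 20], [0.0, 0.3, 0.5, 0.9])
```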

# Installation
#### Packages
```commandline
pip install -r requirements.txt
```
#### Download Models from Huggingface
[Download the Stable Diffusion v2-1_768 Model](https://huggingface.co/stabilityai/stable-diffusion-2-1)

[Download the Stable Diffusion Inpainting Model](https://huggingface.co/stabilityai/stable-diffusion-2-inpainting)

[Download the Stable Diffusion x4 Upscaler](https://huggingface.co/stabilityai/stable-diffusion-x4-upscaler)

#### (Optional but recommended) Install [Xformers](https://github.com/facebookresearch/xformers)
With xformers, Stable Diffusion runs faster and with a smaller memory footprint. It is necessary for higher resolutions and for the upscaling model.

```commandline
conda install xformers -c xformers/label/dev
```

Alternatively, you can build it from source:
```commandline
# (Optional) Makes the build much faster
pip install ninja
# Set TORCH_CUDA_ARCH_LIST if running and building on different GPU types
pip install -v -U git+https://github.com/facebookresearch/xformers.git@main#egg=xformers
# (this can take dozens of minutes)
```
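
After either installation route, a quick way to confirm that the package is visible to your Python environment is the standard-library check below. It is a minimal sketch: `has_xformers` is a hypothetical helper, and it only tests that the package can be found, not that the diffusion code actually uses it.

```python
import importlib.util

def has_xformers():
    """Return True if the xformers package can be found in this environment."""
    return importlib.util.find_spec("xformers") is not None

print("xformers available:", has_xformers())
```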

# How does latent blending work?
## Technology
![](animation.gif)

The figure above illustrates a diffusion tree. The diffusion steps are shown on the y-axis and temporal blending on the x-axis. The diffusion trajectory for the first prompt is the leftmost column, and the trajectory for the second prompt is on the right. At iteration three, three branches are created, followed by seven at iteration six and the final ten at iteration nine.

This example can be set up manually with the following code:
```python 
num_inference_steps = 10 
list_nmb_branches = [2, 3, 7, 10]
list_injection_idx = [0, 3, 6, 9]

lb.setup_branching(num_inference_steps, list_nmb_branches, list_injection_idx=list_injection_idx)
```
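
To make the tree concrete, the sketch below lists where each branch could sit on the x-axis at every injection level, assuming branches are spaced evenly between the first prompt (0.0) and the second prompt (1.0). The even spacing and the helper `branch_fractions` are assumptions for illustration; the library may place branches differently.

```python
def branch_fractions(nmb_branches):
    """Evenly spaced blend positions from 0.0 (prompt 1) to 1.0 (prompt 2)."""
    if nmb_branches < 2:
        raise ValueError("need at least the two prompt trajectories")
    return [i / (nmb_branches - 1) for i in range(nmb_branches)]

list_nmb_branches = [2, 3, 7, 10]
list_injection_idx = [0, 3, 6, 9]

# Print one row of the tree per injection level
for idx, n in zip(list_injection_idx, list_nmb_branches):
    positions = ", ".join(f"{f:.2f}" for f in branch_fractions(n))
    print(f"step {idx}: {n} branches at {positions}")
```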

Instead of specifying absolute injection indices with `list_injection_idx`, we can pass `list_injection_strength`, whose values are independent of the total number of diffusion iterations (`num_inference_steps`).
```python 
list_injection_strength = [0, 0.3, 0.6, 0.9]
lb.setup_branching(num_inference_steps, list_nmb_branches, list_injection_strength=list_injection_strength)
```
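
The relationship between the two parameterizations can be sketched as follows. The rounding rule is an assumption chosen because it reproduces the indices from the example above; `strengths_to_indices` is a hypothetical helper, not a library function.

```python
def strengths_to_indices(list_injection_strength, num_inference_steps):
    """Map relative strengths in [0, 1) to absolute diffusion step indices."""
    return [int(round(s * num_inference_steps)) for s in list_injection_strength]

strengths = [0, 0.3, 0.6, 0.9]
idx_10 = strengths_to_indices(strengths, 10)  # [0, 3, 6, 9], as in list_injection_idx above
idx_30 = strengths_to_indices(strengths, 30)  # same strengths on a longer schedule
```

The same strengths therefore keep their relative position when `num_inference_steps` changes, which is why they are convenient for presets.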
## Perception
With latent blending, we can create transitions that seem to defy the laws of nature, yet look completely natural and believable. The key is to suppress processing in our [dorsal visual stream](https://en.wikipedia.org/wiki/Two-streams_hypothesis#Dorsal_stream), which is achieved by avoiding motion in the transition. Without motion, our visual system has difficulty detecting the transition, leaving viewers with the illusion of a single, continuous image. When motion is introduced, however, the visual system detects the change and the viewer becomes aware of it, which produces a jarring effect. The best results are therefore achieved by optimizing the transition parameters, particularly the depth of the first injection.