Merge branch 'main' of github.com:lunarring/latentblending
This commit is contained in:
commit
1196fd7f69
115
README.md
|
@ -10,36 +10,39 @@ fp_ckpt = 'path_to_SD2.ckpt'
|
||||||
sdh = StableDiffusionHolder(fp_ckpt)
|
sdh = StableDiffusionHolder(fp_ckpt)
|
||||||
lb = LatentBlending(sdh)
|
lb = LatentBlending(sdh)
|
||||||
|
|
||||||
lb.load_branching_profile(quality='medium', depth_strength=0.4)
|
|
||||||
lb.set_prompt1('photo of my first prompt1')
|
lb.set_prompt1('photo of my first prompt1')
|
||||||
lb.set_prompt2('photo of my second prompt')
|
lb.set_prompt2('photo of my second prompt')
|
||||||
|
depth_strength = 0.6 # How deep the first branching happens
|
||||||
imgs_transition = lb.run_transition()
|
t_compute_max_allowed = 10 # How much compute time we give to the transition
|
||||||
|
imgs_transition = lb.run_transition(depth_strength=depth_strength, t_compute_max_allowed=t_compute_max_allowed)
|
||||||
```
|
```
|
||||||
## Gradio UI
|
## Gradio UI
|
||||||
To run the UI on your local machine, run `gradio_ui.py`
|
To run the UI on your local machine, run `gradio_ui.py`
|
||||||
|
If you want to specify the output directory, you can create a `.env` file in the latentblending git directory.
|
||||||
You can find the [most relevant parameters here.](parameters.md)
|
In this file, specify:
|
||||||
|
```
|
||||||
|
DIR_OUT="SET_PATH_HERE"
|
||||||
|
```
|
||||||
|
|
||||||
## Example 1: Simple transition
|
## Example 1: Simple transition
|
||||||
![](example1.jpg)
|
![](example1.jpg)
|
||||||
To run a simple transition between two prompts, run `example1_standard.py`
|
To run a simple transition between two prompts, run `example1_standard.py`
|
||||||
|
|
||||||
## Example 2: Inpainting transition
|
## Example 2: Multi transition
|
||||||
![](example2.jpg)
|
To run multiple transitions between K prompts, resulting in a stitched video, run `example2_multitrans.py`.
|
||||||
To run a transition between two prompts where you want some part of the image to remain static, run `example2_inpaint.py`
|
|
||||||
|
|
||||||
## Example 3: Multi transition
|
|
||||||
To run multiple transitions between K prompts, resulting in a stitched video, run `example3_multitrans.py`.
|
|
||||||
[View a longer example video here.](https://vimeo.com/789052336/80dcb545b2)
|
[View a longer example video here.](https://vimeo.com/789052336/80dcb545b2)
|
||||||
|
|
||||||
## Example 4: High-resolution with upscaling
|
## Example 3: High-resolution with upscaling
|
||||||
![](example4.jpg)
|
![](example3.jpg)
|
||||||
You can run a high-res transition using the x4 upscaling model in a two-stage procedure, see `example4_upscaling.py`. [View as video here.](https://vimeo.com/787639426/f88dae2ea6)
|
You can run a high-res transition using the x4 upscaling model in a two-stage procedure, see `example3_upscaling.py`. [View as video here.](https://vimeo.com/787639426/f88dae2ea6)
|
||||||
|
|
||||||
|
## Example 4: Multi transition with high-resolution with upscaling
|
||||||
|
You can run a multi transition movie and upscale it, see `example4_multitrans_upscaling.py`.
|
||||||
|
|
||||||
# Customization
|
# Customization
|
||||||
|
|
||||||
## Most relevant parameters
|
## Most relevant parameters
|
||||||
|
You can find the [most relevant parameters here.](parameters.md)
|
||||||
|
|
||||||
### Change the height/width
|
### Change the height/width
|
||||||
```python
|
```python
|
||||||
|
@ -50,42 +53,33 @@ lb.set_width(1024)
|
||||||
```python
|
```python
|
||||||
lb.set_guidance_scale(5.0)
|
lb.set_guidance_scale(5.0)
|
||||||
```
|
```
|
||||||
### depth_strength / list_injection_strength
|
|
||||||
The strength of the diffusion iterations determines when the blending process will begin. A value close to zero results in more creative and intricate outcomes, while a value closer to one indicates a simpler alpha blending. However, low values may also bring about the introduction of additional objects and motion.
|
|
||||||
|
|
||||||
### quality
|
### run_transition parameters
|
||||||
When selecting a preset, you can choose the following values for quality:
|
* num_inference_steps: Number of diffusion steps. Higher values will take more compute time.
|
||||||
lowest, low, medium, high, ultra.
|
* depth_strength: The strength of the diffusion iterations determines when the blending process will begin. A value close to zero results in more creative and intricate outcomes, while a value closer to one indicates a simpler alpha blending. However, low values may also bring about the introduction of additional objects and motion.
|
||||||
This affects both the num_inference_steps and how many diffusion images will be generated for the transition
|
* t_compute_max_allowed: maximum time allowed for computation. Higher values give better results but take longer. Either provide t_compute_max_allowed or nmb_max_branches.
|
||||||
|
* nmb_max_branches: The maximum number of branches to be computed. Higher values give better results. Use this if you want to have controllable results independent of your hardware. Either provide t_compute_max_allowed or nmb_max_branches.
|
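As a rough illustration of how `depth_strength` relates to `num_inference_steps`, the sketch below maps a strength in [0, 1] to a diffusion step index. The helper `injection_index` is hypothetical and not part of latentblending; it assumes the strength is simply a fraction of the total number of diffusion steps, so the library may round or clamp differently.

```python
def injection_index(depth_strength: float, num_inference_steps: int) -> int:
    """Hypothetical helper: map a relative depth to an absolute diffusion step.

    Assumption: the strength is a plain fraction of num_inference_steps.
    """
    if not 0.0 <= depth_strength <= 1.0:
        raise ValueError("depth_strength must lie in [0, 1]")
    idx = int(round(depth_strength * num_inference_steps))
    # Clamp so that a strength of 1.0 still points at a valid step.
    return min(idx, num_inference_steps - 1)


if __name__ == "__main__":
    # With 10 diffusion steps, a depth_strength of 0.2 would branch at step 2.
    print(injection_index(0.2, 10))
```

Under this assumption, the same `depth_strength` picks the same relative depth regardless of how many diffusion steps are used.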
||||||
|
|
||||||
## Set up the branching structure
|
### Crossfeeding to the last image.
|
||||||
|
Cross-feeding latents is a key feature of latent blending. Here, you can set how much the first image branch influences the very last one. In the animation below, these are the blue arrows.
|
||||||
|
|
||||||
There are three ways to change the branching structure.
|
```
|
||||||
### Presets
|
crossfeed_power = 0.5 # 50% of the latents in the last branch are copied from branch1
|
||||||
```python
|
crossfeed_range = 0.7 # The crossfeed is active until 70% of num_iteration, then switched off
|
||||||
quality = 'medium'
|
crossfeed_decay = 0.2 # The power of the crossfeed decreases over diffusion iterations, here it would be 0.5*0.2=0.1 at the end of the range.
|
||||||
depth_strength = 0.5 # see above (Most relevant parameters)
|
lb.set_branch1_crossfeed(crossfeed_power, crossfeed_range, crossfeed_decay)
|
||||||
|
|
||||||
lb.load_branching_profile(quality, depth_strength)
|
|
||||||
```
|
```
|
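To make the three crossfeed parameters more concrete, here is a minimal sketch of a per-iteration crossfeed schedule. It is not the library's implementation: the function names are hypothetical, and the linear decay from `crossfeed_power` down to `crossfeed_power * crossfeed_decay` is an assumption. Only the end value (0.5*0.2=0.1) and the switch-off after the range come from the description above.

```python
def crossfeed_schedule(num_iterations, crossfeed_power, crossfeed_range, crossfeed_decay):
    """Hypothetical sketch: crossfeed power for every diffusion iteration.

    Assumption: the power decays linearly within the active range.
    """
    n_active = int(round(crossfeed_range * num_iterations))
    end_power = crossfeed_power * crossfeed_decay
    schedule = []
    for i in range(num_iterations):
        if i >= n_active:
            schedule.append(0.0)  # crossfeed is switched off after the range
        elif n_active == 1:
            schedule.append(crossfeed_power)
        else:
            frac = i / (n_active - 1)  # 0 at the start, 1 at the end of the range
            schedule.append(crossfeed_power + frac * (end_power - crossfeed_power))
    return schedule


def mix_latents(own, source, power):
    """Blend two latents (plain lists of floats here) with the given power."""
    return [(1.0 - power) * a + power * b for a, b in zip(own, source)]


if __name__ == "__main__":
    sched = crossfeed_schedule(10, crossfeed_power=0.5, crossfeed_range=0.7, crossfeed_decay=0.2)
    print([round(p, 3) for p in sched])
```

With these values, the crossfeed starts at 0.5, reaches about 0.1 at the end of the active range, and is zero for the remaining iterations.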
||||||
|
|
||||||
### Autosetup tree
|
### Crossfeeding to all transition images
|
||||||
```python
|
Here, you can set how much the parent branches influence the mixed one. In the animation below, these are the yellow arrows.
|
||||||
depth_strength = 0.5 # see above (Most relevant parameters)
|
|
||||||
num_inference_steps = 30 # the number of diffusion steps
|
|
||||||
nmb_branches_final = 20 # how many diffusion images will be generated for the transition
|
|
||||||
|
|
||||||
lb.autosetup_branching(num_inference_steps, list_nmb_branches, list_injection_strength)
|
```
|
||||||
|
crossfeed_power = 0.5 # 50% of the latents in the last branch are copied from the parents
|
||||||
|
crossfeed_range = 0.7 # The crossfeed is active until 70% of num_iteration, then switched off
|
||||||
|
crossfeed_decay = 0.2 # The power of the crossfeed decreases over diffusion iterations, here it would be 0.5*0.2=0.1 at the end of the range.
|
||||||
|
lb.set_parental_crossfeed(crossfeed_power, crossfeed_range, crossfeed_decay)
|
||||||
```
|
```
|
||||||
|
|
||||||
### Manual specification
|
|
||||||
```python
|
|
||||||
num_inference_steps = 30 # the number of diffusion steps
|
|
||||||
list_nmb_branches = [2, 4, 8, 20]
|
|
||||||
list_injection_strength = [0.0, 0.3, 0.5, 0.9]
|
|
||||||
|
|
||||||
lb.setup_branching(num_inference_steps, list_nmb_branches, list_injection_strength=list_injection_strength)
|
|
||||||
```
|
|
||||||
|
|
||||||
# Installation
|
# Installation
|
||||||
#### Packages
|
#### Packages
|
||||||
|
@ -95,8 +89,6 @@ pip install -r requirements.txt
|
||||||
#### Download Models from Huggingface
|
#### Download Models from Huggingface
|
||||||
[Download the Stable Diffusion v2-1_768 Model](https://huggingface.co/stabilityai/stable-diffusion-2-1)
|
[Download the Stable Diffusion v2-1_768 Model](https://huggingface.co/stabilityai/stable-diffusion-2-1)
|
||||||
|
|
||||||
[Download the Stable Diffusion Inpainting Model](https://huggingface.co/stabilityai/stable-diffusion-2-inpainting)
|
|
||||||
|
|
||||||
[Download the Stable Diffusion x4 Upscaler](https://huggingface.co/stabilityai/stable-diffusion-x4-upscaler)
|
[Download the Stable Diffusion x4 Upscaler](https://huggingface.co/stabilityai/stable-diffusion-x4-upscaler)
|
||||||
|
|
||||||
#### (Optional but recommended) Install [Xformers](https://github.com/facebookresearch/xformers)
|
#### (Optional but recommended) Install [Xformers](https://github.com/facebookresearch/xformers)
|
||||||
|
@ -119,31 +111,30 @@ pip install -v -U git+https://github.com/facebookresearch/xformers.git@main#egg=
|
||||||
## Method
|
## Method
|
||||||
![](animation.gif)
|
![](animation.gif)
|
||||||
|
|
||||||
In the figure above, a diffusion tree is illustrated. The diffusion steps are represented on the y-axis, with temporal blending on the x-axis. The diffusion trajectory for the first prompt is the leftmost column, with the trajectory for the second prompt to the right. At the third iteration, three branches are created, followed by seven at iteration six and the final ten at iteration nine.
|
In the figure above, a diffusion tree is illustrated. The diffusion steps are represented on the y-axis, with temporal blending on the x-axis. The diffusion trajectory for the first prompt is the leftmost column, which is always computed first. Next, the trajectory for the second prompt is computed, which may be influenced by the first branch (blue arrows, for a description see above at `Crossfeeding to the last image.`). Finally, all transition images in between are computed. For the transition, there can be an influence of the parents, which helps preserve structures (yellow arrows, for a description see above at `Crossfeeding to all transition images`). Importantly, the place of injection on the x-axis is not fixed a priori, but set dynamically using [Perceptual Similarity](https://richzhang.github.io/PerceptualSimilarity), always adding a branch where it is needed most.
|
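The idea of adding a branch where it is needed most can be sketched as a greedy loop. This is a toy illustration only: `render` stands in for computing a diffusion branch, `perceptual_distance` stands in for a perceptual metric such as LPIPS, and both are hypothetical placeholders operating on plain numbers. Counting the two endpoint images towards `nmb_max_branches` is also an assumption.

```python
def render(fract: float) -> float:
    """Placeholder for a diffusion branch at mixing fraction `fract`.

    The cubic makes the "images" change unevenly along the transition.
    """
    return fract ** 3


def perceptual_distance(img_a: float, img_b: float) -> float:
    """Placeholder for a perceptual similarity metric (e.g. LPIPS)."""
    return abs(img_a - img_b)


def build_transition(nmb_max_branches: int):
    """Greedily insert new branches into the largest perceptual gap."""
    fracts = [0.0, 1.0]  # start with the two prompt images
    imgs = [render(f) for f in fracts]
    while len(imgs) < nmb_max_branches:
        # Distance between each pair of neighbouring images.
        dists = [perceptual_distance(imgs[i], imgs[i + 1]) for i in range(len(imgs) - 1)]
        i_max = max(range(len(dists)), key=dists.__getitem__)
        # Place the new branch halfway between the most dissimilar neighbours.
        new_fract = 0.5 * (fracts[i_max] + fracts[i_max + 1])
        fracts.insert(i_max + 1, new_fract)
        imgs.insert(i_max + 1, render(new_fract))
    return fracts, imgs


if __name__ == "__main__":
    fracts, _ = build_transition(nmb_max_branches=7)
    print(fracts)
```

In this toy setup, most branches end up in the second half of the transition, because that is where neighbouring placeholder images differ the most.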
||||||
|
|
||||||
This example can be manually set up using the following code
|
The concrete parameters for the transition above would be:
|
||||||
```python
|
```
|
||||||
num_inference_steps = 10
|
lb.set_branch1_crossfeed(crossfeed_power=0.8, crossfeed_range=0.6, crossfeed_decay=0.4)
|
||||||
list_nmb_branches = [2, 3, 7, 10]
|
lb.set_parental_crossfeed(crossfeed_power=0.8, crossfeed_range=0.8, crossfeed_decay=0.2)
|
||||||
list_injection_idx = [0, 3, 6, 9]
|
imgs_transition = lb.run_transition(num_inference_steps=10, depth_strength=0.2, nmb_max_branches=7)
|
||||||
|
|
||||||
lb.setup_branching(num_inference_steps, list_nmb_branches, list_injection_idx=list_injection_idx)
|
|
||||||
```
|
```
|
||||||
|
|
||||||
Instead of specifying the absolute injection indices using list_injection_idx, we can also pass the list_injection_strength, which are independent of the total number of diffusion iterations (num_inference_steps).
|
|
||||||
```python
|
|
||||||
list_injection_strength = [0, 0.3, 0.6, 0.9]
|
|
||||||
lb.setup_branching(num_inference_steps, list_nmb_branches, list_injection_strength=list_injection_strength)
|
|
||||||
```
|
|
||||||
## Perceptual aspects
|
## Perceptual aspects
|
||||||
With latent blending, we can create transitions that appear to defy the laws of nature, yet appear completely natural and believable. The key is to suppress processing in our [dorsal visual stream](https://en.wikipedia.org/wiki/Two-streams_hypothesis#Dorsal_stream), which is achieved by avoiding motion in the transition. Without motion, our visual system has difficulties detecting the transition, leaving viewers with the illusion of a single, continuous image. However, when motion is introduced, the visual system can detect the transition and the viewer becomes aware of the transition, leading to a jarring effect. Therefore, best results will be achieved when optimizing the transition parameters, particularly the depth of the first injection.
|
With latent blending, we can create transitions that appear to defy the laws of nature, yet appear completely natural and believable. The key is to suppress processing in our [dorsal visual stream](https://en.wikipedia.org/wiki/Two-streams_hypothesis#Dorsal_stream), which is achieved by avoiding motion in the transition. Without motion, our visual system has difficulties detecting the transition, leaving viewers with the illusion of a single, continuous image, see [change blindness](https://en.wikipedia.org/wiki/Change_blindness). However, when motion is introduced, the visual system can detect the transition and the viewer becomes aware of the transition, leading to a jarring effect. Therefore, best results will be achieved when optimizing the transition parameters, particularly the crossfeeding parameters and the depth of the first injection.
|
||||||
|
|
||||||
|
# Changelog
|
||||||
|
* New blending engine with cross-feeding capabilities, enabling structure-preserving transitions
|
||||||
|
* LPIPS image similarity for finding the next best injection branch, resulting in smoother transitions
|
||||||
|
* Time-based computation: instead of specifying how many frames your transition has, you can specify your compute budget and get a transition within that budget.
|
||||||
|
* New multi-movie engine
|
||||||
|
* Simpler and more powerful Gradio UI. You can iterate faster and stitch together a multi movie.
|
||||||
|
* Inpaint support dropped (as it only makes sense for a single transition)
|
||||||
|
|
||||||
# Coming soon...
|
# Coming soon...
|
||||||
- [ ] Huggingface / Colab Interface
|
- [ ] Huggingface Space
|
||||||
- [ ] Interface for making longer videos with many prompts
|
- [ ] More manipulations to the latent (translation, zoom, masking)
|
||||||
- [ ] Transitions with Depth model
|
- [ ] Transitions with Depth model
|
||||||
- [ ] Zooming
|
|
||||||
- [ ] Iso-perceptual spacing for branches (=better transitions)
|
|
||||||
|
|
||||||
Stay tuned on twitter: ```@j_stelzer```
|
Stay tuned on twitter: ```@j_stelzer```
|
||||||
|
|
||||||
|
|