# Quickstart

Latent blending enables video transitions with incredible smoothness between prompts, computed within seconds. Powered by [stable diffusion XL](https://stability.ai/stable-diffusion), this method mixes intermediate latent representations in a specific way to create a seamless transition, and users can fully customize the transition directly at high resolution.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1I77--5PS6C-sAskl9OggS1zR0HLKdq1M?usp=sharing)

```python
import torch
from diffusers import DiffusionPipeline
# The exact import path for DiffusersHolder / LatentBlending may differ depending on your install.
from latentblending import DiffusersHolder, LatentBlending

pretrained_model_name_or_path = "stabilityai/stable-diffusion-xl-base-1.0"
pipe = DiffusionPipeline.from_pretrained(pretrained_model_name_or_path, torch_dtype=torch.float16).to('cuda')
dh = DiffusersHolder(pipe)
lb = LatentBlending(dh)

lb.set_prompt1('photo of my first prompt')
lb.set_prompt2('photo of my second prompt')
depth_strength = 0.6  # How deep the first branching happens
t_compute_max_allowed = 10  # How much compute time we give to the transition
num_inference_steps = 30  # Number of diffusion steps per image (example value)
imgs_transition = lb.run_transition(
    depth_strength=depth_strength,
    num_inference_steps=num_inference_steps,
    t_compute_max_allowed=t_compute_max_allowed)
```
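
To quickly inspect the result, here is a minimal sketch for dumping the transition frames to disk; it assumes `run_transition` returns a list of PIL images, and the folder name is just an example:

```python
import os

os.makedirs('transition_frames', exist_ok=True)  # example output folder
for i, img in enumerate(imgs_transition):
    img.save(f'transition_frames/frame_{i:04d}.jpg')  # assumes PIL images
```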

## Gradio UI
Coming soon again :)

## Example 1: Simple transition
![](example1.jpg)

To run a simple transition between two prompts, run `example1_standard.py`

To run multiple transitions between K prompts, resulting in a stitched video, run `example2_multitrans.py` (a minimal sketch of the stitching loop follows below).

[View a longer example video here.](https://vimeo.com/789052336/80dcb545b2)
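
For illustration, here is a minimal sketch of such a stitching loop, using only the calls from the quickstart; the prompt list is an example, and `example2_multitrans.py` may reuse intermediate results more efficiently:

```python
prompts = ['photo of a forest', 'photo of a desert', 'photo of an ocean']  # example prompts

all_frames = []
for prompt1, prompt2 in zip(prompts[:-1], prompts[1:]):
    lb.set_prompt1(prompt1)
    lb.set_prompt2(prompt2)
    segment = lb.run_transition(
        depth_strength=depth_strength,
        num_inference_steps=num_inference_steps,
        t_compute_max_allowed=t_compute_max_allowed)
    all_frames.extend(segment)  # concatenate the segments into one long frame list
```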
# Customization

You can find the [most relevant parameters here.](parameters.md)

### Change the height/width
```python
size_output = (1024, 768)
lb.set_dimensions(size_output)
```

### Change guidance scale
```python
lb.set_guidance_scale(5.0)  # example value; setter name assumed by analogy with the other lb.set_* calls
```

### Change the parental crossfeed
```python
# Example values; see parameters.md for what each parameter does.
crossfeed_power = 0.5
crossfeed_range = 0.7
crossfeed_decay = 0.2
lb.set_parental_crossfeed(crossfeed_power, crossfeed_range, crossfeed_decay)
```

# Installation
```commandline
pip install -r requirements.txt
```
# How does latent blending work?
## Method
![](animation.gif)

With latent blending, we can create transitions that appear to defy the laws of nature, yet look completely natural and believable. The key is to suppress processing in our [dorsal visual stream](https://en.wikipedia.org/wiki/Two-streams_hypothesis#Dorsal_stream), which is achieved by avoiding motion in the transition. Without motion, our visual system has difficulty detecting the transition, leaving viewers with the illusion of a single, continuous image; see [change blindness](https://en.wikipedia.org/wiki/Change_blindness). However, when motion is introduced, the visual system detects the transition and the viewer becomes aware of it, leading to a jarring effect. The best results are therefore achieved by optimizing the transition parameters, particularly the crossfeed parameters and the depth of the first injection.
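
As a concrete starting point for that optimization, here is a minimal sketch of a small parameter sweep; the candidate values are arbitrary examples, and it assumes the `lb` object and the crossfeed setter shown above:

```python
# Compare a few candidate settings by saving each transition for visual inspection.
for depth_strength in (0.3, 0.5, 0.65):
    for crossfeed_power in (0.3, 0.6):
        lb.set_parental_crossfeed(crossfeed_power, 0.7, 0.2)  # range/decay fixed (example values)
        imgs = lb.run_transition(depth_strength=depth_strength, num_inference_steps=30)
        for i, img in enumerate(imgs):
            img.save(f'sweep_d{depth_strength}_p{crossfeed_power}_{i:04d}.jpg')
```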

# Changelog
* SD XL support
* Diffusers backend, greatly simplifying installation and use (bring your own pipe)
* New blending engine with cross-feeding capabilities, enabling structure-preserving transitions
* LPIPS image similarity for finding the next best injection branch, resulting in smoother transitions (see the sketch after this list)
* Time-based computation: instead of specifying how many frames your transition has, you can specify a compute budget and get a transition within that budget
* Inpaint support dropped (as it only makes sense for a single transition)
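
For illustration, here is a minimal sketch of the branch-selection idea behind the LPIPS item above, independent of this repo's internals: measure the perceptual distance between neighboring frames and insert the next frame where the jump is largest. It uses the `lpips` package; the frame preprocessing is an assumption for the example:

```python
import lpips
import numpy as np
import torch

loss_fn = lpips.LPIPS(net='alex')  # perceptual similarity metric

def to_tensor(img):
    # PIL image -> float tensor in [-1, 1], shape (1, 3, H, W), as lpips expects
    t = torch.from_numpy(np.asarray(img)).permute(2, 0, 1).float() / 127.5 - 1.0
    return t.unsqueeze(0)

def largest_gap_index(frames):
    # Index i such that the perceptual jump between frames[i] and frames[i+1] is largest.
    with torch.no_grad():
        dists = [loss_fn(to_tensor(a), to_tensor(b)).item()
                 for a, b in zip(frames[:-1], frames[1:])]
    return int(np.argmax(dists))
```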

# Coming soon...
- [ ] Gradio interface
- [ ] Huggingface Space
- [ ] Controlnet
- [ ] IP-Adapter
- [ ] Latent Consistency

Stay tuned on twitter: `@j_stelzer`