23 Commits

| Author | SHA1 | Message | Date |
|---|---|---|---|
| DGX | 9f9512fa48 | movie import fix | 2024-02-21 12:43:17 +00:00 |
| DGX | 359ef99eaf | movie engine fix | 2024-02-21 12:42:17 +00:00 |
| DGX | 37fc1cf05f | removed ffmpeg | 2024-02-06 12:45:07 +00:00 |
| DGX | 0d44404903 | removed dependency | 2024-02-06 12:36:41 +00:00 |
| DGX | 5ea7981a9c | random seeds | 2024-02-06 12:01:42 +00:00 |
| Johannes Stelzer | 179a42b9bf | Update README.md | 2024-02-05 14:07:34 +00:00 |
| DGX | b9ed277055 | pretrained path | 2024-02-01 13:26:12 +00:00 |
| DGX | 896ba0c768 | missing numpy import | 2024-02-01 13:25:15 +00:00 |
| Johannes Stelzer | f72cc12fb3 | Update README.md | 2024-01-31 16:22:38 +00:00 |
| Johannes Stelzer | 01a960c48d | diffusersholder automatically spawned in blendingengine | 2024-01-31 11:12:47 +00:00 |
| Johannes Stelzer | 646a3c757e | Update README.md | 2024-01-26 11:52:04 +00:00 |
| Johannes Stelzer | 47e72ed76f | Update README.md | 2024-01-10 10:00:33 +01:00 |
| DGX | 4b235b874e | compile flag with sfast | 2024-01-10 08:58:30 +00:00 |
| DGX | 1775c9a90a | Merge branch 'main' of github.com:lunarring/latentblending | 2024-01-10 08:47:44 +00:00 |
| DGX | a0f35f2a41 | moved examples | 2024-01-10 08:47:35 +00:00 |
| Johannes Stelzer | f5965154ba | trailing comma | 2024-01-09 21:30:37 +01:00 |
| Johannes Stelzer | b83d3ee0a0 | lpips darwin | 2024-01-09 21:21:23 +01:00 |
| Johannes Stelzer | 4501d80044 | Update README.md | 2024-01-09 21:16:10 +01:00 |
| Johannes Stelzer | 6e138c54a2 | Update README.md | 2024-01-09 21:12:14 +01:00 |
| Johannes Stelzer | 1ba4b578a0 | Update README.md | 2024-01-09 21:11:40 +01:00 |
| DGX | 4042a098b0 | accelerate | 2024-01-09 20:10:12 +00:00 |
| DGX | 9d5b545c1a | accelerate | 2024-01-09 20:09:16 +00:00 |
| Johannes Stelzer | f1a1b47923 | Merge pull request #13 from lunarring/package (Package) | 2024-01-09 21:08:37 +01:00 |
7 changed files with 113 additions and 378 deletions

File: README.md

@@ -2,32 +2,48 @@
 Latent blending enables video transitions with incredible smoothness between prompts, computed within seconds. Powered by [stable diffusion XL](https://stability.ai/stable-diffusion), this method involves specific mixing of intermediate latent representations to create a seamless transition with users having the option to fully customize the transition directly in high-resolution. The new version also supports SDXL Turbo, allowing to generate transitions faster than they are typically played back!
 ```python
+import torch
+from diffusers import AutoPipelineForText2Image
+from latentblending.blending_engine import BlendingEngine
+from latentblending.diffusers_holder import DiffusersHolder
 pipe = AutoPipelineForText2Image.from_pretrained("stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16").to("cuda")
-dh = DiffusersHolder(pipe)
-lb = LatentBlending(dh)
-lb.set_prompt1("photo of underwater landscape, fish, und the sea, incredible detail, high resolution")
-lb.set_prompt2("rendering of an alien planet, strange plants, strange creatures, surreal")
-lb.set_negative_prompt("blurry, ugly, pale")
+be = BlendingEngine(pipe)
+be.set_prompt1("photo of underwater landscape, fish, und the sea, incredible detail, high resolution")
+be.set_prompt2("rendering of an alien planet, strange plants, strange creatures, surreal")
+be.set_negative_prompt("blurry, ugly, pale")
 # Run latent blending
-lb.run_transition()
+be.run_transition()
 # Save movie
-lb.write_movie_transition('movie_example1.mp4', duration_transition=12)
+be.write_movie_transition('movie_example1.mp4', duration_transition=12)
 ```
+# Installation
+```commandline
+pip install git+https://github.com/lunarring/latentblending
+```
+# Extra speedup with stable_fast compile
+Install https://github.com/chengzeyi/stable-fast
+Then enable pipe compilation by setting *do_compile=True*
+```python
+be = BlendingEngine(pipe, do_compile=True)
+```
 ## Gradio UI
 Coming soon again :)
 ## Example 1: Simple transition
 ![](example1.jpg)
-To run a simple transition between two prompts, run `example1_standard.py`
+To run a simple transition between two prompts, see `examples/single_trans.py`
 ## Example 2: Multi transition
-To run multiple transition between K prompts, resulting in a stitched video, run `example2_multitrans.py`.
-[View a longer example video here.](https://vimeo.com/789052336/80dcb545b2)
+To run multiple transition between K prompts, resulting in a stitched video, see `examples/multi_trans.py`.
+[View a longer example video here.](https://youtu.be/RLF-yW5dR_Q)
 # Customization
@@ -35,19 +51,19 @@ To run multiple transition between K prompts, resulting in a stitched video, run
 ### Change the height/width
 ```python
 size_output = (1024, 768)
-lb.set_dimensions(size_output)
+be.set_dimensions(size_output)
 ```
 ### Change the number of diffusion steps (set_num_inference_steps)
 ```python
-lb.set_num_inference_steps(50)
+be.set_num_inference_steps(50)
 ```
 For SDXL this is set as default=30, for SDXL Turbo a value of 4 is taken.
 ### Change the guidance scale
 ```python
-lb.set_guidance_scale(3.0)
+be.set_guidance_scale(3.0)
 ```
 For SDXL this is set as default=4.0, for SDXL Turbo a value of 0 is taken.
@@ -55,7 +71,7 @@ For SDXL this is set as default=4.0, for SDXL Turbo a value of 0 is taken.
 ```python
 depth_strength = 0.5
 nmb_max_branches = 15
-lb.set_branching(depth_strength=depth_strength, t_compute_max_allowed=None, nmb_max_branches=None)
+be.set_branching(depth_strength=depth_strength, t_compute_max_allowed=None, nmb_max_branches=None)
 ```
 * depth_strength: The strength of the diffusion iterations determines when the blending process will begin. A value close to zero results in more creative and intricate outcomes, while a value closer to one indicates a simpler alpha blending. However, low values may also bring about the introduction of additional objects and motion.
 * t_compute_max_allowed: maximum time allowed for computation. Higher values give better results but take longer. Either provide t_compute_max_allowed or nmb_max_branches. Does not work for SDXL Turbo.
@@ -66,7 +82,7 @@ You can find the [most relevant parameters here.](parameters.md)
 ### Change guidance scale
 ```python
-lb.set_guidance_scale(5.0)
+be.set_guidance_scale(5.0)
 ```
 ### Crossfeeding to the last image.
@@ -76,7 +92,7 @@ Cross-feeding latents is a key feature of latent blending. Here, you can set how
 crossfeed_power = 0.5 # 50% of the latents in the last branch are copied from branch1
 crossfeed_range = 0.7 # The crossfeed is active until 70% of num_iteration, then switched off
 crossfeed_decay = 0.2 # The power of the crossfeed decreases over diffusion iterations, here it would be 0.5*0.2=0.1 in the end of the range.
-lb.set_branch1_crossfeed(crossfeed_power, crossfeed_range, crossfeed_decay)
+be.set_branch1_crossfeed(crossfeed_power, crossfeed_range, crossfeed_decay)
 ```
 ### Crossfeeding to all transition images
@@ -86,16 +102,10 @@ Here, you can set how much the parent branches influence the mixed one. In the a
 crossfeed_power = 0.5 # 50% of the latents in the last branch are copied from the parents
 crossfeed_range = 0.7 # The crossfeed is active until 70% of num_iteration, then switched off
 crossfeed_decay = 0.2 # The power of the crossfeed decreases over diffusion iterations, here it would be 0.5*0.2=0.1 in the end of the range.
-lb.set_parental_crossfeed(crossfeed_power, crossfeed_range, crossfeed_decay)
+be.set_parental_crossfeed(crossfeed_power, crossfeed_range, crossfeed_decay)
 ```
-# Installation
-#### Packages
-```commandline
-pip install -r requirements.txt
-```
 # How does latent blending work?
 ## Method
 ![](animation.gif)
@@ -104,9 +114,9 @@ In the figure above, a diffusion tree is illustrated. The diffusion steps are re
 The concrete parameters for the transition above would be:
 ```
-lb.set_branch1_crossfeed(crossfeed_power=0.8, crossfeed_range=0.6, crossfeed_decay=0.4)
-lb.set_parental_crossfeed(crossfeed_power=0.8, crossfeed_range=0.8, crossfeed_decay=0.2)
-imgs_transition = lb.run_transition(num_inference_steps=10, depth_strength=0.2, nmb_max_branches=7)
+be.set_branch1_crossfeed(crossfeed_power=0.8, crossfeed_range=0.6, crossfeed_decay=0.4)
+be.set_parental_crossfeed(crossfeed_power=0.8, crossfeed_range=0.8, crossfeed_decay=0.2)
+imgs_transition = be.run_transition(num_inference_steps=10, depth_strength=0.2, nmb_max_branches=7)
 ```
 ## Perceptual aspects
@@ -124,6 +134,7 @@ With latent blending, we can create transitions that appear to defy the laws of
 * Inpaint support dropped (as it only makes sense for a single transition)
 # Coming soon...
+- [ ] MacOS support
 - [ ] Gradio interface
 - [ ] Huggingface Space
 - [ ] Controlnet

File: examples/multi_trans.py

@@ -1,33 +1,42 @@
 import torch
 import warnings
-from blending_engine import BlendingEngine
-from diffusers_holder import DiffusersHolder
 from diffusers import AutoPipelineForText2Image
-from movie_util import concatenate_movies
+from latentblending.movie_util import concatenate_movies
+from latentblending.blending_engine import BlendingEngine
+import numpy as np
 torch.set_grad_enabled(False)
 torch.backends.cudnn.benchmark = False
 warnings.filterwarnings('ignore')
 # %% First let us spawn a stable diffusion holder. Uncomment your version of choice.
-pipe = AutoPipelineForText2Image.from_pretrained("stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16")
-pipe.to('cuda')
-dh = DiffusersHolder(pipe)
+pretrained_model_name_or_path = "stabilityai/stable-diffusion-xl-base-1.0"
+# pretrained_model_name_or_path = "stabilityai/sdxl-turbo"
+pipe = AutoPipelineForText2Image.from_pretrained(pretrained_model_name_or_path, torch_dtype=torch.float16, variant="fp16")
+pipe.to('cuda')
+be = BlendingEngine(pipe, do_compile=True)
+be.set_negative_prompt("blurry, pale, low-res, lofi")
 # %% Let's setup the multi transition
 fps = 30
 duration_single_trans = 10
+be.set_dimensions((1024, 1024))
+nmb_prompts = 20
 # Specify a list of prompts below
+#%%
 list_prompts = []
-list_prompts.append("Photo of a house, high detail")
-list_prompts.append("Photo of an elephant in african savannah")
-list_prompts.append("photo of a house, high detail")
-# You can optionally specify the seeds
-list_seeds = [95437579, 33259350, 956051013]
-fp_movie = 'movie_example2.mp4'
-be = BlendingEngine(dh)
+list_prompts.append("high resolution ultra 8K image with lake and forest")
+list_prompts.append("strange and alien desolate lanscapes 8K")
+list_prompts.append("ultra high res psychedelic skyscraper city landscape 8K unreal engine")
+#%%
+fp_movie = f'surreal_nmb{len(list_prompts)}.mp4'
+# Specify the seeds
+list_seeds = np.random.randint(0, np.iinfo(np.int32).max, len(list_prompts))
 list_movie_parts = []
 for i in range(len(list_prompts) - 1):
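The hunk is truncated at the loop header, so the per-segment body is not shown here. As a rough, hypothetical sketch of how such a loop is typically completed with this API (set_prompt1/set_prompt2, run_transition with fixed_seeds as in the __main__ block of blending_engine.py, write_movie_transition, and the imported concatenate_movies; the part filenames and exact arguments are assumptions):

```python
list_movie_parts = []
for i in range(len(list_prompts) - 1):
    # Set the prompt pair for this transition segment
    be.set_prompt1(list_prompts[i])
    be.set_prompt2(list_prompts[i + 1])
    fp_part = f"part_{i}.mp4"  # hypothetical temporary filename
    # Seed both endpoints so neighbouring segments share their boundary image
    be.run_transition(fixed_seeds=[list_seeds[i], list_seeds[i + 1]])
    be.write_movie_transition(fp_part, duration_single_trans)
    list_movie_parts.append(fp_part)

# Stitch all segments into the final movie
concatenate_movies(fp_movie, list_movie_parts)
```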

File: examples/single_trans.py

@@ -1,8 +1,7 @@
 import torch
 import warnings
-from blending_engine import BlendingEngine
-from diffusers_holder import DiffusersHolder
 from diffusers import AutoPipelineForText2Image
+from latentblending.blending_engine import BlendingEngine
 warnings.filterwarnings('ignore')
 torch.set_grad_enabled(False)
@@ -12,9 +11,7 @@ torch.backends.cudnn.benchmark = False
 pipe = AutoPipelineForText2Image.from_pretrained("stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16")
 pipe.to("cuda")
-dh = DiffusersHolder(pipe)
-be = BlendingEngine(dh)
+be = BlendingEngine(pipe)
 be.set_prompt1("photo of underwater landscape, fish, und the sea, incredible detail, high resolution")
 be.set_prompt2("rendering of an alien planet, strange plants, strange creatures, surreal")
 be.set_negative_prompt("blurry, ugly, pale")

File: latentblending/blending_engine.py

@@ -5,10 +5,12 @@ import warnings
 import time
 from tqdm.auto import tqdm
 from PIL import Image
-from latentblending.movie_util import MovieSaver
 from typing import List, Optional
 import lpips
-from latentblending.utils import interpolate_spherical, interpolate_linear, add_frames_linear_interp, yml_load, yml_save
+import platform
+from latentblending.diffusers_holder import DiffusersHolder
+from latentblending.utils import interpolate_spherical, interpolate_linear, add_frames_linear_interp
+from lunar_tools import MovieSaver, fill_up_frames_linear_interpolation
 warnings.filterwarnings('ignore')
 torch.backends.cudnn.benchmark = False
 torch.set_grad_enabled(False)
@@ -17,12 +19,15 @@ torch.set_grad_enabled(False)
 class BlendingEngine():
     def __init__(
             self,
-            dh: None,
+            pipe: None,
+            do_compile: bool = False,
             guidance_scale_mid_damper: float = 0.5,
             mid_compression_scaler: float = 1.2):
         r"""
         Initializes the latent blending class.
         Args:
+            pipe: diffusers pipeline (SDXL)
+            do_compile: compile pipeline for faster inference using stable fast
             guidance_scale_mid_damper: float = 0.5
                 Reduces the guidance scale towards the middle of the transition.
                 A value of 0.5 would decrease the guidance_scale towards the middle linearly by 0.5.
@@ -35,7 +40,8 @@ class BlendingEngine():
and guidance_scale_mid_damper <= 1.0, \ and guidance_scale_mid_damper <= 1.0, \
f"guidance_scale_mid_damper neees to be in interval (0,1], you provided {guidance_scale_mid_damper}" f"guidance_scale_mid_damper neees to be in interval (0,1], you provided {guidance_scale_mid_damper}"
self.dh = dh
self.dh = DiffusersHolder(pipe)
self.device = self.dh.device self.device = self.dh.device
self.set_dimensions() self.set_dimensions()
@@ -64,7 +70,10 @@ class BlendingEngine():
         self.multi_transition_img_first = None
         self.multi_transition_img_last = None
         self.dt_unet_step = 0
-        self.lpips = lpips.LPIPS(net='alex').cuda(self.device)
+        if platform.system() == "Darwin":
+            self.lpips = lpips.LPIPS(net='alex')
+        else:
+            self.lpips = lpips.LPIPS(net='alex').cuda(self.device)
         self.set_prompt1("")
         self.set_prompt2("")
@@ -76,13 +85,23 @@ class BlendingEngine():
         self.benchmark_speed()
         self.set_branching()
+        if do_compile:
+            print("starting compilation")
+            from sfast.compilers.diffusion_pipeline_compiler import (compile, CompilationConfig)
+            self.dh.pipe.enable_xformers_memory_efficient_attention()
+            config = CompilationConfig.Default()
+            config.enable_xformers = True
+            config.enable_triton = True
+            config.enable_cuda_graph = True
+            self.dh.pipe = compile(self.dh.pipe, config)
     def benchmark_speed(self):
         """
         Measures the time per diffusion step and for the vae decoding
         """
+        print("starting speed benchmark...")
         text_embeddings = self.dh.get_text_embedding("test")
         latents_start = self.dh.get_noise(np.random.randint(111111))
         # warmup
@@ -96,6 +115,7 @@ class BlendingEngine():
         t0 = time.time()
         img = self.dh.latent2image(list_latents[-1])
         self.dt_vae = time.time() - t0
+        print(f"time per unet iteration: {self.dt_unet_step} time for vae: {self.dt_vae}")
     def set_dimensions(self, size_output=None):
         r"""
@@ -268,7 +288,7 @@ class BlendingEngine():
         if t_compute_max_allowed is None and nmb_max_branches is None:
             t_compute_max_allowed = 20
         elif t_compute_max_allowed is not None and nmb_max_branches is not None:
-            raise ValueErorr("Either specify t_compute_max_allowed or nmb_max_branches")
+            raise ValueError("Either specify t_compute_max_allowed or nmb_max_branches")
         self.list_idx_injection, self.list_nmb_stems = self.get_time_based_branching(depth_strength, t_compute_max_allowed, nmb_max_branches)
@@ -676,7 +696,7 @@ class BlendingEngine():
         """
         # Let's get more cheap frames via linear interpolation (duration_transition*fps frames)
-        imgs_transition_ext = add_frames_linear_interp(self.tree_final_imgs, duration_transition, fps)
+        imgs_transition_ext = fill_up_frames_linear_interpolation(self.tree_final_imgs, duration_transition, fps)
         # Save as MP4
         if os.path.isfile(fp_movie):
@@ -686,12 +706,6 @@ class BlendingEngine():
             ms.write_frame(img)
         ms.finalize()
-    def save_statedict(self, fp_yml):
-        # Dump everything relevant into yaml
-        imgs_transition = self.tree_final_imgs
-        state_dict = self.get_state_dict()
-        state_dict['nmb_images'] = len(imgs_transition)
-        yml_save(fp_yml, state_dict)
     def get_state_dict(self):
         state_dict = {}
@@ -813,14 +827,18 @@ if __name__ == "__main__":
     from diffusers import AutoencoderTiny
     # pretrained_model_name_or_path = "stabilityai/stable-diffusion-xl-base-1.0"
     pretrained_model_name_or_path = "stabilityai/sdxl-turbo"
-    pipe = DiffusionPipeline.from_pretrained(pretrained_model_name_or_path, torch_dtype=torch.float16, variant="fp16")
+    pipe = DiffusionPipeline.from_pretrained(pretrained_model_name_or_path)
+    # pipe.to("mps")
     pipe.to("cuda")
-    pipe.vae = AutoencoderTiny.from_pretrained('madebyollin/taesdxl', torch_device='cuda', torch_dtype=torch.float16)
-    pipe.vae = pipe.vae.cuda()
+    # pipe.vae = AutoencoderTiny.from_pretrained('madebyollin/taesdxl', torch_device='cuda', torch_dtype=torch.float16)
+    # pipe.vae = pipe.vae.cuda()
     dh = DiffusersHolder(pipe)
+    xxx
     # %% Next let's set up all parameters
     prompt1 = "photo of underwater landscape, fish, und the sea, incredible detail, high resolution"
     prompt2 = "rendering of an alien planet, strange plants, strange creatures, surreal"
@@ -829,19 +847,20 @@ if __name__ == "__main__":
     duration_transition = 12  # In seconds
     # Spawn latent blending
-    lb = LatentBlending(dh)
-    lb.set_prompt1(prompt1)
-    lb.set_prompt2(prompt2)
-    lb.set_negative_prompt(negative_prompt)
+    be = BlendingEngine(dh)
+    be.set_prompt1(prompt1)
+    be.set_prompt2(prompt2)
+    be.set_negative_prompt(negative_prompt)
     # Run latent blending
     t0 = time.time()
-    lb.run_transition(fixed_seeds=[420, 421])
+    be.run_transition(fixed_seeds=[420, 421])
     dt = time.time() - t0
+    print(f"dt = {dt}")
     # Save movie
     fp_movie = f'test.mp4'
-    lb.write_movie_transition(fp_movie, duration_transition)
+    be.write_movie_transition(fp_movie, duration_transition)
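This file also corrects the misspelled exception name, so passing both limits to set_branching() now raises a proper ValueError instead of a NameError. A minimal usage sketch, assuming the signature shown in the README diff above (depth_strength, t_compute_max_allowed, nmb_max_branches); the concrete values are illustrative:

```python
# Sketch: pass exactly one of t_compute_max_allowed / nmb_max_branches,
# otherwise set_branching() raises the ValueError fixed above.
be.set_branching(depth_strength=0.5, t_compute_max_allowed=20, nmb_max_branches=None)  # budget ~20 s of compute
# or, alternatively, cap the number of branches instead of the compute time:
be.set_branching(depth_strength=0.5, t_compute_max_allowed=None, nmb_max_branches=15)
```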

File: movie_util.py (deleted)

@@ -1,301 +0,0 @@
# Copyright 2022 Lunar Ring. All rights reserved.
# Written by Johannes Stelzer, email stelzer@lunar-ring.ai twitter @j_stelzer
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import subprocess
import os
import numpy as np
from tqdm import tqdm
import cv2
from typing import List
import ffmpeg # pip install ffmpeg-python. if error with broken pipe: conda update ffmpeg
class MovieSaver():
def __init__(
self,
fp_out: str,
fps: int = 24,
shape_hw: List[int] = None,
crf: int = 21,
codec: str = 'libx264',
preset: str = 'fast',
pix_fmt: str = 'yuv420p',
silent_ffmpeg: bool = True):
r"""
Initializes movie saver class - a human friendly ffmpeg wrapper.
After you init the class, you can dump numpy arrays x into moviesaver.write_frame(x).
Don't forget toi finalize movie file with moviesaver.finalize().
Args:
fp_out: str
Output file name. If it already exists, it will be deleted.
fps: int
Frames per second.
shape_hw: List[int, int]
Output shape, optional argument. Can be initialized automatically when first frame is written.
crf: int
ffmpeg doc: the range of the CRF scale is 0–51, where 0 is lossless
(for 8 bit only, for 10 bit use -qp 0), 23 is the default, and 51 is worst quality possible.
A lower value generally leads to higher quality, and a subjectively sane range is 17–28.
Consider 17 or 18 to be visually lossless or nearly so;
it should look the same or nearly the same as the input but it isn't technically lossless.
The range is exponential, so increasing the CRF value +6 results in
roughly half the bitrate / file size, while -6 leads to roughly twice the bitrate.
codec: int
Number of diffusion steps. Larger values will take more compute time.
preset: str
Choose between ultrafast, superfast, veryfast, faster, fast, medium, slow, slower, veryslow.
ffmpeg doc: A preset is a collection of options that will provide a certain encoding speed
to compression ratio. A slower preset will provide better compression
(compression is quality per filesize).
This means that, for example, if you target a certain file size or constant bit rate,
you will achieve better quality with a slower preset. Similarly, for constant quality encoding,
you will simply save bitrate by choosing a slower preset.
pix_fmt: str
Pixel format. Run 'ffmpeg -pix_fmts' in your shell to see all options.
silent_ffmpeg: bool
Surpress the output from ffmpeg.
"""
if len(os.path.split(fp_out)[0]) > 0:
assert os.path.isdir(os.path.split(fp_out)[0]), "Directory does not exist!"
self.fp_out = fp_out
self.fps = fps
self.crf = crf
self.pix_fmt = pix_fmt
self.codec = codec
self.preset = preset
self.silent_ffmpeg = silent_ffmpeg
if os.path.isfile(fp_out):
os.remove(fp_out)
self.init_done = False
self.nmb_frames = 0
if shape_hw is None:
self.shape_hw = [-1, 1]
else:
if len(shape_hw) == 2:
shape_hw.append(3)
self.shape_hw = shape_hw
self.initialize()
print(f"MovieSaver initialized. fps={fps} crf={crf} pix_fmt={pix_fmt} codec={codec} preset={preset}")
def initialize(self):
args = (
ffmpeg
.input('pipe:', format='rawvideo', pix_fmt='rgb24', s='{}x{}'.format(self.shape_hw[1], self.shape_hw[0]), framerate=self.fps)
.output(self.fp_out, crf=self.crf, pix_fmt=self.pix_fmt, c=self.codec, preset=self.preset)
.overwrite_output()
.compile()
)
if self.silent_ffmpeg:
self.ffmpg_process = subprocess.Popen(args, stdin=subprocess.PIPE, stderr=subprocess.DEVNULL, stdout=subprocess.DEVNULL)
else:
self.ffmpg_process = subprocess.Popen(args, stdin=subprocess.PIPE)
self.init_done = True
self.shape_hw = tuple(self.shape_hw)
print(f"Initialization done. Movie shape: {self.shape_hw}")
def write_frame(self, out_frame: np.ndarray):
r"""
Function to dump a numpy array as frame of a movie.
Args:
out_frame: np.ndarray
Numpy array, in np.uint8 format. Convert with np.astype(x, np.uint8).
Dim 0: y
Dim 1: x
Dim 2: RGB
"""
assert out_frame.dtype == np.uint8, "Convert to np.uint8 before"
assert len(out_frame.shape) == 3, "out_frame needs to be three dimensional, Y X C"
assert out_frame.shape[2] == 3, f"need three color channels, but you provided {out_frame.shape[2]}."
if not self.init_done:
self.shape_hw = out_frame.shape
self.initialize()
assert self.shape_hw == out_frame.shape, f"You cannot change the image size after init. Initialized with {self.shape_hw}, out_frame {out_frame.shape}"
# write frame
self.ffmpg_process.stdin.write(
out_frame
.astype(np.uint8)
.tobytes()
)
self.nmb_frames += 1
def finalize(self):
r"""
Call this function to finalize the movie. If you forget to call it your movie will be garbage.
"""
if self.nmb_frames == 0:
print("You did not write any frames yet! nmb_frames = 0. Cannot save.")
return
self.ffmpg_process.stdin.close()
self.ffmpg_process.wait()
duration = int(self.nmb_frames / self.fps)
print(f"Movie saved, {duration}s playtime, watch here: \n{self.fp_out}")
def concatenate_movies(fp_final: str, list_fp_movies: List[str]):
r"""
Concatenate multiple movie segments into one long movie, using ffmpeg.
Parameters
----------
fp_final : str
Full path of the final movie file. Should end with .mp4
list_fp_movies : list[str]
List of full paths of movie segments.
"""
assert fp_final[-4] == ".", "fp_final seems to miss file extension: {fp_final}"
for fp in list_fp_movies:
assert os.path.isfile(fp), f"Input movie does not exist: {fp}"
assert os.path.getsize(fp) > 100, f"Input movie seems empty: {fp}"
if os.path.isfile(fp_final):
os.remove(fp_final)
# make a list for ffmpeg
list_concat = []
for fp_part in list_fp_movies:
list_concat.append(f"""file '{fp_part}'""")
# save this list
fp_list = "tmp_move.txt"
with open(fp_list, "w") as fa:
for item in list_concat:
fa.write("%s\n" % item)
cmd = f'ffmpeg -f concat -safe 0 -i {fp_list} -c copy {fp_final}'
subprocess.call(cmd, shell=True)
os.remove(fp_list)
if os.path.isfile(fp_final):
print(f"concatenate_movies: success! Watch here: {fp_final}")
def add_sound(fp_final, fp_silentmovie, fp_sound):
cmd = f'ffmpeg -i {fp_silentmovie} -i {fp_sound} -c copy -map 0:v:0 -map 1:a:0 {fp_final}'
subprocess.call(cmd, shell=True)
if os.path.isfile(fp_final):
print(f"add_sound: success! Watch here: {fp_final}")
def add_subtitles_to_video(
fp_input: str,
fp_output: str,
subtitles: list,
fontsize: int = 50,
font_name: str = "Arial",
color: str = 'yellow'
):
from moviepy.editor import VideoFileClip, TextClip, CompositeVideoClip
r"""
Function to add subtitles to a video.
Args:
fp_input (str): File path of the input video.
fp_output (str): File path of the output video with subtitles.
subtitles (list): List of dictionaries containing subtitle information
(start, duration, text). Example:
subtitles = [
{"start": 1, "duration": 3, "text": "hello test"},
{"start": 4, "duration": 2, "text": "this works"},
]
fontsize (int): Font size of the subtitles.
font_name (str): Font name of the subtitles.
color (str): Color of the subtitles.
"""
# Check if the input file exists
if not os.path.isfile(fp_input):
raise FileNotFoundError(f"Input file not found: {fp_input}")
# Check the subtitles format and sort them by the start time
time_points = []
for subtitle in subtitles:
if not isinstance(subtitle, dict):
raise ValueError("Each subtitle must be a dictionary containing 'start', 'duration' and 'text'.")
if not all(key in subtitle for key in ["start", "duration", "text"]):
raise ValueError("Each subtitle dictionary must contain 'start', 'duration' and 'text'.")
if subtitle['start'] < 0 or subtitle['duration'] <= 0:
raise ValueError("'start' should be non-negative and 'duration' should be positive.")
time_points.append((subtitle['start'], subtitle['start'] + subtitle['duration']))
# Check for overlaps
time_points.sort()
for i in range(1, len(time_points)):
if time_points[i][0] < time_points[i - 1][1]:
raise ValueError("Subtitle time intervals should not overlap.")
# Load the video clip
video = VideoFileClip(fp_input)
# Create a list to store subtitle clips
subtitle_clips = []
# Loop through the subtitle information and create TextClip for each
for subtitle in subtitles:
text_clip = TextClip(subtitle["text"], fontsize=fontsize, color=color, font=font_name)
text_clip = text_clip.set_position(('center', 'bottom')).set_start(subtitle["start"]).set_duration(subtitle["duration"])
subtitle_clips.append(text_clip)
# Overlay the subtitles on the video
video = CompositeVideoClip([video] + subtitle_clips)
# Write the final clip to a new file
video.write_videofile(fp_output)
class MovieReader():
r"""
Class to read in a movie.
"""
def __init__(self, fp_movie):
self.video_player_object = cv2.VideoCapture(fp_movie)
self.nmb_frames = int(self.video_player_object.get(cv2.CAP_PROP_FRAME_COUNT))
self.fps_movie = int(self.video_player_object.get(cv2.CAP_PROP_FPS))
self.shape = [100, 100, 3]
self.shape_is_set = False
def get_next_frame(self):
success, image = self.video_player_object.read()
if success:
if not self.shape_is_set:
self.shape_is_set = True
self.shape = image.shape
return image
else:
return np.zeros(self.shape)
if __name__ == "__main__":
fps = 2
list_fp_movies = []
for k in range(4):
fp_movie = f"/tmp/my_random_movie_{k}.mp4"
list_fp_movies.append(fp_movie)
ms = MovieSaver(fp_movie, fps=fps)
for fn in tqdm(range(30)):
img = (np.random.rand(512, 1024, 3) * 255).astype(np.uint8)
ms.write_frame(img)
ms.finalize()
fp_final = "/tmp/my_concatenated_movie.mp4"
concatenate_movies(fp_final, list_fp_movies)
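With this local helper module removed (see the "removed ffmpeg" commit and the dropped ffmpeg-python requirement below), movie writing now goes through lunar_tools, as the new import in blending_engine.py shows. A rough sketch of the equivalent usage, assuming lunar_tools' MovieSaver keeps the write_frame()/finalize() interface of the deleted class:

```python
import numpy as np
from lunar_tools import MovieSaver  # replaces the deleted local MovieSaver

ms = MovieSaver("test_movie.mp4", fps=24)
for _ in range(48):
    # MovieSaver expects uint8 RGB frames of shape (H, W, 3)
    frame = (np.random.rand(512, 1024, 3) * 255).astype(np.uint8)
    ms.write_frame(frame)
ms.finalize()
```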

File: requirements.txt

@@ -1,6 +1,6 @@
 lpips==0.1.4
 opencv-python
-ffmpeg-python
 diffusers==0.25.0
 transformers
 pytest
+accelerate

File: setup.py

@@ -6,14 +6,14 @@ with open('requirements.txt') as f:
 setup(
     name='latentblending',
-    version='0.2',
+    version='0.3',
     url='https://github.com/lunarring/latentblending',
     description='Butter-smooth video transitions',
     long_description=open('README.md').read(),
-    install_requires=required,
-    dependency_links=[
-        'git+https://github.com/lunarring/lunar_tools#egg=lunar_tools'
-    ],
+    install_requires=[
+        'lunar_tools @ git+https://github.com/lunarring/lunar_tools.git#egg=lunar_tools'
+    ] + required,
     include_package_data=False,
 )
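Since install_requires now carries the lunar_tools git reference directly (dependency_links is ignored by current pip), a plain `pip install git+https://github.com/lunarring/latentblending`, as given in the updated README above, should pull in lunar_tools and the remaining requirements in one step.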