Key features
- Synchronized audio-video generation: Generates motion, dialogue, SFX, and music together in one pass
- Multiple generation modes: Text-to-video, image-to-video, and video-to-video
- Control options: Canny, Depth, and Pose video-to-video control via IC-LoRAs
- Keyframe-driven generation: Interpolate between keyframe images
- Native upscaling: Spatial (2x) and temporal (2x) upscalers for higher resolution and FPS
- Prompt enhancement: Automatic prompt enhancement support
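As a rough illustration of what the 2x upscalers mean for output size and frame rate, the arithmetic can be sketched as below. The `VideoSpec` class, the function names, and the 768x512 at 25 FPS base values are hypothetical placeholders chosen for this example. They are not LTX-2 defaults or part of any LTX-2 or ComfyUI API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoSpec:
    """Hypothetical container for the properties the upscalers change."""
    width: int
    height: int
    fps: float


def spatial_upscale_x2(spec: VideoSpec) -> VideoSpec:
    # A 2x spatial upscaler doubles both width and height; FPS is unchanged.
    return VideoSpec(spec.width * 2, spec.height * 2, spec.fps)


def temporal_upscale_x2(spec: VideoSpec) -> VideoSpec:
    # A 2x temporal upscaler doubles the frame rate; resolution is unchanged.
    return VideoSpec(spec.width, spec.height, spec.fps * 2)


# Illustrative base values only (an assumption, not an LTX-2 default).
base = VideoSpec(width=768, height=512, fps=25.0)
final = temporal_upscale_x2(spatial_upscale_x2(base))
print(final)
```

Applying both stages multiplies the pixel count per frame by four and the frames per second by two, which is worth keeping in mind when estimating output file size.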
Model checkpoints
| Name | Description |
|---|---|
| ltx-2-19b-dev | Full model in bf16, flexible and trainable |
| ltx-2-19b-dev-fp8 | Full model in fp8 quantization |
| ltx-2-19b-distilled | Distilled version, 8 steps, CFG=1 |
| ltx-2-spatial-upscaler-x2-1.0 | 2x spatial upscaler for higher resolution |
| ltx-2-temporal-upscaler-x2-1.0 | 2x temporal upscaler for higher FPS |
Getting started
LTX-2 is natively supported in ComfyUI. To get started:
- Update ComfyUI to the latest version
- Go to Template Library > Video > choose any LTX-2 workflow
- Follow the pop-up to download models and run the workflow
Workflows
Text-to-video
Generate videos from text prompts. Distilled version (faster, 8 steps):
Text-to-Video Distilled
Image-to-video
Generate videos from an input image. Distilled version (faster, 8 steps):
Image-to-Video Distilled
Control-to-video
Generate videos with structural control using IC-LoRAs. Supported control types:
- Depth control
- Canny control
- Pose control
Prompting tips
When writing prompts for LTX-2, focus on detailed, chronological descriptions of actions and scenes. Include specific movements, appearances, camera angles, and environmental details in a single flowing paragraph. Start directly with the action and keep descriptions literal and precise. Structure your prompts with:
- Main action in a single sentence
- Specific details about movements and gestures
- Character/object appearances
- Background and environment details
- Camera angles and movements
- Lighting and colors
- Any changes or sudden events
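The structure above can be sketched as a small helper that joins the components, in the listed order, into one flowing paragraph. This is only an illustration of the prompt layout: `PromptParts` and `build_prompt` are hypothetical names invented for this example and are not part of LTX-2 or ComfyUI.

```python
from dataclasses import dataclass


@dataclass
class PromptParts:
    """Hypothetical container mirroring the prompt structure listed above."""
    main_action: str       # main action in a single sentence
    movements: str = ""    # specific movements and gestures
    appearances: str = ""  # character/object appearances
    environment: str = ""  # background and environment details
    camera: str = ""       # camera angles and movements
    lighting: str = ""     # lighting and colors
    changes: str = ""      # any changes or sudden events


def build_prompt(parts: PromptParts) -> str:
    """Join the non-empty parts, in order, into a single paragraph."""
    ordered = [
        parts.main_action, parts.movements, parts.appearances,
        parts.environment, parts.camera, parts.lighting, parts.changes,
    ]
    sentences = []
    for text in ordered:
        text = text.strip()
        if not text:
            continue  # skip components the author left blank
        if text[-1] not in ".!?":
            text += "."  # make sure each component ends as a sentence
        sentences.append(text)
    return " ".join(sentences)


prompt = build_prompt(PromptParts(
    main_action="A woman in a red coat walks across a rain-soaked street",
    movements="She raises one hand to shield her eyes and quickens her pace",
    environment="Neon signs reflect in puddles along a narrow alley",
    camera="The camera tracks her from the side at waist height",
    lighting="Cool blue tones with warm orange highlights from the signs",
))
print(prompt)
```

Keeping the main action first and the optional details after it matches the advice to start directly with the action, and empty components are simply skipped.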
Resources
Limitations
- Not intended to provide factual information
- May amplify existing societal biases
- May fail to generate videos that match prompts perfectly
- Prompt following is heavily influenced by prompting style
- Audio without speech may be of lower quality