
Use the LTX 2.3 All-in-One Workflow in ComfyUI (04/27/26)

This guide turns FutuTek’s LTX 2.3 walkthrough into a practical setup-and-run reference. You will load the required models, choose one of six workflow modes, provide the right input, and run cinematic AI video generations from one ComfyUI workflow.

Difficulty: Intermediate · Hardware: 12GB+ VRAM recommended · System RAM: 16GB+ · Status: Published

Outcome

By the end, you should have a working LTX 2.3 all-in-one ComfyUI workflow that can produce:

1. Text-to-video

Generate a cinematic clip from a written scene prompt.

2. Audio-to-video

Use an audio clip plus a visual prompt to generate a video around the sound.

3. Image-to-video

Animate a source image using a prompt and optional dialogue.

4. Lipsync

Combine a starting frame, an audio track, and a performance prompt.

5. First/last-frame animation

Create a transition between a beginning image and ending image.

6. Video-to-video motion transfer

Transfer motion from a source video onto a new character or subject.

Recommended path: start with text-to-video at a modest resolution and short duration. Once the workflow and model paths are proven, move to image, audio, lipsync, and motion-transfer modes.

Prerequisites

  • ComfyUI installed and updated. The video does not cover ComfyUI installation, so have a working, up-to-date install before you begin.
  • GPU with 12GB of VRAM or more. Use the lower-VRAM GGUF option if you are near the 12GB floor.
  • 16GB system RAM or more. More RAM gives ComfyUI more room when offloading or swapping.
  • Enough disk space for model files. LTX 2.3 components are large; keep models on a fast SSD if possible.
  • ComfyUI Manager is strongly recommended. It makes missing-node fixes less miserable, which is the entire point of computers: slightly different misery.
Licensing and safety: check the license terms for each model/workflow source before commercial use. Keep generated likenesses and voices consent-based, especially for lipsync and motion-transfer projects.

Required files and destinations

The video description lists the models and resources below. Put each file in the matching ComfyUI folder, then restart ComfyUI or refresh the model lists.

| What | Destination | Notes |
| --- | --- | --- |
| All-in-one workflow/resources | Import into ComfyUI | Download from the creator’s Google Drive folder, then drag the workflow JSON into the ComfyUI canvas. |
| Low-VRAM LTX 2.3 GGUF model | ComfyUI/models/diffusion_models/ | Use for 12GB-class GPUs. |
| FP8 scaled LTX 2.3 model | ComfyUI/models/diffusion_models/ | Use when you have more VRAM, typically 16GB+. |
| Gemma text encoder | ComfyUI/models/text_encoders/ | Needed for LTX 2.3 prompt understanding. |
| LTX 2.3 text projection | ComfyUI/models/text_encoders/ | Keep the filename unchanged. |
| Audio VAE and video VAE | ComfyUI/models/vae/ | Required for audio/video decoding in the workflow. |
| Tiny VAE preview file | Workflow-dependent preview VAE location | Optional/recommended for faster previews; follow the file notes from the source page. |
| Spatial upscaler | ComfyUI/models/latent_upscale_models/ | Use the ltx-2.3-spatial-upscaler-x2-1.0.safetensors file. |
| Distilled LoRA and camera movement LoRAs | ComfyUI/models/loras/ | Used for speed and camera moves such as dolly-in effects. |
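
Before restarting ComfyUI, it can help to confirm that each destination folder exists and actually contains model files. The following is a minimal sketch, not part of the creator's workflow: the folder names come from the destinations listed above, while the root path you pass in, the `audit_model_folders` helper, and the set of file extensions are assumptions for illustration.

```python
from pathlib import Path

# Folder names are taken from the destinations in this guide.
EXPECTED_FOLDERS = {
    "diffusion_models": "GGUF or FP8 LTX 2.3 model",
    "text_encoders": "Gemma text encoder and LTX 2.3 text projection",
    "vae": "audio VAE and video VAE",
    "latent_upscale_models": "spatial upscaler",
    "loras": "distilled LoRA and camera movement LoRAs",
}

# Assumption: model files use one of these extensions.
MODEL_SUFFIXES = {".safetensors", ".gguf"}


def audit_model_folders(comfy_root):
    """Map each expected folder to a sorted list of model filenames.

    A folder that does not exist maps to None, so it can be told
    apart from a folder that exists but is empty.
    """
    models_dir = Path(comfy_root) / "models"
    report = {}
    for folder in EXPECTED_FOLDERS:
        path = models_dir / folder
        if not path.is_dir():
            report[folder] = None
            continue
        report[folder] = sorted(
            p.name
            for p in path.iterdir()
            if p.is_file() and p.suffix.lower() in MODEL_SUFFIXES
        )
    return report
```

Calling `audit_model_folders("ComfyUI")` from the directory that contains your install returns a dictionary you can print or inspect. A `None` value or an empty list points at the folder to fix first.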

Setup path

  1. Update ComfyUI. If a workflow opens with missing LTX nodes, update ComfyUI and any custom nodes first.
  2. Download the workflow and model files. Keep the filenames unchanged so existing loader nodes can find them more easily.
  3. Place models in the exact folders listed above. Do not put diffusion models in checkpoints unless your specific workflow loader asks for that.
  4. Restart ComfyUI. A browser refresh is not always enough after adding new model files or nodes.
  5. Import the workflow JSON. Drag the workflow onto the ComfyUI canvas or use ComfyUI’s workflow import option.
  6. Select the mode number. The all-in-one workflow uses a settings/control area where you choose option 1 through 6.
  7. Adjust duration and resolution conservatively. Start short and moderate. Increase length, resolution, or upscale only after the first run succeeds.
  8. Add the required input. Depending on the mode, provide a prompt, image, audio file, start/end frames, source video, or a combination.
  9. Click Run. The workflow automatically routes to the correct group/sampler for the chosen mode.
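
Steps 6 and 8 pair a mode number with the inputs that mode needs. As a rough pre-flight sketch, the mapping can be written down and checked before you queue a run. The mode numbers and names follow this guide; the input labels (`prompt`, `image`, `audio`, and so on) and the `missing_inputs` helper are hypothetical and do not correspond to node names in the workflow. Prompts are treated as optional for modes 5 and 6, which is an assumption.

```python
# Mode numbers and names follow this guide; input labels are hypothetical.
MODE_INPUTS = {
    1: ("text-to-video", {"prompt"}),
    2: ("audio-to-video", {"prompt", "audio"}),
    3: ("image-to-video", {"prompt", "image"}),
    4: ("lipsync", {"prompt", "image", "audio"}),
    5: ("first/last-frame animation", {"first_frame", "last_frame"}),
    6: ("video-to-video motion transfer", {"source_video", "image"}),
}


def missing_inputs(mode, provided):
    """Return the sorted list of required inputs not yet provided."""
    if mode not in MODE_INPUTS:
        raise ValueError(f"mode must be 1 through 6, got {mode!r}")
    _name, required = MODE_INPUTS[mode]
    return sorted(required - set(provided))
```

For example, `missing_inputs(4, ["prompt", "image"])` reports that lipsync still needs an audio track.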

Six workflow modes

Option 1: Text-to-video

Use this for a full scene generated from text. In the video, the creator sets the length and resolution, then prompts for a blonde woman and a little boy on a train traveling through old countryside. The workflow selects the text-to-video group automatically.

A blonde woman and a little boy riding a vintage train through an old countryside village, cinematic lighting, realistic faces, gentle camera movement, warm nostalgic mood.

Option 2: Audio-to-video

Load an audio clip into the Load Audio node, then describe the visual subject and setting. The video example uses a YouTuber standing in a park introducing LTX.

A tech YouTuber standing in a green park, speaking to camera, natural daylight, realistic facial expression, subtle handheld camera movement.

Option 3: Image-to-video

Upload a source image and write a prompt that tells the model what movement and performance to add. Keep the prompt aligned with the source image so the identity, clothing, and setting do not drift too far.

A YouTuber standing on a rainy street, speaking to camera, raindrops visible on jacket, neon reflections, subtle head movement and natural gestures.

Option 4: Lipsync

Set a short duration first, load the starting frame, choose an audio track, and optionally select a camera-movement LoRA such as Dolly In. The video uses a 12-second length and lower resolution for faster processing.

A singer performing emotionally on stage, close-up portrait, dolly-in camera movement, expressive eyes, realistic mouth movement synchronized to the song.

Option 5: First and last frame animation

Load a beginning image and an ending image. Use this mode when you need a controlled transition; you can then stitch multiple segments together into a longer scene.

A cinematic walk through a flower field, camera moving forward, soft wind through flowers, dreamy golden-hour lighting, smooth transition from the first frame to the final frame.

Option 6: Video-to-video motion transfer

Load a source video for motion, then define the new starting frame/character and optional prompt. The video example transfers a Latin dance motion to Wonder Woman and Iron Man.

Wonder Woman and Iron Man dancing together, dynamic Latin dance movement, cinematic lighting, playful energy, realistic body motion.

Prompt and configuration patterns

  • Describe subject + setting + motion + camera + mood. LTX 2.3 responds better when movement is explicit.
  • Keep dialogue/audio modes visually simple. For lipsync and audio-to-video, avoid huge action scenes until sync quality is proven.
  • Use shorter tests. A 6–12 second test is easier to troubleshoot than a giant render that fails at the finish line like a dramatic little goblin.
  • Use first/last frames for story control. This is the cleanest path when you want multiple consistent segments for a longer video.
  • Lower resolution before lowering expectations. If you hit VRAM issues, reduce resolution, duration, or upscale settings before changing the whole workflow.
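
The subject + setting + motion + camera + mood pattern can be kept consistent with a small helper. This is an illustrative sketch only: `build_prompt` is a hypothetical function, and requiring `subject` and `motion` is a design choice that reflects the advice to make movement explicit, not a rule enforced by LTX 2.3.

```python
def build_prompt(subject, setting="", motion="", camera="", mood=""):
    """Join the five prompt parts into one comma-separated sentence."""
    if not subject.strip() or not motion.strip():
        raise ValueError("subject and motion are required")
    parts = [subject, setting, motion, camera, mood]
    # Drop empty parts and trailing punctuation so the joins stay clean.
    cleaned = [p.strip().rstrip(".,") for p in parts if p.strip()]
    return ", ".join(cleaned) + "."
```

Filling the fields from the lipsync example in this guide gives a prompt in the same shape as the ones shown for each mode, and leaving `mood` or `camera` empty simply omits that part.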

Success checks

A good first run: the workflow starts without missing-node errors, loads the selected model files, routes to the correct mode group, and produces a playable video output.
  • The chosen option number activates the expected group: text, audio, image, lipsync, frame transition, or motion transfer.
  • The queue starts instead of failing immediately on missing models.
  • Preview frames appear during generation or the final video appears in the output node/folder.
  • The output roughly follows the prompt and input media.
  • For lipsync, mouth movement follows the audio closely enough before you increase resolution or duration.
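
The "playable video output" check can be partly automated by looking for the most recent non-empty video file in the output folder. This is a sketch under assumptions: the folder you pass in (a default install typically writes to `ComfyUI/output`), the list of video extensions, and the `newest_video` helper are all illustrative, and a non-empty file is only a hint that the render finished, not proof that it plays.

```python
from pathlib import Path

# Assumption: the workflow saves video in one of these container formats.
VIDEO_SUFFIXES = {".mp4", ".webm", ".mov"}


def newest_video(output_dir):
    """Return the most recently modified non-empty video file, or None."""
    root = Path(output_dir)
    if not root.is_dir():
        return None
    candidates = [
        p
        for p in root.rglob("*")
        if p.is_file()
        and p.suffix.lower() in VIDEO_SUFFIXES
        and p.stat().st_size > 0
    ]
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)
```

If the function returns `None` after a run that appeared to finish, check the save node in the workflow for a different output location.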

Troubleshooting

| Problem | Most likely cause | Fix |
| --- | --- | --- |
| Missing nodes when opening the workflow | ComfyUI or required custom nodes are outdated | Update ComfyUI, update custom nodes through ComfyUI Manager, restart, then reload the workflow. |
| Model not found | File is in the wrong folder or renamed | Move the file to the destination shown in this guide and keep the downloaded filename unchanged. |
| CUDA out of memory / VRAM failure | Resolution, length, or model choice is too heavy | Use the 12GB GGUF model, lower resolution, shorten duration, reduce upscale use, and close other GPU-heavy apps. |
| Audio-to-video runs but the scene is wrong | Prompt does not describe the visual subject clearly | Describe who is speaking, where they are, camera angle, lighting, and the kind of motion you expect. |
| Lipsync looks off | Duration, audio, or face framing is mismatched | Use a clear front-facing image, clean audio, shorter clips, and avoid extreme head turns on the first pass. |
| Motion transfer identity drifts | Source motion is too complex or the target frame is ambiguous | Use a clearer target frame, simpler source video, and a prompt that reinforces the subject’s outfit and environment. |
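
For out-of-memory failures, the advice to lower resolution, shorten duration, and reduce upscale use can be treated as a step-down ladder. The sketch below is one hypothetical way to walk that ladder. The `RenderSettings` fields, the 25% reduction per step, the floor values, and the rounding to multiples of 32 are all assumptions chosen for illustration, not values taken from the workflow or from LTX 2.3.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RenderSettings:
    width: int
    height: int
    seconds: int
    upscale: bool


def lighter_settings(s, min_width=512, min_seconds=4):
    """Return the next lighter settings, or None if nothing is left to cut."""
    if s.width > min_width:
        # Shrink both sides by 25%, rounded down to a multiple of 32
        # (the rounding is a convenience assumption).
        width = max(min_width, int(s.width * 0.75) // 32 * 32)
        height = int(s.height * 0.75) // 32 * 32
        return replace(s, width=width, height=height)
    if s.seconds > min_seconds:
        return replace(s, seconds=max(min_seconds, s.seconds // 2))
    if s.upscale:
        return replace(s, upscale=False)
    return None
```

Each call returns one step down, so you can retry after every failure and stop at the first settings that fit in VRAM. Rounding changes the aspect ratio slightly, so adjust the result if exact framing matters.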

Sources