Detailed guide • ComfyUI • LTX 2.3

LTX 2.3 Character & Background Replacement in ComfyUI

Use FutuTek’s LTX 2.3 workflow to replace a person or character in an existing video from a single reference image, or keep the character and transform the background, while preserving dialogue, facial expression, body language, timing, and motion.

2 modes: Character or background replacement
1 image: Reference image for character swaps
Auto dialogue: Original speech is injected into prompt
Motion transfer: IC LoRA + pose/segmentation support

Source video: How to Replace Characters or Backgrounds in Videos with LTX 2.3 by FutuTek (5:00).

1. What this workflow does

This workflow is a video-to-video editing system for ComfyUI built around LTX 2.3, Flux 2 Klein Edit, IC LoRA, SAM3 segmentation, and DWpose. The video describes two primary use cases:

Character Replacement

Swap the performer in the source video for a new character based on a single reference image. The goal is to preserve the original performance: facial expressions, eye movement, rhythm, body language, emotional delivery, and timing.

Background Replacement

Keep the original character but transform the environment around them. The workflow uses segmentation and pose guidance so the subject remains consistent while the scene becomes a newly generated animated environment.

Important: the video explicitly does not re-teach ComfyUI and LTX 2.3 installation. This guide maps the required files and operational steps, but assumes ComfyUI is already working.

2. Requirements and model files

Download the author’s workflow/resources folder first, then place the model files in the ComfyUI folders listed below. Exact filenames and node expectations can change, so after placing the files, restart ComfyUI and check ComfyUI Manager’s missing-node and missing-model warnings to confirm that everything resolved.

Resource | Destination | Purpose | Link
Workflow & Resources | Google Drive workflow/resources folder | Import workflow JSON/resources from the author | source
Flux 2 Klein Edit | /ComfyUI/models/diffusion_models | First-frame image editing for character/background integration | source
Qwen text encoder for Flux Klein | /ComfyUI/models/text_encoders | Text encoder used by Flux Klein subgraph | source
Flux2 VAE | /ComfyUI/models/vae | VAE for Flux Klein | source
MelBand RoFormer audio model | /ComfyUI/models/diffusion_models | Audio/dialogue extraction or separation support | source
LTX 2.3 low-VRAM GGUF | /ComfyUI/models/diffusion_models | 12 GB VRAM path / quantized model | source
LTX 2.3 FP8 transformer | /ComfyUI/models/diffusion_models | 16 GB+ VRAM path | source
Gemma 3 text encoder | /ComfyUI/models/text_encoders | LTX 2.3 text encoder | source
LTX text projection | /ComfyUI/models/text_encoders | LTX 2.3 text projection | source
LTX audio VAE | /ComfyUI/models/vae | Audio VAE for LTX 2.3 | source
LTX video VAE | /ComfyUI/models/vae | Video VAE for LTX 2.3 | source
Tiny VAE preview | /ComfyUI/models/vae or preview-related folder | Faster previews; verify workflow expected path | source
Spatial upscaler | /ComfyUI/models/latent_upscale_models | 2x latent spatial upscaling | source
Distilled LoRA | /ComfyUI/models/loras | Faster/distilled LTX generation | source
Camera movement LoRAs | /ComfyUI/models/loras | Optional movement/control LoRAs | source
VRAM choice: the description identifies a low-VRAM LTX 2.3 GGUF path for about 12 GB VRAM and an FP8 transformer path for 16 GB+ VRAM. Use one path consistently with the workflow variant you load.
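
The VRAM rule above can be written as a small helper. This is a minimal sketch, not part of the author's workflow: the function name choose_ltx_variant and the returned labels "fp8" and "gguf" are hypothetical, and the 16 GB cutoff is simply the threshold quoted in the video description.

```python
def choose_ltx_variant(vram_gb: float) -> str:
    """Return which LTX 2.3 model path to use for a given amount of VRAM.

    Hypothetical helper: the labels "fp8" and "gguf" are illustrative
    names, not filenames from the workflow.
    """
    if vram_gb <= 0:
        raise ValueError("vram_gb must be positive")
    # 16 GB or more: the description points to the FP8 transformer path.
    if vram_gb >= 16:
        return "fp8"
    # Below that (the description targets about 12 GB): quantized GGUF path.
    return "gguf"


if __name__ == "__main__":
    for gb in (12, 16, 24):
        print(gb, "GB ->", choose_ltx_variant(gb))
```

Whichever label comes back, load the workflow variant that matches it rather than mixing files from both paths.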

3. Installation map

Download the workflow/resources

Get the workflow JSON and any custom resource files from the Google Drive folder. Keep an untouched backup of the original workflow before editing node paths.

Place model files in their ComfyUI folders

Put diffusion models in ComfyUI/models/diffusion_models, text encoders in ComfyUI/models/text_encoders, VAEs in ComfyUI/models/vae, LoRAs in ComfyUI/models/loras, and the latent upscaler in ComfyUI/models/latent_upscale_models.
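
Before restarting, it can help to confirm that each of those folders exists and holds at least one file. The sketch below is an assumption-laden convenience script, not something shipped with the workflow: the helper name empty_or_missing_dirs is hypothetical, and it only checks that folders are populated, not that the right files are in them.

```python
from pathlib import Path

# Folder names taken from the placement step above, relative to the ComfyUI root.
EXPECTED_MODEL_DIRS = (
    "models/diffusion_models",
    "models/text_encoders",
    "models/vae",
    "models/loras",
    "models/latent_upscale_models",
)


def empty_or_missing_dirs(comfy_root: str) -> list[str]:
    """List expected model folders that are absent or contain no files."""
    root = Path(comfy_root)
    problems = []
    for rel in EXPECTED_MODEL_DIRS:
        folder = root / rel
        # rglob also counts files placed in subfolders of the model folder.
        has_file = folder.is_dir() and any(p.is_file() for p in folder.rglob("*"))
        if not has_file:
            problems.append(rel)
    return problems
```

Calling empty_or_missing_dirs with the path to your own ComfyUI installation returns an empty list when every folder is populated.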

Install/update custom nodes

The workflow references SAM3 segmentation, DWpose, LTX 2.3 nodes, Flux/Klein edit nodes, audio extraction/separation support, and likely Kijai/Lightricks-related ComfyUI node packs. Use ComfyUI Manager to install missing nodes after loading the workflow.
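
To see which node types a workflow file refers to before opening it, the JSON can be inspected directly. This sketch assumes the file is in the ComfyUI editor export format, where a top-level "nodes" list holds objects with a "type" field; nodes nested inside subgraphs are not visited, and the helper names are hypothetical.

```python
import json
from collections import Counter


def node_type_counts(workflow: dict) -> Counter:
    """Count node types in a workflow exported from the ComfyUI editor.

    Assumes the editor format: a top-level "nodes" list whose entries
    carry a "type" field. Subgraph contents are not inspected.
    """
    return Counter(
        node["type"]
        for node in workflow.get("nodes", [])
        if isinstance(node, dict) and "type" in node
    )


def load_workflow(path: str) -> dict:
    """Read a workflow JSON file from disk."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```

Printing the sorted keys of node_type_counts(load_workflow(path)) gives a quick list to compare against what ComfyUI Manager reports as missing.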

Restart and reload

Restart ComfyUI after model placement. Then load the workflow and resolve red nodes or missing model dropdown entries before running a full video job.

4. Settings section

The video’s Settings section controls the workflow’s operating mode and output constraints.

Setting | What to choose | Practical guidance
Mode | Character Replacement or Background Replacement | Choose character replacement when the person/subject changes; choose background replacement when the original subject remains.
Video length | Length of the generated output | Start short for testing. Long clips multiply VRAM/time/failure risk.
Starting frame | First source frame used for alignment | Pick a clear frame that represents the subject and lighting well.
FPS | Output frame rate | Use modest FPS for tests; increase after visual stability is proven.
Resolution | Output dimensions | Use lower resolution while debugging, then upscale or increase once settings work.
Keep original voice audio | true or false | true preserves the original voice; according to the video, false lets LTX generate a more natural voice for the new character.
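
The settings can be mirrored in a small validated record to keep test runs and final runs apart. This is a sketch only: RunSettings and its field names are hypothetical, not the workflow's own schema, and it assumes video length is expressed in seconds, which the video does not confirm.

```python
from dataclasses import dataclass

MODES = ("character", "background")


@dataclass(frozen=True)
class RunSettings:
    """Hypothetical mirror of the Settings section; not the workflow's schema."""
    mode: str
    length_seconds: float  # assumption: length is expressed in seconds
    start_frame: int
    fps: int
    width: int
    height: int
    keep_original_voice: bool

    def __post_init__(self):
        if self.mode not in MODES:
            raise ValueError(f"mode must be one of {MODES}")
        if self.length_seconds <= 0 or self.fps <= 0:
            raise ValueError("length_seconds and fps must be positive")
        if self.start_frame < 0:
            raise ValueError("start_frame cannot be negative")
        if self.width <= 0 or self.height <= 0:
            raise ValueError("width and height must be positive")

    @property
    def frame_count(self) -> int:
        """Frames to generate, rounded to the nearest whole frame."""
        return round(self.length_seconds * self.fps)


# A short, low-resolution test run in background mode (values are illustrative).
test_run = RunSettings("background", 3, 0, 24, 768, 432, True)
print(test_run.frame_count)  # 72
```

Keeping a short test preset like this next to the final preset makes it harder to launch a long, high-resolution job by accident.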

5. Prompting strategy

The workflow is designed so you do not manually type dialogue. You write the character, environment, and action. The workflow reads the original video dialogue and injects it into the final prompt.
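
Conceptually, the injection step joins your visual prompt with the transcribed speech. The sketch below illustrates that idea only: the function name inject_dialogue and the wording of the joined prompt are assumptions, not the template the workflow actually uses.

```python
def inject_dialogue(initial_prompt: str, dialogue: str) -> str:
    """Append transcribed dialogue to a visual prompt.

    Illustrative only: the wording of the joined prompt is an assumption,
    not the template the workflow actually uses.
    """
    prompt = initial_prompt.strip()
    spoken = " ".join(dialogue.split())  # collapse stray whitespace and newlines
    if not spoken:
        return prompt  # nothing was transcribed, keep the visual prompt as-is
    return f'{prompt} The character says: "{spoken}"'
```

Because the speech is added automatically, typing the same lines into the initial prompt would duplicate them in the final prompt.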

Initial prompt should include

  • Who/what the character is.
  • Clothing, style, era, realism level.
  • Environment and lighting.
  • Action/emotion matching the source performance.
  • Camera feel: cinematic, handheld, close-up, interview, etc.

Avoid overloading it with

  • Manual dialogue already present in the source video.
  • Contradictory motion instructions.
  • Too many character details that fight the reference image.
  • Background details in Character Replacement mode unless needed.
Example character prompt:
A realistic cinematic female astronaut commander in a white EVA suit, expressive face, natural skin texture, matching the original actor's emotion and timing, standing in the same lighting and camera angle.

Example background prompt:
The same speaker stands inside a neon rain-soaked cyberpunk alley, cinematic reflections, moody blue and magenta light, realistic atmosphere, preserve the original body motion and expression.
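
The "should include" list above maps naturally onto a small prompt builder. This is a minimal sketch with hypothetical names (build_initial_prompt and its parameters); it only joins the parts you supply and leaves dialogue out on purpose.

```python
def build_initial_prompt(character, style="", environment="", action="", camera=""):
    """Join the non-empty prompt components into one comma-separated sentence."""
    if not character or not character.strip():
        raise ValueError("a character description is required")
    parts = [character, style, environment, action, camera]
    # Drop empty parts and trailing punctuation so the pieces join cleanly.
    cleaned = [p.strip().rstrip(".,") for p in parts if p and p.strip()]
    return ", ".join(cleaned) + "."


if __name__ == "__main__":
    print(build_initial_prompt(
        "A realistic cinematic female astronaut commander in a white EVA suit",
        action="matching the original actor's emotion and timing",
        camera="same lighting and camera angle",
    ))
```

Leaving environment empty in Character Replacement mode follows the advice above to avoid background details unless they are needed.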

6. Character Replacement workflow

  1. Prepare the source video. Use a clip where the performer is visible, motion is readable, and the first frame is clean.
  2. Choose Character Replacement. Set length, start frame, FPS, resolution, and voice behavior.
  3. Add the reference image. Use a clear single image of the new character. The closer the framing and lighting are to the first frame, the easier the edit.
  4. Write the initial prompt. Describe the new character and how they should belong in the scene.
  5. Run the first-frame generation. Flux 2 Klein Edit creates the first output frame. This frame must match the source first frame almost perfectly except for the replaced character.
  6. Inspect the first frame before continuing. If the character, lighting, scale, or pose is wrong, adjust the prompt or reference image and regenerate the frame before committing to a full run.
  7. Let dialogue injection happen. The workflow extracts source dialogue and combines it into the final prompt.
  8. Generate the output. IC LoRA transfers the original motion/performance onto the new character.
Critical success factor: first-frame alignment. The transcript emphasizes that IC LoRA depends on the first output frame matching the first input frame almost perfectly, aside from the intended replacement.
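
One way to put a number on first-frame alignment is to compare the source and generated frames everywhere except the replaced region. The workflow does not include such a check; this sketch assumes both frames have already been loaded as same-sized 2D lists of grayscale values and that you have a mask marking the replaced character. The function name is hypothetical, and what counts as "close enough" is left to your judgment.

```python
def mean_abs_diff_outside_mask(source, generated, mask):
    """Average absolute pixel difference over pixels where mask is falsy.

    source, generated: 2D lists of grayscale values with the same shape.
    mask: 2D list, truthy where the character was intentionally replaced.
    """
    total, count = 0, 0
    for src_row, gen_row, mask_row in zip(source, generated, mask):
        for s, g, m in zip(src_row, gen_row, mask_row):
            if not m:  # only compare pixels that should be unchanged
                total += abs(s - g)
                count += 1
    if count == 0:
        raise ValueError("mask covers the whole frame; nothing to compare")
    return total / count


if __name__ == "__main__":
    source = [[10, 10], [10, 10]]
    generated = [[10, 200], [12, 10]]
    mask = [[0, 1], [0, 0]]  # top-right pixel belongs to the replaced character
    print(mean_abs_diff_outside_mask(source, generated, mask))
```

A low value suggests the untouched parts of the frame still match the source; a high value is a cue to regenerate the first frame before the full run.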

7. Background Replacement workflow

  1. Choose Background Replacement. Keep the original character and change the environment.
  2. Usually keep original voice audio. The video keeps original voice in the background example because the character remains the same.
  3. Write an environment-focused prompt. Describe the new world, lighting, mood, materials, and cinematic atmosphere.
  4. Run first-frame background edit. Flux 2 Klein Edit replaces the background in the first input frame.
  5. Use segmentation and pose isolation. SAM3 isolates the main character; DWpose creates a clean pose reference. This helps prevent random extra characters from appearing in the generated background.
  6. Generate animated environment. IC LoRA preserves the original movements while integrating the subject into the new scene.
Best use case: interviews, monologues, short acting clips, or performance shots where the person should stay recognizable but the world around them should change dramatically.
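
A quick sanity check on the segmentation step is to count how many separate regions the subject mask contains. The sketch below assumes the SAM3 mask has been converted to a 2D list of 0/1 values, which is not something the video specifies, and count_mask_regions is a hypothetical helper. More than one region can point to stray detections, although a single subject can also split into several regions when partly occluded.

```python
from collections import deque


def count_mask_regions(mask):
    """Count 4-connected regions of truthy cells in a 2D mask."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    regions = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            regions += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill of this region
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return regions
```

For a single speaker in frame, a result of 1 is the expected case; anything higher is worth a visual check of the mask preview.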

8. Quality control before publishing a result

First frame

  • Subject scale matches source.
  • Lighting direction matches.
  • Hands/face are not distorted.
  • Replacement area blends naturally.

Motion

  • Eyes track naturally.
  • Facial expression follows source.
  • Body rhythm is preserved.
  • No extra limbs/characters appear.

Audio/dialogue

  • Dialogue extraction is correct.
  • Prompt injection did not hallucinate lines.
  • Voice choice matches intent.
  • Lip timing is acceptable.
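
If you have a trusted transcript of the source clip, the dialogue check can be made less subjective by scoring the extracted text against it. This is a sketch using Python's standard difflib module; the helper names are hypothetical, and it assumes you can copy the extracted dialogue out of the final prompt by hand.

```python
from difflib import SequenceMatcher


def _words(text):
    """Lowercase and replace punctuation with spaces so formatting is ignored."""
    return "".join(ch.lower() if ch.isalnum() else " " for ch in text).split()


def dialogue_similarity(extracted, reference):
    """Word-level similarity in [0, 1] between two transcripts."""
    return SequenceMatcher(None, _words(extracted), _words(reference)).ratio()


if __name__ == "__main__":
    print(dialogue_similarity("We launch at dawn.", "We land at dawn"))
```

A score of 1.0 means the word sequences are identical; lower scores indicate misheard or invented words that should be reviewed before publishing.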

9. Troubleshooting

Problem | Likely cause | Fix
Red (missing) nodes | Custom nodes not installed | Use ComfyUI Manager to install the missing node packs, restart, and reload the workflow.
Model dropdown blank | File in the wrong folder, or ComfyUI not restarted | Verify the exact folder and filename, then restart ComfyUI.
Output drifts from source motion | First-frame mismatch or weak pose/motion guidance | Regenerate the first frame so it is closer to the input; attempt a less drastic change; use a cleaner source clip.
Random extra people in background mode | Segmentation/pose not isolating the main subject | Check the SAM3 mask and DWpose output; use a clearer source frame; simplify the background prompt.
Character does not match reference | Reference image conflicts with source pose/lighting | Use a reference with a similar angle, face visibility, and lighting; simplify the prompt.
Out of memory | Resolution, FPS, or length too high, or model too large | Use the low-VRAM GGUF path, shorter clips, lower resolution, fewer frames, or FP8 variants.
Dialogue wrong | Audio extraction misread the source | Use cleaner audio; manually inspect the final prompt; optionally transcribe externally and paste the corrected dialogue if the workflow allows.
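
When an out-of-memory error forces smaller settings, it helps to see roughly how much each reduction cuts the job. The sketch below uses total generated pixels as a crude proxy. It is not a VRAM estimate, since real memory use depends on the model, quantization, and latent compression, and both function names are hypothetical.

```python
def relative_workload(width, height, fps, seconds):
    """Total pixels generated: a crude proxy for run cost, not a VRAM estimate."""
    if min(width, height, fps, seconds) <= 0:
        raise ValueError("all arguments must be positive")
    return width * height * fps * seconds


def workload_ratio(trial, baseline):
    """Work in `trial` relative to `baseline`; both are (width, height, fps, seconds)."""
    return relative_workload(*trial) / relative_workload(*baseline)


if __name__ == "__main__":
    # Halving both dimensions and the clip length (illustrative values).
    print(workload_ratio((640, 360, 24, 5), (1280, 720, 24, 10)))  # 0.125
```

Halving the width and height cuts the proxy to a quarter, which is why lowering resolution is usually the first thing to try.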

10. Run checklist

  • ☐ ComfyUI launches cleanly and workflow loads without missing nodes.
  • ☐ Correct LTX model path chosen for VRAM level.
  • ☐ Flux 2 Klein Edit model, Qwen encoder, and Flux VAE are installed.
  • ☐ LTX text encoders, projection, video/audio VAEs, LoRAs, and upscaler are in place.
  • ☐ Source video is short, clean, and has a strong first frame.
  • ☐ Character reference image is clear and compatible with the source angle.
  • ☐ Mode is set correctly: character vs. background replacement.
  • ☐ First generated frame is inspected before full generation.
  • ☐ Final prompt includes extracted dialogue correctly.
  • ☐ Output is reviewed for motion, identity consistency, artifacts, and audio timing.

11. Sources

Guide created 2026-05-22 from video transcript, metadata, and the resource links supplied in the video description.