FlowLong

30-Second Long-form Audio-Video Generation

30-second videos with synchronized audio, from LTX2.
Extending LTX2's base 5-second generation budget by ×6.

16-Second Audio-Video Samples

Additional 16-second samples with synchronized audio, from LTX2.
Extending LTX2's base 5-second generation budget by roughly ×3 (16 s / 5 s = 3.2).

30-Second Long Video Samples

30-second videos from HunyuanVideo-1.5.
Extending HunyuanVideo-1.5's base 5-second generation budget by ×6.

Comparison with Bidirectional Models

30-second generation against bidirectional long-video baselines (RIFLEx, UltraVico) on the same prompts.
RIFLEx is built on CogVideoX (5B); UltraVico and FlowLong (Ours) both use Wan2.1 (1.3B).

RIFLEx, CogVideoX (5B)

UltraVico, Wan2.1 (1.3B)

FlowLong (Ours), Wan2.1 (1.3B)

Comparison with Autoregressive Models

30-second generation against autoregressive long-video baselines on 10 diverse prompts.
All baselines and FlowLong (Ours) are built on Wan2.1 (1.3B).

CausVid

Self-Forcing

Deep-Forcing

LongLive

Infinity-RoPE

FlowLong (Ours)

Prompts 1–4 of 10

Application: Long Text-to-3D Gaussian Splatting

FlowLong's long-generation idea applied to text-to-3DGS: VIST3A runs out of frames mid-trajectory while FlowLong (Ours) continues seamlessly.

VIST3A

FlowLong (Ours)

VIST3A

FlowLong (Ours)


FlowLong: Inference-time Long Video Generation via Manifold-constrained Tweedie Matching
