Veo 3.1 (Premium)
Google DeepMind's state-of-the-art video generation model featuring native audio synthesis, up to 4K resolution, and cinematic realism with advanced physics simulation.
From 9 credits
AI-planned shot transitions maintain consistent characters, environments, and lighting across multiple camera angles in a single generation.
Built-in audio-visual synchronization supports automatic dubbing or custom audio files — no separate dubbing step required.
Upload a character reference video to preserve appearance and voice across new scenes with clone-level consistency.
Wan 2.6 is Alibaba's flagship video generation series, unveiled on December 16, 2025 as part of Tongyi Lab's multimodal AI program. It is the most capable version in the Wan family, built around three generation modes: text-to-video (wan2.6-t2v), image-to-video (wan2.6-i2v), and a dedicated reference-to-video (wan2.6-r2v) model. The defining upgrade over earlier versions is an intelligent multi-shot narrative system combined with native audio-visual synchronization — meaning a single generation pass can produce a coherent sequence of shots, complete with soundtrack, dialogue, and sound effects, without requiring manual editing or dubbing.
The model is built on a diffusion transformer architecture that understands spatial relationships, temporal continuity, and multimodal inputs simultaneously. In text-to-video mode you describe the scene — including shot transitions such as "wide establishing shot → close-up reaction → slow zoom out" — and the model plans camera angles and maintains character and environment consistency automatically. In image-to-video mode, a reference image anchors the visual identity of the first frame before the model animates forward. In reference-to-video mode, Alibaba's documentation describes Wan2.6-R2V as using a character reference video (appearance + voice) combined with text prompts to generate entirely new scenes starring that subject. The API enforces a 1,500-character prompt limit for R2V to ensure prompt detail does not compromise consistency.
| Version | Max Duration | Max Resolution | Audio | Multi-Shot |
|---|---|---|---|---|
| Wan 2.1 | 5 seconds | 720P | No | No |
| Wan 2.2 | 5 seconds | 720P | No | No |
| Wan 2.5 | 10 seconds | 1080P | Basic | No |
| Wan 2.6 | 15 seconds | 1080P | Native A/V sync | Yes |
Wan 2.6 is the first version in the series to support multi-shot continuity as an explicitly documented API feature. The reference-to-video mode (R2V) is exclusive to Wan 2.6 — earlier versions have no equivalent. Audio quality has also improved substantially over Wan 2.5, with more natural-sounding output, though complex voice realism still trails dedicated audio-first models.
| Variant | Credits | Duration |
|---|---|---|
| Wan 2.6 T2V | 30 | 5s |
| Wan 2.6 I2V | 30 | 5s |
1 credit = $0.012
Create polished short-form videos for TikTok, Instagram Reels, and YouTube Shorts with multi-shot continuity.
Produce professional marketing videos featuring narration, product showcases, and synchronized audio in a single pass.
Use reference-to-video to cast a recurring character and direct new scenes while preserving their look and voice.
Generate product videos from unboxing to usage with synchronized audio descriptions and cinematic camera work.
Kuaishou (New)
Kuaishou's cinematic AI video model powered by 3D spatiotemporal attention — delivering industry-leading physics simulation, hyper-realistic facial expressions, and up to 1080p output across Standard, Pro, and Master tiers.
From 11 credits
OpenAI (Popular)
OpenAI's flagship video-and-audio generation model with advanced physics simulation, native synchronized audio, and multi-shot scene control, released September 30, 2025.
From 5 credits