Veo 3.1 (Premium)
Google DeepMind's state-of-the-art video generation model featuring native audio synthesis, up to 4K resolution, and cinematic realism with advanced physics simulation.
From 9 credits
AI-planned shot transitions maintain consistent characters, environments, and lighting across multiple camera angles in a single generation.
Built-in audio-visual synchronization supports automatic dubbing or custom audio files — no separate dubbing step required.
Upload a character reference video to preserve appearance and voice across new scenes with clone-level consistency.
Wan 2.6 is Alibaba's flagship video generation series, unveiled on December 16, 2025 as part of Tongyi Lab's multimodal AI program. It is the most capable version in the Wan family, built around three generation modes: text-to-video (wan2.6-t2v), image-to-video (wan2.6-i2v), and a dedicated reference-to-video (wan2.6-r2v) model. The defining upgrade over earlier versions is an intelligent multi-shot narrative system combined with native audio-visual synchronization — meaning a single generation pass can produce a coherent sequence of shots, complete with soundtrack, dialogue, and sound effects, without requiring manual editing or dubbing.
The model is built on a diffusion transformer architecture that understands spatial relationships, temporal continuity, and multimodal inputs simultaneously. In text-to-video mode you describe the scene — including shot transitions such as "wide establishing shot → close-up reaction → slow zoom out" — and the model plans camera angles and maintains character and environment consistency automatically. In image-to-video mode, a reference image anchors the visual identity of the first frame before the model animates forward. In reference-to-video mode, Alibaba's documentation describes Wan2.6-R2V as using a character reference video (appearance + voice) combined with text prompts to generate entirely new scenes starring that subject. The API enforces a 1,500-character prompt limit for R2V to ensure prompt detail does not compromise consistency.
| Version | Max Duration | Max Resolution | Audio | Multi-Shot |
|---|---|---|---|---|
| Wan 2.1 | 5 seconds | 720P | No | No |
| Wan 2.2 | 5 seconds | 720P | No | No |
| Wan 2.5 | 10 seconds | 1080P | Basic | No |
| Wan 2.6 | 15 seconds | 1080P | Native A/V sync | Yes |
Wan 2.6 is the first version in the series to support multi-shot continuity as an explicitly documented API feature. The reference-to-video mode (R2V) is exclusive to Wan 2.6 — earlier versions have no equivalent. Audio quality has also improved substantially over Wan 2.5, with more natural-sounding output, though complex voice realism still trails dedicated audio-first models.
| Variant | Credits | Duration |
|---|---|---|
| Wan 2.6 T2V | 30 | 5s |
| Wan 2.6 I2V | 30 | 5s |
1 credit = $0.012
Create polished short-form videos for TikTok, Instagram Reels, and YouTube Shorts with multi-shot continuity.
Produce professional marketing videos featuring narration, product showcases, and synchronized audio in a single pass.
Use reference-to-video to cast a recurring character and direct new scenes while preserving their look and voice.
Generate product videos from unboxing to usage with synchronized audio descriptions and cinematic camera work.
Kuaishou (New)
Kuaishou's cinematic AI video model powered by 3D spatiotemporal attention — delivering industry-leading physics simulation, hyper-realistic facial expressions, and up to 1080p output across Standard, Pro, and Master tiers.
From 11 credits
OpenAI (Popular)
OpenAI's flagship video-and-audio generation model with advanced physics simulation, native synchronized audio, and multi-shot scene control, released September 30, 2025.
From 5 credits