Our method maintains consistent subject identities across shots. VideoCrafter2 shows diverse motion but inconsistent characters; Tokenflow Encoder causes blurring; and ConsiS Im2vid shows degraded motion and inconsistent identities (see the different facial features). VSTAR maintains good identity but struggles to adhere to the text prompts and shows extensive non-specific motion.