Our method maintains consistent subject identities across shots and follows the text prompts. VideoCrafter2 shows diverse motion but inconsistent characters, Tokenflow Encoder produces strong color artifacts and blurring, and ConsiS Im2vid shows degraded motion. VSTAR briefly shows the initial and final scenes while transitioning to the middle sequence, maintaining good identity and extensive motion.