Our method maintains consistent subject identities across shots and follows the text prompts. Among the baselines, VideoCrafter2 shows diverse motion but inconsistent characters, Tokenflow Encoder mainly affects coloring, and ConsiS Im2vid introduces inconsistent identities. VSTAR maintains good identity and shows extensive motion, but it struggles with prompt adherence: it briefly displays the initial and final scenes while transitioning to the middle sequence.