Comparative Performance Analysis of Text-to-Video Models Across Workflow Stages and Quality Dimensions
DOI: https://doi.org/10.51903/k8v16h58

Keywords: Artificial Intelligence, Evaluation Framework, Generative AI, Text-to-Video Models, Video Production

Abstract
The rapid advancement of text-to-video (T2V) generative artificial intelligence has transformed digital content creation, yet a structured evaluation framework aligned with real-world production workflows remains absent. Traditional metrics correlate poorly with human perception and practical deployment needs. This study proposes the Production-Pipeline Evaluation Framework (PPEF) to assess leading T2V models across three workflow stages: pre-production, generation, and post-production. We evaluated six prominent models (Sora, Runway Gen-3, Pika 2.0, CogVideoX-5B, HunyuanVideo, and Open-Sora 2.0) on a dataset of 300 production-oriented prompts. Performance was measured using nine multi-dimensional metrics encompassing established automated measures, novel production-based indicators, and human perceptual evaluation (N=30). Results indicate that only the closed-source models Sora (PPEF = 0.742) and Runway Gen-3 (0.718) surpassed the production-ready threshold of 0.70. Among open-source alternatives, HunyuanVideo (0.685) demonstrated the strongest overall profile. Crucially, the composite PPEF score correlated highly with human perception (Spearman ρ = 0.847), significantly outperforming traditional automated metrics. The inclusion of production-based metrics revealed specific deployment advantages, such as Pika 2.0's generation speed and Runway Gen-3's post-production editability. These findings are synthesized into an IT Implementation Matrix, providing practitioners and organizations with structured, evidence-based guidance for selecting and deploying generative AI video tools based on technical maturity, budget, and specific workflow requirements.
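To make the mechanics concrete, the sketch below shows how a stage-weighted composite score in the spirit of PPEF could be assembled and then rank-correlated with human ratings, as the abstract describes. The stage names, stage weights, per-stage scores, and human ratings are invented placeholders rather than the paper's actual values; only the first three model-level PPEF scores echo figures quoted above.

import scipy.stats as stats

# Hypothetical per-stage scores for one model, each normalized to [0, 1].
stage_scores = {
    "pre_production": 0.71,    # e.g., prompt adherence, planning controllability
    "generation": 0.76,        # e.g., visual quality, temporal consistency
    "post_production": 0.68,   # e.g., editability, format compliance
}
# Illustrative stage weights; the paper's actual weighting scheme is not reproduced here.
stage_weights = {"pre_production": 0.3, "generation": 0.4, "post_production": 0.3}

# Composite score as a weighted mean of the stage scores.
ppef = sum(stage_weights[s] * stage_scores[s] for s in stage_scores)
print(f"PPEF = {ppef:.3f}")   # compared against the 0.70 production-ready threshold

# Validate the composite against human judgment via rank correlation across models.
# The first three PPEF values match the abstract; the remainder, and all human
# ratings, are made up for illustration only.
ppef_by_model = [0.742, 0.718, 0.685, 0.660, 0.635, 0.590]
human_ratings = [4.2, 3.9, 3.8, 3.4, 3.5, 3.0]
rho, p_value = stats.spearmanr(ppef_by_model, human_ratings)
print(f"Spearman rho = {rho:.3f} (p = {p_value:.3g})")

A weighted mean keeps the composite interpretable on the same [0, 1] scale as its inputs, and Spearman rank correlation (rather than Pearson) is the natural check here because the claim is about agreement in model ordering with human raters, not linearity.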
License

Copyright (c) 2026 Januar Tito Bagaskoro, Muhammad Sholikhan (Authors). This work is licensed under a Creative Commons Attribution 4.0 International License.