Comparative Performance Analysis of Text-to-Video Models Across Workflow Stages and Quality Dimensions
DOI: https://doi.org/10.51903/k8v16h58

Keywords: Artificial Intelligence, Evaluation Framework, Generative AI, Text-to-Video Models, Video Production

Abstract
The rapid advancement of text-to-video (T2V) generative artificial intelligence has transformed digital content creation, yet a structured evaluation framework aligned with real-world production workflows remains absent. Traditional metrics correlate poorly with human perception and practical deployment needs. This study proposes the Production-Pipeline Evaluation Framework (PPEF) to assess leading T2V models across three workflow stages: pre-production, generation, and post-production. We evaluated six prominent models (Sora, Runway Gen-3, Pika 2.0, CogVideoX-5B, HunyuanVideo, and Open-Sora 2.0) on a dataset of 300 production-oriented prompts. Performance was measured using nine multi-dimensional metrics encompassing established automated measures, novel production-based indicators, and human perceptual evaluation (N=30). Results indicate that only the closed-source models Sora (PPEF = 0.742) and Runway Gen-3 (0.718) surpassed the production-ready threshold of 0.70. Among open-source alternatives, HunyuanVideo (0.685) demonstrated the strongest overall profile. Crucially, the composite PPEF score correlated highly with human perception (Spearman ρ = 0.847), significantly outperforming traditional automated metrics. The inclusion of production-based metrics revealed specific deployment advantages, such as Pika 2.0's generation speed and Runway Gen-3's post-production editability. These findings are synthesized into an IT Implementation Matrix, providing practitioners and organizations with structured, evidence-based guidance for selecting and deploying generative AI video tools based on technical maturity, budget, and specific workflow requirements.
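To make the mechanics concrete, the sketch below shows how a stage-weighted composite score in the spirit of PPEF could be assembled and then rank-correlated with human ratings, as the abstract describes. The stage names, stage weights, per-stage scores, and human ratings are invented placeholders rather than the paper's actual values; only the first three model-level PPEF scores echo figures quoted above.

import scipy.stats as stats

# Hypothetical per-stage scores for one model, each normalized to [0, 1].
stage_scores = {
    "pre_production": 0.71,    # e.g., prompt adherence, planning controllability
    "generation": 0.76,        # e.g., visual quality, temporal consistency
    "post_production": 0.68,   # e.g., editability, format compliance
}
# Illustrative stage weights; the paper's actual weighting scheme is not reproduced here.
stage_weights = {"pre_production": 0.3, "generation": 0.4, "post_production": 0.3}

# Composite score as a weighted mean of the stage scores.
ppef = sum(stage_weights[s] * stage_scores[s] for s in stage_scores)
print(f"PPEF = {ppef:.3f}")   # compared against the 0.70 production-ready threshold

# Validate the composite against human judgment via rank correlation across models.
# The first three PPEF values match the abstract; the remainder, and all human
# ratings, are made up for illustration only.
ppef_by_model = [0.742, 0.718, 0.685, 0.660, 0.635, 0.590]
human_ratings = [4.2, 3.9, 3.8, 3.4, 3.5, 3.0]
rho, p_value = stats.spearmanr(ppef_by_model, human_ratings)
print(f"Spearman rho = {rho:.3f} (p = {p_value:.3g})")

A weighted mean keeps the composite interpretable on the same [0, 1] scale as its inputs, and Spearman rank correlation (rather than Pearson) is the natural check here because the claim is about agreement in model ordering with human raters, not linearity.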
License

Copyright (c) 2026 Januar Tito Bagaskoro, Muhammad Sholikhan (Authors). This work is licensed under a Creative Commons Attribution 4.0 International License.