
Text-to-Video Generation

65 papers with code • 6 benchmarks • 9 datasets

This task refers to video generation based on a given sentence or sequence of words.

Benchmarks

Best-performing models on the tracked benchmarks include:

  • Snap Video (512×288)
  • REGIS-Fuse (Finetuning, 128×128)
  • VideoCrafter2
  • NUWA (128×128)
  • MAGVIT
  • VideoFactory


Most implemented papers

ModelScope Text-to-Video Technical Report


This paper introduces ModelScopeT2V, a text-to-video synthesis model that evolves from a text-to-image synthesis model (i.e., Stable Diffusion).
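As a rough, hedged illustration of how a released ModelScope-style text-to-video checkpoint can be sampled in practice, the sketch below assumes the Hugging Face diffusers library; the model id, parameters, and output handling are assumptions to verify against current documentation, not details from the report itself.

```python
# Minimal sketch: sampling a text-to-video diffusion model with diffusers.
# Model id and API details are assumptions, not taken from the paper.
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

pipe = DiffusionPipeline.from_pretrained(
    "damo-vilab/text-to-video-ms-1.7b", torch_dtype=torch.float16
).to("cuda")

prompt = "A panda playing a guitar on a mountain top"
result = pipe(prompt, num_inference_steps=25, num_frames=16)
frames = result.frames[0]  # output layout may differ across diffusers versions
export_to_video(frames, "panda.mp4")
```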

VideoComposer: Compositional Video Synthesis with Motion Controllability

The pursuit of controllability as a higher standard of visual content creation has yielded remarkable progress in customizable image synthesis.

Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation

To replicate the success of text-to-image (T2I) generation, recent works employ large-scale video datasets to train a text-to-video (T2V) generator.

Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models

We first pre-train an LDM on images only; then, we turn the image generator into a video generator by introducing a temporal dimension to the latent space diffusion model and fine-tuning on encoded image sequences, i.e., videos.
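To make the "introduce a temporal dimension" step concrete, here is a generic, hedged PyTorch sketch of a factorized (pseudo-3D) convolution of the kind many text-to-video models insert around a pretrained image backbone. It is not the exact layer used in this or any other paper listed here.

```python
# Hedged sketch of a factorized (pseudo-3D) convolution block: a 2D spatial conv
# inherited from the image model, followed by a newly added 1D temporal conv.
import torch
import torch.nn as nn

class Pseudo3DConv(nn.Module):
    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        pad = kernel_size // 2
        self.spatial = nn.Conv2d(channels, channels, kernel_size, padding=pad)
        self.temporal = nn.Conv1d(channels, channels, kernel_size, padding=pad)
        # Initialize the temporal conv as an identity so training starts from
        # the behavior of the original image model (a common trick).
        nn.init.dirac_(self.temporal.weight)
        nn.init.zeros_(self.temporal.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, frames, height, width)
        b, c, t, h, w = x.shape
        x = x.permute(0, 2, 1, 3, 4).reshape(b * t, c, h, w)
        x = self.spatial(x)                      # per-frame spatial conv
        x = x.reshape(b, t, c, h, w).permute(0, 3, 4, 2, 1).reshape(b * h * w, c, t)
        x = self.temporal(x)                     # per-pixel temporal conv
        x = x.reshape(b, h, w, c, t).permute(0, 3, 4, 1, 2)
        return x

video = torch.randn(2, 64, 8, 32, 32)            # (B, C, T, H, W)
print(Pseudo3DConv(64)(video).shape)             # torch.Size([2, 64, 8, 32, 32])
```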

VideoCrafter1: Open Diffusion Models for High-Quality Video Generation

The I2V model is designed to produce videos that strictly adhere to the content of the provided reference image, preserving its content, structure, and style.

Latte: Latent Diffusion Transformer for Video Generation

We propose a novel Latent Diffusion Transformer, namely Latte, for video generation.

MUGEN: A Playground for Video-Audio-Text Multimodal Understanding and GENeration

Altogether, MUGEN can help progress research in many tasks in multimodal understanding and generation.

Make-A-Video: Text-to-Video Generation without Text-Video Data

We propose Make-A-Video -- an approach for directly translating the tremendous recent progress in Text-to-Image (T2I) generation to Text-to-Video (T2V).

Follow Your Pose: Pose-Guided Text-to-Video Generation using Pose-Free Videos

Generating text-editable and pose-controllable character videos is in high demand for creating a variety of digital humans.

Free-Bloom: Zero-Shot Text-to-Video Generator with LLM Director and LDM Animator

Text-to-video is a rapidly growing research area that aims to generate a semantically faithful, identity-preserving, and temporally coherent sequence of frames that accurately aligns with the input text prompt.

A survey of recent work on video summarization: approaches and techniques

  • Published: 15 May 2021
  • Volume 80, pages 27187–27221 (2021)


  • Vasudha Tiwari (ORCID: orcid.org/0000-0002-0087-3069)
  • Charul Bhatnagar


The volume of video data generated has seen an exponential growth over the years, and video summarization has emerged as a process that can facilitate efficient storage, quick browsing, indexing, fast retrieval and quick sharing of the content. In view of the vast literature available on different aspects of video summarization approaches and techniques, a need has arisen to summarize and organize various recent research findings, future research focus and trends, challenges, performance measures and evaluation, and datasets for testing and validation. This paper investigates the existing video summarization frameworks and presents a comprehensive view of the existing approaches and techniques. It highlights the recent advances in the techniques and discusses the paradigm shift that has occurred over the last two decades in the area, leading to considerable improvement. Attempts are made to consolidate the most significant findings, from the basic summarization structure to the classification of summarization techniques and noteworthy contributions in the area. Additionally, the existing datasets, categorized domain-wise for the purpose of video summarization and evaluation, are enumerated. The present study is intended to help in assimilating important research findings and data for ready reference, identifying groundwork, and exploring potential directions for further research.
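To make the "basic summarization structure" mentioned above concrete, the following is a minimal, hedged sketch of one of the simplest static-summary baselines in this literature: keyframe selection by color-histogram change, written with OpenCV. The threshold and histogram settings are illustrative choices, not values taken from any surveyed paper.

```python
# Minimal keyframe-based static summary: keep a frame when its HSV color histogram
# differs sufficiently from the last kept frame. Parameter values are illustrative.
import cv2

def keyframe_summary(video_path: str, threshold: float = 0.4):
    cap = cv2.VideoCapture(video_path)
    keyframes, prev_hist = [], None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        hist = cv2.calcHist([hsv], [0, 1], None, [32, 32], [0, 180, 0, 256])
        cv2.normalize(hist, hist)
        # Keep the frame if it differs enough from the last kept keyframe.
        if prev_hist is None or cv2.compareHist(prev_hist, hist,
                                                cv2.HISTCMP_BHATTACHARYYA) > threshold:
            keyframes.append(frame)
            prev_hist = hist
    cap.release()
    return keyframes
```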




Author information

Authors and Affiliations

Department of Computer Engineering and Applications, GLA University, Mathura, India

Vasudha Tiwari & Charul Bhatnagar


Corresponding author

Correspondence to Vasudha Tiwari.



About this article

Tiwari, V., Bhatnagar, C. A survey of recent work on video summarization: approaches and techniques. Multimed Tools Appl 80, 27187–27221 (2021). https://doi.org/10.1007/s11042-021-10977-y


Received: 24 June 2020

Revised: 18 March 2021

Accepted: 30 April 2021

Published: 15 May 2021

Issue Date: July 2021

DOI: https://doi.org/10.1007/s11042-021-10977-y

Keywords

  • Computer vision
  • Video summarization
  • Dynamic summaries
  • Hierarchical summaries
  • Multi-view summaries
  • User oriented summaries
  • Multi-video summarization

Analysing video and audio data: existing approaches and new innovations

  • Conference: Surface Learning Workshop 2012

Elizabeth FitzGerald, The Open University (UK)


VideoPoet: A large language model for zero-shot video generation

December 19, 2023

Posted by Dan Kondratyuk and David Ross, Software Engineers, Google Research


A recent wave of video generation models has burst onto the scene, in many cases showcasing stunning picturesque quality. One of the current bottlenecks in video generation is in the ability to produce coherent large motions. In many cases, even the current leading models either generate small motion or, when producing larger motions, exhibit noticeable artifacts.

To explore the application of language models in video generation, we introduce VideoPoet (website, research paper), a large language model (LLM) that is capable of a wide variety of video generation tasks, including text-to-video, image-to-video, video stylization, video inpainting and outpainting, and video-to-audio. One notable observation is that the leading video generation models are almost exclusively diffusion-based (for one example, see Imagen Video). On the other hand, LLMs are widely recognized as the de facto standard due to their exceptional learning capabilities across various modalities, including language, code, and audio (e.g., AudioPaLM). In contrast to alternative models in this space, our approach seamlessly integrates many video generation capabilities within a single LLM, rather than relying on separately trained components that specialize on each task.

The diagram below illustrates VideoPoet’s capabilities. Input images can be animated to produce motion, and (optionally cropped or masked) video can be edited for inpainting or outpainting. For stylization, the model takes in a video representing the depth and optical flow, which represent the motion, and paints contents on top to produce the text-guided style.

An overview of VideoPoet, capable of multitasking on a variety of video-centric inputs and outputs. The LLM can optionally take text as input to guide generation for text-to-video, image-to-video, video-to-audio, stylization, and outpainting tasks.

Language models as video generators

One key advantage of using LLMs for training is that one can reuse many of the scalable efficiency improvements that have been introduced in existing LLM training infrastructure. However, LLMs operate on discrete tokens, which can make video generation challenging. Fortunately, there exist video and audio tokenizers, which serve to encode video and audio clips as sequences of discrete tokens (i.e., integer indices), and which can also be converted back into the original representation.

VideoPoet trains an autoregressive language model to learn across video, image, audio, and text modalities through the use of multiple tokenizers (MAGVIT V2 for video and image, and SoundStream for audio). Once the model generates tokens conditioned on some context, these can be converted back into a viewable representation with the tokenizer decoders.

A detailed look at the VideoPoet task design, showing the training and inference inputs and outputs of various tasks. Modalities are converted to and from tokens using tokenizer encoder and decoders. Each modality is surrounded by boundary tokens, and a task token indicates the type of task to perform.
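A heavily simplified, hypothetical sketch of that flow is shown below: tokenize the conditioning input, let the LLM generate discrete output tokens, then decode them back to pixels. All class, method, and token names are placeholders, not a real VideoPoet, MAGVIT-v2, or SoundStream API; only the overall pipeline is taken from the text.

```python
# Hypothetical sketch of the token-based pipeline described in the post.
# text_tokenizer, video_tokenizer, and video_lm are placeholder interfaces.

def generate_video(prompt_text, video_lm, video_tokenizer, text_tokenizer,
                   task_token="<text_to_video>", max_new_tokens=4096):
    # 1. Encode the conditioning text into discrete tokens.
    text_tokens = text_tokenizer.encode(prompt_text)

    # 2. Build the LM input: a task token says what to do, and boundary tokens
    #    separate modalities (as in the task-design figure above).
    prompt_tokens = [task_token, "<bot_text>"] + text_tokens + ["<eot_text>", "<bov_video>"]

    # 3. Autoregressively sample discrete video tokens conditioned on the prompt.
    video_tokens = video_lm.generate(prompt_tokens, max_new_tokens=max_new_tokens)

    # 4. Decode the integer indices back into a viewable clip with the
    #    video tokenizer's decoder.
    return video_tokenizer.decode(video_tokens)
```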

Examples generated by VideoPoet

Some examples generated by our model are shown below.

Videos generated by VideoPoet from various text prompts.

For text-to-video, video outputs are variable length and can apply a range of motions and styles depending on the text content. To ensure responsible practices, we reference artworks and styles in the public domain e.g., Van Gogh’s “Starry Night”.

           
           

For image-to-video, VideoPoet can take the input image and animate it with a prompt.

Examples of image-to-video with text prompts to guide the motion; each video is paired with its source image on its left. Prompts: “A ship navigating the rough seas, thunderstorm and lightning, animated oil on canvas”; “Flying through a nebula with many twinkling stars”; “A wanderer on a cliff with a cane looking down at the swirling sea fog below on a windy day”. Source images are in the public domain.**

For video stylization, we predict the optical flow and depth information before feeding into VideoPoet with some additional input text.

Examples of video stylization on top of VideoPoet text-to-video generated videos, with text prompts, depth, and optical flow used as conditioning. The left video in each pair is the input video; the right is the stylized output. Prompts: “Wombat wearing sunglasses holding a beach ball on a sunny beach.”; “Teddy bears ice skating on a crystal clear frozen lake.”; “A metal lion roaring in the light of a forge.”

VideoPoet is also capable of generating audio. Here we first generate 2-second clips from the model and then try to predict the audio without any text guidance. This enables generation of video and audio from a single model.

        
An example of video-to-audio, generating audio from a video example without any text input.

By default, the VideoPoet model generates videos in portrait orientation to tailor its output towards short-form content. To showcase its capabilities, we have produced a brief movie composed of many short clips generated by VideoPoet. For the script, we asked Bard to write a short story about a traveling raccoon with a scene-by-scene breakdown and a list of accompanying prompts. We then generated video clips for each prompt, and stitched together all resulting clips to produce the final video below.

When we developed VideoPoet, we noticed some nice properties of the model’s capabilities, which we highlight below.

We are able to generate longer videos simply by conditioning on the last 1 second of video and predicting the next 1 second. By chaining this repeatedly, we show that the model can not only extend the video well but also faithfully preserve the appearance of all objects even over several iterations.
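A hypothetical sketch of that chaining loop (placeholder interfaces again, matching the earlier sketch): re-tokenize the last second of generated video and ask the model for the next second, repeating as many times as needed.

```python
# Hypothetical sketch of extending a clip by repeatedly conditioning on its
# last second and predicting the next second; interface names are placeholders.

def extend_video(clip, video_lm, video_tokenizer, seconds_to_add=8, fps=8):
    # clip: list of frames (e.g. arrays); grows by one second per iteration.
    for _ in range(seconds_to_add):
        last_second = clip[-fps:]                          # last 1 s of frames
        context = ["<continue_video>"] + video_tokenizer.encode(last_second)
        next_tokens = video_lm.generate(context)           # tokens for the next second
        clip = clip + video_tokenizer.decode(next_tokens)  # append decoded frames
    return clip
```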

Here are two examples of VideoPoet generating long video from text input:

                
                

It is also possible to interactively edit existing video clips generated by VideoPoet. If we supply an input video, we can change the motion of objects to perform different actions. The object manipulation can be centered at the first frame or the middle frames, which allows for a high degree of editing control.

For example, we can randomly generate some clips from the input video and select the desired next clip.

An input video on the left is used as conditioning to generate four choices given the initial prompt: “Closeup of an adorable rusty broken-down steampunk robot covered in moss moist and budding vegetation, surrounded by tall grass”. For the first three outputs we show what would happen for unprompted motions. For the last video in the list below, we add to the prompt, “powering up with smoke in the background” to guide the action.

Image to video control

Similarly, we can apply motion to an input image to edit its contents towards the desired state, conditioned on a text prompt.

Animating a painting with different prompts: “A woman turning to look at the camera.” and “A woman yawning.”**

Camera motion

We can also accurately control camera movements by appending the type of desired camera motion to the text prompt. As an example, we generated an image by our model with the prompt, “Adventure game concept art of a sunrise over a snowy mountain by a crystal clear river”. The examples below append the given text suffix to apply the desired motion.

Prompts from left to right: “Zoom out”, “Dolly zoom”, “Pan left”, “Arc shot”, “Crane shot”, “FPV drone shot”.
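In practice this is just string manipulation before sampling; a trivial sketch, reusing the base prompt quoted above (the generate_video helper is the hypothetical one sketched earlier):

```python
# Build camera-motion variants of a single base prompt by appending a suffix.
base_prompt = ("Adventure game concept art of a sunrise over a snowy mountain "
               "by a crystal clear river")
camera_motions = ["Zoom out", "Dolly zoom", "Pan left",
                  "Arc shot", "Crane shot", "FPV drone shot"]

prompts = [f"{base_prompt}. {motion}" for motion in camera_motions]
# Each prompt would then be sampled, e.g. generate_video(p, ...) for p in prompts.
```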

Evaluation results

We evaluate VideoPoet on text-to-video generation with a variety of benchmarks to compare the results to other approaches. To ensure a neutral evaluation, we ran all models on a wide variation of prompts without cherry-picking examples and asked people to rate their preferences. The figure below highlights the percentage of the time VideoPoet was chosen as the preferred option in green for the following questions.

Text fidelity

User preference ratings for text fidelity, i.e., what percentage of videos are preferred in terms of accurately following a prompt.

Motion interestingness

User preference ratings for motion interestingness, i.e., what percentage of videos are preferred in terms of producing interesting motion.

Based on the above, on average people selected 24–35% of examples from VideoPoet as following prompts better than a competing model vs. 8–11% for competing models. Raters also preferred 41–54% of examples from VideoPoet for more interesting motion than 11–21% for other models.
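For readers running a similar pairwise preference study, the reported percentages are simple vote tallies; a minimal sketch with made-up placeholder votes, not data from the study:

```python
# Turn pairwise preference votes into percentages; the example votes are placeholders.
def preference_split(votes):
    """votes: list of 'ours', 'baseline', or 'tie', one entry per rated pair."""
    n = len(votes)
    return {choice: round(100.0 * votes.count(choice) / n, 1)
            for choice in ("ours", "baseline", "tie")}

print(preference_split(["ours", "ours", "tie", "baseline", "ours", "tie"]))
# {'ours': 50.0, 'baseline': 16.7, 'tie': 33.3}
```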

Through VideoPoet, we have demonstrated LLMs' highly competitive video generation quality across a wide variety of tasks, especially in producing interesting and high-quality motion within videos. Our results suggest the promising potential of LLMs in the field of video generation. For future directions, our framework should be able to support "any-to-any" generation, e.g., extending to text-to-audio, audio-to-video, and video captioning should be possible, among many others.

To view more examples in original quality, see the website demo .

Acknowledgements

This research has been supported by a large body of contributors, including Dan Kondratyuk, Lijun Yu, Xiuye Gu, José Lezama, Jonathan Huang, Rachel Hornung, Hartwig Adam, Hassan Akbari, Yair Alon, Vighnesh Birodkar, Yong Cheng, Ming-Chang Chiu, Josh Dillon, Irfan Essa, Agrim Gupta, Meera Hahn, Anja Hauth, David Hendon, Alonso Martinez, David Minnen, David Ross, Grant Schindler, Mikhail Sirotenko, Kihyuk Sohn, Krishna Somandepalli, Huisheng Wang, Jimmy Yan, Ming-Hsuan Yang, Xuan Yang, Bryan Seybold, and Lu Jiang.

We give special thanks to Alex Siegman, Victor Gomes, and Brendan Jou for managing computing resources. We also give thanks to Aren Jansen, Marco Tagliasacchi, Neil Zeghidour, and John Hershey for audio tokenization and processing, Angad Singh for storyboarding in “Rookie the Raccoon”, Cordelia Schmid for research discussions, David Salesin, Tomas Izo, and Rahul Sukthankar for their support, and Jay Yagnik as architect of the initial concept.

** (a) The Storm on the Sea of Galilee, by Rembrandt, 1633, public domain. (b) Pillars of Creation, by NASA, 2014, public domain. (c) Wanderer above the Sea of Fog, by Caspar David Friedrich, 1818, public domain. (d) Mona Lisa, by Leonardo da Vinci, 1503, public domain.

  • Generative AI
  • Machine Intelligence
  • Machine Perception

  • Methodology
  • Open access
  • Published: 28 July 2018

Research as storytelling: the use of video for mixed methods research

  • Erica B. Walker (ORCID: orcid.org/0000-0001-9258-3036)
  • D. Matthew Boyer

Video Journal of Education and Pedagogy, volume 3, Article number: 8 (2018)


Mixed methods research commonly uses video as a tool for collecting data and capturing reflections from participants, but it is less common to use video as a means for disseminating results. However, video can be a powerful way to share research findings with a broad audience especially when combining the traditions of ethnography, documentary filmmaking, and storytelling.

Our literature review focused on aspects relating to video within mixed methods research that applied to the perspective presented within this paper: the history, affordances, and constraints of using video in research, the application of video within mixed methods design, and the traditions of research as storytelling. We constructed a Mind Map of the current literature to reveal convergent and divergent themes and found that current research focuses on four main properties with regard to video: video as a tool for storytelling/research, properties of the camera/video itself, how video impacts the person/researcher, and methods by which the researcher/viewer consumes video. Through this process, we found that little has been written about how video could be used as a vehicle to present the findings of a study.

From this contextual framework and through examples from our own research, we present current and potential roles of video storytelling in mixed methods research. With digital technologies, video can be used within the context of research not only as data and a tool for analysis, but also to present findings and results in an engaging way.

Conclusions

In conclusion, previous research has focused on using video as a tool for data collection and analysis, but there are emerging opportunities for video to play an increased role in mixed methods research as a tool for the presentation of findings. By leveraging storytelling techniques used in documentary film, while staying true to the analytical methods of the research design, researchers can use video to effectively communicate implications of their work to an audience beyond academics and use video storytelling to disseminate findings to the public.

Using motion pictures to support ethnographic research began in the late nineteenth century, when both fields were early in their development (Henley, 2010; “Using Film in Ethnographic Field Research,” The University of Manchester, n.d.). While technologies have changed dramatically since the 1890s, researchers are still employing visual media to support social science research. Photographic imagery and video footage can be integral aspects of data collection, analysis, and reporting in research studies. As digital cameras have improved in quality, size, and affordability, digital video has become an increasingly useful tool for researchers to gather data, aid in analysis, and present results.

Storytelling, however, has been around much longer than either video or ethnographic research. Using narrative devices to convey a message visually was a staple in the theater of early civilizations and remains an effective tool for engaging an audience today. Within the medium of video, storytelling techniques are an essential part of a documentary filmmaker’s craft. Storytelling can also be a means for researchers to document and present their findings. In addition, multimedia outputs allow for interactions beyond traditional, static text (R. Goldman, 2007; Tobin & Hsueh, 2007). Digital video as a vehicle to share research findings builds on the affordances of film, ethnography, and storytelling to create new avenues for communicating research (Heath, Hindmarsh, & Luff, 2010).

In this study, we look at the current literature regarding the use of video in research and explore how digital video affordances can be applied in the collection and analysis of quantitative and qualitative human subject data. We also investigate how video storytelling can be used for presenting research results. This creates a frame for how data collection and analysis can be crafted to maximize the potential use of video data to create an audiovisual narrative as part of the final deliverables from a study. As researchers we ask the question: have we leveraged the use of video to communicate our work to its fullest potential? By understanding the role of video storytelling, we consider additional ways that video can be used to not only collect and analyze data, but also to present research findings to a broader audience through engaging video storytelling. The intent of this study is to develop a frame that improves our understanding of the theoretical foundations and practical applications of using video in data collection, analysis, and the presentation of research findings.

Literature review

The review of relevant literature includes important aspects for situating this exploration of video research methods: the history, affordances and constraints of using video in research, the use of video in mixed methods design, and the traditions of research as storytelling. Although this overview provides an extensive foundation for understanding video research methods, this is not intended to serve as a meta-analysis of all publications related to video and research methods. Examples of prior work provide a conceptual and operational context for the role of video in mixed methods research and present theoretical and practical insights for engaging in similar studies. Within this context, we examine ethical and logistical/procedural concerns that arise in the design and application of video research methods, as well as the affordances and constraints of integrating video. In the following sections, the frame provided by the literature is used to view practical examples of research using video.

The history of using video in research is founded first in photography, next in film, and more recently in digital video. All three tools provide the ability to create instant artifacts of a moment or period of time. These artifacts become data that can be analyzed at a later date, perhaps in a different place and by a different audience, giving researchers the chance to intricately and repeatedly examine the archive of information contained within. These records “enable access to the fine details of conduct and interaction that are unavailable to more traditional social science methods” (Heath et al., 2010, p. 2).

In social science research, video has been used for a range of purposes and accompanies research observation in many situations. For example, in classroom research, video is used to record a teacher in practice and then used as a guide and prompt to interview the teacher as they reflect upon their practice (e.g., Tobin & Hsueh, 2007). Video captures events from a situated perspective, providing a record that “resists, at least in the first instance, reduction to categories or codes, and thus preserves the original record for repeated scrutiny” (Heath et al., 2010, p. 6). In analysis, these audio-visual recordings allow the social science researcher the chance to reflect on their subjectivities throughout analysis and use the video as a microscope that “allow(s) actions to be observed in a detail not even accessible to the actors themselves” (Knoblauch & Tuma, 2011, p. 417).

Examining the affordances and constraints of video in research provides a researcher the opportunity to examine the value of including video within a study. An affordance of video, when used in research, is that it allows the researcher to see an event through the camera lens either actively or passively and later share what they have seen, or more specifically, the way they saw it (Chalfen, 2011). Cameras can be used to capture an event in three different modes: Responsive, Interactive, and Constructive. Responsive mode is reactive. In this mode, the researcher captures and shows the viewer what is going on in front of the lens but does not directly interfere with the participants or events. Interactive mode puts the filmmaker into the storyline as a participant and allows the viewer to observe the interactions between the researcher and participant. One example of video captured in Interactive mode is an interview. In Constructive mode, the researcher reprocesses the recorded events to create an explicitly interpretive final product through the process of editing the video (MacDougall, 2011). All of these modes, in some way, frame or constrain what is captured and consequently shared with the audience.

Due to the complexity of the classroom-research setting, everything that happens during a study cannot be captured using video, observation, or any other medium. Video footage, like observation, is necessarily selective and has been stripped of the full context of the events, but it does provide a more stable tool for reflection than the ever-changing memories of the researcher and participants (Roth, 2007). Decisions regarding inclusion and exclusion are made by the researcher throughout the entire research process, from the initial framing of the footage to the final edit of the video. Members of the research team should acknowledge how personal bias impacts these decisions and make their choices clear in the research protocol to ensure inclusivity (Miller & Zhou, 2007).

One affordance of video research is that analysis of footage can actually disrupt the initial assumptions of a study. Analysis of video can be standardized or even mechanized by seeking out predetermined codes, but it can also disclose the subjective by revealing the meaning behind actions and not just the actions themselves (S. Goldman & McDermott, 2007; Knoblauch & Tuma, 2011). However, when using subjective analysis the researcher needs to keep in mind that the footage only reveals parts of an event. Ideally, a research team has a member who acts as both a researcher and a filmmaker. That team member can provide an important link between the full context of the event and the narrower viewpoint revealed through the captured footage during the analysis phase.

Although many participants are initially camera-shy, they often find enjoyment from participating in a study that includes video (Tobin & Hsueh, 2007). Video research provides an opportunity for participants to observe themselves and even share their experience with others through viewing and sharing the videos. With increased accessibility of video content online and the ease of sharing videos digitally, it is vital from an ethical and moral perspective that participants understand the study release forms and how their image and words might continue to be used and disseminated for years after the study is completed.

Including video in a research study creates both affordances and constraints regarding the dissemination of results. Finding a journal for a video-based study can be difficult. Traditional journals rely heavily on static text and graphics, but newly-created media journals include rich and engaging data such as video and interactive, web-based visualizations (Heath et al., 2010). In addition, videos can provide opportunities for research results to reach a broader audience outside of the traditional research audience through online channels such as YouTube and Vimeo.

Use of mixed methods with video data collection and analysis can complement the design-based, iterative nature of research that includes human participants. Design-based video research allows for both qualitative and quantitative collection and analysis of data throughout the project, as various events are encapsulated for specific examination as well as analyzed comparatively for changes over time. Design research, in general, provides the structure for implementing work in practice and iterative refinement of design towards achieving research goals (Collins, Joseph, & Bielaczyc, 2004). Using an integrated mixed method design that cycles through qualitative and quantitative analyses as the project progresses gives researchers the opportunity to observe trends and patterns in qualitative data and quantitative frequencies as each round of analysis informs additional insights (Gliner et al., 2009). This integrated use also provides a structure for evaluating project fidelity on an ongoing basis through a range of data points and findings from analyses that are consistent across the project. The ability to revise procedures for data collection, systematic analysis, and presenting work does not change the data being collected, but gives researchers the opportunity to optimize procedural aspects throughout the process.

Research as storytelling refers to the narrative traditions that underpin the use of video methods to analyze in a chronological context and present findings in a story-like timeline. These traditions are evident in ethnographic research methods that journal lived experiences through a period of time and in portraiture methods that use both aesthetic and scientific language to construct a portrait (Barone & Eisner, 2012 ; Heider, 2009 ; Lawrence-Lightfoot, 2005 ; Lenette, Cox & Brough, 2013 ).

In existing research, there is also attention given to the use of film and video documentaries as sources of data (e.g. Chattoo & Das, 2014; Warmington, van Gorp & Grosvenor, 2011); however, our discussion here focuses on using media to capture information and communicate resulting narratives for research purposes. In our work, we promote a perspective on emergent storytelling that develops from data collection and analysis, allowing the research to drive the narrative, and situating it in the context from which the data were collected. We rely on theories and practices of research and storytelling that leverage the affordances of participant observation and interview for the construction of narratives (Bailey & Tilley, 2002; de Carteret, 2008; de Jager, Fogarty & Tewson, 2017; Gallagher, 2011; Hancox, 2017; LeBaron, Jarzabkowski, Pratt & Fetzer, 2017; Lewis, 2011; Meadows, 2003).

The type of storytelling used with research is distinctly different from methods used with documentaries, primarily with the distinction that, while documentary filmmakers can edit their film to a predetermined narrative, research storytelling requires that the data be analyzed and reported within a different set of ethical standards (Dahlstrom, 2014 ; Koehler, 2012 ; Nichols, 2010 ). Although documentary and research storytelling use a similar audiovisual medium, creating a story for research purposes is ethically-bounded by expectations in social science communities for being trustworthy in reporting and analyzing data, especially related to human subjects. Given that researchers using video may not know what footage will be useful for future storytelling, they may need to design their data collection methods to allow for an abundance of video data, which can impact analysis timelines as well. We believe it important to note these differences in the construction of related types of stories to make overt the essential need for research to consider not only analysis but also creation of the reporting narrative when designing and implementing data collection methods.

This study uses existing literature as a frame for understanding and implementing video research methods, then employs this frame as perspective on our own work, illuminating issues related to the use of video in research. In particular, we focus on using video research storytelling techniques to design, implement, and communicate the findings of a research study, providing examples from Dr. Erica Walker’s professional experience as a documentary filmmaker as well as evidence from current and former academic studies. The intent is to improve understanding of the theoretical foundations and practical applications for video research methods and better define how those apply to the construction of story-based video output of research findings.

The study began with a systematic analysis of theories and practices, using interpretive analytic methods, with thematic coding of evidence for conceptual and operational aspects of designing and implementing video research methods. From this information, a frame was constructed that includes foundational aspects of using digital video in research as well as the practical aspects of using video to create narratives with the intent of presenting research findings. We used this frame to interpret aspects of our own video research, identifying evidence that exemplifies aspects of the frame we used.

A primary goal for the analysis of existing literature was to focus on evidentiary data that could provide examples that illuminate the concepts that underpin the understanding of how, when, and why video research methods are useful for a range of publishing and dissemination of transferable knowledge from research. This emphasis on communicating results in both theoretical and practical ways highlighted areas within the analysis for potential contextual similarities between our work and other projects. A central reason for interpreting findings and connecting them with evidence was the need to provide examples that could serve as potentially transferable findings for others using video with their research. Given the need for a fertile environment (Zhao & Frank, 2003) and attention to contextual differences to avoid lethal mutations (Brown & Campione, 1996), we understand that these examples may not work for every situation, but the intent is to provide clear evidence of how video research methods can leverage storytelling to report research findings in a way that is consumable by a broader audience.

In the following section, we present findings from the review of research and practice, along with evidence from our work with video research, connecting the conceptual and operational frame to examples and teasing out aspects from existing literature.

Results and findings

To examine the current literature regarding the use of video in research, we developed a Mind Map that categorizes convergent and divergent themes (see Fig. 1). Although this is far from a complete meta-analysis of video research (notably absent is a comprehensive discussion of ethical concerns regarding video research), the Mind Map focuses on four main properties with regard to video: video as a tool for storytelling/research, properties of the camera/video itself, how video impacts the person/researcher, and methods by which the researcher/viewer consumes video.

Fig. 1 Mind Map of current literature regarding the use of video in mixed methods research. Link to the fully interactive Mind Map: http://clemsongc.com/ebwalker/mindmap/

Video, when used as a tool for research, can document and share ethnographic, epistemic, and storytelling data to participants and to the research team (R. Goldman, 2007 ; Heath et al., 2010 ; Miller & Zhou, 2007 ; Tobin & Hsueh, 2007 ). Much of the research in this area focuses on the properties (both positive and negative) inherent in the camera itself such as how video footage can increase the ability to see and experience the world, but can also act as a selective lens that separates an event from its natural context (S. Goldman & McDermott, 2007 ; Jewitt, n.d .; Knoblauch & Tuma, 2011 ; MacDougall, 2011 ; Miller & Zhou, 2007 ; Roth, 2007 ; Sossi, 2013 ).

Some research speaks to the role of the video-researcher within the context of the study, likening a video researcher to a participant-observer in ethnographic research (Derry, 2007 ; Roth, 2007 ; Sossi, 2013 ). The final category of research within the Mind Map focuses on the process of converting the video from an observation to records to artifact to dataset to pattern (Barron, 2007 ; R. Goldman, 2007 ; Knoblauch & Tuma, 2011 ; Newbury, 2011 ). Through this process of conversion, the video footage itself becomes an integral part of both the data and findings.

The focus throughout current literature was on video as data and the role it plays in collection and analysis during a study, but little has been written about how video could be used as a vehicle to present findings of a study. Current literature also did not address whether video-data could be used as a tool to communicate the findings of the research to a broader audience.

In a recent two-year study, the research team led by Dr. Erica Walker collected several types of video footage with the embedded intent to use video both as data and as a vehicle for telling the story of the study and its findings once concluded (Walker, 2016). The study focused on a multidisciplinary team that converted a higher education Engineering course from lecture-based to game-based learning using the Cognitive Apprenticeship educational framework. The research questions examined the impact that the intervention had on student learning of domain content and twenty-first century skills. Utilizing video as both a data source and a delivery method was built into the methodology from the beginning. Therefore, interviews were conducted with the researchers and instructors before, during, and after the study to document consistency and changes in thoughts and observations as the study progressed. At the conclusion of the study, student participants reflected on their experience directly through individual video interviews. In addition, every class was documented using two static cameras, placed at different angles and framings, and a mobile camera unit to capture closeup shots of student-instructor, student-student, and student-content interactions. This resulted in more than six hundred minutes of interview footage and over five thousand minutes of classroom footage collected for the study.

Video data can be analyzed through quantitative methods (frequencies and word maps) as well as qualitative methods (emergent coding and commonalities versus outliers). Ideally, both methods are used in tandem so that preliminary results can continue to inform the overall analysis as it progresses. In order to capitalize on both methods, each interview was transcribed. The researchers leveraged digital and analog methods of coding such as digital word-search alongside hand coding the printed transcripts. Transcriptions contained timecode notations throughout, so coded segments could quickly be located in the footage and added to a timeline creating preliminary edits.
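Because the transcripts carried timecode notations, locating a coded passage in the raw footage was largely a lookup exercise. Below is a minimal, hypothetical Python sketch of that step: it assumes coded transcript segments are available as simple records with "HH:MM:SS" timecodes and converts them into padded in/out points an editor could drop onto a preliminary timeline. The code names, timecodes, and padding value are illustrative, not the study's actual tooling.

```python
from dataclasses import dataclass

@dataclass
class CodedSegment:
    code: str   # thematic code assigned during analysis, e.g. "collaboration"
    start: str  # timecode where the coded passage begins, "HH:MM:SS"
    end: str    # timecode where the coded passage ends, "HH:MM:SS"

def to_seconds(tc: str) -> int:
    h, m, s = (int(part) for part in tc.split(":"))
    return h * 3600 + m * 60 + s

def clip_list(segments, code, pad=2):
    """Return (in, out) points in seconds for all segments carrying `code`,
    padded slightly so the edited clip keeps a breath of context."""
    return [(max(0, to_seconds(s.start) - pad), to_seconds(s.end) + pad)
            for s in segments if s.code == code]

segments = [
    CodedSegment("collaboration", "00:04:12", "00:04:55"),
    CodedSegment("frustration",   "00:07:30", "00:08:02"),
    CodedSegment("collaboration", "00:21:05", "00:21:40"),
]
print(clip_list(segments, "collaboration"))  # [(250, 297), (1263, 1302)]
```

The resulting in/out pairs can then be handed to whichever editing tool the team uses to assemble a preliminary edit.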

There are many software workflows that allow researchers to code, notate timecode for analysis, and pre-edit footage. In the study, Opportunities for Innovation: Game-based Learning in an Engineering Senior Design Course, NVivo qualitative analysis software was used together with paper-based analog coding. In a current study, also based on a higher education curriculum intervention, we are digitally coding and pre-trimming the footage in Adobe Prelude in addition to analog coding on the printed transcripts. Both workflows offer advantages. NVivo has built-in tools to create frequency maps and export graphs and charts relevant to qualitative analysis whereas Adobe Prelude adds coding notes directly into the footage metadata and connects directly with Adobe Premiere video editing software, which streamlines the editing process.

From our experience with both workflows, Prelude works better for a research team that has multiple members with video experience because it aligns with video industry workflows, implements tools that filmmakers already use, and, through Adobe Team Projects, allows for co-editing and coding from multiple off-site locations. On the other hand, NVivo works better for research teams where members have more separate roles. Because NVivo is a common qualitative-analysis package, team members more familiar with traditional qualitative research can focus on coding while those more familiar with video editing can edit based on those codes, allowing each team member to work within more familiar software workflows.

In both of these studies, assessments regarding storytelling occurred in conjunction with data processing and analysis. As findings were revealed, appropriate clips were grouped into timelines and edited to produce a library of short, topic-driven videos posted online (see Fig. 2). A collection of story-based, topic-driven videos can provide other practitioners and researchers a first-hand account of how a study was designed and conducted, what worked well, recommendations of what to do differently, participant perspectives, study findings, and suggestions for further research. In fact, the videos cover many of the same topics traditionally found in publications, but in a collection of short videos accessible to a broad audience online.

Fig. 2 The YouTube channel created for Opportunities for Innovation: Game-based Learning in an Engineering Senior Design Course containing twenty-four short topical videos. Direct link: https://goo.gl/p8CBGG

By sharing the results of the study publicly online, conversations between practitioners and researchers can develop on a public stage. Research videos are easy to share across social media channels which can broaden the academic audience and potentially open doors for future research collaborations. As more journals move to accept multi-media studies, publicly posted videos provide additional ways to expose both academics and the general public to important study results and create easy access to related resources.

Video research as storytelling: The intersection and divergence of documentary filmmaking and video research

“Film and writing are such different modes of communication, filmmaking is not just a way of communicating the same kinds of knowledge that can be conveyed by an anthropological text. It is a way of creating different knowledge” (MacDougall, 2011 ).

When presenting research, choosing either mode of communication comes with affordances and constraints for the researcher, the participants, and the potential audience.

Many elements of documentary filmmaking, but not all, are relevant and appropriate when applied to gathering data and presenting results in video research. Documentary filmmakers have a specific angle on a story that they want to share with a broad audience. In many cases, they hope to incite action in viewers as a response to the story that unfolds on screen. In order to further their message, documentarians carefully consider the camera shots and interview clips that will convey the story clearly in a similar way to filmmakers in narrative genres. Decisions regarding what to capture and how to use the footage happen throughout the entire filmmaking process: prior to shooting footage (pre-production), while capturing footage (production), and during the editing phase (post-production).

Video researchers can employ many of the same technical skills from documentary filmmaking including interview techniques such as pre-written questions; camera skills such as framing, exposure, and lighting; and editing techniques that help draw a viewer through the storyline (Erickson, 2007 ; Tobin & Hsueh, 2007 ). In both documentary filmmaking and in video research, informed decisions are made about what footage to capture and how to employ editing techniques to produce a compelling final video.

Where video research diverges from documentary filmmaking is in how the researcher thinks about, captures, and processes the footage. Video researchers collect video as data in a more exploratory way whereas documentary filmmakers often look to capture preconceived video that will enable them to tell a specific story. For a documentary filmmaker, certain shots and interview responses are immediately discarded as they do not fit the intended narrative. For video researchers, all the video that is captured throughout a study is data and potentially part of the final research narrative. It is during the editing process (post-production) where the distinction between data and narrative becomes clear.

During post-production, video researchers are looking for clips that clearly reflect the emergent storylines seen in the collective data pool rather than the footage necessary to tell a predetermined story. Emergent storylines can be identified in several ways. Researchers look for divergent statements (where an interview subject makes a unique observation different from other interviewees), convergent statements (where many different interviewees respond similarly), and unexpected statements (where something different from what was expected is revealed) (Knoblauch & Tuma, 2011).

When used thoughtfully, video research provides many sources of rich data. Examples include reflections of the experience, in the direct words of participants, that contain insights provided by body language and tone, an immersive glimpse into the research world as it unfolds, and the potential to capture footage throughout the entire research process rather than just during prescribed times. Video research becomes especially powerful when combined with qualitative and quantitative data from other sources because it can help reveal the context surrounding insights discovered during analysis.

We are not suggesting that video researchers should become documentary filmmakers, but researchers can learn from the stylistic approaches employed in documentary filmmaking. Video researchers implementing these tools can leverage the strengths of short-format video as a storytelling device to share findings with a more diverse audience, increase audience understanding and consumption of findings, and encourage a broader conversation around the research findings.

Implications for future work

As the development of digital media technologies continues to progress, we can expect new functionalities far exceeding current tools. These advancements will continue to expand opportunities for creating and sharing stories through video. By considering the role of video from the first stages of designing a study, researchers can employ methods that capitalize on these emerging technologies. Although they are still rapidly advancing, researchers can look for ways that augmented reality and virtual reality could change data analysis and reporting of research findings. Another emergent area is the use of machine learning and artificial intelligence to rapidly process video footage based on automated thematic coding. Continued advancements in this area could enable researchers to quickly quantify data points in large quantities of footage.
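As a very small illustration of what automated thematic coding might look like, the sketch below assumes a timestamped transcript (pairs of seconds and text) and flags segments whose words match hypothetical theme keyword lists. A production system would replace the keyword lookup with a trained speech-to-text and classification pipeline; every theme, keyword, and timestamp here is an assumption for illustration only.

```python
import re
from collections import defaultdict

# Hypothetical themes and keyword lists; a trained model would replace this lookup.
THEMES = {
    "teamwork":   {"team", "together", "collaborate", "group"},
    "motivation": {"fun", "engaged", "excited", "enjoyed"},
}

def auto_code(transcript):
    """Return {theme: [timestamps in seconds]} wherever a theme keyword appears."""
    hits = defaultdict(list)
    for seconds, text in transcript:
        words = set(re.findall(r"[a-z']+", text.lower()))
        for theme, keywords in THEMES.items():
            if words & keywords:
                hits[theme].append(seconds)
    return dict(hits)

transcript = [
    (125, "We worked together as a team on the prototype"),
    (301, "Honestly it was fun, I was really engaged the whole class"),
]
print(auto_code(transcript))   # {'teamwork': [125], 'motivation': [301]}
```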

In addition to exploring new functionalities, researchers can still use current tools more effectively for capturing data, supporting analysis, and reporting findings. Mobile devices provide ready access to collect periodic video reflections from study participants and even create research vlogs (video blogs) to document and share ongoing studies as they progress. In addition, participant-created videos are rich artifacts for evaluating technical and conceptual knowledge as well as affective responses. Most importantly, as a community, researchers, designers, and documentarians can continue to take strengths from each field to further the reach of important research findings into the public sphere.

In conclusion, current research is focused on using video as a tool for data collection and analysis, but there are new, emerging opportunities for video to play an increased and diversified role in mixed methods research, especially as a tool for the presentation and consumption of findings. By leveraging the storytelling techniques used in documentary filmmaking, while staying true to the analytical methods of research design, researchers can use video to effectively communicate implications of their work to an audience beyond academia and leverage video storytelling to disseminate findings to the public.

Bailey, P. H., & Tilley, S. (2002). Storytelling and the interpretation of meaning in qualitative research. J Adv Nurs, 38(6), 574–583. http://doi.org/10.1046/j.1365-2648.2000.02224.x

Barone, T., & Eisner, E. W. (2012). Arts based research (pp. 1–183). https://doi.org/10.4135/9781452230627

Barron B (2007) Video as a tool to advance understanding of learning and development in peer, family, and other informal learning contexts. Video Research in the Learning Sciences:159–187

Brown AL, Campione JC (1996) Psychological theory and the design of innovative learning environments: on procedures, principles and systems. In: Schauble L, Glaser R (eds) Innovations in learning: new environments for education. Lawrence Erlbaum Associates, Hillsdale, NJ, pp 234–265


Chalfen, R. (2011). Looking Two Ways: Mapping the Social Scientific Study of Visual Culture. In E. Margolis & L. Pauwels (Eds.), The Sage handbook of visual research methods . books.google.com

Chattoo, C. B., & Das, A. (2014). Assessing the Social Impact of Issues-Focused Documentaries: Research Methods and Future Considerations Center for Media & Social Impact, 24. Retrieved from https://www.namac.org/wpcontent/uploads/2015/01/assessing_impact_social_issue_documentaries_cmsi.pdf

Collins, A., Joseph, D., & Bielaczyc, K. (2004). Design research: theoretical and methodological issues. Journal of the Learning Sciences, 13(1), 15–42. https://doi.org/10.1207/s15327809jls1301_2

Dahlstrom, M. F. (2014). Using narratives and storytelling to communicate science with nonexpert audiences. Proc Natl Acad Sci, 111(Supplement_4), 13614–13620. http://doi.org/10.1073/pnas.1320645111

de Carteret, P. (2008). Storytelling as research praxis, and conversations that enabled it to emerge. Int J Qual Stud Educ, 21(3), 235–249. http://doi.org/10.1080/09518390801998296

de Jager A, Fogarty A, Tewson A (2017) Digital storytelling in research: a systematic review. Qual Rep 22(10):2548–2582

Derry SJ (2007) Video research in classroom and teacher learning (Standardize that!). Video Research in the Learning Sciences:305–320

Erickson F (2007) Ways of seeing video: toward a phenomenology of viewing minimally edited footage. Video Research in the Learning Sciences:145–155

Gallagher, K. M. (2011). In search of a theoretical basis for storytelling in education research: story as method. International Journal of Research and Method in Education, 34(1), 49–61. http://doi.org/10.1080/1743727X.2011.552308

Gliner, J. A., Morgan, G. A., & Leech, N. L. (2009). Research Methods in Applied Settings: An Integrated Approach to Design and Analysis, Second Edition . Taylor & Francis

Goldman R (2007) Video representations and the perspectivity framework: epistemology, ethnography, evaluation, and ethics. Video Research in the Learning Sciences 37:3–37

Goldman S, McDermott R (2007) Staying the course with video analysis. Video Research in the Learning Sciences:101–113

Hancox, D. (2017). From subject to collaborator: transmedia storytelling and social research. Convergence, 23(1), 49–60. http://doi.org/10.1177/1354856516675252

Heath, C., Hindmarsh, J., & Luff, P.(2010). Video in Qualitative Research. SAGE Publications. Retrieved from https://market.android.com/details?id=book-MtmViguNi4UC

Heider KG (2009) Ethnographic film: revised edition. University of Texas Press

Henley P (2010) The Adventure of the Real: Jean Rouch and the Craft of Ethnographic Cinema. University of Chicago Press

Jewitt, C. (n.d). An introduction to using video for research - NCRM EPrints Repository. National Centre for Research Methods. Institute for Education, London. Retrieved from http://eprints.ncrm.ac.uk/2259/4/NCRM_workingpaper_0312.pdf

Knoblauch H, Tuma R (2011) Videography: An interpretative approach to video-recorded micro-social interaction. The SAGE Handbook of Visual Research Methods :414–430

Koehler D (2012) Documentary and ethnography: exploring ethical fieldwork models. Elon Journal Undergraduate Research in Communications 3(1):53–59 Retrieved from https://www.elon.edu/docs/e-web/academics/communications/research/vol3no1/EJSpring12_Full.pdf#page=53i

Lawrence-Lightfoot, S. (2005). Reflections on portraiture: a dialogue between art and science. Qualitative Inquiry: QI, 11(1), 3–15. https://doi.org/10.1177/1077800404270955

LeBaron, C., Jarzabkowski, P., Pratt, M. G., & Fetzer, G. (2017). An introduction to video methods in organizational research. Organ Res Methods, 21(2), 109442811774564. http://doi.org/10.1177/1094428117745649

Lenette, C., Cox, L., & Brough, M. (2013). Digital storytelling as a social work tool: learning from ethnographic research with women from refugee backgrounds. Br J Soc Work, 45(3), 988–1005. https://doi.org/10.1093/bjsw/bct184

Lewis, P. J. (2011). Storytelling as research/research as storytelling. Qual Inq, 17(6), 505–510. http://doi.org/10.1177/1077800411409883

MacDougall D (2011) Anthropological filmmaking: an empirical art. In: The SAGE handbook of visual research methods, pp 99–113

Meadows D (2003) Digital storytelling: research-based practice in new media. Visual Communication 2(2):189–193

Miller K, Zhou X (2007) Learning from classroom video: what makes it compelling and what makes it hard. Video Research in the Learning Sciences:321–334

Newbury, D. (2011). Making arguments with images: Visual scholarship and academic publishing. In E. Margolis & L. Pauwels (Eds.), The SAGE handbook of visual research methods

Nichols B (2010) Why are ethical issues central to documentary filmmaking? In: Introduction to documentary, 2nd edn, pp 42–66

Roth W-M (2007) Epistemic mediation: video data as filters for the objectification of teaching by teachers. In: Goldman R, Pea R, Barron B, Derry SJ (eds) Video research in the learning sciences. Lawrence Erlbaum Ass Mahwah, NJ, pp 367–382

Sossi, D. (2013). Digital Icarus? Academic Knowledge Construction and Multimodal Curriculum Development, 339

Tobin J, Hsueh Y (2007) The poetics and pleasures of video ethnography of education. Video Research in the Learning Sciences:77–92

Using Film in Ethnographic Field Research - Methods@Manchester - The University of Manchester. (n.d.). Retrieved March 12, 2018, from https://www.methods.manchester.ac.uk/themes/ethnographic-methods/ethnographic-field-research/

Walker, E. B. (2016). Opportunities for Innovation: Game-based Learning in an Engineering Senior Design Course (PhD). Clemson University. Retrieved from http://tigerprints.clemson.edu/all_dissertations/1805/

Warmington, P., van Gorp, A., & Grosvenor, I. (2011). Education in motion: uses of documentary film in educational research. Paedagog Hist, 47(4), 457–472. https://doi.org/10.1080/00309230.2011.588239

Zhao, Y., & Frank, K. A. (2003). Factors affecting technology uses in schools: an ecological perspective. Am Educ Res J , 40(4), 807–840. https://doi.org/10.3102/00028312040004807


There was no external or internal funding for this study.

Availability of data and materials

Data is available in the Mind Map online which visually combines and interprets the full reference section available at the end of the full document.

Author information

Authors and affiliations.

Department of Graphic Communication, Clemson University, 207 Godfrey Hall, Clemson, SC, 29634, USA

Erica B. Walker

College of Education, Clemson University, 207 Tillman Hall, Clemson, SC, 29634, USA

D. Matthew Boyer


Contributions

The article was co-written by both authors and is based on previous work by Dr. Walker for which Dr. Boyer served as dissertation committee chair. Both authors read and approved the final manuscript.

Corresponding author

Correspondence to Erica B. Walker .

Ethics declarations

Competing interests

Neither of the authors has any competing interests regarding this study or publication.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License ( http://creativecommons.org/licenses/by/4.0/ ), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.


About this article

Cite this article.

Walker, E.B., Boyer, D.M. Research as storytelling: the use of video for mixed methods research. Video J. of Educ. and Pedagogy 3 , 8 (2018). https://doi.org/10.1186/s40990-018-0020-4

Download citation

Received : 10 May 2018

Accepted : 04 July 2018

Published : 28 July 2018

DOI : https://doi.org/10.1186/s40990-018-0020-4


Keywords

  • Mixed methods
  • Storytelling
  • Video research



Elsevier

Engineering Applications of Artificial Intelligence

A systematic review on content-based video retrieval

  • Findings on Dimensionality Reduction (DR) approaches for video retrieval: DR, an important pre-processing procedure for reducing the effects of the curse of dimensionality, is common in automatic processes that learn from data (Han and Kamber, 2011). This curse involves phenomena related to the increasing sparsity of the data as the number of dimensions grows (Liu and Motoda, 2007). In CBVIR, dimensionality reduction can yield a small set of video indexes that is more useful for retrieval and cheaper to extract from new videos than the original set of features. Thus, this paper pays attention to the DR approaches used in relevant papers. Besides dimensionality reduction, this work reviews segmentation, feature extraction, and machine learning approaches due to their frequent use in the literature and their discussion in previous surveys (Puthenputhussery et al., 2017; Priya and Shanmugam, 2013; Hu et al., 2011);
  • Proposal of a review protocol on video indexing and retrieval: by sharing the designed protocol, this work supports other researchers to (1) replicate and update the current review and (2) apply the SR method to other research topics associated with video content. Unlike Cedillo-Hernandez et al. (2014), our protocol reviews papers that consider video retrieval and indexing regardless of the video domain;
  • Publication period: we supplement earlier surveys by focusing on papers published from 2011 to 2018.

Systematic review process

Approaches for video indexing and retrieval

Other findings from the systematic review

Future directions

Final remarks

CRediT authorship contribution statement

Acknowledgments

References (166)

Cited by (57)

LVTIA: A new method for keyphrase extraction from scientific video lectures

The rapid growth in the popularity of e-learning has led to the production of large volumes of scientific lecture videos in various fields. Since researchers in different fields need such information in their research, it is necessary to provide appropriate techniques for retrieving such multimedia information so that it can be accessed easily (Dias, Barrére, & de Souza, 2020; Spolaôr et al., 2020; Turcu, Mihaescu, Heras, Palanca, & Turcu, 2019). Retrieving such information requires applying proper automatic keyword/keyphrase extraction approaches.

Human action recognition using attention based LSTM network with dilated CNN features

Video-based action recognition is an emerging and challenging area of research, particularly for identifying and recognizing actions in a video sequence from a surveillance stream. Action recognition in video has many applications, such as content-based video retrieval [1], surveillance systems for security and privacy purposes [2], human–computer interaction, and activity recognition [3]. Digital content is growing exponentially, so effective AI-based intelligent internet of things (IoT) systems [2,4] are needed for surveillance to monitor and identify human actions and activities.

An overview of violence detection techniques: current challenges and future directions

Content-based video retrieval with prototypes of deep features

A distributed content-based video retrieval system for large datasets

Video big data analytics in the cloud: A reference architecture, survey, opportunities, and open research issues

  • Survey paper
  • Open access
  • Published: 06 June 2019

Intelligent video surveillance: a review through deep learning techniques for crowd analysis

  • G. Sreenu (ORCID: orcid.org/0000-0002-2298-9177) &
  • M. A. Saleem Durai

Journal of Big Data, volume 6, Article number: 48 (2019)


Big data applications occupy much of the space in industry and research. Among the widespread examples of big data, video streams from CCTV cameras are as important as other sources such as social media data, sensor data, agricultural data, medical data, and data from space research. Surveillance videos are a major contributor to unstructured big data. CCTV cameras are installed in all places where security is important, but manual surveillance is tedious and time consuming. Security can be defined differently in different contexts, such as theft identification, violence detection, or the risk of an explosion. In crowded public places, the term security covers almost all types of abnormal events; among them, violence detection is difficult to handle because it involves group activity. Analyzing anomalous or abnormal activity in a crowded video scene is very difficult due to several real-world constraints. This paper provides a deep-rooted survey that starts from object recognition and action recognition, moves to crowd analysis, and finally addresses violence detection in a crowd environment. The majority of the papers reviewed in this survey are based on deep learning techniques, and the various deep learning methods are compared in terms of their algorithms and models. The main focus of the survey is the application of deep learning techniques to detecting the exact count, the people involved, and the activity taking place in a large crowd under all climate conditions. The paper discusses the underlying deep learning implementation technology involved in various crowd video analysis methods. Real-time processing, an important issue yet to be explored further in this field, is also considered; few methods handle all of these issues simultaneously. The issues recognized in existing methods are identified and summarized, and future directions are given to reduce the obstacles identified. The survey provides a bibliographic summary of papers from ScienceDirect, IEEE Xplore, and the ACM Digital Library.

Bibliographic Summary of papers in different digital repositories

A bibliographic summary of papers published in the area of “surveillance video analysis through deep learning” in digital repositories such as ScienceDirect, IEEE Xplore, and the ACM Digital Library is presented graphically.

ScienceDirect

ScienceDirect lists around 1851 papers. Figure 1 shows the year-wise statistics.

Fig. 1 Year-wise paper statistics for “surveillance video analysis by deep learning” in ScienceDirect

Table 1 lists the titles of 25 papers published in the same area.

Table 2 gives the list of journals in ScienceDirect where the above-mentioned papers were published.

Keywords indicate the main disciplines of a paper, so an analysis was conducted of the keywords used in published papers. Table 3 lists the most frequently used keywords and their frequencies.

ACM Digital Library

The ACM Digital Library includes 20,975 papers in the given area. The table below includes the most recently published surveillance video analysis papers in the deep learning field. Table 4 lists the details of papers published in the area.

IEEE Xplore

Table 5 shows the details of papers published in the given area in the IEEE Xplore digital library.

Violence detection among crowd

The summary above treats surveillance video analysis as a general topic. Going deeper into the area, more focus is given to violence detection in crowd behavior analysis.

Table 6 lists papers specific to “violence detection in crowd behavior” from the three digital repositories mentioned above.

Introduction

Artificial intelligence paves the way for computers to think like humans. Machine learning smooths that path further by adding training and learning components. The availability of huge datasets and high-performance computers has led to the concept of deep learning, which automatically extracts features, or the factors of variation, that distinguish objects from one another. Among the various data sources that contribute terabytes of big data, video surveillance data has great social relevance in today’s world. Surveillance data from cameras installed in residential areas, industrial plants, educational institutions, and commercial firms contribute private data, while cameras placed in public places such as city centers, public conveyances, and religious places contribute public data.

Analysis of surveillance videos involves a series of modules such as object recognition, action recognition, and classification of identified actions into categories like anomalous or normal. This survey gives specific focus to solutions based on deep learning architectures. Among the various deep learning architectures, the models commonly used for surveillance analysis are CNNs, auto-encoders, and their combinations. The paper Video surveillance systems-current status and future trends [ 14 ] compares 20 recently published papers in the area of surveillance video analysis. The paper begins by identifying the main outcomes of video analysis, discusses the application areas where surveillance cameras are unavoidable, reveals the current status and trends in video analysis through a literature review, and finally states explicitly the vital points that need more consideration in the near future.

Surveillance video analysis: relevance in the present world

The main objectives identified, which illustrate the relevance of the topic, are listed below.

Continuous monitoring of videos is difficult and tiresome for humans.

Intelligent surveillance video analysis is a solution to this laborious human task.

Intelligence should be visible in all real-world scenarios.

Maximum accuracy is needed in object identification and action recognition.

Tasks like crowd analysis still need a lot of improvement.

The time taken for response generation is highly important in real-world situations.

Prediction of certain movements, actions, or violence is highly useful in emergency situations like stampedes.

Huge amounts of data are available in video form.

The majority of the papers covered in this survey give importance to object recognition and action detection. Some papers use procedures similar to a binary classification of whether an action is anomalous or not. Methods for crowd analysis and violence detection are also included. The application areas identified are covered in the next section.

Application areas identified

The contexts identified are listed as application areas. A major part of existing work provides solutions specific to a particular context.

Traffic signals and main junctions

Residential areas

Crowd pulling meetings

Festivals as part of religious institutions

Inside office buildings

Among the listed contexts, crowd analysis is the most difficult part. All types of actions, behaviors, and movements need to be identified.

Surveillance video data as Big Data

Big video data have evolved in the form of an increasing number of public cameras aimed at public places. A huge number of networked public cameras are positioned worldwide, and they generate a heavy data stream that can be creatively exploited for capturing behaviors. Considering the huge amount of data that can be documented over time, facilities for data warehousing and data analysis are vital. A single high-definition video camera can produce around 10 GB of data per day [ 87 ].

The space needed to store large amounts of surveillance video for a long time is difficult to allot. Instead of keeping the raw data, it is often more useful to keep the analysis results, which reduces the storage space required. Deep learning techniques involve two main components, training and learning, and both can be achieved with the highest accuracy through huge amounts of data.

The main advantages of training with huge amounts of data are that variety in data representation can be accommodated and that the data can be divided evenly into training and testing sets. Various datasets available for analysis are listed below. The datasets include not only video sequences but also frames; since the analysis mainly involves frames extracted from videos, datasets consisting of images are also useful.

The datasets widely used for various kinds of application implementations are listed in Table 7. The list is not specific to a particular application, even though each dataset is listed against an application.

Methods identified/reviewed other than deep learning

The methods identified are mainly classified into two categories: those based on deep learning and those not based on deep learning. This section reviews methods other than deep learning.

SVAS deals with the automatic recognition and deduction of complex events. The event detection procedure consists of two main levels, low level and high level. As a result of low-level analysis, people and objects are detected; the results obtained from the low level are used for high-level analysis, that is, event detection. The architecture proposed in the model includes five main modules. The five sections are

Event model learning

Action model learning

Action detection

Complex event model learning

Complex event detection

The proposed model is the interval-based spatio-temporal model (IBSTM), a hybrid event model. In addition, methods such as threshold models, Bayesian networks, bags of actions, highly cohesive intervals, and Markov logic networks are used.

The SVAS method can be improved to deal with moving-camera and multi-camera datasets. Further enhancements are needed in dealing with complex events, specifically in areas like calibration and noise elimination.

Multiple anomalous activity detection in videos [ 88 ] is a rule-based system. The features are identified as motion patterns, and detection of anomalous events is done either by training the system or by following the dominant set property.

Under the concept of the dominant set, events are detected as normal based on dominant behavior, and anomalous events are decided based on less dominant behavior. The advantage of a rule-based system is that it is easy to recognize new events by modifying some rules. The main steps involved in a recognition system are

Pre processing

Feature extraction

Object tracking

Behavior understanding

As a preprocessing step, video segmentation is used. Background modeling is implemented through a Gaussian Mixture Model (GMM), while object recognition requires external rules. The system is implemented in Matlab 2014. The areas where more concentration is further needed are doubtful activities and situations where multiple objects overlap.
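The cited system was built in Matlab, but the GMM background-modeling step it describes can be sketched with OpenCV's MOG2 background subtractor. The snippet below is an illustrative Python equivalent, not the authors' implementation; the file name and area threshold are placeholders.

```python
import cv2

cap = cv2.VideoCapture("surveillance.mp4")           # placeholder file name
subtractor = cv2.createBackgroundSubtractorMOG2(history=500,
                                                varThreshold=16,
                                                detectShadows=True)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = subtractor.apply(frame)                    # per-pixel foreground mask from the GMM
    _, mask = cv2.threshold(mask, 200, 255, cv2.THRESH_BINARY)  # drop shadow pixels (value 127)
    mask = cv2.medianBlur(mask, 5)                    # suppress speckle noise
    # connected components give candidate moving objects for later rule-based recognition
    n, labels, stats, _ = cv2.connectedComponentsWithStats(mask, connectivity=8)
    movers = [tuple(stats[i][:4]) for i in range(1, n)
              if stats[i][cv2.CC_STAT_AREA] > 200]    # (x, y, w, h) boxes above a size threshold
cap.release()
```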

Mining anomalous events against frequent sequences in surveillance videos from commercial environments [ 89 ] focuses on abnormal events linked with frequent chains of events. The main benefit of identifying such events is the early deployment of resources in particular areas. The implementation is done using Matlab; the inputs are already-noticed events and identified frequent series of events. The main investigation in this method is to recognize events that are implausible to follow a given sequential pattern while fulfilling the user-identified parameters.

The method gives more focus to event-level analysis, and it would be interesting to pay attention to the entity level and action level as well. At the same time, going to such a granular level makes the process costly.

Video feature descriptor combining motion and appearance cues with length-invariant characteristics [ 90 ] is a feature descriptor. Many trajectory-based methods have been used in abundant installations, but those methods face problems related to occlusions. As a solution, this work proposes a feature descriptor based on optical flow.

As per the algorithm, the training set is divided into snippets. From each snippet, images are extracted and the optical flow is calculated. The covariance is then computed from the optical flow, and a one-class SVM is used to learn from the samples. The same procedure is performed for testing.
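A rough sketch of that pipeline, under the assumption that each training snippet is a short list of consecutive grayscale frames, might compute dense optical flow with OpenCV, summarize it as a covariance descriptor, and fit a one-class SVM from scikit-learn. Parameter values are illustrative rather than those of the cited work.

```python
import cv2
import numpy as np
from sklearn.svm import OneClassSVM

def snippet_descriptor(frames):
    """Flatten the covariance of (dx, dy) flow vectors over one snippet."""
    flows = []
    for prev, nxt in zip(frames[:-1], frames[1:]):
        flow = cv2.calcOpticalFlowFarneback(prev, nxt, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        flows.append(flow.reshape(-1, 2))          # one (dx, dy) row per pixel
    cov = np.cov(np.vstack(flows), rowvar=False)   # 2x2 covariance of the flow field
    return cov.flatten()

def train_normalcy_model(training_snippets):
    X = np.array([snippet_descriptor(s) for s in training_snippets])
    model = OneClassSVM(kernel="rbf", nu=0.1, gamma="scale").fit(X)
    return model   # at test time, model.predict(descriptor) == -1 flags an anomaly
```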

The model can be extended in the future by handling local abnormal event detection through the proposed feature, which is related to the objectness method.

Multiple hierarchical Dirichlet processes for anomaly detection in traffic [ 91 ] is mainly for understanding the situation in real-world traffic. The anomalies are mainly due to global patterns, which involve the entire frame, rather than local patterns. The concept of superpixels is included: superpixels are grouped into regions of interest, and an optical flow based method is used for calculating motion in each superpixel. Points of interest are then extracted in active superpixels and tracked by the Kanade–Lucas–Tomasi (KLT) tracker.
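The interest-point tracking stage can be illustrated with OpenCV's pyramidal Lucas–Kanade implementation of the KLT tracker, as in the hedged sketch below; the superpixel-based region selection of the cited method is omitted and the whole frame is used instead.

```python
import cv2
import numpy as np

def track_points(prev_gray, next_gray):
    """Detect corner points in the previous frame and track them into the next."""
    p0 = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                 qualityLevel=0.01, minDistance=7)
    if p0 is None:
        return np.empty((0, 2)), np.empty((0, 2))
    p1, status, _ = cv2.calcOpticalFlowPyrLK(
        prev_gray, next_gray, p0, None,
        winSize=(15, 15), maxLevel=2,
        criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 0.03))
    good = status.ravel() == 1                      # keep points tracked successfully
    return p0[good].reshape(-1, 2), p1[good].reshape(-1, 2)

# The per-point displacement (p1 - p0) supplies the motion cues that feed the
# anomaly model described above.
```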

The method handles videos involving complex patterns better and at a lower cost, but it does not mention videos taken in the rainy season or in bad weather conditions.

Intelligent video surveillance beyond robust background modeling [ 92 ] handles complex environments with sudden illumination changes and also reduces false alerts. There are two main components, the IDS and the PSD.

In the first stage, the intruder detection system (IDS) detects objects; a classifier verifies the results and identifies scenes causing problems. In the second stage, the problematic scene descriptor (PSD) handles the positives generated by the IDS. Global features are used to avoid false positives from the IDS.

Though the method deals with complex scenes, it does not mention bad weather conditions.

Towards abnormal trajectory and event detection in video surveillance [ 93 ] works as an integrated pipeline. Existing methods use either trajectory-based or pixel-based approaches, but this proposal incorporates both. The proposal includes components such as

Object and group tracking

Grid based analysis

Trajectory filtering

Abnormal behavior detection using actions descriptors

The method can identify abnormal behavior in both individuals and groups. It could be enhanced by adapting it to work in a real-time environment.

RIMOC: a feature to discriminate unstructured motions: application to violence detection for video surveillance [ 94 ]. There is no unique definition of violent behavior; such behaviors show large variances in body poses. The method works by taking the eigenvalues of histograms of optical flow.

The input video undergoes dense sampling, and local spatio-temporal volumes (STVs) are created around each sampled point. The frames of each STV are coded as histograms of optical flow, from which eigenvalues are computed. The papers already published in the surveillance area span a large set; among them, methods that are unique in either their implementation or the application for which they are proposed are listed in Table 8.

The methods described and listed above are able to perform the following steps

Object detection

Object discrimination

Action recognition

However, these methods are generally not efficient in selecting good features. The lag identified in these methods was the absence of automatic feature identification, an issue that can be solved by applying concepts of deep learning.

The evolution of artificial intelligence from rule-based systems to automatic feature identification passes through machine learning, representation learning, and finally deep learning.

Real-time processing in video analysis

Real-time violence detection framework for a football stadium comprising big data analysis and deep learning through bidirectional LSTM [ 103 ] predicts violent crowd behavior in real time. The real-time processing speed is achieved through the Spark framework. The model architecture includes the Apache Spark framework, Spark Streaming, the Histogram of Oriented Gradients (HOG) function, and a bidirectional LSTM (BDLSTM). The model takes streams of videos from diverse sources as input. The videos are converted into non-overlapping frames, and features are extracted from these groups of frames through the HOG function. The images are manually modeled into different groups, and the BDLSTM is trained on all of them. The Spark framework handles the streaming data in micro-batch mode; both stream and batch processing are supported.
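To make the HOG-plus-bidirectional-LSTM portion of that description concrete, the sketch below extracts HOG features per frame with scikit-image and classifies fixed-length clips with a Keras bidirectional LSTM. The Spark streaming layer is omitted, and the sequence length, feature parameters, and network sizes are illustrative assumptions rather than the published configuration.

```python
import numpy as np
from skimage.feature import hog
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Bidirectional, LSTM, Dense

SEQ_LEN = 16   # non-overlapping frames per clip (assumed value)

def clip_to_hog_sequence(frames):
    """frames: list of SEQ_LEN grayscale images -> (SEQ_LEN, feat_dim) array."""
    return np.stack([hog(f, pixels_per_cell=(16, 16), cells_per_block=(2, 2))
                     for f in frames])

def build_model(feat_dim):
    model = Sequential([
        Bidirectional(LSTM(64), input_shape=(SEQ_LEN, feat_dim)),
        Dense(1, activation="sigmoid"),   # violent vs. non-violent clip
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

# Training would call model.fit(X, y), where X has shape
# (num_clips, SEQ_LEN, feat_dim) and y holds binary violence labels.
```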

Intelligent video surveillance for real-time detection of suicide attempts [ 104 ] is an effort to prevent suicide by hanging in prisons. The method uses the depth streams offered by an RGB-D camera, and the body joint points are analyzed to represent suicidal behavior.

Spatio-temporal texture modeling for real-time crowd anomaly detection [ 105 ]. Spatio-temporal texture is a combination of spatio-temporal slices and spatio-temporal volumes. The information present in these slices is abstracted through wavelet transforms, and a Gaussian approximation model is applied to the texture patterns to distinguish normal behaviors from abnormal behaviors.

Deep learning models in surveillance

Deep convolutional framework for abnormal behavior detection in a smart surveillance system [ 106 ] includes three sections.

Human subject detection and discrimination

A posture classification module

An abnormal behavior detection module

The models used for the above three sections are, correspondingly:

You only look once (YOLO) network

Long short-term memory (LSTM)

For object discrimination, a Kalman filter based object entity discrimination algorithm is used. The posture classification module recognizes 10 types of poses. The RNN uses back propagation through time (BPTT) to update its weights.

The main issue identified in the method is that similar activities like pointing and punching are difficult to distinguish.

Detecting anomalous events in videos by learning deep representations of appearance and motion [ 107 ] proposes a new model named AMDN. The model automatically learns feature representations, using stacked de-noising auto-encoders to learn appearance and motion features separately and jointly. After learning, multiple one-class SVMs are trained; these SVMs predict an anomaly score for each input, and the scores are later combined to detect abnormal events. A double fusion framework is used. The computational overhead at testing time is too high for real-time processing.

A study of deep convolutional auto encoders for anomaly detection in videos [ 12 ] proposes a structure that is a mixture of auto-encoders and CNNs. An auto-encoder includes an encoder part and a decoder part: the encoder includes convolutional and pooling layers, and the decoder includes deconvolutional and unpooling layers. The architecture allows a combination of low-level frames with high-level appearance and motion features. Anomaly scores are represented through reconstruction errors.
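The reconstruction-error idea behind such architectures can be sketched with a small Keras convolutional autoencoder: train it on normal frames only, then score test frames by their mean squared reconstruction error. The layer sizes below are illustrative and deliberately simpler than the cited design.

```python
import numpy as np
from tensorflow.keras import layers, models

def build_autoencoder(h=128, w=128):
    inp = layers.Input(shape=(h, w, 1))
    x = layers.Conv2D(32, 3, activation="relu", padding="same")(inp)
    x = layers.MaxPooling2D(2)(x)
    x = layers.Conv2D(16, 3, activation="relu", padding="same")(x)
    x = layers.MaxPooling2D(2)(x)                        # encoder
    x = layers.Conv2D(16, 3, activation="relu", padding="same")(x)
    x = layers.UpSampling2D(2)(x)
    x = layers.Conv2D(32, 3, activation="relu", padding="same")(x)
    x = layers.UpSampling2D(2)(x)                        # decoder
    out = layers.Conv2D(1, 3, activation="sigmoid", padding="same")(x)
    model = models.Model(inp, out)
    model.compile(optimizer="adam", loss="mse")
    return model

def anomaly_scores(model, frames):
    """frames: (n, h, w, 1) array scaled to [0, 1]; higher score = more anomalous."""
    recon = model.predict(frames, verbose=0)
    return np.mean((frames - recon) ** 2, axis=(1, 2, 3))
```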

Going deeper with convolutions [ 108 ] suggests improvements over traditional neural networks. Fully connected layers are replaced by sparse ones by adding sparsity into the architecture. The paper suggests dimensionality reduction, which helps to reduce the increasing demand for computational resources; the computational reduction happens by applying 1 × 1 convolutions before the 5 × 5 convolutions. The method does not mention execution time, and no conclusion can be drawn about the crowd size the method can handle successfully.
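The 1 × 1 reduction trick can be shown in a few lines of Keras: compress the channel depth with a cheap 1 × 1 convolution before the expensive 5 × 5 convolution, as in the Inception-style branch sketched below with illustrative channel counts.

```python
from tensorflow.keras import layers

def reduced_5x5_branch(x, squeeze=32, expand=64):
    # cheap 1x1 convolution shrinks the channel depth first
    x = layers.Conv2D(squeeze, 1, activation="relu", padding="same")(x)
    # the expensive 5x5 convolution then operates on far fewer channels
    return layers.Conv2D(expand, 5, activation="relu", padding="same")(x)

# For a 28x28x256 input, the 1x1 squeeze cuts the multiply count from roughly
# 28*28*5*5*256*64 to 28*28*(1*1*256*32 + 5*5*32*64), about a 7x saving.
```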

Deep learning for visual understanding: a review [ 109 ] surveys the fundamental models in deep learning. The models and techniques described are CNNs, RBMs, autoencoders and sparse coding. The paper also mentions drawbacks of deep learning models, such as the underlying theory not yet being well understood.

Deep learning methods other than the ones discussed above are listed in the following Table  9 .

The methods reviewed in the above sections are good at automatic feature generation. All of them handle individual entities, and groups of limited size, well.

The majority of real-world problems, however, arise in crowds, and the above-mentioned methods are not effective at handling crowd scenes. The next section reviews intelligent methods for analyzing crowd video scenes.

Review in the field of crowd analysis

The review includes methods with a deep learning background as well as methods without one.

Spatial–temporal convolutional neural networks for anomaly detection and localization in crowded scenes [ 114 ] shows that crowd analysis is challenging for the following reasons:

Large number of pedestrians

Close proximity

Volatility of individual appearance

Frequent partial occlusions

Irregular motion pattern in crowd

Dangerous activities like crowd panic

Frame level and pixel level detection

The paper suggests an optical-flow-based solution. The CNN has eight layers and training is based on BVLC Caffe. Parameters are randomly initialized and the system is trained through stochastic gradient descent with backpropagation. The implementation considers four different datasets: UCSD, UMN, Subway and U-turn. For UCSD, both frame-level and pixel-level criteria are used; the frame-level criterion concentrates on the temporal domain, while the pixel-level criterion considers both the spatial and the temporal domain. The metrics used to evaluate performance include the Equal Error Rate (EER) and the Detection Rate (DR).
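For readers unfamiliar with the metric, the frame-level EER can be computed from per-frame anomaly scores and ground-truth labels as the point on the ROC curve where the false positive rate equals the false negative rate; the labels and scores below are made up for illustration.

```python
import numpy as np
from sklearn.metrics import roc_curve

def equal_error_rate(labels, scores):
    fpr, tpr, _ = roc_curve(labels, scores)
    fnr = 1 - tpr
    idx = np.nanargmin(np.abs(fpr - fnr))      # point where FPR and FNR are closest
    return (fpr[idx] + fnr[idx]) / 2.0

labels = np.array([0, 0, 1, 1, 0, 1])          # 1 = abnormal frame
scores = np.array([0.1, 0.4, 0.8, 0.7, 0.2, 0.9])
print(equal_error_rate(labels, scores))
```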

Online real-time crowd behavior detection in video sequences [ 115 ] suggests FSCB, which performs behavior detection through feature tracking and image segmentation. The procedure involves the following steps:

Feature detection and temporal filtering

Image segmentation and blob extraction

Activity detection

Activity map

Activity analysis

The main advantage is that the method needs no training stage. It is quantitatively analyzed through ROC curve generation, and the computational speed is evaluated through the frame rate. The datasets considered for the experiments include UMN, PETS2009, AGORASET and Rome Marathon.

Deep learning for scene-independent crowd analysis [ 82 ] proposes a scene-independent method that includes the following procedures:

Crowd segmentation and detection

Crowd tracking

Crowd counting

Pedestrian travelling time estimation

Crowd attribute recognition

Crowd behavior analysis

Abnormality detection in a crowd

Attribute recognition is done through a slicing CNN: a 2D CNN model learns appearance features, which are represented as a cuboid within which three temporal filters are identified; a classifier is then applied to the concatenated feature vector extracted from the cuboid. Crowd counting and crowd density estimation are treated as a regression problem. Crowd attribute recognition is applied to the WWW Crowd dataset, and the evaluation metrics used are AUC and AP.

The analysis of high-density crowds in videos [ 80 ] describes methods such as data-driven crowd analysis and density-aware tracking. Data-driven analysis learns crowd motion patterns offline from a large collection of crowd videos; the learned patterns can then be applied or transferred to new applications. The solution is a two-step procedure: global crowded scene matching followed by local crowd patch matching. Figure 2 illustrates this two-step procedure.

Figure 2. a Test video, b results of global matching, c a query crowd patch, d matching crowd patches [ 80 ]

The database selected for experimental evaluation includes 520 unique videos at 720 × 480 resolution. The main evaluation goal is to track unusual and unexpected actions of individuals in a crowd. The experiments show that data-driven tracking outperforms batch-mode tracking. Density-based person detection and tracking includes steps such as a baseline detector, geometric filtering and tracking with a density-aware detector.

A review on classifying abnormal behavior in crowd scenes [ 77 ] mainly covers four key approaches: the Hidden Markov Model (HMM), GMM, optical flow and STT. GMM itself is enhanced with different techniques to capture abnormal behaviours. The enhanced versions of GMM are:

GMM and Markov random field

Gaussian–Poisson mixture model

GMM and support vector machine

The GMM architecture includes components such as a local descriptor, a global descriptor, classifiers and finally a fusion strategy. The distinction between normal and abnormal behaviour is evaluated with the Mahalanobis distance method. The GMM–MRF model is divided into two sections: the first identifies motion patterns through GMM, while crowd context modelling is done through MRF. GPMM adds one extra feature, the count of occurrences of the observed behaviour, and EM is used for training at a later stage of GPMM. GMM–SVM incorporates features such as crowd collectiveness, crowd density and crowd conflict for abnormality detection.
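A minimal sketch of the GMM part of such pipelines follows; the descriptors, the number of mixture components and the threshold are assumptions for illustration. A mixture is fitted to descriptors of normal behaviour, and test descriptors with low likelihood under the mixture are flagged as abnormal.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
normal_descriptors = rng.normal(size=(1000, 16))     # stand-in for local/global descriptors

gmm = GaussianMixture(n_components=4, covariance_type="full", random_state=0)
gmm.fit(normal_descriptors)

threshold = np.percentile(gmm.score_samples(normal_descriptors), 1)   # 1st percentile of normal data
test_descriptors = rng.normal(size=(10, 16))
is_abnormal = gmm.score_samples(test_descriptors) < threshold
print(is_abnormal)
```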

HMM also has variants such as:

HM and OSVMs

The Hidden Markov Model is a density-aware detection method used to detect motion-based abnormality. The method generates a foreground mask and a perspective mask through an ORB detector. GM-HMM involves four major steps. In the first step, GMBM is used to identify foreground pixels, which leads to blob generation. In the second stage, PCA–HOG and motion-HOG are used for feature extraction. The third stage applies k-means clustering to separately cluster the features generated through PCA–HOG and motion-HOG. In the final stage, the HMM processes continuous information about the moving target through the application of GM. In SLT-HMM, short local trajectories are used along with the HMM to achieve better localization of moving objects. MOHMM uses KLT in the first phase to generate trajectories, on which clustering is applied; the second phase uses MOHMM to represent the trajectories and define usual and unusual frames. OSVM solves the nonlinearity problem by using kernel functions to map high-dimensional features into a linear space.

In the optical-flow-based methods, the enhancements are categorized into techniques such as HOFH, HOFME, HMOFP and MOFE.

In HOFH, video frames are divided into several same-sized patches, optical flow is extracted and divided into eight directions, and expectation and variance features are used to characterize the optical flow between frames. The HOFME descriptor is used at the final stage of abnormal behaviour detection: first the frame difference is calculated, then the optical flow pattern is extracted, and finally the spatio-temporal description using HOFME is completed. HMOFP extracts optical flow from each frame and divides it into patches; the optical flows are segmented into a number of bins and the maximum-amplitude flows are concatenated to form the global HMOFP. The MOFE method converts frames into blobs and extracts the optical flow in all blobs, which is then clustered into different groups. In STT, crowd tracking and abnormal behaviour detection are done by combining the spatial and temporal dimensions of the features.
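The common building block of these descriptors, a histogram of optical flow orientations, can be sketched as follows; the Farneback parameters and the eight-bin layout are typical defaults assumed here, not the exact settings of any single method.

```python
import cv2
import numpy as np

def flow_orientation_histogram(prev_gray, next_gray, bins=8):
    flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])      # magnitude and angle (radians)
    hist, _ = np.histogram(ang, bins=bins, range=(0, 2 * np.pi), weights=mag)
    return hist / (hist.sum() + 1e-8)                           # normalized 8-direction descriptor

prev = np.random.randint(0, 255, (120, 160), dtype=np.uint8)
nxt = np.random.randint(0, 255, (120, 160), dtype=np.uint8)
print(flow_orientation_histogram(prev, nxt))
```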

Crowd behaviour analysis from fixed and moving cameras [ 78 ] covers topics such as microscopic and macroscopic crowd modeling, crowd behavior and crowd density analysis, and datasets for crowd behavior analysis. Large crowds are handled through macroscopic approaches, in which the agents are treated as a whole, whereas in microscopic approaches agents are handled individually. Motion information to represent the crowd can be collected through fixed and moving cameras. CNN-based methods such as end-to-end deep CNN, the Hydra-CNN architecture, switching CNN, the cascade CNN architecture, 3D CNN and spatio-temporal CNN are discussed for crowd behaviour analysis. Different datasets useful specifically for crowd behaviour analysis are also described in the chapter. The metrics used are MOTA (multiple person tracker accuracy) and MOTP (multiple person tracker precision), which consider the multi-target scenarios usually present in crowd scenes. The datasets used for experimental evaluation include UCSD, Violent Flows, CUHK, UCF50, Rodriguez's, The Mall and the WorldExpo dataset.
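For reference, MOTA is computed from per-sequence counts of misses (false negatives), false positives and identity switches relative to the number of ground-truth objects; the numbers below are invented for illustration.

```python
def mota(false_negatives, false_positives, id_switches, num_ground_truth):
    return 1.0 - (false_negatives + false_positives + id_switches) / float(num_ground_truth)

print(mota(false_negatives=12, false_positives=7, id_switches=3, num_ground_truth=200))  # 0.89
```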

Zero-shot crowd behavior recognition [ 79 ] suggests recognizers that require little or no training data. The basic idea behind the approach is attribute–context co-occurrence: behavioural attributes are predicted based on their relationship with known attributes. The method encompasses steps such as probabilistic zero-shot prediction, which calculates the conditional probability relating known attributes to the appropriate target attribute, followed by learning attribute relatedness from text corpora and context learning from visual co-occurrence. Figure 3 illustrates the results.

Figure 3. Demonstration of crowd videos ranked in accordance with prediction values [ 79 ]

Computer vision based crowd disaster avoidance system: a survey [ 81 ] covers different perspectives of crowd scene analysis, such as the number of cameras employed and the target of interest, along with crowd behavior analysis, people counting, crowd density estimation, person re-identification, crowd evacuation, forensic analysis of crowd disasters and the computational aspects of crowd analysis. A brief summary of benchmark datasets is also given.

Fast face detection in violent video scenes [ 83 ] suggests an architecture with three steps: a violent scene detector, a normalization algorithm and finally a face detector. The ViF descriptor, together with the Horn–Schunck optical flow algorithm, is used for violent scene detection. The normalization procedure includes gamma intensity correction, difference of Gaussians, local histogram coincidence and local normal distribution. Face detection involves two main stages: the first segments skin regions and the second checks each component of the face.

Rejecting motion outliers for efficient crowd anomaly detection [ 54 ] provides a solution consisting of two phases: feature extraction and anomaly classification. Feature extraction is based on flow. The steps in the pipeline are: the input video is divided into frames, the frames are divided into superpixels, a histogram is extracted for each superpixel, the histograms are aggregated spatially, and finally the combined histograms from consecutive frames are concatenated to obtain the final feature. Anomalies can then be detected with existing classification algorithms. The implementation is done on the UCSD dataset, which has two subsets with resolutions of 158 × 238 and 240 × 360. Normal behavior is used to train k-means and KUGDA, while normal and abnormal behavior together are used to train a linear SVM. The hardware setup includes an Artix-7 xc7a200t FPGA from Xilinx, Xilinx IST and the XPower Analyzer.

Deep metric learning for crowdedness regression [ 84 ] includes a deep network model in which feature learning and distance measurement are done concurrently. Metric learning is used to learn a fine-grained distance measure. The proposed model is implemented with the TensorFlow package, the rectified linear unit is used as the activation function, and training is done with gradient descent. Performance is evaluated through the mean squared error and the mean absolute error. The WorldExpo dataset and the ShanghaiTech dataset are used for experimental evaluation.
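The two evaluation metrics named above are straightforward to compute from predicted and true crowd counts; the counts below are made up for illustration.

```python
import numpy as np

def mse(y_true, y_pred):
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

def mae(y_true, y_pred):
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

true_counts = [120, 85, 240]
pred_counts = [130, 80, 225]
print(mse(true_counts, pred_counts), mae(true_counts, pred_counts))   # ~116.67 and 10.0
```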

A deep spatiotemporal perspective for understanding crowd behavior [ 61 ] combines convolution layers and long short-term memory. Spatial information is captured through the convolution layers and temporal motion dynamics are captured through the LSTM. The method forecasts the pedestrian path, estimates the destination and finally categorizes the behavior of individuals according to their motion pattern. The path forecasting technique includes two stacked ConvLSTM layers with 128 hidden states each; the ConvLSTM kernel size is 3 × 3, with a stride of 1 and zero padding. The model also uses a single convolution layer with a 1 × 1 kernel. Crowd behavior classification is achieved through a combination of three layers, namely an average spatial pooling layer, a fully connected layer and a softmax layer.
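The classification head described above can be pictured with a short PyTorch sketch; the ConvLSTM stack itself is omitted, and the 128-channel feature map, spatial size and five behaviour classes are assumptions for illustration.

```python
import torch
import torch.nn as nn

num_classes = 5
head = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),        # average spatial pooling layer
    nn.Flatten(),
    nn.Linear(128, num_classes),    # fully connected layer over 128 hidden channels
    nn.Softmax(dim=1),              # softmax layer producing class probabilities
)

convlstm_features = torch.rand(4, 128, 32, 32)   # stand-in for ConvLSTM output maps
print(head(convlstm_features).shape)             # torch.Size([4, 5])
```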

Crowded scene understanding by deeply learned volumetric slices [ 85 ] suggests a deep model and different fusion approaches. The architecture involves convolution layers, a global sum pooling layer and fully connected layers, and requires slice fusion and weight sharing schemes. A new multitask-learning deep model is proposed to learn motion features and appearance features equally and to join them successfully. A new concept of crowd motion channels is designed as input to the model: the motion channels analyze the temporal progress of the contents of crowd videos and are inspired by temporal slices that clearly demonstrate this temporal growth. The authors also conduct wide-ranging evaluations with multiple deep structures using various data fusion and weight-sharing schemes to learn temporal features. The network is configured with convolutional, pooling and fully connected layers, with activation functions such as the rectified linear unit and the sigmoid function. Three different kinds of slice fusion techniques are applied to measure the effectiveness of the proposed input channels.

Crowd scene understanding from video: a survey [ 86 ] mainly deals with crowd counting. Approaches to crowd counting are categorized into six groups: pixel-level analysis, texture-level analysis, object-level analysis, line counting, density mapping, and joint detection and counting. Edge features are analyzed through pixel-level analysis, while image patches are analysed through texture-level analysis. Object-level analysis, which identifies individual subjects in a scene, is more accurate than pixel- and texture-level analysis. Line counting is used to count the people who have crossed a particular line.

Table  10 lists some more crowd analysis methods.

Results observed from the survey and future directions

The accuracy analysis conducted for some of the above-discussed methods, based on various evaluation criteria such as AUC, precision and recall, is discussed below.

Rejecting motion outliers for efficient crowd anomaly detection [ 54 ] compares different methods, as shown in Fig.  4 . KUGDA is the classifier proposed in that work [ 54 ].

Figure 4. Comparing KUGDA with k-means [ 54 ]

Fast face detection in violent video scenes [ 83 ] uses the ViF descriptor for violent scene detection. Figure 5 shows the evaluation of an SVM classifier using a ROC curve.

Figure 5. Receiver operating characteristics of a classifier with the ViF descriptor [ 83 ]

Figure  6 compares the detection performance of different methods [ 80 ]. The comparison shows the improvement of the density-aware detector over the other methods.

Figure 6. Comparing the detection performance of the density-aware detector with different methods [ 80 ]

From the analysis of existing methods, the following shortcomings were identified. Real-world problems involve objectives such as:

Time complexity

Bad weather conditions

Real world dynamics

Overlapping of objects

Existing methods handle these problems separately; no method addresses all of the objectives in a single proposal.

For effective intelligent crowd video analysis in real time, a method should be able to address all of these problems. Traditional methods are not able to generate efficient, economical solutions in a time-bounded manner.

The availability of high-performance computational resources such as GPUs allows the implementation of deep learning based solutions for fast processing of big data. Existing deep learning architectures or models can be combined by retaining their good features and removing unwanted ones.

The paper reviews intelligent surveillance video analysis techniques, and the reviewed papers cover a wide variety of applications. The techniques, tools and datasets identified are listed in tables. The survey begins with video surveillance analysis from a general perspective and then moves towards crowd analysis. Crowd analysis is difficult because crowd size is large and dynamic in real-world scenarios, and identifying each entity and its behavior is a difficult task. Methods for analyzing crowd behavior were discussed, and the issues identified in existing methods were listed as future directions towards efficient solutions.

Abbreviations

SVAS: Surveillance Video Analysis System
Interval-Based Spatio-Temporal Model
KLT: Kanade–Lucas–Tomasi
GMM: Gaussian Mixture Model
SVM: Support Vector Machine
DAAL: Deep activation-based attribute learning
HMM: Hidden Markov Model
YOLO: You only look once
LSTM: Long short-term memory
AUC: Area under the curve
ViF: Violent flow descriptor

Kardas K, Cicekli NK. SVAS: surveillance video analysis system. Expert Syst Appl. 2017;89:343–61.


Wang Y, Shuai Y, Zhu Y, Zhang J, An P. Jointly learning perceptually heterogeneous features for blind 3D video quality assessment. Neurocomputing. 2019;332:298–304 (ISSN 0925-2312) .

Tzelepis C, Galanopoulos D, Mezaris V, Patras I. Learning to detect video events from zero or very few video examples. Image Vis Comput. 2016;53:35–44 (ISSN 0262-8856) .

Fakhar B, Kanan HR, Behrad A. Learning an event-oriented and discriminative dictionary based on an adaptive label-consistent K-SVD method for event detection in soccer videos. J Vis Commun Image Represent. 2018;55:489–503 (ISSN 1047-3203) .

Luo X, Li H, Cao D, Yu Y, Yang X, Huang T. Towards efficient and objective work sampling: recognizing workers’ activities in site surveillance videos with two-stream convolutional networks. Autom Constr. 2018;94:360–70 (ISSN 0926-5805) .

Wang D, Tang J, Zhu W, Li H, Xin J, He D. Dairy goat detection based on Faster R-CNN from surveillance video. Comput Electron Agric. 2018;154:443–9 (ISSN 0168-1699) .

Shao L, Cai Z, Liu L, Lu K. Performance evaluation of deep feature learning for RGB-D image/video classification. Inf Sci. 2017;385:266–83 (ISSN 0020-0255) .

Ahmed SA, Dogra DP, Kar S, Roy PP. Surveillance scene representation and trajectory abnormality detection using aggregation of multiple concepts. Expert Syst Appl. 2018;101:43–55 (ISSN 0957-4174) .

Arunnehru J, Chamundeeswari G, Prasanna Bharathi S. Human action recognition using 3D convolutional neural networks with 3D motion cuboids in surveillance videos. Procedia Comput Sci. 2018;133:471–7 (ISSN 1877-0509) .

Guraya FF, Cheikh FA. Neural networks based visual attention model for surveillance videos. Neurocomputing. 2015;149(Part C):1348–59 (ISSN 0925-2312) .

Pathak AR, Pandey M, Rautaray S. Application of deep learning for object detection. Procedia Comput Sci. 2018;132:1706–17 (ISSN 1877-0509) .

Ribeiro M, Lazzaretti AE, Lopes HS. A study of deep convolutional auto-encoders for anomaly detection in videos. Pattern Recogn Lett. 2018;105:13–22.

Huang W, Ding H, Chen G. A novel deep multi-channel residual networks-based metric learning method for moving human localization in video surveillance. Signal Process. 2018;142:104–13 (ISSN 0165-1684) .

Tsakanikas V, Dagiuklas T. Video surveillance systems-current status and future trends. Comput Electr Eng. In press, corrected proof, Available online 14 November 2017.

Wang Y, Zhang D, Liu Y, Dai B, Lee LH. Enhancing transportation systems via deep learning: a survey. Transport Res Part C Emerg Technol. 2018. https://doi.org/10.1016/j.trc.2018.12.004 (ISSN 0968-090X) .

Huang H, Xu Y, Huang Y, Yang Q, Zhou Z. Pedestrian tracking by learning deep features. J Vis Commun Image Represent. 2018;57:172–5 (ISSN 1047-3203) .

Yuan Y, Zhao Y, Wang Q. Action recognition using spatial-optical data organization and sequential learning framework. Neurocomputing. 2018;315:221–33 (ISSN 0925-2312) .

Perez M, Avila S, Moreira D, Moraes D, Testoni V, Valle E, Goldenstein S, Rocha A. Video pornography detection through deep learning techniques and motion information. Neurocomputing. 2017;230:279–93 (ISSN 0925-2312) .

Pang S, del Coz JJ, Yu Z, Luaces O, Díez J. Deep learning to frame objects for visual target tracking. Eng Appl Artif Intell. 2017;65:406–20 (ISSN 0952-1976) .

Wei X, Du J, Liang M, Ye L. Boosting deep attribute learning via support vector regression for fast moving crowd counting. Pattern Recogn Lett. 2017. https://doi.org/10.1016/j.patrec.2017.12.002 .

Xu M, Fang H, Lv P, Cui L, Zhang S, Zhou B. D-stc: deep learning with spatio-temporal constraints for train drivers detection from videos. Pattern Recogn Lett. 2017. https://doi.org/10.1016/j.patrec.2017.09.040 (ISSN 0167-8655) .

Hassan MM, Uddin MZ, Mohamed A, Almogren A. A robust human activity recognition system using smartphone sensors and deep learning. Future Gener Comput Syst. 2018;81:307–13 (ISSN 0167-739X) .

Wu G, Lu W, Gao G, Zhao C, Liu J. Regional deep learning model for visual tracking. Neurocomputing. 2016;175:310–23 (ISSN 0925-2312) .

Nasir M, Muhammad K, Lloret J, Sangaiah AK, Sajjad M. Fog computing enabled cost-effective distributed summarization of surveillance videos for smart cities. J Parallel Comput. 2018. https://doi.org/10.1016/j.jpdc.2018.11.004 (ISSN 0743-7315) .

Najva N, Bijoy KE. SIFT and tensor based object detection and classification in videos using deep neural networks. Procedia Comput Sci. 2016;93:351–8 (ISSN 1877-0509) .

Yu Z, Li T, Yu N, Pan Y, Chen H, Liu B. Reconstruction of hidden representation for Robust feature extraction. ACM Trans Intell Syst Technol. 2019;10(2):18.

Mammadli R, Wolf F, Jannesari A. The art of getting deep neural networks in shape. ACM Trans Archit Code Optim. 2019;15:62.

Zhou T, Tucker R, Flynn J, Fyffe G, Snavely N. Stereo magnification: learning view synthesis using multiplane images. ACM Trans Graph. 2018;37:65


Fan Z, Song X, Xia T, Jiang R, Shibasaki R, Sakuramachi R. Online Deep Ensemble Learning for Predicting Citywide Human Mobility. Proc ACM Interact Mob Wearable Ubiquitous Technol. 2018;2:105.

Hanocka R, Fish N, Wang Z, Giryes R, Fleishman S, Cohen-Or D. ALIGNet: partial-shape agnostic alignment via unsupervised learning. ACM Trans Graph. 2018;38:1.

Xu M, Qian F, Mei Q, Huang K, Liu X. DeepType: on-device deep learning for input personalization service with minimal privacy concern. Proc ACM Interact Mob Wearable Ubiquitous Technol. 2018;2:197.

Potok TE, Schuman C, Young S, Patton R, Spedalieri F, Liu J, Yao KT, Rose G, Chakma G. A study of complex deep learning networks on high-performance, neuromorphic, and quantum computers. J Emerg Technol Comput Syst. 2018;14:19.

Pouyanfar S, Sadiq S, Yan Y, Tian H, Tao Y, Reyes MP, Shyu ML, Chen SC, Iyengar SS. A survey on deep learning: algorithms, techniques, and applications. ACM Comput Surv. 2018;51:92.

Tian Y, Lee GH, He H, Hsu CY, Katabi D. RF-based fall monitoring using convolutional neural networks. Proc ACM Interact Mob Wearable Ubiquitous Technol. 2018;2:137.

Roy P, Song SL, Krishnamoorthy S, Vishnu A, Sengupta D, Liu X. NUMA-Caffe: NUMA-aware deep learning neural networks. ACM Trans Archit Code Optim. 2018;15:24.

Lovering C, Lu A, Nguyen C, Nguyen H, Hurley D, Agu E. Fact or fiction. Proc ACM Hum-Comput Interact. 2018;2:111.

Ben-Hamu H, Maron H, Kezurer I, Avineri G, Lipman Y. Multi-chart generative surface modeling. ACM Trans Graph. 2018;37:215

Ge W, Gong B, Yu Y. Image super-resolution via deterministic-stochastic synthesis and local statistical rectification. ACM Trans Graph. 2018;37:260

Hedman P, Philip J, Price T, Frahm JM, Drettakis G, Brostow G. Deep blending for free-viewpoint image-based rendering. ACM Trans Graph. 2018;37:257

Sundararajan K, Woodard DL. Deep learning for biometrics: a survey. ACM Comput Surv. 2018;51:65.

Kim H, Kim T, Kim J, Kim JJ. Deep neural network optimized to resistive memory with nonlinear current–voltage characteristics. J Emerg Technol Comput Syst. 2018;14:15.

Wang C, Yang H, Bartz C, Meinel C. Image captioning with deep bidirectional LSTMs and multi-task learning. ACM Trans Multimedia Comput Commun Appl. 2018;14:40.

Yao S, Zhao Y, Shao H, Zhang A, Zhang C, Li S, Abdelzaher T. RDeepSense: Reliable Deep Mobile Computing Models with Uncertainty Estimations. Proc ACM Interact Mob Wearable Ubiquitous Technol. 2018;1:173.

Liu D, Cui W, Jin K, Guo Y, Qu H. DeepTracker: visualizing the training process of convolutional neural networks. ACM Trans Intell Syst Technol. 2018;10:6.

Yi L, Huang H, Liu D, Kalogerakis E, Su H, Guibas L. Deep part induction from articulated object pairs. ACM Trans Graph. 2018. https://doi.org/10.1145/3272127.3275027 .

Zhao N, Cao Y, Lau RW. What characterizes personalities of graphic designs? ACM Trans Graph. 2018;37:116.

Tan J, Wan X, Liu H, Xiao J. QuoteRec: toward quote recommendation for writing. ACM Trans Inf Syst. 2018;36:34.

Qu Y, Fang B, Zhang W, Tang R, Niu M, Guo H, Yu Y, He X. Product-based neural networks for user response prediction over multi-field categorical data. ACM Trans Inf Syst. 2018;37:5.

Yin K, Huang H, Cohen-Or D, Zhang H. P2P-NET: bidirectional point displacement net for shape transform. ACM Trans Graph. 2018;37:152.

Yao S, Zhao Y, Shao H, Zhang C, Zhang A, Hu S, Liu D, Liu S, Su L, Abdelzaher T. SenseGAN: enabling deep learning for internet of things with a semi-supervised framework. Proc ACM Interact Mob Wearable Ubiquitous Technol. 2018;2:144.

Saito S, Hu L, Ma C, Ibayashi H, Luo L, Li H. 3D hair synthesis using volumetric variational autoencoders. ACM Trans Graph. 2018. https://doi.org/10.1145/3272127.3275019 .

Chen A, Wu M, Zhang Y, Li N, Lu J, Gao S, Yu J. Deep surface light fields. Proc ACM Comput Graph Interact Tech. 2018;1:14.

Chu W, Xue H, Yao C, Cai D. Sparse coding guided spatiotemporal feature learning for abnormal event detection in large videos. IEEE Trans Multimedia. 2019;21(1):246–55.

Khan MUK, Park H, Kyung C. Rejecting motion outliers for efficient crowd anomaly detection. IEEE Trans Inf Forensics Secur. 2019;14(2):541–56.

Tao D, Guo Y, Yu B, Pang J, Yu Z. Deep multi-view feature learning for person re-identification. IEEE Trans Circuits Syst Video Technol. 2018;28(10):2657–66.

Zhang D, Wu W, Cheng H, Zhang R, Dong Z, Cai Z. Image-to-video person re-identification with temporally memorized similarity learning. IEEE Trans Circuits Syst Video Technol. 2018;28(10):2622–32.

Serrano I, Deniz O, Espinosa-Aranda JL, Bueno G. Fight recognition in video using hough forests and 2D convolutional neural network. IEEE Trans Image Process. 2018;27(10):4787–97. https://doi.org/10.1109/tip.2018.2845742 .


Li Y, Li X, Zhang Y, Liu M, Wang W. Anomalous sound detection using deep audio representation and a blstm network for audio surveillance of roads. IEEE Access. 2018;6:58043–55.

Muhammad K, Ahmad J, Mehmood I, Rho S, Baik SW. Convolutional neural networks based fire detection in surveillance videos. IEEE Access. 2018;6:18174–83.

Ullah A, Ahmad J, Muhammad K, Sajjad M, Baik SW. Action recognition in video sequences using deep bi-directional LSTM with CNN features. IEEE Access. 2018;6:1155–66.

Li Y. A deep spatiotemporal perspective for understanding crowd behavior. IEEE Trans Multimedia. 2018;20(12):3289–97.

Pamula T. Road traffic conditions classification based on multilevel filtering of image content using convolutional neural networks. IEEE Intell Transp Syst Mag. 2018;10(3):11–21.

Vandersmissen B, et al. indoor person identification using a low-power FMCW radar. IEEE Trans Geosci Remote Sens. 2018;56(7):3941–52.

Min W, Yao L, Lin Z, Liu L. Support vector machine approach to fall recognition based on simplified expression of human skeleton action and fast detection of start key frame using torso angle. IET Comput Vision. 2018;12(8):1133–40.

Perwaiz N, Fraz MM, Shahzad M. Person re-identification using hybrid representation reinforced by metric learning. IEEE Access. 2018;6:77334–49.

Olague G, Hernández DE, Clemente E, Chan-Ley M. Evolving head tracking routines with brain programming. IEEE Access. 2018;6:26254–70.

Dilawari A, Khan MUG, Farooq A, Rehman Z, Rho S, Mehmood I. Natural language description of video streams using task-specific feature encoding. IEEE Access. 2018;6:16639–45.

Zeng D, Zhu M. Background subtraction using multiscale fully convolutional network. IEEE Access. 2018;6:16010–21.

Goswami G, Vatsa M, Singh R. Face verification via learned representation on feature-rich video frames. IEEE Trans Inf Forensics Secur. 2017;12(7):1686–98.

Keçeli AS, Kaya A. Violent activity detection with transfer learning method. Electron Lett. 2017;53(15):1047–8.

Lu W, et al. Unsupervised sequential outlier detection with deep architectures. IEEE Trans Image Process. 2017;26(9):4321–30.

Feizi A. High-level feature extraction for classification and person re-identification. IEEE Sens J. 2017;17(21):7064–73.

Lee Y, Chen S, Hwang J, Hung Y. An ensemble of invariant features for person reidentification. IEEE Trans Circuits Syst Video Technol. 2017;27(3):470–83.

Uddin MZ, Khaksar W, Torresen J. Facial expression recognition using salient features and convolutional neural network. IEEE Access. 2017;5:26146–61.

Mukherjee SS, Robertson NM. Deep head pose: Gaze-direction estimation in multimodal video. IEEE Trans Multimedia. 2015;17(11):2094–107.

Hayat M, Bennamoun M, An S. Deep reconstruction models for image set classification. IEEE Trans Pattern Anal Mach Intell. 2015;37(4):713–27.

Afiq AA, Zakariya MA, Saad MN, Nurfarzana AA, Khir MHM, Fadzil AF, Jale A, Gunawan W, Izuddin ZAA, Faizari M. A review on classifying abnormal behavior in crowd scene. J Vis Commun Image Represent. 2019;58:285–303.

Bour P, Cribelier E, Argyriou V. Chapter 14—Crowd behavior analysis from fixed and moving cameras. In: Computer vision and pattern recognition, multimodal behavior analysis in the wild. Cambridge: Academic Press; 2019. pp. 289–322.


Xu X, Gong S, Hospedales TM. Chapter 15—Zero-shot crowd behavior recognition. In: Group and crowd behavior for computer vision. Cambridge: Academic Press; 2017:341–369.

Rodriguez M, Sivic J, Laptev I. Chapter 5—The analysis of high density crowds in videos. In: Group and crowd behavior for computer vision. Cambridge: Academic Press. 2017. pp. 89–113.

Yogameena B, Nagananthini C. Computer vision based crowd disaster avoidance system: a survey. Int J Disaster Risk Reduct. 2017;22:95–129.

Wang X, Loy CC. Chapter 10—Deep learning for scene-independent crowd analysis. In: Group and crowd behavior for computer vision. Cambridge: Academic Press; 2017. pp. 209–52.

Arceda VM, Fabián KF, Laura PL, Tito JR, Cáceres JG. Fast face detection in violent video scenes. Electron Notes Theor Comput Sci. 2016;329:5–26.

Wang Q, Wan J, Yuan Y. Deep metric learning for crowdedness regression. IEEE Trans Circuits Syst Video Technol. 2018;28(10):2633–43.

Shao J, Loy CC, Kang K, Wang X. Crowded scene understanding by deeply learned volumetric slices. IEEE Trans Circuits Syst Video Technol. 2017;27(3):613–23.

Grant JM, Flynn PJ. Crowd scene understanding from video: a survey. ACM Trans Multimedia Comput Commun Appl. 2017;13(2):19.

Tay L, Jebb AT, Woo SE. Video capture of human behaviors: toward a Big Data approach. Curr Opin Behav Sci. 2017;18:17–22 (ISSN 2352-1546) .

Chaudhary S, Khan MA, Bhatnagar C. Multiple anomalous activity detection in videos. Procedia Comput Sci. 2018;125:336–45.

Anwar F, Petrounias I, Morris T, Kodogiannis V. Mining anomalous events against frequent sequences in surveillance videos from commercial environments. Expert Syst Appl. 2012;39(4):4511–31.

Wang T, Qiao M, Chen Y, Chen J, Snoussi H. Video feature descriptor combining motion and appearance cues with length-invariant characteristics. Optik. 2018;157:1143–54.

Kaltsa V, Briassouli A, Kompatsiaris I, Strintzis MG. Multiple Hierarchical Dirichlet Processes for anomaly detection in traffic. Comput Vis Image Underst. 2018;169:28–39.

Cermeño E, Pérez A, Sigüenza JA. Intelligent video surveillance beyond robust background modeling. Expert Syst Appl. 2018;91:138–49.

Coşar S, Donatiello G, Bogorny V, Garate C, Alvares LO, Brémond F. Toward abnormal trajectory and event detection in video surveillance. IEEE Trans Circuits Syst Video Technol. 2017;27(3):683–95.

Ribeiro PC, Audigier R, Pham QC. RIMOC, a feature to discriminate unstructured motions: application to violence detection for video-surveillance. Comput Vis Image Underst. 2016;144:121–43.

Şaykol E, Güdükbay U, Ulusoy Ö. Scenario-based query processing for video-surveillance archives. Eng Appl Artif Intell. 2010;23(3):331–45.

Castanon G, Jodoin PM, Saligrama V, Caron A. Activity retrieval in large surveillance videos. In: Academic Press library in signal processing. Vol. 4. London: Elsevier; 2014.

Cheng HY, Hwang JN. Integrated video object tracking with applications in trajectory-based event detection. J Vis Commun Image Represent. 2011;22(7):673–85.

Hong X, Huang Y, Ma W, Varadarajan S, Miller P, Liu W, Romero MJ, del Rincon JM, Zhou H. Evidential event inference in transport video surveillance. Comput Vis Image Underst. 2016;144:276–97.

Wang T, Qiao M, Deng Y, Zhou Y, Wang H, Lyu Q, Snoussi H. Abnormal event detection based on analysis of movement information of video sequence. Optik. 2018;152:50–60.

Ullah H, Altamimi AB, Uzair M, Ullah M. Anomalous entities detection and localization in pedestrian flows. Neurocomputing. 2018;290:74–86.

Roy D, Mohan CK. Snatch theft detection in unconstrained surveillance videos using action attribute modelling. Pattern Recogn Lett. 2018;108:56–61.

Lee WK, Leong CF, Lai WK, Leow LK, Yap TH. ArchCam: real time expert system for suspicious behaviour detection in ATM site. Expert Syst Appl. 2018;109:12–24.

Dinesh Jackson Samuel R, Fenil E, Manogaran G, Vivekananda GN, Thanjaivadivel T, Jeeva S, Ahilan A. Real time violence detection framework for football stadium comprising of big data analysis and deep learning through bidirectional LSTM. Comput Netw. 2019;151:191–200 (ISSN 1389-1286) .

Bouachir W, Gouiaa R, Li B, Noumeir R. Intelligent video surveillance for real-time detection of suicide attempts. Pattern Recogn Lett. 2018;110:1–7 (ISSN 0167-8655) .

Wang J, Xu Z. Spatio-temporal texture modelling for real-time crowd anomaly detection. Comput Vis Image Underst. 2016;144:177–87 (ISSN 1077-3142) .

Ko KE, Sim KB. Deep convolutional framework for abnormal behavior detection in a smart surveillance system. Eng Appl Artif Intell. 2018;67:226–34.

Dan X, Yan Y, Ricci E, Sebe N. Detecting anomalous events in videos by learning deep representations of appearance and motion. Comput Vis Image Underst. 2017;156:117–27.

Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A. Going deeper with convolutions. In: 2015 IEEE conference on computer vision and pattern recognition (CVPR). 2015.

Guo Y, Liu Y, Oerlemans A, Lao S, Lew MS. Deep learning for visual understanding: a review. Neurocomputing. 2016;187(26):27–48.

Babaee M, Dinh DT, Rigoll G. A deep convolutional neural network for video sequence background subtraction. Pattern Recogn. 2018;76:635–49.

Xue H, Liu Y, Cai D, He X. Tracking people in RGBD videos using deep learning and motion clues. Neurocomputing. 2016;204:70–6.

Dong Z, Jing C, Pei M, Jia Y. Deep CNN based binary hash video representations for face retrieval. Pattern Recogn. 2018;81:357–69.

Zhang C, Tian Y, Guo X, Liu J. DAAL: deep activation-based attribute learning for action recognition in depth videos. Comput Vis Image Underst. 2018;167:37–49.

Zhou S, Shen W, Zeng D, Fang M, Zhang Z. Spatial–temporal convolutional neural networks for anomaly detection and localization in crowded scenes. Signal Process Image Commun. 2016;47:358–68.

Pennisi A, Bloisi DD, Iocchi L. Online real-time crowd behavior detection in video sequences. Comput Vis Image Underst. 2016;144:166–76.

Feliciani C, Nishinari K. Measurement of congestion and intrinsic risk in pedestrian crowds. Transp Res Part C Emerg Technol. 2018;91:124–55.

Wang X, He X, Wu X, Xie C, Li Y. A classification method based on streak flow for abnormal crowd behaviors. Optik Int J Light Electron Optics. 2016;127(4):2386–92.

Kumar S, Datta D, Singh SK, Sangaiah AK. An intelligent decision computing paradigm for crowd monitoring in the smart city. J Parallel Distrib Comput. 2018;118(2):344–58.

Feng Y, Yuan Y, Lu X. Learning deep event models for crowd anomaly detection. Neurocomputing. 2017;219:548–56.


Acknowledgements

Not applicable.

Author information

Authors and affiliations.

VIT, Vellore, 632014, Tamil Nadu, India

G. Sreenu & M. A. Saleem Durai


Contributions

GS and MASD selected and analyzed different papers to gain a more in-depth view of current scenarios of the problem and its solutions. Both authors read and approved the final manuscript.

Corresponding author

Correspondence to G. Sreenu .

Ethics declarations

Competing interests.

The authors declare that they have no competing interests.

Availability of data and materials

Data sharing is not applicable to this article as no datasets were generated or analysed during the current study.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License ( http://creativecommons.org/licenses/by/4.0/ ), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.


About this article

Cite this article.

Sreenu, G., Saleem Durai, M.A. Intelligent video surveillance: a review through deep learning techniques for crowd analysis. J Big Data 6 , 48 (2019). https://doi.org/10.1186/s40537-019-0212-5


Received : 07 December 2018

Accepted : 28 May 2019

Published : 06 June 2019

DOI : https://doi.org/10.1186/s40537-019-0212-5


Keywords

  • Video surveillance
  • Deep learning
  • Crowd analysis


How to create a scientific explainer video or video abstract (with examples)

Scientific videos or video abstracts are an incredibly powerful way to attract attention to your research project and to yourself. Such attention can then turn into citations to your research papers, lead to collaborations, and open up many other opportunities.

I know this first hand since I can, without a doubt, attribute a large part of my success as a scientist to the animated scientific videos I create for my research projects (see them in my YouTube channel ). Basically, whenever I publish an important research article, I try to produce a video where, in simple language, I explain the findings using visual aids. When I post such a video abstract on social media and on YouTube, I add a link to the respective research article for driving readers to it.

The video below is the one that started it all. I made it during the last year of my Ph.D. in 2014 and even now, years later, it is still bringing in collaborations. This video is probably the one that has brought me the most contracts with industry.

In this article, I want to reveal to you all the secrets and tricks that I know about creating scientific videos.

You will learn:

Before we dive into the creation of scientific videos, let me give you some ideas for the various situations in which you could use one:  

  • Upload on YouTube or Vimeo for rising interest in your research and driving traffic to your journal paper.
  • Add a video abstract to your paper (by adding it directly in the paper and on the journal website).
  • Use a short video during a presentation to focus any daydreamers. 
  • Use a video as a social media status update about your latest field trip.  
  • Film a video that adds to the explanation of methodology for your newest journal paper.  
  • Develop a promotional video to help your new research project get traction.  
  • Use a video in a class lecture to demonstrate a particular scientific phenomenon.  
  • Add a video of your research process or computer simulation to a journal paper.
  • In a workshop, use a video to show people how your new test method works.
  • You could even make a video featuring yourself to use in addition to your CV. This will not only attract attention, but demonstrate your out-of-the-box thinking.  

These are just some possible applications for a scientific video. Learning to create videos will open many new ways for attracting interest to your research, educating others about science, and capturing the attention of your audience during a conference presentation.

How to create a script for a scientific video

To create a scientific explainer video, you will first need to come up with an idea, decide what exactly you want to show in it, what footage you will use, and what you will say.

Often your plan will be to explain your newest research paper and you might think:

“Why do I need a script? The paper is the script”.

I’m sorry to disappoint you. When writing a journal paper, we are required to move from the general to the specific: we start with a broad overview in the introduction, then narrow down to the methods we used before arriving at the results, and finally lay out the conclusions.

Because people normally consume videos (and any other content) on the internet differently than they read a research paper, repeating the sequence of a research paper is far too long a wind-up. In fact, a study by Microsoft showed that the attention span of people in recent years has dropped to 8 seconds. Don’t waste these 8 precious seconds providing boring background information.

When creating videos, grab the attention of people before they get blown away by the information firehose that the Internet is.

To develop content that has the chance to attract attention on the internet, it is worth keeping in mind the lead-body-tail structure . This structure is what science journalists (in fact, any journalists) normally use for news stories.

You could imagine the lead-body-tail structure displayed as an inverted pyramid shown in the figure below.

lead-body-tail structure for scientific communication

In the lead-body-tail structure, you will start with the things your audience must know, follow with what is good to know, and then finish with what would be nice to know. Here is an example of how you could structure the argument:

  • Lead : Start with the most critical information that captures the interest of your audience. This could be a problem statement or an unanswered question. After this short introduction, you could provide a preview for how your research solves that problem.
  • Body : Next, explain the crucial details of your research. To do this, you could provide evidence and logical arguments from your research or other studies. You could even throw in an analogy or a joke that makes the content light and easy to understand.
  • Tail: Finally, reward those who have stuck with your video until the end by offering finer details about the methodology, and any background information.

Another thing to remember when creating a video script is deciding who your target audience is. Knowing whom you are addressing will allow you to define the level of detail of your video. For example, jargon might be okay for researchers in your domain, but reaching out to a wider public requires a broader perspective and less terminology.

In a video made for a broad non-specialist audience you want to sell the result, not the features.

A science video script template

Once you have the general idea for your new video and you know who your target audience is, develop a script. A script could be as simple as a table with the following columns:  

  • Time: time or duration of the scene.
  • Action in the video: the activity in the video scene.  
  • Voiceover / dialog / captions: the information conveyed to the viewer through voice or captions.  
  • Other columns , depending on the type of video you are making, these could include props, characters in the scene, editing effects, music, shot type (e.g. close-up, zoom), and scene number. 

Below you can download a script template to use for your videos.


As an example, here is a script created by the great staff of Lib4RI for a video that I took part in (both the video and the script are licensed under CC BY-NC-ND 4.0 license ). The video described in the script is a fun take on the struggles of a researcher to find funding for open access publishing. Watch it to see how it compares to the script.

Best apps for creating a research video

After developing the script, you will need to create the video footage. You could do it using a camera and a microphone alone, but to leave a more lasting impression, you could use images, drawings, animations, and presenting software.

What follows are some apps that you can use for creating a science video. We will discuss the apps for editing footage later on.

MS PowerPoint

Hardly anyone knows this, but PowerPoint might be the only tool you need to create a simple scientific explainer video. The major advantages of relying on PowerPoint are the familiarity of the tool and the fact that most scientists can access it for free, since it is available on most university-owned PCs.

See below a video that I created using nothing but PowerPoint.

To create a video in PowerPoint, you will want to open the Slide Show tab and click on “Record Slide Show”.


You could directly add the narration while you are recording the slide show. In my experience, though, it sounds more professional if you have recorded and edited your voice beforehand using a different app.

To add a pre-recorded narration, drag the audio file into the first slide of your presentation (see below for apps for recording the voiceover). To make the file play back throughout your presentation, go to the “Playback” tab, tick the boxes for “Play Across Slides” and “Hide During Show”, and set it to “Start Automatically”.


After finishing recording the slide show, click on “Export” and select the “Create a Video” button.


That’s it. Your video is ready for prime time.

Creating a short GIF for social media using PowerPoint

PowerPoint can also be handy for short video presentations that showcase your research on social media. In this case, you would not even need to record a voiceover. In the video below Mike Morrison explains why such a short video makes sense.

Mike Morrison has created a template that will allow you to create such a video abstract in a breeze. All you have to do is add information about the research in up to 5 slides and export it as a GIF.

That’s it, you can now post it on Twitter, where it should grab the attention of the casual scroller much better than the static image that people normally post when they publish a paper. You can download the template below.


You will download this Twitter video writing template

Prezi

Prezi Present allows you to create a virtual space through which you can move and zoom in and out. Using this in a video can help to make your point, especially to demonstrate how various parts of your research are connected. The zooming feature can also be a great way to highlight the global significance of your work and then zoom into the details of what you have accomplished.

Of course, there will be some learning curve before you can comfortably use Prezi, but I can testify from my own experience that it is worth it. Many of my explainer videos are made using Prezi, including the one below.

I used to create my videos using screen capture software, but now Prezi offers recording a video presentation directly through Prezi Video . You can even record a video where your face appears alongside your Prezi presentation. This looks something like the screenshot below.

Prezi video

Prezi Cost: Free for the first 5 projects and then 3 USD/month if you have an .edu email address.

Videoscribe

VideoScribe allows creating whiteboard-type videos like the one below. If done right, such videos offer a great way to attract attention to your research.

There is a slight learning curve to learning the bells and whistles of the app. For example, you will have to set the sequence of illustrations, the duration of each drawing motion, add sound, and make other tweaks.

You will also need images in .svg vector format to allow animations. If you simply upload a .jpeg or .png figure, the app will not know which lines to trace and will simply “paint” the whole picture at once.

The app holds a reasonable library of images that might be useful in some situations. If you can’t find what you need, you will have to create the images on your own or find them online (keep reading to see how to do it). If you only have a figure in .jpeg or .png form, one workaround is to use Adobe Illustrator to “trace” the lines in the figure and then save the file as .svg.

Price of VideoScribe: 39 USD/month. For the videos that I have created, I made sure to finish the project before the month was over. Then I unsubscribed from VideoScribe until the next time I had to make a video.

I made the video below using VideoScribe and my own drawings. It is linked to a review paper that I wrote, and I actually put a link to the video in the research paper itself. Many publishers nowadays allow embedding videos in the article.

There are certainly alternatives to VideoScribe (for example Doodly ). Even PowerPoint has added an option for creating whiteboard-style videos. To make one, you can use the Animations tab and click the “Ink Reply” feature (not available on all PowerPoint versions). To use this feature, you will have to create the drawings right there in PowerPoint.

Vyond

Vyond allows you to create animated videos: you know, the ones featuring cartoon-like characters that talk and move around. If made well, such a video will definitely attract attention to your research project.

The price of 249 USD/year for the cheapest version is a lot. One thing you could do is make a video using the 14-day free trial. It will have a Vyond logo in the corner, though (and, by the way, so will the cheapest paid version). To remove the watermark, you will have to open your wallet for a whopping 649 USD/year. Perhaps you could ask the outreach department of your university whether they are willing to get a subscription?

The video below was created using Vyond. Besides being an example of what the app can do, it’s a great explanation of #betterposter design for scientific posters. You can download the #betterposter template here .

Other apps for creating a video abstract

The tools listed above certainly can offer many great ways to create and edit scientific videos. But every so often, new apps appear that capture the attention of the internet. Pay attention to this. Whenever you see that cool video that everyone is sharing, think about what tools were used to create it. Perhaps learning to use the particular app is simpler than it seems.

Science video editing tools

Once you have created the draft footage, you will need to edit it down into the desired sequence, add a soundtrack, create transitions, add special effects, write captions or do whatever else your script requires.

Some of the apps I listed above will allow you to finish the whole project natively. But in some cases you might need to do some post-processing to achieve the desired effect. This means that you will need to learn to use video editing software, at least to some degree. Here are some of the tools that I have tried out and can recommend.

Camtasia

Camtasia is my go-to app for screen recording and video editing. Its intuitive interface makes it quick to learn for anyone without video editing experience. It might lack some features that the more advanced editors have, but for the absolute majority of projects, it will be good enough. The only problem I have with it is that for projects with a lot of video tracks, many cuts, and special effects, it tends to crash. So I save backups often.

Price: 199 USD if you have an .edu email address.

For the video below, I used Camtasia to first capture the screen in different apps, then cut the clips in Camtasia and put them together into one video along with the voiceover.

ShareX

ShareX is an open-source screen recording app. Since it is free to use, it is a good option for the researcher on a tight budget.


Adobe After Effects

Adobe After Effects is a professional-grade video editor. For the average researcher who wants to create a scientific video, this app will be overkill. It will definitely take a lot of time to master everything it has to offer, but if you want to make something really special, like adding effects or using premade templates, you will probably find no better tool.

Price: 20.99 USD/month for the Adobe Creative Cloud which offers many other tools. Before subscribing, though, check with your university. They often have a PC with Adobe Creative Cloud somewhere on the campus for researchers to use.

I used After Effects to make the video below. It was necessary to use After Effects rather than any of the simpler tools because of the fancy intro that I wanted to add in the video to make it feel like an authentic news piece. I think the hours I spent fiddling with it were worth it. Don’t you?

The rest of the video, by the way, is made using a PowerPoint presentation and a large screen. There was no studio involved, and I was working in absolute solitude.

Audacity is an audio recording and editing app. It is open source, meaning that it’s free. I use it to record the voiceover for all my videos.

A very handy feature Audacity offers is noise removal. By using it, you can discard the background noise and remove any clicks or other unwanted sounds from the recording. This feature is especially useful if you use your phone or a simple headset to record the audio. Using the noise removal function will make the voiceover sound a little more professional. Price: Free.


Voice recording apps for smartphone

There are countless apps that allow you to record voice on your phone. My favorite for my Android phone is the Recorder by Google . It will record the voice like any other app but has the advantage of being easily synchronizable with my Google Drive account. This means that as soon as I finish recording my voice, I can send it to my PC and edit using Audacity or any of the video editors that I mentioned before.

Creating closed captions

Closed captions are the text version of your voice. Having closed captions can be especially useful when posting videos on social media, because many people have the sound turned off by default.


There are many sites that will allow you to create closed captions, but in my experience YouTube is all one needs. Once you have enabled the option to create closed captions, first make sure they are correct. In my experience YouTube does a fairly good job of transcribing voice, but there are always some glitches, so I always read through.

Once you are satisfied with the result, you can download the closed captions (YouTube calls them subtitles). Many social networks, including Facebook and LinkedIn, only accept the SRT format for closed captions. To download the SRT file from YouTube, go to the desired video through your Creator Studio , click on Details and select Subtitles . Now click on the three dots next to the Edit button and you will be able to download the SRT file. Save this file and upload it to any other platform along with the video.
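If you have never looked inside an SRT file, it is just plain text: a running index, a start and end timestamp, and the caption text, separated by blank lines. The tiny Python sketch below writes such a file; the two captions and the file name are made up for illustration.

```python
captions = [
    (1, "00:00:00,000", "00:00:03,500", "We studied how cracks grow in concrete."),
    (2, "00:00:03,500", "00:00:07,000", "Here is what we found."),
]

# Each entry becomes: index, "start --> end", caption text, blank line.
with open("video_captions.srt", "w", encoding="utf-8") as f:
    for index, start, end, text in captions:
        f.write(f"{index}\n{start} --> {end}\n{text}\n\n")
```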

Creating graphics for science videos

When creating a video, you might need some graphics, scientific illustrations, icons, or other types of visuals to make your point. If you decide on creating a video using animations, you might need a lot of them.

Regardless of whether you would like to create your own illustrations or download and legally use something that others have already created, read this article. In it, I review and compare many different ways of creating and editing graphics. I also list websites where you can find free-to-share visuals made by others.

Create a thumbnail for your video

Even though the saying goes “Don’t judge a book by its cover”, people certainly do judge a book by its cover. The same goes for a science video. Just like a book’s cover, a great thumbnail for a video has only one main task: to generate enough interest that people click on it to watch the video.

To create the thumbnail, you would usually add a short, intriguing text to a screenshot from the video. To get some ideas, check out the thumbnails of the videos of your favorite creators on YouTube.


You can make the thumbnail with almost any graphic design program, including Microsoft PowerPoint, Adobe Illustrator, Inkscape, or anything else. The problem with using these apps for creating a video thumbnail is that you will have to start from a blank slate.

Canva logo

I have found that Canva is a great website for creating visuals, including video thumbnails. Canva offers many free templates for each particular social media site. I would first type “thumbnail” followed by the name of the social media site in the search box. This returns a range of free templates that have the right dimensions and serve as the base for my creation. I would then upload a screenshot from my video and edit the text to make it fit my video.

Adding special effects to your video abstracts

Motion Array logo

Motion Array

Motion Array is a library of stock videos, special effects, sound effects, music, templates, and other resources for creating professional-looking videos. In my limited comparison of the options for my own video projects, it turned out to be the best alternative in terms of price versus library size. Among other things, I used it for the Research Paper Writing Forecast that I showed you above. Price: 29.99 USD/month.

Gear for creating research videos

If you decide in favor of shooting your videos (rather than creating them using one of the apps we reviewed), you will need to have some hardware. Here is a list of what I have found useful.

A smartphone for filming video

The obvious first requirement if you want to film a video (as opposed to creating one using graphics or animations) is a camera. Considering how good the cameras in modern smartphones are, don’t look any further. Use the camera you already have. It will be perfect for most cases. Plus, it is always in your pocket so that you can film on the go.

A portable tripod

smartphone tripod for filming videos

A tripod for a smartphone will be useful if you want to make a video of yourself without having it look like a selfie. It can also be useful for recording an experiment in the lab or even to create a timelapse of a field day.

There are of course fancy large tripods and they have their use. But I have found that one of the most practical tripods is the one you see in the picture. Besides being able to place it on a flat surface, you can use its flexible legs to wrap around objects. In many cases, this can be extremely useful for filming an experiment. And considering how small it is compared to full-blown tripods, you can just dump it in your bag and let it sit there for whenever you might need it. Mine is in my backpack most of the time, whether I need it or not.

It’s cheap too. Here is one for 16 USD.

A light for filming


With a good light source, even a bad camera will be able to shoot great-looking videos. The opposite is not true: even the most expensive camera, with whatever night-time features it promises, will not be able to make a good video if the filmed subject is dimly lit.

If you want to film your face or perhaps an experiment, it might be worth investing in a light. I use the YONGNUO YN300 for my camera, but a cheap selfie ring light might be all you need to film selfie-style videos. When making the decision, pay attention to the maximum light output, the mechanism for securing it on or next to your camera, and the ability to move it around, if you need that.

A video stabilizer


Even though modern smartphones do a great job of stabilizing the video, if you want to document research field trips, it might be a good idea to invest in a gimbal stabilizer. It will make your footage stable and pleasant to look at.

An external microphone

external microphone

To record a narration for a video, you will need a microphone. Sure, any smartphone or PC has a built-in microphone that you could use. (I have used my smartphone microphone for almost all my scientific videos). In many cases, though, an external microphone will be a better option.

First of all, the sound quality is quite bad when recording on a smartphone. It captures a lot of background noise and the sound changes significantly depending on how far you are from the phone.

There are workarounds to these problems. For example, I always tried to find a quiet room and set the phone on my small flexible tripod to ensure it stayed at a constant distance from my mouth. You could also use a headset with a built-in microphone.

You can also use the “remove background noise” feature of the Audacity app to make the narration sound more professional.

After trying out different options, I decided to invest in a microphone. In many cases, for a researcher, the best type of microphone will be the Rode Lavalier GO or a similar lavalier microphone. It definitely has better quality than any smartphone microphone. If you choose the Lavalier GO, you will need an adapter, though, to connect it to your smartphone.

The advantage of a lavalier microphone is that it can be attached to your clothes, which means a constant distance from your mouth is maintained when you record the narration. Plus, it captures significantly less background noise than a smartphone microphone.

Finally, a lavalier microphone will ensure a better narration quality for videos shot in the lab or during a field trip. This is because a smartphone microphone records sound from all sides and the loudness of your voice will change depending on how you and other sources of sound are located with respect to the phone. With a lavalier microphone attached to your clothes, your voice will always be consistent and more prominent than other sources of sound.

I have spent a lot of time explaining microphones. If you don’t think sound quality is that important, just ignore my advice on this one and keep using your smartphone. Sound quality is definitely not as important as a good script and appropriate graphics for an explainer video.

Getting help with creating scientific videos

Let’s say you have a great idea for a scientific video but don’t have the skills or time needed to create it. Thankfully with the advance of freelancing platforms, it is quite easy to find someone who can do the job for you without breaking the bank. Here are two platforms for finding freelancers.

Fiverr logo

Fiverr is a platform where you can find freelancers for many types of work, including the creation of videos. Every freelancer has a rating, so make sure to check their profile before handing out what is called a “gig”. https://www.fiverr.com/


Upwork is similar to Fiverr, except that you first create a description of the job and freelancers then place bids on it. As on Fiverr, check the ratings of the people who are offering to do the work and don’t fall for the cheapest offer. To start, you will have to create a profile: https://www.upwork.com/

Where to post scientific videos for maximum exposure

Social media is the natural first destination that comes to mind when thinking about posting a video on the internet. It is certainly one option (and perhaps the most important one), but before deciding where to post your video abstract, it is worth considering how to gain maximum exposure for your videos for the longest period of time.

One thing to remember is that in the hierarchy of internet channels, there is usually a correlation between value and the amount of control you have over curating the content.

With every layer of managers, you give up the ability to curate the content. For example, if you post a video in a LinkedIn group for your research domain, the manager of that group will decide whether to promote it or not. Even if they do, in a couple of days your video will be replaced by a new post.

If, on the other hand, you have a personal website, you can freely decide which videos to post most prominently and when to replace them with fresh content. The figure below shows an approximate hierarchy of online channels in terms of which provides the most value (and control) to the owner.  


Post on social media

I have observed that my LinkedIn videos attract more likes than any other post type. I am not the only one who has noticed it. Different studies that track engagement with social media posts have shown that videos attract the most views on social media.

When publishing videos on social media, consider using the platform’s native video upload tool (rather than pasting a link to YouTube). Social media platforms want users to stay on their platforms as much as possible, so they create algorithms that favor native videos. Most notably, native videos are usually set to auto-play, whereas linked videos have to be clicked to start playing. As an example, one study found that native Facebook videos are shared five times more often than linked YouTube videos and attract five times more comments.

Post the videos on YouTube

Last year my videos got more than 12,000 views on YouTube, leaving views of my research papers in the dust. This is because the audience for videos is wider than for research articles. Industry professionals, for example, rarely have the time or patience to get into the level of detail that journal articles possess. Watching a summary video might provide all the information they need.

My explainer videos were the primary driver behind many industry consulting opportunities that I have had.  

A study by Google (see the figure below) showed that learning something new and satisfying one’s curiosity about a particular subject are among the top reasons people watch YouTube.


Well, it just so happens that creating something new and answering questions are among the top things that we researchers do. This places us in an ideal position to create videos that people will search for (YouTube is the second largest search engine after Google). Being a researcher certainly adds credibility to your videos.

One thing to remember is that despite the advances in voice recognition, YouTube and other search engines still rely heavily on text to decide which videos to show when people search for a certain topic. This means you should put effort into creating the title of the video and the description below it using keywords that people might use to search for the topic.

Of course, don’t just litter the title with random keywords. The title should also attract enough curiosity so that people would click on it in the search results.

Embed a scientific video in your research paper

Videos can also be submitted together with a research paper, and for some articles, I have even embedded a video in the document itself. When you view the paper online, you will see my video embedded in the document, like in this paper.

More and more journals encourage authors to add a video to their research paper. Some journals call these video abstracts. The highly respected New England Journal of Medicine has gone a step further and offers what it calls “Quick Takes” for some of its best research papers. These videos are about 2 minutes long and are created by the journal in collaboration with the authors of the respective paper. See an example below.

Some people even call such videos the “abstract of the future”. Certainly in our age of attention deficit, such short summary videos could help grab the attention of people on social media.

There are even journals popping up that publish peer-reviewed research videos as a replacement for traditional journal articles. At this time, though, the idea is still in its infancy and I wouldn’t recommend ditching scientific writing in favor of directing research videos 🙂

Add the videos to your personal academic website

Your academic website serves as the central hub for your online brand. People who type your name into a search box will most probably check out your website (if you have one, and you definitely should). This makes your website a great home for your research videos.

When you post your videos on social media, in a couple of days they will sink deep in the timeline never to be seen again. On your website, on the other hand, your scientific videos will always be on display to inform visitors about your research and lead them to your research papers.

On my academic website, I have made a separate section where all the videos are displayed along with a short description and links to the corresponding research articles.

To add a video to your website, you can either upload it directly to the website or embed a link to YouTube (or whichever other video hosting platform you use). I find the embedding option better since it also increases the view count of your YouTube channel, thus signaling to the YouTube algorithm that your videos are interesting. This increases the chances that they will be suggested to people watching YouTube.

To embed a video in your website, you will have to find the option to insert an HTML box. Then go to YouTube and click the Share button below the video. This will bring up a box where you should select the “Embed” option and copy the code into the HTML box on your website.
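For reference, the code YouTube generates is a small HTML snippet along these lines (the exact attributes YouTube includes change from time to time, and VIDEO_ID is only a placeholder for your video’s own ID):

    <iframe width="560" height="315"
      src="https://www.youtube.com/embed/VIDEO_ID"
      title="YouTube video player"
      frameborder="0" allowfullscreen></iframe>

Paste the whole snippet into the HTML box and the player will appear wherever that box sits on the page.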


The second option for adding a video to your website is a bit simpler but not offered by all the website builders. You might be able to simply add videos by inserting a video box and copying the YouTube link into it.

A video is just another name for a scientific presentation

In this article, I showed you some useful tools and principles for creating scientific videos. What you need now are ideas for making your videos engaging enough to capture people’s interest. I have created a course that will allow you to do just that.

Creating science videos and delivering powerful presentations have a lot in common. In the online course “Scientific Presentation Skills”, I will show you how to become a convincing presenter one skill at a time using the Five S presenting pyramid.

Scientific Presentations Masterclass banner

The Five-S pyramid starts from the basics of putting together the presentation Substance (first S), advances to devising a presentation Structure (second S), shows how to put up a Show (third S) and tell memorable Stories (fourth S), and finally offers advice on how the Speaker (fifth S) can work on improving presenting skills, including dealing with stage fright.


Title: Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis

Abstract: Contemporary models for generating images show remarkable quality and versatility. Swayed by these advantages, the research community repurposes them to generate videos. Since video content is highly redundant, we argue that naively bringing advances of image models to the video generation domain reduces motion fidelity, visual quality and impairs scalability. In this work, we build Snap Video, a video-first model that systematically addresses these challenges. To do that, we first extend the EDM framework to take into account spatially and temporally redundant pixels and naturally support video generation. Second, we show that a U-Net - a workhorse behind image generation - scales poorly when generating videos, requiring significant computational overhead. Hence, we propose a new transformer-based architecture that trains 3.31 times faster than U-Nets (and is ~4.5 faster at inference). This allows us to efficiently train a text-to-video model with billions of parameters for the first time, reach state-of-the-art results on a number of benchmarks, and generate videos with substantially higher quality, temporal consistency, and motion complexity. The user studies showed that our model was favored by a large margin over the most recent methods. See our website at this https URL .
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)


Logo of brainsci

Does Video Gaming Have Impacts on the Brain: Evidence from a Systematic Review

Denilson Brilliant T.

1 Department of Biomedicine, Indonesia International Institute for Life Sciences (i3L), East Jakarta 13210, Indonesia

2 Smart Ageing Research Center (SARC), Tohoku University, Sendai 980-8575, Japan; rui@tohoku.ac.jp (R.N.); ryuta@tohoku.ac.jp (R.K.)

3 Department of Cognitive Health Science, Institute of Development, Aging and Cancer (IDAC), Tohoku University, Sendai 980-8575, Japan

Ryuta Kawashima

4 Department of Functional Brain Imaging, Institute of Development, Aging and Cancer (IDAC), Tohoku University, Sendai 980-8575, Japan

Video gaming, the experience of playing electronic games, has shown several benefits for human health. Recently, numerous video gaming studies showed beneficial effects on cognition and the brain. A systematic review of video gaming has been published previously. However, that review differs from this one in several respects. This systematic review evaluates the beneficial effects of video gaming on neuroplasticity, focusing specifically on intervention studies. A literature search was conducted for randomized controlled trials published after 2000 in PubMed and Google Scholar. A systematic review was written instead of a meta-analytic review because of variations among participants, video games, and outcomes. Nine scientific articles were eligible for the review. Overall, the eligible articles showed fair quality according to the Delphi criteria. Video gaming affects the brain structure and function depending on how the game is played. The game genres examined were 3D adventure, first-person shooting (FPS), puzzle, rhythm dance, and strategy. The total training durations were 16–90 h. The results of this systematic review demonstrate that video gaming can be beneficial to the brain. However, the beneficial effects vary among video game types.

1. Introduction

Video gaming refers to the experience of playing electronic games, which vary from action to passive games, presenting a player with physical and mental challenges. The motivation to play video games might derive from the experience of autonomy or competing with others, which can explain why video gaming is pleasurable and addictive [ 1 ].

Video games can act as “teachers” depending on the game purpose [ 2 ]. Video gaming has varying effects depending on the game genre. For instance, an active video game can improve physical fitness [ 3 , 4 , 5 , 6 ], whereas social video games can improve social behavior [ 7 , 8 , 9 ]. The most interesting results show that playing video games can change cognition and the brain [ 10 , 11 , 12 , 13 ].

Earlier studies have demonstrated that playing video games can benefit cognition. Cross-sectional and longitudinal studies have demonstrated that the experience of video gaming is associated with better cognitive function, specifically in terms of visual attention and short-term memory [ 14 ], reaction time [ 15 ], and working memory [ 16 ]. Additionally, some randomized controlled studies show positive effects of video gaming interventions on cognition [ 17 , 18 ]. Recent meta-analytical studies have also supported the positive effects of video gaming on cognition [ 10 , 11 , 12 , 13 ]. These studies demonstrate that playing video games does provide cognitive benefits.

The effects of video gaming interventions are ever more widely discussed among scientists [ 13 ]. A review of the results and methodological quality of recently published intervention studies is therefore needed. One systematic review of video gaming and neural correlates has been reported [ 19 ]. However, the neuroimaging techniques of the reviewed studies were not specific. In contrast to the previous systematic review, this systematic review included only magnetic resonance imaging (MRI) studies in order to focus on neuroplasticity effects. Neuroplasticity is the capability of the brain that accommodates adaptation for learning, memorizing, and recovery purposes [ 19 ]. In normal adaptation, the brain adapts to learn, remember, forget, and repair itself. Recent studies using MRI as the brain imaging technique have demonstrated neuroplasticity effects after interventions, including cognitive, exercise, and music training, on the grey matter [ 20 , 21 , 22 , 23 , 24 ] and white matter [ 25 , 26 , 27 , 28 , 29 ]. However, the molecular mechanisms of the grey and white matter changes remain inconclusive. The proposed mechanisms for grey matter change are neurogenesis, gliogenesis, synaptogenesis, and angiogenesis, whereas those for white matter change are myelin modeling and formation, fiber organization, and angiogenesis [ 30 ]. Recent studies using MRI for brain imaging have also demonstrated video gaming effects on neuroplasticity. Earlier imaging studies using cross-sectional and longitudinal methods have shown that playing video games affects the brain structure by changing the grey matter [ 31 , 32 , 33 ], white matter [ 34 , 35 ], and functional connectivity [ 36 , 37 , 38 , 39 ]. Additionally, a few intervention studies have demonstrated that playing video games changed brain structure and function [ 40 , 41 , 42 , 43 ].

The earlier review also found a link between neural correlates of video gaming and cognitive function [ 19 ]. However, that review used both experimental and correlational studies and included non-healthy participants, which contrasts with this review. The differences between this and the previous review are presented in Table 1 . This review assesses only experimental studies conducted with healthy participants. Additionally, cross-sectional and longitudinal studies merely show an association between video gaming experience and the brain; demonstrating direct effects of playing video games on the brain from such designs is difficult. Therefore, this systematic review specifically examined intervention studies. This review is more specific in that it covers intervention and MRI studies on healthy participants. The purposes of this systematic review are therefore to evaluate the beneficial effects of video gaming and to assess the methodological quality of recent video gaming intervention studies.

Differences between previous review and current review.

Difference | Previous Review | Current Review
Type of reviewed studies | Experimental and correlational studies | Experimental studies only
Neuroimaging technique of reviewed studies | CT, fMRI, MEG, MRI, PET, SPECT, tDCS, EEG, and NIRS | fMRI and MRI only
Participants of reviewed studies | Healthy and addicted participants | Healthy participants only

CT, computed tomography; fMRI, functional magnetic resonance imaging; MEG, magnetoencephalography; MRI, magnetic resonance imaging; PET, positron emission tomography; SPECT, single photon emission computed tomography; tDCS, transcranial direct current stimulation; EEG, electroencephalography; NIRS, near-infrared spectroscopy.

2. Materials and Methods

2.1. Search Strategy

This systematic review was designed in accordance with the PRISMA checklist [ 44 ] shown in Appendix Table A1 . A literature search was conducted using PubMed and Google Scholar to identify relevant studies. The keywords used for the literature search were combinations of “video game”, “video gaming”, “game”, “action video game”, “video game training”, “training”, “play”, “playing”, “MRI”, “cognitive”, “cognition”, “executive function”, and “randomized control trial”.
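For illustration, one possible way of combining these keywords into a single boolean search string (shown here only as a representative example, not the exact query that was run) is:

    ("video game" OR "video gaming" OR "action video game" OR "video game training")
    AND ("MRI" OR "cognition" OR "executive function")
    AND "randomized control trial"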

2.2. Inclusion and Exclusion Criteria

The primary inclusion criteria were a randomized controlled trial design, a video game intervention, and MRI/fMRI analysis. Studies that met only one or two of the primary inclusion criteria were not included. Review papers and experimental protocols were also not included. The secondary inclusion criteria were publication after 2000 and publication in English. Studies with an intervention duration of less than 4 weeks, an unspecified intervention length, or a combination intervention were excluded. Also excluded were studies of cognition-based games and studies of participants with psychiatric, cognitive, neurological, and medical disorders.

2.3. Quality Assessment

The quality of each study was assessed using the Delphi criteria [ 45 ] with several additional elements [ 46 ]: details of allocation methods, adequate descriptions of control and training groups, statistical comparisons between control and training groups, and dropout reports. The respective total scores (max = 12) are shown in Table 3. The quality assessment also includes an assessment of risk of bias, which is covered by criteria numbers 1, 2, 5, 6, 7, 9, and 12.

2.4. Statistical Analysis

Instead of a meta-analysis, a systematic review of video game training/video gaming and its effects was conducted because of the variation in participant age ranges, video game genres, control types, MRI and statistical analyses, and training outcomes. Therefore, the quality, inclusion and exclusion criteria, control, treatment, game title, participants, training period, and MRI analysis and specifications of the studies were recorded for the respective games.

3. Results

The literature search of the databases yielded 140 scientific articles. All scientific articles were screened based on the inclusion and exclusion criteria. Of those 140 scientific articles, nine were eligible for the review [ 40 , 41 , 42 , 43 , 47 , 48 , 49 , 50 , 51 ]. Video gaming effects are listed in Table 2 .

Summary of beneficial effect of video gaming.

Author | Year | Participant Age | Game Genre | Control | Duration | Beneficial Effect
Gleich et al. [ ] | 2017 | 18–36 | 3D adventure | passive | 8 weeks | Increased activity in hippocampus; decreased activity in DLPFC
Haier et al. [ ] | 2009 | 12–15 | puzzle | passive | 3 months | Increased GM in several visual–spatial processing areas; decreased activity in frontal area
Kuhn et al. [ ] | 2014 | 19–29 | 3D adventure | passive | 8 weeks | Increased GM in hippocampal, DLPFC and cerebellum
Lee et al. [ ] | 2012 | 18–30 | strategy | active | 8–10 weeks | Decreased activity in DLPFC
Lee et al. [ ] | 2012 | 18–30 | strategy | active | 8–11 weeks | Non-significant activity difference
Lorenz et al. [ ] | 2015 | 19–27 | 3D adventure | passive | 8 weeks | Preserved activity in ventral striatum
Martinez et al. [ ] | 2013 | 16–21 | puzzle | passive | 4 weeks | Functional connectivity change in multimodal integration system; functional connectivity change in higher-order executive processing
Roush [ ] | 2013 | 50–65 | rhythm dance | active | 24 weeks | Increased activity in visuospatial working memory area; increased activity in emotional and attention area
Roush [ ] | 2013 | 50–65 | rhythm dance | passive | | Similar compared to active control
West et al. [ ] | 2017 | 55–75 | 3D adventure | active | 24 weeks | Non-significant GM difference
West et al. [ ] | 2017 | 55–75 | 3D adventure | passive | | Increased cognitive performance and short-term memory; increased GM in hippocampus and cerebellum
West et al. [ ] | 2018 | 18–29 | FPS | active | 8 weeks | Increased GM in hippocampus (spatial learner *); increased GM in amygdala (response learner *); decreased GM in hippocampus (response learner)

Duration was converted into weeks (1 month = 4 weeks); DLPFC, dorsolateral prefrontal cortex; GM, grey matter; FPS, first person shooting. * Participants were categorized based on how they played during the video gaming intervention.

We excluded 121 articles: 46 were not MRI studies, 16 were not controlled studies, 38 were not intervention studies, 13 were review articles, and eight were miscellaneous, including study protocols, non-video gaming studies, and non-brain studies. Of 18 included scientific articles, nine were excluded. Of those nine excluded articles, two were cognitive-based game studies, three were shorter than 4 weeks in duration or were without a specified length intervention, two studies used a non-healthy participant treatment, and one was a combination intervention study. A screening flowchart is portrayed in Figure 1 .


Flowchart of literature search.

3.1. Quality Assessment

The assessment methodology based on Delphi criteria [ 45 ] for the quality of eligible studies is presented in Table 3 . The quality scores assigned to the studies were 3–9 (mean = 6.10; S.D. = 1.69). Overall, the studies showed fair methodological quality according to the Delphi criteria. The highest quality score of the nine eligible articles was assigned to “Playing Super Mario 64 increases hippocampal grey matter in older adult” published by West et al. in 2017, which scored 9 of 12. The scores assigned for criteria 6 (blinded care provider) and 7 (blinded patient) were lowest because of unspecified information related to blinding for those criteria. Additionally, criteria 2 (concealed allocation) and 5 (blinding assessor) were low because only two articles specified that information. All articles met criteria 3 and 4 adequately.

Methodological quality of eligible studies.

Author | Year | Q1 | Q2 | Q3 | Q4 | Q5 | Q6 | Q7 | Q8 | Q9 | Q10 | Q11 | Q12 | Score
Gleich et al. [ ] | 2017 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 6
Haier et al. [ ] | 2009 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 5
Kuhn et al. [ ] | 2014 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 5
Lee et al. [ ] | 2012 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 6
Lorenz et al. [ ] | 2015 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 1 | 7
Martinez et al. [ ] | 2013 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 3
Roush [ ] | 2013 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 7
West et al. [ ] | 2017 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 9
West et al. [ ] | 2018 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 7
Score | | 6 | 2 | 9 | 9 | 2 | 0 | 0 | 3 | 4 | 8 | 7 | 5 |

Q1, Random allocation; Q2, Concealed allocation; Q3, Similar baselines among groups; Q4, Eligibility specified; Q5, Blinded assessor outcome; Q6, Blinded care provider; Q7, Blinded patient; Q8, Intention-to-treat analysis; Q9, Detail of allocation method; Q10, Adequate description of each group; Q11, Statistical comparison between groups; Q12, Dropout report (1, specified; 0, unspecified).

3.2. Inclusion and Exclusion

Most studies included participants with little or no experience with gaming and excluded participants with psychiatric/mental, neurological, and medical illness. Four studies specified handedness of the participants and excluded participants with game training experience. The inclusion and exclusion criteria are presented in Table 4 .

Inclusion and exclusion criteria for eligible studies.

Author | Year | i1 | i2 | i3 | e1 | e2 | e3 | e4 | e5
Gleich et al. [ ] | 2017 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1
Haier et al. [ ] | 2009 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 0
Kuhn et al. [ ] | 2014 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1
Lee et al. [ ] | 2012 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 0
Lorenz et al. [ ] | 2015 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1
Martinez et al. [ ] | 2013 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1
Roush [ ] | 2013 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0
West et al. [ ] | 2017 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0
West et al. [ ] | 2018 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0
Total | | 8 | 4 | 3 | 8 | 7 | 6 | 5 | 4

i1, Little/no experience in video gaming; i2, Right-handed; i3, Sex-specific; e1, Psychiatric/mental illness; e2, Neurological illness; e3, Medical illness; e4, MRI contraindication; e5, experience in game training.

3.3. Control Group

Nine eligible studies were categorized as three types based on the control type. Two studies used active control, five studies used passive control, and two studies used both active and passive control. A summary of the control group is presented in Table 5 .

Control groups examined in the eligible studies.

Control | Author | Year
Active control | Lee et al. [ ] | 2012
Active control | West et al. [ ] | 2018
Passive control | Gleich et al. [ ] | 2017
Passive control | Haier et al. [ ] | 2009
Passive control | Kuhn et al. [ ] | 2014
Passive control | Lorenz et al. [ ] | 2015
Passive control | Martinez et al. [ ] | 2013
Active–passive control | Roush [ ] | 2013
Active–passive control | West et al. [ ] | 2017

3.4. Game Title and Genre

Of the nine eligible studies, four used the same 3D adventure game on different game platforms: the original “Super Mario 64” and the DS version. One study used first-person shooting (FPS) games with many different game titles, “Call of Duty” being one of them. Two studies used puzzle games: “Tetris” and “Professor Layton and The Pandora’s Box.” One study used a rhythm dance game: Dance Revolution. One study used a strategy game: “Space Fortress.” Game genres are presented in Table 6 .

Genres and game titles of video gaming intervention.

Genre | Author | Year | Title
3D adventure | Gleich et al. [ ] | 2017 | Super Mario 64 DS
3D adventure | Kuhn et al. [ ] | 2014 | Super Mario 64
3D adventure | Lorenz et al. [ ] | 2015 | Super Mario 64 DS
3D adventure | West et al. [ ] | 2017 | Super Mario 64
FPS | West et al. * [ ] | 2018 | Call of Duty
Puzzle | Haier et al. [ ] | 2009 | Tetris
Puzzle | Martinez et al. [ ] | 2013 | Professor Layton and The Pandora’s Box
Rhythm dance | Roush [ ] | 2013 | Dance Revolution
Strategy | Lee et al. [ ] | 2012 | Space Fortress

* West et al. used multiple games; other games are Call of Duty 2, 3, Black Ops, and World at War, Killzone 2 and 3, Battlefield 2, 3, and 4, Resistance 2 and Fall of Man, and Medal of Honor.

3.5. Participants and Sample Size

Among the nine studies, one examined teenage participants, six included young adult participants, and two assessed older adult participants. Participant information is shown in Table 7 . The number of participants ranged from 20 to 75 (mean = 43.67; S.D. = 15.63). Three studies examined female-only participants, whereas the other six used both male and female participants. All six studies with female and male participants had more female than male participants.

Participant details of eligible studies.

Category | Author | Year | Age (Lowest) | Age (Highest) | Age (Range) | Sample Size | Female (%) | Male (%) | Detail
Teenager | Haier et al. [ ] | 2009 | 12 | 15 | 3 | 44 | 70.45 | 29.54 | Training (24); Control (20)
Young adult | Gleich et al. [ ] | 2017 | 18 | 36 | 18 | 26 | 100 | 0 | Training (15); Control (11)
Young adult | Kuhn et al. [ ] | 2014 | 19 | 29 | 10 | 48 | 70.8 | 29.2 | Training (23); Control (25)
Young adult | Lee et al. [ ] | 2012 | 18 | 30 | 12 | 75 | 61.4 | 38.6 | Training A (25); Training B (25); Control (25)
Young adult | Lorenz et al. [ ] | 2015 | 19 | 27 | 8 | 50 | 72 | 28 | Training (25); Control (25)
Young adult | Martinez et al. [ ] | 2013 | 16 | 21 | 5 | 20 | 100 | 0 | Training (10); Control (10)
Young adult | West et al. [ ] | 2018 | 18 | 29 | 11 | 43 | 67.4 | 32.5 | Action game (21); Non-action game (22)
Older adult | Roush [ ] | 2013 | 50 | 65 | 15 | 39 | 100 | 0 | Training (19); Active control (15); Passive control (5)
Older adult | West et al. [ ] | 2017 | 55 | 75 | 20 | 48 | 66.7 | 33.3 | Training (19); Active control (14); Passive control (15)

3.6. Training Period and Intensity

The training period was 4–24 weeks (mean = 11.49; S.D. = 6.88). The study by Lee et al. had two training lengths and total-hour values because it examined two types of video game training. The total training hours were 16–90 h (mean = 40.63; S.D. = 26.22), whereas the training intensity was 1.5–10.68 h/week (mean = 4.96; S.D. = 3.00). One study did not specify the total training hours. Two studies did not specify the training intensity. The training periods and intensities are shown in Table 8 .

Periods and intensities of video gaming intervention.

Author | Year | Length (Weeks) | Total Hours | Average Intensity (h/Week)
Gleich et al. [ ] | 2017 | 8 | 49.5 | 6.2
Haier et al. [ ] | 2009 | 12 | 18 | 1.5
Kuhn et al. [ ] | 2014 | 8 | 46.88 | 5.86
Lorenz et al. [ ] | 2015 | 8 | 28 | 3.5
Lee et al. [ ] | 2012 | 8–11 * | 27 | n/a
Martinez et al. [ ] | 2013 | 4 | 16 | 4
Roush [ ] | 2013 | 24 | ns | n/a
West et al. [ ] | 2017 | 24 | 72 | 3
West et al. [ ] | 2018 | 8.4 | 90 | 10.68

The training length was converted into weeks (1 month = 4 weeks). ns, not specified; n/a, not available; * exact length is not available.

3.7. MRI Analysis and Specifications

Of nine eligible studies, one study used resting-state MRI analysis, three studies (excluding that by Haier et al. [ 40 ]) used structural MRI analysis, and five studies used task-based MRI analysis. A study by Haier et al. used MRI analyses of two types [ 40 ]. A summary of MRI analyses is presented in Table 9 . The related resting-state, structural, and task-based MRI specifications are presented in Table 10 , Table 11 and Table 12 respectively.

MRI analysis details of eligible studies.

MRI Analysis | Author | Year | Contrast | Statistical Tool | Statistical Method | p-Value
Resting | Martinez et al. [ ] | 2013 | (post > pre-training) > (post > pre-control) | MATLAB; SPM8 | TFCE uncorrected | <0.005
Structural | Haier et al. * [ ] | 2009 | (post > pre-training) > (post > pre-control) | MATLAB 7; SurfStat | FWE corrected | <0.005
Structural | Kuhn et al. [ ] | 2014 | (post > pre-training) > (post > pre-control) | VBM8; SPM8 | FWE corrected | <0.001
Structural | West et al. [ ] | 2017 | (post > pre-training) > (post > pre-control) | Bpipe | Uncorrected | <0.0001
Structural | West et al. [ ] | 2018 | (post > pre-training) > (post > pre-control) | Bpipe | Bonferroni corrected | <0.001
Task | Gleich et al. [ ] | 2017 | (post > pre-training) > (post > pre-control) | SPM12 | Monte Carlo corrected | <0.05
Task | Haier et al. * [ ] | 2009 | (post > pre-training) > (post > pre-control) | SPM7 | FDR corrected | <0.05
Task | Lee et al. [ ] | 2012 | (post > pre-training) > (post > pre-control) | FSL; FEAT | uncorrected | <0.01
Task | Lorenz et al. [ ] | 2015 | (post > pre-training) > (post > pre-control) | SPM8 | Monte Carlo corrected | <0.05
Task | Roush [ ] | 2013 | post > pre-training | MATLAB 7; SPM8 | uncorrected | =0.001

* Haier et al. conducted structural and task analyses. + Compared pre-training and post-training between groups without using contrast. TFCE, threshold-free cluster enhancement; FWE, family-wise error rate; FDR, false discovery rate.

Resting-State MRI specifications of eligible studies.

Author | Year | Resting-State Imaging | Resting TR (s) | Resting TE (ms) | Resting Slices | Structural Imaging | Structural TR (s) | Structural TE (ms) | Structural Slices
Martinez et al. [ ] | 2013 | gradient-echo planar image | 3 | 28.1 | 36 | T1-weighted | 0.92 | 4.2 | 158

Structural MRI specifications of eligible studies.

Author | Year | Imaging | TR (s) | TE (ms)
Kuhn et al. [ ] | 2014 | 3D T1-weighted MPRAGE | 2.5 | 4.77
West et al. [ ] | 2017 | 3D gradient echo MPRAGE | 2.3 | 2.91
West et al. [ ] | 2018 | 3D gradient echo MPRAGE | 2.3 | 2.91

Task-Based MRI specifications of eligible studies.

Author | Year | Task | BOLD Imaging | BOLD TR (s) | BOLD TE (ms) | BOLD Slices | Structural Imaging | Structural TR (s) | Structural TE (ms) | Structural Slices
Gleich et al. [ ] | 2017 | win–loss paradigm | T2 echo-planar image | 2 | 30 | 36 | T1-weighted | 2.5 | 4.77 | 176
Haier et al. [ ] | 2009 | Tetris | Functional echo planar | 2 | 29 | ns | 5-echo MPRAGE | 2.53 | 1.64; 3.5; 5.36; 7.22; 9.08 | ns
Lee et al. [ ] | 2012 | game control | fast echo-planar image | 2 | 25 | ns | T1-weighted MPRAGE | 1.8 | 3.87 | 144
Lorenz et al. [ ] | 2015 | slot machine paradigm | T2 echo-planar image | 2 | 30 | 36 | T1-weighted MPRAGE | 2.5 | 4.77 | ns
Roush [ ] | 2013 | digit symbol substitution | fast echo-planar image | 2 | 25 | 34 | diffusion weighted image | ns | ns | ns

All analyses used a 3 Tesla magnetic field strength; TR = repetition time; TE = echo time; ns = not specified.

4. Discussion

This literature review evaluated the effect of noncognitive-based video game intervention on the cognitive function of healthy people. Comparison of studies is difficult because of the heterogeneities of participant ages, beneficial effects, and durations. Comparisons are limited to studies sharing factors.

4.1. Participant Age

Video gaming intervention affects all age categories except for the children category. The exception derives from a lack of intervention studies using children as participants. The underlying reason for this exception is that the brain is still developing until age 10–12 [ 52 , 53 ]. Among the eligible studies were a study investigating adolescents [ 40 ], six studies investigating young adults [ 41 , 42 , 43 , 47 , 49 , 51 ] and two studies investigating older adults [ 48 , 50 ].

Differences among study purposes underlie the differences in participant age categories. The study by Haier et al. was intended to study adolescents because the category shows the most potential brain changes. The human brain is more sensitive to synaptic reorganization during the adolescent period [ 54 ]. Generally, grey matter decreases whereas white matter increases during the adolescent period [ 55 , 56 ]. By contrast, the cortical surface of the brain increases despite reduction of grey matter [ 55 , 57 ]. Six studies were investigating young adults with the intention of studying brain changes after the brain reaches maturity. The human brain reaches maturity during the young adult period [ 58 ]. Two studies were investigating older adults with the intention of combating difficulties caused by aging. The human brain shrinks as age increases [ 56 , 59 ], which almost invariably leads to declining cognitive function [ 59 , 60 ].

4.2. Beneficial Effects

Three beneficial outcomes were observed using MRI method: grey matter change [ 40 , 42 , 50 ], brain activity change [ 40 , 43 , 47 , 48 , 49 ], and functional connectivity change [ 41 ]. The affected brain area corresponds to how the respective games were played.

Four studies of 3D video gaming showed effects on the structure of the hippocampus, dorsolateral prefrontal cortex (DLPFC), and cerebellum [ 42 , 43 , 50 ], and on the activity of the DLPFC [ 43 ] and ventral striatum [ 49 ]. In this case, the hippocampus is used for memory [ 61 ] and scene recognition [ 62 ], whereas the DLPFC and cerebellum are used for working memory function for information manipulation and problem-solving processes [ 63 ]. The grey matter of the corresponding brain regions has been shown to increase during training [ 20 , 64 ]. The increased grey matter of the hippocampus, DLPFC, and cerebellum is associated with better performance in reference and working memory [ 64 , 65 ].

The reduced activity of DLPFC found in the study by Gleich et al. corresponds to studies that showed reduced brain activity associated with brain training [ 66 , 67 , 68 , 69 ]. Decreased activity of the DLPFC after training is associated with efficiency in divergent thinking [ 70 ]. 3D video gaming also preserved reward systems by protecting the activity of the ventral striatum [ 71 ].

Two studies of puzzle gaming showed effects on the structure of the visual–spatial processing area, on the activity of the frontal area, and on functional connectivity. The increased grey matter of the visual–spatial area and the decreased activity of the frontal area are similar to the training-associated grey matter increase [ 20 , 64 ] and activity decrease [ 66 , 67 , 68 , 69 ]. In this case, the visual–spatial processing and frontal areas are used constantly for spatial prediction and problem-solving in Tetris. The functional connectivity change in the multimodal integration and higher-order executive systems seen with the puzzle-solving-based Professor Layton game corresponds to studies that demonstrated training-associated functional connectivity change [ 72 , 73 ]. Good functional connectivity implies better performance [ 73 ].

Strategy gaming affects DLPFC activity, whereas rhythm gaming affects the activity of visuospatial working memory, emotional, and attention areas. FPS gaming affects the structure of the hippocampus and amygdala. The decreased DLPFC activity is similar to the training-associated activity decrease [ 66 , 67 , 68 , 69 ]. The study by Roush demonstrated increased activity of visuospatial working memory, emotion, and attention areas, which might occur because of the combination of exercise and gaming in the Dance Revolution game. The results suggest that the positive activations indicate functional areas altered by complex exercise [ 48 ]. The increased grey matter of the hippocampus and amygdala is similar to the training-associated grey matter increase [ 20 , 64 ]. The hippocampus is used for 3D navigation purposes in the FPS world [ 61 ], whereas the amygdala is used to stay alert during gaming [ 74 ].

4.3. Duration

Changes in brain structure and function were observed after 16 h of video gaming. The total durations of video gaming were 16–90 h. However, gaming intensity must also be noted, because it varied from 1.5 to 10.68 h per week. The different intensities might affect the change in cognitive function. Cognitive intervention studies have demonstrated intensity effects on the cortical thickness of the brain [ 75 , 76 ]. A similar effect might be observed in video gaming studies. More studies must be conducted to resolve how the intensity can be expected to affect cognitive function.

4.4. Criteria

Almost all studies used the inclusion criterion “little/no experience with video games.” The criterion was used to reduce the influence of prior gaming experience on the effects of video gaming. Some of the studies also used specific handedness and a specific sex of participants to reduce the variation in brain effects. Expertise and sex have been shown to affect brain activity and structure [ 77 , 78 , 79 , 80 ]. The exclusion criterion “MRI contraindication” is used for participant safety in the MRI protocol, whereas the exclusion criteria “psychiatric/mental illness”, “neurological illness”, and “medical illness” are used to standardize the participants.

4.5. Limitations and Recommendations

Some concern might be raised about the methodological quality, assessed using the Delphi criteria [ 45 ]. The quality scores were 3–9 (mean = 6.10; S.D. = 1.69). The low scores of most papers resulted from information that was not specified for the corresponding criteria. The methodological quality of future studies must therefore be improved: allocation concealment, assessor blinding, care provider blinding, participant blinding, intention-to-treat analysis, and allocation method details should all be reported.

Another concern is blinding and control. This type of study differs from medical studies, in which patients can be blinded easily. In studies of this type, the participants were tasked either to do training as an active control group or to do nothing as a passive control group. Participants may form expectations about the task, and these expectations might affect the outcomes of the studies [ 81 , 82 , 83 ]. Additionally, a waiting-list control group might lead to overestimation of the training outcome [ 84 ].

Considering the sample size, which was 20–75 (mean = 43.67; S.D. = 15.63), the studies must be upscaled to emphasize video gaming effects. There are four phases of clinical trials that start from the early stage and small-scale phase 1 to late stage and large-scale phase 3 and end in post-marketing observation phase 4. These four phases are used for drug clinical trials, according to the food and drug administration (FDA) [ 85 ]. Phase 1 has the purpose of revealing the safety of treatment with around 20–100 participants. Phase 2 has the purpose of elucidating the efficacy of the treatment with up to several hundred participants. Phase 3 has the purpose of revealing both efficacy and safety among 300–3000 participants. The final phase 4 has the purpose of finding unprecedented adverse effects of treatment after marketing. However, because medical studies and video gaming intervention studies differ in terms of experimental methods, slight modifications can be done for adaptation to video gaming studies.

Several unresolved issues persist in relation to video gaming intervention. First, no studies assessed chronic/long-term video gaming. The participants might lose their motivation to play the same game over a long time, which might affect the study outcomes [ 86 ]. Second, meta-analyses could not be done because the game genres are heterogeneous. To ensure homogeneity of the study, stricter criteria must be set. However, this step would engender a third limitation. Third, randomized controlled trial video gaming studies that use MRI analysis are few. More studies must be conducted to assess the effects of video gaming. Fourth, the eligible studies lacked cognitive tests to validate the cognitive change effects for training. Studies of video gaming intervention should also include a cognitive test to ascertain the relation between cognitive function and brain change.

5. Conclusions

The systematic review has several conclusions related to beneficial effects of noncognitive-based video games. First, noncognitive-based video gaming can be used in all age categories as a means to improve the brain. However, effects on children remain unclear. Second, noncognitive-based video gaming affects both structural and functional aspects of the brain. Third, video gaming effects were observed after a minimum of 16 h of training. Fourth, some methodology criteria must be improved for better methodological quality. In conclusion, acute video gaming of a minimum of 16 h is beneficial for brain function and structure. However, video gaming effects on the brain area vary depending on the video game type.

Acknowledgments

We would like to thank all our other colleagues in IDAC, Tohoku University for their support.

PRISMA Checklist of the literature review.

Section/Topic | # | Checklist Item | Reported on Page #
Title | 1 | Identify the report as a systematic review, meta-analysis, or both. | 1
Structured summary | 2 | Provide a structured summary including, as applicable: background; objectives; data sources; study eligibility criteria, participants, and interventions; study appraisal and synthesis methods; results; limitations; conclusions and implications of key findings; systematic review registration number. | 1
Rationale | 3 | Describe the rationale for the review in the context of what is already known. | 1, 2
Objectives | 4 | Provide an explicit statement of questions being addressed related to participants, interventions, comparisons, outcomes, and study design (PICOS). | 2
Protocol and registration | 5 | Indicate if a review protocol exists, if and where it is accessible (e.g., Web address), and if available, provide registration information including registration number. | 2
Eligibility criteria | 6 | Specify study characteristics (e.g., PICOS, length of follow-up) and report characteristics (e.g., years considered, language, publication status) used as criteria for eligibility, giving rationale. | 2
Information sources | 7 | Describe all information sources (e.g., databases with dates of coverage, contact with study authors to identify additional studies) in the search and date last searched. | 2
Search | 8 | Present full electronic search strategy for at least one database, including any limits used, such that it could be repeated. | 2
Study selection | 9 | State the process for selecting studies (i.e., screening, eligibility, included in systematic review, and if applicable, included in the meta-analysis). | 3
Data collection process | 10 | Describe method of data extraction from reports (e.g., piloted forms, independently, in duplicate) and any processes for obtaining and confirming data from investigators. | 3
Data items | 11 | List and define all variables for which data were sought (e.g., PICOS, funding sources) and any assumptions and simplifications made. | 3
Risk of bias in individual studies | 12 | Describe methods used for assessing risk of bias of individual studies (including specification of whether this was done at the study or outcome level), and how this information is to be used in any data synthesis. | 2
Summary measures | 13 | State the principal summary measures (e.g., risk ratio, difference in means). | -
Synthesis of results | 14 | Describe the methods of handling data and combining results of studies, if done, including measures of consistency (e.g., I²) for each meta-analysis. | -
Risk of bias across studies | 15 | Specify any assessment of risk of bias that might affect the cumulative evidence (e.g., publication bias, selective reporting within studies). | -
Additional analyses | 16 | Describe methods of additional analyses (e.g., sensitivity or subgroup analyses, meta-regression), if done, indicating which were pre-specified. | -
Study selection | 17 | Give numbers of studies screened, assessed for eligibility, and included in the review, with reasons for exclusions at each stage, ideally with a flow diagram. | 3, 5
Study characteristics | 18 | For each study, present characteristics for which data were extracted (e.g., study size, PICOS, follow-up period) and provide the citations. | 5-11
Risk of bias within studies | 19 | Present data on risk of bias of each study, and if available, any outcome level assessment (see item 12). | 5, 6
Results of individual studies | 20 | For all outcomes considered (benefits or harms), present, for each study: (a) simple summary data for each intervention group (b) effect estimates and confidence intervals, ideally with a forest plot. | 4
Synthesis of results | 21 | Present results of each meta-analysis done, including confidence intervals and measures of consistency. | -
Risk of bias across studies | 22 | Present results of any assessment of risk of bias across studies (see Item 15). | -
Additional analysis | 23 | Give results of additional analyses, if done (e.g., sensitivity or subgroup analyses, meta-regression [see Item 16]). | -
Summary of evidence | 24 | Summarize the main findings including the strength of evidence for each main outcome; consider their relevance to key groups (e.g., healthcare providers, users, and policy makers). | 12, 13
Limitations | 25 | Discuss limitations at study and outcome level (e.g., risk of bias), and at review-level (e.g., incomplete retrieval of identified research, reporting bias). | 13
Conclusions | 26 | Provide a general interpretation of the results in the context of other evidence, and implications for future research. | 14
Funding | 27 | Describe sources of funding for the systematic review and other support (e.g., supply of data); role of funders for the systematic review. | 14

For more information, visit: www.prisma-statement.org .

Author Contributions

D.B.T., R.N., and R.K. designed the systematic review. D.B.T. and R.N. searched and selected the papers. D.B.T. and R.N. wrote the manuscript with R.K. All authors read and approved the final manuscript. D.B.T. and R.N. contributed equally to this work.

Funding

This study was supported by JSPS KAKENHI Grant Numbers 17H06046 (Grant-in-Aid for Scientific Research on Innovative Areas) and 16KT0002 (Grant-in-Aid for Scientific Research (B)).

Conflicts of Interest

None of the other authors has any conflict of interest to declare. Funding sources are not involved in the study design, collection, analysis, interpretation of data, or writing of the study report.

IMAGES

  1. How To Write A Research Paper Step By Step

  2. Research Paper

  3. Tips For How To Write A Scientific Research Paper

  4. Research Paper Format

  5. (PDF) “How to Write a Scientific Research Paper”, International Journal

  6. Research Document Example

VIDEO

  1. Research basics

  2. All About Research Papers

  3. How to understand any research paper in seconds ⏰ #academia #literaturereview

  4. How to write a research paper

  5. What is research

  6. Writing Research Abstracts

COMMENTS

  1. Peer Reviewed Scientific Video Journal Article Protocols

    JoVE publishes peer-reviewed scientific video article protocols to accelerate biological, medical, chemical, and physical research. Watch it now!

  2. Vidu: a Highly Consistent, Dynamic and Skilled Text-to-Video Generator

    We introduce Vidu, a high-performance text-to-video generator that is capable of producing 1080p videos up to 16 seconds in a single generation. Vidu is a diffusion model with U-ViT as its backbone, which unlocks the scalability and the capability for handling long videos. Vidu exhibits strong coherence and dynamism, and is capable of generating both realistic and imaginative videos, as well ...

  3. Performing Qualitative Content Analysis of Video Data in Social

    As video data vary considerably in type, characteristics and duration, the units of analysis need careful selection. This task mainly involves decisions around whether to examine the entire video as one piece or to segment it, according to duration, the characteristics of the video data, and the research questions (Clarke, 2005). For ...

  4. Video Processing Using Deep Learning Techniques: A Systematic

    Studies show lots of advanced research on various data types such as image, speech, and text using deep learning techniques, but nowadays, research on video processing is also an emerging field of computer vision. Several surveys are present on video processing using computer vision deep learning techniques, targeting specific functionality such as anomaly detection, crowd analysis, activity ...

  5. Make-A-Video: Text-to-Video Generation without Text-Video Data

    We propose Make-A-Video -- an approach for directly translating the tremendous recent progress in Text-to-Image (T2I) generation to Text-to-Video (T2V). Our intuition is simple: learn what the world looks like and how it is described from paired text-image data, and learn how the world moves from unsupervised video footage. Make-A-Video has three advantages: (1) it accelerates training of the ...

  6. VideoPoet: A Large Language Model for Zero-Shot Video Generation

    We present VideoPoet, a language model capable of synthesizing high-quality video, with matching audio, from a large variety of conditioning signals. VideoPoet employs a decoder-only transformer architecture that processes multimodal inputs -- including images, videos, text, and audio. The training protocol follows that of Large Language Models (LLMs), consisting of two stages: pretraining and ...

  7. Video Generation

    carolineec/EverybodyDanceNow • ICCV 2019. This paper presents a simple method for "do as I do" motion transfer: given a source video of a person dancing, we can transfer that performance to a novel (amateur) target after only a few minutes of the target subject performing standard moves.

  8. Characteristics, hotspots, and prospects of short video research: A

    The number of such papers is increasing annually in China; moreover, several core groups of authors and research institutions focusing on short video research have already been formed. Some popular topics of research on these videos include the main characteristics of short videos, phenomenon of media convergence based on short videos, and ...

  9. Video summarization using deep learning techniques: a ...

    One of the critical multimedia analysis problems in today's digital world is video summarization (VS). Many VS methods have been suggested based on deep learning methods. Nevertheless, these are inefficient at processing, extracting, and deriving information from long-duration videos in a minimal amount of time. Detailed analysis and investigation of numerous deep learning approaches ...

  10. Text-to-Video Generation

    Text-to-video is a rapidly growing research area that aims to generate a semantically consistent and temporally coherent sequence of frames that accurately aligns with the input text prompt. This task refers to video generation based on a given sentence or sequence of words (a minimal, hedged sketch of such a pipeline appears after this list).

  11. Video Processing Using Deep Learning Techniques: A Systematic

    This paper aims to present a Systematic Literature Review (SLR) on video processing using deep learning to investigate the applications, functionalities, techniques, datasets, issues, and ...

  12. A survey of recent work on video summarization: approaches and

    The paper has been organised as follows: Section 2 presents the video summarization approach by formulating the problem mathematically and discussing the general existing frameworks. Section 3 discusses the video summarization techniques by furnishing possible categorisation of summarization techniques, multi-video summarization and presenting the paradigm shift that has occurred in the ...

  13. Using video-based observation research methods in primary care health

    The purpose of this paper is to describe the use of video-based observation research methods in the primary care environment, highlight important methodological considerations, and provide practical guidance for primary care and human factors researchers conducting video studies to understand patient-clinician interaction in primary care settings ...

  14. Title: Video Understanding with Large Language Models: A Survey

    Video Understanding with Large Language Models: A Survey, by Yunlong Tang and 19 other authors. With the burgeoning growth of online video platforms and the escalating volume of video content, the demand for proficient video understanding tools has intensified markedly. Given the remarkable capabilities of large ...

  15. Video Improves Learning in Higher Education: A Systematic Review

    As a result, many different media (videos, lectures, videoconferences) can leverage these methods (e.g., using video and audio channels to communicate information). This distinction between media and method has a long and controversial history (Clark, 1983, 1994; Kozma, 1994; Warnick & Burbules, 2007). On one hand, many of the mechanisms by which media improve learning can be replicated in ...

  16. (PDF) Analysing video and audio data: existing ...

    ABSTRACT: Across many subject disciplines, video and audio data are recorded in order to document processes, procedures or ...

  17. VideoPoet: A large language model for zero-shot video generation

    An example of image-to-video with text prompts to guide the motion. Each video is paired with an image to its left. Left: "A ship navigating the rough seas, thunderstorm and lightning, animated oil on canvas". Middle: "Flying through a nebula with many twinkling stars". Right: "A wanderer on a cliff with a cane looking down at the swirling sea fog below on a windy day".

  18. Research as storytelling: the use of video for mixed methods research

    Video, when used as a tool for research, can document and share ethnographic, epistemic, and storytelling data to participants and to the research team (R. Goldman, 2007; Heath et al., 2010; Miller & Zhou, 2007; Tobin & Hsueh, 2007). Much of the research in this area focuses on the properties (both positive and negative) inherent in the camera itself, such as how video footage can increase the ...

  19. A systematic review on content-based video retrieval

    Abstract. Content-based video retrieval and indexing have been associated with intelligent methods in many applications such as education, medicine and agriculture. However, an extensive and replicable review of the recent literature is missing. Moreover, relevant topics that can support video retrieval, such as dimensionality reduction, have ...

  20. Intelligent video surveillance: a review through deep learning

    Big data applications are consuming most of the space in industry and research. Among the widespread examples of big data, the role of video streams from CCTV cameras is as important as other sources such as social media data, sensor data, agriculture data, medical data and data evolved from space research. Surveillance videos make a major contribution to unstructured big data. CCTV ...

  21. How to create a scientific explainer video or video abstract (with

    Develop a promotional video to help your new research project get traction. Use a video in a class lecture to demonstrate a particular scientific phenomenon. Add a video of your research process or computer simulation to a journal paper. In a workshop, use a video to show people how your new test method works.

  22. Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis

    Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis, by Willi Menapace and 10 other authors. Contemporary models for generating images show remarkable quality and versatility. Swayed by these advantages, the research community repurposes them to generate videos.

  23. Does Video Gaming Have Impacts on the Brain: Evidence from a Systematic

    Recently, numerous video gaming studies have shown beneficial effects on cognition and the brain. A systematic review of video gaming has been published; however, that earlier review differs from the present one in several respects. This systematic review evaluates the beneficial effects of video gaming on neuroplasticity specifically ...

  24. Between Escapism and Social Engagement: Ted Lasso and the Privileges of

    Shannon Sweeney is a PhD candidate at the University of Iowa. A television studies scholar, her research revolves around online television in its various forms. She is interested in the relationship between television and everyday life, such as how the medium can create or reinforce our preferences and identities.
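
Several of the entries above (e.g., entries 2, 5, 6, 10, 17, and 22) concern text-to-video generation. As a minimal, hedged sketch of what "video generation based on a given sentence" looks like in practice, the snippet below assumes the publicly released ModelScope text-to-video checkpoint and the Hugging Face diffusers library; it is not the method of any single paper listed here, a CUDA-capable GPU is assumed, and return types can differ across diffusers versions.

    # A sketch, not a definitive implementation: generate a short clip from a prompt.
    import torch
    from diffusers import DiffusionPipeline
    from diffusers.utils import export_to_video

    pipe = DiffusionPipeline.from_pretrained(
        "damo-vilab/text-to-video-ms-1.7b", torch_dtype=torch.float16
    )
    pipe = pipe.to("cuda")  # assumes a GPU is available

    prompt = "A panda playing guitar on a beach at sunset"  # hypothetical prompt
    result = pipe(prompt, num_inference_steps=25, num_frames=16)

    # In recent diffusers versions .frames is batched; take the first video.
    frames = result.frames[0]
    video_path = export_to_video(frames)  # writes an .mp4 file and returns its path
    print(video_path)

Pipelines built on other backbones (e.g., the U-ViT design mentioned for Vidu, or the decoder-only transformer of VideoPoet) expose different interfaces; the sketch above only illustrates the task's input-output contract: a sentence in, a sequence of frames out.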