Multimodal Learning With Transformers: A Survey
- Maurya, A., Ye, J., Rafique, M., Cappello, F., Nicolae, B., Costan, A., Nicolae, B., & Sato, K. (2024). Breaking the Memory Wall: A Study of I/O Patterns and GPU Memory Utilization for Hybrid CPU-GPU Offloaded Optimizers. Proceedings of the 14th Workshop on AI and Scientific Computing at Scale using Flexible Computing Infrastructures, 9–16. https://dl.acm.org/doi/10.1145/3659995.3660038
- Kim, H., Roknaldin, A., Nayak, S., Chavan, A., Lu, S., Joyner, D., Kim, M., Wang, X., & Xia, M. (2024). Multimodal Deep Learning for Classifying Student-generated Questions in Computer-supported Collaborative Learning. Proceedings of the Eleventh ACM Conference on Learning @ Scale, 134–142. https://dl.acm.org/doi/10.1145/3657604.3662026
- Zhao, Y., Harrison, B., & Yu, T. (2024). DinoDroid: Testing Android Apps Using Deep Q-Networks. ACM Transactions on Software Engineering and Methodology, 33(5), 1–24. https://dl.acm.org/doi/10.1145/3652150
Recommendations
A survey on deep learning for multimodal data fusion
With the wide deployment of heterogeneous networks, huge amounts of data characterized by high volume, high variety, high velocity, and high veracity are generated. These data, referred to as multimodal big data, contain abundant intermodality and ...
Classifying Multimodal Data Using Transformers
The increasing prevalence of multimodal data in our society has led to the increased need for machines to make sense of such data holistically. However, data scientists and machine learning engineers aspiring to work on such data face challenges fusing ...
Learning human multimodal dialogue strategies
We investigate the use of different machine learning methods in combination with feature selection techniques to explore human multimodal dialogue strategies and the use of those strategies for automated dialogue systems. We learn policies from data ...
Information
Published in: IEEE Computer Society, United States
Publication History
- Research-article
Metrics
- Total citations: 16
- Total downloads: 0 (last 12 months: 0; last 6 weeks: 0)
- Ma, J., Wang, P., Kong, D., Wang, Z., Liu, J., Pei, H., & Zhao, J. (2024). Robust Visual Question Answering: Datasets, Methods, and Future Challenges. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(8), 5575–5594. https://dl.acm.org/doi/10.1109/TPAMI.2024.3366154
- Wu, J., Li, X., Xu, S., Yuan, H., Ding, H., Yang, Y., Li, X., Zhang, J., Tong, Y., Jiang, X., Ghanem, B., & Tao, D. (2024). Towards Open Vocabulary Learning: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(7), 5092–5113. https://dl.acm.org/doi/10.1109/TPAMI.2024.3361862
- Tao, Y., Yang, M., Li, H., Wu, Y., & Hu, B. (2024). DepMSTAT: Multimodal Spatio-Temporal Attentional Transformer for Depression Detection. IEEE Transactions on Knowledge and Data Engineering, 36(7), 2956–2966. https://dl.acm.org/doi/10.1109/TKDE.2024.3350071
- Liu, P., Ge, Y., Duan, L., Li, W., Luo, H., & Lv, F. (2024). Transferring Multi-Modal Domain Knowledge to Uni-Modal Domain for Urban Scene Segmentation. IEEE Transactions on Intelligent Transportation Systems, 25(9), 11576–11589. https://dl.acm.org/doi/10.1109/TITS.2024.3382880
- Tariq, S., Khalid, U., Arfeto, B., Duong, T., & Shin, H. (2024). Integrating Sustainable Big AI: Quantum Anonymous Semantic Broadcast. IEEE Wireless Communications, 31(3), 86–99. https://dl.acm.org/doi/10.1109/MWC.007.2300503
- Lu, N., Tan, Z., & Qian, J. (2024). MRSLN. Neurocomputing, 580(C). https://dl.acm.org/doi/10.1016/j.neucom.2024.127467
- Mohammed, A., Geng, X., Wang, J., & Ali, Z. (2024). Driver distraction detection using semi-supervised lightweight vision transformer. Engineering Applications of Artificial Intelligence, 129(C). https://dl.acm.org/doi/10.1016/j.engappai.2023.107618
Expert Systems with Applications
Review: A comprehensive survey on applications of transformers for deep learning tasks
- • The paper presents a comprehensive survey on transformers for deep learning tasks.
- • The paper conducts a thorough analysis on highly effective models in five domains.
- • The paper classifies the models based on respective tasks using a proposed taxonomy.
- • The characteristics of the surveyed models are deeply explored and analyzed.
- • Future directions and challenges for transformer-based models are deciphered.
Sections: Preliminaries, Research methodology, Related work, Transformer applications, Application-based classification taxonomy of transformers, Future prospects and challenges, CRediT authorship contribution statement, Declaration of competing interest, Acknowledgement.
Design and Implementation of a Smart Transformer Based on IoT
- August 2019
- Conference: IEEE International Conference on Computing, Electronics and Communications Engineering
- Al Jabal Al Gharbi University
Transformer, first applied to the field of natural language processing, is a type of deep neural network mainly based on the self-attention mechanism. Thanks to its strong representation capabilities, researchers are looking at ways to apply the transformer to computer vision tasks. In a variety of visual benchmarks, transformer-based models perform similarly to or better than other types of ...
Transformer is a promising neural network learner, and has achieved great success in various machine learning tasks. Thanks to the recent prevalence of multimodal applications and Big Data, Transformer-based multimodal learning has become a hot topic in AI research. This paper presents a comprehensive survey of Transformer techniques oriented at multimodal data. The main contents of this ...
Published in: 2023 10th International Conference on Wireless Networks and Mobile Communications (WINCOM) Article #: Date of Conference: 26-28 October 2023. Date Added to IEEE Xplore: 22 November 2023. ISBN Information: Electronic ISBN: 979-8-3503-2967-4. Print on Demand (PoD) ISBN: 979-8-3503-2968-1.
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin. Attention Is All You Need. The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder ...
The survey is published in IEEE Transactions on Pattern Analysis and Machine Intelligence.
This paper has been accepted for publication by IEEE Transactions on Neural Networks and Learning Systems. Visual Transformers classification:
- Original Visual Transformer: SA-Net [24], FAN [28], ViT [29]
- Transformer Enhanced CNN: VTs [52], BoTNet [53]
- CNN Enhanced Transformer (soft inductive bias): DeiT [41], ConViT [54]
... various fields. Sec. 9 discusses some aspects of Transformer that researchers might find intriguing and summarizes the paper. 2 Background. 2.1 Vanilla Transformer. The vanilla Transformer [137] is a sequence-to-sequence model and consists of an encoder and a decoder, each of which is a stack of identical blocks.
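The scaled dot-product attention at the core of the vanilla Transformer described above can be sketched in a few lines of NumPy. This is a minimal, single-head illustration only (variable names and the toy shapes are illustrative, not taken from the surveyed papers):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # (n_q, n_k) similarity scores
    scores -= scores.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax over the keys
    return weights @ V, weights

# Toy self-attention: 3 tokens with d_model = 4, so Q = K = V = X
rng = np.random.default_rng(0)
X = rng.standard_normal((3, 4))
out, w = scaled_dot_product_attention(X, X, X)
print(out.shape)  # each output row is a convex combination of all token vectors
```

In a full encoder block this attention output would be followed by a residual connection, layer normalization, and a position-wise feed-forward network, and the whole block would be stacked several times.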
This paper is not motivated to seek innovation within the attention mechanism. Instead, it focuses on overcoming the existing trade-offs between accuracy and efficiency within the context of point cloud processing, leveraging the power of scale. Drawing inspiration from recent advances in 3D large-scale representation learning, we recognize that model performance is more influenced by scale ...
The advantages of the Transformer model have inspired deep learning researchers to explore its potential for various tasks in different fields of application (Ren, Li, & Liu, 2023), leading to numerous research papers and the development of Transformer-based models for a range of tasks in the field of artificial intelligence (Reza et al., 2022 ...
A Survey of Transformers. Tianyang Lin, Yuxin Wang, Xiangyang Liu, Xipeng Qiu. Transformers have achieved great success in many artificial intelligence fields, such as natural language processing, computer vision, and audio processing.
Overall, this paper aims to provide a comprehensive understanding of Generative Pre-trained Transformers, enabling technologies, their impact on various applications, emerging challenges, and ...
... transformer design engineer, research engineer, engineering manager and quality manager at ABB locations in Sweden, U.S. and Canada. He is Vice Chair of the Transformers Committee, Canadian Chairman of IEC TC 14 and a member of CIGRE A2.59 "Site Repair of Transformers" and A2.62 "Transformer Failures" working groups.
This paper presents two novel transformer models, the Closed Form Continuous Time Transformer (CFC-T) and the Liquid Time Constant Transformer (LTC-T), advancing the R-Transformer model by incorporating closed form continuous time recurrent neural networks and liquid time constant networks instead of traditional positional encoding. These models are designed to address long-term dependency ...
... department to monitor those transformers regularly. This paper provides a solution for reducing the manpower needed to monitor the transformer in ...
The paper presents the detailed dynamic thermal-hydraulic network model (THNM) for liquid-immersed power transformers (LIPT). Detailed static THNMs are prevalent in thermal design practice, but detailed dynamic THNM have not yet reached an adequate technology readiness level (TRL). Dynamic THNM describes local heat transfer and hydraulic phenomena in detail, integrating them into a global ...
Temperature limits:
- Oil temperature: 100/105 °C
- Average winding temperature (paper): 85 °C for normal paper; 95 °C for thermally upgraded paper; 125 or 145 °C for Nomex
- Hotspot winding temperature (paper), based on daily average ambient: 95 °C for normal paper; 110 °C for thermally upgraded paper
Image caption generation has witnessed significant advancements with the integration of Deep Learning (DL) models. By leveraging DL techniques such as InceptionResNetV2 for feature extraction and transformer-based architectures for natural language processing, the approach achieves remarkable results in generating descriptive captions for images. Unlike traditional Recurrent Neural Network approaches ...
This paper conducts a literature survey and reveals general backgrounds of research and developments in the field of transformer design and optimization for the past 35 years, based on more than 420 published articles, 50 transformer books, and 65 standards. Published in: IEEE Transactions on Power Delivery ( Volume: 24, Issue: 4, October 2009 ...
Scientific research frequently begins with a thorough review of the body of previous work, which includes a wide range of publications. This study process might be shortened by automatically summarizing scientific publications, which would be of great use to researchers. Because scientific papers have a different structure and require citation phrases, summarizing them presents different ...
Oil-paper insulation materials are widely used in various types of traction transformers. However, existing methods cannot effectively evaluate the aging state of hotspot insulation paper in traction transformers. To address this issue, this paper prepared uniformly and non-uniformly aged oil-impregnated insulation paper samples, and their frequency domain spectroscopy (FDS) and degree of ...
This paper proposes a differential protection scheme for power transformers using wavelet transform and neural network algorithms. It utilizes the wavelet transformation (WT) analysis as a preliminary feature extractor and an artificial neural network (ANN) as the pattern classifier. Extensive simulation studies show that the wavelet transform provides an effective signal representation for ...
HVDC has been chosen as an economical and technical solution for power transmission over long distances, asynchronous interconnections and long submarine cable crossings. Despite the benefits of DC transmission to power systems, the converters' non-linearity produces undesirable effects on the converter transformer in service, mainly listed in the technical standard IEC/IEEE 60076-57-129. However ...
A transformer is a device that transfers electrical energy from one circuit to another by magnetic coupling without requiring relative motion between its parts. It usually comprises two or more coupled windings, and, in most cases, a core to concentrate magnetic flux. An alternating voltage applied to one winding creates a time-varying magnetic flux in the core, which induces a voltage in the ...
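The induction mechanism described above is usually summarized by two standard textbook relations (they are not quoted from the snippet itself): Faraday's law for the voltage induced in each winding, and the ideal turns-ratio rule that follows from sharing the same core flux $\Phi$:

```latex
e_p = -N_p \frac{d\Phi}{dt}, \qquad
e_s = -N_s \frac{d\Phi}{dt}
\quad\Longrightarrow\quad
\frac{V_s}{V_p} = \frac{N_s}{N_p}
```

For an ideal, lossless transformer, a secondary winding with twice as many turns as the primary therefore steps the voltage up by a factor of two.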