Vol. 4 No. 2 (2024): Journal of AI-Assisted Scientific Discovery

Cross modal AI model training to increase scope and build more comprehensive and robust models.

Sarbaree Mishra
Program Manager at Molina Healthcare Inc., USA

Published 16-07-2024

Keywords

  • artificial intelligence
  • model training
  • multimodal data

How to Cite

[1]
Sarbaree Mishra, “Cross modal AI model training to increase scope and build more comprehensive and robust models,” Journal of AI-Assisted Scientific Discovery, vol. 4, no. 2, pp. 258–280, Jul. 2024, Accessed: Dec. 23, 2024. [Online]. Available: https://scienceacadpress.com/index.php/jaasd/article/view/246

Abstract

Cross-modal AI has gained considerable attention for its potential to integrate and analyze information from several types of data, such as text, images, audio, and video, in ways that traditional models cannot. By drawing on multiple forms of input, AI systems can better understand and interact with the world: they can recognize patterns, make predictions, and perform tasks with greater accuracy and versatility. Training models on different data modalities simultaneously lets researchers build more comprehensive and robust systems that generalize across a broader range of tasks and perform better in real-world scenarios that require a blend of diverse information. Compared with single-modal models, cross-modal AI supports richer, more nuanced understanding and decision-making, which is especially important for applications in healthcare, autonomous driving, and entertainment. For example, an AI system trained on visual and textual data can better understand and describe a scene or generate relevant captions for an image. However, integrating diverse data types into a cohesive model brings challenges, including data alignment, the management of large and heterogeneous datasets, and the computational cost of training such models. To overcome these obstacles, researchers have developed several strategies: designing specialized architectures that can handle different types of data, using transfer learning to carry knowledge from one modality over to others, and ensuring that data from various sources is synchronized and compatible. Cross-modal AI thereby enables more adaptive, efficient, and intelligent systems that can tackle a broader range of tasks. By combining insights from multiple modalities, these models are better equipped to handle the complexities and nuances of the natural world, opening up new possibilities for AI applications across industries and bringing AI systems closer to human-like perception and reasoning.
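
The data-alignment idea mentioned in the abstract can be illustrated with a small sketch. The abstract does not specify an algorithm, so the code below swaps in one widely used option as an assumption: each modality gets its own linear projection into a shared embedding space, and a symmetric contrastive loss rewards matched image-text pairs for being more similar than mismatched ones. All names (`project`, `contrastive_loss`, `W_IMAGE`, `W_TEXT`) and all feature values are hypothetical and chosen only for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def project(vec, weights):
    """Linear projection; `weights` holds one row per output dimension."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def similarity_matrix(image_embs, text_embs):
    """Entry [i][j] is the dot product of image i and text j."""
    return [[sum(a * b for a, b in zip(img, txt)) for txt in text_embs]
            for img in image_embs]


def contrastive_loss(sim, temperature=0.1):
    """Symmetric cross-entropy loss; matched pairs are assumed to lie on the diagonal."""
    n = len(sim)

    def one_direction(rows):
        total = 0.0
        for k, row in enumerate(rows):
            logits = [s / temperature for s in row]
            peak = max(logits)  # subtract the max for numerical stability
            log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
            total += log_denominator - logits[k]
        return total / n

    columns = [list(col) for col in zip(*sim)]
    # Average the image-to-text and text-to-image directions.
    return 0.5 * (one_direction(sim) + one_direction(columns))


# Hypothetical toy data: 3-dimensional image features, 2-dimensional text features.
image_features = [[1.0, 0.0, 0.2], [0.0, 1.0, 0.2]]
text_features = [[0.9, 0.1], [0.1, 0.9]]

# Hypothetical projection weights mapping each modality into a shared 2-D space.
W_IMAGE = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
W_TEXT = [[1.0, 0.0], [0.0, 1.0]]

image_embs = [l2_normalize(project(f, W_IMAGE)) for f in image_features]
text_embs = [l2_normalize(project(f, W_TEXT)) for f in text_features]

# Loss when pair k of images matches pair k of texts.
aligned_loss = contrastive_loss(similarity_matrix(image_embs, text_embs))

# Loss when the text order is reversed, so every pairing is wrong.
mismatched_loss = contrastive_loss(similarity_matrix(image_embs, text_embs[::-1]))

print(f"aligned: {aligned_loss:.4f}  mismatched: {mismatched_loss:.4f}")
```

In this toy setting the aligned pairing yields a much smaller loss than the reversed pairing. In an actual training loop the projection weights would be learned by minimizing this loss rather than fixed by hand.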
