Vol. 4 No. 2 (2024): Journal of AI-Assisted Scientific Discovery

A polyglot data integration framework for seamless integration of heterogenous data sources and formats

Sarbaree Mishra
Program Manager at Molina Healthcare Inc., USA
Sairamesh Konidala
Vice President, JP Morgan & Chase, USA

Published 11-11-2024

Keywords

  • Data Integration
  • Polyglot

How to Cite

[1]
Sarbaree Mishra and Sairamesh Konidala, “A polyglot data integration framework for seamless integration of heterogenous data sources and formats”, Journal of AI-Assisted Scientific Discovery, vol. 4, no. 2, pp. 209–232, Nov. 2024, Accessed: Dec. 24, 2024. [Online]. Available: https://scienceacadpress.com/index.php/jaasd/article/view/248

Abstract

Organizations face a growing challenge in integrating data from many sources and formats, often stored in different systems. These sources range from structured data in relational databases to semi-structured data such as JSON or XML and unstructured data such as text or multimedia files. Managing and merging these diverse data types efficiently is essential for businesses to realize the full value of their data. A polyglot data integration framework addresses this need: it aims to provide a flexible and scalable solution that handles a variety of data sources and formats without compromising performance or consistency. The framework supports interoperability between data systems using technologies such as cloud-based storage, APIs, and machine learning. It allows organizations to integrate their data while maintaining data integrity and quality across all systems. The framework also addresses scalability, enabling businesses to handle growing data volumes without slowdowns or disruptions. A key benefit of this approach is that it lets organizations optimize their data workflows, making the integration process more efficient and less error-prone. This improves decision-making, because businesses can rely on a unified and consistent view of their data regardless of source or format. Moreover, the framework strengthens data governance by providing mechanisms for tracking data lineage, enforcing security policies, and ensuring compliance with regulations. In summary, the polyglot data integration framework offers a comprehensive approach to the complexities of managing heterogeneous data, enabling organizations to make better use of their data, improve operational efficiency, and remain competitive in a data-driven world.
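
As a rough illustration of the idea in the abstract, the sketch below reads records from a structured source (CSV) and two semi-structured sources (JSON and XML) and merges them into one unified list. It is not the authors' framework. The reader functions, the `integrate` helper, the `id` and `name` fields, and the sample payloads are all hypothetical and chosen only for this example; the `source` tag is a very small stand-in for the data lineage tracking the abstract mentions.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def read_csv(text):
    """Structured source: rows such as a relational table export."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def read_json(text):
    """Semi-structured source: a JSON array of objects."""
    return json.loads(text)


def read_xml(text):
    """Semi-structured source: one child element per record."""
    root = ET.fromstring(text)
    return [{field.tag: field.text for field in rec} for rec in root]


# Hypothetical registry mapping a format name to its reader.
READERS = {"csv": read_csv, "json": read_json, "xml": read_xml}


def integrate(sources):
    """Merge (format, payload) pairs into one list of uniform records.

    Every record is coerced to the same shape and tagged with the
    format it came from, so its origin stays traceable.
    """
    unified = []
    for fmt, payload in sources:
        for record in READERS[fmt](payload):
            unified.append(
                {"id": str(record["id"]), "name": record["name"], "source": fmt}
            )
    return unified


# Assumed sample payloads, one per format.
sources = [
    ("csv", "id,name\n1,Ada\n"),
    ("json", '[{"id": 2, "name": "Grace"}]'),
    ("xml", "<rows><row><id>3</id><name>Edsger</name></row></rows>"),
]
unified = integrate(sources)
```

Keeping one reader per format behind a shared registry means a new format can be supported by adding a single function, which is one simple way to read the abstract's claim of flexibility across sources.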
