Vol. 3 No. 2 (2023): Journal of AI-Assisted Scientific Discovery
Data Transformation and Enrichment: Utilizing ML to automatically transform and enrich data for better analytics

Muneer Ahmed Salamkar
Senior Associate at JP Morgan Chase, USA
Karthik Allam
Big Data Infrastructure Engineer, JP Morgan Chase, USA
Jayaram Immaneni
SRE Lead, JP Morgan Chase, USA
Published 03-07-2023

Keywords

  • Data Transformation
  • Data Enrichment

How to Cite

[1]
Muneer Ahmed Salamkar, Karthik Allam, and Jayaram Immaneni, “Data Transformation and Enrichment: Utilizing ML to automatically transform and enrich data for better analytics”, Journal of AI-Assisted Scientific Discovery, vol. 3, no. 2, pp. 613–638, Jul. 2023, Accessed: Dec. 30, 2024. [Online]. Available: https://scienceacadpress.com/index.php/jaasd/article/view/222

Abstract

Data transformation and enrichment are critical steps in preparing raw data for meaningful analytics, and machine learning (ML) is changing how both are carried out. Traditional data transformation often relies on manual workflows that are time-consuming, error-prone, and unable to scale with the growing complexity and volume of modern data. ML offers an automated alternative that lets organizations streamline these processes while improving accuracy and efficiency. ML algorithms can identify patterns, detect anomalies, and apply context-specific transformations to raw data, which helps ensure consistency and quality. ML also supports data enrichment by integrating disparate datasets, filling gaps with predictive analytics, and adding context such as geospatial tagging or sentiment analysis. This automation accelerates data preparation and gives businesses deeper insights, supporting more informed decision-making and competitive advantage. Use cases span diverse industries, from enriching customer profiles in marketing with behavioural insights to transforming IoT sensor data for real-time analytics in manufacturing. By applying ML to transformation and enrichment, organizations can reduce operational costs, minimize human intervention, and make fuller use of their data assets. Implementing ML-driven data pipelines, however, requires addressing challenges such as model training, scalability, and ethical data handling. Despite these hurdles, the convergence of ML and data transformation raises the standard for analytics readiness, enabling businesses to adapt quickly to evolving data landscapes and to derive actionable insights with greater speed and precision.
