Container-Native Data Management for AI Workloads in Amazon EKS

Babulal Shaik

Vol. 4 No. 1 (2024): Journal of AI-Assisted Scientific Discovery

Babulal Shaik
Cloud Solutions Architect at Amazon Web Services, USA

Published 28-05-2024

Keywords

Container-native, AI workloads

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Abstract

As artificial intelligence (AI) continues to evolve, the demand for scalable, efficient, and high-performing data management solutions has become increasingly critical. Containerized environments, especially Kubernetes and Amazon Elastic Kubernetes Service (EKS), have emerged as powerful platforms for managing AI workloads. However, the complexity of AI workloads, which often require vast amounts of data to be processed and stored, presents unique data management challenges. Container-native data management is pivotal in optimizing how data is handled within these environments. It ensures that AI workloads running on Amazon EKS are efficient and capable of meeting the high demands of AI applications. This includes overcoming challenges related to storage architecture, such as choosing between object storage, block storage, and file systems, each offering different trade-offs in performance, cost, and ease of access. In addition to storage concerns, data consistency and real-time processing are critical for AI workloads, which rely on fast and reliable access to data. Kubernetes and EKS provide the flexibility to manage distributed data systems, but this requires careful attention to how data is partitioned, replicated, and synchronized across clusters. Scalability is another essential factor: the data volume of AI workloads can grow rapidly, so container-native solutions must scale without compromising performance. Best practices for managing AI workloads in Kubernetes environments include automated scaling, distributed data management, and efficient storage backends that are well suited to AI applications. These practices ensure that data is always available, accessible, and processed at the speed AI models require, from training to inference.
By understanding the intricacies of container-native data management on Amazon EKS, developers and IT architects can design systems that meet the needs of today’s AI-driven applications and remain flexible and scalable for future advances in AI technologies.


