Building a modern data platform based on the data lakehouse architecture and cloud-native ecosystem
AbouZaid, Ahmed; Barclay, Peter J.; Chrysoulas, Christos; Pitropakis, Nikolaos
Authors
Ahmed AbouZaid
Dr Peter Barclay P.Barclay@napier.ac.uk
Lecturer
Christos Chrysoulas
Dr Nick Pitropakis N.Pitropakis@napier.ac.uk
Associate Professor
Abstract
In today’s Big Data world, organisations can gain a competitive edge by adopting data-driven decision-making. However, a modern data platform that is portable, resilient, and efficient is required to manage organisations’ data and support their growth. Furthermore, the change in data management architectures has been accompanied by changes in storage formats, particularly open standard formats like Apache Hudi, Apache Iceberg, and Delta Lake. With so many alternatives, organisations are unclear on how to combine them into an effective platform. Our work investigates the capabilities provided by Kubernetes and other Cloud-Native software, using DataOps methodologies to build a generic data platform that follows the Data Lakehouse architecture. We define the data platform specification, architecture, and core components to build a proof-of-concept system. Moreover, we provide a clear implementation methodology by developing the core components of the proposed platform: infrastructure (Kubernetes), ingestion and transport (Argo Workflows), storage (MinIO), and query and processing (Dremio). We then conduct performance benchmarks using an industry-standard benchmark suite to compare cold and warm start scenarios and assess Dremio’s caching capabilities, demonstrating a 12% median improvement in query duration with caching.
Citation
AbouZaid, A., Barclay, P. J., Chrysoulas, C., & Pitropakis, N. (2025). Building a modern data platform based on the data lakehouse architecture and cloud-native ecosystem. Discover Applied Sciences, 7, Article 166. https://doi.org/10.1007/s42452-025-06545-w
Journal Article Type | Article |
---|---|
Acceptance Date | Feb 3, 2025 |
Online Publication Date | Feb 22, 2025 |
Publication Date | 2025 |
Deposit Date | Mar 6, 2025 |
Publicly Available Date | Mar 6, 2025 |
Publisher | Springer |
Peer Reviewed | Peer Reviewed |
Volume | 7 |
Article Number | 166 |
DOI | https://doi.org/10.1007/s42452-025-06545-w |
Keywords | Data Lakehouse, Kubernetes, DataOps, Cloud-Native, Big Data, Artificial Intelligence |
Public URL | http://researchrepository.napier.ac.uk/Output/4122856 |
Files
Building a modern data platform based on the data lakehouse architecture and cloud-native ecosystem
(3.5 Mb)
PDF
Publisher Licence URL
http://creativecommons.org/licenses/by/4.0/
Copyright Statement
CC BY 4.0