Red Hat: A Solution Means Eminently Adaptable Software
Disks are dead. To be clear, this comment is in no way related to the still-functioning hard disks inside your laptop (or desktop) machine, the industrious blade server disks located in datacenters around the globe that provide us all with the power of always-on computing and the cloud, or indeed the USB sticks and external drives that most of us use for backup and additional storage.

But disks as we once knew them are a thing of the past if we consider how we used to install Windows 98 from over 30 pre-formatted 3.5-inch floppy disks, with Microsoft Office also requiring almost 20 disks. As we moved through the age of the CD-ROM in the nineties and into the noughties, we started to enjoy a more ubiquitous connection to the world wide web (as the Internet was once known) and users started to appreciate the need for continuously delivered patches, updates and downloads.

What that brief history of desktop computing allows us to realize today is that data engineering has moved beyond its traditional roots, i.e. it's no longer just about building static pipelines to move and transform data; it's about designing adaptable systems that thrive in a complex world.

In other words, when we talk about software "solutions" - that oft-hackneyed and over-used term that the IT industry loves - what we really mean is: software that can change, morph, grow, extend and adapt.

The Age Of Adaptability

"Today we know that workloads evolve, technologies shift and data sprawls across hybrid and multi-cloud environments. In this context, while scale is still vital, adaptability has overtaken it as the key driver of success for modern data systems," explained Erica Langhi, associate principal solution architect at Red Hat. "For data engineers, this means rethinking how pipelines are made. They must no longer function simply as static workflows, but become real-time, modular and productized systems designed to adapt and consistently deliver value to their consumers."

Langhi and team of course base their opinions on the open source DNA that beats at the heart of Red Hat, which has now (very arguably, given the widespread embrace of open systems architecture by previously proprietary-only protagonists) proven itself to be among the more manageably malleable ways to create enterprise software applications for the post-Covid, things-could-disrupt-at-any-moment world we now live in.

As such, she reminds us that open source technologies and hybrid cloud architectures provide the essential building blocks for the evolved systems that we need today. But without thoughtful data engineering practices that prioritize usability, collaboration, lifecycle management and adaptability, even the best tools risk becoming just another layer of complexity. What this truth leads us to is a need to think about our enterprise data and its full-flowing pipeline as a product that we use in a more agile way.

Data Pipelines As Products

Traditional data pipelines were designed for linear workflows: they ingested, processed and delivered data outputs. While sufficient for the static environments of the past, this model falls short of the demands of modern, dynamic use cases.
"Treating data pipelines as products flips this approach on its head," said Langhi. "Productized pipelines are built as modular components, each handling specific functions like data ingestion or enrichment. These components can be updated, replaced or scaled independently, making the pipeline adaptable to changing requirements."

For instance, she explains, when a new data format or source is introduced, only the relevant module needs adjustment, minimising disruption and downtime. Versioning each iteration of the pipeline ensures downstream consumers, like AI models or analytics dashboards, can trace data lineage and access accurate datasets. This supports auditing, compliance and confidence in the data. Strong governance practices further enhance these pipelines by ensuring data quality and consistency. If data is oil, metadata is gold: a vital resource for ensuring traceability and unlocking actionable insights.

Let's consider what this could look like in a healthcare context. "A productized pipeline might automate the ingestion and anonymisation of patient imaging data from edge devices. It could enrich the data in real time, add metadata for regulatory compliance and make information immediately accessible to researchers or AI diagnostic models. Unlike traditional pipelines, the system would evolve to handle new data sources, scale with growing patient data and integrate emerging tools for advanced analysis," clarified Red Hat's Langhi.

Breaking Down Silos

For data pipelines to function as adaptable products, the Red Hat team are adamant that they must break free from silos. Data locked within department-specific systems or proprietary platforms leads to rigid workflows. This makes it nearly impossible to create pipelines that deliver value across an organisation.

Open source is widely agreed to help with this. Pipelines built with open source can harness community expertise to provide a shared, reusable foundation. This empowers users to design pipelines that are portable, interoperable and adaptable to new tools and evolving business needs.

Open source data pipelines provide the flexibility needed to bridge hybrid cloud environments by combining data from on-premise systems and private and public cloud platforms into unified workflows, without requiring major re-architecture. Take Kafka: an open source data streaming platform, it can accelerate data delivery, enable real-time insights, support regulatory compliance and underpin AI use cases, regardless of the data's origin. "Kafka benefits from continuous growth and optimization through open collaboration with innovators," said Langhi. "As workloads evolve and expand, combining technologies like Kafka and Kubernetes enables the development of scalable, reliable and highly available data pipelines, essential for machine learning applications." New tools can be added, workloads can be shifted across environments and processes can evolve with minimal disruption.
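To make the "pipelines as products" idea a little more concrete, here is a minimal sketch, in plain Python, of a modular pipeline whose stages are independently versioned and which records lineage as data flows through it. The stage names, version tags and example fields are illustrative assumptions rather than anything prescribed by Red Hat; in a real deployment a streaming platform such as Kafka would typically be feeding events into the ingestion stage.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A record plus the lineage metadata that downstream consumers can audit.
@dataclass
class Record:
    payload: Dict
    lineage: List[str] = field(default_factory=list)

# Each stage is an independently versioned, swappable module.
@dataclass
class Stage:
    name: str
    version: str
    transform: Callable[[Dict], Dict]

    def run(self, record: Record) -> Record:
        record.payload = self.transform(record.payload)
        record.lineage.append(f"{self.name}@{self.version}")  # record data lineage
        return record

class Pipeline:
    """A 'productized' pipeline: an ordered list of modules that can be
    updated, replaced or scaled independently of one another."""

    def __init__(self, stages: List[Stage]):
        self.stages = stages

    def replace(self, name: str, new_stage: Stage) -> None:
        # Swap a single module (e.g. to support a new data format)
        # without touching the rest of the pipeline.
        self.stages = [new_stage if s.name == name else s for s in self.stages]

    def process(self, payload: Dict) -> Record:
        record = Record(payload=payload)
        for stage in self.stages:
            record = stage.run(record)
        return record

# Hypothetical stages: ingest, anonymise, enrich.
pipeline = Pipeline([
    Stage("ingest", "1.0", lambda p: {**p, "source": "edge-device"}),
    Stage("anonymise", "1.2", lambda p: {k: v for k, v in p.items() if k != "patient_name"}),
    Stage("enrich", "2.0", lambda p: {**p, "compliance_tag": "gdpr"}),
])

result = pipeline.process({"patient_name": "example-only", "scan_id": "img-001"})
print(result.payload)   # transformed data for downstream consumers
print(result.lineage)   # e.g. ['ingest@1.0', 'anonymise@1.2', 'enrich@2.0']
```

The lineage trail each record accumulates is what would let an analytics dashboard or an AI model consumer see exactly which iteration of each module produced a given dataset, which is the auditing and compliance benefit described above.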
AI Needs Quality Data

One of the most transformative applications of modern data engineering is in artificial intelligence. AI's value lies in its ability to turn data into insights. But this is only possible if the data itself is prepared to meet AI models' demands. Raw data, in its unstructured and inconsistent form, holds little value until it is transformed into a usable state. This is where data engineering plays a key role, bridging the gap between raw inputs and the refined, enriched datasets that fuel AI.

As AI adoption grows, data engineers are tasked with managing the ever-increasing volume, variety and velocity of data. It's no longer enough to simply collect and store information; data must be accessible, trustworthy and ready to use in real time. "The evolving role of data engineers reflects this complexity. They now focus on building pipelines that deliver high-quality data for fine-tuning models, handle real-time streaming from edge devices and adapt seamlessly to new tools and requirements," concluded Langhi.

Talking about the future of data engineering, Langhi feels strongly that we need to realize how important it is to talk about cultivating systems that thrive in uncertainty and deliver real, ongoing value. As the data landscape grows more complex, the true measure of success will be the ability to adapt, iterate and deliver quality outputs.
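As a closing illustration of what "accessible, trustworthy and ready to use" can mean in practice, here is a small, hypothetical quality gate of the kind a data engineer might place in front of a training or analytics job. The required fields and checks are assumptions made for the example, not part of any Red Hat product.

```python
from typing import Dict, List, Tuple

REQUIRED_FIELDS = ("scan_id", "modality", "captured_at")  # assumed schema

def validate(record: Dict) -> Tuple[bool, List[str]]:
    """Return (is_usable, reasons) for a single incoming record."""
    reasons = []
    for f in REQUIRED_FIELDS:
        if not record.get(f):
            reasons.append(f"missing field: {f}")
    if "patient_name" in record:
        reasons.append("un-anonymised field present")  # compliance check
    return (not reasons, reasons)

def quality_gate(records: List[Dict]) -> List[Dict]:
    """Pass only trustworthy records through to training or analytics."""
    usable = []
    for r in records:
        ok, reasons = validate(r)
        if ok:
            usable.append(r)
        else:
            print(f"rejected {r.get('scan_id', '<unknown>')}: {reasons}")
    return usable

batch = [
    {"scan_id": "img-001", "modality": "MRI", "captured_at": "2024-05-01"},
    {"scan_id": "img-002", "modality": "", "captured_at": "2024-05-01"},
    {"scan_id": "img-003", "modality": "CT", "captured_at": "2024-05-02",
     "patient_name": "should-have-been-removed"},
]
print(f"{len(quality_gate(batch))} of {len(batch)} records ready for the model")
```

The point is less the specific checks than where they sit: inside the pipeline itself, so that downstream consumers only ever see data that has already passed the gate.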