The term “data fabric” is widely used in the technology industry, but its definition and implementation vary. I've seen this across vendors: British Telecom (BT) talked about its data fabric at an analytics event last fall; in storage, NetApp is refocusing its brand on intelligent infrastructure, having previously used the term. Application platform provider Appian has a data fabric product, and database provider MongoDB has also talked about data fabrics and similar ideas.
At its core, a data fabric is a unified architecture that abstracts and integrates disparate data sources to create a seamless data layer. The principle is to create a single, synchronized layer between those sources and everything that needs access to the data – your applications, your workloads and, increasingly, your AI algorithms and learning engines.
There are many reasons to want such an overlay. The fabric acts as a generalized integration layer: it connects to the various data sources, keeps them in sync, and layers on capabilities that make the data easier for applications, workloads, and models to access.
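To make the idea concrete, here is a minimal, illustrative Python sketch of a fabric as a single access point over multiple backing stores. The class names and routing scheme are my own assumptions for illustration, not any vendor's API:

```python
from abc import ABC, abstractmethod

class DataSource(ABC):
    """One backing store: a warehouse, a document database, object storage..."""

    @abstractmethod
    def read(self, key: str):
        ...

class WarehouseSource(DataSource):
    def __init__(self, records: dict):
        self._records = records

    def read(self, key: str):
        return self._records.get(key)

class DocumentSource(DataSource):
    def __init__(self, documents: dict):
        self._documents = documents

    def read(self, key: str):
        return self._documents.get(key)

class DataFabric:
    """A single access point that hides which source actually holds the data."""

    def __init__(self, sources: dict):
        self._sources = sources  # domain name -> DataSource

    def read(self, domain: str, key: str):
        # Routing to the right underlying source is the fabric's job,
        # not the calling application's.
        return self._sources[domain].read(key)

# Applications talk to one layer rather than to each backing store.
fabric = DataFabric({
    "customers": WarehouseSource({"c1": {"name": "Acme"}}),
    "orders": DocumentSource({"o1": {"total": 42}}),
})
print(fabric.read("customers", "c1"))  # {'name': 'Acme'}
```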
So far, so good. The challenge, however, is that we have a gap between the principle of the data fabric and its actual implementation. People use the term to represent different things. To return to our four examples:
- BT defines data fabric as a network-level overlay designed to optimize data transmission over long distances.
- NetApp's interpretation (even under its newer intelligent data infrastructure branding) emphasizes storage efficiency and centralized management.
- Appian positions its data fabric product as a tool for unifying data at the application level, enabling faster development and customization of user-centric tools.
- MongoDB (and other structured data providers) apply data fabric principles in the context of data management infrastructure.
How do we cut through it all? One answer is to recognize that the concept can be approached from different angles. You can talk about the data fabric conceptually – the need to combine data sources – without overthinking it. You don't need a generic “uber-fabric” that covers absolutely everything. Instead, focus on the specific data you need.
If we wind the clock back a couple of decades, we can see parallels with the principles of service-oriented thinking, which relied on decoupling services from database systems. At the time, we discussed the difference between services, processes and data. The same applies now: you can query service data, or query as a service, focusing on what's needed for your workload. Create, read, update and delete remain the simplest of data services!
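As a minimal sketch of that idea, the following Python class exposes create, read, update and delete as a service while hiding the store behind it. The in-memory dict is a stand-in assumption for whatever database actually backs the service:

```python
class CustomerService:
    """A minimal CRUD data service, decoupled from any particular database."""

    def __init__(self):
        self._store = {}   # stand-in for whatever store backs the service
        self._next_id = 1

    def create(self, record: dict) -> int:
        record_id = self._next_id
        self._store[record_id] = record
        self._next_id += 1
        return record_id

    def read(self, record_id: int) -> dict | None:
        return self._store.get(record_id)

    def update(self, record_id: int, changes: dict) -> None:
        self._store[record_id].update(changes)

    def delete(self, record_id: int) -> None:
        self._store.pop(record_id, None)

# The workload consumes the service interface, not the database behind it.
svc = CustomerService()
customer_id = svc.create({"name": "Acme", "tier": "gold"})
svc.update(customer_id, {"tier": "platinum"})
print(svc.read(customer_id))  # {'name': 'Acme', 'tier': 'platinum'}
```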
I'm also reminded of the origins of network acceleration, which used caching to speed up data transfers by holding versions of the data locally rather than repeatedly accessing the source. Akamai built its business on moving unstructured content, such as music and movies, efficiently over long distances.
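The underlying mechanism is simple enough to sketch. Below is a toy read-through cache in Python: fetch once from the slow, distant origin, then serve repeat reads locally until the entry expires. The interface and expiry behavior are illustrative assumptions, not how any particular CDN is implemented:

```python
import time

class ReadThroughCache:
    """Serve repeat reads locally instead of going back to the origin."""

    def __init__(self, fetch_from_origin, ttl_seconds: float = 60.0):
        self._fetch = fetch_from_origin     # callable: key -> value
        self._ttl = ttl_seconds
        self._cache = {}                    # key -> (value, expiry time)

    def get(self, key: str):
        entry = self._cache.get(key)
        if entry and entry[1] > time.monotonic():
            return entry[0]                 # local hit: no trip to origin
        value = self._fetch(key)            # miss or stale: refetch
        self._cache[key] = (value, time.monotonic() + self._ttl)
        return value

# Usage: the second read never touches the origin.
cache = ReadThroughCache(lambda key: f"content for {key}", ttl_seconds=30)
cache.get("movie-123")  # fetched from the origin
cache.get("movie-123")  # served locally
```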
This doesn't mean that data fabrics are reinventing the wheel. We are in a different (cloud) world technologically, and they bring new aspects, not least around metadata management, lineage tracking, compliance and security. These capabilities are especially important for AI workloads, where data management, quality, and provenance directly impact model performance and reliability.
If you're considering deploying a data fabric, the best starting point is to think about what you want the data for. Not only will this help you target which data fabric might be most suitable, but it also helps you avoid the trap of trying to manage all the data in the world. Instead, you can prioritize the most valuable subset of data and consider which layer works best for your needs (a sketch of this decision follows the list):
- Network layer: For data integration in multi-cloud, on-premises and edge environments.
- Infrastructure layer: If your data is centralized with a single storage provider, focus on the storage tier to serve consolidated pools of data.
- Application layer: To collect heterogeneous data sets for specific applications or platforms.
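As a rough decision aid, the choices above can be expressed in a few lines of Python. The questions and their ordering are illustrative assumptions, not a product recommendation:

```python
def suggest_fabric_layer(distributed_environments: bool,
                         single_storage_provider: bool,
                         app_specific_data_sets: bool) -> str:
    """Map the layer choices above to a starting point.

    Illustrative only: real decisions weigh cost, skills, and the
    data you actually prioritized, not three booleans.
    """
    if distributed_environments:
        return "network layer: integrate across multi-cloud, on-premises and edge"
    if single_storage_provider:
        return "infrastructure layer: consolidate at the storage tier"
    if app_specific_data_sets:
        return "application layer: unify data per application or platform"
    return "revisit scope: start from the data your workloads actually need"

print(suggest_fabric_layer(False, True, False))
# infrastructure layer: consolidate at the storage tier
```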
For example, BT found intrinsic value in using its data fabric to consolidate data from multiple sources. This reduces duplication and helps streamline operations, making data management more efficient. It is clearly a useful tool for consolidating silos and improving application rationalization.
Ultimately, a data fabric is not a monolithic, one-size-fits-all solution. It's a strategic conceptual layer, backed by products and features, that you can apply where it makes sense to add flexibility and improve data delivery. Deploying a data fabric is not a “set it and forget it” exercise: it requires ongoing effort to scale, deploy, and maintain, not just for the software itself but also for the configuration and integration of data sources.
Although a data fabric can exist conceptually in multiple places, it is important not to duplicate delivery efforts unnecessarily. So, whether you integrate data across the network, within the infrastructure, or at the application level, the principles remain the same: apply the fabric where it best suits your needs, and let it evolve with the data it serves.