Overcoming data inconsistency with a universal semantic layer

Saturday, November 2, 2024, 01:21 AM, from InfoWorld

According to Gartner, bad data costs organizations an average of $12.9 million a year. As a result, data leaders have spent decades searching for a single source of truth for their business intelligence (BI) and analytics, so that everyone bases business decisions on the same data and definitions.

To bring consistency to data, BI providers introduced the concept of a semantic layer: an abstraction layer between raw data, described in rows, columns, and field names that only data experts can understand, and the insights business users need to make informed decisions. A semantic layer hides the complexity of the data and maps it to business definitions, logic, and relationships. It allows business users to conduct self-service analytics using standard terms like revenue and profit.
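To make the idea concrete, here is a minimal Python sketch of what a semantic layer does, using the standard-library sqlite3 module as a stand-in for a warehouse. The raw table `ord_f`, its cryptic column names, and the `compile_query` helper are all hypothetical; real semantic layers use much richer modeling languages. The point is only that a business user asks for “profit by region,” and the layer translates those terms into SQL over the raw columns.

```python
import sqlite3

# Hypothetical raw warehouse table with cryptic column names.
RAW_DDL = "CREATE TABLE ord_f (o_id INTEGER, rgn_cd TEXT, amt_usd REAL, cst_usd REAL)"
RAW_ROWS = [
    (1, "EMEA", 120.0, 70.0),
    (2, "EMEA", 80.0, 50.0),
    (3, "AMER", 200.0, 110.0),
]

# The semantic model: business terms mapped to SQL over the raw columns.
SEMANTIC_MODEL = {
    "table": "ord_f",
    "dimensions": {"region": "rgn_cd"},
    "measures": {
        "revenue": "SUM(amt_usd)",
        "profit": "SUM(amt_usd - cst_usd)",
    },
}

def compile_query(measure: str, by: str) -> str:
    """Translate a business question such as "profit by region" into SQL."""
    dim_sql = SEMANTIC_MODEL["dimensions"][by]
    measure_sql = SEMANTIC_MODEL["measures"][measure]
    return (
        f"SELECT {dim_sql} AS {by}, {measure_sql} AS {measure} "
        f"FROM {SEMANTIC_MODEL['table']} GROUP BY {dim_sql} ORDER BY {dim_sql}"
    )

def query(conn: sqlite3.Connection, measure: str, by: str) -> list:
    """Run a business-level query; callers never see the raw column names."""
    return conn.execute(compile_query(measure, by)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute(RAW_DDL)
conn.executemany("INSERT INTO ord_f VALUES (?, ?, ?, ?)", RAW_ROWS)
print(query(conn, "profit", by="region"))  # [('AMER', 90.0), ('EMEA', 80.0)]
```

Because “profit” is defined once in the model, every caller of `query` gets the same calculation, whatever tool sits on top.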

Semantic layers proliferate

Semantic layers were a welcome development until BI tools, and their associated semantic layers, proliferated. BusinessObjects built the first lightweight semantic layer into its BI suite, now SAP BusinessObjects, in the 1990s. The trouble is that early BI suites like BusinessObjects were monolithic and not particularly user-friendly, so frustrated users adopted easier-to-use tools such as Tableau, Power BI, and Looker. These tools have since spread across organizations, each bringing its own semantic layer and dashing hope of a single source of truth.

Now, different parts of the organization work with disparate BI, analytics, and data science tools, creating unique data definitions, dimensions, measures, logic, and context. Separate teams also manage their own semantic layers. This results in discrepancies in data interpretation, business logic, and definitions among user groups, creating mistrust of reports and intelligence derived from data.

Inconsistency often causes confusion among teams, too. For example, is an active customer someone who has bought an ongoing paid subscription for your service? Or someone who has logged in within the last seven days? Or someone who has signed up for a seven-day free trial? Inconsistent definitions affect the finance team for billing purposes, the renewals team for identifying customers, and operations for processing and reporting accurately on products sold.
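A short, hypothetical Python sketch shows how the three definitions in the question above select different customers from the same data. The `Customer` fields, the sample records, and the evaluation date are invented for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class Customer:
    id: int
    paid_subscription: bool
    last_login: date
    on_free_trial: bool

# Hypothetical sample data, evaluated as of TODAY.
TODAY = date(2024, 11, 1)
CUSTOMERS = [
    Customer(1, True, date(2024, 10, 1), False),
    Customer(2, False, date(2024, 10, 30), True),
    Customer(3, False, date(2024, 10, 29), False),
    Customer(4, True, date(2024, 10, 31), False),
    Customer(5, False, date(2024, 9, 1), True),
]

# One business term ("active customer"), three competing definitions.
DEFINITIONS = {
    "paid subscription": lambda c: c.paid_subscription,
    "logged in within 7 days": lambda c: TODAY - c.last_login <= timedelta(days=7),
    "on free trial": lambda c: c.on_free_trial,
}

def active_ids(definition) -> set:
    """IDs of customers that count as active under a given definition."""
    return {c.id for c in CUSTOMERS if definition(c)}

for name, rule in DEFINITIONS.items():
    print(f"{name}: {sorted(active_ids(rule))}")
# paid subscription: [1, 4]
# logged in within 7 days: [2, 3, 4]
# on free trial: [2, 5]
```

No customer is active under all three definitions, which is exactly the kind of discrepancy a single shared definition is meant to remove.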

The rise of semantic layers in data warehouses

As if the data landscape weren’t complex enough, data architects began implementing semantic layers within data warehouses. Architects might think of the data assets they manage as the single source of truth for all use cases. However, that is rarely the case, because millions of denormalized table structures are not “business-ready.” When semantic layers are embedded within various warehouses, data engineers must connect analytics use cases to data by designing and maintaining data pipelines with transforms that create “analytics-ready” data.

Without a consistent semantic layer, data engineers hard-code semantic meaning in their purpose-built pipelines to support their data consumers. The semantic meanings (definitions) quickly become static and inflexible, making it difficult for centralized architecture teams to keep up with the domain-specific needs of different workgroups. The code becomes difficult to manage and inconsistent as it scales. This approach causes delays and dependencies that hinder data-based decision-making.

Localized semantic layers spread further

Adding to the challenge, as data warehouses move to the cloud, user queries can become painfully slow. Sluggish performance almost always spurs business users to extract data and load it into their preferred analytics platform for easier manipulation and faster queries, which spreads semantics further into localized semantic layers.

In most cases today, bits of semantic layers are scattered across the data stack: a little in cloud data warehouses, a little in transformation pipelines, and a little in each BI tool. This semantic sprawl is highly inefficient, because data engineers re-create common business concepts (e.g., year-over-year projections or currency conversions) every time they design a new data pipeline. Data teams end up playing whack-a-mole, constantly re-creating definitions already sprinkled across various semantic layers whenever a new business question involves different data definitions or business logic. It’s a duplication of engineering effort and a waste of time and resources.
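As an illustration of the alternative, the following Python sketch defines two common business concepts, currency conversion and year-over-year growth, exactly once, and has two hypothetical pipelines call them rather than re-implementing them. The exchange rates, function names, and pipelines are assumptions for illustration; in practice the rates would come from a governed source rather than a hard-coded dictionary.

```python
from decimal import Decimal

# Hypothetical exchange rates to USD; in practice these come from a governed source.
FX_TO_USD = {"USD": Decimal("1.00"), "EUR": Decimal("1.10"), "GBP": Decimal("1.30")}

# --- Shared definitions, written once ---
def to_usd(amount: Decimal, currency: str) -> Decimal:
    """Shared currency conversion, rounded to cents."""
    return (amount * FX_TO_USD[currency]).quantize(Decimal("0.01"))

def total_usd(lines) -> Decimal:
    """Sum (amount, currency) pairs in USD using the shared conversion."""
    return sum((to_usd(amount, cur) for amount, cur in lines), Decimal("0"))

def yoy_growth(current: Decimal, prior: Decimal) -> Decimal:
    """Shared year-over-year growth, as a fraction of the prior period."""
    if prior == 0:
        raise ValueError("prior-period value must be non-zero")
    return (current - prior) / prior

# --- Two hypothetical pipelines reuse the shared definitions ---
def billing_total(invoices) -> Decimal:
    return total_usd(invoices)

def sales_growth(this_year, last_year) -> Decimal:
    return yoy_growth(total_usd(this_year), total_usd(last_year))

print(billing_total([(Decimal("100"), "EUR"), (Decimal("50"), "USD")]))  # 160.00
print(sales_growth([(Decimal("120"), "USD")], [(Decimal("100"), "USD")]))  # 0.2
```

If the rounding rule or rate source changes, it changes in one place, and both pipelines pick it up.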

Creating a universal semantic layer

What is needed is a universal semantic layer that defines all the metrics and metadata for all possible data experiences: visualization tools, customer-facing analytics, embedded analytics, and AI agents. With a universal semantic layer, everyone across the business agrees on a standard set of definitions for terms like “customer” and “lead,” as well as standard relationships among the data (standard business logic and definitions), so data teams can build one consistent semantic data model.

A universal semantic layer sits on top of data warehouses, providing data semantics (context) to various data applications. It works seamlessly with transformation tools, allowing businesses to define metrics, prepare data models, and expose them to different BI and analytics tools.

To construct a universal semantic layer, data teams must first establish the business logic, calculations, and context that go into a semantic data model. They start by understanding the real-world problems that the business needs to solve, gathering the necessary data, and then encoding the relationships between the data and defining governance and security policies to enable trusted access. After that, they use metadata to build an abstraction over the data to expose dimensions, hierarchies, and calculations consistently to downstream data consumers.
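The steps above can be sketched as a small semantic data model in Python, with sqlite3 standing in for the warehouse. It encodes a relationship (a join between hypothetical `orders` and `customers` tables), dimensions, a country-to-city hierarchy, and a revenue measure, and compiles drill-down queries from that metadata. The `SemanticModel` class is an illustrative assumption, not the data-modeling format of Cube Cloud or any other product, and governance policies are left out for brevity.

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class SemanticModel:
    base_table: str
    joins: dict[str, str]              # related table -> join condition
    dimensions: dict[str, str]         # business name -> SQL column
    hierarchies: dict[str, list[str]]  # hierarchy -> dimensions, coarse to fine
    measures: dict[str, str]           # business name -> SQL aggregate

    def sql(self, measure: str, dims: list[str]) -> str:
        """Compile a measure grouped by dimensions, including the joins."""
        select = ", ".join(f"{self.dimensions[d]} AS {d}" for d in dims)
        joins = " ".join(f"JOIN {t} ON {on}" for t, on in self.joins.items())
        group = ", ".join(self.dimensions[d] for d in dims)
        return (f"SELECT {select}, {self.measures[measure]} AS {measure} "
                f"FROM {self.base_table} {joins} GROUP BY {group} ORDER BY {group}")

    def drill(self, hierarchy: str, depth: int) -> list[str]:
        """Dimensions for the first `depth` levels of a hierarchy."""
        return self.hierarchies[hierarchy][:depth]

model = SemanticModel(
    base_table="orders",
    joins={"customers": "orders.customer_id = customers.customer_id"},
    dimensions={"country": "customers.country", "city": "customers.city"},
    hierarchies={"geography": ["country", "city"]},
    measures={"revenue": "SUM(orders.amount)"},
)

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER, country TEXT, city TEXT);
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'DE', 'Berlin'), (2, 'DE', 'Munich'), (3, 'FR', 'Paris');
    INSERT INTO orders VALUES (10, 1, 100.0), (11, 2, 50.0), (12, 3, 70.0), (13, 1, 30.0);
""")

for depth in (1, 2):
    print(conn.execute(model.sql("revenue", model.drill("geography", depth))).fetchall())
# [('DE', 180.0), ('FR', 70.0)]
# [('DE', 'Berlin', 130.0), ('DE', 'Munich', 50.0), ('FR', 'Paris', 70.0)]
```

This sketch always adds every join; a production implementation would include only the joins a query actually needs.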

Once the underlying data and semantics are established, the universal semantic layer must be integrated with data consumers, such as generative AI, BI, spreadsheets, and embedded analytics. Cube Cloud is a universal semantic layer platform offering numerous prebuilt integrations and a robust API suite so enterprises can model data once and deliver it anywhere. It also offers a host of developer tools to make it easier to collaborate and build data models, set up caching and pre-aggregations, and maintain data access controls.
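The “model once, deliver anywhere” idea can be illustrated with a hypothetical Python sketch: a single metric registry is rendered both as SQL for a BI tool and as JSON metadata that could be handed to an LLM-based agent as context. The metric names, table, and rendering functions are assumptions; they do not represent Cube Cloud’s actual APIs or integrations.

```python
import json
from dataclasses import dataclass

@dataclass(frozen=True)
class Metric:
    name: str
    description: str
    expression: str  # SQL aggregate over `table`
    table: str

# Defined once; the table and column names are hypothetical.
METRICS = {
    m.name: m
    for m in [
        Metric("revenue", "Gross order value in USD", "SUM(amount_usd)", "orders"),
        Metric("order_count", "Number of orders placed", "COUNT(*)", "orders"),
    ]
}

def for_bi_tool(metric: str) -> str:
    """Render a metric as SQL for a SQL-speaking BI tool or spreadsheet connector."""
    m = METRICS[metric]
    return f"SELECT {m.expression} AS {m.name} FROM {m.table}"

def for_ai_agent() -> str:
    """Describe the same metrics as JSON context for an LLM-based agent."""
    return json.dumps(
        [{"name": m.name, "description": m.description} for m in METRICS.values()],
        indent=2,
    )

print(for_bi_tool("revenue"))  # SELECT SUM(amount_usd) AS revenue FROM orders
print(for_ai_agent())
```

Both outputs are derived from the same `Metric` records, so a change to a definition reaches every consumer at once.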

Benefits of a universal semantic layer

With a universal semantic layer, data teams have more governance and control, and, if the layer is implemented correctly, end users get more value from data while teams have fewer misunderstandings. This improves efficiency and ensures that every place data is consumed works with the same, accurate data. So whether the data is viewed by a person on a dashboard or used by a large language model to answer someone’s questions, it is consistent.

All of this makes it easier for data teams to quickly deliver data to the various consumers they work with internally and externally. Data teams can easily update or define new metrics, design domain-specific views of data, and incorporate new sources of raw data. They also can enforce governance policies, including access control, definitions, and performance.
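Centrally enforced access control can be sketched as a row-level filter applied before any consumer receives a result. In this minimal Python example, the roles, the region-based policy, and the `orders` table are hypothetical, and sqlite3 stands in for the warehouse.

```python
import sqlite3

# Hypothetical row-level policy: which regions each role may see.
# None means the role is unrestricted.
ROW_POLICIES = {"emea_analyst": ["EMEA"], "cfo": None}

def governed_revenue(conn: sqlite3.Connection, role: str) -> float:
    """Apply the central access policy before any consumer sees the number."""
    if role not in ROW_POLICIES:
        raise PermissionError(f"no access policy defined for role {role!r}")
    allowed = ROW_POLICIES[role]
    sql = "SELECT COALESCE(SUM(amount), 0) FROM orders"
    params = []
    if allowed is not None:
        # Bind allowed regions as parameters rather than splicing them into SQL.
        placeholders = ", ".join("?" for _ in allowed)
        sql += f" WHERE region IN ({placeholders})"
        params = allowed
    return conn.execute(sql, params).fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("EMEA", 100.0), ("EMEA", 50.0), ("AMER", 200.0)])

print(governed_revenue(conn, "emea_analyst"))  # 150.0
print(governed_revenue(conn, "cfo"))           # 350.0
```

Because the filter lives in the shared layer rather than in each dashboard or notebook, an unknown role is refused everywhere, not just in the tools that remembered to check.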

Another benefit: As data volumes explode, cloud compute costs soar. A universal semantic layer can address this by preprocessing or pre-aggregating data, storing frequently used business metrics, and using them as a base for analytics, which reduces cloud data warehouse fees. Serving queries from these precomputed results also delivers high performance and low latency on enterprise-wide data, speeding up user queries.
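Pre-aggregation can be sketched in a few lines of Python with sqlite3: a daily rollup is built once, and later queries read it instead of rescanning the raw table. The table names and data are hypothetical, and whether this lowers costs in a given cloud warehouse depends on its pricing model and how often the rollup must be refreshed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (day TEXT, region TEXT, amount REAL)")
# Hypothetical raw data: 1,000 orders over 10 days in 2 regions, $10 each.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(f"2024-10-{1 + i % 10:02d}", "EMEA" if (i // 10) % 2 else "AMER", 10.0)
     for i in range(1000)],
)

# Build the pre-aggregation once, e.g. on a refresh schedule.
conn.execute(
    "CREATE TABLE orders_daily AS "
    "SELECT day, region, SUM(amount) AS revenue FROM orders GROUP BY day, region"
)

def revenue_by_region_raw():
    # Scans every raw order row.
    return conn.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
    ).fetchall()

def revenue_by_region_rollup():
    # Scans only the much smaller rollup table.
    return conn.execute(
        "SELECT region, SUM(revenue) FROM orders_daily GROUP BY region ORDER BY region"
    ).fetchall()

raw_rows = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
rollup_rows = conn.execute("SELECT COUNT(*) FROM orders_daily").fetchone()[0]
print(raw_rows, rollup_rows)       # 1000 20
print(revenue_by_region_rollup())  # [('AMER', 5000.0), ('EMEA', 5000.0)]
```

Both queries return the same totals, but the rollup query reads 20 rows instead of 1,000. Keeping the rollup fresh as new orders arrive, and routing queries to it automatically, are left out of this sketch.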

A single source of truth at last

A universal semantic layer is required to power the next generation of data-driven applications, accepting that there will be many different tools for visualizing and using that data, and many different data sources where it is stored. And, at last, a universal semantic layer creates a single source of truth for enterprise metrics — for real this time — giving decision-makers the data they need to get consistent, fast, and accurate answers.

Artyom Keydunov is founder and CEO at Cube.

—

New Tech Forum provides a venue for technology leaders—including vendors and other outside contributors—to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to doug_dineley@foundryco.com.