Real-time analytics with StarTree Cloud and Apache Pinot
Tuesday, June 3, 2025, 11:00 AM, from InfoWorld
Early (1990s) OLAP databases relied on creating and storing pre-computed multidimensional arrays of aggregated values, called OLAP cubes. These allowed for much faster analytical SQL queries than could be accomplished using the raw data. Using OLAP queries, analysts could provide C-level managers with daily reports, typically line-printed at the computer center on green-and-white fanfold paper.

One problem with using OLAP for reports is that a one-day turnaround is too long. Another is that OLAP cubes are expensive to compute, so analysts typically recomputed them weekly, often on weekends when the mainframe wasn't heavily utilized for routine business activities. That meant reports could easily be based on week-old data, a data freshness issue. A further issue with OLAP cubes is the extra storage needed for the aggregated data.

Cube-based OLAP databases slowly lost customers as their drawbacks became apparent. By 2005, companies were adopting big data solutions for data analysis, including data lakes, Hadoop, and distributed query processing. Switching from OLAP cubes to distributed query processing brought improved flexibility, at a cost in query response times.

Another way to save time on complex analysis queries is to use materialized views. Essentially, a view is a virtual table built on the fly from a SQL query, whereas a materialized view is a physical table built from a SQL query. Materialized views typically need to be refreshed periodically to keep them up to date. An indexed view, introduced in SQL Server 2000, is refreshed automatically when the underlying data changes. Materialized views trade disk space for time. (A brief SQL sketch of the idea appears at the end of this section.)

You can gain many of the benefits of OLAP cubes at lower cost by doing targeted partial aggregations. The classic example for an airline flight-time database is to save aggregations for high-volume routes, such as New York to Los Angeles, but compute them on the fly for low-volume routes, such as Fairbanks to Nome.

Getting to real-time analysis

These analytic techniques are all aimed at supporting small numbers of managers with overnight reports. What about providing real-time analytic answers to large numbers of customers? That was the issue at LinkedIn that spurred the development of Apache Pinot. Three metrics needed to change drastically: latency, freshness, and concurrency. Latency for a report query can be 100 seconds; for a real-time query, it should be more like a hundredth of a second. People got away with week-old data in the 1990s, but these days data freshness for a report should be within 100 seconds, and for a real-time query it should be about 10 seconds. The number of concurrent users back in the day was in the single digits; these days there might be 100 concurrent internal users for a report-generation app, and potentially a million simultaneous requests for a customer-facing app.

At LinkedIn, Apache Pinot is used for Who Viewed Your Profile, Talent Analytics, Company Analytics, and several other user-facing and internal functions. At Uber Eats, Pinot is used for user ordering queries, such as "What's the soonest I can get a hamburger delivered?", and for internal dashboards showing demand-supply, anomalous events (like delayed orders), and live orders. If you'd like to see a video about this, the one by Tim Berglund is a good place to start.
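To make the materialized-view idea concrete, here is a minimal sketch in PostgreSQL-style SQL. The flight_times table and its columns are hypothetical illustrations, not from the article:

```sql
-- Hypothetical raw table: one row per completed flight.
-- Precompute per-route averages so reports don't rescan the raw rows.
CREATE MATERIALIZED VIEW route_avg_times AS
SELECT origin,
       destination,
       AVG(duration_minutes) AS avg_minutes,
       COUNT(*)              AS flights
FROM flight_times
GROUP BY origin, destination;

-- Reports read the small precomputed table instead of the raw data.
SELECT avg_minutes
FROM route_avg_times
WHERE origin = 'JFK' AND destination = 'LAX';

-- Materialized views go stale; refresh periodically (e.g., nightly).
REFRESH MATERIALIZED VIEW route_avg_times;
```

This is the disk-space-for-time trade the article describes: the aggregation is paid for once at refresh time rather than on every query.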
What Apache Pinot does

Apache Pinot is an open-source, distributed database for customer-facing, real-time analytics. It ingests data from various sources and executes queries using SQL, and it is implemented in Java. Pinot's SQL support is limited to DML (Data Manipulation Language); Pinot does data definition using JSON.

Apache Pinot can scale to thousands of nodes that respond to a query request in unison. It can ingest streams of millions of events per second and make that data available immediately for queries, responding with low (<100ms) latency without caching. Pinot supports user-facing analytics with the capability to process hundreds of thousands of simultaneous queries, and it automatically replicates and distributes data among nodes for fault tolerance. Pinot supports many kinds of indexes, including its signature tunable star-tree index, and it stores data in a columnar format for fast analytic queries.

How Apache Pinot works

Essentially, Apache Pinot works by scattering SQL queries from a broker node out to server nodes, which store and process segments of a Pinot table. The results from the distributed servers come back to the broker, which merges them and sends the unified result back to the client.

Apache Pinot components:

- Pinot table: A logical abstraction that represents a collection of related data, composed of columns and rows (known as documents in Pinot).
- Pinot segments: Data for a Pinot table is broken into multiple small chunks, similar to shards or partitions, that pack data in a columnar fashion along with dictionaries and indexes for the columns.
- Pinot servers: Nodes that are each responsible for a set of Pinot segments, storing them locally and processing them at query time.
- Pinot brokers: Nodes that receive user queries, scatter them to Pinot servers, and finally merge and send back the results gathered from the servers.

[Figure: An Apache Pinot broker node receives user queries, scatters them to Pinot servers, and merges and returns the gathered results. Image: StarTree]

To speed up the query process and reduce the number of documents scanned, Apache Pinot applies optimizations at each level. At the broker, it does data partitioning, replica-group segment assignment, and partition-aware query routing. At the servers, it applies metadata-based pruning or Bloom filters. At the segments, it applies indexes for queries with filter predicates. Finally, it applies aggregation optimizations.

Apache Pinot indexes

All of these optimizations help, but it's the indexes that make the most dramatic difference. Apache Pinot supports at least seven kinds of index, ranging from the common inverted index, sorted index, and range index to the specialized star-tree index and JSON index.

Inverted index

An inverted index maintains a map of each column value to its locations using a bitmap. In one example, an aggregation query with a filter predicate (e.g., select count(*) … where ID=xxx) on a data set of roughly three billion rows had to do a full scan and took over 2.3 seconds without an inverted index; after an inverted index was applied, the latency dropped to just 12ms.

Sorted index

One column within a Pinot table can be configured to have a sorted index. Internally, it uses run-length encoding to capture the start and end location pointers for each column value. The query use case and speed-up are similar to those of an inverted index.
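As a sketch of the query patterns these two indexes serve, consider the following; the profileViews table and its columns are hypothetical illustrations, not from the article or Pinot's documentation:

```sql
-- Equality filter: with an inverted index on memberId, the server looks up
-- the bitmap of row locations for 12345 and counts them, instead of
-- scanning every document in the segment.
SELECT COUNT(*) FROM profileViews WHERE memberId = 12345;

-- If the table is instead sorted on memberId, run-length encoding yields
-- start/end offsets for 12345, so the filter becomes one contiguous read.
SELECT viewerCompany, COUNT(*)
FROM profileViews
WHERE memberId = 12345
GROUP BY viewerCompany;
```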
Range index

A range index is a variant of an inverted index that can speed up range queries, i.e., queries with range predicates (e.g., column > value1, column < value2).
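A range predicate of the kind a range index accelerates might look like the following sketch (again with hypothetical table and column names):

```sql
-- A plain inverted index would have to OR together the bitmaps of every
-- distinct value between 30 and 60; a range index groups values into
-- buckets, so far fewer bitmap operations are needed.
SELECT COUNT(*)
FROM deliveries
WHERE deliveryMinutes > 30 AND deliveryMinutes < 60;
```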
https://www.infoworld.com/article/3998355/real-time-analytics-with-startree-cloud-and-apache-pinot.h