Onehouse opens up the lakehouse with Open Engines
Thursday April 17, 2025. 08:31 PM , from InfoWorld
Data lake vendor Onehouse on Thursday released Open Engines, a new capability on its platform that it says makes it possible to deploy open source engines on top of open data.
Available in private preview, it initially supports Apache Flink for stream processing, Trino for distributed SQL queries for business intelligence and reporting, and Ray for machine learning (ML), AI, and data science workloads.
In a blog post announcing Open Engines, Onehouse founder and CEO Vinoth Chandar wrote that while the industry “has made strides towards making data open with file formats … along with a budding renaissance of open data catalogs … we are still often restricted to closed compute because achieving open compute on open data is not as easy as it should be.”
“With Open Engines, Onehouse is now removing the final barrier to realizing a truly universal data lakehouse and finally flipping the defaults, for both data and compute, to open,” he wrote.
He also pointed out that no engine excels with all data workloads. For example, he said, the company’s deep dive blogs comparing analytics, data science and machine learning, and stream processing engines show that Apache Spark is “well-rounded, but not necessarily the best engine in any of these categories.”
A more modular approach
James Curtis, senior research analyst at S&P Global Market Intelligence, who specializes in data, AI and analytics, said, “my first impression of Open Engines is that this is a good thing. With its carefully curated choice of engines, Onehouse is raising enterprise awareness that not every problem is a nail and not every solution is a hammer.”
One of the underlying benefits of the open file formats and open table formats, he said, has been that enterprises can mix and match different engines with the data, although he noted, “while that gives organizations choice, it still doesn’t completely address other data management challenges such as security and governance, let alone the added administrative work it takes to set up and maintain these environments.”
Onehouse addresses this potential added administrative burden by offering Open Engines as a managed service, said Curtis.
Usman Lakani, principal advisory director at Info-Tech Research Group, said that Open Engines, as part of an open lakehouse architecture, “introduces a more modular approach by breaking the Siamese connection between compute and storage.”
Organizations, he said, “would potentially no longer have to feel stuck using the data engine they initially thought would work. Rather, they would be enabled to adopt a ‘horses for courses’ approach, like having the option to select Presto for SQL-based analytics or Spark for more complex machine learning.”
Lakani added that this flexibility to scale compute without the constraints of data storage “is at least a game enhancer, if not a game changer. The use of open table formats like Apache Iceberg and Hudi ensures an organization’s data is not under proprietary lock and key and promotes interoperability, which is a crucial ingredient in an open, democratic, and decentralized data infrastructure.”
No one engine does everything well
Gaetan Castelein, chief marketing officer (CMO) at Onehouse, said the current problem revolves around the fact that there is no single query engine that can best support all use cases and workloads, especially with the rise of machine learning, AI, and real-time analytics. “If you go back 10 years, all of these platforms were basically supporting batch business intelligence,” he observed.
In addition, he said that while large organizations such as Uber and Walmart have installed and are using lakehouse offerings, mainstream enterprises, to a large extent, have not yet moved to them, because “today it requires building via a do-it-yourself approach where you build your own, you cobble together a bunch of open source tools. If you have a deep engineering bench, you can do that. If you don’t have that deep engineering bench, that becomes very difficult.”
Kyle Weller, VP product at Onehouse, added that organizations currently face two challenges: “[They have] chosen a Databricks or Snowflake, and that dictates the rest of their architectural choices, or they are in a situation where they’re looking to open source, but that complexity of self managing is preventative from exploring multiple engines.”
Each engine has a unique specialty, he said, noting, “Flink was not invented for no reason. Flink was invented to address real time stream processing. Ray wasn’t invented just to be another item on the shelf. Ray was invented special purpose for AI use cases, ML use cases, data science.”
He added, “having the optionality or ability to bring these and match them to your use cases is so critical. [Open Engines] is a one click deployment for Trino, Ray, and Flink clusters. This is our starting point. We’ll add more engines as we go.”
Getting value out of data
Info-Tech’s Lakani agreed. “In the marketplace of ideas and inventions, systems need to be flexible and not impose restrictions on enhancing the art of the possible,” he said. “Open-source software has always been at the forefront of this philosophy, and over the last two decades, closed business models used by the big names in the technology industry have slowly but steadily jumped on the bandwagon.”
However, he added, “this openness was primarily limited to software, leaving the capital-intensive hardware infrastructure in the gilded cage of vendor lock-in. People have to eat, and companies want to make profits; there’s no begrudging them this, but the sometimes unnecessarily complex conversations around available data tools we use overshadow the real source of value: our data. The release of Open Engines starts to chip away at this, and it’s about time.”
Curtis, meanwhile, said the “choice of engines is nothing new. The better question to ask is what does choice lead to? Complexity is usually the answer. Onehouse maintains that choice doesn’t necessarily require an extra administrative lift.”
He pointed out, “in an environment where alignment with one particular engine or table format is common, Onehouse’s approach is to provide an open platform that is more inclusive to different engines and table formats while also maintaining a focus on data as a first class citizen, which is ultimately where enterprises are challenged. That is, getting value out of their data.”
https://www.infoworld.com/article/3964950/onehouse-opens-up-the-lakehouse-with-open-engines.html