Databricks’ TAO method to allow LLM training with unlabeled data
Wednesday, March 26, 2025, 01:21 PM, from InfoWorld
Data lakehouse provider Databricks has unveiled a new large language model (LLM) training method, TAO, that allows enterprises to train models without labeling data.
When LLMs are adapted to new enterprise tasks, they are typically trained either through prompting or by fine-tuning the model on task-specific datasets. Both techniques have caveats: prompting is seen as an error-prone process with limited quality gains, while fine-tuning requires large amounts of human-labeled data, which most enterprises either don't have or would find extremely time-consuming to produce.

TAO, or Test-time Adaptive Optimization, according to Databricks, provides an alternative to fine-tuning a model by leveraging test-time compute and reinforcement learning (RL) to teach a model to do a task better based on past input examples alone, meaning that it scales with an adjustable tuning compute budget rather than human labeling effort.

Test-time compute, which has gained popularity through its use in OpenAI's o1 and DeepSeek's R1 models, refers to the compute resources an LLM uses during the inference phase, when it is asked to complete a task, rather than during training. These resources, which bear on how the model reasons through a task or query, can be used to make adjustments that improve output quality, according to a community post on Hugging Face.

However, Databricks' Mosaic Research team has pointed out that enterprises don't need to be alarmed about a rise in inference costs if they adopt TAO. "Although TAO uses test-time compute, it uses it as part of the process to train a model; that model then executes the task directly with low inference costs (i.e., not requiring additional compute at inference time)," the team wrote in a blog post.

Mixed initial response to TAO

Databricks co-founder and CEO Ali Ghodsi's post about TAO on LinkedIn has drawn a mixed initial response. While some users, such as Iman Makaremi, co-founding head of AI at Canadian startup Catio, and Naveed Ahamed, senior enterprise architect at Allianz Technology, were excited to implement and experiment with TAO, others posed questions about its efficiency.

Tom Puskarich, a former senior account manager at Databricks, questioned the use of TAO when training a model for new tasks. "If you are upgrading a current enterprise capability with a trove of past queries, but for enterprises looking to create net new capabilities, wouldn't a training set of labeled data be important to improve quality?" Puskarich wrote. "I love the idea of using inputs to improve but most production deployments don't want a ton of bad experiences at the front end while the system has to learn," he added.

Another user, Patrick Stroh, head of Data Science and AI at ZAP Solutions, pointed out that enterprise costs may increase. "Very interesting, but also cognizant of the (likely increase) costs due to an adaptation phase. (This would likely be incremental to the standard costs (although still less than fine-tuning)). (I simply can't understand how it would the SAME as the original LLM as noted given that adaptation compute. But I suppose they can price it that way.)," Stroh wrote.

How does TAO work?

TAO comprises four stages: response generation, response scoring, reinforcement learning, and continuous improvement. In the response generation stage, enterprises begin by collecting example input prompts or queries for a task, which can be gathered automatically from any AI application using Databricks' proprietary AI Gateway.
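As a rough illustration of what collecting prompts and generating diverse candidate responses could look like, here is a minimal Python sketch. The names `collect_prompts`, `generate_candidates`, and `toy_llm`, and the temperature-varied sampling, are hypothetical stand-ins for illustration, not Databricks' APIs or implementation.

```python
# Illustrative sketch only: the model call below is a stand-in, not Databricks' API.
import random
from typing import Callable

def collect_prompts() -> list[str]:
    # In practice these would be real task inputs captured from application
    # traffic; here they are hard-coded placeholders.
    return [
        "Summarize this support ticket: customer cannot reset password.",
        "Extract the invoice total from: 'Total due: $1,240.50 by June 1'.",
    ]

def generate_candidates(prompt: str,
                        llm: Callable[..., str],
                        n: int = 8) -> list[str]:
    # Sample several responses per prompt at varied temperatures so the
    # candidate set is diverse enough for the later scoring stage.
    return [llm(prompt, temperature=round(random.uniform(0.3, 1.0), 2))
            for _ in range(n)]

def toy_llm(prompt: str, temperature: float) -> str:
    # Stand-in for an actual model call (e.g., a hosted Llama endpoint).
    return f"[response to '{prompt[:30]}...' sampled at T={temperature}]"

if __name__ == "__main__":
    candidates = {p: generate_candidates(p, toy_llm) for p in collect_prompts()}
    for prompt, responses in candidates.items():
        print(prompt, "->", len(responses), "candidate responses")
```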
Each prompt is then used to generate a diverse set of candidate responses, and these responses are systematically evaluated for quality in the response scoring stage, the company explained, adding that scoring methodologies include a variety of strategies, such as reward modeling, preference-based scoring, or task-specific verification using LLM judges or custom rules.

In the reinforcement learning stage, the model is updated, or tuned, so that it produces outputs more closely aligned with the high-scoring responses identified in the previous step (a simplified sketch of these two stages appears at the end of this article). "Through this adaptive learning process, the model refines its predictions to enhance quality," the company explained.

Finally, in the continuous improvement phase, enterprise users create data, essentially different LLM inputs, by interacting with the model; this data can be used to optimize model performance further.

TAO can increase the efficiency of inexpensive models

Databricks said it used TAO not only to achieve better model quality than fine-tuning but also to raise inexpensive open-source models, such as Llama, to the quality of more expensive proprietary models like GPT-4o and o3-mini. "Using no labels, TAO improves the performance of Llama 3.3 70B by 2.4% on a broad enterprise benchmark," the team wrote.

TAO is now available in preview to Databricks customers who want to tune Llama, the company said. It plans to add TAO to other products in the future.
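To make the response scoring and selection steps more concrete, the following Python sketch scores each candidate with a stand-in reward function, keeps the highest-scoring response per prompt, and assembles (prompt, chosen response) pairs of the kind an RL-style update could train on. The names `score_response`, `select_best`, and `build_training_pairs`, and the length-based scoring heuristic, are assumptions for illustration, not Databricks' actual scoring or tuning code.

```python
# Illustrative sketch only: the reward function and the selection logic are
# stand-ins, not Databricks' implementation.
from typing import Callable

def score_response(prompt: str, response: str) -> float:
    # Stand-in reward model / LLM judge: a trivial heuristic that prefers
    # shorter answers. A real system would use reward modeling, preference
    # scoring, or task-specific verification as described above.
    return 1.0 / (1.0 + len(response))

def select_best(prompt: str, candidates: list[str],
                scorer: Callable[[str, str], float]) -> tuple[str, float]:
    # Keep the highest-scoring candidate for each prompt.
    ranked = sorted(candidates, key=lambda r: scorer(prompt, r), reverse=True)
    best = ranked[0]
    return best, scorer(prompt, best)

def build_training_pairs(candidates_by_prompt: dict[str, list[str]]) -> list[dict]:
    # Assemble (prompt, best response) pairs; an RL-style tuning step would
    # then push the model toward these high-scoring outputs.
    pairs = []
    for prompt, candidates in candidates_by_prompt.items():
        best, score = select_best(prompt, candidates, score_response)
        pairs.append({"prompt": prompt, "chosen": best, "score": score})
    return pairs

if __name__ == "__main__":
    demo = {"Summarize: server down since 9am": [
        "The server has been down since 9am.",
        "Outage began 9am; ongoing.",
    ]}
    for pair in build_training_pairs(demo):
        print(pair)
```

In a real pipeline, the heuristic scorer would be replaced by a reward model or LLM judge, and the selected pairs would feed the reinforcement learning update described above rather than simply being printed.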
https://www.infoworld.com/article/3854396/databricks-tao-method-to-allow-llm-training-with-unlabeled...