Spring AI tutorial: Get started with Spring AI
Thursday, December 4, 2025, 10:00 AM, from InfoWorld
In this article, you will learn how to integrate AI into your Spring applications. We'll start with a simple example that sends a request to OpenAI, then use Spring AI's prompt templates to add support for user-generated queries. You'll also get a first look at implementing retrieval augmented generation (RAG) with Spring AI, using a vector store to manage external documents.

What is Spring AI?

Spring AI started as a project in 2023, with its first milestone version released in early 2024. Spring AI 1.0, the general availability release, was finalized in May 2025. Spring AI abstracts the processes involved in interacting with large language models (LLMs), similar to how Spring Data abstracts database access procedures. Spring AI also provides abstractions for managing prompts, selecting models, and handling AI responses. It includes support for multiple AI providers, including OpenAI, Anthropic, Hugging Face, and Ollama (for local LLMs). Spring AI allows you to switch between providers simply by changing configuration properties. As a developer, you configure your AI resources in your application.yaml or application.properties file, wire in Spring beans that provide standard interfaces, and write your code against those interfaces. Spring then handles the details of interacting with the specific models.

Also see: Spring AI: An AI framework for Java developers.

Building a Spring app that queries OpenAI

Let's start by building a simple Spring MVC application that exposes a query endpoint, which sends a question to OpenAI. You can download the source code for this example or head over to start.spring.io and create a new project. In the dependencies section, include the dependencies you want for your application; just be sure to scroll down to the AI section and choose "OpenAI." I chose "Spring Web" and "OpenAI" for my example.

The first thing we want to do is configure our LLM provider. I created an application.yaml file with the following contents:

    spring:
      application:
        name: spring-ai-demo
      ai:
        openai:
          api-key: ${OPENAI_API_KEY}
          chat:
            options:
              model: gpt-5
              temperature: 1

Under spring, I included an "ai" section with an "openai" subsection. To use OpenAI, you need to specify an api-key, which I defined to use the OPENAI_API_KEY environment variable, so be sure to define that environment variable before running the example code. Additionally, you need to specify a set of options. The most important option is the model to use. I chose gpt-5, but you can choose any model listed on the OpenAI models page. By default, Spring AI uses gpt-4o-mini, which is less expensive, but gpt-5 supports structured reasoning, multi-step logic, planning, and more tokens. It doesn't really matter which model we use for this example, but I wanted to show you how to configure the model.

There are several other configuration options, but the most common ones you'll use are maxTokens, maxCompletionTokens, and temperature. The temperature controls the randomness of the response: a low value, like 0.3, produces a more repeatable response, while a higher value, like 0.7, allows the LLM to be more creative. When I ask a model to design a software component or perform a code review, I typically opt for a higher temperature of 0.7 because I want it to be more creative, but when I ask it to implement the code for a project, I set the temperature to 0.3 so that it is more rigid.
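If you prefer application.properties to YAML, the same settings can be expressed as flat properties. Here is a minimal sketch of the equivalent configuration; each YAML nesting level becomes a dot-separated segment of the property name:

    spring.application.name=spring-ai-demo
    spring.ai.openai.api-key=${OPENAI_API_KEY}
    spring.ai.openai.chat.options.model=gpt-5
    spring.ai.openai.chat.options.temperature=1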
For gpt-5, which is a reasoning model, the required temperature is 1, and Spring will throw an error if you try to set it to a different value.

Once the model is configured, we can build our service:

    package com.infoworld.springaidemo.service;

    import java.util.Map;

    import com.infoworld.springaidemo.model.JokeResponse;
    import com.infoworld.springaidemo.model.SimpleQueryResponse;
    import org.springframework.ai.chat.client.ChatClient;
    import org.springframework.ai.chat.prompt.Prompt;
    import org.springframework.ai.chat.prompt.PromptTemplate;
    import org.springframework.beans.factory.annotation.Value;
    import org.springframework.core.io.Resource;
    import org.springframework.stereotype.Service;

    @Service
    public class SpringAIService {
        private final ChatClient chatClient;

        public SpringAIService(ChatClient.Builder chatClientBuilder) {
            this.chatClient = chatClientBuilder.build();
        }

        public String simpleQueryAsString(String query) {
            return this.chatClient.prompt(query).call().content();
        }

        public SimpleQueryResponse simpleQuery(String query) {
            return this.chatClient.prompt(query).call().entity(SimpleQueryResponse.class);
        }
    }

Because we have OpenAI configured in our application.yaml file, Spring will automatically create a ChatClient.Builder that we can wire into our service and then use to create a ChatClient. The ChatClient is the main interface for interacting with chat-based models, such as GPT. In this example, we invoke its prompt() method, passing it our String query. The prompt() method also accepts a Prompt object, which you will see in a minute. The prompt() method returns a ChatClientRequestSpec instance that we can use to configure LLM calls. In this example, we simply invoke its call() method to send the message to the LLM. The call() method returns a CallResponseSpec instance. You can use that to get the text response by invoking its content() method, or you can map the response to an entity by invoking its entity() method. I provided examples of both.

For the entity mapping, I passed a SimpleQueryResponse, which is a Java record:

    package com.infoworld.springaidemo.model;

    public record SimpleQueryResponse(String response) { }

Now let's build a controller so that we can test this out:

    package com.infoworld.springaidemo.web;

    import com.infoworld.springaidemo.model.SimpleQuery;
    import com.infoworld.springaidemo.model.SimpleQueryResponse;
    import com.infoworld.springaidemo.service.SpringAIService;
    import org.springframework.http.ResponseEntity;
    import org.springframework.web.bind.annotation.PostMapping;
    import org.springframework.web.bind.annotation.RequestBody;
    import org.springframework.web.bind.annotation.RestController;

    @RestController
    public class SpringAiController {
        private final SpringAIService springAIService;

        public SpringAiController(SpringAIService springAIService) {
            this.springAIService = springAIService;
        }

        @PostMapping("/simpleQuery")
        public ResponseEntity<SimpleQueryResponse> simpleQuery(@RequestBody SimpleQuery simpleQuery) {
            SimpleQueryResponse response = springAIService.simpleQuery(simpleQuery.query());
            return ResponseEntity.ok(response);
        }
    }

This controller wires in the SpringAIService and exposes a PostMapping to /simpleQuery. It accepts a SimpleQuery as its request body, which is another Java record:

    package com.infoworld.springaidemo.model;

    public record SimpleQuery(String query) { }

The simpleQuery() method passes the request body's query parameter to the SpringAIService and then returns a response as a SimpleQueryResponse.
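One piece not shown above is the application's entry point, which start.spring.io generates for you. A minimal sketch is included here for completeness; the exact class name depends on what the generator produced, so treat this one as an assumption:

    package com.infoworld.springaidemo;

    import org.springframework.boot.SpringApplication;
    import org.springframework.boot.autoconfigure.SpringBootApplication;

    // Standard Spring Boot entry point; component scanning picks up the
    // service and controller classes under com.infoworld.springaidemo.
    @SpringBootApplication
    public class SpringAiDemoApplication {

        public static void main(String[] args) {
            SpringApplication.run(SpringAiDemoApplication.class, args);
        }
    }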
If you build the application with mvn clean install and then run it with mvn spring-boot:run, you can execute a POST request to /simpleQuery and get a response. For example, I posted the following SimpleQuery:

    {
      "query": "Give me a one sentence summary of Spring AI"
    }

And received the following response:

    {
      "response": "Spring AI is a Spring project that offers vendor-neutral, idiomatic abstractions and starters to integrate LLMs and related AI capabilities (chat, embeddings, tools, vector stores) into Java/Spring applications."
    }

Now that you know how to configure a Spring application to use Spring AI, send a message to an LLM, and get a response, we can begin to explore prompts more deeply.

Download the Spring AI tutorial source code.

Supporting user input with Spring AI prompt templates

Sending a message to an LLM is a good first step in understanding Spring AI, but it is not very useful for solving business problems. Often you want to control the prompt while allowing the user to supply specific parameters, and this is where prompt templates come in. Spring AI supports prompt templates through the PromptTemplate class. You can define prompt templates in-line, but the convention in Spring AI is to define your templates in the src/main/resources/templates directory using an .st extension.

For our example, we'll create a prompt template that asks the LLM to tell us a joke, but in this case, we'll have the user provide the type of joke, such as silly or sarcastic, and the topic. Here is my joke-template.st file:

    Tell me a {type} joke about {topic}

We define the template as a String that accepts variables, which in this case are a type and a topic. We can then import this template into our class using a Spring property value. I added the following to the SpringAIService:

    @Value("classpath:/templates/joke-template.st")
    private Resource jokeTemplate;

The value references the classpath, which includes the files found in the src/main/resources folder, then specifies the path to the template. Next, I added a new tellMeAJoke() method to the SpringAIService:

    public JokeResponse tellMeAJoke(String type, String topic) {
        Prompt prompt = new PromptTemplate(jokeTemplate).create(Map.of("type", type, "topic", topic));
        return this.chatClient.prompt(prompt).call().entity(JokeResponse.class);
    }

This method accepts a type and a topic and then constructs a new PromptTemplate from the joke-template.st file that we wired in above. To set its values, we pass a map of the values to the PromptTemplate's create() method, which returns a Prompt for us to use. Finally, we use the ChatClient, but this time we pass the Prompt to the prompt() method instead of the raw string, then we map the response to a JokeResponse:

    package com.infoworld.springaidemo.model;

    public record JokeResponse(String response) { }

I updated the controller to create a new /tellMeAJoke PostMapping:

    @PostMapping("/tellMeAJoke")
    public ResponseEntity<JokeResponse> tellMeAJoke(@RequestBody JokeRequest jokeRequest) {
        JokeResponse response = springAIService.tellMeAJoke(jokeRequest.type(), jokeRequest.topic());
        return ResponseEntity.ok(response);
    }

The request body is a JokeRequest, which is another Java record:

    package com.infoworld.springaidemo.model;

    public record JokeRequest(String type, String topic) { }

Now we can POST a JSON body with a type and topic and it will tell us a joke.
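As an aside, the same prompt could have been defined in-line rather than in an .st file, as mentioned earlier. Here is a minimal sketch of a variant of tellMeAJoke() written that way; the tellMeAJokeInline name is illustrative and not part of the article's source code:

    // Variant that builds the prompt from an in-line template string
    // instead of a classpath resource.
    public JokeResponse tellMeAJokeInline(String type, String topic) {
        PromptTemplate inlineTemplate = new PromptTemplate("Tell me a {type} joke about {topic}");
        Prompt prompt = inlineTemplate.create(Map.of("type", type, "topic", topic));
        return this.chatClient.prompt(prompt).call().entity(JokeResponse.class);
    }

Externalizing the template to a resource file keeps prompt text out of your Java code, so you can tweak it without recompiling.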
For example, I sent the following JokeRequest to ask for a silly joke about Java:

    {
      "type": "silly",
      "topic": "Java"
    }

And OpenAI returned the following:

    {
      "response": "Why do Java developers wear glasses? Because they don't C#."
    }

While this is a trivial example, you can use the code here as a scaffold to build robust prompts and accept simple input from users, prompting OpenAI or another LLM to generate meaningful results.

Retrieval augmented generation with Spring AI

The examples we've built so far are very much "toy" examples, but they illustrate how to configure an LLM and execute calls to it with Spring AI. Now let's look at something more useful. Retrieval augmented generation, or RAG, is important in the AI space because it allows us to leverage LLMs to answer questions about material they were not trained on, such as internal company documents. The process is conceptually simple, but the implementation details can be confusing if you don't have a good foundation in what you are doing. This section will build that foundation so you can start using RAG in your Spring AI programs.

To start, let's say we create a prompt with the following format:

    Use the following context to answer the user's question.
    If the question cannot be answered from the context, state that clearly.

    Context: {context}

    Question: {question}

We provide the context, which is the information we want the LLM to use to answer the question, along with the question we want the LLM to answer. This is like giving the LLM a cheat sheet: the answer is here, and you just need to extract it to answer the question.

The real challenge is how to store and retrieve the context we want the LLM to use. For example, you might have thousands of pages in a knowledge base that contains everything about your product, but you shouldn't send all that information to the LLM. It would be very expensive to ask the LLM to process that much information. Besides, each LLM has a token limit, so you couldn't send all of it even if you wanted to. Instead, we introduce the concept of a vector store. A vector store is a database that contains documents. The interesting thing about these documents is that the vector store uses an embedding algorithm to create a multi-dimensional vector for each one. Then, you can create a similar vector for your question, and the vector store will compute a similarity score comparing your question to the documents in its database. Using this approach, you can take your question, retrieve the top three to five documents that are most similar to it, and use those as the context in the prompt.

[Flow diagram by Steven Haines: the process of loading documents into, and querying, a vector store]

First, you gather all your documents, chunk them into smaller units, and add them to the vector store. There are different chunking strategies: you can chunk the documents into a specific number of words, paragraphs, sentences, and so forth, including overlapping sections so that you don't lose too much context. The smaller the chunk, the more specific it is, but the less context it retains. Larger chunks retain more context but lose a lot of specific knowledge, which makes similarity searches more difficult. Finding the right size for your data chunks is a balancing act and requires experimenting on your own dataset. For our example, I took some text from the public Spring AI documentation and stored it in three text files included with the source code for this article.
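The three files in this example are small enough to store whole, so the configuration shown next skips chunking. For larger documents you would typically split them first; Spring AI provides a TokenTextSplitter as part of its document ETL support. Here is a minimal, hypothetical sketch (the DocumentChunker class is not part of the article's source code, and the splitter's default settings are assumed):

    package com.infoworld.springaidemo;

    import java.util.List;

    import org.springframework.ai.document.Document;
    import org.springframework.ai.reader.TextReader;
    import org.springframework.ai.transformer.splitter.TokenTextSplitter;
    import org.springframework.core.io.Resource;

    public class DocumentChunker {

        // Read a text resource and split it into token-sized chunks.
        // The chunks, rather than whole files, are what get embedded and searched.
        public List<Document> chunk(Resource resource) {
            List<Document> documents = new TextReader(resource).get();
            return new TokenTextSplitter().apply(documents);
        }
    }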
We'll use this text with Spring AI's SimpleVectorStore, which is an in-memory vector store that you can use for testing. Spring AI supports production-scale vector stores like Pinecone, Qdrant, Azure AI, PGvector, and more, but SimpleVectorStore works for this example. I added the following SpringRagConfig configuration class to the example code developed so far:

    package com.infoworld.springaidemo;

    import java.io.IOException;
    import java.util.List;

    import org.springframework.ai.document.Document;
    import org.springframework.ai.embedding.EmbeddingModel;
    import org.springframework.ai.reader.TextReader;
    import org.springframework.ai.vectorstore.SimpleVectorStore;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;
    import org.springframework.core.io.Resource;
    import org.springframework.core.io.support.PathMatchingResourcePatternResolver;
    import org.springframework.core.io.support.ResourcePatternResolver;

    @Configuration
    public class SpringRagConfig {

        @Bean
        public SimpleVectorStore simpleVectorStore(EmbeddingModel embeddingModel) throws RuntimeException {
            // Use the builder to create and configure the SimpleVectorStore
            SimpleVectorStore simpleVectorStore = SimpleVectorStore.builder(embeddingModel).build();
            try {
                // Find every .txt file under the documents directory on the classpath
                ResourcePatternResolver resolver = new PathMatchingResourcePatternResolver();
                Resource[] resources = resolver.getResources("classpath*:documents/**/*.txt");
                for (Resource resource : resources) {
                    // Read each file and add its documents to the vector store
                    TextReader textReader = new TextReader(resource);
                    List<Document> documents = textReader.get();
                    simpleVectorStore.add(documents);
                }
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
            return simpleVectorStore;
        }
    }

This configuration class defines a Spring bean named simpleVectorStore that accepts an EmbeddingModel, which Spring creates automatically when it creates your LLM. It creates a new SimpleVectorStore by invoking the SimpleVectorStore's static builder() method, passing it the embedding model, and calling its build() method. Then, it scans the classpath for all .txt files in the src/main/resources/documents directory, reads them using Spring's TextReader, retrieves their content as Document instances by calling the text reader's get() method, and finally adds them to the SimpleVectorStore.

In a production environment, you can configure the production vector store in your application.yaml file and Spring will create it automatically. For example, if you wanted to configure Pinecone, you would add the following to your application.yaml:

    spring:
      ai:
        vectorstore:
          pinecone:
            apiKey: ${PINECONE_API_KEY}
            environment: ${PINECONE_ENV}
            index-name: ${PINECONE_INDEX}
            projectId: ${PINECONE_PROJECT_ID}

The SimpleVectorStore takes a little more configuration, but still keeps our test code simple. To use it, I first created a rag-template.st file:

    Use the following context to answer the user's question.
    If the question cannot be answered from the context, state that clearly.
    Context: {context}

    Question: {question}

Then I created a new SpringAIRagService:

    package com.infoworld.springaidemo.service;

    import java.util.List;
    import java.util.Map;
    import java.util.stream.Collectors;

    import org.springframework.ai.chat.client.ChatClient;
    import org.springframework.ai.chat.prompt.Prompt;
    import org.springframework.ai.chat.prompt.PromptTemplate;
    import org.springframework.ai.document.Document;
    import org.springframework.ai.vectorstore.VectorStore;
    import org.springframework.ai.vectorstore.SearchRequest;
    import org.springframework.beans.factory.annotation.Value;
    import org.springframework.core.io.Resource;
    import org.springframework.stereotype.Service;

    @Service
    public class SpringAIRagService {

        @Value("classpath:/templates/rag-template.st")
        private Resource promptTemplate;

        private final ChatClient chatClient;
        private final VectorStore vectorStore;

        public SpringAIRagService(ChatClient.Builder chatClientBuilder, VectorStore vectorStore) {
            this.chatClient = chatClientBuilder.build();
            this.vectorStore = vectorStore;
        }

        public String query(String question) {
            // Retrieve the documents most similar to the question
            SearchRequest searchRequest = SearchRequest.builder().query(question).topK(2).build();
            List<Document> similarDocuments = vectorStore.similaritySearch(searchRequest);
            String context = similarDocuments.stream()
                    .map(Document::getText)
                    .collect(Collectors.joining("\n"));

            // Build the RAG prompt from the template and send it to the LLM
            Prompt prompt = new PromptTemplate(promptTemplate).create(Map.of("context", context, "question", question));
            return chatClient.prompt(prompt).call().content();
        }
    }

The SpringAIRagService wires in a ChatClient.Builder, which we use to build a ChatClient, along with our VectorStore. The query() method accepts a question and uses the VectorStore to build the context. First, we need to build a SearchRequest, which we do by:

- Invoking its static builder() method.
- Passing the question as the query.
- Using the topK() method to specify how many documents we want to retrieve from the vector store.
- Calling its build() method.

In this case, we want to retrieve the top two documents that are most similar to the question. In practice, you'll use something larger, such as the top three or top five, but since we only have three documents, I limited it to two. Next, we invoke the vector store's similaritySearch() method, passing it our SearchRequest. The similaritySearch() method uses the vector store's embedding model to create a multidimensional vector of the question. It then compares that vector to each document and returns the documents that are most similar to the question. We stream over the similar documents, get their text, and build a context String.

Next, we create our prompt, which tells the LLM to answer the question using the context. Note that it is important to tell the LLM to use the context to answer the question and, if it cannot, to state that it cannot answer the question from the context. If we don't provide these instructions, the LLM will fall back on the data it was trained on, which means it will use information not in the context we've provided. Finally, we build the prompt, setting its context and question, and invoke the ChatClient.
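Besides topK(), the SearchRequest builder can also filter out weak matches. Here is a minimal sketch of how the query() method's search could be tightened, assuming the similarityThreshold() builder option (a score between 0 and 1; documents scoring below it are dropped):

    // Return at most five documents, and only those whose similarity score
    // is at least 0.75, so weak matches never reach the prompt's context.
    SearchRequest searchRequest = SearchRequest.builder()
            .query(question)
            .topK(5)
            .similarityThreshold(0.75)
            .build();
    List<Document> similarDocuments = vectorStore.similaritySearch(searchRequest);

Tuning topK and the threshold against your own documents is much like tuning chunk size: it takes some experimentation.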
I added a SpringAIRagController to handle POST requests and pass them to the SpringAIRagService:

    package com.infoworld.springaidemo.web;

    import com.infoworld.springaidemo.model.SpringAIQuestionRequest;
    import com.infoworld.springaidemo.model.SpringAIQuestionResponse;
    import com.infoworld.springaidemo.service.SpringAIRagService;
    import org.springframework.http.ResponseEntity;
    import org.springframework.web.bind.annotation.PostMapping;
    import org.springframework.web.bind.annotation.RequestBody;
    import org.springframework.web.bind.annotation.RestController;

    @RestController
    public class SpringAIRagController {
        private final SpringAIRagService springAIRagService;

        public SpringAIRagController(SpringAIRagService springAIRagService) {
            this.springAIRagService = springAIRagService;
        }

        @PostMapping("/springAIQuestion")
        public ResponseEntity<SpringAIQuestionResponse> askAIQuestion(@RequestBody SpringAIQuestionRequest questionRequest) {
            String answer = springAIRagService.query(questionRequest.question());
            return ResponseEntity.ok(new SpringAIQuestionResponse(answer));
        }
    }

The askAIQuestion() method accepts a SpringAIQuestionRequest, which is a Java record:

    package com.infoworld.springaidemo.model;

    public record SpringAIQuestionRequest(String question) { }

It returns a SpringAIQuestionResponse, which is also a record:

    package com.infoworld.springaidemo.model;

    public record SpringAIQuestionResponse(String answer) { }

Now restart your application and execute a POST to /springAIQuestion. In my case, I sent the following request body:

    {
      "question": "Does Spring AI support RAG?"
    }

And received the following response:

    {
      "answer": "Yes. Spring AI explicitly supports Retrieval Augmented Generation (RAG), including chat memory, integrations with major vector stores, a portable vector store API with metadata filtering, and a document injection ETL framework to build RAG pipelines."
    }

As you can see, the LLM used the context of the documents we loaded into the vector store to answer the question. We can further test whether it is following our directions by asking a question that is not covered by our context:

    {
      "question": "Who created Java?"
    }

Here is the LLM's response:

    {
      "answer": "The provided context does not include information about who created Java."
    }

This is an important validation that the LLM is only using the provided context to answer the question and not using its training data or, worse, trying to make up an answer.

Conclusion

This article introduced you to using Spring AI to incorporate large language model capabilities into Spring-based applications. You can configure LLMs and other AI technologies using Spring's standard application.yaml file, then wire them into Spring components. Spring AI provides an abstraction for interacting with LLMs, so you don't need to use LLM-specific SDKs. For experienced Spring developers, this entire process is similar to how Spring Data abstracts database interactions using Spring Data interfaces.

In this example, you saw how to configure and use a large language model in a Spring MVC application. We configured OpenAI to answer simple questions, introduced prompt templates to externalize LLM prompts, and concluded by using a vector store to implement a simple RAG service in our example application. Spring AI has a robust set of capabilities, and we've only scratched the surface of what you can do with it. I hope the examples in this article provide enough foundational knowledge to help you start building AI applications using Spring.
Once you are comfortable with configuring and accessing large language models in your applications, you can dive into more advanced AI programming, such as building AI agents to improve your business processes.
https://www.infoworld.com/article/4091447/spring-ai-tutorial-get-started-with-spring-ai.html