Semantic Kernel: Diving into Microsoft’s AI orchestration SDK

Monday July 22, 2024, 10:30 AM, from InfoWorld
Large language models (LLMs) by themselves are less than meets the eye; the moniker “stochastic parrots” isn’t wrong. Connect LLMs to specific data for retrieval-augmented generation (RAG) and you get a more reliable system, much less likely to go off the rails and “hallucinate,” which is a relatively nice way of describing LLMs that lie to you. Connect RAG systems to software that can take actions, even indirect actions like sending emails, and you may have something useful: agents. These connections, however, don’t spring into being fully grown from their father’s forehead. You need a framework that ties the components together and orchestrates them.

Microsoft’s entry into this field is Semantic Kernel, an open-source SDK that lets you build agents that can call your existing code. Semantic Kernel fills the same function in Microsoft’s open-source LLM application stack as AI orchestration does in its internal stack for Copilots: It sits in the middle and ties everything together. (See diagram below.) Copilot is, of course, Microsoft’s name for collaborative AI agents.

The OG copilot is GitHub Copilot, designed as an AI pair programmer. The first thing I learned when I used GitHub Copilot was that I couldn’t trust its suggestions. Sometimes they were good, sometimes they were bad, and sometimes they were ugly. Sometimes they didn’t compile; sometimes they had no hope of ever working; sometimes they solved the wrong problem. Using GitHub Copilot was, for me, like being paired with a smart intern who was good at searching Stack Overflow but had a drinking problem.

A recent study by code review tool vendor GitClear found that GitHub Copilot exerted “downward pressure on code quality.” That study counter-balanced GitHub’s own 2022 study, which found a 55% increase in coding speed from using GitHub Copilot. Write worse code faster, yeah, that’s the ticket. (Blue light goes on to indicate sarcasm.)

Where was I? Ah, yes, the open-source Copilot stack. Semantic Kernel is the glue, the orchestration layer, that connects LLMs with data and code. It does a bit more, as well: Semantic Kernel can generate plans using LLMs and templates. That’s a step beyond what you can do with function calling alone, and it’s a differentiator for Semantic Kernel.

Semantic Kernel uses a planner function that takes a user's "ask" (Microsoft-speak for "request") and returns a plan for accomplishing it. It does so by using AI to mix and match the plugins registered in the kernel, recombining them into a series of steps that complete the goal. The current Semantic Kernel planners are HandlebarsPlanner and FunctionCallingStepwisePlanner. You may see references to Action, Sequential, or Stepwise planners; these are obsolete.
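
As a minimal C# sketch of what calling a planner looks like (this assumes the prerelease Microsoft.SemanticKernel.Planners.OpenAI package and a kernel that already has plugins registered; the question is just an illustration):

using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Planning;

// The planner mixes and matches the kernel's registered plugins
// to work out a series of steps that satisfies the ask.
var planner = new FunctionCallingStepwisePlanner();
var result = await planner.ExecuteAsync(kernel, "What is 2130.23 increased by 23 percent?");
Console.WriteLine(result.FinalAnswer);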

Semantic Kernel currently supports the C#, Python, and Java programming languages. Not all Semantic Kernel features are supported in all of these programming languages. Not all of these programming languages currently even have Semantic Kernel documentation, much less accurate and current documentation, but all of them have examples in the GitHub repository. Not all of the examples run properly, at least for me, and yes, I’ve let Microsoft know about the issues I’ve found.

Semantic Kernel competes directly with LangChain, LlamaIndex, and Haystack. These have differing scope and capabilities, but similar purpose and intent.

AI orchestration falls in the middle of the Microsoft Copilot stack. Semantic Kernel is an open-source version of Microsoft’s AI orchestration.

Semantic Kernel AI orchestration layer

Semantic Kernel includes an AI orchestration layer, plugins, connectors, and planners. The orchestration layer ties the plugins and connectors together. The planners help define the flow of plugins.

The Semantic Kernel kernel (note the lack of capitalization for the second instance of “kernel”) is essentially a traffic cop for AI applications. It selects AI services, renders prompts, invokes AI services, parses LLM responses, and creates function results. Along the way it can invoke other services, such as monitoring and responsible AI filters. (See diagram below.)
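
To make the kernel's role concrete, here is a minimal C# sketch of building a kernel and having it render and run a prompt. The model ID is a placeholder, and reading the API key from an environment variable is just one option (see the installation section below):

using Microsoft.SemanticKernel;

// Build a kernel backed by an OpenAI chat model.
var apiKey = Environment.GetEnvironmentVariable("OPENAI_API_KEY")!;
var builder = Kernel.CreateBuilder();
builder.AddOpenAIChatCompletion("gpt-3.5-turbo", apiKey);
var kernel = builder.Build();

// The kernel renders the template, invokes the model, and parses the response.
var answer = await kernel.InvokePromptAsync(
    "Summarize in one sentence: {{$input}}",
    new KernelArguments { ["input"] = "Semantic Kernel is Microsoft's open-source AI orchestration SDK." });
Console.WriteLine(answer);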

There are currently three implementations of Semantic Kernel in its repository: one in C#, one in Java, and one in Python. Each implementation omits a different set of features.

The Semantic Kernel performs several functions in its role as the coordinator between models and your application.

Semantic Kernel plugins

Semantic Kernel uses the OpenAI specification for plugins. That standardization means that Semantic Kernel plugins are interoperable with ChatGPT, Bing, and Microsoft 365 plugins.

A Semantic Kernel plugin is a group of functions that can be exposed to AI apps and services. You can invoke these functions manually with function calling, or automatically with planners. For use by planners, functions need semantic descriptions, which can be defined in JSON files or by using code annotations. Within a plugin, you can create two types of functions: prompts and native functions.
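
Here's a hedged C# sketch of registering a plugin and invoking one of its functions directly from code (it assumes a MathPlugin like the samples shown below):

// Register the plugin's functions with the kernel.
kernel.ImportPluginFromType<MathPlugin>("MathPlugin");

// Manually invoke a specific plugin function by name.
var sum = await kernel.InvokeAsync("MathPlugin", "Add",
    new KernelArguments { ["number1"] = 3.0, ["number2"] = 4.0 });
Console.WriteLine(sum); // 7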

Let’s start with examples of native functions.

The C# code that follows is an excerpt from the DocumentPlugin, which can be found in the document plugin folder in the GitHub repository. It demonstrates how you can use the SKFunction attribute to describe the function to the planner, and how you can describe an input parameter.

[SKFunction, Description("Read all text from a document")]
public async Task<string> ReadTextAsync(
    [Description("Path to the file to read")] string filePath)
{
    this._logger.LogInformation("Reading text from {0}", filePath);
    using var stream = await this._fileSystemConnector.GetFileContentStreamAsync(filePath).ConfigureAwait(false);
    return this._documentConnector.ReadText(stream);
}

The Java code that follows is from the MathPlugin sample, and demonstrates using the @DefineKernelFunction and @KernelFunctionParameter annotations to communicate with the planner:

@DefineKernelFunction(name = "add", description = "Add two numbers")
public static double add(
        @KernelFunctionParameter(name = "number1", description = "The first number to add", type = double.class) double number1,
        @KernelFunctionParameter(name = "number2", description = "The second number to add", type = double.class) double number2) {
    return number1 + number2;
}

The following Python code is from the Python MathPlugin class, and demonstrates the decorators @kernel_function and Annotated:

    @kernel_function(name='Add')
    def add(
        self,
        input: Annotated[int, 'the first number to add'],
        amount: Annotated[int, 'the second number to add'],
    ) -> Annotated[int, 'the output is a number']:
        '''Returns the Addition result of the values provided.'''
        if isinstance(input, str):
            input = int(input)
        if isinstance(amount, str):
            amount = int(amount)
        return MathPlugin.add_or_subtract(input, amount, add=True)
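
The descriptions in these annotations are what the model sees when it decides which function to call. As a hedged C# sketch (assuming the OpenAI connector), you can also let the kernel invoke registered functions automatically instead of calling them by name:

using Microsoft.SemanticKernel.Connectors.OpenAI;

// Allow the model to call any registered kernel function on its own.
var settings = new OpenAIPromptExecutionSettings
{
    ToolCallBehavior = ToolCallBehavior.AutoInvokeKernelFunctions
};
var reply = await kernel.InvokePromptAsync(
    "What is 12.34 plus 56.78?",
    new KernelArguments(settings));
Console.WriteLine(reply);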

Prompt functions use natural-English instructions and JSON descriptions. For example, the Summarize plugin sample follows. Note that the JSON description fields are read and used by Semantic Kernel planners; they are not just documentation.

skprompt.txt:

[SUMMARIZATION RULES]
DONT WASTE WORDS
USE SHORT, CLEAR, COMPLETE SENTENCES.
DO NOT USE BULLET POINTS OR DASHES.
USE ACTIVE VOICE.
MAXIMIZE DETAIL, MEANING
FOCUS ON THE CONTENT

[BANNED PHRASES]
This article
This document
This page
This material
[END LIST]

Summarize:
Hello how are you?
+++++
Hello
+++++

Summarize this
{{$input}}
+++++

config.json:

{
  "schema": 1,
  "description": "Summarize given text or any text document",
  "execution_settings": {
    "default": {
      "max_tokens": 512,
      "temperature": 0.0,
      "top_p": 0.0,
      "presence_penalty": 0.0,
      "frequency_penalty": 0.0
    }
  },
  "input_variables": [
    {
      "name": "input",
      "description": "Text to summarize",
      "default": "",
      "is_required": true
    }
  ]
}
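
To see how the two files are used, here is a minimal C# sketch (the directory path is a placeholder) that loads the plugin from disk and invokes its Summarize function:

// Each subdirectory of the plugin directory holds one prompt function
// (an skprompt.txt plus a config.json).
var summarize = kernel.ImportPluginFromPromptDirectory("./plugins/SummarizePlugin");

var summary = await kernel.InvokeAsync(summarize["Summarize"],
    new KernelArguments { ["input"] = "Some long text to summarize..." });
Console.WriteLine(summary);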

Semantic Kernel planners

Semantic Kernel planners are actually LLM applications that generate plans for other LLM applications. They use a set of text instructions (a "prompt"), a set of rules, and a "function manual" in conjunction with an LLM (GPT-3.5 or better) and a set of plugins to generate a plan to solve a user's query. None of this is cheap (in terms of tokens used by the LLM) or fast (the user often faces a noticeable wait). You can speed things up and reduce the number of tokens consumed by storing plans in XML format and reusing them when appropriate. Also note that the generated plans are not guaranteed to produce correct results.

Let’s look at an example in C#.

You start by creating a Handlebars planner object:

var planner = new HandlebarsPlanner(
    new HandlebarsPlannerOptions() { AllowLoops = true });

Then you create and execute a plan using the planning object:

// Create a plan
var plan = await planner.CreatePlanAsync(kernelWithMath, problem);
this._logger.LogInformation("Plan: {Plan}", plan);

// Execute the plan
var result = (await plan.InvokeAsync(kernelWithMath)).Trim();
this._logger.LogInformation("Results: {Result}", result);

This will answer simple math word problems, such as “If my investment of 2130.23 dollars increased by 23%, how much would I have after I spent $5 on a latte?” How does that work?

The system prompt used is:

## Start
Now take a deep breath and accomplish the task:
1. Keep the template short and sweet. Be as efficient as possible.
2. Do not make up helpers or functions that were not provided to you, and be especially careful to NOT assume or use any helpers or operations that were not explicitly defined already.
3. If you can't fully accomplish the goal with the available helpers, just print "{{insufficientFunctionsErrorMessage}}".
4. Always start by identifying any important values in the goal. Then, use the `{{set}}` helper to create variables for each of these values.
5. The template should use the {{json}} helper at least once to output the result of the final step.
6. Don't forget to use the tips and tricks otherwise the template will not work.
7. Don't close the ```handlebars block until you're done with all the steps.

The handlebars.js template is:

{{#each functions}}
### `{{doubleOpen}}{{PluginName}}{{../nameDelimiter}}{{Name}}{{doubleClose}}`
Description: {{Description}}
Inputs:
  {{#each Parameters}}
    - {{Name}}:
    {{~#if ParameterType}} {{ParameterType.Name}} -
    {{~else}}
        {{~#if Schema}} {{getSchemaTypeName this}} -{{/if}}
    {{~/if}}
    {{~#if Description}} {{Description}}{{/if}}
    {{~#if IsRequired}} (required){{else}} (optional){{/if}}
  {{/each}}
Output:
{{~#if ReturnParameter}}
  {{~#if ReturnParameter.ParameterType}} {{ReturnParameter.ParameterType.Name}}
  {{~else}}
    {{~#if ReturnParameter.Schema}} {{getSchemaReturnTypeName ReturnParameter}}
    {{else}} string{{/if}}
  {{~/if}}
  {{~#if ReturnParameter.Description}} - {{ReturnParameter.Description}}{{/if}}
{{/if}}
{{/each}}

Rendering the prompt pulls in the function definitions. Note that the descriptions are part of the prompt.

[AVAILABLE FUNCTIONS]

### `{{MathPlugin-Add}}`
Description: Add two numbers
Inputs:
  - number1: double - The first number to add (required)
  - number2: double - The second number to add (required)
Output: double

### `{{MathPlugin-Divide}}`
Description: Divide two numbers
Inputs:
  - number1: double - The first number to divide from (required)
  - number2: double - The second number to divide by (required)
Output: double

If we log the handlebars plan generated from the prompt above in plain text, it’ll look like this, according to the Microsoft documentation:

Plugins.MathSolver: Information: Plan:
{{!-- Step 1: Set the initial investment amount --}}
{{set "initialInvestment" 2130.23}}
{{!-- Step 2: Calculate the increase percentage --}}
{{set "increasePercentage" 0.23}}
{{!-- Step 3: Calculate the final amount after the increase --}}
{{set "finalAmount" (MathPlugin-Multiply (get "initialInvestment") (MathPlugin-Add 1 (get "increasePercentage")))}}
{{!-- Step 4: Output the final amount --}}
{{json (get "finalAmount")}}

That logic seems to be missing the "subtract 5" step: desk-checking it, step 3 yields 2130.23 × 1.23 ≈ 2620.18, while the question calls for 2620.18 − 5 ≈ 2615.18. Perhaps Microsoft will correct the documentation at some point.

Earlier I said that planners are a differentiating feature for Semantic Kernel. That’s true. Currently, however, they aren’t soup yet. Whether they will become useful as AI hardware and software improves remains to be seen.

Installing and learning Semantic Kernel

Semantic Kernel can be installed from the standard sources for C#, Python, and Java. You should also clone the Semantic Kernel repository. If you run samples from the repo, they will typically take care of the installation for you.

For C#, you can install Semantic Kernel from NuGet. The command line is:

     dotnet add package Microsoft.SemanticKernel

For Python, you can install Semantic Kernel from PyPI. The command line is:

     pip install semantic-kernel

It’s possible that you will need to use pip3 rather than pip.

In Java, you can build the project in the repo using the Maven wrapper, which will pull in everything you need.

No matter which language you use, you’ll need an API key, either from OpenAI or Azure OpenAI. Save the API key locally in a safe place. You’ll also need to enter the API key somewhere (it varies by language) so that the code can use it to call LLMs. If you run the Bing search example (see screenshot below) you’ll also need to get a Bing API key from Azure.
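
How you supply the key varies by language and sample. In C#, one common pattern (an assumption on my part, not the only option) is to read it from an environment variable and pass it to the connector; the Azure OpenAI variant also needs your endpoint and deployment name:

using Microsoft.SemanticKernel;

// Read the API key from the environment rather than hard-coding it.
var openAiKey = Environment.GetEnvironmentVariable("OPENAI_API_KEY")
    ?? throw new InvalidOperationException("Set OPENAI_API_KEY first.");

// OpenAI:
var kernel = Kernel.CreateBuilder()
    .AddOpenAIChatCompletion("gpt-3.5-turbo", openAiKey)
    .Build();

// Azure OpenAI (deployment name and endpoint are placeholders):
// var kernel = Kernel.CreateBuilder()
//     .AddAzureOpenAIChatCompletion("my-deployment", "https://my-resource.openai.azure.com/", azureKey)
//     .Build();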

Unless you have a strong interest in Python or Java, I suggest that you read and run the C# notebook examples, which are currently in the best shape, i.e. they mostly work without throwing errors and mostly match the documentation. Entering the API key for these happens interactively in the first example.

The repository has a section on learning Semantic Kernel. Some of the content referenced is helpful. However, some of the titles no longer match the content, and some of the content is currently missing examples for particular languages.

This sample is the last of the C# notebooks for Semantic Kernel. It uses Bing search in conjunction with the Semantic Kernel and an OpenAI model to provide current results for queries.

Semantic Kernel Cookbook

The Semantic Kernel Cookbook, an open-source manual focused mainly on implementing Semantic Kernel for beginners, is available in English and Simplified Chinese. Written by kinfey, a Microsoft Cloud Advocate, it's an interesting complement to the official Semantic Kernel documentation.

Project Miyagi

Project Miyagi is an “as-is” demo envisioning sample for the Copilot stack. It includes examples of usage for Semantic Kernel, Promptflow, LlamaIndex, LangChain, vector stores (Azure AI Search, CosmosDB Postgres pgvector), and generative image utilities such as DreamFusion and ControlNet. Project Miyagi is also interesting as a complement to the official Semantic Kernel documentation.

Semantic Kernel project

Given how much Microsoft has invested in Copilots and Copilot+ PCs, you would think that the Semantic Kernel project would get some serious resources. But no. In December 2023 the Semantic Kernel repo got over 100 commits a week; by June 2024, it was getting about 30 commits a week. The core framework code seems to be progressing, especially the C# version, and the Python and Java code seems to be catching up, but the documentation and examples don't seem to be getting much love despite being out of date.

Perhaps I’m seeing a normal development cycle for an open source project. There was a big spike in code additions and deletions in May 2024, similar to the spikes in April and October of 2023. It’s possible that the documentation and example writers have been waiting for the code to settle down before updating their parts of the project.

Or, possibly, Microsoft simply doesn’t care about the Semantic Kernel open-source project. Their internal efforts were enough to release lots of Copilots. As far as external development of AI applications goes, they may be content to let LangChain or LlamaIndex dominate the ecosystem rather than pushing their own Semantic Kernel, as long as developers use Azure or OpenAI services. Time will tell.



Bottom Line
Semantic Kernel can generate plans for accomplishing user requests, using LLMs and templates—a capability that differentiates it from other AI orchestrators. However, unless your heart is set on developing AI applications in C#, you may be better off working with LangChain or LlamaIndex at present. That may change in the future.

Pros

A free open-source SDK that lets you build agents that can call your existing code.

Supports C#, Python, and Java.

Reasonably easy to learn and use, especially in C#.

Can generate its own plans.

Cons

Using planners is expensive (uses lots of AI tokens) and introduces noticeable delays for the user.

The documentation and examples seem to be out-of-date or missing for Python and Java.

Cost

Free open source, MIT License.

Platform

C#, Python, and Java.