
How RamaLama helps make AI model testing safer

Thursday, March 27, 2025, 10:00 AM, from InfoWorld
Consider this example: An amazing new software tool emerges suddenly, turning technology industry expectations on their heads by delivering unprecedented performance at a fraction of the existing cost. The only catch? Its backstory is a bit shrouded in mystery and it comes from a region that is, for better or worse, in the media spotlight.

If you’re reading between the lines, you of course know that I’m talking about DeepSeek, a large language model (LLM) that uses an innovative training technique to perform as well as (if not better than) similar models for a purported fraction of the typical training cost. But there are well-founded concerns around the model, both geopolitical (the startup is China-based) and technological (Was its training data legitimate? How accurate is that cost figure?).

Some might say that the various concerns around DeepSeek, many of which center on privacy, are overblown. Others, including organizations, states, and even countries, have banned downloads of DeepSeek’s models.

Me? I just wanted to test the model’s crazy performance claims and understand how it works—even if it had bias, even if it was kind of weird, even if it was indoctrinating me into its subversive philosophy (that’s a joke, people). I was willing to take the risk to see how DeepSeek’s advances might be used today and influence AI moving forward. With that said, I certainly didn’t want to download DeepSeek to my phone or to any other network-connected device. I didn’t want to sign up to their service, give them my credentials, or leak my prompts to a web service.

So, I decided to run the model locally using RamaLama.

Spinning up DeepSeek with RamaLama

RamaLama is an open source project that facilitates local management and serving of AI models through the use of container technology. The RamaLama project is all about reducing friction in AI workflows. By using OCI containers as the foundation for deploying LLMs, RamaLama aims to mitigate or even eliminate issues related to dependency management, environment setup, and operational inconsistencies.

Upon launch, RamaLama inspects your system for GPU support. If no GPU is detected, it falls back to the CPU. RamaLama then uses a container engine such as Podman or Docker to download an image that includes all of the software necessary to run an AI model on your system. Once the container image is in place, RamaLama pulls the specified AI model from a model registry. It then launches a container, mounts the AI model as a data volume, and starts either a chatbot or a REST API endpoint, depending on what you want.
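Assuming RamaLama is installed and a container engine (Podman or Docker) is available, the workflow above can be sketched with a few of its subcommands. The model name here is illustrative; substitute whatever model you want to test:

```shell
# Pull a model from a registry (the Ollama registry, in this case)
# without starting it
ramalama pull ollama://tinyllama

# List the models available locally
ramalama list

# Launch a container and start an interactive chatbot with the model
ramalama run ollama://tinyllama

# Or serve the model as a REST API endpoint instead of a chatbot
ramalama serve ollama://tinyllama
```

Each of these commands handles the container image download and GPU/CPU detection behind the scenes, which is exactly the friction the project aims to remove.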

A single command!

That part still makes me super-excited. So excited, in fact, that I recently sent an email to some of my colleagues encouraging them to try it for themselves as a way to (more safely and easily) test DeepSeek.

Here, for context, is what I said:

I want to show you how easy it is to test deepseek-r1. It’s a single command. I know nothing about DeepSeek, how to set it up. I don’t want to. But I want to get my hands on it so that I can understand it better. RamaLama can help!

Just type:

ramalama run ollama://deepseek-r1:7b

When the model is finished downloading, type the same thing you typed with granite or merlin and you can compare how they perform by looking at their results. It’s interesting how DeepSeek tells itself what to include in the story before it writes the story. It’s also interesting how it confidently says things that are wrong.
https://www.infoworld.com/article/3853769/how-ramalama-helps-make-ai-model-testing-safer.html
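If you prefer an API to a chatbot, you can serve the same model and query it over HTTP. The port and endpoint path below are assumptions based on the OpenAI-compatible API that llama.cpp-based servers typically expose; check `ramalama serve --help` for the actual defaults on your system:

```shell
# Serve the model as a REST endpoint in the background
ramalama serve ollama://deepseek-r1:7b &

# Query the (assumed) OpenAI-compatible chat completions endpoint
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Write a haiku about containers."}]}'
```

Because the model runs inside a local container with no account sign-up, your prompts never leave your machine, which was the whole point of this exercise.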
