
How to vibe code for free, or almost free

Tuesday, October 28, 2025, 10:00 AM, from InfoWorld
Many developers haven’t yet crossed over to LLM-based code generation, whether it’s vibe coding, spec development, agentic engineering, or whatever your flavor may be. And if you’re not convinced, you’re probably not going to shell out real money for it just yet. Luckily, there are free and/or cheap ways to burn someone else’s GPUs to your own benefit.

Generally, there is no real privacy guarantee for free-to-use models, so use them only for your tire-kicker or open-source projects. In other words, don’t leak the code for your DoD-funded drone navigation system to China, or someone will probably be mad at you.

Totally free AI coding tools

Qwen Code

Using either Qwen's CLI or LLxprt Code, you can register and authenticate with Alibaba using your Google account and spend a generous allotment of free tokens on the Qwen3-Coder model (Qwen3-Coder-480B). The model is pretty good, and with a 256K context length, it's a good place to start.
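If you want to try it, the Qwen Code CLI ships as an npm package. A minimal setup sketch — the package name and Node.js requirement below reflect my understanding of the project's README, so verify against the current docs:

```shell
# Install the Qwen Code CLI globally (requires a recent Node.js;
# the package name here is my understanding of the npm distribution).
npm install -g @qwen-code/qwen-code

# Launch it in your project directory; on first run it walks you through
# browser-based authentication (e.g., Qwen OAuth via your Google account).
cd my-project
qwen
```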

Gemini

Using either Gemini CLI or LLxprt Code, you can authenticate with Google and get a paltry allotment of Gemini 2.5 Pro usage before being switched down to the Gemini 2.5 Flash model. While Gemini purports to have a 1M-token context length, real-world experience shows it doesn't actually pay attention to all of it. One cool thing about the Gemini models is that they can directly parse PDFs and do other such tricks.

OpenRouter

OpenRouter frequently (though not always) announces models that are free to use. Most recently, early versions of the next Grok Fast 2M-context models (thinking and non-thinking) were released for free as "Sonoma Sky" and "Sonoma Dusk." These were not impressive models; free is probably more than I'd pay for them. Watch OpenRouter's announcements on Discord for more free model releases. OpenRouter also often has free endpoints for better-known models. The tradeoff is that they may use your data for training, and the service may cut off at random and be really unreliable. The free endpoints are always on the slow side.

Cursor / Windsurf

Your mileage may vary, but the big AI IDEs occasionally offer free access to custom or other models. The catch is that these are usually smaller, lighter models, and when they do offer the bigger models, those get overwhelmed quickly.

Amp Free

I am reluctant to mention this one. Amp Free has its own CLI and (from what I can tell) routes you to various free models while using your data for training. They also show you ads in exchange for the service.

Bottom line

There are many other services with "free tiers," but these are usually not enough for more than a connection test. Qwen's is the most generous. OpenRouter is worth watching for those moments when a great new model launches for free. Not to pitch my own tool, but LLxprt Code's ability to switch between providers in the same CLI is helpful, and it supports both Qwen and Gemini authentication. Plus, it lets you swap to a PAT (personal access token) without exiting, which is useful if you're trying to code on the cheap.

Almost free AI coding tools

Look, when something is free, there is a reason. Spending even a little can get you a lot more.

The recent open-weight models from China have really opened the door to high-quality code generation. Given that the requirements for running some of these models are much lower than for traditional frontier models, and the results are often very close or arguably better, it may now be profitable to run a provider if you figure out the right formula for subscriptions vs. charging per token. Frankly, no one can afford to code in the long term by paying per token.

However, the subscription model opened up by Anthropic, combined with the models from China, has created a renaissance. You can now have Sonnet-like performance (and perhaps better) on the cheap. The options here are listed below, from lowest cost to highest.

Z.ai

Z.ai offers plans starting at $3 per month. Their current policies state that they aren't using your prompts for training, though they reserve the right to change that. Because I use these models to work on open-source software and crackpot ideas I'd never waste time coding by hand, the possibility that they might train on my data doesn't bother me. Z.ai's GLM-4.6 model is really great, and with a 200K context window, it has become my go-to for coding. As Claude has become pretty unreliable in terms of quality, I've switched to GLM-4.6 for coding while letting GPT-5-Codex critique and plan. Note that the buzz around GLM-4.6 has made Z.ai pretty slow and unreliable.

Chutes.ai

Chutes offers plans starting at $3 as well. With Chutes, you have access to many more models, though their performance is underwhelming. I would recommend Chutes over Z.ai only because you can use more models; that's the sole reason. Their privacy policy is ambiguous. I also had issues signing up: authenticating with Google or GitHub wasn't working at the time of writing. Based on a recommendation from another user on their Discord, I used a VPN and created an account without Google or GitHub, and that worked.

Synthetic

Synthetic offers a $20-per-month plan that includes GLM-4.6, Qwen3-480B, Kimi K2, DeepSeek-V3.1, and many other models. Synthetic is relatively new, and they've been super open and engaging on Discord. They have a very clear privacy policy: they do not store your prompts or completions beyond 14 days without your consent. Note that there are issues, such as tool calls hanging (aka buffering) for a long time and, occasionally, native tool calls leaking into the message stream rather than being emitted as proper OpenAI-compatible calls. You can use a CLI like LLxprt Code or Code Puppy, or an IDE like Roo Code, that supports OpenAI-compatible endpoints. Synthetic is also developing its own coding agent, called Octofriend.
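Under the hood, "supports OpenAI endpoints" just means the client POSTs a standard chat-completions payload to whatever base URL you configure, so switching providers is mostly a matter of swapping the URL, API key, and model name. A minimal sketch using only the Python standard library — the base URL and model identifier below are illustrative assumptions, not a provider's documented values:

```python
import json
import os
import urllib.request


def build_chat_request(base_url: str, api_key: str, model: str, prompt: str):
    """Build an OpenAI-compatible chat-completions request.

    Any provider exposing an OpenAI-compatible endpoint (Synthetic, Chutes,
    Z.ai, Cerebras, ...) accepts this same payload shape; only the base URL,
    key, and model name change.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Swapping providers is a one-line change: pass a different base URL and
# model name. Both values here are illustrative, not documented endpoints.
req = build_chat_request(
    base_url="https://api.synthetic.example/v1",  # assumed; check provider docs
    api_key=os.environ.get("PROVIDER_API_KEY", "sk-..."),
    model="glm-4.6",                               # assumed model identifier
    prompt="Write a haiku about context windows.",
)
print(req.full_url)  # https://api.synthetic.example/v1/chat/completions
```

Sending the request (with `urllib.request.urlopen(req)`) requires a real key and endpoint, but the point is the shape: one payload format, many providers.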

Cerebras

Cerebras plans start at $50. I ripped them a new one in my last column, but they’ve greatly improved their overall TPM (tokens per minute) and request caps. Cerebras is now the fastest provider bar none. Their policies explicitly note they don’t use your prompts. Cerebras is the fastest Qwen3-Coder-480B you can access. While the Qwen3 model supports 256K context, the Cerebras version is limited to 131K. I’ve found that usable but challenging. However, I still have a subscription in my tool kit because it is the fastest access to one of the best open-weight models around. Cerebras has a free tier, but it is barely enough for connection testing.

On October 23, 2025, Cerebras announced they are deprecating Qwen3 and launching GLM-4.6 on their infrastructure in November. There is no word yet on the context size. This shows just how fast this space moves: the hot model they just launched is already being swapped out for a new one.

Bottom line

Among subscription services, Cerebras has the best policy, followed by Synthetic. (If you want the ultimate guarantee of data security, run your own model.) If you want to use the best open-weight model, that’s GLM-4.6 at the moment, and I’d pick Synthetic for that. If you want the fastest inference, then you cannot beat Cerebras. If you want the ability to switch between different models, then the choice is between Chutes and Synthetic. Pick Chutes for price and Synthetic for better reliability.

What about the OGs?

Anthropic Claude

Many have moved on from the venerable Claude Code. First, Anthropic keeps changing its policies around how much you get for what you pay. Second, the model's behavior has been variable: some days Opus is a genius, and the very next day it seems completely daft. From personal experience, tasks I used to have Claude do almost daily it can no longer perform reliably, and the "how is Claude working in this session" prompt hints at some kind of A/B testing. Anthropic has since launched new models, including a new Haiku and Sonnet. Whatever your experience, Claude Max/Pro is still one of the best bangs for the buck. You can use it directly with Claude Code, OpenCode, and LLxprt Code, priced from $20 to $200. Be sure to enable privacy mode; Anthropic famously changed the default to using your data.

OpenAI Codex

Letting people use their ChatGPT subscriptions with the Codex CLI was a big change for OpenAI. Previously, almost no one used Codex CLI; it was a poor CLI and expensive to run. When GPT-5 came out, it was slow and underwhelming. Now, GPT-5-Codex is one of the best models, especially if you turn reasoning up, though it is very slow. Codex CLI is still not anyone's favorite, but it is much better than it used to be. Subscriptions run from $20 to $200 and get you both ChatGPT and Codex.

Everything will change next week

Anthropic changed the game with their Claude Pro/Max plans. Qwen changed things dramatically by releasing an open-weight model that was actually useful (Qwen3-Coder-480B). Cerebras was the first inference provider to launch a subscription plan for an open-weight model and has changed the horizon for what is possible in terms of performance. Yet shortly after, several other providers followed with subscription models that, while not as fast, have technical advantages. And Qwen3 has lost its throne in terms of model capability. It is a great time to feast on the excess now that the Claude party is over.

However, next month, expect new models and new subscription plans. Expect some of these providers to overload and fall over as they (like Z.ai) sign up more people than they can serve. Note that the same model can perform very differently on two different providers, and it isn't just quantization. Test them all with short-term commitments, and be prepared to switch at the drop of a hat (or of a hot new model on Hugging Face).
https://www.infoworld.com/article/4075825/how-to-vibe-code-for-free-or-almost-free.html
