Bigger Is Just Bigger (and More Expensive)

Mmojo.net

Human-first Generative AI.

#MeWriting There are two main drawbacks to OpenClaw. Both stem from running the LLM, which makes all the agentic magic happen, in the cloud. (1) You give up your privacy and autonomy. (2) You pay for tokens. Tech people and young people in the United States seem curiously (to me) willing to ignore (1). But when a bill comes due, or you see your token usage topping $100/day for automations you’re not quite willing to bet your business on, you might start giving the cloud a second thought.

I’ve been playing with OpenClaw since mid-January, when it was Clawdbot, then Moltbot, and finally OpenClaw. I have not purchased a single cloud token; instead, I’ve focused my efforts on configuring it for private and local use with my Mmojo Server LLM server. I watched as Google turned off non-metered use and Anthropic blocked OpenCode, and I recognized that a private, local LLM would become a necessary option. Then I saw the bills people were reporting. I saw some influencers invest in expensive LLM rigs, like a Mac Studio cluster. That seemed excessive for the problem at hand.

The problem at hand is running a model that can perform the three phases of agentic LLM inference well enough to make OpenClaw work well. American (a.k.a. “real”) football fans might be familiar with the three phases of football: offense, defense, and special teams. You can think of agentic LLM inference similarly: content creation, running automations, and creating automations.

4B-parameter LLMs like Google’s Gemma 4B are pretty wonderful at content creation. They feel about as generally knowledgeable as multi-hundred-billion-parameter cloud models, but not as annoyingly loquacious, so let’s call that phase solved for private and local. In conversation, I find small models actually do better than large ones.

Creating automations with OpenClaw is a mess with the cloud models. To be honest, this isn’t so much an LLM problem as it is a content engineering problem. The system prompt and AGENT.md file that ship with OpenClaw are a contradictory mess wrapping unsafe tools and ambiguous skills. Reliably getting the weather skill to work the way it looks like it was intended to has been a crapshoot for everyone, even with the cloud LLMs doing the reasoning. I don’t expect a small LLM to be better, but I also don’t expect it to be less useful at exploring potential automations. Give me a candidate and let me see it in action. Maybe I’ll ask for another, or maybe I’ll refine the candidate into a workable automation.

This leaves us with running automations. Is it possible to create one with a reasonably sized list of natural language conditions and instructions that will run reliably and not veer off course too often? That’s the question for a small LLM. I think I’ve found a line in the sand with the recently released Qwen3.5 model:

Qwen3.5 9B quantized to 8 bits (q8_0) runs the automations I’ve created reliably.
Qwen3.5 9B quantized down to 5 bits (q5_K_M) runs them pretty well, but sometimes needs a bit more clarification in the instructions.
Qwen3.5 9B quantized down to 4 bits (q4_K_M) makes a lot of mistakes.
Qwen3.5 4B quantized to 8 bits (q8_0) doesn’t feel usable.
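
To put those four configurations in hardware terms, here is a rough back-of-envelope sketch of how much memory the weights alone would take. It assumes the nominal parameter counts (9B and 4B) and nominal bit widths (8, 5, 4); real quantized files run somewhat larger because each block of weights also stores scale factors, and the context cache needs memory on top of this. The function name and the table are illustrative, not part of Mmojo Server or OpenClaw.

```python
# Back-of-envelope weight sizes for the four configurations listed above.
# Assumption: nominal bit widths (8, 5, 4) and nominal parameter counts.
# Real quantized files are somewhat larger than these estimates.

def weight_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in gigabytes (10^9 bytes)."""
    # billions of params * bits per param / 8 bits per byte = billions of bytes
    return params_billions * bits_per_weight / 8


configs = [
    ("Qwen3.5 9B q8_0", 9, 8),
    ("Qwen3.5 9B q5_K_M", 9, 5),
    ("Qwen3.5 9B q4_K_M", 9, 4),
    ("Qwen3.5 4B q8_0", 4, 8),
]

for name, params_b, bits in configs:
    print(f"{name}: ~{weight_gb(params_b, bits):.1f} GB")
```

By this estimate the 8-bit 9B model needs roughly 9 GB for weights, which leaves some headroom on a 16 GB machine before the operating system and context cache take their share.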

The two automations I’ve focused on are information gathering and reporting: weather for Minden, NV from http://wttr.in and a news headline summary from KVTN news in Reno. These seem simple on the surface, but are surprisingly complex. I’ll write another article about them.
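
For a sense of the raw data the weather automation starts from, here is a minimal sketch of fetching a one-line summary from wttr.in directly. This is not how OpenClaw's weather skill is implemented; it only shows the kind of request involved. The function names are hypothetical, the sketch uses https rather than the http URL above, and it relies on wttr.in's `format=3` one-line output option.

```python
from urllib.parse import quote_plus
from urllib.request import Request, urlopen


def build_wttr_url(location: str, fmt: str = "3") -> str:
    """Build a wttr.in URL; format=3 asks for a one-line text summary."""
    # Spaces become '+', which wttr.in accepts in place names; commas are kept.
    return f"https://wttr.in/{quote_plus(location, safe=',')}?format={fmt}"


def fetch_weather_line(location: str, timeout: float = 10.0) -> str:
    """Fetch the one-line summary for a location (requires network access)."""
    # wttr.in varies its output by User-Agent; a curl-like value is sent
    # here as a precaution so the reply is plain text rather than HTML.
    request = Request(build_wttr_url(location), headers={"User-Agent": "curl/8.0"})
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8").strip()
```

Calling `fetch_weather_line("Minden, NV")` should return a single line of text for the location. Turning that into a reliable report, with conditions and formatting expressed in natural language, is where the LLM does its work.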

Assuming I’ve found that line in the sand, what are the implications?

You can comfortably run OpenClaw and Mmojo Server on fairly modest devices.
No, this model won’t run fast enough on a Raspberry Pi to make OpenClaw feel usable.
You can run it on an NVIDIA Jetson Orin Nano — $250 for the developer kit board and power supply.
You can run it on a Mac Mini M4 with 16GB RAM — $599 MSRP.
You don’t need a Mac Studio cluster at $10K/node for developing your automation solution.
More VRAM or shared memory for the GPUs will allow simultaneous LLM queries.

Most importantly:

You should not pay for cloud tokens to run OpenClaw.
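
The arithmetic behind that advice can be sketched quickly. The example below takes the $100/day token spend mentioned at the top of this article and the hardware prices listed above, and computes how many days of cloud spending would equal the cost of each device. It is a simplification: it ignores electricity, setup time, and the fact that many people spend far less than $100/day. The function name is illustrative.

```python
def break_even_days(hardware_cost: float, daily_token_spend: float) -> float:
    """Days of cloud token spend that would equal a one-time hardware cost."""
    return hardware_cost / daily_token_spend


DAILY_SPEND = 100.0  # the $100/day example from the opening paragraph

devices = [
    ("NVIDIA Jetson Orin Nano dev kit", 250),
    ("Mac Mini M4, 16GB", 599),
    ("Mac Studio node", 10_000),
]

for device, price in devices:
    days = break_even_days(price, DAILY_SPEND)
    print(f"{device}: ${price} is about {days:.1f} days of tokens")
```

At that spending rate, a Mac Mini pays for itself in about six days, while a single Mac Studio node takes about a hundred.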

I sell a service to turn your Mac Mini into a Mmojo Agent Appliance running Mmojo Server and OpenClaw. I’ve added some features to the stack that will help you see how the sausage is made as you try to automate. I include scripts to back up and restore OpenClaw workspace environments quickly, so you can experiment and roll back bad experiments with no effort.

Mmojo Agent Appliance — Send me your Mac Mini. I’ll convert it for $250. And… send it back to you!

If you have a Windows 11 PC or laptop with 16 GB of RAM and an NVIDIA GPU with at least 4 GB of VRAM, I invite you to install Mmojo Server and OpenClaw securely on it using Windows Subsystem for Linux. I have easy-to-follow instructions here:

Not only do I invite you to install these, I personally challenge you to install and use them! You will learn a lot about LLMs and OpenClaw in the process. Most important, you will learn that a small LLM can power OpenClaw, and that you don’t have to pay for cloud tokens.

Picture is my dog Mona on her 8th birthday a couple days ago. I should probably be working on her yard instead of finding the line in the sand for small, local, private LLMs and OpenClaw. Her loss. Your win!