
Inference @ Home

#MeWriting Americans will soon face a choice: Get off the electric grid or do their inference at home. Let me define and explain.

The All-In Podcast for Friday, January 16, 2025 floated the first option. To free up electricity for data centers, homeowners across the United States would install solar panels and batteries over the next decade. The All-In option would cost each detached homeowner about $30K to have a company add solar and a battery to the home’s roof. Perhaps we’ll see do-it-yourself or handyman kits appear at retailers like Lowe’s and Home Depot in the $10K range.

Please watch the full segment — about 17 minutes. Fans of the podcast and the “besties” will appreciate this for what it is: a trial balloon from investors, industry, and government. All four were pitching the approach, ignoring obvious pitfalls:

  1. Multi-family and high-density buildings. Two-thirds of United States households live in detached homes. The remaining one-third will face the coordination problem of who installs the solar and batteries.
  2. Suboptimal rooflines. Call south-facing roofs the 100% efficiency baseline. East- and west-facing roofs operate at 75%-90% of that; north-facing roofs at 45%-70%. Solar efficiency was not a consideration in orienting most existing homes.
  3. Maintenance and repair. When local equipment or a transmission line needs to be replaced, the replacement cost is spread over many rate-payers. When the panel or battery on your home goes bad, the replacement cost falls on you (or a warranty). And there is likely no way to route around the failure to keep service available unless you have invested a multiple of the base system cost in local redundancy.
  4. Regulation. Most states do not currently have regulation favorable to single-family homeowners buying from and selling to the grid. Many homeowners who could have afforded to buy systems outright have ended up in more expensive leasing arrangements solely to meet the regulatory requirements that come with remaining grid-connected. With telephone deregulation in the early 1980s, we solved the equivalent problem by letting customers plug any compliant equipment they wanted into the phone network. We are almost 50 years behind in solving this basic problem for the electric grid, and few people are discussing it.

What Problem are We Solving?

Let’s stop a moment and appreciate that there is a real problem to be solved. The artificial intelligence segment of the tech industry wants to build out inference capacity in new data centers. Inference capacity is the ability to ask “AI” questions and get responses, usually in the form of chat or so-called “agentic” workflows. The bigger the models and the more people using them, the more compute (CPU or GPU), memory (RAM and disk), and power it takes to provide the service. Let’s leave out diffusion, which is used for images, sound, and video. Let’s also set aside network bandwidth; for text inference, it’s negligible.
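To put rough numbers on the memory side of that statement, here is a back-of-envelope sketch (an illustration with assumed figures, not numbers from the podcast or from Mmojo): a model’s weights alone need roughly parameter count times bytes per weight, before you account for the KV cache and runtime overhead.

```python
# Back-of-envelope memory needed just to hold an LLM's weights.
# Assumption: memory is dominated by the weights; the KV cache and
# runtime overhead are ignored here but matter in practice.

BYTES_PER_WEIGHT = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

def weight_memory_gib(params_billions: float, quant: str) -> float:
    """Approximate GiB required for the model weights alone."""
    return params_billions * 1e9 * BYTES_PER_WEIGHT[quant] / 2**30

for params in (3, 8, 70):                 # laptop-sized ... cloud-sized
    for quant in ("fp16", "int4"):
        print(f"{params}B @ {quant}: ~{weight_memory_gib(params, quant):.1f} GiB")
```

By this rough measure, a quantized 3B or 8B model fits comfortably in a laptop or small appliance, while a 70B model at full precision is squarely data-center territory.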

The states have already entered the discussion on resource allocation. Florida, led by its staunchly conservative Governor, Ron DeSantis, is saying no to land use, environmental impacts, and grid prioritization for new AI data centers. Politically, this should surprise everybody and, at the same time, nobody. DeSantis is specifically questioning the need for centralized inference, even pooh-poohing the “don’t let the Chinese beat us” narrative driving the data center buildout.

The recently departed Scott Adams was both the creator of the Dilbert comic strip and a popularizer of the persuasion lens. Through that lens, we can see that dramatically boosting capacity or radically reallocating usage of the electric grid is an example of selling past the close. The real sale is inference capacity scaled beyond imagination. We are not talking about whether that is needed (spoiler alert: it’s not). We are talking, instead, about how to provide enough power to do it.


Are We Solving the Right Problem?

I told you that inference scaled out by data centers is not needed. For two years now, I have helped my clients and customers use small large language models (LLMs) running on their laptops or on inexpensive appliances for chat. My shorthand for these models: just as knowledgeable, but less annoyingly loquacious than the large, popular cloud models. They’re not quite as fast, either; they generate answers a little faster than you can read them rather than spitting out a page of text in an instant.

A common response to my message should, in a way, be flattering: “Brad, stick to comedy.” I put it whimsically because the objection is absurd. When I’ve dug deeply into real people’s embrace of cloud chat, I have found that the illusion of intelligence is very important to them. It’s easy to believe that some giant machine in the cloud is “intelligent”; it is not easy to believe that an appliance computer the size of a deck of cards is “intelligent”. Both provide similarly useful answers for, let’s call it, 19 out of 20 questions they’ll ask. But people want the illusion of intelligence provided by a far-away computer they will never see. That illusion might cost you $30K in installation over the next decade and a lifetime of maintenance headaches and worry. See the All-In Podcast trial balloon above.

Inference is not just chat. My software, Mmojo Server, provides an OpenAI-compatible application programming interface (API). This makes it possible for developers working on AI applications to use a private, local Mmojo Server rather than a cloud system as the AI backend for their products. One big advantage during the development phase is that they don’t pay a cloud provider for tokens; they pay for the availability and capacity of a Mmojo Server. They might pay a cloud provider tens of thousands of dollars during development for what they can run for free on their laptops or package into a fast, local, stand-alone server for under $2K. Developers can also eliminate a problem called “drift” — where the model changes underneath them — by using a fixed, local LLM instead of the cloud. Developers at companies you’ve heard of are using Mmojo Server for both AI-wrapper and agentic application development. It’s not theory or vision. It’s real.
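As a sketch of what “OpenAI compatible” means for a developer: the standard openai Python client can simply be pointed at a local base URL instead of the cloud. The host, port, and model name below are illustrative placeholders, not documented Mmojo Server settings.

```python
# Point the standard OpenAI Python client at a local, OpenAI-compatible
# server instead of the cloud. URL and model name are placeholders;
# check your local server's documentation for the real values.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # hypothetical local endpoint
    api_key="not-needed-locally",         # local servers typically ignore this
)

response = client.chat.completions.create(
    model="local-model",                  # whatever model the server exposes
    messages=[{"role": "user", "content": "Summarize this paragraph ..."}],
)
print(response.choices[0].message.content)
```

The rest of the application code does not change; only the base URL (and the billing relationship) does.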


Alternative Approach

I have a better idea than convincing all United States homeowners to spend $30K on solar and batteries. What if, instead, homeowners spent $300 to $3,000 on inference at home? The hardware is inexpensive and reliable. The software already exists. The application protocols are well defined and in use. My own system, the Mmojo Knowledge Appliance, is plug-and-play with zero configuration: plug it into the wall for power and into your router for connectivity, and it is instantly available to any computer or device on your home network. Should it break, order another one and plug it in, just like any other small appliance in your home.

Mmojo Knowledge Appliance

I’ve built these for paying customers with inexpensive Raspberry Pi devices. If your taste in inference runs more to race-car performance, I can build you one using, for example, a Framework Desktop computer with an AMD Ryzen AI Max+ CPU/GPU.

Side note: The appliance pictured at left costs about $4 per month in electricity if run at full throttle 24/7 at California’s regulated peak consumer rates. A typical heavy user might spend $0.50/month at those rates.
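The arithmetic behind a number like that is simple. The wattage and rate below are assumptions for illustration (a small single-board computer at full load and a high peak-tier consumer rate), not measured figures:

```python
# Rough monthly electricity cost for a small, always-on inference appliance.
# The 12 W full-load draw and $0.50/kWh rate are assumptions for illustration.
watts = 12
rate_usd_per_kwh = 0.50                        # a high, peak-tier consumer rate
hours_per_month = 24 * 30

kwh_per_month = watts / 1000 * hours_per_month # ~8.6 kWh
print(f"~${kwh_per_month * rate_usd_per_kwh:.2f}/month at full throttle")  # ~$4.32
```

A typical household workload runs the appliance at full load only a fraction of the time, which is how the figure drops toward $0.50/month.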

Over the past two decades, the biggest reason that software has moved to the cloud is monetization: tech companies can put a meter on your usage of software and force you to pay. There are secondary “benefits”, such as sparing technically challenged users from installation, maintenance, and upgrades. Presented with the costs of going “all-in” on cloud inference, maybe we should reconsider whether that model is still appropriate.

I have a working name for this approach: Inference @ Home. If this approach interests you, please message me on LinkedIn or drop me an email. I have several ways you can participate in this mission, ranging from using the Mmojo Server software to sponsoring my work. Let’s talk! -Brad


I would appreciate your reactions and comments on my LinkedIn repost.