
Building a Personal AI Infrastructure from Scratch

May 11, 2026 · 2 min read

When most people talk about running AI at home, they mean a ChatGPT subscription and maybe a local Ollama install. This is something different — a full homelab stack built from the ground up around the idea that AI should be a first-class operational layer, not an afterthought.

Here's what exists today.

The Hardware

Three mini PCs run Proxmox in a cluster — an AMD Ryzen 7840HS node and two Intel i7-13620H nodes, each with 32GB of DDR5 and dual 2.5GbE. Ceph spans all three for replicated storage and live VM migration. A fourth machine, a full tower with an i9-10900KF and an RX 6900 XT, handles the GPU workloads: image generation, speech transcription, and running large language models locally.

An OPNsense gateway on an N100 handles dual-WAN (fiber with LTE failover), VLAN segmentation, and intrusion detection. A Ubiquiti switch covers the wired side; a U6+ access point covers wireless.

Everything sits behind a UPS. A DR backup node with 4TB of storage syncs to Google Drive nightly.

The Software Stack

All services run as LXC containers on the Proxmox cluster. PostgreSQL for app databases, with a hot standby replica. PowerDNS for internal DNS with a .clydestack domain. HashiCorp Vault for secrets and SSH OTP authentication fleet-wide. Prometheus and Grafana for monitoring across 26 targets. A centralized syslog server with an LLM-powered log analyst that sends a morning digest.
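
The log analyst is the piece people ask about most. Its source isn't in this post, but the shape is simple enough to sketch — filter yesterday's events, hand them to a model, ship the summary. Something like the following, where the log path, model name, and prompt are stand-ins rather than the actual implementation:

```python
# Hypothetical sketch of the morning log digest: filter yesterday's syslog
# for warnings and errors, ask an LLM to summarize, print the digest.
# The path, model name, and prompt are assumptions, not the real setup.
from pathlib import Path

import anthropic  # pip install anthropic; reads ANTHROPIC_API_KEY from the env

SYSLOG = Path("/var/log/remote/yesterday.log")  # hypothetical aggregated log

def interesting_lines(path: Path, limit: int = 400) -> list[str]:
    """Keep only lines that look like warnings or errors, capped at `limit`."""
    keep = []
    for line in path.read_text(errors="replace").splitlines():
        if any(w in line.lower() for w in ("error", "warn", "fail", "denied")):
            keep.append(line)
    return keep[-limit:]  # most recent matches if the file is huge

def build_digest(lines: list[str]) -> str:
    client = anthropic.Anthropic()
    msg = client.messages.create(
        model="claude-sonnet-4-20250514",  # placeholder model name
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": "Summarize these syslog events as a short morning "
                       "digest for a homelab operator. Group by host and "
                       "flag anything that needs action:\n\n" + "\n".join(lines),
        }],
    )
    return msg.content[0].text

if __name__ == "__main__":
    print(build_digest(interesting_lines(SYSLOG)))
```

A cron entry on the syslog host fires this once a morning; the output goes out the same alert channel everything else uses.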

Sentinel monitors the network — DHCP anomalies, Suricata alerts, port drift, unknown MACs — and fires Telegram alerts.
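
The unknown-MAC check is a good example of how small these detectors can be. A sketch, assuming a dnsmasq-style lease file and an allowlist on disk — both paths are hypothetical — with alerts going through the Telegram Bot API:

```python
# Sketch of one Sentinel check: flag MACs in the DHCP lease table that
# aren't on an allowlist, and alert via the Telegram Bot API. The lease
# format (dnsmasq-style) and both file paths are assumptions.
import os
from pathlib import Path

import requests

LEASES = Path("/var/lib/misc/dnsmasq.leases")     # hypothetical lease file
ALLOWLIST = Path("/etc/sentinel/known_macs.txt")  # hypothetical allowlist

def current_macs() -> dict[str, str]:
    """Map MAC -> IP from a dnsmasq lease file: 'expiry MAC IP hostname id'."""
    macs = {}
    for line in LEASES.read_text().splitlines():
        parts = line.split()
        if len(parts) >= 3:
            macs[parts[1].lower()] = parts[2]
    return macs

def telegram_alert(text: str) -> None:
    token = os.environ["TELEGRAM_BOT_TOKEN"]
    requests.post(
        f"https://api.telegram.org/bot{token}/sendMessage",
        data={"chat_id": os.environ["TELEGRAM_CHAT_ID"], "text": text},
        timeout=10,
    )

if __name__ == "__main__":
    known = {m.strip().lower()
             for m in ALLOWLIST.read_text().splitlines() if m.strip()}
    for mac, ip in current_macs().items():
        if mac not in known:
            telegram_alert(f"Sentinel: unknown MAC {mac} leased {ip}")
```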

The AI Layer

This is the interesting part.

A command center called ClydeNexus sits at the top — a chat interface backed by Claude with tool access to the entire fleet. Ask it to restart a service, deploy code, check logs, or investigate an alert, and it does. Every host runs a small agent (cc_agent) that executes tasks dispatched from the center.
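
The dispatch protocol isn't described here, but the agent side can be tiny — poll, execute, report. A sketch, where the endpoint, payload shape, and shell-execution model are all assumptions:

```python
# Hypothetical cc_agent loop: poll the command center for tasks addressed
# to this host, run each as a shell command, and post the result back.
# The URL and payload shape are assumptions; only the pattern matters.
import socket
import subprocess
import time

import requests

NEXUS = "http://nexus.clydestack:8080"  # hypothetical command-center URL
HOST = socket.gethostname()

def run(cmd: str) -> dict:
    """Execute one dispatched command with a hard timeout."""
    try:
        p = subprocess.run(cmd, shell=True, capture_output=True,
                           text=True, timeout=120)
        return {"rc": p.returncode,
                "out": p.stdout[-4000:], "err": p.stderr[-4000:]}
    except subprocess.TimeoutExpired:
        return {"rc": -1, "out": "", "err": "timed out"}

while True:
    try:
        tasks = requests.get(f"{NEXUS}/tasks", params={"host": HOST},
                             timeout=10).json()
        for task in tasks:
            result = run(task["cmd"])
            requests.post(f"{NEXUS}/results",
                          json={"task_id": task["id"], "host": HOST, **result},
                          timeout=10)
    except requests.RequestException:
        pass  # center unreachable; retry on the next tick
    time.sleep(5)
```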

A personal AI server (PAI1) handles the family-facing side — Gmail, Google Calendar, Google Drive, Photos integrations, all through MCP servers. There's a three-tier LLM routing system: simple queries go to a local Mistral, standard queries to a local Qwen 32B, complex ones to the Claude API.
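
The routing logic isn't shown in this post; below is one plausible sketch. The Ollama /api/chat endpoint and the Anthropic client are real APIs, but the classification heuristic, host name, and model tags are assumptions:

```python
# Sketch of the three-tier router: cheap queries to a local Mistral,
# standard ones to a local Qwen 32B, complex ones to the Claude API.
# The heuristic, GPU-node address, and model tags are assumptions.
import anthropic
import requests

OLLAMA = "http://gpu-node.clydestack:11434"  # hypothetical GPU-node address

def classify(q: str) -> str:
    hard = ("plan", "debug", "architecture", "refactor", "investigate")
    if any(w in q.lower() for w in hard) or len(q) > 600:
        return "complex"
    return "simple" if len(q) < 80 else "standard"

def ask_ollama(model: str, q: str) -> str:
    r = requests.post(f"{OLLAMA}/api/chat", json={
        "model": model,
        "messages": [{"role": "user", "content": q}],
        "stream": False,
    }, timeout=300)
    return r.json()["message"]["content"]

def ask_claude(q: str) -> str:
    client = anthropic.Anthropic()  # ANTHROPIC_API_KEY from the environment
    msg = client.messages.create(
        model="claude-sonnet-4-20250514",  # placeholder model name
        max_tokens=2048,
        messages=[{"role": "user", "content": q}],
    )
    return msg.content[0].text

def route(q: str) -> str:
    tier = classify(q)
    if tier == "simple":
        return ask_ollama("mistral", q)
    if tier == "standard":
        return ask_ollama("qwen2.5:32b", q)  # assumed model tag
    return ask_claude(q)
```

The payoff of the tiers is that the Claude API only ever sees queries the local models would fumble, which keeps both latency and cost down for everything else.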

Ollama runs on the GPU node — Qwen 32B, image generation with SDXL, speech transcription with Whisper.
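
The post doesn't say which Whisper wrapper the GPU node runs; the reference openai-whisper package is the shortest way to illustrate the transcription step:

```python
# Minimal transcription with the reference openai-whisper package
# (pip install openai-whisper; requires ffmpeg). Model size is a guess.
import whisper

model = whisper.load_model("small")         # trades accuracy for VRAM
result = model.transcribe("voicemail.wav")  # any ffmpeg-readable file
print(result["text"])
```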

What's Public

This blog. A job tracker at jobs.burntai.com. A family portal behind Tailscale.

Why

Not to save money. Not to avoid cloud services. The reason is that running your own infrastructure teaches you things that using managed services doesn't. Every incident — a kernel upgrade that silently disabled database routing, a VLAN migration that left ghost flows in the security database — is a real lesson.

The AI layer makes it genuinely useful rather than just educational. The fleet watches itself, alerts on anomalies, and can act on natural-language ops requests at 2am, without me needing to remember which host runs which service.

It's not finished. It probably never will be.
