Why bother?
After discovering ChatGPT in late 2022, I wanted to run LLMs locally. Cloud APIs are fine, but I wanted control over the models and the data: local inference for privacy, no per-token API costs, and a platform for understanding how these systems actually work under the hood.
Also, my wife is a neat freak who is always organizing the house. I channel the same impulse into organizing computing infrastructure. Different targets, same compulsion.
The hardware
I built this from spare parts:
- 4-core AMD CPU at 2.8 GHz (modest but functional)
- 16GB RAM
- Collection of mismatched drives
- Old tower case pulled out of the closet
- Proxmox hypervisor for VM and container management
Some of the parts had rough lives. I rescued a water-damaged SSD and a memory stick. (Dry them thoroughly. They often work.) I used ZFS for storage, partly for redundancy and partly because I wanted to learn it. The whole thing is held together with duct tape and determination.
What runs on it
LLM experimentation. Llama models with 4-bit quantization. Slow on CPU, but functional for learning. Costs nothing per token. Complete privacy.
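To make "4-bit quantization" concrete, here is a toy block-wise absmax scheme in plain Python: each block of weights collapses to signed 4-bit integers plus one float scale. This is a simplified sketch for intuition, not the actual format llama.cpp ships.

```python
import random

# Toy block-wise 4-bit quantization: each block of weights is mapped to
# signed 4-bit integers (-8..7) plus one float scale per block. A
# simplified absmax scheme for illustration only.

def quantize_4bit(weights, block_size=32):
    blocks = []
    for i in range(0, len(weights), block_size):
        block = weights[i:i + block_size]
        scale = max(abs(w) for w in block) / 7 or 1.0  # avoid div-by-zero
        q = [max(-8, min(7, round(w / scale))) for w in block]
        blocks.append((scale, q))
    return blocks

def dequantize_4bit(blocks):
    # Reconstruct approximate floats; order within and across blocks is preserved.
    return [scale * q for scale, qs in blocks for q in qs]

random.seed(0)
weights = [random.gauss(0, 0.02) for _ in range(256)]
blocks = quantize_4bit(weights)
restored = dequantize_4bit(blocks)

# Rounding error is bounded by half a quantization step (scale / 2).
max_err = max(abs(a - b) for a, b in zip(weights, restored))
```

The storage win is what matters on a 16GB box: 4 bits per weight plus one scale per block, versus 32 bits per weight, is roughly an 8x reduction, at the cost of that bounded rounding error.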
Statistical simulations. R and Python environments for thesis work. Bootstrap simulations that run for days. Better than tying up my main machine while I wait for a few million resamples.
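The bootstrap loop itself is simple; the resample count is what makes it slow. A minimal stdlib-only sketch of a percentile bootstrap for the mean (the sample data and the 5,000-resample setting are illustrative, not from the thesis):

```python
import random
import statistics

# Percentile bootstrap: resample with replacement many times, recompute
# the statistic each time, and read the CI off the empirical quantiles.
# A few thousand resamples finish in seconds; a few million per scenario
# is why the home lab runs for days.

def bootstrap_ci(data, stat=statistics.mean, n_resamples=5000, alpha=0.05, seed=0):
    rng = random.Random(seed)
    n = len(data)
    estimates = sorted(
        stat([data[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    lo = estimates[int((alpha / 2) * n_resamples)]
    hi = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

rng = random.Random(42)
sample = [rng.gauss(10, 2) for _ in range(200)]  # illustrative data
lo, hi = bootstrap_ci(sample)
```

Because each resample is independent, this parallelizes trivially across the four cores, which is exactly the kind of embarrassingly parallel job a dedicated box is for.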
Standard services. File server (Samba plus encrypted rclone to Google Drive). VPN for remote access. RStudio Server so I can write and analyze from anywhere. Apache Guacamole for web-based VM access.
Reality check
Performance. CPU-only LLM inference is slow. I am not going to pretend otherwise. But it works, and I learn things about inference, quantization, and memory management that I would never learn from an API.
Reliability. Good enough for experimentation. Not for production. I do not run anything critical on spare parts.
Capacity. Usually underutilized. But when I need to run week-long simulations or experiment with quantized models, it earns its keep.
What you learn
Running models locally teaches you things cloud APIs hide:
- How quantization actually works
- Memory versus compute tradeoffs
- What “inference” means at the hardware level
- Why context windows have limits
No API costs, no data leaving my network, complete control over model selection.
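The memory side of those tradeoffs is just arithmetic. A back-of-the-envelope sketch of why a 7B model only fits in 16GB of RAM once quantized (the parameter count and the 4.5 bits-per-parameter average are round-number assumptions):

```python
# Weight memory is simply parameters x bits per parameter.

def model_memory_gb(n_params, bits_per_param):
    return n_params * bits_per_param / 8 / 1024**3

n_params = 7e9  # a 7B-parameter model, round number

fp32 = model_memory_gb(n_params, 32)   # ~26 GB: does not fit in 16 GB
fp16 = model_memory_gb(n_params, 16)   # ~13 GB: fits with almost no headroom
q4   = model_memory_gb(n_params, 4.5)  # ~3.7 GB: fits with room for the OS
# 4-bit formats cost slightly more than 4 bits/param once the per-block
# scales are stored, hence 4.5 as an assumed average.
```

Running the numbers yourself, instead of reading a model card, is exactly the kind of thing the API hides.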
The takeaway
You do not need a $10k GPU server to experiment with LLMs. A pile of spare parts and Proxmox gets you surprisingly far. The cloud is convenient, but there is value in understanding the substrate, actual hardware running actual compute.
And if you are doing long-running statistical simulations for research, a home lab beats monopolizing your desktop for days.