AI Workstations in Estonia
Everything you need to know about local AI builds — written for newcomers.
Why this matters
Running AI locally can keep model prompts and files on your own machine instead of sending them to a third-party API. For developers and researchers who run models regularly, local hardware can make cost control easier than API-only usage at scale. You also get low-latency inference and the option to work offline when your model and tools support it.
The catch is that model weights need to fit in VRAM for fast inference. A machine built for gaming will often bottleneck badly on AI workloads. The builds here are chosen so that VRAM, system memory, storage speed, and cooling are matched to your intended workload — not just the cheapest part that fits.
Beginner glossary
LLM / large language model
A text model that can chat, write, summarize, translate, and help with code. Examples include Llama, Qwen, and Mistral.
Parameters
A rough measure of model size. Larger models are usually more capable, but need more VRAM and run slower.
Quantization
Compressing a model so it fits in GPU memory. A 4-bit model uses much less VRAM and is more practical on a local PC.
Token
A small piece of text, roughly a word or part of a word. Speed is often measured in tokens per second.
LoRA
A way to fine-tune a model without retraining the whole thing. It needs more RAM, cooling, and stability than just chatting.
CUDA / ROCm
GPU software platforms. NVIDIA uses CUDA, AMD uses ROCm. CUDA is currently easier and more widely supported.
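If you want to confirm that your AI software can actually see the GPU, a quick check helps before installing bigger tools. The snippet below is a minimal sketch assuming PyTorch is installed (the CUDA build for NVIDIA, the ROCm build for AMD); ROCm builds of PyTorch report through the same torch.cuda interface, so the same code works on both.

    # Quick GPU visibility check. Assumes PyTorch is installed; on AMD, the ROCm
    # build of PyTorch reports through the same torch.cuda interface.
    import torch

    if torch.cuda.is_available():
        name = torch.cuda.get_device_name(0)
        vram_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
        print(f"GPU visible: {name} with {vram_gb:.0f} GB VRAM")
    else:
        print("No GPU visible: models will run on the CPU and be much slower")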
Before you buy
The easiest starting point is the Local LLM Inference profile or a macOS-based system. If you only want chat, document work, or a coding assistant, you do not need a multi-GPU workstation. If you plan to fine-tune later or run 70B+ models seriously, choose a more powerful and flexible system.
VRAM is memory on your GPU. Most of a model's weights need to fit in VRAM for fast inference. A rough rule: a 4-bit quantized model needs about 0.5 GB per billion parameters, plus some headroom for the context cache and runtime, which is why the practical numbers run a little higher: a 7B model needs ~4 GB, a 13B model ~8 GB, and a 70B model ~40 GB. When VRAM runs out, layers spill to CPU RAM, which is 5–10× slower.
For 7B models, 8–12 GB of VRAM is often enough. For 13B models, 12–16 GB is more comfortable. For 20B–34B models, aim for 16–24 GB. For 70B models, look at 24 GB+, multi-GPU, or workstation systems. If you want the machine to last, extra VRAM is usually more valuable than a small CPU upgrade.
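To make the rule of thumb concrete, here is a minimal sketch. The 0.5 GB per billion parameters comes from 4-bit weights being half a byte per parameter; the 1.2 overhead factor standing in for the context cache and runtime buffers is an assumption for illustration, not a measured figure.

    # Rough VRAM estimate for a quantized model: weights plus a rough overhead factor.
    def estimate_vram_gb(params_billion: float, bits: int = 4, overhead: float = 1.2) -> float:
        bytes_per_param = bits / 8            # 4-bit quantization -> 0.5 bytes per parameter
        weights_gb = params_billion * bytes_per_param
        return weights_gb * overhead          # overhead covers context cache and runtime buffers

    for size_b in (7, 13, 34, 70):
        print(f"{size_b}B model at 4-bit: ~{estimate_vram_gb(size_b):.0f} GB VRAM")

Run this and the VRAM classes above follow directly: a 13B model fits comfortably in 12–16 GB, while a 4-bit 70B model does not fit on a single 24 GB card, which is why that tier points at multi-GPU or workstation systems.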
Running a model means using an existing model for chat, coding, writing, or document work. Fine-tuning means adapting a model to your data or style. Fine-tuning needs more RAM, more stable cooling, and often more storage than basic local chat.
Local LLM Inference: daily 7B–70B model use, best VRAM per euro.
LLM Fine-Tune Starter: LoRA adapters and custom training runs; needs more system RAM and stable long-session cooling.
Hybrid AI + Gaming: AI development during the day, gaming at night.
When in doubt, start with Local LLM Inference.
When direct checkout is available, you pay the listed order price through Stripe. We then check availability and pricing for compatible parts from Estonian retailers, confirm any practical substitutions before continuing, and assemble the system with the local model software already set up. Quote-only systems are reviewed manually before payment.
No account is needed to browse: builds and the catalog are fully public. You only need an account to place a paid order.
NVIDIA is the safer choice: CUDA is the industry standard and almost all AI software works with it out of the box. AMD cards often offer more VRAM for the money, but ROCm support is less mature and some tools need extra setup. If you want everything to just work, pick NVIDIA. If you're comfortable tinkering and want more VRAM per euro, AMD is worth considering.
A gaming PC gets you partway there: it is tuned for high frame rates, but AI needs a lot of VRAM to hold large models. Most gaming cards top out at 8–12 GB of VRAM, which limits which model sizes you can run. The AI-specific builds here pick cards based on maximum VRAM and AI throughput, not gaming benchmark scores.
Speed depends on your hardware, model, and setup. A good GPU (e.g. an RTX 4090) can hit 50–100 tokens per second on 7B models; larger models are slower. Raw speed is not the only advantage: local runs also improve privacy and offline access, and they avoid per-query API fees.
Ollama is the easiest starting point: install it, pull a model with 'ollama pull llama3', and start chatting. Open WebUI gives you a ChatGPT-style web interface on top. For LLMLab.ee builds, the planned workflow is to set up the relevant software before handover so getting started is simpler.
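Once Ollama is installed and a model has been pulled, other tools on the same machine can reach it over its local HTTP API, by default on localhost:11434. The sketch below is a minimal example: it sends one prompt and derives a rough tokens-per-second figure from the fields Ollama returns. The prompt is just a placeholder, and the script assumes 'ollama pull llama3' has already been run.

    # Minimal sketch: query a locally running Ollama server and report generation speed.
    # Assumes Ollama is running and the llama3 model has already been pulled.
    import requests

    response = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "llama3",
            "prompt": "Explain in one sentence what VRAM is.",
            "stream": False,          # wait for the full answer instead of streaming tokens
        },
        timeout=300,
    )
    data = response.json()

    print(data["response"])

    # Ollama reports how many tokens it generated and how long that took (in nanoseconds),
    # which gives the tokens-per-second figure discussed above.
    if data.get("eval_count") and data.get("eval_duration"):
        tps = data["eval_count"] / (data["eval_duration"] / 1e9)
        print(f"~{tps:.0f} tokens per second")

Open WebUI talks to the same local endpoint, so the ChatGPT-style interface and your own scripts can share one Ollama install.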
The most important thing is the workload: local chat, fine-tuning, image generation, gaming, or sharing the machine with a team. Also think about noise, power use, physical size, and upgrade path. The cheapest build can be a good start, but too little VRAM quickly limits which models you can use.
Decision guide
More questions? Read about how it works.