AI Workstations in Estonia
Everything you need to know about local AI builds — written for newcomers.
Why this matters
Running AI locally can keep model prompts and files on your own machine instead of sending them to a third-party API. For developers and researchers who run models regularly, local hardware can make cost control easier than API-only usage at scale. You also get low-latency inference and the option to work offline when your model and tools support it.
The catch is that model weights need to fit in VRAM for fast inference. A machine built for gaming will often bottleneck badly on AI workloads. The builds here are chosen so that VRAM, system memory, storage speed, and cooling are matched to your intended workload — not just the cheapest part that fits.
Beginner glossary
LLM / large language model
A text model that can chat, write, summarize, translate, and help with code. Examples include Llama, Qwen, and Mistral.
Parameters
A rough measure of model size. Larger models are usually more capable, but need more VRAM and run slower.
Quantization
Compressing a model so it fits in GPU memory. A 4-bit model uses much less VRAM and is more practical on a local PC.
Token
A small piece of text, roughly a word or part of a word. Speed is often measured in tokens per second.
LoRA
A way to fine-tune a model without retraining the whole thing. It needs more RAM, cooling, and stability than just chatting.
CUDA / ROCm
GPU software platforms. NVIDIA uses CUDA, AMD uses ROCm. CUDA is currently easier and more widely supported.
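If you want to confirm that your AI software can actually see the GPU, a quick check helps before installing bigger tools. The snippet below is a minimal sketch assuming PyTorch is installed (the CUDA build for NVIDIA, the ROCm build for AMD); ROCm builds of PyTorch report through the same torch.cuda interface, so the same code works on both.

    # Quick GPU visibility check. Assumes PyTorch is installed; on AMD, the ROCm
    # build of PyTorch reports through the same torch.cuda interface.
    import torch

    if torch.cuda.is_available():
        name = torch.cuda.get_device_name(0)
        vram_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
        print(f"GPU visible: {name} with {vram_gb:.0f} GB VRAM")
    else:
        print("No GPU visible: models will run on the CPU and be much slower")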
Before you buy
The easiest starting point is the Local LLM Inference profile or a macOS-based system. If you only want chat, document work, or a coding assistant, you do not need a multi-GPU workstation. If you plan to fine-tune later or run 70B+ models seriously, choose a more powerful and flexible system.
VRAM is memory on your GPU. Most of a model's weights need to fit in VRAM for fast inference. A rough rule: a 4-bit quantized model needs about 0.5 GB per billion parameters, plus some headroom for the context cache and runtime, which is why the practical numbers run a little higher: a 7B model needs ~4 GB, a 13B model ~8 GB, and a 70B model ~40 GB. When VRAM runs out, layers spill to CPU RAM, which is 5–10× slower.
For 7B models, 8–12 GB of VRAM is often enough. For 13B models, 12–16 GB is more comfortable. For 20B–34B models, aim for 16–24 GB. For 70B models, look at 24 GB+, multi-GPU, or workstation systems. If you want the machine to last, extra VRAM is usually more valuable than a small CPU upgrade.
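To make the rule of thumb concrete, here is a minimal sketch. The 0.5 GB per billion parameters comes from 4-bit weights being half a byte per parameter; the 1.2 overhead factor standing in for the context cache and runtime buffers is an assumption for illustration, not a measured figure.

    # Rough VRAM estimate for a quantized model: weights plus a rough overhead factor.
    def estimate_vram_gb(params_billion: float, bits: int = 4, overhead: float = 1.2) -> float:
        bytes_per_param = bits / 8            # 4-bit quantization -> 0.5 bytes per parameter
        weights_gb = params_billion * bytes_per_param
        return weights_gb * overhead          # overhead covers context cache and runtime buffers

    for size_b in (7, 13, 34, 70):
        print(f"{size_b}B model at 4-bit: ~{estimate_vram_gb(size_b):.0f} GB VRAM")

Run this and the VRAM classes above follow directly: a 13B model fits comfortably in 12–16 GB, while a 4-bit 70B model does not fit on a single 24 GB card, which is why that tier points at multi-GPU or workstation systems.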
Running a model means using an existing model for chat, coding, writing, or document work. Fine-tuning means adapting a model to your data or style. Fine-tuning needs more RAM, more stable cooling, and often more storage than basic local chat.
Local LLM Inference: daily 7B–70B model use, best VRAM per euro.
LLM Fine-Tune Starter: LoRA adapters and custom training runs; needs more system RAM and stable long-session cooling.
Hybrid AI + Gaming: AI development during the day, gaming at night.
When in doubt, start with Local LLM Inference.
When direct checkout is available, you pay the listed order price through Stripe. We then check availability and pricing for compatible parts from Estonian retailers, confirm any practical substitutions before continuing, and assemble the system with the local model software already set up. Quote-only systems are reviewed manually before payment.
No account is needed to browse: builds and the catalog are fully public. You only need an account to place a paid order.
NVIDIA is the safer choice: CUDA is the industry standard and almost all AI software works with it out of the box. AMD cards often offer more VRAM for the money, but ROCm support is less mature and some tools need extra setup. If you want everything to just work, pick NVIDIA. If you're comfortable tinkering and want more VRAM per euro, AMD is worth considering.
A gaming PC gets you partway there: it is tuned for high frame rates, but AI needs a lot of VRAM to hold large models. Most gaming cards top out at 8–12 GB of VRAM, which limits which model sizes you can run. The AI-specific builds here pick cards based on maximum VRAM and AI throughput, not gaming benchmark scores.
Speed depends on your hardware, model, and setup. A good GPU (e.g. an RTX 4090) can hit 50–100 tokens per second on 7B models; larger models are slower. Raw speed is not the only advantage: local runs also improve privacy and offline access, and they avoid per-query API fees.
Ollama is the easiest starting point: install it, pull a model with 'ollama pull llama3', and start chatting. Open WebUI gives you a ChatGPT-style web interface on top. For LLMLab.ee builds, the planned workflow is to set up the relevant software before handover so getting started is simpler.
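Once Ollama is installed and a model has been pulled, other tools on the same machine can reach it over its local HTTP API, by default on localhost:11434. The sketch below is a minimal example: it sends one prompt and derives a rough tokens-per-second figure from the fields Ollama returns. The prompt is just a placeholder, and the script assumes 'ollama pull llama3' has already been run.

    # Minimal sketch: query a locally running Ollama server and report generation speed.
    # Assumes Ollama is running and the llama3 model has already been pulled.
    import requests

    response = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "llama3",
            "prompt": "Explain in one sentence what VRAM is.",
            "stream": False,          # wait for the full answer instead of streaming tokens
        },
        timeout=300,
    )
    data = response.json()

    print(data["response"])

    # Ollama reports how many tokens it generated and how long that took (in nanoseconds),
    # which gives the tokens-per-second figure discussed above.
    if data.get("eval_count") and data.get("eval_duration"):
        tps = data["eval_count"] / (data["eval_duration"] / 1e9)
        print(f"~{tps:.0f} tokens per second")

Open WebUI talks to the same local endpoint, so the ChatGPT-style interface and your own scripts can share one Ollama install.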
The most important thing is the workload: local chat, fine-tuning, image generation, gaming, or sharing the machine with a team. Also think about noise, power use, physical size, and upgrade path. The cheapest build can be a good start, but too little VRAM quickly limits which models you can use.
Decision guide
More questions? Read about how it works.