LLMLab.ee

AI Workstations in Estonia

Frequently Asked Questions

Everything you need to know about local AI builds — written for newcomers.

Why this matters

Why AI-ready builds?

Running AI locally can keep model prompts and files on your own machine instead of sending them to a third-party API. For developers and researchers who run models regularly, local hardware can make cost control easier than API-only usage at scale. You also get low-latency inference and the option to work offline when your model and tools support it.

The catch is that model weights need to fit in VRAM for fast inference. A machine built for gaming often has too little VRAM for larger models and will bottleneck badly on AI workloads. The builds here are chosen so that VRAM, system memory, storage speed, and cooling are matched to your intended workload, not just to the cheapest part that fits.

Beginner glossary

Core terms in plain language

LLM / large language model

A text model that can chat, write, summarize, translate, and help with code. Examples include Llama, Qwen, and Mistral.

Parameters: 7B, 13B, 70B

The number of parameters in a model, counted in billions (7B means about 7 billion). It is a rough measure of model size. Larger models are usually more capable, but need more VRAM and run slower.

Quantization

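Compressing a model by storing its weights at lower precision so it fits in GPU memory. A 4-bit model uses roughly a quarter of the VRAM of the same model at 16-bit, usually at a small cost in quality, which makes it more practical on a local PC.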
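To make the idea concrete, here is a toy sketch of 4-bit quantization in Python. It assumes simple absmax scaling with one shared scale for the whole list, chosen only for illustration; the formats used by real local-model tools are more elaborate and typically work block by block. The function names are hypothetical.

```python
def quantize_absmax(weights, bits=4):
    """Map floats to signed integers in [-qmax, qmax] using one shared scale."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit
    largest = max(abs(w) for w in weights)
    scale = largest / qmax if largest else 1.0  # avoid dividing by zero
    quantized = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the stored integers."""
    return [q * scale for q in quantized]


# Illustrative values only, not taken from any real model.
weights = [0.82, -0.41, 0.05, -0.97, 0.33]
q, scale = quantize_absmax(weights)
restored = dequantize(q, scale)
worst_error = max(abs(a - b) for a, b in zip(weights, restored))

print("stored integers:", q)
print(f"worst rounding error: {worst_error:.3f}")
```

Each weight is stored as a small integer plus one shared scale, which is where the memory saving comes from. The rounding error is the quality cost.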

Token

A small piece of text, roughly a word or part of a word. Speed is often measured in tokens per second.

LoRA / QLoRA

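A way to fine-tune a model by training a small set of adapter weights instead of retraining the whole thing. QLoRA does the same on top of a quantized base model to save memory. Both need more RAM, cooling, and stability than just chatting.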

CUDA / ROCm

GPU software platforms. NVIDIA uses CUDA, AMD uses ROCm. CUDA is currently easier and more widely supported.

Before you buy

What to know before choosing

  • Start with the use case, not the component: chat, coding, image generation, fine-tuning, gaming, or team use.
  • VRAM decides which model size is comfortable. System RAM helps when tools, datasets, or partial model offload get larger.
  • If you want the simplest experience, prefer the NVIDIA + CUDA path. AMD can offer more VRAM per euro, but needs more setup tolerance.
  • Do not buy by GPU name alone. Case airflow, cooling, PSU, and motherboard choice also matter under sustained load.
  • If you do not know your model size yet, a flexible 16-24 GB VRAM class is usually safer than the absolute cheapest option.
  • A Mac is good for quiet, simple local use, but a PC with an NVIDIA GPU is usually more practical for CUDA-based workflows.

If I am completely new, where should I start?

The easiest starting point is the Local LLM profile or a macOS-based system. If you only want chat, document work, or a coding assistant, you do not need a multi-GPU workstation. If you plan to fine-tune later or run 70B+ models seriously, choose a more powerful and flexible system.

What is VRAM and why does it matter for AI?

VRAM is the memory on your GPU. Most of a model's weights need to fit in VRAM for fast inference. A rough rule: a 4-bit quantized model needs about 0.5 GB per billion parameters for the weights alone, so 7B is about 3.5 GB, 13B about 6.5 GB, and 70B about 35 GB. KV cache, context length, runtime, and batch size add overhead on top. In practice a 7B model often needs ~4 GB+, a 13B model ~8 GB+, and a 70B model ~40 GB+. When VRAM runs out, layers spill to CPU RAM, which is much slower (often in the range of 5-10x, depending on the hardware).
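The rule of thumb above can be sketched as a small Python estimator. The 0.5 GB per billion parameters figure comes from the text; the 15% overhead factor is an assumption picked only so the results land near the "~4 GB+, ~8 GB+, ~40 GB+" figures. Real overhead depends on context length, runtime, and batch size. The function names are hypothetical.

```python
def weights_gb(params_billion, bits=4):
    """Weight size in GB: 1 billion parameters at 8 bits is about 1 GB."""
    return params_billion * bits / 8


def estimate_vram_gb(params_billion, bits=4, overhead_fraction=0.15):
    """Weights plus an assumed flat overhead for KV cache and runtime."""
    return weights_gb(params_billion, bits) * (1 + overhead_fraction)


def fits_in_vram(params_billion, vram_gb, bits=4):
    """True if the rough estimate fits on a card with vram_gb of memory."""
    return estimate_vram_gb(params_billion, bits) <= vram_gb


for size in (7, 13, 70):
    print(f"{size}B at 4-bit: weights {weights_gb(size):.1f} GB, "
          f"estimate {estimate_vram_gb(size):.1f} GB, "
          f"fits in 24 GB: {fits_in_vram(size, 24)}")
```

Treat the output as a first filter, not a guarantee: a long context window or a larger batch can push real usage well above the estimate.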

How much VRAM should I choose?

For 7B models, 8-12 GB VRAM is often enough. For 13B models, 12-16 GB is more comfortable. For 20B-34B models, 16-24 GB is a good class. For 70B models, look at 48 GB+ workstation GPUs or a custom multi-GPU quote. On 24 GB or 32 GB consumer cards, a 70B model is best treated as an offload experiment, not as guaranteed comfortable use.
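The same guidance can be written as a lookup, sketched here in Python. The tiers come from the answer above. How to handle sizes that fall between the named tiers (for example 14B-19B or 35B-69B) is an assumption: this sketch rounds them up to the next class. The function name is hypothetical.

```python
def recommended_vram_class(params_billion):
    """Return the VRAM class suggested for a model of the given size."""
    if params_billion <= 7:
        return "8-12 GB"
    if params_billion <= 13:
        return "12-16 GB"
    if params_billion <= 34:
        # Assumption: 14B-19B models are rounded up into this class.
        return "16-24 GB"
    # Assumption: anything above 34B is treated like the 70B tier.
    return "48 GB+ workstation GPU or custom multi-GPU quote"


for size in (7, 13, 32, 70):
    print(f"{size}B -> {recommended_vram_class(size)}")
```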

What is the difference between running a model and fine-tuning?

Running a model means using an existing model for chat, coding, writing, or document work. Fine-tuning means adapting a model to your data or style. Fine-tuning needs more RAM, more stable cooling, and often more storage than basic local chat.

Which build profile is right for me?

  • Local LLM Inference: daily 7B/13B use and selected larger experiments on higher-memory tiers.
  • LLM Fine-Tune Starter: LoRA adapters with more RAM and stable cooling.
  • Hybrid AI + Gaming: one machine for work and play.

When in doubt, start with Local LLM Inference.

How does ordering work?

When direct checkout is available, you pay the listed order price through Stripe. Direct checkout is enabled only for builds whose pricing inputs have been verified. Planning reference prices and quote-only systems are not live order prices; they are reviewed manually before payment.

Do I need an account to browse?

No. Browsing builds and the catalog is fully public. You only need an account to place a paid order.

NVIDIA vs AMD — which is better for AI?

NVIDIA is usually the safer choice because CUDA has the broadest support across mainstream AI tools and often needs less setup. AMD cards can offer more VRAM for the money, but ROCm support is less mature and some tools need extra validation. If you want the lowest setup risk, pick NVIDIA. If you're comfortable testing compatibility and want more VRAM per euro, AMD is worth considering.

Can I use a regular gaming PC for local AI?

Partly. Gaming PCs are tuned for high frame rates, but AI needs a lot of VRAM to hold large models. Many mainstream gaming cards ship with 8-12 GB VRAM, which limits which model sizes you can run. The AI-specific builds here pick cards based on VRAM capacity and AI throughput, not gaming benchmark scores.

How fast is local AI compared to ChatGPT?

It depends on your hardware, model, and setup. A good GPU (e.g. RTX 4090) can reach roughly 50-100 tokens per second on quantized 7B models, while larger models are slower. The main advantage is not raw speed: local runs keep your prompts on your own machine, can work offline, and do not incur per-query API fees.
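To get a feel for what tokens per second means in practice, a rough Python sketch can turn a speed into a waiting time. It assumes generation speed is constant and ignores the time spent processing the prompt, so real responses take somewhat longer. The 500-token answer length is an illustrative assumption, and the function name is hypothetical.

```python
def response_seconds(output_tokens, tokens_per_second):
    """Approximate time to generate a reply at a steady token rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


answer_length = 500  # assumed length of a fairly long answer, in tokens
for speed in (10, 50, 100):
    seconds = response_seconds(answer_length, speed)
    print(f"{speed} tokens/s -> about {seconds:.0f} s for {answer_length} tokens")
```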

What software do I need to get started?

Ollama is the easiest starting point: install it, download a model with 'ollama pull llama3', and start chatting with 'ollama run llama3'. Open WebUI gives you a ChatGPT-style web interface on top. For LLMLab.ee builds, the planned workflow is to set up the relevant software before handover so getting started is simpler.
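Once Ollama is running, scripts can also talk to it over its local HTTP API. The Python sketch below assumes a default install listening on localhost port 11434 and a model already pulled under the name llama3; adjust both if your setup differs. The helper names build_request and ask are hypothetical, and only the standard library is used.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt, model="llama3"):
    """Prepare a non-streaming generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, model="llama3", timeout=120):
    """Send the prompt and return the generated text from the JSON reply."""
    request = build_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["response"]
```

With the server running and the model pulled, a call such as ask("Explain VRAM in one sentence.") should return the model's reply as a string. If the server is not running, the call fails with a connection error.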

What should I know before buying?

The most important thing is the workload: local chat, fine-tuning, image generation, gaming, or sharing the machine with a team. Also think about noise, power use, physical size, and upgrade path. The cheapest build can be a good start, but too little VRAM quickly limits which models you can use.
