LLMLab.ee

AI Workstations in Estonia

Experimental / Advanced

Mac + External GPU AI

Apple Silicon Macs paired with external NVIDIA/AMD GPUs for AI compute workflows. For AI compute only — not gaming or macOS graphics acceleration.

Important warning

External GPUs on Apple Silicon Macs are for AI compute workflows only. They do not accelerate macOS graphics, gaming, displays, Final Cut, or Blender viewport rendering.

Apple's official eGPU support is Intel-Mac-only. Apple Silicon support depends on TinyGPU/tinygrad-style AI compute drivers.

Advanced

Mac Studio M4 Max + RTX 6000 Ada eGPU AI Compute

Mac: Mac Studio M4 Max 128GB / 2TB (128GB unified)

eGPU: OWC Helios FX 850W

GPU: NVIDIA RTX 6000 Ada (48GB VRAM, Ada Lovelace)

70B needs serious memory tradeoffs

70B-class models depend heavily on VRAM/RAM, quantization, and context length.
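As a rough illustration of why quantization dominates the 70B question, the sketch below estimates weight memory only. The bits-per-weight figures are assumptions for illustration (real quantized files vary by format), and the helper name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB (10^9 bytes)."""
    # params_billion * 1e9 weights * bits / 8 bytes, divided by 1e9 bytes per GB
    return params_billion * bits_per_weight / 8


# Assumed average bits per weight; actual quantized files differ by format.
QUANT_BITS = {"FP16": 16.0, "Q8": 8.0, "Q4": 4.5}
VRAM_GB = 48  # RTX 6000 Ada, as listed for this configuration

for name, bits in QUANT_BITS.items():
    size = weight_memory_gb(70, bits)
    # Weights only: KV cache and runtime overhead come on top of this.
    verdict = "fit" if size < VRAM_GB else "do not fit"
    print(f"70B at {name}: ~{size:.0f} GB of weights, {verdict} in {VRAM_GB} GB VRAM")
```

Under these assumptions only the Q4-style variant (about 39 GB of weights) stays under 48GB, and that is before the KV cache for the chosen context length is added.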

Supported:

  • Large-model experiments (70B-class)
  • CUDA/tinygrad research
  • Parallel model serving experiments
  • Advanced AI development

Not supported:

  • Gaming acceleration
  • macOS display acceleration
  • Final Cut acceleration
  • Blender viewport rendering

For AI compute only. Depends on third-party TinyGPU/tinygrad driver support.

Mac: €4924

Enclosure: €449

Mac unified memory and eGPU VRAM are separate runtime paths. Use MLX/Ollama on Apple Silicon and validate CUDA/tinygrad eGPU acceleration per workload.
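A minimal sketch of what "separate runtime paths" means in practice: a model has to fit in one memory pool or the other, and the two pools do not add up. The function name `runtime_paths` is hypothetical, and the check ignores the memory that macOS, the KV cache, and the runtime itself need, so treat it as an upper bound.

```python
def runtime_paths(model_gb: float, unified_gb: float, vram_gb: float) -> list[str]:
    """Return the paths whose own memory pool can hold the model."""
    paths = []
    if model_gb <= unified_gb:
        paths.append("native (MLX/Ollama, unified memory)")
    if model_gb <= vram_gb:
        paths.append("eGPU (CUDA/tinygrad, VRAM)")
    return paths


# Mac Studio M4 Max (128 GB unified) with an RTX 6000 Ada (48 GB VRAM).
for model_gb in (40, 60, 150):
    print(f"{model_gb} GB model:", runtime_paths(model_gb, 128, 48) or "no single pool fits")
```

In this sketch a 150 GB model fits nowhere even though 128 + 48 = 176, because neither path can use the other's memory.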

Advanced

Mac Studio M2 Max Native MLX Workstation (Optional eGPU Path)

Mac: Mac Studio M2 Max 64GB / 1TB (64GB unified)

eGPU: Open-frame PCIe Riser (ATX PSU)

GPU: NVIDIA RTX 4090 (24GB VRAM, Ada Lovelace)

Better for 30B-class models

Stronger fit for larger quantized models; actual fit depends on runtime and settings.

Supported:

  • Native MLX/Ollama local inference
  • macOS AI development
  • LLM experimentation

Not supported:

  • CUDA-based training
  • Multi-GPU workloads
  • High-VRAM model serving

Primary workloads run natively on Apple Silicon via MLX/Ollama. External GPU compute is an optional experimental path, so its cost is not included in a fixed total price.

Mac: €2599

Enclosure: €49

Mac Studio M2 Max with 64GB unified memory for native macOS AI, plus an optional RTX 4090/eGPU path for CUDA/tinygrad experimentation.

Experimental

Mac mini M4 + RTX 6000 Ada eGPU AI Compute

Mac: Mac mini M4 24GB / 512GB (24GB unified)

eGPU: Sonnet Breakaway Box 750ex

GPU: NVIDIA RTX 6000 Ada (48GB VRAM, Ada Lovelace)

Best for 7B/8B models

Good starting point for chat and coding assistants; larger models need more memory.

Supported:

  • Local LLM inference
  • tinygrad experiments
  • CUDA-based AI workloads
  • High-VRAM AI testing

Not supported:

  • Gaming acceleration
  • macOS display acceleration
  • Final Cut acceleration
  • Blender viewport rendering

For AI compute only. Depends on third-party TinyGPU/tinygrad driver support. External GPUs on Apple Silicon Macs do not accelerate macOS graphics, gaming, or displays.

Mac: €1199

Enclosure: €349

Mac mini M4 + external RTX 6000 Ada (48GB VRAM) via TinyGPU/tinygrad for CUDA AI compute. Uses a 2-slot workstation GPU that fits the listed enclosure.

Experimental

Mac mini M4 Pro + RTX 6000 Ada eGPU AI Compute

Mac: Mac mini M4 Pro 48GB / 1TB (48GB unified)

eGPU: Sonnet Breakaway Box 750ex

GPU: NVIDIA RTX 6000 Ada (48GB VRAM, Ada Lovelace)

Good for 13B-class models

Strong everyday local LLM tier; 30B may need more memory or heavier quantization.

Supported:

  • Local LLM inference experiments
  • tinygrad/CUDA validation
  • High-VRAM model testing

Not supported:

  • Gaming acceleration
  • macOS display acceleration
  • Final Cut acceleration
  • Blender viewport rendering

For AI compute only. Depends on third-party TinyGPU/tinygrad driver support.

Mac: €1999

Enclosure: €349

Mac unified memory and eGPU VRAM are separate runtime paths. Use MLX/Ollama on Apple Silicon and validate CUDA/tinygrad eGPU acceleration per workload.

Experimental

Mac mini M4 + Radeon PRO W7900 eGPU (ROCm)

Mac: Mac mini M4 24GB / 512GB (24GB unified)

eGPU: Sonnet Breakaway Box 750ex

GPU: AMD Radeon PRO W7900 (48GB VRAM, RDNA 3)

Best for 7B/8B models

Good starting point for chat and coding assistants; larger models need more memory.

Supported:

  • Local LLM inference via ROCm
  • tinygrad experiments
  • AMD RDNA3+ AI workloads

Not supported:

  • Gaming acceleration
  • macOS display acceleration
  • CUDA workloads
  • Final Cut acceleration

For AI compute only via ROCm/tinygrad. AMD eGPU support is less mature than NVIDIA CUDA.

Mac: €1199

Enclosure: €349

AMD workstation path using Radeon PRO W7900 (48GB VRAM). Uses a 2-slot GPU that fits the listed enclosure; ROCm on an external GPU is experimental.

Practical model fit

Local AI examples

Examples for the Mac Studio M2 Max Native MLX Workstation (Optional eGPU Path) profile, rated mainly on GPU VRAM and system memory.

  • Good fit for private chat
  • Good fit for coding help
  • Good fit for document summaries
  • Not ideal for 70B+ models

Starter pick

Llama 3.2 3B Instruct

A small, friendly starter model for learning local AI without needing a large GPU.

It is easy to download, small enough for almost any LLMLab machine, and useful for basic private chat.

Good fit

ollama run llama3.2

Likely good memory headroom for this quantized model at normal context sizes.
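Once `ollama run llama3.2` has pulled the model, the same model can also be reached from scripts through Ollama's local HTTP API. The sketch below assumes Ollama is serving on its default address (`localhost:11434`); the helper names `build_request` and `generate` are hypothetical.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "llama3.2") -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "llama3.2") -> str:
    """Send the prompt and return the generated text (requires a running server)."""
    with urllib.request.urlopen(build_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]
```

With the server running, `generate("Summarize unified memory in one sentence.")` would return the model's reply as a string. Nothing leaves the machine, which is the point of a private local chat setup.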

Qwen3 4B

Light chat, multilingual prompts, compact reasoning tests

Good fit

A compact Qwen model that gives beginners a taste of newer reasoning-style local models.

Likely good memory headroom for this quantized model at normal context sizes.

Mistral 7B Instruct v0.3

Fast general chat and simple assistant tasks

Good fit

A fast classic 7B model that is easy to run and compare against newer models.

Likely good memory headroom for this quantized model at normal context sizes.

Llama 3.1 8B Instruct

Everyday private chat and document summaries

Good fit

A widely supported everyday local chat model for machines whose GPU has at least 8GB to 12GB of VRAM.

Likely good memory headroom for this quantized model at normal context sizes.
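To make "good memory headroom" concrete, the sketch below subtracts each model's approximate Q4_K_M size, as listed in the technical details on this page, from the 24GB of VRAM assumed for this profile. It counts weights only; KV cache and runtime overhead reduce the real headroom. The helper name `headroom_gb` is hypothetical.

```python
VRAM_GB = 24.0  # RTX 4090, per the assumptions for this profile

# Approximate Q4_K_M model sizes in GB, as listed on this page.
MODEL_SIZE_GB = {
    "Llama 3.2 3B Instruct": 2.0,
    "Qwen3 4B": 2.5,
    "Mistral 7B Instruct v0.3": 4.4,
    "Llama 3.1 8B Instruct": 4.9,
}


def headroom_gb(model_gb: float, vram_gb: float = VRAM_GB) -> float:
    """VRAM left after loading the weights (before KV cache and overhead)."""
    return vram_gb - model_gb


for name, size in MODEL_SIZE_GB.items():
    print(f"{name}: ~{headroom_gb(size):.1f} GB left of {VRAM_GB:.0f} GB")
```

Even the largest of the four leaves roughly 19 GB free in this estimate, which is why all of them are rated a good fit at moderate context sizes.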

Technical details

Assumptions

  • GPU VRAM assumption: 24GB from NVIDIA RTX 4090.
  • System RAM: 64GB.
  • Mac + eGPU fit is experimental and depends on driver/runtime support, not just VRAM.
  • Ratings assume Q4-style quantization, moderate context, one local model running at a time. Treat them as fit guidance, not a speed estimate.
  • Check the individual setup page before treating a model as practical on a Mac + eGPU configuration.

Llama 3.2 3B Instruct technical details

Family: Meta Llama 3.2

Parameters: 3B

Quantization: Q4_K_M

Approx. model size: 2GB

CPU-only: Possible

VRAM: 0GB min / 4GB recommended

RAM: 8GB min / 16GB recommended

Full GPU offload: Should be possible when memory fits

Context warning: Long documents can still push memory use up, even with a small model.

Qwen3 4B technical details

Family: Qwen3

Parameters: 4B

Quantization: Q4_K_M

Approx. model size: 2.5GB

CPU-only: Possible

VRAM: 4GB min / 6GB recommended

RAM: 8GB min / 16GB recommended

Full GPU offload: Should be possible when memory fits

Context warning: Keep the context window modest on 8GB to 16GB systems.

Researched: 2026-06-22

Mistral 7B Instruct v0.3 technical details

Family: Mistral 7B

Parameters: 7.3B

Quantization: Q4_K_M

Approx. model size: 4.4GB

CPU-only: Not recommended

VRAM: 8GB min / 12GB recommended

RAM: 16GB min / 32GB recommended

Full GPU offload: Should be possible when memory fits

Context warning: Long context support does not mean every machine should use the maximum context.

Llama 3.1 8B Instruct technical details

Family: Meta Llama 3.1

Parameters: 8B

Quantization: Q4_K_M

Approx. model size: 4.9GB

CPU-only: Not recommended

VRAM: 8GB min / 12GB recommended

RAM: 16GB min / 32GB recommended

Full GPU offload: Should be possible when memory fits

Context warning: The Q4 model is under 5GB, but KV cache grows with context length.
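The growth of the KV cache can be estimated with simple arithmetic. The sketch below assumes the published Llama 3.1 8B configuration (32 layers, 8 key/value heads, head dimension 128) and a 16-bit cache; runtimes that quantize the cache will use less. The function name `kv_cache_bytes` is hypothetical.

```python
def kv_cache_bytes(
    context_tokens: int,
    n_layers: int = 32,       # assumed: Llama 3.1 8B
    n_kv_heads: int = 8,      # assumed: grouped-query attention with 8 KV heads
    head_dim: int = 128,      # assumed: 4096 hidden size / 32 attention heads
    bytes_per_value: int = 2, # assumed: 16-bit cache entries
) -> int:
    """Bytes needed to cache keys and values for the given context length."""
    # Factor 2: one key vector and one value vector per layer, KV head, and token.
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
    return per_token * context_tokens


for ctx in (8_192, 32_768, 131_072):
    print(f"{ctx:>7} tokens: {kv_cache_bytes(ctx) / 2**30:.0f} GiB of KV cache")
```

Under these assumptions the cache costs 128 KiB per token: about 1 GiB at 8K tokens, 4 GiB at 32K, and 16 GiB at the full 128K context, which is more than three times the size of the Q4 weights themselves.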

Local AI performance is approximate. Results depend on quantization, context length, backend, drivers, RAM, and whether the model fits fully in VRAM.