How to Run an AI Model on Your Local Computer


This guide will help you set up DeepSeek-Coder-V2:16b on Windows using Ollama. These steps are tuned for GPUs with 8 GB of VRAM, such as the NVIDIA RTX 3050.


1. Install Ollama

First, download the Windows version of Ollama from the official site:

👉 https://ollama.com/download

After installing, open your Command Prompt or PowerShell and verify it works:

ollama --version

2. Verify Your GPU

To confirm Ollama can use your NVIDIA card, start a model and then, in another terminal, run:

ollama ps

The PROCESSOR column shows how much of the loaded model is on the GPU (for example, "100% GPU"). You can cross-check VRAM usage in Task Manager > Performance > GPU.
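If you prefer scripting this check, Ollama exposes the same information over its local REST API (GET /api/ps on port 11434, the default). The sketch below is a minimal example that summarizes how much of each loaded model sits in VRAM; gpu_fraction is our own helper name, not part of Ollama.

```python
import json
from urllib.request import urlopen

def gpu_fraction(entry):
    """Fraction of one loaded model held in VRAM, from a /api/ps entry.

    Each entry reports total bytes ("size") and the bytes resident
    on the GPU ("size_vram")."""
    size = entry.get("size", 0)
    return entry.get("size_vram", 0) / size if size else 0.0

def report_loaded_models(base="http://localhost:11434"):
    """Print the GPU share of every model Ollama currently has loaded."""
    with urlopen(f"{base}/api/ps") as resp:
        data = json.load(resp)
    for entry in data.get("models", []):
        print(f"{entry['name']}: {100 * gpu_fraction(entry):.0f}% on GPU")
```

A model fully offloaded to the GPU reports a fraction of 1.0; anything lower means some layers are running from system RAM.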

3. Download the Model

Run this command to download the 16B model (the default 4-bit quantization is roughly a 9 GB download):

ollama pull deepseek-coder-v2:16b
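Once the pull finishes, ollama list prints the installed models; the same data is available from the REST API (GET /api/tags). A small sketch, where installed_models and check_model are illustrative helper names:

```python
import json
from urllib.request import urlopen

def installed_models(tags_json):
    """Extract model names from a decoded /api/tags response."""
    return [m["name"] for m in tags_json.get("models", [])]

def check_model(name, base="http://localhost:11434"):
    """Return True if the named model is in Ollama's local library."""
    with urlopen(f"{base}/api/tags") as resp:
        return name in installed_models(json.load(resp))
```

For example, check_model("deepseek-coder-v2:16b") should return True after the download above completes.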

4. Optimize for RTX 3050 (Important)

To keep the 16B model from exhausting your 8 GB of VRAM, limit how many layers are offloaded to the GPU and leave the rest in system RAM. Ollama controls this with the num_gpu parameter. Start the model, then set it from inside the interactive session:

ollama run deepseek-coder-v2:16b
/set parameter num_gpu 20

(To make this permanent, add PARAMETER num_gpu 20 to the Modelfile described in the next step.)

Note: If it still crashes, lower 20 to 16. If it is too slow, try 24.
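Finding the right layer count is a trial-and-error loop: if a value crashes, go lower; if it runs stably, try higher. That search can be framed as a binary search over layer counts. The fits callback below is hypothetical and stands in for "the model ran without an out-of-memory error at this setting":

```python
def max_stable_layers(fits, lo=0, hi=40):
    """Binary-search the largest GPU layer count that still fits in VRAM.

    Assumes fits is monotone: if n layers fit, any smaller count fits too.
    fits(n) should return True when the model runs stably with n layers
    offloaded (e.g. after /set parameter num_gpu n)."""
    best = lo
    while lo <= hi:
        mid = (lo + hi) // 2
        if fits(mid):   # stable at mid layers: try offloading more
            best = mid
            lo = mid + 1
        else:           # crashed: back off
            hi = mid - 1
    return best
```

In practice this means five or six test runs instead of stepping one layer at a time.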


5. Increase Context Window

If you want the model to remember longer files, create a Modelfile (a text file with no extension) and paste this:

FROM deepseek-coder-v2:16b
PARAMETER num_ctx 8192
PARAMETER temperature 0.2

Then build your custom version with this command:

ollama create deepseek-8k -f Modelfile

From now on, start the model with ollama run deepseek-8k instead of the original name.
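To check that the custom build picked up the new parameters, you can query Ollama's REST API: POST /api/show returns the model's parameter block as plain text. A minimal sketch; parse_parameters and show_parameters are illustrative helper names, not part of Ollama:

```python
import json
from urllib.request import Request, urlopen

def parse_parameters(text):
    """Turn the 'parameters' text returned by /api/show into a dict."""
    params = {}
    for line in text.splitlines():
        parts = line.split(None, 1)  # "num_ctx 8192" -> ("num_ctx", "8192")
        if len(parts) == 2:
            params[parts[0]] = parts[1]
    return params

def show_parameters(model, base="http://localhost:11434"):
    """Fetch and parse the parameters of a local Ollama model."""
    payload = json.dumps({"model": model}).encode()
    req = Request(f"{base}/api/show", data=payload,
                  headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        return parse_parameters(json.load(resp).get("parameters", ""))
```

For the build above, show_parameters("deepseek-8k") should include num_ctx 8192 and temperature 0.2.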

6. Use it in VS Code

Install the Continue.dev extension in VS Code. Open its configuration file (config.json inside the .continue folder in your home directory) and add this provider:

{
  "models": [
    {
      "title": "DeepSeek Local",
      "provider": "ollama",
      "model": "deepseek-coder-v2:16b"
    }
  ]
}
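Before relying on the editor integration, it can be worth confirming end to end that the model answers over the same local endpoint Continue uses (http://localhost:11434). A minimal sketch using Ollama's /api/generate route with streaming disabled; build_payload and ask are our own helper names:

```python
import json
from urllib.request import Request, urlopen

def build_payload(model, prompt):
    """Build a non-streaming /api/generate request body."""
    return {"model": model, "prompt": prompt, "stream": False}

def ask(model, prompt, base="http://localhost:11434"):
    """Send one prompt to a local Ollama model and return its reply text."""
    data = json.dumps(build_payload(model, prompt)).encode()
    req = Request(f"{base}/api/generate", data=data,
                  headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        return json.load(resp)["response"]
```

If ask("deepseek-coder-v2:16b", "Write a hello world in Python") returns code, the VS Code extension will work against the same server.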

Troubleshooting

  • Out of Memory: Lower the GPU layer count (num_gpu) or close VRAM-hungry applications such as your web browser.
  • Slow Responses: Make sure your NVIDIA drivers are up to date.
  • Better Results: Start coding prompts with: "You are a senior software engineer. Think step by step."
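Rather than typing that prefix every time, you can bake it into the custom build from step 5: SYSTEM is a standard Modelfile directive that sets a persistent system prompt. Extend the Modelfile like this and rebuild with ollama create deepseek-8k -f Modelfile:

```
FROM deepseek-coder-v2:16b
PARAMETER num_ctx 8192
PARAMETER temperature 0.2
SYSTEM "You are a senior software engineer. Think step by step."
```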
