🦙 Exporting to GGUF for Ollama: Step-by-Step Guide for Local Use 🚀

Working with fine-tuned models from Unsloth? Want to run them locally in Ollama 💻? This guide walks you through exporting your model in GGUF format and setting it up for smooth local use.

I'll cover everything: exporting the model, preparing the Modelfile, creating and verifying the model, running it, and troubleshooting.

💾 1. Export the Model to GGUF

GGUF is the model file format used by llama.cpp-based runtimes such as Ollama 🦙, making local deployment simple.

Export Command

model.save_pretrained_gguf("model", tokenizer, quantization_method="q4_k_m")
  • "model" → folder where GGUF files are saved

  • tokenizer → the tokenizer associated with your model

  • quantization_method="q4_k_m" → method to reduce model size while keeping accuracy 🎯

Typically, this creates a file like unsloth.Q4_K_M.gguf.
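Before moving the file around, it's worth a quick sanity check that the export actually produced a GGUF file. GGUF files start with the 4-byte ASCII magic `GGUF`, so a minimal sketch (the function name is my own) looks like this:

```python
# Sanity-check that an exported file really is GGUF.
# GGUF files begin with the 4-byte ASCII magic "GGUF".

def is_gguf(path: str) -> bool:
    """Return True if the file starts with the GGUF magic bytes."""
    with open(path, "rb") as f:
        return f.read(4) == b"GGUF"

# Example: is_gguf("model/unsloth.Q4_K_M.gguf")
```

If this returns False, the export likely failed partway or you downloaded a truncated file.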

📂 2. Prepare the GGUF File Locally

  1. Download the GGUF file from your Colab session or wherever it was exported.
    Example: unsloth.Q4_K_M.gguf

  2. Place the file in a convenient folder:

    • Current working directory 📁

    • Or a dedicated folder like models/

📝 3. Create the Modelfile

The Modelfile is a blueprint 🏗️ that tells Ollama how to run your GGUF model:

  • Path to GGUF file

  • Inference parameters

  • Chat templates

  • System prompt

Steps to Create

  1. Open a text editor

  2. Create a file named Modelfile (no extension needed)

  3. Example for a general assistant:

# Path to the GGUF file
FROM ./unsloth.Q4_K_M.gguf

# Creativity control
PARAMETER temperature 0.7
# Nucleus sampling threshold
PARAMETER top_p 0.9
# Stop tokens (end of text, plus the next user turn for chat)
PARAMETER stop "<|end_of_text|>"
PARAMETER stop "<|user|>"

# Simple chat template
TEMPLATE """<|user|>
{{ .Prompt }}<|assistant|>
"""

# Role definition
SYSTEM """You are a helpful AI assistant."""

⚡ For fine-tuned tasks (e.g., HTML extraction), adjust the SYSTEM prompt accordingly, e.g.: "You are an HTML extraction expert. Extract name, price, category, and manufacturer in JSON format."
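If you export several quantizations, the Modelfile can also be generated programmatically. A small sketch (helper name and defaults are my own, mirroring the example above):

```python
# Hypothetical helper that renders a Modelfile string from a few settings.
def render_modelfile(gguf_path, system_prompt, temperature=0.7, top_p=0.9,
                     stops=("<|end_of_text|>", "<|user|>")):
    lines = [f"FROM {gguf_path}",
             f"PARAMETER temperature {temperature}",
             f"PARAMETER top_p {top_p}"]
    lines += [f'PARAMETER stop "{s}"' for s in stops]
    # Not an f-string, so the {{ }} braces reach the file literally.
    lines.append('TEMPLATE """<|user|>\n{{ .Prompt }}<|assistant|>\n"""')
    lines.append(f'SYSTEM """{system_prompt}"""')
    return "\n".join(lines) + "\n"

with open("Modelfile", "w") as f:
    f.write(render_modelfile("./unsloth.Q4_K_M.gguf",
                             "You are a helpful AI assistant."))
```

This keeps the template and stop tokens consistent across variants while only the FROM line and SYSTEM prompt change.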

🦙 4. Create the Model in Ollama

Steps

  1. Open terminal (macOS, Windows WSL, Linux).

  2. Navigate to the directory with your Modelfile and GGUF file.

  3. Run:

ollama create html-model -f Modelfile
  • Replace "html-model" with your desired model name

  • Ollama processes the GGUF into content-addressed blobs (hashing and reusing any existing blobs)

  • Large files take time to import, but blob reuse avoids duplicating data that is already stored

  • Success message appears when done ✅

✅ 5. Verify the Model

Run:

ollama list
  • Your model should appear, e.g., html-model:latest

  • Confirms Ollama recognized and registered the GGUF model
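In scripts, the same verification can be automated by parsing the `ollama list` output. A sketch, assuming the usual tabular output where the model name (with tag) is the first column:

```python
import subprocess
from typing import Optional

def model_registered(name: str, listing: Optional[str] = None) -> bool:
    """Check whether a model appears in `ollama list` output.

    `listing` can be injected for testing; otherwise the CLI is invoked.
    """
    if listing is None:
        listing = subprocess.run(["ollama", "list"], capture_output=True,
                                 text=True, check=True).stdout
    # Skip the header row, take the first whitespace-separated field per line.
    names = [line.split()[0] for line in listing.splitlines()[1:] if line.strip()]
    return any(n == name or n.startswith(name + ":") for n in names)
```

This matches both `html-model` and `html-model:latest`, since Ollama appends the `:latest` tag by default.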

💬 6. Run and Use the Model

Interactive Chat

ollama run html-model
  • Type prompts:
    "Extract from this HTML: <div>..."

  • Press Enter to get responses

  • Exit: /bye or Ctrl+C

Non-Interactive (API)

curl http://localhost:11434/api/generate -d '{"model": "html-model", "prompt": "Your prompt here"}'
  • Use from scripts for automated workflows

  • The endpoint streams newline-delimited JSON chunks by default; add "stream": false to the payload to receive a single JSON response
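The same call can be made from Python with just the standard library. A minimal sketch (function names are my own): the request builder disables streaming to get one JSON object back, and the parser handles the default streamed reply by joining the `response` fragments from each NDJSON line.

```python
import json
from urllib.request import Request, urlopen

def build_request(prompt, model="html-model",
                  url="http://localhost:11434/api/generate"):
    """Build a non-streaming generate request ("stream": false → one JSON object)."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return Request(url, data=payload.encode(),
                   headers={"Content-Type": "application/json"})

def parse_stream(body: str) -> str:
    """Join the "response" fragments of a streamed (NDJSON) reply."""
    return "".join(json.loads(line)["response"]
                   for line in body.splitlines() if line.strip())

# With the Ollama server running:
# resp = urlopen(build_request("Extract fields from: <div>...</div>"))
# print(json.loads(resp.read())["response"])
```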

⚠️ 7. Storage Tips

  • Ollama copies GGUF files to ~/.ollama/models/blobs/

  • To save space, symlink the original file:

ln -s /path/to/original.gguf ~/.ollama/models/blobs/sha256-...
  • Replace sha256-... with the actual blob hash, which you can look up with:

ollama show html-model --modelfile
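Assuming the blob digest is the SHA-256 of the file contents (cross-check against what `ollama show` reports), you can compute your original GGUF's digest locally. A sketch:

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 (GGUF files are too big to read at once)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Hypothetical blob name for the symlink target:
# blob_name = "sha256-" + sha256_of("unsloth.Q4_K_M.gguf")
```

If the computed digest matches a blob in `~/.ollama/models/blobs/`, that blob is a byte-for-byte copy of your original file and is safe to replace with a symlink.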

💡 Pro Tips

  • Always use an absolute path in the FROM line when the GGUF file is not in the same folder as the Modelfile

  • Adjust temperature & top_p based on task:

    • Higher temperature → more creative outputs 🎨

    • Lower temperature → more deterministic outputs 🎯

  • Stop tokens are crucial for chat-style interactions 🗣️

🎉 8. Summary

With GGUF + Ollama, you can:

  • Export models from Unsloth 🦙

  • Run fine-tuned models locally 💻

  • Configure chat behavior and inference parameters 🎛️

  • Use models interactively or via API

  • Optimize storage with symlinks 🧩

This workflow ensures your fine-tuned models are production-ready and easy to manage locally 🏆.
