🦙 Exporting to GGUF for Ollama: Step-by-Step Guide for Local Use 🚀
Working with fine-tuned models from Unsloth? Want to run them locally in Ollama 💻? This guide walks you through exporting your model in GGUF format and setting it up for smooth local use.
I'll cover the full workflow: exporting, preparing the Modelfile, creating and verifying the model in Ollama, running it, and storage tips.
💾 1. Export the Model to GGUF
GGUF is a format compatible with Ollama 🦙, making deployment simple.
Export Command
model.save_pretrained_gguf("model", tokenizer, quantization_method="q4_k_m")
"model" → folder where GGUF files are saved
tokenizer → the tokenizer associated with your model
quantization_method="q4_k_m" → method to reduce model size while keeping accuracy 🎯
Typically, this creates a file like unsloth.Q4_K_M.gguf.
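For a rough sanity check on the savings, you can estimate the expected file size from the parameter count and bits per weight. The ~4.85 bits/weight figure for q4_k_m below is an approximation commonly quoted for llama.cpp K-quants, not an exact spec, and the estimate ignores tokenizer/metadata overhead:

```python
def gguf_size_gb(n_params_billion: float, bits_per_weight: float = 4.85) -> float:
    """Rough GGUF file-size estimate in GB (ignores metadata overhead)."""
    # 1e9 params * bits / 8 bits-per-byte / 1e9 bytes-per-GB simplifies to:
    return n_params_billion * bits_per_weight / 8

# An 8B model at ~4.85 bits/weight comes out around 4.8 GB,
# versus ~16 GB for the same model stored in fp16 (16 bits/weight).
print(f"q4_k_m: {gguf_size_gb(8):.1f} GB, fp16: {gguf_size_gb(8, 16):.1f} GB")
```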
📂 2. Prepare the GGUF File Locally
Download the GGUF file from your Colab session or wherever it was exported.
Example: unsloth.Q4_K_M.gguf
Place the file in a convenient folder:
Current working directory 📁
Or a dedicated folder like
models/
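For example (paths are illustrative; adjust to wherever your browser or Colab saved the file), moving the export into a dedicated models/ folder might look like:

```shell
mkdir -p models                                                  # dedicated folder for GGUF files
cp ~/Downloads/unsloth.Q4_K_M.gguf models/ 2>/dev/null || true   # skip silently if the file is elsewhere
ls models/                                                       # confirm the file is in place
```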
📝 3. Create the Modelfile
The Modelfile is a blueprint 🏗️ that tells Ollama how to run your GGUF model:
Path to GGUF file
Inference parameters
Chat templates
System prompt
Steps to Create
Open a text editor
Create a file named Modelfile (no extension needed)
Example for a general assistant:
FROM ./unsloth.Q4_K_M.gguf # Path to GGUF file
PARAMETER temperature 0.7 # Creativity control
PARAMETER top_p 0.9 # Nucleus sampling threshold
PARAMETER stop "<|end_of_text|>" # Stop token
PARAMETER stop "<|user|>" # Additional stop token for chat
TEMPLATE """<|user|>
{{ .Prompt }}<|assistant|>
""" # Simple chat template
SYSTEM """You are a helpful AI assistant.""" # Role definition
⚡ For fine-tuned tasks (e.g., HTML extraction), adjust the SYSTEM prompt: "You are an HTML extraction expert. Extract name, price, category, and manufacturer in JSON format."
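Putting that together, a complete Modelfile for the HTML-extraction variant might look like this (a sketch: the low temperature is an assumption chosen to favor consistent JSON output, not a value from the original fine-tune):

```
FROM ./unsloth.Q4_K_M.gguf
PARAMETER temperature 0.1        # low temperature for consistent structured output
PARAMETER top_p 0.9
PARAMETER stop "<|end_of_text|>"
PARAMETER stop "<|user|>"
TEMPLATE """<|user|>
{{ .Prompt }}<|assistant|>
"""
SYSTEM """You are an HTML extraction expert. Extract name, price, category, and manufacturer in JSON format."""
```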
🦙 4. Create the Model in Ollama
Steps
Open terminal (macOS, Windows WSL, Linux).
Navigate to the directory with your Modelfile and GGUF file.
Run:
ollama create html-model -f Modelfile
Replace "html-model" with your desired model name
Ollama processes the GGUF layers (hashing and reusing existing blobs)
Large files take time to import, but existing blobs are not duplicated
Success message appears when done ✅
✅ 5. Verify the Model
Run:
ollama list
Your model should appear, e.g., html-model:latest
This confirms Ollama recognized and registered the GGUF model
💬 6. Run and Use the Model
Interactive Chat
ollama run html-model
Type prompts:
"Extract from this HTML: <div>..."
Press Enter to get responses
Exit with /bye or Ctrl+C
Non-Interactive (API)
curl http://localhost:11434/api/generate -d '{"model": "html-model", "prompt": "Your prompt here"}'
Use from scripts for automated workflows
Returns structured outputs in JSON or plain text depending on your model setup
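The curl call above can be scripted; here is a minimal sketch using only the Python standard library. It assumes Ollama is running on the default port 11434 and uses the non-streaming mode of /api/generate so the reply arrives as a single JSON object:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local Ollama endpoint

def build_payload(model: str, prompt: str) -> bytes:
    """Build the JSON body for a non-streaming /api/generate call."""
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()

def generate(model: str, prompt: str) -> str:
    """POST a prompt to the local Ollama server and return the response text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_payload(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Usage (requires a running Ollama instance):
# reply = generate("html-model", "Extract from this HTML: <div>...</div>")
```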
⚠️ 7. Storage Tips
Ollama copies GGUF files to ~/.ollama/models/blobs/
To save space, symlink the original file:
ln -s /path/to/original.gguf ~/.ollama/models/blobs/sha256-...
Replace sha256-... with the actual blob hash, which you can find with:
ollama show html-model --modelfile
💡 Pro Tips
Always use absolute paths in the Modelfile if GGUF is not in the same folder
Adjust temperature & top_p based on task:
Higher temperature → more creative outputs 🎨
Lower temperature → more deterministic outputs 🎯
Stop tokens are crucial for chat-style interactions 🗣️
🎉 8. Summary
With GGUF + Ollama, you can:
Export models from Unsloth 🦙
Run fine-tuned models locally 💻
Configure chat behavior and inference parameters 🎛️
Use models interactively or via API ⚡
Optimize storage with symlinks 🧩
This workflow ensures your fine-tuned models are production-ready and easy to manage locally 🏆.