🦙 Exporting to GGUF for Ollama: Step-by-Step Guide for Local Use 🚀
Working with fine-tuned models from Unsloth? Want to run them locally in Ollama 💻? This guide walks you through exporting your model in GGUF format and setting it up for smooth local use.
I'll cover the full workflow: exporting, preparing the Modelfile, creating and verifying the model in Ollama, running it, and storage tips.
💾 1. Export the Model to GGUF
GGUF is a format compatible with Ollama 🦙, making deployment simple.
Export Command
model.save_pretrained_gguf("model", tokenizer, quantization_method="q4_k_m")
"model" → folder where GGUF files are saved
tokenizer → the tokenizer associated with your model
quantization_method="q4_k_m" → method to reduce model size while keeping accuracy 🎯
Typically, this creates a file like unsloth.Q4_K_M.gguf.
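For a rough sanity check on the savings, you can estimate the expected file size from the parameter count and bits per weight. The ~4.85 bits/weight figure for q4_k_m below is an approximation commonly quoted for llama.cpp K-quants, not an exact spec, and the estimate ignores tokenizer/metadata overhead:

```python
def gguf_size_gb(n_params_billion: float, bits_per_weight: float = 4.85) -> float:
    """Rough GGUF file-size estimate in GB (ignores metadata overhead)."""
    # 1e9 params * bits / 8 bits-per-byte / 1e9 bytes-per-GB simplifies to:
    return n_params_billion * bits_per_weight / 8

# An 8B model at ~4.85 bits/weight comes out around 4.8 GB,
# versus ~16 GB for the same model stored in fp16 (16 bits/weight).
print(f"q4_k_m: {gguf_size_gb(8):.1f} GB, fp16: {gguf_size_gb(8, 16):.1f} GB")
```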
📂 2. Prepare the GGUF File Locally
Download the GGUF file from your Colab session or wherever it was exported.
Example: unsloth.Q4_K_M.gguf
Place the file in a convenient folder:
Current working directory 📁
Or a dedicated folder like
models/
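For example (paths are illustrative; adjust to wherever your browser or Colab saved the file), moving the export into a dedicated models/ folder might look like:

```shell
mkdir -p models                                                  # dedicated folder for GGUF files
cp ~/Downloads/unsloth.Q4_K_M.gguf models/ 2>/dev/null || true   # skip silently if the file is elsewhere
ls models/                                                       # confirm the file is in place
```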
📝 3. Create the Modelfile
The Modelfile is a blueprint 🏗️ that tells Ollama how to run your GGUF model:
Path to GGUF file
Inference parameters
Chat templates
System prompt
Steps to Create
Open a text editor
Create a file named Modelfile (no extension needed)
Example for a general assistant:
FROM ./unsloth.Q4_K_M.gguf # Path to GGUF file
PARAMETER temperature 0.7 # Creativity control
PARAMETER top_p 0.9 # Nucleus sampling threshold
PARAMETER stop "<|end_of_text|>" # Stop token
PARAMETER stop "<|user|>" # Additional stop token for chat
TEMPLATE """<|user|>
{{ .Prompt }}<|assistant|>
""" # Simple chat template
SYSTEM """You are a helpful AI assistant.""" # Role definition
⚡ For fine-tuned tasks (e.g., HTML extraction), adjust the SYSTEM prompt: "You are an HTML extraction expert. Extract name, price, category, and manufacturer in JSON format."
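Putting that together, a complete Modelfile for the HTML-extraction variant might look like this (a sketch: the low temperature is an assumption chosen to favor consistent JSON output, not a value from the original fine-tune):

```
FROM ./unsloth.Q4_K_M.gguf
PARAMETER temperature 0.1        # low temperature for consistent structured output
PARAMETER top_p 0.9
PARAMETER stop "<|end_of_text|>"
PARAMETER stop "<|user|>"
TEMPLATE """<|user|>
{{ .Prompt }}<|assistant|>
"""
SYSTEM """You are an HTML extraction expert. Extract name, price, category, and manufacturer in JSON format."""
```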
🦙 4. Create the Model in Ollama
Steps
Open terminal (macOS, Windows WSL, Linux).
Navigate to the directory with your Modelfile and GGUF file.
Run:
ollama create html-model -f Modelfile
Replace "html-model" with your desired model name
Ollama processes the GGUF layers (hashing and reusing existing blobs)
Large files take time to import, but existing blobs are not duplicated
Success message appears when done ✅
✅ 5. Verify the Model
Run:
ollama list
Your model should appear, e.g., html-model:latest
This confirms Ollama recognized and registered the GGUF model
💬 6. Run and Use the Model
Interactive Chat
ollama run html-model
Type prompts:
"Extract from this HTML: <div>..."
Press Enter to get responses
Exit with /bye or Ctrl+C
Non-Interactive (API)
curl http://localhost:11434/api/generate -d '{"model": "html-model", "prompt": "Your prompt here"}'
Use from scripts for automated workflows
Returns structured outputs in JSON or plain text depending on your model setup
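The curl call above can be scripted; here is a minimal sketch using only the Python standard library. It assumes Ollama is running on the default port 11434 and uses the non-streaming mode of /api/generate so the reply arrives as a single JSON object:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local Ollama endpoint

def build_payload(model: str, prompt: str) -> bytes:
    """Build the JSON body for a non-streaming /api/generate call."""
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()

def generate(model: str, prompt: str) -> str:
    """POST a prompt to the local Ollama server and return the response text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_payload(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Usage (requires a running Ollama instance):
# reply = generate("html-model", "Extract from this HTML: <div>...</div>")
```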
⚠️ 7. Storage Tips
Ollama copies GGUF files to ~/.ollama/models/blobs/
To save space, symlink the original file:
ln -s /path/to/original.gguf ~/.ollama/models/blobs/sha256-...
Replace sha256-... with the actual blob hash, which you can find with:
ollama show html-model --modelfile
💡 Pro Tips
Always use absolute paths in the Modelfile if GGUF is not in the same folder
Adjust temperature & top_p based on task:
Higher temperature → more creative outputs 🎨
Lower temperature → more deterministic outputs 🎯
Stop tokens are crucial for chat-style interactions 🗣️
🎉 8. Summary
With GGUF + Ollama, you can:
Export models from Unsloth 🦙
Run fine-tuned models locally 💻
Configure chat behavior and inference parameters 🎛️
Use models interactively or via API ⚡
Optimize storage with symlinks 🧩
This workflow ensures your fine-tuned models are production-ready and easy to manage locally 🏆.