Ollama Configuration
This guide explains how to configure and use Ollama with HummerBot AI for local model serving.
Setting up Ollama URL in Model Selector
HummerBot AI allows you to connect to a local Ollama instance through the model selector. To configure this:
- Open the model selector dropdown in the AI service interface
- Click the settings icon (gear symbol) next to the "Model" label
- In the Ollama Settings dialog, enter your Ollama server URL (typically http://localhost:11434)
- Click "Save" to store the URL in your browser's local storage
- The Ollama models will now be available in the "Offline Models" section of the model selector
The Ollama URL is stored in your browser's local storage under the key "ollama-url" and defaults to http://localhost:11434 if not set.
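To confirm that the saved URL points at a running Ollama server, you can query its model list from a terminal. Ollama's /api/tags endpoint returns the locally installed models; how HummerBot itself populates the dropdown is an implementation detail, so treat this only as a connectivity check:
curl http://localhost:11434/api/tags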
Installing and Running Ollama
Before configuring HummerBot to use Ollama, you need to install and run Ollama on your system:
- Download Ollama from https://ollama.ai
- Follow the installation instructions for your operating system
- Start the Ollama service:
ollama serve
- Verify that Ollama is running by checking that you can access the API at http://localhost:11434
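A quick way to do this from the command line; the root endpoint of a running server replies with a short "Ollama is running" message:
curl http://localhost:11434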
Pulling and Downloading Ollama Models
Ollama provides access to a wide range of open-source models. To pull and download models:
- Open your terminal or command prompt
- Use the ollama pull command to download a model:
ollama pull llama3
- You can also pull other models like:
ollama pull mistral
ollama pull phi3
ollama pull gemma2
- Check available models with:
ollama list
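If you pulled a model only to try it out, you can remove it again to free disk space:
ollama rm mistral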
Understanding Model Differences
Different Ollama models have varying capabilities, performance characteristics, and resource requirements:
- Llama models (llama3, llama3.1, llama3.2, etc.): Meta's family of models with good general-purpose performance. Larger versions like 70B parameter models offer better reasoning but require more resources.
- Mistral models (mistral, mistral-openorca): Known for strong reasoning and coding capabilities, with open-source variants available.
- Phi models (phi3, phi3.5): Microsoft's lightweight models that perform well on reasoning tasks despite smaller size.
- Gemma models (gemma, gemma2): Google's open models optimized for efficiency and performance.
- Code-specific models (codellama, starcoder, deepseek-coder): Optimized for programming tasks and understanding code.
- Specialized models (llava for vision tasks, nomic-embed-text for embeddings): Designed for specific use cases.
Model Parameters and Requirements
Models have different parameter counts that affect performance:
- Parameter count: Models range from 1B to 70B+ parameters
- Models with more parameters need more GPU memory and compute; inference is slower, but output quality is generally better
- Models with fewer parameters run faster and use less memory, but may be less accurate or capable
- GPU requirements: Models with >7B parameters typically need modern GPUs with 8GB+ VRAM for reasonable performance
- CPU fallback: If no GPU is available, models will run on CPU but significantly slower
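Parameter size is usually chosen with the model tag at pull time. For example, the Llama 3.1 family is published in several sizes (the exact tags available depend on the current Ollama model library):
# 8B model: a few GB to download, fits on a consumer GPU or runs on CPU
ollama pull llama3.1:8b
# 70B model: a much larger download that needs substantially more memory
ollama pull llama3.1:70b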
Model Capabilities
Different models have varying capabilities:
- Tool support: Not all models support function calling or tool use. Models like llama3.1 and newer mistral models have better tool integration
- Accuracy: Larger models and newer architectures generally provide better accuracy, though this varies by task
- Context length: Models support different maximum context lengths (2K to 128K tokens), affecting how much text they can process at once
- Specialized tasks: Some models excel at coding, others at creative writing, and others at factual accuracy
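To see what a locally installed model supports, the ollama show command prints its metadata (architecture, parameter count, context length) together with its default parameters and system prompt:
ollama show llama3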
Serving Models with Ollama
Once you've downloaded models, Ollama automatically serves them. To test a model:
- Run a simple query:
ollama run llama3
- Or use the API directly:
curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
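For multi-turn conversations, Ollama also exposes a chat endpoint that accepts a list of messages instead of a single prompt; chat-style clients typically use this form:
curl http://localhost:11434/api/chat -d '{
  "model": "llama3",
  "messages": [
    { "role": "user", "content": "Why is the sky blue?" }
  ],
  "stream": false
}'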
Using Ollama with HummerBot AI
After configuring the Ollama URL in HummerBot:
- The model selector will display Ollama models under the "Offline Models" section
- Select your preferred model from the dropdown
- Start using the local model for AI tasks
- Local models offer privacy benefits as your data doesn't leave your machine
Note: The model selector groups models by category (online/offline) and provider. Ollama models appear in the "Offline Models" section when properly configured.
Building Custom Models
In addition to the pre-built models in the Ollama library, you can create custom models by downloading model weights from other sources, such as Hugging Face, and building your own model files.
Downloading Models from Hugging Face
You can use models from Hugging Face with Ollama by creating a model file that references the Hugging Face model:
- Find a compatible model on Hugging Face (look for GGUF files, or models that can be converted to GGUF)
- Download the model file to your machine
- Create a Modelfile that references the downloaded model, as in the sketch below
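As a minimal sketch, assuming you have downloaded a GGUF file from Hugging Face into your working directory (the filename below is a placeholder), the Modelfile's FROM line can point directly at the local file; the Modelfile format itself is described in the next sections:
# Modelfile referencing a locally downloaded GGUF file (placeholder filename)
FROM ./my-model.Q4_K_M.gguf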
Creating a Modelfile
A Modelfile is a text file that defines how Ollama should build your custom model. Here's the basic structure:
FROM <base_model>
TEMPLATE """{{ if .System }}<|start_header_id|>system<|end_header_id|>
{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>
{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>
{{ .Response }}<|eot_id|>"""
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER num_predict 2048
SYSTEM "You are a helpful AI assistant."
ADAPTER /path/to/adapter/model
LICENSE Apache 2.0
Key components of a Modelfile:
- FROM: Specifies the base model to use (e.g., llama3, mistral, or a model you've downloaded)
- TEMPLATE: Defines how to format prompts for the model
- PARAMETER: Sets model parameters such as temperature, top_p, and num_predict (the maximum number of tokens to generate)
- SYSTEM: Sets a system prompt that will be used with every request
- ADAPTER: Optional path to a LoRA adapter to modify the model's behavior
- LICENSE: Optional license information
Example Modelfile
Here's an example of a custom model based on Llama3:
FROM llama3
TEMPLATE """{{ if .System }}<|start_header_id|>system<|end_header_id|>
{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>
{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>
{{ .Response }}<|eot_id|>"""
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER num_predict 4096
SYSTEM "You are a coding assistant that specializes in JavaScript and Python."
Building Your Custom Model
To build your custom model:
- Create your Modelfile with the desired configuration
- Run the following command in the same directory as your Modelfile:
ollama create my-custom-model -f Modelfile
- Your custom model will now be available in Ollama as my-custom-model
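Once created, the custom model behaves like any other local model: you can chat with it from the terminal, and it should also appear in the list that clients such as HummerBot's model selector retrieve from the server:
# Chat with the custom model interactively
ollama run my-custom-model
# Confirm it shows up in the local model list
ollama list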
Useful Resources
- Ollama Model Library - Official collection of models
- Hugging Face - Large collection of open-source models
- Ollama Documentation - Official Modelfile documentation
- GGUF Converter - Tool to convert models to GGUF format for Ollama
Exposing Local Ollama Instance
If you need to access your local Ollama instance from other devices or over the internet:
Using Tunnels
Using ngrok:
- Install ngrok from https://ngrok.com
- Start your Ollama service
- Create a tunnel:
ngrok http 11434
- Use the provided public URL in HummerBot's Ollama configuration
Using Cloudflare Tunnel:
- Install Cloudflare's cloudflared utility
- Authenticate:
cloudflared tunnel login
- Create and route a tunnel:
cloudflared tunnel --url http://localhost:11434
Using Port Forwarding
- Access your router's configuration page
- Set up port forwarding for port 11434 to your local machine
- Use your public IP address with the port to access Ollama externally
- Note: This method requires a static IP or dynamic DNS service
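Note also that Ollama binds to 127.0.0.1 by default, so forwarding port 11434 alone is usually not enough; the server has to listen on a reachable interface, typically via the OLLAMA_HOST environment variable (how you set it depends on whether Ollama runs in a terminal or as a system service):
# Make Ollama listen on all interfaces instead of loopback only
OLLAMA_HOST=0.0.0.0 ollama serve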
Security Considerations
When exposing your local Ollama instance:
- Use authentication if possible
- Limit access to trusted networks
- Consider using VPN instead of direct port exposure
- Regularly update Ollama to the latest version
- Be mindful of the computational resources required when running models
Troubleshooting
- If HummerBot can't connect to Ollama, verify that the Ollama service is running
- Check that the URL is correctly formatted (including the protocol and port)
- Ensure firewall settings allow connections to port 11434
- For remote access, verify that your tunnel or port forwarding is properly configured
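If the service is running but a browser-based client still cannot reach it, two quick checks from a terminal may help. The OLLAMA_ORIGINS step is only needed if the browser reports a CORS error; whether HummerBot requires it depends on how it issues requests, so treat it as a possibility rather than a given:
# Confirm the API answers locally and lists your models
curl http://localhost:11434/api/tags
# If the browser blocks requests with a CORS error, allow the origin and restart Ollama
OLLAMA_ORIGINS="*" ollama serve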