It runs Llama 3.2 3B Instruct, Qwen 2.5 Coder 3B and SmolLM2 1.7B (all in Q4_K_M) effortlessly. Just hit the ‘+’ sign and download GGUFs from Hugging Face.
Hopefully MLX support will be added in the future.
I started playing with smaller models on my Intel MBP (Ollama and Chatbox), which worked fine, though slowly, because everything ran on the CPU.
I found this app while looking for local LLM options for the iPhone (16 Pro), and it runs the same models much faster.
Models I've run: Gemma 2 2B, Llama 3.2 3B, SmolLM2 1.7B and Qwen 2.5 Coder 3B.
Being GGUF-only limits your choice of models somewhat, but most models at 3B and below have GGUF versions that others have converted and error-corrected.
I used models from both Bartowski and Unsloth.
Phi-4 did not work.
Excellent work from a solo developer. I’ve tipped him within Enclave because this is exactly what I was looking for, and I want to support small developers who don’t use a subscription model.