Recognize gemma3/llama4/mistral-small3.1+/multimodal as vision models (#1430)

is_vision_model() classified several genuinely multimodal families as text-only
because their names contain neither "vision" nor "vl": Gemma 3 (4b+), Llama 4,
Mistral Small 3.1/3.2, and *-multimodal models (e.g. phi-4-multimodal). For those
models the attached image was stripped before the request was sent, so the model
never saw it. The result was the "can't read the image" report (issue #1274),
which is common with Ollama tags like gemma3:4b.

Add those keywords (plus a generic "multimodal"). Per the file's err-toward-True
policy (#124), a rare text-only tag treated as vision is the safer failure than
dropping a real image. Guard tests confirm the text-only siblings (gemma2, plain
gemma, mistral-small, phi-3) are not over-matched.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
lekt8
2026-06-03 03:17:40 +08:00
committed by GitHub
parent 70103d8719
commit 583df3dd6a
2 changed files with 27 additions and 1 deletions
+9 -1
@@ -36,8 +36,16 @@ _VISION_MODEL_KEYWORDS = (
     "gpt-4o", "gpt-4.1", "gpt-4.5", "gpt-4-turbo", "gpt-4-vision",
     "claude-sonnet", "claude-opus", "claude-haiku", "gemini",
     # open / local
-    "vision", "llava", "bakllava", "moondream", "pixtral", "minicpm",
+    "vision", "multimodal", "llava", "bakllava", "moondream", "pixtral", "minicpm",
     "internvl", "cogvlm", "qwen-vl", "qwen2-vl", "qwen3-vl", "qwen3vl",
+    # multimodal families whose names don't contain "vision"/"vl" but DO accept
+    # images — without these the image is silently dropped for common Ollama tags
+    # like gemma3:4b (issue #1274). Gemma 3 (4b+), Llama 4 (all), and Mistral
+    # Small 3.1/3.2 are vision-capable; per the err-toward-True policy (#124) a
+    # rare text-only tag (e.g. gemma3:1b) being treated as vision is the safer
+    # failure than dropping a real image.
+    "gemma-3", "gemma3", "llama-4", "llama4",
+    "mistral-small-3.1", "mistral-small3.1", "mistral-small-3.2", "mistral-small3.2",
     # zhipu / glm (glm-4.5v, glm-4.6v, glm-5v-turbo, etc.)
     "glm-4.5v", "glm-4.6v", "glm-5v",
 )
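
The effect of the new keywords can be illustrated with a minimal sketch. The body of is_vision_model() is not shown in this hunk, so the case-insensitive substring match below is an assumption, and the keyword tuple is abbreviated to the entries relevant to this change. The model tags passed in are examples taken from the commit message, not the project's actual test cases.

```python
# Abbreviated copy of the tuple from the diff; the real one has more entries.
_VISION_MODEL_KEYWORDS = (
    "vision", "multimodal", "llava", "pixtral",
    "gemma-3", "gemma3", "llama-4", "llama4",
    "mistral-small-3.1", "mistral-small3.1",
    "mistral-small-3.2", "mistral-small3.2",
)


def is_vision_model(model_name: str) -> bool:
    """Return True if any keyword occurs in the lowercased model name.

    Assumption: the real function uses a plain substring test like this one;
    its actual implementation is not part of the diff above.
    """
    name = model_name.lower()
    return any(keyword in name for keyword in _VISION_MODEL_KEYWORDS)


if __name__ == "__main__":
    # Families that were previously misclassified as text-only.
    for tag in ("gemma3:4b", "llama4:scout", "mistral-small3.1:24b", "phi-4-multimodal"):
        print(tag, is_vision_model(tag))
    # Text-only siblings that must not be over-matched.
    for tag in ("gemma2:9b", "gemma:7b", "mistral-small:24b", "phi-3"):
        print(tag, is_vision_model(tag))
```

Under this substring assumption, a text-only tag such as gemma3:1b also matches "gemma3", which is the over-match the code comment accepts under the err-toward-True policy.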