Choosing a local large language model means sorting through confusing hardware requirements and competing performance claims. The open-source tool `whichllm` addresses this directly: it is a command-line utility that auto-detects your computer's hardware and ranks the best-performing LLMs that will actually run on it. According to its GitHub repository, as of May 2026 the tool uses live data to rank models by benchmark results and recency, not just parameter count.
The core value of `whichllm` is solving the "biggest is not best" problem. Many developers assume the largest model that fits into their GPU's VRAM is the superior choice. However, a newer, more efficient model with fewer parameters can often outperform a larger, older one. `whichllm` builds this into its ranking by scoring models on measured benchmark performance rather than size, so users do not end up with a less capable model simply because it has a higher parameter count.
This addresses a growing pain point for developers. The Hugging Face Hub, a central repository for AI models, hosts thousands of options with numerous variations, forks, and quantization formats. `whichllm` acts as an evidence-based filter for this noise, helping users sidestep low-quality or poorly benchmarked models.
How Does It Rank Models?
The tool’s ranking engine does more than check hardware compatibility. It computes a composite score for each model from several factors, so that its top recommendation is the best practical choice for the detected hardware rather than merely the largest model that fits.
The ranking is built on several key data points:
- Merged Benchmarks: It pulls data from multiple trusted sources, including LiveBench, Artificial Analysis, and Chatbot Arena, to create a composite quality score.
- Evidence-Grading: Every benchmark score is discounted based on its source. A direct, verified benchmark receives full weight, while a score inherited from a base model or self-reported by an uploader is heavily penalized.
- Recency-Aware Scoring: The system demotes scores from stale leaderboards, so that a model from 2024 cannot outrank a current-generation model on the strength of an outdated test.
- Architecture Awareness: VRAM and speed estimates are not generic. The tool models VRAM usage by accounting for weights, KV cache, and activation overhead, while speed calculations consider memory bandwidth, quantization efficiency, and even the difference between unified memory and discrete GPUs.
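To make the evidence-grading and recency ideas concrete, here is a minimal Python sketch of one way such a score could be computed. It is not taken from `whichllm`: the weight values, the one-year half-life, the `BenchmarkScore` type, and the choice to average the discounted scores are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical evidence weights. The article says direct scores count in full
# and inherited or self-reported scores are heavily penalized, but it gives no
# numbers, so these values are placeholders.
EVIDENCE_WEIGHT = {"direct": 1.0, "inherited": 0.5, "self_reported": 0.3}

# Assumed decay rate: a score loses half its weight for every year of age.
HALF_LIFE_DAYS = 365


@dataclass
class BenchmarkScore:
    source: str        # leaderboard name, e.g. "LiveBench"
    value: float       # score normalized to the range 0..100
    evidence: str      # key into EVIDENCE_WEIGHT
    measured_on: date  # when the leaderboard entry was recorded


def recency_weight(measured_on: date, today: date) -> float:
    """Exponential decay by age; a score recorded today has weight 1.0."""
    age_days = max((today - measured_on).days, 0)
    return 0.5 ** (age_days / HALF_LIFE_DAYS)


def discounted(score: BenchmarkScore, today: date) -> float:
    """Apply both the evidence discount and the recency discount."""
    return (score.value
            * EVIDENCE_WEIGHT[score.evidence]
            * recency_weight(score.measured_on, today))


def composite_score(scores: list[BenchmarkScore], today: date) -> float:
    """Mean of the discounted scores; 0.0 when there is no evidence at all."""
    if not scores:
        return 0.0
    return sum(discounted(s, today) for s in scores) / len(scores)
```

Under these assumed numbers, a self-reported 90 recorded today is worth 27 points, while a directly verified 80 recorded today keeps all 80, which is the "evidence over claims" ordering the list above describes.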
Beyond Recommendations: Planning and Execution
`whichllm` does more than print a list of recommendations; developers and enthusiasts can act on its output directly. A key feature is the `whichllm run` command, which can automatically download the top-ranked model, set up an isolated environment, and start an interactive chat session.
For developers looking to integrate a model into their own applications, the `whichllm snippet` command generates ready-to-run Python code for any given model. This lowers the barrier to entry for experimenting with different LLMs.
The tool also serves hardware planning needs. Users can simulate how different GPUs would perform or determine the hardware required to run a specific large model:
- `whichllm --gpu "RTX 4090"` simulates recommendations for a specific graphics card.
- `whichllm plan "llama 3 70b"` calculates the GPU needed to run a desired model.
- `whichllm upgrade "RTX 4090" "RTX 5090"` compares your current setup against potential upgrades.
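The arithmetic behind this kind of hardware planning can be sketched in a few lines of Python. This is a back-of-the-envelope estimate, not the tool's own model: the 10% overhead factor, the function names, and the small GPU table are assumptions for illustration, and real quantization formats usually store slightly more than their nominal bits per weight.

```python
from typing import Optional

# Illustrative table of GPU memory sizes in GB (not the tool's database).
GPU_VRAM_GB = {"RTX 4090": 24, "RTX 5090": 32, "A100 80GB": 80}


def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Memory for the model weights at a given quantization width."""
    return params_billion * bits_per_weight / 8


def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                context_tokens: int, bytes_per_value: int = 2) -> float:
    """Memory for the KV cache: one key and one value tensor per layer."""
    values = 2 * layers * kv_heads * head_dim * context_tokens
    return values * bytes_per_value / 1e9


def estimate_vram_gb(params_billion: float, bits_per_weight: float,
                     layers: int, kv_heads: int, head_dim: int,
                     context_tokens: int,
                     overhead_fraction: float = 0.10) -> float:
    """Weights plus KV cache, padded by an assumed activation overhead."""
    base = (weights_gb(params_billion, bits_per_weight)
            + kv_cache_gb(layers, kv_heads, head_dim, context_tokens))
    return base * (1 + overhead_fraction)


def smallest_fitting_gpu(required_gb: float,
                         gpus: dict = GPU_VRAM_GB) -> Optional[str]:
    """Name of the smallest listed GPU that holds the model, or None."""
    fitting = [(vram, name) for name, vram in gpus.items()
               if vram >= required_gb]
    return min(fitting)[1] if fitting else None


# Llama 3 70B: 80 layers, 8 KV heads, head dimension 128.
# Assumed here: 4-bit weights and an 8,192-token context.
need = estimate_vram_gb(70, 4.0, layers=80, kv_heads=8, head_dim=128,
                        context_tokens=8192)
```

With these inputs the estimate comes to roughly 41 GB, which is more than a single 24 GB RTX 4090 or 32 GB RTX 5090 can hold, so `smallest_fitting_gpu(need)` picks the 80 GB card from the illustrative table.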
All output can be formatted as JSON, allowing `whichllm` to be integrated into automated scripts and pipelines, such as feeding a model ID directly into a local Ollama instance.
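As a rough illustration of that kind of pipeline, the Python sketch below picks the top entry from a JSON ranking and builds an `ollama pull` command for it. The JSON layout, including the `model_id` and `score` field names, is a guess made for this example and should be checked against the tool's actual output; whether a given ID is valid for Ollama is likewise an assumption.

```python
import json
from typing import Optional


def top_model_id(json_text: str) -> Optional[str]:
    """Return the ID of the highest-scoring entry, or None for an empty list.

    Assumes a JSON array of objects with "model_id" and "score" keys.
    This schema is hypothetical, not the documented whichllm format.
    """
    entries = json.loads(json_text)
    if not entries:
        return None
    best = max(entries, key=lambda entry: entry["score"])
    return best["model_id"]


def ollama_pull_command(model_id: str) -> list:
    """Argument list for pulling a model with the Ollama CLI."""
    return ["ollama", "pull", model_id]


# Made-up sample data standing in for real tool output.
sample = json.dumps([
    {"model_id": "example/model-a", "score": 64.0},
    {"model_id": "example/model-b", "score": 71.5},
])

chosen = top_model_id(sample)
command = ollama_pull_command(chosen) if chosen else None
```

Returning an argument list rather than a shell string means the result can be handed to `subprocess.run` without any quoting concerns.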
The Trending Society Take
The proliferation of AI-generated content is creating a data quality crisis, forcing platforms like arXiv to ban researchers who upload unchecked "AI slop." `whichllm` applies a similar quality-control principle at the developer level. By prioritizing verifiable benchmarks and evidence over marketing claims and parameter counts, it gives developers a practical way to navigate the increasingly chaotic local AI ecosystem. The tool helps builders choose substance over size, which supports a healthier, more transparent development landscape.








