Discovering Hidden Gems in Model Repositories
Abstract
Public repositories host millions of fine-tuned models, yet community usage remains disproportionately concentrated on a small number of foundation checkpoints. We investigate whether this concentration reflects efficient market selection or whether superior models are systematically overlooked. Through an extensive evaluation of over 2,000 models, we show the prevalence of “hidden gems”: unpopular fine-tunes that significantly outperform their popular counterparts. Notably, within the Llama-3.1-8B family, we find rarely downloaded checkpoints that improve GSM8K accuracy from 83.2% to 96.0% without increasing inference costs. However, exhaustively evaluating every uploaded model to discover such gems is computationally infeasible. We therefore formulate model discovery as a Multi-Armed Bandit problem and accelerate the Sequential Halving search algorithm with shared query sets and aggressive elimination schedules. Our method retrieves top models with as few as 50 queries per candidate, accelerating discovery by over 50×.
Repository Inefficiency
Cumulative Downloads Distribution: Usage is extremely concentrated in a tiny fraction of top models; fewer than 0.15% of models account for more than 95% of all downloads.
Percentage of Unused Models: The vast majority of models are rarely explored, reinforcing the concept of repository inefficiency where many potentially valuable models remain undiscovered.
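The concentration statistic above is easy to reproduce from repository metadata. The sketch below computes what fraction of models is needed to cover a given share of total downloads; `concentration` and the toy download counts are illustrative, not the paper's data (real counts would come from repository metadata such as the Hugging Face Hub).

```python
def concentration(downloads, share=0.95):
    """Fraction of models (most-downloaded first) needed to cover
    `share` of all downloads. Illustrative helper, not from the paper."""
    totals = sorted(downloads, reverse=True)
    target = share * sum(totals)
    running, k = 0, 0
    for d in totals:
        running += d
        k += 1
        if running >= target:
            break
    return k / len(totals)

# Toy example: one blockbuster model and 999 rarely used ones.
downloads = [1_000_000] + [10] * 999
print(f"{concentration(downloads):.2%} of models cover 95% of downloads")
# → 0.10% of models cover 95% of downloads
```

With a heavy-tailed distribution like this, a tiny head of models absorbs almost all usage, which is exactly the inefficiency the figures illustrate.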
Detecting "Hidden Gems"
Information Asymmetry: While most downloads are concentrated in a small subset of models, we believe this popularity stems from information asymmetry: users cannot evaluate every model to find the best one available, and therefore default to popular, well-tested, but sub-optimal base versions.
Hidden Gems Discovery: To test this hypothesis, we evaluate over 2,000 models from 4 popular model trees on diverse tasks. The evaluation validates our hypothesis by revealing the existence of "Hidden Gems": highly unpopular models that significantly outperform their popular counterparts.
Model Trees Evaluated: Llama 3 8B, Mistral 7B, Qwen 3B, Qwen 7B.
Failure of Heuristics: Why are these models missed? Over 90% of identified gems lacked performance documentation relevant to their specific strengths, leaving users with no signal to identify them. Moreover, these gems do not cluster along predictable trajectories, implying that simple search heuristics based on popularity or structural properties are likely to fail.
Efficient Model Discovery
Exhaustively evaluating every model is infeasible. Instead, we view model discovery as a Multi-Armed Bandit (MAB) problem and propose an approach that accelerates the Sequential Halving algorithm using shared query sets and aggressive early elimination. This enables us to efficiently find high-performing candidates with only 50 queries per model, expediting the search by over 50×. Crucially, our method consistently outperforms standard MAB baselines and the original Sequential Halving algorithm itself.
Method Overview: Our accelerated bandit-based search iteratively evaluates models on small query batches and eliminates the lowest-performing models at each round. This enables efficient discovery of top-performing models without exhaustive evaluation.
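The round-based procedure above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `evaluate(model, query)` is a hypothetical scoring function standing in for running a model on a benchmark question, and the batch size and elimination fraction are placeholder settings.

```python
import math
import random

def sequential_halving_search(models, queries, evaluate,
                              batch_size=10, keep_frac=0.5):
    """Sequential Halving-style search with a shared query set:
    every surviving model answers the same queries, and the
    lowest-scoring fraction is eliminated after each round."""
    queries = list(queries)
    random.shuffle(queries)
    scores = {m: 0.0 for m in models}
    answered = 0
    survivors = list(models)
    while len(survivors) > 1 and answered < len(queries):
        batch = queries[answered:answered + batch_size]
        answered += len(batch)
        for m in survivors:
            # Shared query set: all survivors see the SAME batch,
            # so their cumulative scores stay directly comparable.
            scores[m] += sum(evaluate(m, q) for q in batch)
        survivors.sort(key=lambda m: scores[m], reverse=True)
        survivors = survivors[:max(1, math.ceil(len(survivors) * keep_frac))]
    return survivors[0]
```

Because weak candidates are dropped after only a few shared batches, the total query budget is spent almost entirely on the strongest models.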
Baseline Comparison: We compare our method against standard MAB algorithms (UCB, TTTS, Successive Rejects) and the Sequential Halving baseline. "Best Base" refers to selecting the most popular official model.
Superior Efficiency: Our method consistently achieves the best rank and accuracy across all model trees, finding near-optimal models with only 50-100 queries per candidate.
Practical Impact: At just 50 queries per model, our method retrieves models ranked in the top 3 out of hundreds of candidates.
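For context, a standard MAB baseline of the kind compared against can be sketched as a UCB1 loop over models. This is a generic textbook UCB1, not the paper's baseline implementation; `evaluate(model)` is a hypothetical per-query score in [0, 1].

```python
import math

def ucb1(models, evaluate, budget):
    """Generic UCB1: repeatedly query the model with the highest
    upper confidence bound, then recommend the best empirical mean."""
    counts = {m: 0 for m in models}
    sums = {m: 0.0 for m in models}
    # Query each model once to initialize its estimate.
    for m in models:
        sums[m] += evaluate(m)
        counts[m] = 1
    for t in range(len(models) + 1, budget + 1):
        def ucb(m):
            bonus = math.sqrt(2.0 * math.log(t) / counts[m])
            return sums[m] / counts[m] + bonus
        m = max(models, key=ucb)
        sums[m] += evaluate(m)
        counts[m] += 1
    return max(models, key=lambda m: sums[m] / counts[m])
```

Unlike the round-based halving search, UCB never permanently eliminates a candidate, so part of the budget keeps revisiting weak models; this is one intuition for why aggressive elimination reaches top-ranked models with fewer queries per candidate.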
BibTeX
@misc{kahana2026discoveringhiddengemsmodel,
  title={Discovering Hidden Gems in Model Repositories},
  author={Jonathan Kahana and Eliahu Horwitz and Yedid Hoshen},
  year={2026},
  eprint={2601.22157},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2601.22157}
}