Microsoft open-sources the embedding model Harrier, topping the multilingual MTEB leaderboard with a significant lead over OpenAI and Google

Crypto news: the Microsoft Bing team has open-sourced the Harrier series of embedding models. Embedding models are core components of search engines and RAG systems: they convert text into vectors for retrieval and matching, and their quality directly determines whether an AI system can find the right information.

The flagship, Harrier-OSS-v1-27B, scores 74.3 on the multilingual MTEB v2 benchmark (covering 131 tasks), beating the previous best open-source score by 2 percentage points to take the #1 spot. Its lead over closed-source models is larger: OpenAI text-embedding-3-large averages 58.92, Google Gemini Embedding 2 averages 69.9, and Amazon Titan Embed v2 averages 60.37.

The team has also open-sourced two lightweight versions for deployment in low-compute scenarios:
1. The 0.6B-parameter version averages 69.0, ranking #10 on the leaderboard and already surpassing Google Gemini Embedding 1 (68.33).
2. The 270M-parameter version averages 66.5, ranking #15 on the leaderboard; despite having the smallest footprint, it surpasses three closed-source model versions from OpenAI and Amazon.

All versions support more than 100 languages and a 32K context window. The training data includes more than 2 billion weakly supervised text pairs (for contrastive pretraining) and 10 million high-quality samples (for fine-tuning); the synthetic data was generated by GPT-5. Once trained, the flagship model also served as a teacher model, improving the two smaller models through knowledge distillation. Microsoft says Harrier's technology will be integrated into Bing Search and the new-generation Agent Grounding service.
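
To make the "text into vectors for retrieval and matching" idea concrete, here is a minimal sketch of how a retrieval system might rank documents once an embedding model has produced vectors. It does not call Harrier or any real model: the three-dimensional vectors, the document names, and the helper functions `cosine_similarity` and `rank_documents` are all hypothetical, hand-made for illustration. Cosine similarity is assumed as the matching score because it is a common choice, not because the article says Harrier uses it.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy embeddings; a real model would output vectors with
# hundreds or thousands of dimensions.
doc_vecs = {
    "doc_search": [0.9, 0.1, 0.0],
    "doc_cooking": [0.0, 0.2, 0.9],
    "doc_rag": [0.7, 0.6, 0.1],
}
query_vec = [0.8, 0.3, 0.0]

ranking = rank_documents(query_vec, doc_vecs)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.3f}")
```

In this toy setup the two documents whose vectors point in roughly the same direction as the query rank above the unrelated one. A better embedding model places relevant text closer to the query in vector space, which is the property benchmarks such as MTEB try to measure.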
