Microsoft Introduces Critique, A New Multi-Model Deep Research System In M365 Copilot
In Brief
Microsoft has introduced Critique, a new multi-model deep research system inside Researcher, the deep research agent in Microsoft 365 Copilot, as part of a broader push to make Copilot feel more dependable for serious knowledge work instead of just fast drafting.
According to Microsoft, Critique is designed for complex research tasks and works by splitting the job into two parts: one model handles planning, retrieval, synthesis, and drafting, while a second model reviews and refines the output before the final report is produced. Microsoft says the system uses models from frontier labs including OpenAI and Anthropic, and that it is available now through the company’s Frontier program.
Reuters reported that in Critique’s current setup, OpenAI’s GPT generates the response and Anthropic’s Claude reviews it for accuracy and quality before the answer reaches the user. Microsoft has also said it wants this workflow to become bi-directional, with each model eventually able to review the other’s output.
What Critique actually does inside Microsoft 365 Copilot
Microsoft’s own description makes it clear that Critique is not just a cosmetic feature or a new button slapped onto Copilot. It works inside Researcher in Microsoft 365 Copilot and is built for deeper tasks where getting it right matters just as much as getting it done fast. One model does the digging and drafts the report, while the second steps in like an editor, checking the facts, sharpening the structure, and helping turn it into a more reliable final piece.
Microsoft says the whole idea is to separate generation from evaluation, rather than asking one model to brainstorm, write, fact-check, and polish its own work all at once. That distinction matters because a lot of AI failure comes from exactly that one-model bottleneck. When a single system is asked to do everything, it can produce something that looks polished while quietly leaving gaps, overstating claims, or leaning on weak evidence.
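Microsoft has not published Critique’s internals, but the generate-then-review split it describes can be sketched as a simple two-model pipeline. Everything in this sketch — the function names, the stub model calls, and the fixed revision loop — is a hypothetical illustration under stated assumptions, not Microsoft’s actual implementation:

```python
from dataclasses import dataclass

@dataclass
class Draft:
    text: str
    issues: list[str]

def generator_model(task: str) -> str:
    # Stand-in for the drafting model (e.g. a GPT call):
    # plans, retrieves, synthesizes, and writes the first report.
    return f"Draft report for: {task}"

def reviewer_model(draft: str) -> list[str]:
    # Stand-in for the reviewing model (e.g. a Claude call):
    # returns a list of concrete problems for the generator to fix.
    return ["unsupported claim in section 2", "missing citation"]

def revise(draft: str, issues: list[str]) -> str:
    # The generator revises its own draft against the reviewer's notes.
    return draft + " [revised: " + "; ".join(issues) + "]"

def critique_pipeline(task: str, max_rounds: int = 2) -> Draft:
    """Generate a draft, then let a second model review it before finalizing."""
    draft = generator_model(task)
    issues: list[str] = []
    for _ in range(max_rounds):
        issues = reviewer_model(draft)
        if not issues:
            break  # reviewer is satisfied; stop early
        draft = revise(draft, issues)
    return Draft(text=draft, issues=issues)

report = critique_pipeline("EU battery supply chains")
print(report.text)
```

The key design point the sketch captures is the separation of roles: the reviewer never writes content, it only evaluates, which is exactly the split Microsoft says it is betting on.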
Microsoft is not pitching Critique as a side experiment
One of the more important details in Microsoft’s announcement is that Critique will be the default experience in Researcher when Auto is selected in the model picker. That signals the company sees this as more than an optional lab feature for power users. It is effectively treating multi-model review as the new baseline for deep research quality inside Microsoft 365 Copilot. That is a meaningful product choice, because it suggests Microsoft believes enterprise customers care less about raw response speed than they do about fewer hallucinations, stronger structure, and more confidence in the finished report.
That also fits neatly into Microsoft’s broader messaging around Wave 3 of Microsoft 365 Copilot, where the company has been pushing the idea of Copilot as a “system for work” built on a multi-model advantage rather than on any single AI lab. In Microsoft’s framing, Copilot is meant to pull the best available intelligence from across the industry, grounded in work context through what it calls Work IQ and protected by enterprise data controls. Critique is one of the clearest examples yet of that strategy moving from marketing language into a visible product feature.
The benchmark numbers are a big part of Microsoft’s sales pitch
Microsoft is not only saying Critique feels better. It is saying the system performed better on a formal benchmark. In its technical write-up, the company says it tested Critique on the DRACO benchmark, short for Deep Research Accuracy, Completeness, and Objectivity, which covers 100 complex research tasks across 10 domains. Microsoft says responses were judged on factual accuracy, breadth and depth of analysis, presentation quality, and citation quality, and that Critique outperformed the single-model version of Researcher on all four measures.
The company highlighted the largest gains in breadth and depth of analysis, followed by presentation quality and factual accuracy. It also says the improvements were statistically significant and that Researcher with Critique delivered a +7.0 point aggregated score improvement, or +13.88% over Perplexity Deep Research (Claude Opus 4.6 model), which Microsoft described as the best system reported in the benchmark paper.
Data | Source: Microsoft
That is an eye-catching claim, especially because the deep research race has become one of the most competitive fronts in enterprise AI. Research tools are no longer being judged only by whether they can gather information, but by whether they can assemble a report that feels decision-ready.
Microsoft’s argument is that the review layer forces the system to identify missing angles, tighten organization, challenge weak claims, and use citations more carefully. Whether customers experience those gains in real workflows will matter more than benchmark charts, but Microsoft is clearly trying to signal that this is a measurable quality jump rather than a vague model update.
Council shows Microsoft is thinking beyond one “best answer”
Critique is not the only feature Microsoft introduced alongside this update. The company also launched Council, a multi-model comparison mode inside Researcher. Microsoft says Council runs Anthropic and OpenAI models simultaneously, allowing each to generate a full standalone report. A separate judge model then creates a distilled summary showing where the reports agree, where they diverge, and what each uniquely contributes. Microsoft Support describes this as Model Council, a mode that preserves both full reports and adds a comparison summary to help users decide which output is stronger or how to combine them.
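The Council workflow described above — two models answering in parallel, plus a judge that distills agreements and divergences — can be sketched in a few lines. The stub models, the set-based "judge", and all names here are hypothetical placeholders for illustration only:

```python
from concurrent.futures import ThreadPoolExecutor

def model_a(task: str) -> dict:
    # Stand-in for, e.g., an OpenAI model producing a full report.
    return {"report": f"Report A on {task}", "points": {"growth", "risk", "pricing"}}

def model_b(task: str) -> dict:
    # Stand-in for, e.g., an Anthropic model producing a full report.
    return {"report": f"Report B on {task}", "points": {"growth", "risk", "regulation"}}

def judge(report_a: dict, report_b: dict) -> dict:
    # Stand-in for the judge model: here just a set comparison of key points,
    # mimicking the agree / diverge / unique-contribution summary.
    a, b = report_a["points"], report_b["points"]
    return {
        "agree": sorted(a & b),
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
    }

def council(task: str) -> dict:
    """Run both models concurrently, keep both full reports,
    and attach the judge's comparison summary."""
    with ThreadPoolExecutor() as pool:
        fut_a = pool.submit(model_a, task)
        fut_b = pool.submit(model_b, task)
        ra, rb = fut_a.result(), fut_b.result()
    return {"reports": [ra, rb], "summary": judge(ra, rb)}

result = council("cloud AI pricing")
print(result["summary"])
# → {'agree': ['growth', 'risk'], 'only_a': ['pricing'], 'only_b': ['regulation']}
```

The notable design choice the sketch mirrors is that Council does not merge the reports into one answer: both originals survive, and the summary only maps their differences.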
That is a very interesting signal about where enterprise AI may be heading. For a while, the industry behaved as if the goal was to find one model that could replace all the others. Microsoft’s latest move suggests the more realistic future may be one where companies do not trust any single model enough to make it the only voice in the room.
The timing of Critique is not accidental. Microsoft has been under pressure to show that Microsoft 365 Copilot is becoming more useful, more differentiated, and more valuable as competition intensifies.
Reuters tied the rollout of Critique and Council to Microsoft’s effort to improve Copilot adoption in a market where rivals including Google’s Gemini and Anthropic’s Claude products are pushing hard into workplace AI. Axios also noted that Microsoft’s multi-model strategy has another benefit: it shows the company is not locked into overdependence on OpenAI at a time when frontier model leadership can shift quickly.