GPT-5.5's '9.7T Parameters' Re-evaluated: Revised to Approximately 1.5T
According to monitoring by Beating, AI researchers Lawrence Chan and Benno Sturgeon have published a review of the paper by Pine AI Chief Scientist Li Bojie, "Incompressible Knowledge Probes: Estimating the Parameter Count of Black Box Large Language Models Based on Fact Capacity." The original paper used 1,400 trivia questions to "weigh" closed-source models, estimating GPT-5.5 at about 9.7T parameters, Claude Opus 4.7 at around 4.0T, and o1 at approximately 3.5T.

The reviewers believe the approach itself is valuable, but that the original figures were significantly inflated by the scoring criteria and question quality. The main issue is the "floor score." The original paper divided the questions into seven difficulty levels; when a model answered too many questions incorrectly at a given level, its score for that level could in theory go negative, but the code clipped each level's minimum score to 0. This inflated the apparent advantage of cutting-edge models on difficult questions and, in turn, increased the parameter counts inferred for them (a toy numerical illustration appears at the end of this article). The paper claims it did not apply this treatment, yet the code and published results did.

After removing the floor score, the fitted slope decreased from 6.79 to 3.56. The slope can be read as how much parameter growth each additional point of score translates into; a smaller slope means the same score gap no longer corresponds to such an exaggerated parameter gap. The R² value dropped from 0.917 to 0.815, indicating that the score-to-parameter fit is less stable than the original paper suggested, and the 90% prediction interval widened from a factor of 3.0 to a factor of 5.7, meaning the margin of error is wide and single-point figures should not be taken at face value.

The review also found that 131 of the 1,400 questions (9.4%) were ambiguous or had incorrect answers. These problems were concentrated among the difficult questions, which are precisely the ones used to differentiate cutting-edge closed-source models such as GPT-5.5 and Claude Opus 4.7.

Under the reviewers' revised criteria, GPT-5.5 drops from the original paper's 9659B to 1458B, with a 90% prediction interval of 256B to 8311B; Claude Opus 4.7 drops from 4042B to 1132B; and GPT-5 drops from 4088B to 1330B. The reviewers also emphasized that 1.5T should not be regarded as GPT-5.5's true parameter count. The more accurate conclusion is that this "trivia weighing" method is highly sensitive to scoring details and question quality, and that figures like 9.7T cannot be used directly as a measure of closed-source model size.
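The following is a minimal sketch of the "trivia weighing" pipeline as the article describes it, not the paper's or the reviewers' actual code: score models of known size across seven difficulty levels, fit log parameter count against the total score, and extrapolate a closed model from its score. Every model size and per-level score below is invented for illustration.

```python
# Toy sketch, assuming a simplified version of the method described above.
# All numbers are invented; this is not the paper's or the reviewers' code.
import math

# (known parameter count in billions, invented scores over 7 difficulty levels)
# Negative per-level scores arise when wrong answers outweigh right ones.
open_models = [
    (7,   [8, 5, 1, -3, -6, -9, -11]),
    (14,  [9, 6, 3, -1, -4, -7, -9]),
    (70,  [10, 8, 5, 2, -1, -4, -6]),
    (405, [10, 9, 7, 4, 1, -2, -4]),
]
frontier_scores = [10, 10, 9, 7, 5, 2, -1]  # hypothetical closed model

def total(levels, floor):
    """Sum per-level scores; the 'floor score' clips each level at 0."""
    return sum(max(s, 0) if floor else s for s in levels)

def fit_and_extrapolate(floor):
    """Fit log(params) ~ score on the known models, extrapolate the closed one."""
    xs = [total(lv, floor) for _, lv in open_models]
    ys = [math.log(p) for p, _ in open_models]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))  # least-squares slope
    intercept = my - slope * mx
    est = math.exp(slope * total(frontier_scores, floor) + intercept)
    return slope, est

for floor in (True, False):
    slope, est = fit_and_extrapolate(floor)
    label = "floored" if floor else "raw"
    print(f"{label:7s} slope={slope:.3f}  extrapolated size ~ {est:,.0f}B")
# Flooring compresses the weaker models' scores upward, steepening the fitted
# slope and, in this toy example, roughly quadrupling the extrapolated size.
```

The direction matches the reviewers' finding: removing the floor lowered the reported slope from 6.79 to 3.56 and pulled the extrapolated GPT-5.5 estimate down accordingly.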
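The reported interval widths can also be sanity-checked as multiplicative factors. Reading the "5.7 times" figure as a geometric half-width (an interpretation on our part, not stated in the review) gives an interval of roughly est/5.7 to est*5.7, which closely reproduces the 256B to 8311B range quoted for the revised 1458B GPT-5.5 estimate:

```python
import math

# Revised GPT-5.5 figures quoted in the review.
est, lo, hi = 1458, 256, 8311
factor = math.sqrt(hi / lo)  # geometric half-width of the interval
print(f"factor ~ {factor:.2f}x")                      # ~5.70x
print(f"{est / factor:.0f}B to {est * factor:.0f}B")  # ~256B to ~8307B,
                                                      # close to the reported 8311B
```

At a 5.7x factor, the revised point estimate is consistent with anything from a few hundred billion to several trillion parameters, which is exactly the reviewers' caution against over-reading any single figure.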