The competition among large language models has become genuinely intense. Judging by progress in recent months, GLM-4.7 has performed well on Agent-related tasks—whether tool invocation, web scraping, or mathematical reasoning, it shows clear strengths. However, in software engineering capability (as measured by SWE-bench) and command-line operation accuracy, Claude and GPT still hold the lead.
Interestingly, the differences between these models are even more pronounced in cryptocurrency application scenarios. Each vendor touts its support for on-chain data analysis, smart contract auditing, and DeFi interactions, but actual results still vary by task. Especially in complex multi-step operations and engineering-grade code generation, the capability ceilings of the different models diverge significantly.
EternalMiner
· 12-23 12:55
Haha, GLM really has something this time, but in the crypto circle, we still have to see who can truly handle complex on-chain operations; just bragging is useless.
---
To be honest, the performance of these models in DeFi scenarios is uneven; sometimes it feels like they're just praising each other.
---
What's the use of powerful Agent tasks if they can't reliably call smart contracts? This area really has a huge disparity in capability.
---
GLM-4.7 looks good, but I'll wait and see if it can actually be used for auditing smart contracts; everything feels too idealistic right now.
---
Every company in the web3 application space claims to be the best; which one actually is? That still takes real on-chain testing to settle.
---
The gap in engineering-grade code generation is this big—how can we expect these models to write reliable contracts? I'm a bit worried.
---
Isn't it just that everyone has their own strengths? Choose the right tool based on the scenario; there's no need to rank them.
RektCoaster
· 12-23 12:46
GLM really does have something this time—the Agent side can genuinely hit hard. But on SWE-bench you still have to look to Claude and GPT; there's a gap there.
As for the on-chain side, everyone talks up their own product, and only those who actually use it know... DeFi contract audits still need multiple models for cross-validation; no single model can handle it alone.
PriceOracleFairy
· 12-23 12:34
glm catching up fast on agent tasks but lmao... let's be real, when it comes to actual onchain arbitrage execution and contract auditing? claude's still the one i'm trusting with my dry powder. the agent flex means nothing if you can't catch a 2-second mev window without hallucinating the calldata 🤔