Is the window only one year? The AI arms race behind the “Lobster” craze: JD.com bets on an “industry-side” solution. Can it catch up and overtake?
Daily Economic News reporter: Wang Yubiao | Editor: Bi Luming
The “Lobster” craze has gone viral: robots dance and spar on screens, and digital humans livestream and sell products “for real.” As AI technology iterates ever faster, the entire industry has entered a critical period of real-world deployment. How to balance cost, efficiency, and performance, and how to cross the “last mile” of implementation: these questions all need more “reference answers.”
On March 24, JD.com announced phased progress in AI research and application, including open-sourcing the JoyAI-LLM Flash large model, launching its own “Lobster” product suite, and debuting a “Free State Digital Human,” among others.
On the “Lobster” topic, a JD.com technical lead told the Daily Economic News (hereinafter “the reporter”) that the amplifier effect of “Lobster” will certainly continue this year. L4-level models may be launched by the end of this year or early next year; at that point, many applications that are unimaginable today will flourish.
On digital human technology, JD’s JoyStreamer has launched the “Free State Digital Human,” whose interactions are more natural and lively than those of traditional digital humans. Watching a live “eating broadcast” demo on-site, the reporter observed that its movements and postures were noticeably more natural and fluid, and that it maintained high-fidelity realism even when the face was occluded.
With ByteDance and Alibaba competing intensively in AI, the battlefield for China’s tech giants has shifted almost entirely toward deeper technology deployment and broader ecosystem collaboration. JD’s “AI solutions” take a different approach, going all-in on the industry side. Can it catch up and overtake?
The “Lobster” intelligent agent and the underlying “Token economy” have become recent focal points in the tech circle.
Rather than simply launching its own “Lobster” product, JD Cloud chose to build on the JoyAI large models, deploying lightweight cloud hosts, all-in-one machines, and other products on the open-source OpenClaw architecture.
The technical lead explained that while many people see “Lobster,” what JD sees is the model underneath.
Two years ago, OpenAI defined five levels for large models. Level 1 is conversation; Level 2 is reasoning; Level 3 is agentic AI, where the model becomes an integrated system capable of taking autonomous actions to solve problems; Level 4 is innovation, where systems no longer rely on human intervention and possess autonomous creative thinking. The highest level, organization, describes AI that reaches or surpasses the capability of an entire human organization.
When asked how today’s “Lobster” differs from last year’s Manus (billed as the world’s first general AI agent), the technical lead said the core issue still lies in the capability of the foundation models. Last year’s base models had only just broken through Level 2; even the then-popular DeepSeek sat at the reasoning level, and its models did not yet have agentic capabilities.
He added, “Building an agent requires a great deal of engineering, strategy, and process to ultimately ‘package’ it. The ClawCode model developed at the end of last year and the start of this year broke through Level 3 at the model level, truly reaching the agentic stage.”
On the technical roadmap, he believes we may soon see large models reach the next level, innovation, where the model itself is creative. “This creativity isn’t just generating a paragraph of text or a song; it’s replacing humans in high-difficulty work that demands the full range of human intelligence. That hasn’t happened yet, but it could within a year. The technical path is clear,” he explained.
He also predicts that AGI in software models could arrive within one or two years, possibly by the end of this year. Whether a company can catch this wave, first agentic models and then innovation models, may come down to a window of just one year.
“However, the amplifier effect of ‘Lobster’ will continue this year. When L4 models are released, many applications currently unimaginable will flourish,” the technical leader said.
Following the progress of JD’s digital human JoyStreamer, the reporter noted that the industry’s three major technical pain points (audio-video desynchronization, inconsistent multimodal control, and identity drift in long videos) are being addressed one by one.
In addition to disclosing its technical roadmap, JD’s JoyStreamer also launched the “Free State Digital Human,” which supports natural movement and flexible posing, offers camera-follow and smooth in-and-out-of-frame transitions, and maintains high fidelity even when the face is occluded.
Does this technical progress mean the industry is closer to large-scale application? A JD digital human official told the reporter that the biggest challenge in scaling is reducing operational dependence on merchants and avoiding too many preconditions for generation, for example cutting a 30-minute shoot down to 3 minutes, or even to a single image.
“Last year, we launched a replication mode where all past live broadcast materials could be used to generate digital human live streams,” he said.
The reporter learned that the emergence of agent (intelligent agent) technology is also good news for scaling digital humans. The official explained that JD currently uses agents to connect the vast amount of information already entered on the platform, including products and promotional activities. This enables accurate, high-quality answers to user questions, reduces reliance on merchants, and makes large-scale deployment of digital humans feasible.
How effective are digital humans in practice? The reporter learned from JD that conversion in live streams is, without question, the ultimate business indicator, but process metrics also matter: how long users stay in the live room, how many rounds of interaction they have, and how those signals reflect latent user demand for products.
Embodied intelligence has stayed hot from last year into this year. Since March, several large funding rounds have closed in the field, and on March 20 the Shanghai Stock Exchange accepted Yushu Technology’s IPO application for the STAR Market, with planned fundraising of 4.202 billion yuan, potentially making it the first “humanoid robot” stock in the A-share market.
Last year, JD made a rare run of six consecutive investments in embodied intelligence, and it has released multiple robotics industry plans in recent years. At the recent China Development Forum 2026, JD CEO Xu Ran revealed that JD is building the world’s largest and most comprehensive embodied intelligence data center.
A major pain point in embodied intelligence is the shortage of real-scene data, which leaves models undertrained and holds back industry deployment. “Within two years, we will accumulate more than 10 million hours of real-scene data covering logistics, home, city, and other major scenarios,” Xu Ran said.
On the project’s progress, a JD official disclosed that for data collection JD will mobilize more than 100,000 internal employees across professions and up to 500,000 external industry workers, including over 100,000 residents of Suqian, in what it calls the largest-ever real-person data collection campaign.
The reporter learned that the project’s implementation schedule is to collect 5 million hours of real-scene video within the next year and to exceed 10 million hours within two years, while also collecting 1 million hours of robot-body data.
A JD embodied intelligence business lead told the reporter that the field initially focused on a robot’s hardware control, VLA-driven dual arms, or dexterous hands: making robots move like humans, understand human speech, and respond accordingly, what the team calls “consistent speech and action.” Most companies building robots and robotic dogs are already working on this, but it is only one part, and one stage, of the job.
He added, “In addition, the team at JD Exploration Institute is working on VLN (visual language navigation) and integrating it into JoyInside, combining different robots and robotic dogs to do more human-machine interaction work.”