The model represents a new frontier where AI not only performs tasks but also participates in its own development cycle. MiniMax states this capability allows M2.7 to build complex agent harnesses and complete elaborate productivity tasks using features like Agent Teams, specialized Skills, and dynamic tool search. The company internally tasked M2.7 with updating its own memory and building dozens of skills to aid reinforcement learning experiments, initiating a cycle of self-evolution.
Evolving Software Engineering and Agent Collaboration
M2.7 delivers strong performance in real-world software engineering scenarios, encompassing end-to-end project delivery, log analysis, and bug troubleshooting. It scored 56.22% on the SWE-Pro benchmark, placing it on par with leading models, and achieved 57.0% on Terminal Bench 2, which assesses deep understanding of complex engineering systems. Its capabilities extend to full project delivery, scoring 55.6% on VIBE-Pro, indicating it can handle diverse requirements from web to mobile development. This is not just code generation; it is about understanding and navigating production systems. For instance, M2.7 can correlate monitoring metrics, perform causal reasoning, and even proactively connect to databases for root cause analysis during live debugging. MiniMax claims this has reduced incident recovery time in production systems to under three minutes on multiple occasions.
What truly sets M2.7 apart is its native support for Agent Teams, enabling multi-agent collaboration. This capability demands that the model manage role boundaries, engage in adversarial reasoning, adhere to protocols, and differentiate behaviors—aspects that go beyond simple prompting. In these scenarios, the model maintains a stable role identity, proactively challenges teammates' logic, and makes autonomous decisions within intricate state machines. This signifies a move toward more sophisticated, collaborative AI systems.
Beyond Code: Professional and Emotional Intelligence
M2.7's advancements aren't confined to software development; it also exhibits enhanced domain expertise and task delivery in professional office environments. In the GDPval-AA evaluation, which measures domain expertise across 45 models, M2.7 achieved an ELO score of 1495. This makes it the highest-ranked open-source model, surpassing GPT-5.3 and trailing only Opus 4.6, Sonnet 4.6, and GPT-5.4, according to MarkTechPost. The model can handle complex editing in Office Suite applications—Excel, PowerPoint, and Word—performing multi-round revisions and high-fidelity edits.
The model's ability to interact with complex environments is further evidenced by its 97% skill adherence rate while working with over 40 complex skills, each exceeding 2,000 tokens, during MM Claw testing. This means it can flexibly adapt to varied contexts and follow instructions reliably over extended interactions. Moreover, M2.7 showcases improved character consistency and emotional intelligence. MiniMax highlights that this opens doors for innovative product applications beyond pure productivity, such as interactive entertainment experiences where AI characters proactively engage with their environment. The company demonstrated this with OpenRoom, an open-sourced interaction system.
The Mechanics of Self-Evolution
MiniMax built an internal workflow that allows the M2-series models to self-evolve, pushing the boundaries of agentic capabilities. This research agent harness works with different project groups, supporting data pipelines, training environments, and cross-team collaboration. Researchers guide the system, which autonomously handles literature review, experiment execution, monitoring, debugging, and code fixes. M2.7 can manage 30-50% of this workflow.
The model's capacity to recursively evolve its own harness is crucial. It collects feedback, builds evaluation sets, and continuously iterates on its architecture, skill implementation, and memory mechanisms. For example, M2.7 autonomously optimized a model's programming performance through over 100 iterative rounds of "analyze failure trajectories → plan changes → modify scaffold code → run evaluations → compare results → decide to keep or revert changes." This process led to a 30% performance improvement on internal evaluation sets by discovering effective optimizations like systematic sampling parameter searches and improved workflow guidelines.
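The keep-or-revert loop described above resembles a simple hill-climbing search over scaffold configurations. The sketch below illustrates the pattern under stated assumptions: `evaluate` and `propose_change` are hypothetical stand-ins for MiniMax's internal evaluation suites and trajectory analysis, not their actual tooling.

```python
import random

def evaluate(scaffold):
    """Stand-in scorer: higher is better. A real harness would run
    an evaluation suite against the modified scaffold."""
    return sum(scaffold.values())

def propose_change(scaffold):
    """Stand-in for 'analyze failure trajectories -> plan changes':
    perturb one parameter of a candidate copy."""
    candidate = dict(scaffold)
    key = random.choice(list(candidate))
    candidate[key] += random.uniform(-1.0, 1.0)
    return candidate

def optimize(scaffold, rounds=100):
    best_score = evaluate(scaffold)
    for _ in range(rounds):
        candidate = propose_change(scaffold)   # modify scaffold code
        score = evaluate(candidate)            # run evaluations
        if score > best_score:                 # compare results
            scaffold, best_score = candidate, score  # keep the change
        # otherwise the change is reverted (old scaffold is kept)
    return scaffold, best_score

random.seed(0)
final, score = optimize({"sampling_temp": 0.0, "retry_budget": 0.0})
```

The key design point the article implies is that every modification is gated by evaluation before it is kept, which prevents the self-modifying loop from drifting away from measured performance.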
In preliminary tests for fully autonomous AI self-evolution in low-resource scenarios, M2.7 participated in 22 machine learning competitions on the MLE Bench Lite. Guided by a simple harness with short-term memory, self-feedback, and self-optimization modules, M2.7's trained ML models continuously improved. One run achieved 9 gold, 5 silver, and 1 bronze medal, with an average medal rate of 66.6% across three trials. This result ties with Gemini-3.1 and is second only to Opus-4.6 (75.7%) and GPT-5.4 (71.2%).
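The "simple harness" described above pairs a bounded short-term memory with self-feedback and self-optimization steps. A minimal sketch of that structure follows; all class and method names here are illustrative assumptions, not MiniMax's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class ShortTermMemory:
    """Bounded buffer of recent notes the agent can condition on."""
    window: int = 5
    entries: list = field(default_factory=list)

    def add(self, note: str):
        self.entries.append(note)
        self.entries = self.entries[-self.window:]  # keep only recent context

@dataclass
class Harness:
    memory: ShortTermMemory = field(default_factory=ShortTermMemory)
    best_score: float = float("-inf")

    def self_feedback(self, score: float) -> str:
        # Summarize the last attempt so the next one can build on it.
        verdict = "improved" if score > self.best_score else "regressed"
        self.memory.add(f"attempt scored {score:.3f} ({verdict})")
        return verdict

    def self_optimize(self, score: float):
        # Retain the best result seen so far (stand-in for keeping
        # the best trained-model artifact across competition attempts).
        if score > self.best_score:
            self.best_score = score

h = Harness()
for s in [0.41, 0.55, 0.52, 0.61]:
    h.self_feedback(s)
    h.self_optimize(s)
```

In this framing, continuous improvement falls out of the interaction of the modules: feedback labels each attempt, memory carries those labels forward, and optimization keeps only the best artifact.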
What This Means for Frontier AI
The release of MiniMax M2.7 underscores a significant trend toward more autonomous and capable AI systems. By demonstrating self-evolution and top-tier performance in demanding areas like software engineering and professional tasks, MiniMax positions itself as a key player in the global AI landscape. This development suggests a future where AI models can not only execute complex instructions but also learn, adapt, and improve themselves with less human intervention, potentially accelerating the pace of innovation across various industries.
However, the rapid advancement of Chinese AI companies, including MiniMax, has also raised concerns among some Western counterparts. Companies like OpenAI and Anthropic have accused Chinese firms of "adversarial distillation"—extracting capabilities from their cutting-edge models to gain an advantage in the global AI race, according to the Los Angeles Times. Anthropic notably blocked Chinese-controlled companies from using its Claude chatbot model last year and identified MiniMax, among others, as illicitly extracting capabilities. This adds a layer of competitive tension to the impressive technical achievements demonstrated by M2.7.