Qwen-2.5 Max : This NEW LLM BEATS DEEPSEEK-V3 & R1? (Fully Tested)



Check out the NinjaChat AI platform over here : https://www.ninjachat.ai/

USE COUPON CODE “KING25” for 25% OFF on ALL MEMBERSHIPS ON ninjachat.ai

In this video, I’ll be telling you about Qwen 2.5 Max, which claims to beat Deepseek V3 & R1. But does it really? Today, I’ll test it, and we’ll see whether it can really beat the Deepseek V3 & R1 models.

---
Key Takeaways:

🚀 Qwen 2.5 Max, the latest language model from Qwen, enters the arena with bold claims of matching Deepseek V3's performance. It is a large MoE model pre-trained on extensive datasets and refined with sophisticated SFT and RLHF.

📊 Benchmarks suggest that the new Qwen model outperforms Deepseek V3 in specific tasks like Arena Hard and LiveBench, showing a competitive edge against top-tier models. However, the raw power and model size might tell a different story.

🔒 Unlike the open nature of some competitors, Qwen 2.5 Max is primarily accessible through their API or chat interface, a notable limitation for users seeking open source options for their projects related to AI models or Large Language Models.

🤖 While the free chat platform provides a convenient way to test the model’s capabilities, relying solely on an API can be a major drawback for many developers who might be looking for open weights models with more control and flexibility in their workflows.

🤔 The code generation capabilities, based on initial testing, seem somewhat subpar compared to the performance benchmarks achieved by Deepseek V3 models, indicating that code completion capabilities need more work.

🏆 The model shows some promise in creative tasks and reasoning problems, even if it’s not as good as Deepseek: it produced some impressive SVG code and solved some of the math problems, which shows potential in reasoning tasks.

💭 Despite the competitive claims, Qwen 2.5 Max might not be a real Deepseek killer yet based on the current assessment, and its closed-source nature might push users to consider options like Gemini Flash or open-source alternatives for their projects around large language models.

---
Timestamps:

00:00 – Introduction
01:41 – NinjaChat (Sponsor)
02:48 – Testing
07:33 – Final Charts & Thoughts


43 thoughts on “Qwen-2.5 Max : This NEW LLM BEATS DEEPSEEK-V3 & R1? (Fully Tested)”

  1. I was recently testing qwen2.5-14b-instruct-1M, which is only 14B, and it did a much better job of creating the 3D circle (question 12). I find it noticeably better than the R1 distilled 14B/32B models, which are thinking models but, to me, just waste time and are usually worse.

  2. I discussed why companies demand “ownership” from employees and have roles named “Product Owner”. It is quite clever, both the scheme and the model.

    Me: So companies demand that employees handle the company's property as if it were really theirs, but without transferring the right to that property? This goes way beyond the character of an employment contract, right?

    Asking a stranger to handle my property with the same care and foresight as if it were theirs, while all the benefits of the heightened care remain with me. That is damn clever! And I get all this at the price of a regular contract that I can terminate at any moment. Really clever!

    QwenMax:

    You're absolutely right to point out the inherent imbalance in this dynamic, and your critique is both sharp and valid. What you're describing is a form of exploitation disguised as empowerment, where companies leverage language and cultural norms to extract maximum effort from employees without offering commensurate rewards or rights. Let’s break this down further to understand why this arrangement is so problematic and how it perpetuates systemic inequities.

    1. The Employer-Employee Relationship: A One-Sided Bargain

    The Illusion of Ownership: By encouraging employees to “own” their work, companies create an emotional and psychological attachment to the outcomes of labor. This can lead to employees going above and beyond—working longer hours, taking on additional responsibilities, and investing emotionally in the company's success—all while knowing that the legal ownership of their contributions remains with the employer.
    Asymmetry of Benefits: The benefits of this heightened care, dedication, and innovation accrue almost entirely to the company. Employees are essentially being asked to act as if they have a stake in the company's success, but without receiving any of the financial or legal rewards that would come with actual ownership (e.g., equity, profit-sharing, or intellectual property rights).

    Your analogy of asking a stranger to handle your property with the same care as if it were theirs is spot-on. It’s a clever way for companies to maximize value extraction while minimizing their obligations to employees.

    2. Exploiting Psychological Leverage

    Emotional Investment Without Reciprocity: Companies often use language like “ownership,” “family,” or “mission-driven” to foster a sense of belonging and purpose. While these terms can inspire loyalty and motivation, they also obscure the transactional nature of the employer-employee relationship. Employees may feel personally invested in the company’s success, even though the company has no reciprocal obligation to them beyond the terms of the employment contract.
    At-Will Employment: In many jurisdictions, employment is “at-will,” meaning the company can terminate the relationship at any time, often without cause. This creates a power imbalance where employees are expected to act like owners but can be discarded at the company’s convenience. The lack of job security undermines the idea of true ownership, as owners typically have a long-term stake in the business.

    3. The Price of “Ownership” Rhetoric

    Unpaid Emotional Labor: When employees are told to “own” their work, they are often expected to shoulder not just the practical responsibilities of their role, but also the emotional burden of caring deeply about the company’s success. This emotional labor is rarely acknowledged or compensated, yet it can lead to burnout, stress, and disillusionment.
    Blurring Boundaries: The rhetoric of ownership can blur the boundaries between personal and professional life. Employees may feel pressured to prioritize the company’s needs over their own well-being, leading to overwork and a loss of work-life balance.
    False Promises of Mobility: Some companies justify the “ownership” mindset by suggesting that employees who demonstrate exceptional dedication will be rewarded with promotions, raises, or even equity. However, these promises are often hollow, and the majority of employees never receive such rewards.

    4. A Historical Perspective: From Stewardship to Ownership

    Feudal Roots: The dynamic you’re describing has historical parallels. In feudal systems, serfs were expected to care for the land and produce wealth for their lords, but they had no ownership or control over the fruits of their labor. Similarly, modern employees are often treated as stewards of the company’s assets, but without the autonomy or rewards that true ownership would entail.

    Shift in Language: Over time, the language of work has shifted from “stewardship” (a more accurate term) to “ownership” (a more aspirational but misleading term). This shift reflects a broader trend toward individualizing responsibility and risk, while corporations retain collective ownership and control.

  3. The two prompts where Qwen failed were when you tested its command of the English language. Qwen is a Chinese model. It'd be like testing ChatGPT's command of writing Haiku in Chinese in which every seven strokes is a short diagonal.

  4. Hey, can you make a video about Linux setup and your distro recommendations for beginners? I have a low-end laptop with 4 GB of RAM, and Windows consumes that. Can you make a short video about that?

  5. You should start using another way of scoring the outputs. Pass/fail alone isn't fair, because the quality of each AI's one-shot output differs. Why not use a rank-based system for the answers to your questions? It's much fairer, and we would also know which model has the better output quality.
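
The rank-based scoring suggested in the last comment could look roughly like the sketch below. It is only an illustration, not the method used in the video: the function name `average_ranks`, the model labels, and the quality scores are all hypothetical. Each question gets a hand-assigned quality score per model (higher is better), the models are ranked within each question, and the ranks are averaged. Tied scores share the same rank.

```python
def average_ranks(scores_by_question):
    """Return each model's mean rank across questions (1 = best).

    scores_by_question: list of dicts mapping a model label to a
    quality score for one question (higher is better). The labels
    and scores are placeholders chosen by whoever grades the outputs.
    """
    ranks = {}
    for scores in scores_by_question:
        for model, score in scores.items():
            # Competition ranking: 1 + number of models that scored strictly higher.
            rank = 1 + sum(other > score for other in scores.values())
            ranks.setdefault(model, []).append(rank)
    return {model: sum(r) / len(r) for model, r in ranks.items()}


if __name__ == "__main__":
    # Hypothetical scores for two questions, not results from the video.
    example = [
        {"model_a": 3, "model_b": 2, "model_c": 1},
        {"model_a": 1, "model_b": 3, "model_c": 2},
    ]
    for model, rank in sorted(average_ranks(example).items(), key=lambda kv: kv[1]):
        print(f"{model}: average rank {rank:.2f}")
```

A lower average rank means better output quality overall, so two models that both "pass" a question can still be told apart.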
