At a Glance

Name: Overthinking AI? Comparing Top LLMs on Reasoning
Price: Free
Rating: 4.9 (20 reviews)
Author: antonio_cangiano, kunal_makwana, anes_khadiri, faranak_heidari, karan_goswami

OpenAI’s o3-mini, DeepSeek-R1, and IBM’s Granite-3.2 are redefining problem-solving with logical thinking. This guided project puts their reasoning to the test with classic riddles, revealing their strengths, weaknesses, and tendencies to overthink. Compare responses from different models: which ones solve puzzles efficiently, and which get lost in details? Observe how prompt instructions shape reasoning, from concise answers to step-by-step breakdowns. Gain insight into how these models reason and how they balance clarity against complexity.

How Many R’s Are Actually in Strawberry? AI Tries to Reason It Out

You’d think counting the number of R’s in “strawberry” is an easy task, right? Well, ask an AI, and you might get a surprising variety of answers—some correct, some overcomplicated, and some that make you question reality itself.
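For reference, the ground truth is easy to check by hand or with a few lines of Python. The snippet below is only an illustrative sketch and is not part of the lab itself, which requires no programming; the variable names are chosen here for the example.

```python
# Ground truth for the riddle, computed directly.
word = "strawberry"
letter = "r"

# str.count is case-sensitive, so normalize both sides first.
count = word.lower().count(letter.lower())

# 1-based positions of each match, handy for checking a model's
# letter-by-letter reasoning against the actual spelling.
positions = [
    i for i, ch in enumerate(word.lower(), start=1) if ch == letter.lower()
]

print(count)      # 3
print(positions)  # [3, 8, 9]
```

Having the positions as well as the total makes it easier to spot where a model's step-by-step answer goes wrong, for example by skipping one of the two adjacent R's.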

Modern AI reasoning models, like OpenAI’s o3-mini, DeepSeek’s R1, and IBM’s Granite-3.2-8B-Instruct-Preview, are designed to excel in logical inference, math, and real-time decision-making. But how well do they handle classic riddles and puzzles? Do they crack them effortlessly, or do they overthink themselves into oblivion?

In this guided project, we’ll put AI’s reasoning to the test—throwing tricky puzzles at different models to see which ones get it and which ones take a wild detour. Can AI solve riddles efficiently, or will it get tangled in its own logic like a confused detective overanalyzing a case? Let’s find out!

A Look at the Project Ahead

Through this experiment, you’ll:
✅ Compare AI reasoning styles by testing multiple models on logic puzzles.
✅ Observe overthinking vs. direct solutions, identifying when extended reasoning helps and when it hinders.
✅ Experiment with prompt engineering to guide AI reasoning effectively.

Comparing o3-mini and DeepSeek-R1-Distill-Llama-70B in Generative AI Classroom

Some models might instantly recognize the trick. Others might overcomplicate the question, dissecting each word before arriving at an answer. By analyzing their approaches, you’ll learn how AI handles reasoning and where it still struggles.
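One way to prepare for the prompt-engineering part of the experiment is to write the same riddle with different instruction styles ahead of time, then paste each variant into the models being compared. The sketch below is a hypothetical helper, not something the lab provides: the names `RIDDLE`, `STYLES`, and `build_prompt` and the instruction wording are assumptions made for illustration, and no model API is called.

```python
# The riddle used throughout this project description.
RIDDLE = 'How many times does the letter "r" appear in the word "strawberry"?'

# Hypothetical instruction styles; the wording is illustrative only.
STYLES = {
    "concise": "Answer with a single number and no explanation.",
    "step_by_step": (
        "Spell the word letter by letter, count each match, "
        "then state the final number."
    ),
}


def build_prompt(question: str, style: str) -> str:
    """Combine a riddle with one of the instruction styles above."""
    if style not in STYLES:
        raise ValueError(f"unknown style: {style!r}")
    return f"{question}\n\n{STYLES[style]}"


# Print each variant so it can be copied into the side-by-side comparison.
for name in STYLES:
    print(f"--- {name} ---")
    print(build_prompt(RIDDLE, name))
```

Keeping the question text fixed and changing only the instruction makes it easier to attribute differences in the responses to the prompt style rather than to the riddle itself.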

What You'll Need

Before starting, make sure you have:
🔹 Access to a browser to run the Generative AI Classroom lab, where you can easily compare models side by side.
🔹 Basic understanding of logic puzzles (no programming required!).
🔹 Curiosity to explore AI’s strengths and quirks in solving problems.

Key Takeaways: What You’ll Learn

🔍 How AI Thinks – Compare different models’ approaches to logic puzzles.
🧠 Overthinking vs. Clarity – Learn when detailed reasoning is beneficial and when it leads to unnecessary complexity.
📊 AI Strengths and Weaknesses – Discover where AI excels and where it struggles with common sense and probabilistic thinking.

By the end of this project, you’ll have a hands-on understanding of how modern AI models reason, where they overcomplicate things, and how to refine their responses with better prompts.

Final Thought

AI reasoning isn’t just about getting the right answer; it’s also about how the model arrives at it. While some models, like DeepSeek-R1, excel at step-by-step logic, they sometimes hesitate to commit to a conclusion due to biases in their training. Others may oversimplify or get tangled in unnecessary details.

This project will help you appreciate the quirks of AI reasoning and recognize why tracking logical problem-solving is key to advancing AI’s real-world applications. So, let’s dive in, put these models to the test, and see which ones can outthink a brain teaser—and which ones need a reboot! 🚀

Offered By: IBMSkillsNetwork