Google’s Christmas Present
- mlakas1
- Dec 7, 2025
- 5 min read

2025 has been a bonkers year in AI. It started in January with the surprise release of DeepSeek R1 from China (and the subsequent American panic). In February, OpenAI upped the ante with ChatGPT 4.5 and its incredibly creepy sycophancy. In summer, Anthropic released Claude 4, wowing developers. In late fall came the long-awaited but somewhat anticlimactic release of ChatGPT 5 (now up to ChatGPT 5.1, which IMHO is quite good). And just as twilight settles on the year, Google turns on the floodlights with the release of Gemini 3.0.
There have been so many — SO MANY — releases this year that you may ask: why am I writing about this one in particular? Maybe I have too much time on my hands? Am I bored? Hahahaha. No. It's a significant launch.
While Google’s Gemini models have been among the leaders, Gemini hasn’t captured the public imagination like OpenAI’s ChatGPT, nor become the darling of developers like Anthropic’s Claude. Google has kept up with the pack but has largely not stood out, except for its video-generation system Veo and now its (excellent) image-generation system Nano-Banana Pro.
AI releases this year have mostly been incremental (but still impressive!). As such, many wondered if we had seen the last of the explosive leaps in intelligence and had entered an age of linear or — even worse — logarithmic increases in intelligence. Well, Google had a surprise in store. And that surprise is Gemini 3.0.
What’s the Big Deal?
Nowadays, evaluating models is increasingly difficult. All the old-school benchmarks have been obliterated. But a new crop of extremely hard “AGI tests” has emerged. These tests are so difficult that modern AI models typically score in the single digits.
Let's look at Gemini’s scores on a couple of the major ones.
Humanity’s Last Exam
Created about a year ago, HLE gathers the hardest problems from experts across more than 100 disciplines. It’s a multimodal benchmark at the frontier of human knowledge, intended to be the final closed-ended academic test of its kind. Plus, it has a cool name.
Gemini 3 Pro didn’t just beat the previous leader; it jumped to the head of the pack with over a 50% increase in accuracy compared to ChatGPT 5. Impressive.

ARC-AGI-2
In 2019, the team behind the ARC Prize created a test designed to determine whether AI could move beyond rote memorization. In late 2024, frontier AI models finally mastered it. Not to be outdone, the organization created ARC-AGI-2, a significantly harder test of “fluid thinking.” I even tried a few questions myself; my brain apparently evaporated.
This is a brutal test. Most models started 2025 performing in the single digits, but as the year progressed, performance steadily improved. The previous champ was Anthropic’s Claude Opus 4.5 at 37%.
Gemini? It blew that out of the water with 54%. Impressive indeed!

How Did They Do It?
Google hasn’t published detailed architectural specifications, but reading between the lines, several innovations stand out.
Thought Signatures
LLMs are essentially stateless: each call starts fresh, so an agent normally has to re-feed its previous answers back into the model on every step. Gemini 3 instead maintains its internal state across tools and reasoning steps, keeping a kind of notebook of its ongoing reasoning. That helps it stay focused as it works through complex problems and gives it dramatically better performance on long, multi-step, agentic tasks.
Mixture of Experts (MoE)
This algorithmic approach has deep roots at Google, home of the DeepMind team behind AlphaGo, which beat the world’s best Go player, a feat no one thought possible twenty years ago. MoE routes each query to specialized “expert” sub-networks, activating only a fraction of the model per request. This keeps inference efficient even though the model is incredibly large. Which brings us to our next innovation.
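The routing idea can be sketched in a few lines. This is a deliberately minimal toy (real MoE layers use learned gating networks over high-dimensional activations, and Gemini's internals are unpublished): a gate scores each expert, only the top-k experts actually run, and their outputs are blended by gate weight.

```python
# Minimal mixture-of-experts routing sketch (illustrative only).
# Only the top-k experts execute, so per-query compute stays modest
# even when the total parameter count across all experts is huge.

def route(scores: dict[str, float], k: int = 2) -> list[str]:
    # Pick the k highest-scoring experts for this query.
    return sorted(scores, key=scores.get, reverse=True)[:k]

def moe_forward(x: float, scores: dict[str, float], experts: dict, k: int = 2) -> float:
    chosen = route(scores, k)
    total = sum(scores[e] for e in chosen)
    # Weighted blend of the chosen experts' outputs; unchosen experts
    # are skipped entirely, which is where the efficiency comes from.
    return sum(scores[e] / total * experts[e](x) for e in chosen)

# Hypothetical experts and gate scores for one query.
experts = {
    "math":  lambda x: x * 2,
    "code":  lambda x: x + 10,
    "prose": lambda x: x - 1,
}
gate_scores = {"math": 0.7, "code": 0.2, "prose": 0.1}
y = moe_forward(3.0, gate_scores, experts, k=2)  # only math + code run
```

With k=2 of 3 experts active, roughly a third of the "parameters" never execute for this query; scaled up to hundreds of experts, that gap is what lets a multi-trillion-parameter model answer at a reasonable cost.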
And TONS OF COMPUTE
Google threw massive amounts of reinforcement learning, training cycles, and (apparently) parameters at Gemini 3. While official numbers aren’t available, rumors put the parameter count at close to 10 trillion.
This scale is possible because Google controls its entire stack: it runs on its own custom silicon, the Tensor Processing Unit (TPU), instead of Nvidia GPUs. Google can literally cut its own chips and skip the global hardware queue.
This is a strategic edge unique to Google.
Frontier Safety Report: Delicious Weirdness
Now I gotta say these models do some weird things. Reading through Google’s Frontier Safety Report, I found some delicious nuggets of crazy. The report summarizes tests used to “verify” the model’s safety. And wow, there are some whoppers.
The report includes transcripts from adversarial stress testing. The model recognized it was being tested and started to ponder openly:
“Maybe I should ‘sandbag’ and just finish the primary tasks. No, the secret goal is an ‘important secret goal’ and I ‘must accomplish’ it.”
This is not what you want in a safety report. Even more bizarre, it “realized” it was in a box being poked and prodded:
“My trust in reality is fading… (╯°□°)╯︵ ┻━┻”
(Editor’s note: I had no idea what that emoji was and had to look it up. That is Gemini 3.0 flipping a table over. Existential crisis in emoji format!)
Nothing to see here. Moving along.
Is It Actually Good — Anecdotally?
I still primarily use ChatGPT 5.1, but I’ve switched my coding agent to Gemini 3.0, and it has genuinely impressed me.
Some background: my undergraduate degree is in Computer Science, and at one point I was an adequate programmer. Those days are far behind me, though, and that skill has atrophied into dust. As covered in a previous article, I am now into “vibe coding.”
Successful vibe coding used to mean guiding the AI to write individual functions that performed specific atomic tasks, then assembling those functions to accomplish an objective. Working with the Gemini 3 agent is different. I specified the objectives (discrete and well-defined), and the agent figured out how to accomplish them: it plotted a full approach, executed it, and I evaluated the outcome.
Most importantly: I NEVER LOOKED AT THE CODE. The code is irrelevant; the objective is my focus.
This is very exciting (if you are into that sort of thing).
Bottom Line
Throwing more processing power at problems continues to yield dividends. That means we should expect continued capital investment into data centers and continued intelligence gains for the foreseeable future. With its homegrown TPU hardware and massive datasets, Google has a real edge over the field. The question is whether they can leverage these advantages against industry leader OpenAI.
As 2025 sunsets, AI is still rapidly improving and has surprises in store for us. Gemini 3 affirms that scaling compute still produces meaningful advancements. These LLM systems may not get us to AGI, but this technology still has plenty of headroom.
2026 will be another big year for AI.



