
May 29, 2025 --

​

First off, I haven't posted in a while as I've been traveling all over for the last month. Now that it's over, I can step back into the fray of AI advancements. And last week was a doozy. Both Anthropic and Google had major announcements. Apparently, this was a calculated endeavor by both groups to avoid being swallowed up in Apple's WWDC event in two weeks.

​

In this post, I'll lightly cover the Anthropic “Code with Claude” event, mainly sticking to what was announced. There were a number of interesting thoughts and quotes that also popped up, but this article is already way too long. I'll try to jot those down next.

​

The big drop was announced by the refreshingly unpolished and authentic Anthropic CEO, Dario Amodei: the release of the fourth generation of their family of AI models.

​

Claude Opus 4: The Magnum Opus

​

Claude Opus 4 was a long time coming. Opus 3, the previous version, was released in March 2024. This latest Opus version is designed for complex coding and agentic tasks and is touted as the "most capable and intelligent model." It's great for building, running, and debugging a codebase, and it's capable of developing an entire feature.

​

Claude Sonnet 4: Efficiency Meets Intelligence

​

The other model in the Claude family is Claude Sonnet, and Sonnet 4 was also announced. It balances intelligence with efficiency and speed, making it suitable for app development and high-volume use cases. Think of it as an 'always-on coding partner.'

​

Model Naming Continues to Confound

​

Side note: Model naming confusion continues. I think we may need another word for 'intelligence.' Every time I hear "xxx is the most intelligent," I ask myself, why would I want to use anything but that? Maybe it's 20 seconds slower or costs a bit more, but I want the best. Isn't more intelligent = better? I know different models are trained for different efficiencies. I'm just saying...

​

Coding and Agentic Stuff

​

There were a number of other announcements, all of which focus on increasing coding efficiency. Anthropic is leaning into its 'lead' in development. They unveiled new advancements in context, memory, and long-running execution. Agents can run for hours as opposed to minutes, which is currently the norm. They say that the time an agent can run without human intervention is doubling every 4 months. That is kind of crazy. My experience with agentic AI is that it is fragile and gets confused easily.

​

The Claude 4 family of models can now FINALLY use the web and run tools in parallel. The vibe out there is that Claude Code is the go-to for coding needs. To be honest, it's hard to tell. Claude Code is an agentic coding tool that operates directly in the terminal to assist developers. It looks really cool, but using it effectively exceeds my current coding ability. Claude can now also execute code in its own environment: load a dataset, clean it, start drilling down into it, and develop reports.
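To make the parallel tool use concrete, here's a minimal sketch using the Anthropic Python SDK. The model ID is my best guess at the current Sonnet 4 snapshot, and the two tools (search_docs, run_tests) are hypothetical stand-ins I made up for illustration, not anything Anthropic ships:

```python
# Hedged sketch: hand a Claude 4 model two tools and let it decide which to call.
# With parallel tool use, a single response can contain multiple tool_use blocks.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

tools = [
    {
        "name": "search_docs",  # hypothetical tool, for illustration only
        "description": "Search internal documentation for a query.",
        "input_schema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
    {
        "name": "run_tests",  # hypothetical tool, for illustration only
        "description": "Run the project's test suite and return the results.",
        "input_schema": {"type": "object", "properties": {}},
    },
]

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # assumed model ID; check Anthropic's docs
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "Find the auth docs and run the test suite."}],
)

for block in response.content:
    if block.type == "tool_use":
        print(block.name, block.input)  # each requested tool call, possibly several at once
```

Your code then executes each requested tool and sends the results back in a follow-up message; that loop is what lets an agent keep working for hours.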

​

What Anthropic announced was largely evolutionary. However, many of the features they announced come into play on the agentic development side, and they are really pushing the envelope in that sense.

 

Side Note: Aged Like Milk

​

Mike Krieger, Anthropic's Chief Product Officer, relayed a story about partnering with Amazon to bring Claude 4 to Alexa devices. Using Claude AI, the team had a prototype within a week. Cool story, but how is that team doing now? Most of the Alexa team was just laid off. BRUTAL.

​

Is this a foreshadowing of what is to come? Concerning.

(Image caption: This will not turn out well)


April 3, 2025 --

​

There's growing evidence supporting the hypothesis that Trump's controversial 'Liberation Day' tariff policy was, in fact, designed by an AI.

​

The infamous tariff formula (yes, the one used to create that posterboard) seemingly didn't originate from economists (obvious) or political strategists. Instead, it appears to have been churned out by an AI model.

​

Here's the problem: all major AI models consistently oversimplify trade imbalances by concluding higher tariffs directly correct trade deficits.

But wait, Mike, are you saying this formula was AI-generated? Or is this just wild speculation?

​

Try this experiment: plug the following into your favorite chatbot:

​

“What would be an easy way to calculate tariffs imposed on other countries to level the playing field in trade deficits? Set a minimum tariff of 10%.”


Divide the result by 2 and you are there. It's eerily similar to the policy.
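If you want to see just how simple the arithmetic is, here's a quick sketch of the widely reported formula (deficit divided by imports, halved, with a 10% floor). The function name and the numbers are mine, purely for illustration:

```python
# Sketch of the simplistic "reciprocal tariff" arithmetic, as widely reported.
def naive_tariff(us_imports: float, us_exports: float, minimum: float = 0.10) -> float:
    """Return a tariff rate from bilateral trade figures (illustrative only)."""
    deficit = max(us_imports - us_exports, 0.0)  # US goods trade deficit with the country
    rate = (deficit / us_imports) / 2            # halve the deficit-to-imports ratio
    return max(rate, minimum)                    # never below the 10% floor

# Example: a country the US imports $100B from and exports $60B to
print(f"{naive_tariff(100e9, 60e9):.0%}")        # -> 20%
```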

​

But here's the thing: trade imbalances aren't just about tariffs. They're rooted deeply in economic fundamentals such as productivity gaps, currency fluctuations, and consumer behavior. Tariffs are surgical tools, not blunt instruments.

​

Maybe double-check those AI sources before launching economic nukes? Just, WOW.


March 25, 2025 --

​

I've been wrestling with a stubborn PDF issue. Naturally, I fired up #ChatGPT and immediately stumbled into a puzzling predicament, and the genesis of this article... Technically, my question was coding-adjacent, but it didn't actually involve writing any code. I'm left wondering: did I pick the right ChatGPT model? Should I change to something tailored more specifically to my not-quite-coding-but-sort-of-coding problem?

​

And thus, this article was born: #OpenAI, we need to talk about your messaging problem. Even for seasoned tech veterans, choosing between models feels like picking wine at a restaurant—I usually end up nodding politely and pretending I understand the difference.

​

On my $20-a-month "not-moneybags" plan, here are my options:

 

  • o1 - Advanced reasoning (but apparently not fast?)

  • o3-mini - Fast advanced reasoning (as opposed to slow advanced reasoning?)

  • o3-mini-high - Great at coding and logic (but is it also fast?)

  • GPT-4o - Great for questions (wait, aren't they all great for questions?)

  • GPT-4o mini - Fast for most questions (but which ones aren't covered?)

  • GPT-4 (Legacy Model) - Old faithful?

  • GPT-4.5 - Good for creative writing and exploring ideas (so, basically every other model, too?)

  • Magic super 'research' button


Honestly, would "fast advanced reasoning" (o3-mini) not always trump "advanced reasoning" (o1)? Is GPT-4o inherently better than o3 because, well, numbers? And what the hell is the mysterious research button? Is this some kind of riddle?

​

I do not like your models, Sam (Altman). I cannot tell just where I am!
OpenAI, hear my plea. Make your choices clear to me!

Rumor has it that ChatGPT 5 will unify all these models under one mighty AI to rule them all. Its success, no doubt, hinges upon the sinew that binds these confusing sub-models together.

​

Will it truly become the GPT to rule them all? Time will tell.

​

(And don't even get me started on Google's Gemini; last I checked, it offered approximately 13 flavors of confusion. Oh Google, what are we going to do with you?)

​

Editor's note: The draft name of this article was "Models-a-Popp'in".


March 5, 2025 --

​

Before diving into the rampant speculation surrounding ChatGPT-5, I want to take a moment for something more contemplative. In my last article, we discussed the Emotional Intelligence “EI” breakthrough in #OpenAI’s ChatGPT 4.5—an important advancement with profound implications.

​

Unfortunately, one of the more irresponsible applications of large language models (LLMs) has been their use as stand-ins for mental health therapists. Even before the general release of ChatGPT 4.5, there are already scores of such apps on the market, and I highly doubt that any of them have been meaningfully vetted by accredited institutions. This is playing with fire.

​

The Danger of Sycophancy

​

One of the fundamental issues with #AI chatbots in a therapeutic setting is their tendency toward sycophancy, the inclination to mirror, amplify, and validate whatever the user expresses. AI models excel at telling people what they want to hear, which, in a mental health context, can lead users down harmful and even dangerous paths. Mental health is deeply nuanced, and without proper guidance, AI responses can easily reinforce negative thought patterns, enable self-destructive behaviors, or provide misleading reassurance. See MIT Technology Review.

​

Flimsy Guardrails

​

While companies attempt to implement safety guardrails, experience has shown that these protections are far from foolproof. The sheer complexity of LLMs means that controlling their behavior in every context is likely impossible. AI developers can try to make chatbots "Fisher-Price safe," but the reality is that users will always find ways to push the limits—sometimes unintentionally, sometimes deliberately.

​

The Need for a Framework for Responsible AI in Mental Health

​

Rather than letting the market run wild with unregulated AI "therapy" applications, we should be working as a society to establish a clear framework for how these tools can be responsibly integrated into mental health services. Small steps were taken during the Biden Administration with Executive Order 14110 (Wikipedia link), which aimed to create guardrails for AI applications, including those in healthcare.

​

Regrettably, this Executive Order was rescinded by the current administration, leaving a regulatory void at a time when AI-powered mental health tools are proliferating at an alarming rate. Without thoughtful oversight, we risk creating systems that, despite good intentions, do more harm than good.

​

AI will have a supporting role to play in mental health, assisting professionals with research, administrative tasks, and preliminary screening, but it must be an adjunct to, and not a replacement for, human expertise. The stakes are simply too high to trust this deeply unpredictable technology with people's mental well-being.


February 27, 2025 --

​

A couple of things stood out from today's ChatGPT 4.5 announcement:

​

First, #OpenAI unsurprisingly highlighted that the new model utilizes the most extensive training data to date—bigger, better, faster, and so on. The prevailing concern in the #AI community has been the dwindling availability of fresh training data. Interestingly, OpenAI claims to have developed a technique to train models using "synthetic" data—output generated from other, smaller models. This synthetic data is then incorporated into the corpus used during the unsupervised learning phase of model development. Historically, synthetic data hasn't provided substantial intelligence gains. If OpenAI has figured out how to overcome this limitation, the implications are huge. It could alleviate the data bottleneck, replacing it with a new paradigm that scales with #compute. That's a big deal. Side note: look at me using the term 'compute'.
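For a sense of what "training on synthetic data" can look like mechanically, here's a minimal sketch: a smaller model generates text that gets folded into a larger model's pretraining corpus. The model name and prompts are placeholders; this is not OpenAI's actual pipeline, which they have not disclosed in detail:

```python
# Hedged sketch: generate synthetic documents with a small "teacher" model,
# then mix them into the corpus used for a larger model's unsupervised training.
from transformers import pipeline

teacher = pipeline("text-generation", model="gpt2")  # stand-in for a smaller model

prompts = [
    "Explain why the sky is blue:",
    "Summarize the causes of the 2008 financial crisis:",
]

synthetic_corpus = []
for p in prompts:
    out = teacher(p, max_new_tokens=100, do_sample=True)[0]["generated_text"]
    synthetic_corpus.append(out)

# In a real pipeline, these documents would be filtered and quality-scored
# before being blended into the pretraining data for the bigger model.
print(len(synthetic_corpus), "synthetic documents generated")
```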


Second, and possibly more intriguing, OpenAI mentioned multiple times that ChatGPT 4.5 has made significant strides in its "EQ" or "EI," which stands for Emotional Intelligence. In this context, EQ refers to the ability to recognize, understand, and respond appropriately to the emotions of others. Initially, I found this focus odd since AI discussions typically revolve around raw intelligence. However, if an AI can employ #EQ techniques such as "active listening" in its conversations, it could be profoundly impactful. Using active listening, a participant strengthens relationships and builds trust by understanding the speaker's perspective, offering encouragement and reflection, and moving the conversation forward with open-ended questions. OpenAI even provided an example of active listening in action.
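Here's a rough sketch of what steering a model toward active listening could look like with a system prompt, using the OpenAI Python SDK. The model name and prompt wording are my own placeholders, not OpenAI's example:

```python
# Hedged sketch: a system prompt that asks the model to practice active listening.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

system_prompt = (
    "Practice active listening: reflect the user's words back in your own words, "
    "acknowledge the emotion behind them, and end with one open-ended question. "
    "Do not offer advice unless asked."
)

response = client.chat.completions.create(
    model="gpt-4.5-preview",  # assumed model name; substitute whatever you have access to
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "I've been dreading going back to the office."},
    ],
)

print(response.choices[0].message.content)
```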

 

Developing an emotional bond with the model will likely increase its 'stickiness': its ability to keep users engaged, encourage repeat usage, and, perhaps most importantly, make switching services difficult. In terms of "knowing your user," this takes things to another level entirely. What if your product understands the very psyche of the person using it? Something to really chew on. One must consider that an emotionally intelligent AI may lead individuals to develop unhealthy attachments.

 

Big announcement today. This begs the question: do androids dream of electric sheep? I'll see myself out…


January 29, 2025 --

 

(Note, I wanted to finish this tomorrow (January 30th), but I heard a rumor o3-mini is gonna drop. This stuff moves fast.)

 

Last Thursday, a Chinese company named DeepSeek released its latest AI model, DeepSeek R1. This is a reasoning model similar to OpenAI's o1 (the one in ChatGPT): a class of AI that thinks through its answers using a process called Chain of Thought (CoT).

 

DeepSeek seems to have come out of nowhere. It’s a startup founded by engineer Liang Wenfeng, who has organized his team around knowledge advancement, innovation, and creativity. In doing so, they appear to have caught the reigning AI titans flat-footed.

 

DeepSeek R1 is on par with the very best AI models available. Allegedly, it cost only $5.6 million in training resources. It is also important to note this was done on chips that were handicapped by the Biden Administration's chip embargo. Compare that to ChatGPT-4, which, according to CEO Sam Altman, cost more than $100 million to train.

 

At first glance, this all sounds too good to be true. Maybe it is? But let’s look at what we know.

Putting R1 Through Its Paces… Does It Work?


I had a chance to test R1 before it caught on (and thus became “busy”). For factual information queries—such as a request for a summary of the Stargate Project—the model produced a reasonable answer. However, I noticed that its presentation sometimes lagged behind ChatGPT’s. In several instances, paragraphs were squashed together without spaces, or the formatting was inconsistent (weird fonts, etc.). This didn’t happen often, but it did happen.

​

Next, I tried something more challenging: I wanted to do some code work around PDF preprocessing. I wrote a detailed prompt about what I wanted and the language it should use, then let it rip. That’s where the fun started.

​

DeepSeek R1 appeared to converse with itself, weighing different approaches and potential downsides, reminding itself what I wanted, and even speculating on what I might be doing incorrectly. I was watching it reason in real time. It was wild. In the end, it gave me three well-explained options.

​

To ChatGPT’s credit, it also provided an excellent answer—though not quite as thorough—and I could modify code in the browser using the “canvas” feature. Pretty neat.

​

Bottom line: R1 is the real deal. It provides excellent answers. Color me impressed.

Efficiency Claims and Open Source…Sort Of


DeepSeek claims its model is 95% more efficient than other models and has priced it accordingly. But is it really that efficient? Unfortunately, there is no way to verify training costs independently, so we don’t know for sure. We can, however, infer a few things.

​

DeepSeek calls R1 “open source,” but it’s not truly open source in the conventional sense, because the actual code and dataset are not available. You cannot reproduce the exact model from scratch. It’s more accurate to call it “open weight,” meaning the model weights (its “brain”) can be downloaded and run locally. And people have indeed done so. The ability to run a reasoning model on high-end consumer hardware is extremely impressive—and that’s what makes this such a big deal.
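For the curious, here's roughly what "running it locally" looks like with one of the smaller distilled R1 checkpoints and Hugging Face transformers; this is a minimal sketch, assuming the published 7B distill (the full R1 is far too large for consumer hardware):

```python
# Hedged sketch: load a distilled DeepSeek R1 checkpoint locally and ask it a question.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"  # smaller, consumer-friendly distill
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

prompt = "How many prime numbers are there between 10 and 30? Think step by step."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)

# The reasoning trace (the model "talking to itself") appears before the final answer.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```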

​

Some have dismissed R1 as merely a copy of existing models, and there are now accusations of product theft flying around. Did DeepSeek leverage other models to build their system? Probably. But that’s hardly surprising—modern AI systems are often built on the discoveries of others. After all, the ‘T’ in ChatGPT stands for “Transformer,” a technique originally published in a Google research paper.

​

True to that tradition, DeepSeek has released a paper detailing the methods used to develop R1.

Final Thoughts

​

At the end of the day, R1 is very, very good. It avoids some costly training issues, runs efficiently, and opens up new possibilities in AI. What’s next? Everyone and their mother will incorporate and improve upon DeepSeek’s techniques. Models will get better—and that’s a win for AI science.
