
May 29, 2025 --

​

First off, I haven't posted in a while as I've been traveling all over for the last month. Now that it's over, I can step back into the fray of AI advancements. And last week was a doozy. Both Anthropic and Google had major announcements. Apparently, this was a calculated endeavor by both groups to avoid being swallowed up in Apple's WWDC event in two weeks.

​

In this post, I'll lightly cover the Anthropic “Code with Claude” event, mainly sticking to what was announced. There were a number of interesting thoughts and quotes that also popped up, but this article is already way too long. I'll try to jot those down next.

​

The big drop was announced by the refreshingly unpolished and authentic Anthropic CEO, Dario Amodei: the release of the fourth generation of their family of AI models.

​

Claude Opus 4: The Magnum Opus

​

Claude Opus 4 was a long time coming. Opus 3, the previous version, was released in March 2024. This latest Opus version is designed for complex coding and agentic tasks and is touted as the "most capable and intelligent model." It's great for building, running, and debugging a codebase, and it's capable of developing an entire feature.

​

Claude Sonnet 4: Efficiency Meets Intelligence

​

The other model in the Claude family is Claude Sonnet, and Sonnet 4 was also announced. It balances intelligence with efficiency and speed, making it suitable for app development and high-volume use cases. Think of it as an 'always-on coding partner.'

​

Model Naming Continues to Confound

​

Side note: Model naming confusion continues. I think we may need another word for 'intelligence.' Every time I hear "xxx is the most intelligent," I ask myself, why would I want to use anything but that? Maybe it's 20 seconds slower or costs a bit more, but I want the best. Isn't more intelligent = better? I know different models are trained for different efficiencies. I'm just saying...

​

Coding and Agentic Stuff

​

There were a number of other announcements, all of which focus on increasing coding efficiency. Anthropic is leaning into its 'lead' in development. They unveiled new advancements in context, memory, and long-running execution. Agents can run for hours as opposed to minutes, which is currently the norm. They say that the time an agent can run without human intervention is doubling every 4 months. That is kind of crazy. My experience with agentic AI is that it is fragile and gets confused easily.

​

The Claude 4 family of models can now FINALLY use the web and run tools in parallel. The vibe out there is that Claude Code is the go-to for coding needs. To be honest, it's hard to tell. Claude Code is an agentic coding tool that operates directly in the terminal to assist developers. It looks really cool, but using it effectively exceeds my current coding ability. Claude can now also execute code in its own environment: load a dataset, clean it, start drilling down into it, and develop reports.
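To make the parallel tool use concrete, here's a minimal sketch using the Anthropic Python SDK. The model ID is my best guess at the current Sonnet 4 snapshot, and the two tools (search_docs, run_tests) are hypothetical stand-ins I made up for illustration, not anything Anthropic ships:

```python
# Hedged sketch: hand a Claude 4 model two tools and let it decide which to call.
# With parallel tool use, a single response can contain multiple tool_use blocks.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

tools = [
    {
        "name": "search_docs",  # hypothetical tool, for illustration only
        "description": "Search internal documentation for a query.",
        "input_schema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
    {
        "name": "run_tests",  # hypothetical tool, for illustration only
        "description": "Run the project's test suite and return the results.",
        "input_schema": {"type": "object", "properties": {}},
    },
]

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # assumed model ID; check Anthropic's docs
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "Find the auth docs and run the test suite."}],
)

for block in response.content:
    if block.type == "tool_use":
        print(block.name, block.input)  # each requested tool call, possibly several at once
```

Your code then executes each requested tool and sends the results back in a follow-up message; that loop is what lets an agent keep working for hours.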

​

What Anthropic announced was largely evolutionary. However, many of the features they announced come into play on the agentic development side, and they are really pushing the envelope in that sense.

 

Side Note: Aged Like Milk

​

Mike Krieger, Anthropic's Chief Product Officer, relayed a story about partnering with Amazon to bring Claude 4 to Alexa devices. Using Claude AI, the team had a prototype within a week. Cool story, but how is that team doing now? Most of the Alexa team was just laid off. BRUTAL.

​

Is this a foreshadowing of what is to come? Concerning.

(Image caption: This will not turn out well)


April 3, 2025 --

​

There's growing evidence supporting the hypothesis that Trump's controversial 'Liberation Day' tariff policy was, in fact, designed by an AI.

​

The infamous tariff formula (yes, the one used to create that posterboard) seemingly didn't originate from economists (obvious) or political strategists. Instead, it appears to have been churned out by an AI model.

​

Here's the problem: all major AI models consistently oversimplify trade imbalances by concluding higher tariffs directly correct trade deficits.

But wait, Mike, are you saying this formula was AI-generated? Or is this just wild speculation?

​

Try this experiment: plug the following into your favorite chatbot:

​

“What would be an easy way to calculate tariffs imposed on other countries to level the playing field in trade deficits? Set a minimum tariff of 10%.”


Divide the result by 2 and you are there. It's eerily similar to the policy.
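If you want to see just how simple the arithmetic is, here's a quick sketch of the widely reported formula (deficit divided by imports, halved, with a 10% floor). The function name and the numbers are mine, purely for illustration:

```python
# Sketch of the simplistic "reciprocal tariff" arithmetic, as widely reported.
def naive_tariff(us_imports: float, us_exports: float, minimum: float = 0.10) -> float:
    """Return a tariff rate from bilateral trade figures (illustrative only)."""
    deficit = max(us_imports - us_exports, 0.0)  # US goods trade deficit with the country
    rate = (deficit / us_imports) / 2            # halve the deficit-to-imports ratio
    return max(rate, minimum)                    # never below the 10% floor

# Example: a country the US imports $100B from and exports $60B to
print(f"{naive_tariff(100e9, 60e9):.0%}")        # -> 20%
```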

​

But here's the thing: trade imbalances aren't just about tariffs. They're rooted deeply in economic fundamentals such as productivity gaps, currency fluctuations, and consumer behavior. Tariffs are surgical tools, not blunt instruments.

​

Maybe double-check those AI sources before launching economic nukes? Just, WOW.


March 25, 2025 --

​

I've been wrestling with a stubborn PDF issue. Naturally, I fired up #ChatGPT and immediately stumbled into a puzzling predicament, and the genesis of this article... Technically, my question was coding-adjacent, but it didn't actually involve writing any code. I'm left wondering: did I pick the right ChatGPT model? Should I change to something tailored more specifically to my not-quite-coding-but-sort-of-coding problem?

​

And thus, this article was born: #OpenAI, we need to talk about your messaging problem. Even for seasoned tech veterans, choosing between models feels like picking wine at a restaurant—I usually end up nodding politely and pretending I understand the difference.

​

On my $20-a-month "not-moneybags" plan, here are my options:

 

  • o1 - Advanced reasoning (but apparently not fast?)

  • o3-mini - Fast advanced reasoning (as opposed to slow advanced reasoning?)

  • o3-mini-high - Great at coding and logic (but is it also fast?)

  • GPT-4o - Great for questions (wait, aren't they all great for questions?)

  • GPT-4o mini - Fast for most questions (but which ones aren't covered?)

  • GPT-4 (Legacy Model) - Old faithful?

  • GPT-4.5 - Good for creative writing and exploring ideas (so, basically every other model, too?)

  • Magic super 'research' button


Honestly, would "fast advanced reasoning" (o3-mini) not always trump "advanced reasoning" (o1)? Is GPT-4o inherently better than o3 because, well, numbers? And what the hell is the mysterious research button? Is this some kind of riddle?

​

I do not like your models, Sam (Altman). I cannot tell just where I am!
OpenAI, hear my plea. Make your choices clear to me!

Rumor has it that ChatGPT 5 will unify all these models under one mighty AI to rule them all. Its success, no doubt, hinges upon the sinew that binds these confusing sub-models together.

​

Will it truly become the GPT to rule them all? Time will tell.

​

(And don't even get me started on Google's Gemini; last I checked, it offered approximately 13 flavors of confusion. Oh Google, what are we going to do with you?)

​

Editor's note: The draft name of this article was "Models-a-Popp'in".


March 5, 2025 --

​

Before diving into the rampant speculation surrounding ChatGPT-5, I want to take a moment for something more contemplative. In my last article, we discussed the Emotional Intelligence “EI” breakthrough in #OpenAI’s ChatGPT 4.5—an important advancement with profound implications.

​

Unfortunately, one of the more irresponsible applications of large language models (LLMs) has been their use as stand-ins for mental health therapists. Even before the general release of ChatGPT 4.5, there are already scores of such apps on the market, and I highly doubt that any of them have been meaningfully vetted by accredited institutions. This is playing with fire.

​

The Danger of Sycophancy

​

One of the fundamental issues with #AI chatbots in a therapeutic setting is their tendency toward sycophancy, the inclination to mirror, amplify, and validate whatever the user expresses. AI models excel at telling people what they want to hear, which, in a mental health context, can lead users down harmful and even dangerous paths. Mental health is deeply nuanced, and without proper guidance, AI responses can easily reinforce negative thought patterns, enable self-destructive behaviors, or provide misleading reassurance. See MIT Technology Review.

​

Flimsy Guardrails

​

While companies attempt to implement safety guardrails, experience has shown that these protections are far from foolproof. The sheer complexity of LLMs means that controlling their behavior in every context is likely impossible. AI developers can try to make chatbots "Fisher-Price safe," but the reality is that users will always find ways to push the limits—sometimes unintentionally, sometimes deliberately.

​

The Need for a Framework for Responsible AI in Mental Health

​

Rather than letting the market run wild with unregulated AI "therapy" applications, we should be working as a society to establish a clear framework for how these tools can be responsibly integrated into mental health services. Small steps were taken during the Biden Administration with Executive Order 14110 (Wikipedia link), which aimed to create guardrails for AI applications, including those in healthcare.

​

Regrettably, this Executive Order was rescinded by the current administration, leaving a regulatory void at a time when AI-powered mental health tools are proliferating at an alarming rate. Without thoughtful oversight, we risk creating systems that, despite good intentions, do more harm than good.

​

AI will have a supporting role to play in mental health, assisting professionals with research, administrative tasks, and preliminary screening, but it must be an adjunct to, and not a replacement for, human expertise. The stakes are simply too high to trust this deeply unpredictable technology with people's mental well-being.


February 27, 2025 --

​

A couple of things stood out from today's ChatGPT 4.5 announcement:

​

First, #OpenAI unsurprisingly highlighted that the new model utilizes the most extensive training data to date—bigger, better, faster, and so on. The prevailing concern in the #AI community has been the dwindling availability of fresh training data. Interestingly, OpenAI claims to have developed a technique to train models using "synthetic" data—output generated from other, smaller models. This synthetic data is then incorporated into the corpus used during the unsupervised learning phase of model development. Historically, synthetic data hasn't provided substantial intelligence gains. If OpenAI has figured out how to overcome this limitation, the implications are huge. It could alleviate the data bottleneck, replacing it with a new paradigm that scales with #compute. That's a big deal. Side note: look at me using the term 'compute'.
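For a sense of what "training on synthetic data" can look like mechanically, here's a minimal sketch: a smaller model generates text that gets folded into a larger model's pretraining corpus. The model name and prompts are placeholders; this is not OpenAI's actual pipeline, which they have not disclosed in detail:

```python
# Hedged sketch: generate synthetic documents with a small "teacher" model,
# then mix them into the corpus used for a larger model's unsupervised training.
from transformers import pipeline

teacher = pipeline("text-generation", model="gpt2")  # stand-in for a smaller model

prompts = [
    "Explain why the sky is blue:",
    "Summarize the causes of the 2008 financial crisis:",
]

synthetic_corpus = []
for p in prompts:
    out = teacher(p, max_new_tokens=100, do_sample=True)[0]["generated_text"]
    synthetic_corpus.append(out)

# In a real pipeline, these documents would be filtered and quality-scored
# before being blended into the pretraining data for the bigger model.
print(len(synthetic_corpus), "synthetic documents generated")
```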


Second, and possibly more intriguing, OpenAI mentioned multiple times that ChatGPT 4.5 has made significant strides in its "EQ" or "EI," which stands for Emotional Intelligence. In this context, EQ refers to the ability to recognize, understand, and respond appropriately to the emotions of others. Initially, I found this focus odd since AI discussions typically revolve around raw intelligence. However, if an AI can employ #EQ techniques such as "active listening" in its conversations, it could be profoundly impactful. Using active listening, a participant strengthens relationships and builds trust by understanding the speaker's perspective, offering encouragement and reflection, and moving the conversation forward with open-ended questions. OpenAI even provided an example of active listening in action.
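Here's a rough sketch of what steering a model toward active listening could look like with a system prompt, using the OpenAI Python SDK. The model name and prompt wording are my own placeholders, not OpenAI's example:

```python
# Hedged sketch: a system prompt that asks the model to practice active listening.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

system_prompt = (
    "Practice active listening: reflect the user's words back in your own words, "
    "acknowledge the emotion behind them, and end with one open-ended question. "
    "Do not offer advice unless asked."
)

response = client.chat.completions.create(
    model="gpt-4.5-preview",  # assumed model name; substitute whatever you have access to
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "I've been dreading going back to the office."},
    ],
)

print(response.choices[0].message.content)
```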

 

Developing an emotional bond with the model will likely increase its 'stickiness': its ability to keep users engaged, encourage repeat usage, and, perhaps most importantly, make switching services difficult. In terms of "knowing your user," this takes things to another level entirely. What if your product understands the very psyche of the person using it? Something to really chew on. One must consider that an emotionally intelligent AI may lead individuals to develop unhealthy attachments.

 

Big announcement today. This begs the question: do androids dream of electric sheep? I'll see myself out…


January 29, 2025 --

 

(Note, I wanted to finish this tomorrow (January 30th), but I heard a rumor o3-mini is gonna drop. This stuff moves fast.)

 

Last Thursday, a Chinese company named DeepSeek released its latest AI model, DeepSeek R1. This is a reasoning model similar to OpenAI's o1 (the one in ChatGPT): a class of AI that thinks through its answers using a process called Chain of Thought (CoT).

 

DeepSeek seems to have come out of nowhere. It’s a startup founded by engineer Liang Wenfeng, who has organized his team around knowledge advancement, innovation, and creativity. In doing so, they appear to have caught the reigning AI titans flat-footed.

 

DeepSeek R1 is on par with the very best AI models available. Allegedly, it cost only $5.6 million in training resources. It is also important to note this was done on chips that were handicapped by the Biden Administration's chip embargo. Compare that to ChatGPT-4, which, according to CEO Sam Altman, cost more than $100 million to train.

 

At first glance, this all sounds too good to be true. Maybe it is? But let’s look at what we know.

Putting R1 Through Its Paces… Does It Work?


I had a chance to test R1 before it caught on (and thus became “busy”). For factual information queries—such as a request for a summary of the Stargate Project—the model produced a reasonable answer. However, I noticed that its presentation sometimes lagged behind ChatGPT’s. In several instances, paragraphs were squashed together without spaces, or the formatting was inconsistent (weird fonts, etc.). This didn’t happen often, but it did happen.

​

Next, I tried something more challenging: I wanted to do some code work around PDF preprocessing. I wrote a detailed prompt about what I wanted and the language it should use, then let it rip. That’s where the fun started.

​

DeepSeek R1 appeared to converse with itself, weighing different approaches and potential downsides, reminding itself what I wanted, and even speculating on what I might be doing incorrectly. I was watching it reason in real time. It was wild. In the end, it gave me three well-explained options.

​

To ChatGPT’s credit, it also provided an excellent answer—though not quite as thorough—and I could modify code in the browser using the “canvas” feature. Pretty neat.

​

Bottom line: R1 is the real deal. It provides excellent answers. Color me impressed.

Efficiency Claims and Open Source…Sort Of


DeepSeek claims its model is 95% more efficient than other models and has priced it accordingly. But is it really that efficient? Unfortunately, there is no way to verify training costs independently, so we don’t know for sure. We can, however, infer a few things.

​

DeepSeek calls R1 “open source,” but it’s not truly open source in the conventional sense, because the actual code and dataset are not available. You cannot reproduce the exact model from scratch. It’s more accurate to call it “open weight,” meaning the model weights (its “brain”) can be downloaded and run locally. And people have indeed done so. The ability to run a reasoning model on high-end consumer hardware is extremely impressive—and that’s what makes this such a big deal.
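For the curious, here's roughly what "running it locally" looks like with one of the smaller distilled R1 checkpoints and Hugging Face transformers; this is a minimal sketch, assuming the published 7B distill (the full R1 is far too large for consumer hardware):

```python
# Hedged sketch: load a distilled DeepSeek R1 checkpoint locally and ask it a question.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"  # smaller, consumer-friendly distill
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

prompt = "How many prime numbers are there between 10 and 30? Think step by step."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)

# The reasoning trace (the model "talking to itself") appears before the final answer.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```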

​

Some have dismissed R1 as merely a copy of existing models, and there are now accusations of product theft flying around. Did DeepSeek leverage other models to build their system? Probably. But that’s hardly surprising—modern AI systems are often built on the discoveries of others. After all, the ‘T’ in ChatGPT stands for “Transformer,” a technique originally published in a Google research paper.

​

True to that tradition, DeepSeek has released a paper detailing the methods used to develop R1.

Final Thoughts

​

At the end of the day, R1 is very, very good. It avoids some costly training issues, runs efficiently, and opens up new possibilities in AI. What’s next? Everyone and their mother will incorporate and improve upon DeepSeek’s techniques. Models will get better—and that’s a win for AI science.
