Recent posts

#61
Off the Record / Re: The AI dooooooom thread
Last post by Josquius - November 25, 2025, 04:16:04 PM
Earlier today I was at a talk about AI in Development.

This article was recommended. Interesting. Definitely lines up with what I've seen about compounding errors.

https://utkarshkanwat.com/writing/betting-against-agents

Quote
Why I'm Betting Against AI Agents in 2025 (Despite Building Them)
I've built 12+ AI agent systems across development, DevOps, and data operations. Here's why the current hype around autonomous agents is mathematically impossible and what actually works in production.

Everyone says 2025 is the year of AI agents. The headlines are everywhere: "Autonomous AI will transform work," "Agents are the next frontier," "The future is agentic." Meanwhile, I've spent the last year building many different agent systems that actually work in production. And that's exactly why I'm betting against the current hype.
I'm not some AI skeptic writing from the sidelines. Over the past year, I've built more than a dozen production agent systems across the entire software development lifecycle:
Development agents: UI generators that create functional React components from natural language, code refactoring agents that modernize legacy codebases, documentation generators that maintain API docs automatically, and function generators that convert specifications into working implementations.
Data & Infrastructure agents: Database operation agents that handle complex queries and migrations, DevOps automation AI systems managing infrastructure-as-code across multiple cloud providers.
Quality & Process agents: AI-powered CI/CD pipelines that fix lint issues, generate comprehensive test suites, perform automated code reviews, and create detailed pull requests with proper descriptions.
These systems work. They ship real value. They save hours of manual work every day. And that's precisely why I think much of what you're hearing about 2025 being "the year of agents" misses key realities.
TL;DR: Three Hard Truths About AI Agents
After building AI systems, here's what I've learned:
Error rates compound exponentially in multi-step workflows. 95% reliability per step = 36% success over 20 steps. Production needs 99.9%+.
Context windows create quadratic token costs. Long conversations become prohibitively expensive at scale.
The real challenge isn't AI capabilities, it's designing tools and feedback systems that agents can actually use effectively.
The Mathematical Reality No One Talks About
Here's the uncomfortable truth that every AI agent company is dancing around: error compounding makes autonomous multi-step workflows mathematically impossible at production scale.



Let's do the math. If each step in an agent workflow has 95% reliability, which is optimistic for current LLMs, then:
5 steps = 77% success rate
10 steps = 59% success rate
20 steps = 36% success rate
Production systems need 99.9%+ reliability. Even if you magically achieve 99% per-step reliability (which no one has), you still only get 82% success over 20 steps. This isn't a prompt engineering problem. This isn't a model capability problem. This is mathematical reality.
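The figures above follow from simple multiplication of per-step success probabilities. A minimal sketch, assuming independent step failures (which real workflows only approximate):

```python
def workflow_success_rate(per_step: float, steps: int) -> float:
    """End-to-end success probability for a chain of independent steps."""
    return per_step ** steps

# Reproduces the figures above: ~77%, ~60%, and ~36% at 95% per step,
# and ~82% over 20 steps even at 99% per step.
for steps in (5, 10, 20):
    print(f"95% per step, {steps} steps: {workflow_success_rate(0.95, steps):.1%}")
print(f"99% per step, 20 steps: {workflow_success_rate(0.99, 20):.1%}")
```

The multiplicative structure is the whole point: improving per-step reliability helps only linearly in the exponent, so chaining more steps always wins the race to failure.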
My DevOps agent works precisely because it's not actually a 20-step autonomous workflow. It's 3-5 discrete, independently verifiable operations with explicit rollback points and human confirmation gates. The "agent" handles the complexity of generating infrastructure code, but the system is architected around the mathematical constraints of reliability.
Every successful agent system I've built follows the same pattern: bounded contexts, verifiable operations, and human decision points (sometimes) at critical junctions. The moment you try to chain more than a handful of operations autonomously, the math kills you.
The Token Economics That Don't Add Up
There's another mathematical reality that agent evangelists conveniently ignore: context windows create quadratic cost scaling that makes conversational agents economically impossible.
Here's what actually happens when you build a "conversational" agent:
Each new interaction requires processing ALL previous context
Token costs scale quadratically with conversation length
A 100-turn conversation costs $50-100 in tokens alone
Multiply by thousands of users and you're looking at unsustainable economics
I learned this the hard way when prototyping a conversational database agent. The first few interactions were cheap. By the 50th query in a session, each response was costing multiple dollars - more than the value it provided. The economics simply don't work for most scenarios.
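The quadratic growth is easy to see with a back-of-the-envelope model: if every turn re-sends the full history, turn k processes roughly k times the per-turn token count, so N turns cost on the order of N squared. The per-turn size and price below are illustrative assumptions, not the author's figures:

```python
def total_input_tokens(turns: int, tokens_per_turn: int) -> int:
    """Cumulative input tokens when every turn re-sends the whole history.

    Turn k re-processes all k messages so far, so the total is
    tokens_per_turn * turns * (turns + 1) / 2 -- quadratic in turns.
    """
    return tokens_per_turn * turns * (turns + 1) // 2

# Hypothetical numbers: 500 tokens per turn, $10 per million input tokens.
PRICE_PER_TOKEN = 10 / 1_000_000

for turns in (10, 50, 100):
    tokens = total_input_tokens(turns, 500)
    print(f"{turns:>3} turns: {tokens:>9,} input tokens ~ ${tokens * PRICE_PER_TOKEN:.2f}")
```

Doubling the conversation length roughly quadruples the cumulative token bill, which is why long sessions blow past any flat per-seat pricing.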



My function generation agent succeeds because it's completely stateless: description → function → done. No context to maintain, no conversation to track, no quadratic cost explosion. It's not a "chat with your code" experience, it's a focused tool that solves a specific problem efficiently.
The most successful "agents" in production aren't conversational at all. They're smart, bounded tools that do one thing well and get out of the way.
The Tool Engineering Reality Wall
Even if you solve the math problems, you hit a different kind of wall: building production-grade tools for agents is an entirely different engineering discipline that most teams underestimate.
Tool calls themselves are actually quite precise now. The real challenge is tool design. Every tool needs to be carefully crafted to provide the right feedback without overwhelming the context window. You need to think about:
How does the agent know if an operation partially succeeded? How do you communicate complex state changes without burning tokens?
A database query might return 10,000 rows, but the agent only needs to know "query succeeded, 10k results, here are the first 5." Designing these abstractions is an art.
When a tool fails, what information does the agent need to recover? Too little and it's stuck; too much and you waste context.
How do you handle operations that affect each other? Database transactions, file locks, resource dependencies.
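The row-count example above can be sketched as a thin summarization layer between the raw database driver and the agent. The function name and return schema here are hypothetical, not the author's implementation:

```python
def summarize_query_result(rows: list[dict], preview: int = 5) -> dict:
    """Condense a large query result into structured feedback an agent can use.

    Instead of dumping thousands of rows into the context window, return
    the outcome, the row count, and a small preview of the data.
    """
    return {
        "status": "succeeded",
        "row_count": len(rows),
        "preview": rows[:preview],
        "truncated": len(rows) > preview,
    }

rows = [{"id": i} for i in range(10_000)]
feedback = summarize_query_result(rows)
print(feedback["row_count"], len(feedback["preview"]), feedback["truncated"])
# prints: 10000 5 True
```

The design choice is that the agent reasons over a fixed-size summary regardless of result size, so context cost stays constant while nothing the agent actually needs is lost.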
My database agent works not because the tool calls are inherently reliable, but because I spent weeks designing tools that communicate effectively with the AI. Each tool returns structured feedback that the agent can actually use to make decisions, not just raw API responses.
The companies promising "just connect your APIs and our agent will figure it out" haven't done this engineering work. They're treating tools like human interfaces, not AI interfaces. The result is agents that technically make successful API calls but can't actually accomplish complex workflows because they don't understand what happened.
The dirty secret of every production agent system is that the AI is doing maybe 30% of the work. The other 70% is tool engineering: designing feedback interfaces, managing context efficiently, handling partial failures, and building recovery mechanisms that the AI can actually understand and use.
The Integration Reality Check
But let's say you solve the reliability problems and the economics. You still have to integrate with the real world, and the real world is a mess.
Enterprise systems aren't clean APIs waiting for AI agents to orchestrate them. They're legacy systems with quirks, partial failure modes, authentication flows that change without notice, rate limits that vary by time of day, and compliance requirements that don't fit neatly into prompt templates.
My database agent doesn't just "autonomously execute queries." It navigates connection pooling, handles transaction rollbacks, respects read-only replicas, manages query timeouts, and logs everything for audit trails. The AI handles query generation; everything else is traditional systems programming.
The companies promising "autonomous agents that integrate with your entire tech stack" are either overly optimistic or haven't actually tried to build production systems at scale. Integration is where AI agents go to die.
What Actually Works (And Why)
After building more than a dozen different agent systems across the entire software development lifecycle, I've learned that the successful ones share a pattern:
My UI generation agent works because humans review every generated interface before deployment. The AI handles the complexity of translating natural language into functional React components, but humans make the final decisions about user experience.
My database agent works because it confirms every destructive operation before execution. The AI handles the complexity of translating business requirements into SQL, but humans maintain control over data integrity.
My function generation agent works because it operates within clearly defined boundaries. Give it a specification, get back a function. No side effects, no state management, no integration complexity.
My DevOps automation works because it generates infrastructure-as-code that can be reviewed, versioned, and rolled back. The AI handles the complexity of translating requirements into Terraform, but the deployment pipeline maintains all the safety mechanisms we've learned to rely on.
My CI/CD agent works because each stage has clear success criteria and rollback mechanisms. The AI handles the complexity of analyzing code quality and generating fixes, but the pipeline maintains control over what actually gets merged.
The pattern is clear: AI handles complexity, humans maintain control, and traditional software engineering handles reliability.
My Predictions
Here's my specific prediction about who will struggle in 2025:
Venture-funded "fully autonomous agent" startups will hit the economics wall first. Their demos work great with 5-step workflows, but customers will demand 20+ step processes that break down mathematically. Burn rates will spike as they try to solve unsolvable reliability problems.
Enterprise software companies that bolted "AI agents" onto existing products will see adoption stagnate. Their agents can't integrate deeply enough to handle real workflows.
Meanwhile, the winners will be teams building constrained, domain-specific tools that use AI for the hard parts while maintaining human control or strict boundaries over critical decisions. Think less "autonomous everything" and more "extremely capable assistants with clear boundaries."
The market will learn the difference between AI that demos well and AI that ships reliably. That education will be expensive for many companies.
I'm not betting against AI. I'm betting against the current approach to agent architecture. But I believe the future is going to be far more valuable than the hype suggests.
Building the Right Way
If you're thinking about building with AI agents, start with these principles:
Define clear boundaries. What exactly can your agent do, and what does it hand off to humans or deterministic systems?
Design for failure. How do you handle the 20-40% of cases where the AI makes mistakes? What are your rollback mechanisms?
Solve the economics. How much does each interaction cost, and how does that scale with usage? Stateless often beats stateful.
Prioritize reliability over autonomy. Users trust tools that work consistently more than they value systems that occasionally do magic.
Build on solid foundations. Use AI for the hard parts (understanding intent, generating content), but rely on traditional software engineering for the critical parts (execution, error handling, state management).
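A minimal sketch of how these principles combine, along the lines of the database agent's confirmation gates described earlier. All names, the keyword list, and the callback shape are illustrative, not the author's implementation:

```python
# AI proposes, deterministic code and humans dispose: a destructive-operation
# gate in front of whatever SQL the model generates.

DESTRUCTIVE_KEYWORDS = ("DROP", "DELETE", "TRUNCATE", "ALTER")

def requires_confirmation(sql: str) -> bool:
    """Deterministic guard: flag statements that can destroy data."""
    return sql.strip().upper().startswith(DESTRUCTIVE_KEYWORDS)

def run_with_gate(sql: str, confirm) -> str:
    """Execute only if the guard passes or a human explicitly confirms."""
    if requires_confirmation(sql) and not confirm(sql):
        return "rejected"
    return "executed"  # in a real system: hand off to the database layer

# Usage: reads pass straight through; an unconfirmed destructive statement
# is stopped before it ever reaches the database.
print(run_with_gate("SELECT * FROM users", confirm=lambda s: False))  # executed
print(run_with_gate("DROP TABLE users", confirm=lambda s: False))     # rejected
```

The reliability-critical decision lives in plain, testable code; the AI only supplies the candidate statement, which is exactly the division of labor the principles above describe.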
The agent revolution is coming. It just won't look anything like what everyone's promising in 2025. And that's exactly why it will succeed.


#62
Off the Record / Re: The Off Topic Topic
Last post by Sheilbh - November 25, 2025, 03:46:36 PM
Oh they're not moderating.

But if law enforcement turn up they're not putting up any resistance whatsoever - whereas Twitter used to fight.
#63
Off the Record / Re: What does a TRUMP presiden...
Last post by Jacob - November 25, 2025, 03:34:13 PM
Dangerous for the US and dangerous for the world.
#64
Off the Record / Re: The Off Topic Topic
Last post by Josquius - November 25, 2025, 03:33:26 PM
It really shows just how dumb the far right's views are that some guy from a village in Uttar Pradesh who wouldn't pass a primary school English exam can recreate them so well.

Quote from: Sheilbh on November 25, 2025, 01:33:02 PM
I think some of it seems to be IP address derived as well.

On the trusting X and their systems now - yes in relation to the blue ticks. My understanding is that users are not anonymous to Twitter if they have blue ticks and are getting money for their content. I've mentioned before that X is repeatedly rolling over on identifying people following law enforcement requests (it's happened a lot in Turkey and also in the UK following the Southport riots) so I think they must have reasonable data on this (at least for verified accounts).

In the UK with Southport the interesting thing was that a lot of the far-right verified accounts (that were British) were probably verified because they loved what Musk was doing with the platform, but what he was also doing was gutting their legal team and not fighting any disclosure requests. So his biggest fans were providing him with the data that could be used to prosecute them. He vocally complained about it but it feels very "thoughts and prayers" when it's his company unmasking people for the police.

That's odd. I follow Sunder Katwala and he's got an ongoing thing about the amount of racist hate messages he gets and vichy doing fuck all about it.
#65
Off the Record / Re: The AI dooooooom thread
Last post by Jacob - November 25, 2025, 03:32:15 PM
Quote from: DGuller on November 25, 2025, 03:01:30 PM
But then what is the point of that article?  The AI bubble exists because people with money believe that a usable enough AI already exists, not because some people believe that "language = intelligence" or that the sky is brown.  If the point of that article is not to say that "people mistakenly believe that a usable AI exists because they equate language to intelligence", then what exactly is the point?

The point of the article is:

Quote
The AI hype machine relentlessly promotes the idea that we're on the verge of creating something as intelligent as humans, or even "superintelligence" that will dwarf our own cognitive capacities. If we gather tons of data about the world, and combine this with ever more powerful computing power (read: Nvidia chips) to improve our statistical correlations, then presto, we'll have AGI. Scaling is all we need.

But this theory is seriously scientifically flawed. LLMs are simply tools that emulate the communicative function of language, not the separate and distinct cognitive process of thinking and reasoning, no matter how many data centers we build.

If you disagree that the first paragraph above is true then obviously the argument against it is less relevant. Personally I think Sam Altman is enough of a "thought leader" on this topic to make it worthwhile to address the position he advances.
#66
Off the Record / Re: The AI dooooooom thread
Last post by DGuller - November 25, 2025, 03:01:30 PM
Quote from: Jacob on November 25, 2025, 02:08:56 PM
Quote from: DGuller on November 25, 2025, 01:31:05 PM
The title of the article is knocking down a strawman.  People who think LLMs have some artificial intelligence don't equate language with intelligence.  They see language as one of the "outputs" of intelligence.  If you build a model that matches one mode of output of intelligence well enough, then it's possible that under the hood that model had to have evolved something functionally analogous to intelligence during training in order to do that.

I am happy to agree that you don't think that. And equally happy to agree that thoughtful proponents of LLM don't think that.

But the idea is out there, and certainly some of the hype propagated by some of the relevant tech CEOs (who are some of the richest and most powerful men in the world, and who absolutely have a role in shaping the public discourse) certainly seems to imply it if not outright state it at times.

So I don't agree it's a strawman. It's a thoughtful and well-argued counter argument to a line of thought that is absolutely being made, even if it's not being made by you.
But then what is the point of that article?  The AI bubble exists because people with money believe that a usable enough AI already exists, not because some people believe that "language = intelligence" or that the sky is brown.  If the point of that article is not to say that "people mistakenly believe that a usable AI exists because they equate language to intelligence", then what exactly is the point?
#67
Off the Record / Re: What does a TRUMP presiden...
Last post by crazy canuck - November 25, 2025, 02:54:26 PM
From the Globe and Mail's health reporter

Gifted link so the writer gets credit.

Quote
Sometime, in the not-too-distant future, we're going to look back and wonder: Whatever happened to the U.S. Centers for Disease Control and Prevention, at one time the single most powerful and influential public-health agency on Earth?

Robert F. Kennedy Jr., the Secretary of the U.S. Department of Health and Human Services (which oversees the CDC), and a fervent anti-vaccine activist, seems hell-bent on using his position to destroy the CDC.

Last week, in his latest salvo, Mr. Kennedy personally instructed the agency to change its long-standing position that vaccines do not cause autism.

Before the update, the CDC website said studies have shown that there is "no link" between vaccines and developing autism, and that "no links" have been found between any vaccine ingredients and the disorder.

The page now says that studies supporting a link between vaccines and autism "have been ignored by health authorities."

Then it adds: "The claim 'vaccines do not cause autism' is not an evidence-based claim because studies have not ruled out the possibility that infant vaccines cause autism."

This is preposterous, and doubly so because the change was ordered by a politician with no training in science, medicine or public health.

What Mr. Kennedy does have is a long-standing animus toward vaccination, and he has now co-opted and weaponized a powerful public-health agency toward his cause. The Health Secretary's claims are false, misleading and harmful.

Mr. Kennedy faux-innocently says he is not saying vaccines cause autism, just that studies have not shown that they can't.

But he knows full well that a negative cannot be proven. As Dr. Arthur Caplan, a bioethicist at New York University, said to The New York Times: "You can't prove that Coca-Cola doesn't cause autism either."

Research has shown quite clearly that there is no link between measles-mumps-rubella (MMR) vaccination and autism. Those studies were made necessary by explosive claims in a paper published by Andrew Wakefield – a paper that was later withdrawn because the findings were manipulated. But not before terrible damage was done.

Mr. Kennedy and his ilk are now "wondering" about another routine childhood vaccine for DTaP – a combination shot that protects against diphtheria, tetanus, and pertussis (whooping cough) in young children.

And, every now and again, he just tosses out weird new theories, like autism is caused by pregnant women taking Tylenol. Sorry, that should read: You can't prove it doesn't.

These are the common tactics of anti-vaxxers and other embracers of anti-science: Keep moving the goal posts, and then claim you're "only asking questions."

Mr. Kennedy's goal is clear: To foment confusion and distrust in vaccines.

It's surely a coincidence that, over the years, he has made millions promoting anti-vaccine views and referring clients to law firms that sue governments and pharmaceutical companies on behalf of parents of "vaccine-damaged" children.

Mr. Kennedy has been on a crusade against the CDC for years. He has called it a "cesspool of corruption" and claimed its officials are in bed with Big Pharma – all without evidence, it goes without saying.

Since becoming HHS Secretary this year, Mr. Kennedy has not just rewritten web pages. He has fired at least one-quarter of its staff, including the CDC director; dismantled its expert vaccine advisory panel; slashed the agency budget by roughly a third; cut the US$500-million research program for promising mRNA vaccines; slashed the infectious diseases program, both abroad and domestically (in the midst of the biggest measles outbreak in 30 years, no less); and, called for the CDC to abandon its chronic disease prevention activities.

Death by a thousand cuts, literally.

Mr. Kennedy is engaging in similar patterns of destruction at two other agencies he oversees, the National Institutes of Health and the Food and Drug Administration, but not with the same fervour he has reserved for the CDC.

All this, presumably, to Make America Healthy Again.

Inviting back the spread of childhood illness and its related impacts – including the potential for childhood mortality – seems like an odd way of achieving this goal.

But it's all about ideology. The desire to be free of any regulations you don't like. The freedom to choose what you believe, even if it's dangerously wrong. The supremacy of the individual, regardless of the cost to the collective.

Seems like Donald Trump's prescription for an angry nation is an unhinged health secretary.

We can't but shudder to think how much damage will be done before Mr. Kennedy is through with his crusade, or how long public health, in the United States and beyond, will take to recover and rebuild.


https://www.theglobeandmail.com/gift/519974dc5e28f762a87fb14c1570523f4fb873adfcfcb945043907ccc3c41650/3CWI3UBNSRGMHI3U4ZVDRYXFYM/
#68
Gaming HQ / Re: Europa Universalis V confi...
Last post by Tamas - November 25, 2025, 02:34:10 PM
I am getting frequent crashes with the beta.
#69
Off the Record / Re: The AI dooooooom thread
Last post by crazy canuck - November 25, 2025, 02:33:07 PM
Quote from: Sheilbh on November 25, 2025, 01:35:37 PM
Delighted that only 30 years after the Sokal affair, STEM is finally acknowledging that Derrida was right :w00t:

 :yes:
#70
Off the Record / Re: Russo-Ukrainian War 2014-2...
Last post by Tamas - November 25, 2025, 02:23:12 PM
Quote from: Jacob on November 25, 2025, 02:16:04 PM
So it looks like Ukraine is doing the "okay we'll agree, with only a few minor details to hammer out" thing... so now Russia can reject the peace proposal again and Trump can do whatever he's going to do in response.

This dance is much better than forcing Ukraine to surrender, but also very tiresome. Trump is, by far, the worst world leader of any consequence during my lifetime. Somebody like Boris Johnson is a towering avatar of statesmanship compared to him. Heck, I'd take Liz Truss as US President over him.