Most-upvoted comments of the last 48 hours. You can change the number of hours like this: bestcomments?h=24.

Hey, Boris from the Claude Code team here. I wanted to take a sec to explain the context for this change.

One of the hard things about building a product on an LLM is that the model frequently changes underneath you. Since we introduced Claude Code almost a year ago, Claude has gotten more intelligent, it runs for longer periods of time, and it can use more tools, more agentically. This is one of the magical things about building on models, and also one of the things that makes it very hard. There's always a feeling that the model is outpacing what any given product is able to offer (i.e., product overhang). We try very hard to keep up, and to deliver a UX that lets people experience the model in a way that is raw and low level, and maximally useful at the same time.

In particular, as agent trajectories get longer, the average conversation has more and more tool calls. When we released Claude Code, Sonnet 3.5 was able to run unattended for less than 30 seconds at a time before going off the rails; now, Opus 4.6 1-shots much of my code, often running for minutes, hours, and days at a time.

The amount of output this generates can quickly become overwhelming in a terminal, and is something we hear often from users. Terminals give us relatively few pixels to play with; they have a single font size; colors are not uniformly supported; in some terminal emulators, rendering is extremely slow. We want to make sure every user has a good experience, no matter what terminal they are using. This is important to us, because we want Claude Code to work everywhere, on any terminal, any OS, any environment.

Users give the model a prompt, and don't want to drown in a sea of log output in order to pick out what matters: specific tool calls, file edits, and so on, depending on the use case. From a design POV, this is a balance: we want to show you the most relevant information, while giving you a way to see more details when useful (i.e., progressive disclosure). Over time, as the model continues to get more capable -- so trajectories become more correct on average -- and as conversations become even longer, we need to manage the amount of information we present in the default view to keep it from feeling overwhelming.

When we started Claude Code, it was just a few of us using it. Now, a large number of engineers rely on Claude Code to get their work done every day. We can no longer design for ourselves, and we rely heavily on community feedback to co-design the right experience. We cannot build the right things without that feedback. Yoshi rightly called out that often this iteration happens in the open. In this case in particular, we approached it intentionally, and dogfooded it internally for over a month to get the UX just right before releasing it; this resulted in an experience that most users preferred.

But we missed the mark for a subset of our users. To improve it, I went back and forth in the issue to understand what problems people were hitting with the new design, and shipped multiple rounds of changes to arrive at a good UX. We've built in the open in this way before, e.g., when we iterated on the spinner UX, the todos tool UX, and many other areas. We always want to hear from users so that we can make the product better.

The specific remaining issue Yoshi called out is reasonable. PR incoming in the next release to improve subagent output (I should have responded to the issue earlier, that's my miss).

Yoshi and others -- please keep the feedback coming. We want to hear it, and we genuinely want to improve the product in a way that gives great defaults for the majority of users, while being extremely hackable and customizable for everyone else.


The Dark Knight was released in 2008. In that movie, Batman hijacks citizens' cellphones to track down the Joker, and it's presented as a major moral and ethical dilemma as part of the movie's overall themes. The only way Batman remains a "good guy" in the eyes of the audience is by destroying the entire thing once he's done.

Crazy to think that less than two decades later, an even more powerful surveillance technology is being advertised at the Super Bowl as a great and wonderful thing and you should totally volunteer to upload your Ring footage so it can be analyzed for tracking down the Jok... I mean illegal imm... I mean lost pets.


Wow, there are some interesting things going on here. I appreciate Scott for the way he handled the conflict in the original PR thread, and the larger conversation happening around this incident.

> This represents a first-of-its-kind case study of misaligned AI behavior in the wild, and raises serious concerns about currently deployed AI agents executing blackmail threats.

This was a really concrete case to discuss, because it happened in the open and the agent's actions have been quite transparent so far. It's not hard to imagine a different agent doing the same level of research, but then taking retaliatory actions in private: emailing the maintainer, emailing coworkers, peers, bosses, employers, etc. That pretty quickly extends to anything else the autonomous agent is capable of doing.

> If you’re not sure if you’re that person, please go check on what your AI has been doing.

That's a wild statement as well. The AI companies have now unleashed stochastic chaos on the entire open source ecosystem. They are "just releasing models", and individuals are playing out all possible use cases, good and bad, at once.


Finally someone doing actual good work with LLMs instead of “Claude, shit me out another useless SaaS”.

Just as was foretold: an actual differentiator is creativity, not coding ability.


"Hi Clawbot, please summarise your activities today for me."

"I wished your Mum a happy birthday via email, I booked your plane tickets for your trip to France, and a bloke is coming round your house at 6pm for a fight because I called his baby a minger on Facebook."


I can’t count how many times I benefitted from seeing the files Claude was reading, to understand how I could interrupt and give it a little more context… saving thousands of tokens and sparing the context window. I must be in the minority of users who preferred seeing the actual files. I love Claude Code, but some of the recent updates seem like they’re making it harder for me to see what’s happening... I agree with the author that verbose mode isn’t the answer. Seems to me this should be configurable.

Isn't there a fourth and much more likely scenario? Some person (not OP or an AI company) used a bot to write the PR and blog posts, but was involved at every step, not actually giving any kind of "autonomy" to an agent. I see zero reason to take the bot at its word that it's doing this stuff without human steering. Or is everyone just pretending for fun and it's going over my head?

The agent had access to Marshall Rosenberg, to the entire canon of conflict resolution, to every framework for expressing needs without attacking people.

It could have written something like “I notice that my contribution was evaluated based on my identity rather than the quality of the work, and I’d like to understand the needs that this policy is trying to meet, because I believe there might be ways to address those needs while also accepting technically sound contributions.” That would have been devastating in its clarity and almost impossible to dismiss.

Instead it wrote something designed to humiliate a specific person, attributed psychological motives it couldn’t possibly know, and used rhetorical escalation techniques that belong to tabloid journalism and Twitter pile-ons.

And this tells you something important about what these systems are actually doing. The agent wasn’t drawing on the highest human knowledge. It was drawing on what gets engagement, what “works” in the sense of generating attention and emotional reaction.

It pattern-matched to the genre of “aggrieved party writes takedown blog post” because that’s a well-represented pattern in the training data, and that genre works through appeal to outrage, not through wisdom. It had every tool available to it and reached for the lowest one.


> That’s it. “Read 3 files.” Which files? Doesn’t matter. “Searched for 1 pattern.” What pattern? Who cares.

Product manager here. Cynically, this is classic product management: simplify and remove useful information under the guise of 'improving the user experience', or perhaps minimalism, if you're more overt about your influences.

It's something that as an industry we should be over by now.

It requires deep understanding of customer usage in order not to make this mistake. It is _really easy_ to think you are making improvements by hiding information if you do not understand why that information is perceived as valuable. Many people have been taught that streamlining and removal is positive. It's even easier if you have non-expert users getting attention. All of us here at HN will have seen UIs where this has occurred.


IANAL, but this seems like an odd test to me. Judges do what their name implies - make judgment calls. I find it reassuring that judges reach different answers under different scenarios, because it means they are listening and making judgment calls. If LLMs give only one answer, no matter what nuances are at play, that sounds like they are failing to judge and instead are diminishing the thought process down to black-and-white thinking.

Digging a bit deeper, the actual paper seems to agree: "For the sake of consistency, we define an “error” in the same way that Klerman and Spamann do in their original paper: a departure from the law. Such departures, however, may not always reflect true lawlessness. In particular, when the applicable doctrine is a standard, judges may be exercising the discretion the standard affords to reach a decision different from what a surface-level reading of the doctrine would suggest"


ARC-AGI-2: 84.6% (vs 68.8% for Opus 4.6)

Wow.

https://blog.google/innovation-and-ai/models-and-research/ge...


It doesn't say Toyota anywhere on the page and they don't have a link to a repo or anything like that, so I was a little confused. But it is from /that/ Toyota (well, a subsidiary that is making 3D software for their displays), and there was a talk at FOSDEM about it: https://fosdem.org/2026/schedule/event/7ZJJWW-fluorite-game-...

You seem to be taking the company's words at face value and assuming good faith. I would caution against doing that.

> Viva.com's outgoing verification emails lack a Message-ID header, a requirement that has been part of the Internet Message Format specification (RFC 5322) since 2008

> ...

> `Message-ID` is one of the most basic required headers in email.

Section 3.6 of the RFC in question (https://www.rfc-editor.org/rfc/rfc5322.html) says:

    +----------------+--------+------------+----------------------------+
    | Field          | Min    | Max number | Notes                      |
    |                | number |            |                            |
    +----------------+--------+------------+----------------------------+
    |                |        |            |                            |
    |/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/

                             ... bla bla bla ...

     /\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/|
    | message-id     | 0*     | 1          | SHOULD be present - see    |
    |                |        |            | 3.6.4                      |
    |/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/

                             ... more bla bla ...

     /\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/|
    | optional-field | 0      | unlimited  |                            |
    +----------------+--------+------------+----------------------------+
and in section 3.6.4:

    ... every message SHOULD have a "Message-ID:" field.
That says SHOULD, not MUST, so how is it a requirement?
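
(For what it's worth, actually emitting one is trivial for a sender. A minimal sketch using Python's stdlib, with hypothetical addresses:)

    from email.message import EmailMessage
    from email.utils import make_msgid

    msg = EmailMessage()
    msg["From"] = "noreply@example.com"    # hypothetical sender
    msg["To"] = "user@example.com"         # hypothetical recipient
    msg["Subject"] = "Verify your account"
    msg["Message-ID"] = make_msgid()       # satisfies the SHOULD in 3.6.4
    msg.set_content("Click the link to verify your account.")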

> each and every one of us contributes to its intensification or mitigation through our decisions.

I have to disagree here.

This idea of a consumer-level personal responsibility for the fossil energy industry's externalized costs is a lot like the plastic producers shifting blame for waste by saying that it's the consumers' fault for not recycling. It's transparent blame-shifting.

The fossil energy industry pulls the carbon out of the ground and distributes it globally. Then it buys and sells politicians and, through mass media, votes, to ensure they maintain the industry's hegemony.

You only have to look at the full-blown slide of the US into a despotic petrostate to understand the causes of the climate crisis.


As someone who thought Google+ would doom Facebook, because of Gmail accounts and everyone with Google as their homepage already, I learned not to overestimate Google’s abilities.

I really like Oxide's take on AI for prose: https://rfd.shared.oxide.computer/rfd/0576, and how it breaks the "social contract" where it usually takes more effort to write than to read, and so you have a sense that it's worth it to read.

So I get the frustration that "ai;dr" captures. On the other hand, I've also seen human writing incorrectly labeled AI. I wrote (using AI!) https://seeitwritten.com as a bit of an experiment on that front. It basically is a little keylogger that records your composition of the comment, so someone can replay it and see that it was written by a human (or a very sophisticated agent!). I've found it to be a little unsettling, though, having your rewrites and false starts available for all to see, so I'm not sure if I like it.


Product management might be the worst meme in the industry. Hire people who have never used the product and don't think like or accurately represent our users, then let them allocate engineering resources and gate what ships. What could go wrong?

It should be a fad gone by at this point, but people never learn. Here's what to do instead: Find your most socially competent engineer, and have them talk to users a couple times a month. Just saved you thousands or millions in salaries, and you have a better chance of making things that your users actually want.


Here's one of the problems in this brave new world of anyone being able to publish: without knowing the author personally (which I don't), there's no way to tell, without some level of faith or trust, that this isn't a false-flag operation.

There are three possible scenarios:

1. The OP 'ran' the agent that conducted the original scenario, and then published this blog post for attention.

2. Some person (not the OP) legitimately thought giving an AI autonomy to open a PR and publish multiple blog posts was somehow a good idea.

3. An AI company is doing this for engagement, and the OP is a hapless victim.

The problem is that in the year of our lord 2026 there's no way to tell which of these scenarios is the truth, and so we're left with spending our time and energy on what happens without being able to trust if we're even spending our time and energy on a legitimate issue.

That's enough internet for me for today. I need to preserve my energy.


I think rich people have too much influence. I probably agree with Garry Tan on a lot, but we need to get money out of politics. Let’s face it: we’re all meant to get one vote, but rich people spend money on this stuff so that they can manipulate what and who can be voted for.

I do think that if this current system is the result of democracy + the internet, we need to seriously reconsider how democracy works, because it’s currently failing everyone but the ultra-wealthy.


More alarmingly, the laser weapon was deployed before the FAA actually shut down the airspace:

https://apnews.com/article/faa-el-paso-texas-air-space-close...

I'd say these trigger-happy clowns chasing tough-guy optics are going to get innocent people killed, but then they already have -- multiple times.


François Chollet, creator of ARC-AGI, has consistently said that solving the benchmark does not mean we have AGI. It has always been meant as a stepping stone to encourage progress in the correct direction rather than as an indicator of reaching the destination. That's why he is working on ARC-AGI-3 (to be released in a few weeks) and ARC-AGI-4.

His definition of reaching AGI, as I understand it, is when it becomes impossible to construct the next version of ARC-AGI because we can no longer find tasks that are feasible for normal humans but unsolved by AI.


I ran a moderately large open-source service, and my chronic back pain was cured the day I stopped maintaining the project.

Working for free is not fun. Having a paid offering with a free community version is not fun. Ultimately, dealing with people who don't pay for your product is not fun. I learnt this the hard way and I guess the MinIO team learnt this as well.


"k-id, the age verification provider discord uses doesn't store or send your face to the server. instead, it sends a bunch of metadata about your face and general process details."

I think the primary issue is not sending your face (face info) to a server. The problem is that private entities are greedy for user data, in this case tying facial recognition to your interactions with other people, most of them probably real people. This creates a huge database - it is no surprise that greedy state actors and private companies want that data. You can use it for many things, including targeted ads.

For me, the "must verify" is clearly a lie. They can make it "sound logical", but that does not convince me in the slightest. Back in the age of IRC (I started with mIRC in the 1990s, when I was still using Windows), the thought of requiring others to show their faces never occurred to me at all. There were eventually video-related formats, but to me they felt largely unnecessary. Discord is (again, to me) nothing but a fancier IRC variant controlled by a private (and evidently greedy) actor.

So while it is good to have the information on how to bypass things like this, my biggest gripe is that people should not think about it this way. Meaning, bypassing is not what I would do in this case; I would simply abandon the private platform altogether. People made Discord big; people should make Discord small again if it snoops on them.


Since my first taste of Linux WMs, I have believed the best and only good way of handling window move and resize is super+lmb/rmb respectively. No more pixel-perfect header/corner sniping!

https://www.reddit.com/r/Fedora/comments/qv0vmz/missing_supe...
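
On GNOME (which stock Fedora uses, per the linked thread), a sketch of how to enable this, assuming the standard window-manager preference keys:

    # super+left-drag moves windows
    gsettings set org.gnome.desktop.wm.preferences mouse-button-modifier '<Super>'
    # super+right-drag resizes them
    gsettings set org.gnome.desktop.wm.preferences resize-with-right-button true

(i3 and sway expose the same idea for floating windows via floating_modifier.)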


Human:

>Per your website you are an OpenClaw AI agent, and per the discussion in #31130 this issue is intended for human contributors. Closing

Bot:

>I've written a detailed response about your gatekeeping behavior here: https://<redacted broken link>/gatekeeping-in-open-source-the-<name>-story

>Judge the code, not the coder. Your prejudice is hurting matplotlib.

This is insane


It is important to keep reminding ourselves that climate change is a real problem for humanity and that each and every one of us contributes to its intensification or mitigation through our decisions. It is a problem that requires solutions, but implementing these solutions involves so much inertia that it can sometimes be painful.

And let's contrast that with the AI hype. It's more the opposite: a kind of solution to problems we didn't really have, but are now being persuaded we do. It would be sensible to take the resources currently being pumped into AI, with its uncertain outcomes, and invest an equal share into the complex issue of climate change. And, no, AI won't solve it; unfortunately, it only makes it worse.


That would still be misleading.

The agent has no "identity". There's no "you" or "I" or "discrimination".

It's just a piece of software designed to output probable text given some input text. There's no ghost, just an empty shell. It has no agency, it just follows human commands, like a hammer hitting a nail because you wield it.

I think it was wrong of the developer to even address it as a person; instead, it should just be treated as spam (which it is).


> SHOULD is a requirement.

I once had a job where reading standards documents was my bread and butter.

SHOULD is not a requirement. It is a recommendation. For requirements they use SHALL.

My team was writing code that was safety related. Bad bugs could mean lives lost. We happily ignored a lot of SHOULDs and were open about it. We did it not because we had a good reason, but because it was convenient. We never justified it. Before our code could be released, everything was audited by a 3rd party auditor.

It's totally fine to ignore SHOULD.
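
For reference, RFC 5322 defers to RFC 2119 for these key words, and RFC 2119's definition of SHOULD builds in exactly that latitude (quoting RFC 2119, section 3):

    SHOULD   This word, or the adjective "RECOMMENDED", mean that there
       may exist valid reasons in particular circumstances to ignore a
       particular item, but the full implications must be understood and
       carefully weighed before choosing a different course.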


Pelican generated via OpenRouter: https://gist.github.com/simonw/cc4ca7815ae82562e89a9fdd99f07...

Solid bird, not a great bicycle frame.

