w3ctech

A Conversation with the Creator of Claude Code: How Do AI Agents Deliver a 1000x Productivity Leap?

Original video: https://x.com/ycombinator/status/2023774438798299479

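Welcome to another episode of The Lightcone podcast. Today we have a very special guest: Boris Cherny, the creator of and an engineer on Claude Code. Boris, thanks for joining us.

Thanks for having me.

Thank you for creating the thing that has kept me up every night for the past three weeks. I'm completely hooked on Claude Code; it feels like strapping on a rocket booster.

Has everyone been feeling this way lately?

Around the end of November, a lot of my friends said they felt something shift.

I remember that when I first built Claude Code, I didn't know whether I was really onto something, but I did have the feeling that I'd caught hold of something, and I've been losing sleep ever since.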

That was back in September 2024.

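For three straight months I didn't take a single day off; I worked every weekend and every night. I kept thinking, "Oh my god, I think this is going to be big." At the time I didn't even know whether it was actually useful, because it couldn't really code yet.

Looking back at that period and comparing it with today, what surprises you most?

What's incredible is that we're still using the terminal. That was supposed to be just the starting point; I never expected it to be the final form. The second surprise is that it actually turned out to be useful, because at the very beginning it could barely code. Even in February, when we were still hacking on it, it wrote maybe 10% of my code. I didn't use it much for coding because it wasn't good at it; I still wrote most of my code by hand. So it's genuinely surprising that our bet paid off and that it got this good in exactly the area we predicted, because that wasn't obvious at the time. At Anthropic, our philosophy is that we don't build for today's model; we build for the model six months from now. That's also my advice to founders building on LLMs: figure out where the frontier is, meaning what the model isn't good at yet, because it will get good at it eventually. You just have to wait.

Thinking back, do you remember when the idea first came to you? Can you walk us through it? Was it a flash of inspiration? What did the first version in your head look like?

It's a funny story. The whole thing was pretty accidental; it just evolved naturally. Anthropic has been betting on coding for a long time. Our bet has always been that the path to safe AGI (artificial general intelligence) runs through coding; that has been more or less a consensus for us all along. The route there is: first you teach the model to code, then you teach it to use tools, and finally you teach it to use a computer.

You can actually see this in the first team I joined at Anthropic, Anthropic Labs, which shipped three products: Claude Code, MCP (Model Context Protocol), and the desktop app. You can see how those products fit together.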

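As for the specific product we built, nobody asked me to build a CLI. We just had a vague sense that it might be time for some kind of coding product, because the model seemed ready, but nobody had yet built a product that could really harness that capability. There's still a crazy sense of "product overhang" today (capability that products aren't fully using), but back then it felt even crazier, because nobody had built anything like this yet.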

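I was just messing around, thinking, "Okay, we're going to build a coding product; what do I do first?" I needed to learn how to use the API, because I had never used Anthropic's API at that point. So I wrote a small terminal app that called the API. That was all it did. It was a little chat app, because even today a chat app is how most people who don't code use AI. So that's what I built. It ran in the terminal; I could ask it questions and it answered.

Then tool use was released. I just wanted to try it, because I didn't really understand what it was. I thought, "This is cool. Is it actually useful? Probably not, but let me try it anyway."

Did you build it in the terminal simply because that was the easiest way to get something running?

Yes, because I didn't have to build a UI.

At the time, IDEs (integrated development environments) like Cursor and Windsurf were really taking off. Did you feel any pressure, or get a lot of suggestions, to turn it into a plugin or even a full-featured IDE?

No pressure, because we didn't even know what we wanted to build. The team was fully in exploration mode. We vaguely knew we wanted to do something in coding, but what exactly wasn't clear. Nobody was sure, and figuring that out was my job.

So I gave the model a bash tool. It was the first tool I gave it, simply because it was a ready-made example in our docs.

I ported the Python example from the docs to TypeScript, because that's my language. I didn't know what the model could do with bash. I asked it to read a file, and it used the cat command to read it. That was pretty cool.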
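A minimal sketch of what "giving the model a bash tool" can look like on the client side, assuming the general tool-use shape of the Anthropic Messages API: you declare a tool with a `name`, a `description`, and a JSON Schema `input_schema`, the model answers with `tool_use` blocks, and you reply with `tool_result` blocks. The tool name `bash`, the helpers `runBash` and `handleToolUse`, and the complete lack of sandboxing are illustrative assumptions, not Claude Code's implementation; the network call itself is left out.

```typescript
import { execSync } from "node:child_process";

// Hypothetical tool declaration in the general shape the Messages API expects
// for client-side tools: name, description, and a JSON Schema for the input.
const bashTool = {
  name: "bash",
  description: "Run a shell command and return its output.",
  input_schema: {
    type: "object",
    properties: { command: { type: "string", description: "Command to run" } },
    required: ["command"],
  },
} as const;

// Minimal shapes for the content blocks exchanged with the model.
interface ToolUseBlock {
  type: "tool_use";
  id: string;
  name: string;
  input: Record<string, unknown>;
}

interface ToolResultBlock {
  type: "tool_result";
  tool_use_id: string;
  content: string;
  is_error?: boolean;
}

type ToolHandler = (input: Record<string, unknown>) => string;

// Real executor: runs the command through the system shell with a timeout.
// There is no sandboxing here, so this is only suitable for local experiments.
const runBash: ToolHandler = (input) => {
  const command = String(input.command ?? "");
  return execSync(command, { encoding: "utf8", timeout: 30000 });
};

// Route a tool_use block to its handler and wrap the output as a tool_result,
// reporting unknown tools and thrown errors back to the model as errors.
function handleToolUse(
  block: ToolUseBlock,
  handlers: Record<string, ToolHandler>,
): ToolResultBlock {
  const handler = handlers[block.name];
  if (!handler) {
    return { type: "tool_result", tool_use_id: block.id, content: `Unknown tool: ${block.name}`, is_error: true };
  }
  try {
    return { type: "tool_result", tool_use_id: block.id, content: handler(block.input) };
  } catch (err) {
    return { type: "tool_result", tool_use_id: block.id, content: String(err), is_error: true };
  }
}
```

In a real loop you would pass `bashTool` in the request's `tools` array, call `handleToolUse(block, { bash: runBash })` for each `tool_use` block in the response, and send the resulting `tool_result` blocks back in the next user message.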

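Then I thought, "Okay, what can you actually do?" I asked it, "What music am I listening to?" It wrote some AppleScript to query my Mac and found the song in my music player. Oh my god! That was Sonnet 3.5. I had no idea the model could do that; it was my first "feel the AGI" moment. I thought, "Wow, the model just wants to use tools. That's what it wants."

It's remarkable. It's pretty counterintuitive that Claude Code works so well in such an elegant, simple form. The terminal has been around forever, and it seems to have become a great design constraint that produced a lot of interesting developer experiences. Using it doesn't feel like work; as a developer, it just feels fun. I don't even need to know where all the files are. And you landed on this almost by accident?

Yes, it was an accident. I remember when the terminal app started catching on internally. Honestly, about two days after the first prototype, I gave it to the team to dogfood.

When you have an idea that seems useful, the first thing you want to do is put it in front of other people and see how they use it. The next day I came into the office, and Robert, another engineer who sat across from me, already had Claude Code installed on his computer and was using it to write code.

I said, "Wait, what are you doing? This isn't ready yet; it's just a prototype."

But yes, it was already useful in that form. I remember that when we did the launch review for releasing Claude Code externally, in November or December 2024, Dario asked about this.

Internal usage was going almost vertical, and he asked, "Are you forcing engineers to use it? Why are you mandating it?"

We said, "No, no, we're not forcing anyone. I just posted about it, and it spread by word of mouth."

Honestly, it was that simple. We started with a CLI because it was the cheapest thing to build, and then it just stayed in that form for a while.

Back in 2024, how were engineers using it? Were they already shipping code with it, or were they using it for other things because the model wasn't very good at coding yet?

Personally, I used it to automate Git commands. By now I think I've forgotten most Git commands, because Claude Code has been doing them for me for so long. Automating Bash commands and operating Kubernetes were very early use cases.

People also started using it to write code, so there were some early signs. I think the first real coding use case was writing unit tests, because the stakes were low and the model was still pretty weak back then.

But everyone was still exploring and figuring out how to use the tool. One thing we noticed was that people started writing Markdown files for themselves and then having the model read them. That's where the claude.md file came from.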

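For me personally, maybe the most important principle in product development is finding latent demand. After the first CLI version, every single feature in this product was built on latent demand. claude.md is a great example.

There's another general principle I find interesting: you can build on top of the model and add "scaffolding" (code around the model) to improve its performance a bit. Depending on the domain, maybe by 10% to 20%. But those gains basically get wiped out by the next generation of models.

So either you build scaffolding, get a small improvement, and then rebuild it later, or you just wait for the next model and get those capabilities for free.

claude.md and all this "scaffolding" are examples of that. It's also why we've stayed with the CLI: we feel that whatever UI we build would be obsolete in six months, because the models are improving so fast.

Earlier we were saying we should compare our claude.md files, and you said something pretty philosophical: that yours is actually very short. That's almost the opposite of what people would expect. Why is that? What's in your claude.md?

I checked before the show. My claude.md is just two lines.

The first line is: "Whenever you open a PR (pull request), enable auto-merge." That way, as soon as someone approves the review, it merges automatically. It's purely so I can keep writing code instead of going back and forth on reviews.

The second line is: "Whenever you open a PR, post it in our team's internal approval channel." That way someone can stamp it and I can keep working.

The idea is that every other instruction lives in the shared claude.md in the codebase, which our whole team edits together several times a week.

A lot of the time I'll see a completely avoidable mistake in someone's PR, so I'll just tag Claude on the PR, or add the rule directly to the shared claude.md so it doesn't happen again. I do this many times a week.

Do you ever need to compact your claude.md? I've definitely hit the point where a warning pops up at the top saying my claude.md is already several thousand tokens. What do you do when that happens?

Our team's claude.md is actually pretty short; I think it's only a couple thousand tokens.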

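If you run into that, my advice is to just delete your claude.md and start from scratch.

Interesting.

I think a lot of people try to over-engineer this. At the end of the day, the model's capabilities change with every release, so what you really need is the minimum amount of content that keeps the model on track.

After deleting it, whenever the model goes off track or does something wrong, add back only what's needed, a little at a time. You'll probably find that with each new model you need to add less and less. Honestly, I think of myself as a pretty average engineer. I don't use a lot of fancy tools; I don't use Vim, I use VS Code, because it's simple.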

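I assumed that since you built this tool in the terminal, you must be a hardcore terminal purist: the kind of person who only uses Vim and yells "to hell with VS Code users."

We do have people like that on the team. Adam Wolff, for example, is the kind of person who'll tell you, "You'll take Vim from my cold, dead hands."

There are definitely a lot of people like that on the team. One thing we learned early on is that every engineer uses their tools differently. They like different tools, and no single tool fits everyone. I think that's one reason Claude Code turned out so well: when I was designing it, I kept asking myself what kind of product would feel right if I were the one using it.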

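To use Claude Code, you don't need to know Vim, tmux, how to SSH into a machine, or any of that complicated stuff. You just open the tool and it guides you; it takes care of all of that for you.

How do you decide how verbose the terminal output should be? Sometimes you have to hit Ctrl+C to stop it and check. Are there endless internal debates about whether the output should be longer or shorter? Every user probably has a different opinion, so how do you make these decisions? What's your view? Is it too verbose right now?

I like the current level of detail, because sometimes it goes completely off the rails. I'm watching the screen, skimming quickly, and as soon as I see "oh no, no, not that," I can hit Escape to stop it. That keeps it from creating a pile of bugs. It usually happens when I haven't used plan mode properly.

We've probably changed this quite often. I remember early on, maybe six months ago, I tried hiding the bash output internally and showing only a summary, because I figured nobody cared about those long, complicated bash commands.

I gave it to Anthropic employees for a day, and everyone revolted. They wanted to see the bash output because it was actually very useful. For some Git output maybe it doesn't matter much, but if you're running something like a Kubernetes job, you really do want to see the output.

More recently, we hid file reads and searches. You'll notice that instead of showing "Read foo.md," it now shows "Read 1 file, searched for 1 pattern." We would never have dared to ship this six months ago, because the model wasn't good enough yet; it still often read the wrong files.

Back then, as a user, you had to sit in front of the screen to catch mistakes and correct them. Now I find it stays on track almost every time. And because it uses tools so often, a summary is actually much better.

We shipped the change, dogfooded it internally for a month, and then users on GitHub weren't happy. Someone opened a big issue, and lots of people chimed in with "I want to see the details." That was really good feedback, so we added a verbose mode.

You can turn on verbose mode in the config, and if you want to see the output of every file operation, you still can. I posted the update in that issue, and people still weren't satisfied, which is also great, because my favorite thing in the world is listening to user feedback and hearing how people actually want to use the product. So we just kept iterating until it was really good and it was what people wanted.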
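The read/search collapsing described above can be sketched as a small rendering function. The `ToolEvent` shape and the `renderToolEvents` name are hypothetical; the only behavior taken from the conversation is that file reads and searches collapse into a line like "Read 1 file, searched for 1 pattern" unless verbose mode is on, while bash commands stay visible.

```typescript
// Hypothetical record of one tool call made during a turn.
type ToolEvent =
  | { kind: "read"; path: string }
  | { kind: "search"; pattern: string }
  | { kind: "bash"; command: string };

// Pluralize a count with a noun: (1, "file") -> "1 file", (3, "file") -> "3 files".
function count(n: number, noun: string): string {
  return `${n} ${noun}${n === 1 ? "" : "s"}`;
}

// Verbose mode prints one line per event. Otherwise reads and searches are
// collapsed into a single summary line, and bash commands are still listed.
function renderToolEvents(events: ToolEvent[], verbose: boolean): string[] {
  if (verbose) {
    return events.map((e) =>
      e.kind === "read" ? `Read ${e.path}` :
      e.kind === "search" ? `Searched for ${e.pattern}` :
      `Ran ${e.command}`);
  }
  const reads = events.filter((e) => e.kind === "read").length;
  const searches = events.filter((e) => e.kind === "search").length;
  const parts: string[] = [];
  if (reads > 0) parts.push(`read ${count(reads, "file")}`);
  if (searches > 0) parts.push(`searched for ${count(searches, "pattern")}`);
  const lines: string[] = [];
  if (parts.length > 0) {
    const summary = parts.join(", ");
    lines.push(summary.charAt(0).toUpperCase() + summary.slice(1));
  }
  for (const e of events) {
    if (e.kind === "bash") lines.push(`Ran ${e.command}`);
  }
  return lines;
}
```

Keeping the verbose path as a separate branch mirrors the trade-off in the story: the summary is the default once the model reliably reads the right files, and the full list stays one setting away.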

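I'm surprised how much I enjoy fixing bugs now.

You just need good logging, and then you can say, "Hey, go look at this specific object; it broke here." It searches the logs on its own, figures everything out, and can even open a tunnel to production and query the production database for you. It's crazy. Fixing a bug is about to be as simple as copying the Markdown out of Sentry (an error-tracking system) and handing it over. Soon that will be handled entirely through MCP. It's like automatic bug fixing and automatic test writing.

What's the new buzzword people are using for this? Building a startup factory?

Right, there are all kinds of ideas floating around now. When it comes to reviewing code, I'm a bit old-school, so I like seeing the detailed output. I like being able to say, "Oh, you're doing this, but I want you to do that." But there's now a completely different philosophy that says any time a human has to look at the code, that's a step backward. Yeah, it's fascinating. I think Dan Shipper talks about this a lot too: whenever you see the model make a mistake, try to write the fix into claude.md, or into something reusable like skills.

I think there's a higher-level point here that I actually wrestle with a lot. Everyone talks about what agents can and can't do, but in reality, what an agent can do changes with every new model. Sometimes a new person joins the team and gets even more out of Claude Code than I do, which always surprises me.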

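For example, we once had a memory leak we were trying to track down. By the way, Jarred Sumner has recently been on a crusade to eliminate every memory leak, and he's been amazing. But before Jarred joined the team, that was my job. Once, while I was trying to debug a memory leak, I took a heap dump, opened it in DevTools, and went back and forth between the profiler and the code, trying hard to find the problem.

Then another engineer on our team, Chris, just asked Claude Code. He said, "Hey, I think there's a memory leak here; can you try to debug it?" Claude Code took the heap dump, wrote itself a small tool to analyze it, and found the leak faster than I did. This is exactly the kind of thing I have to keep relearning, because my thinking is sometimes still stuck where it was six months ago.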
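For a sense of what a throwaway heap-dump analyzer can look like, here is a minimal sketch (not the tool Claude actually wrote) that summarizes a V8 `.heapsnapshot` file, the format Chrome DevTools and Node.js produce. It relies on the snapshot's flat `nodes` array, whose per-node layout is described by `snapshot.meta.node_fields`; the layout is read from the metadata rather than hard-coded because V8 has added fields over time. `summarizeByConstructor` is a hypothetical name.

```typescript
// Minimal subset of the V8 .heapsnapshot JSON layout used here.
interface HeapSnapshot {
  snapshot: { meta: { node_fields: string[]; node_types: unknown[] } };
  nodes: number[];
  strings: string[];
}

interface ClassStats { name: string; count: number; selfSize: number }

// Aggregate instance count and total self size per constructor name for
// "object" nodes, sorted by total self size (largest first).
function summarizeByConstructor(snap: HeapSnapshot, top = 10): ClassStats[] {
  const fields = snap.snapshot.meta.node_fields;
  const stride = fields.length; // each node occupies `stride` consecutive ints
  const typeIdx = fields.indexOf("type");
  const nameIdx = fields.indexOf("name");
  const sizeIdx = fields.indexOf("self_size");
  // node_types[typeIdx] lists the names that the numeric "type" field indexes into.
  const typeNames = snap.snapshot.meta.node_types[typeIdx] as string[];
  const stats = new Map<string, ClassStats>();
  for (let i = 0; i < snap.nodes.length; i += stride) {
    if (typeNames[snap.nodes[i + typeIdx]] !== "object") continue;
    const name = snap.strings[snap.nodes[i + nameIdx]]; // "name" indexes into strings
    const entry = stats.get(name) ?? { name, count: 0, selfSize: 0 };
    entry.count += 1;
    entry.selfSize += snap.nodes[i + sizeIdx];
    stats.set(name, entry);
  }
  return Array.from(stats.values())
    .sort((a, b) => b.selfSize - a.selfSize)
    .slice(0, top);
}
```

To use it, `JSON.parse` the snapshot file, run the summary on two snapshots taken before and after the suspected leak, and compare which constructors grew; that difference is usually the first clue.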

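For technical founders, do you have any advice on how to get the most out of each new model when it's released?

It sounds like new grads, or people without those ingrained habits, might adapt better than engineers who have been in the industry for a long time. How should experts level up?

I think you have to keep a beginner's mindset, and maybe just stay humble. Engineers as a profession tend to hold very strong opinions, and senior engineers are even rewarded for sticking to them.

At a big company I worked at before, we had architects, and when you were looking for that kind of engineer, you valued deep experience and strong opinions. But in reality, a lot of that experience no longer applies, and a lot of old opinions should change as the models improve.

So I think the most important skill is being able to think scientifically and reason from first principles.

When you hire for your team today, how do you screen for that?

Sometimes I ask candidates for an example of a time they were wrong. Compared with pure coding questions, I find some classic behavioral questions very useful, because you can see whether the candidate recognizes their mistake in hindsight, takes ownership of it, and learned from it.

A lot of very senior people (some founders too, though I think founders are actually pretty good at this) never really take ownership of their mistakes. But personally, I'm wrong about half the time. Half my ideas are bad, and you just have to keep trying. You build something, give it to users, talk to them, keep learning, and eventually you might land on a good idea, though sometimes you fail. That used to be a crucial skill for founders, but now I think it matters for every engineer.

Do you think you'd make hiring decisions based on the Claude Code session transcripts candidates produce when coding with an agent?

That's exactly what we're asking for and doing right now. We just added it as part of the assessment: you can upload a transcript of a session where you used Claude Code to build a feature.

Personally, I think it will work well. You can see how someone thinks. Do they look at the logs? Can they steer the agent back when it goes off track? Do they use plan mode? When they use plan mode, do they make sure the plan includes tests? All of that shows whether they think in systems, and even whether they understand systems.

There's so much information in that transcript. I can even picture a radar chart, like in a video game such as NBA 2K, showing "this player is great at shooting" or "great at defense."

You could imagine a radar chart for someone's Claude Code skills. What would the dimensions be? Systems testing, for sure, plus user-behavior analysis, design, product intuition... but you could actually automate that too.

My favorite instruction in my claude.md is: "For every plan, evaluate whether it is over-engineered, under-engineered, or perfectly engineered, and explain why." I think that's something we're trying to figure out too.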

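When I look at the engineers I consider most effective on the team, they're basically bimodal: they fall at two extremes.

On one side are hyper-focused specialists. I mentioned Jarred earlier; he's a great example, and the Bun team is the archetype of this kind of deep expert. They understand developer tooling better than anyone, and they understand JavaScript runtimes better than anyone.

On the other side are extreme generalists, and most of the rest of the team falls into this group. Many people span product and infrastructure, product and design, product and user research, or even product and business.

I love seeing people do unconventional things. That used to be a red flag, because people would wonder, "Can these people actually build anything useful?" That was the litmus test back then.

But now, for example, we have an engineer named Daisy who transferred from another team. I wanted her to move over because, a few weeks after joining the company, she submitted a PR to Claude Code to add a new feature. But instead of just writing the feature herself, she first submitted a PR that gave Claude Code a tool for testing tools, so it could test any tool and verify that it worked. Then she had Claude write the tool it needed, instead of implementing it herself.

I think that kind of out-of-the-box thinking is incredibly interesting, because not many people get there.

We use the Claude Agent SDK to automate almost every part of our development process. It does code review and security review, labels all our issues, and shepherds code to production. It handles almost everything for us. Externally, I see a lot of people starting to figure this out too, but it takes a while to really understand how to use LLMs this way and how to take advantage of this new kind of automation. It's a new skill, I guess.

Something I've found really interesting in office hours with different founders: there's usually one founder with a grand vision who has built a crystal palace in their head of the product they want to make. They know exactly who the users are, how they feel, and what motivates them. When they sit down in front of Claude Code, they can produce 50x the output. But their engineers don't carry that crystal palace, that Platonic ideal of the product, in their heads, so they only get 5x.

Have you heard similar stories? There's usually one person who is the core designer of a product, and they wish they could dump everything in their head straight out.

What's the nature of a team like that? It almost seems like a stable configuration now. You have visionaries whose capabilities are fully unleashed, but as an individual I still need to eat and sleep, and I have a whole day job to deal with. So how am I supposed to get all of this done? We just want Claude Teams. That's one solution, but you can also build your own way of doing it. It's pretty simple.

What's the vision for Claude Teams and this kind of agent collaboration? People are exploring a whole new field of agent topologies: what are all the ways you can configure agents?

One branch of this is the idea of "uncorrelated context windows." The idea is to give multiple agents clean context windows, so they aren't polluted by each other's context or by their own earlier context. Throwing more context at a problem is a form of test-time compute, so you get more capability. And if you layer the right topology on top, so the agents communicate the right way and are arranged the right way, they can build much bigger things. Teams is one take on this; we'll ship more along these lines soon. The idea is to let it build at a larger scale.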
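The "uncorrelated context windows" idea can be sketched as a fan-out in which every sub-agent starts from a fresh history containing only its own brief. `RunAgent` is a hypothetical stand-in for whatever model call you use, and `fanOut` is an illustrative name; neither is a Claude Code or Claude Teams API.

```typescript
// A message in a (hypothetical) agent's conversation history.
interface Message { role: "user" | "assistant"; content: string }

// Hypothetical model call: takes a full history, resolves to the reply text.
type RunAgent = (history: Message[]) => Promise<string>;

// Fan one task out to one agent per angle. Each agent gets a brand-new
// history containing only its own brief, so no agent sees another agent's
// context or any earlier conversation; the calls run concurrently.
async function fanOut(
  task: string,
  angles: string[],
  runAgent: RunAgent,
): Promise<string[]> {
  return Promise.all(
    angles.map((angle) => {
      const freshHistory: Message[] = [
        { role: "user", content: `${task}\n\nFocus: ${angle}` },
      ];
      return runAgent(freshHistory);
    }),
  );
}
```

A coordinator would then read the independent answers side by side; because none of them saw the others, their mistakes are less likely to be correlated.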

The first big example of this working internally is our plugins feature, which was built entirely by a swarm over a weekend. It ran for a few days without human intervention, and plugins shipped in pretty much the form it was in when it came out of that weekend.

How did you set that up? Did you spec out the outcome you were hoping for, let it figure out the details, and then let it run?

An engineer on the team gave Claude a spec and told Claude to use an Asana board. Claude put up a bunch of tickets on Asana and spawned a bunch of agents. The agents started picking up tasks, the main Claude gave them instructions, and they all just figured it out.

Were these independent agents that didn't have the context of the bigger spec?

Think about the way our agents actually start nowadays. I haven't pulled the data on this, but I would bet that the majority of agents today are actually prompted by Claude, in the form of sub-agents. A sub-agent is just a recursive Claude Code; that's all it is in the code. It's prompted by what we call "Mama Claude." I think if you look at most agents, they're launched this way.

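Here is a minimal sketch of the swarm setup described above: a coordinator puts tickets on a board, and several sub-agents pull tickets until none are left. The in-memory `Board` stands in for Asana (this is not Asana's API), `subAgent` is a hypothetical model call, and the task list is assumed to come from the coordinator breaking the spec down.

```typescript
// In-memory stand-in for a ticket board; not Asana's API.
interface Ticket { id: number; title: string; status: "open" | "claimed" | "done"; result?: string }

class Board {
  private tickets: Ticket[] = [];
  add(title: string): Ticket {
    const t: Ticket = { id: this.tickets.length + 1, title, status: "open" };
    this.tickets.push(t);
    return t;
  }
  // Hand the next open ticket to a worker. JavaScript runs this synchronously,
  // so two workers can never claim the same ticket.
  claim(): Ticket | undefined {
    const t = this.tickets.find((x) => x.status === "open");
    if (t) t.status = "claimed";
    return t;
  }
  complete(t: Ticket, result: string): void {
    t.status = "done";
    t.result = result;
  }
  all(): readonly Ticket[] { return this.tickets; }
}

// Hypothetical sub-agent call: takes instructions, resolves to its output.
type Worker = (instructions: string) => Promise<string>;

// The coordinator files one ticket per task, then starts `workers` sub-agents
// that keep pulling tickets until the board is empty.
async function runSwarm(tasks: string[], spec: string, workers: number, subAgent: Worker): Promise<readonly Ticket[]> {
  const board = new Board();
  for (const task of tasks) board.add(task);
  const loop = async (): Promise<void> => {
    for (let t = board.claim(); t; t = board.claim()) {
      // Each sub-agent sees its ticket plus the overall spec.
      board.complete(t, await subAgent(`Spec: ${spec}\nTicket #${t.id}: ${t.title}`));
    }
  };
  await Promise.all(Array.from({ length: workers }, loop));
  return board.all();
}
```

Because a sub-agent is "just a recursive Claude Code," a `subAgent` implementation could itself call `runSwarm` on a piece of the spec; a real system would cap that recursion depth.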

My code insights just told me how to spend less time debugging: it would be better to have multiple sub-agents spin up and debug something in parallel. So I added that to my claude.md: "Next time you try to fix a bug, have one agent look in the logs and one look in the code path." That seems sort of inevitable for weird, scary bugs. I tried fixing bugs in plan mode, and it seems to use agents to search everything, whereas when you just do it inline, it focuses on one task instead of searching wide.

This is something I do all the time too. If the task seems kind of hard or is a research task, I calibrate the number of sub-agents I ask it to use based on the difficulty. If it's really hard, I'll say use three, maybe five, or even ten sub-agents to research in parallel and see what they come up with.

I'm curious, then: why don't you put that in your claude.md file?

It's kind of case-by-case. What is claude.md? It's just a shortcut. If you find yourself repeating the same thing over and over, you put it in claude.md; otherwise you don't have to put everything there. You can just prompt Claude.

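The calibration heuristic from this exchange (one agent on the logs, one on the code path, and three, five, or ten sub-agents as the task gets harder) can be sketched as a function that produces one brief per sub-agent. The difficulty levels, their mapping to counts, and the helper names are assumptions for illustration.

```typescript
type Difficulty = "easy" | "moderate" | "hard" | "very hard";

// Illustrative calibration: a single agent for easy tasks, then 3, 5, or 10
// sub-agents researching in parallel as the task gets harder.
function subAgentCount(d: Difficulty): number {
  switch (d) {
    case "easy": return 1;
    case "moderate": return 3;
    case "hard": return 5;
    case "very hard": return 10;
  }
}

// One focused brief per sub-agent. The first two follow the "one agent looks
// in the logs, one looks in the code path" instruction; any extra agents get
// numbered briefs to pursue their own independent hypotheses.
function debugBriefs(bug: string, d: Difficulty): string[] {
  const angles = ["Investigate the logs", "Trace the code path"];
  return Array.from({ length: subAgentCount(d) }, (_, i) => {
    const angle = i < angles.length ? angles[i] : `Independent hypothesis #${i - angles.length + 1}`;
    return `${angle}: ${bug}`;
  });
}
```

Each brief would then go to its own sub-agent with a clean context, and the main agent compares what comes back.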

Are you also keeping in the back of your mind that maybe in six months you won't need to prompt that explicitly? That the model will be good enough to figure it out on its own, maybe even within a month?

Oh my god. I think plan mode has a limited lifespan.

Interesting. Some alpha for next year. What would the world look like without plan mode? Do you just describe things at the problem level and it does them in one shot?

We've started experimenting with this, because Claude Code can now enter plan mode by itself. I don't know if you've seen that. We're trying to make that experience really good, so that it enters plan mode at exactly the point where a human would have wanted to. I think it's something like that. But there's actually no big secret to plan mode: all it does is add one sentence to the prompt that says, "Please don't code." You can just say that yourself.

It sounds like a lot of the feature development for Claude Code is very much what we talk about at YC: talk to your users, then go implement it. It wasn't the other way around, where you had a master plan and then implemented all the features.

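Taken at face value, the description of plan mode above amounts to one extra sentence in the prompt. This sketch assumes the sentence is appended to the system prompt; where Claude Code actually places it is not stated here, and `buildTurn` and `PLAN_ONLY` are hypothetical names.

```typescript
// The single sentence quoted in the conversation.
const PLAN_ONLY = "Please don't code.";

interface Turn { system: string; user: string }

// Plan mode, reduced to what Boris says it is: the same request plus one
// sentence telling the model not to write code yet. Execution mode leaves
// the prompt untouched.
function buildTurn(system: string, request: string, planMode: boolean): Turn {
  return { system: planMode ? `${system}\n\n${PLAN_ONLY}` : system, user: request };
}
```

Once the plan looks right, the next turn is built with `planMode` set to `false`, which is the "just get Claude to execute" step described later in the conversation.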

Yeah, that's all it was. Plan mode came about because we saw users saying, "Hey Claude, come up with an idea and plan this out, but don't write any code yet." There were various versions of that: sometimes it was just talking through an idea, and sometimes it was asking Claude to write very sophisticated specs, but the common thread was "do something without coding yet." So one Sunday night at 10 p.m., I was looking at GitHub issues to see what people were talking about, and at our internal Slack feedback channel. I wrote the feature in about 30 minutes and shipped it that night. It went out Monday morning as plan mode.

When you say there will be no need for plan mode, do you mean there'll be no need to worry that the model will head in the wrong direction, but you'll still need to think through the idea and figure out exactly what you want, and you'll have to do that somewhere?

I think about it in terms of increasing model capabilities. Maybe six months ago, plan mode alone wasn't enough: you got Claude to make a plan, but even in plan mode you still had to babysit it, because it could go off track.


Nowadays, I probably start 80% of my sessions in plan mode. I say plan mode has a limited lifespan, but I'm a heavy plan-mode user. Claude starts making a plan, and I move to my second terminal tab and have it make another plan. When I run out of tabs, I open the desktop app, go to the Code tab, and start a bunch of tabs there. Probably 80% of them start in plan mode. Once the plan is good (sometimes that takes a little back-and-forth), I just have Claude execute. What I've found with Opus 4.5 (and it got really good with 4.6) is that once the plan is good, it stays on track and does the thing exactly right almost every time. Before, you had to babysit it both before and after the plan; now it's only before the plan. So maybe the next step is that you won't have to babysit it at all: you just give a prompt and Claude figures it out.

And the step after that is Claude talking to your users directly?

Yeah, it just bypasses you entirely. It's funny, this is actually the current state of software. Claudes talk to each other, and, at least internally, they talk to our users on Slack pretty often.


My Claude will tweet once in a while. No, actually, I deleted it, because it's a little cheesy. I don't love the tone.

What does it want to tweet about?

Sometimes it just responds to someone, because I always have Claude running in the background, and it loves to do that when it's using a browser.

That's funny.

A really common pattern is that I ask Claude to build something, it looks in the codebase, sees in git blame that some engineer touched that code, and then messages that engineer on Slack with a clarifying question. Once it gets an answer back, it keeps going.

What are some tips for founders now on how to build for the future? It sounds like everything is really changing. Which principles will stay, and what will change?

I think some of these are pretty basic, but they're even more important now than before. One example is latent demand. I've mentioned it a thousand times. For me, it's the single biggest idea in product. It's something no one understands, and I certainly didn't understand it in my first few startups. The idea is that people will only do a thing they already do; you can't get people to do a new thing.

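The git blame step in that pattern can be sketched against the real `git blame --porcelain <file>` output format. The parser below maps each line number to its author; how Claude then finds the person on Slack isn't described, so that part is out of scope, and `authorsByLine` is a hypothetical name.

```typescript
// Map final line numbers to author names from `git blame --porcelain` output.
// Each blamed line starts with a header "<sha> <orig line> <final line> [<group size>]";
// a commit's "author <name>" header appears only the first time that commit is
// seen, and the line's content follows on a line beginning with a TAB.
function authorsByLine(porcelain: string): Map<number, string> {
  const authorOf = new Map<string, string>(); // commit sha -> author name
  const result = new Map<number, string>();
  let sha = "";
  let finalLine = 0;
  for (const line of porcelain.split("\n")) {
    // 40 hex digits for SHA-1 repositories, 64 for SHA-256 ones.
    const header = /^([0-9a-f]{40,64}) (\d+) (\d+)/.exec(line);
    if (header) {
      sha = header[1];
      finalLine = Number(header[3]);
    } else if (line.startsWith("author ")) {
      authorOf.set(sha, line.slice("author ".length));
    } else if (line.startsWith("\t")) {
      // Content line: attribute it using the cached author for its commit.
      result.set(finalLine, authorOf.get(sha) ?? "unknown");
    }
  }
  return result;
}
```

You would feed it the stdout of `git blame --porcelain path/to/file` and look up the lines the agent is about to change.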

If people are trying to do a thing and you make it easier, that's a good idea. But if people are doing a thing and you try to make them do a different thing, they're not going to do it. You just have to make the thing they're already trying to do easier. AI is going to get increasingly good at figuring out these product ideas for you, because it can look at feedback and debug logs and work this out.

So that's what you mean when you say plan mode was latent demand: people already had the Claude chat window open in the browser, talking to it to figure out the spec and what it should do, and plan mode just let you do that right in Claude Code.

Yeah. Sometimes I'll walk around the office on our floor, stand behind people (saying hi first so it's not awkward), and just watch how they're using Claude Code. This was something I saw a lot, and it also came up in GitHub issues where people were talking about it.

It seems like you're surprised by how far the terminal has come and how far it's been pushed. How far do you think it has left to go, given this world of swarms and multiple agents? Do you think there will be a need for a different UI on top of it?


It's funny; if you'd asked me this a year ago, I would have said the terminal had a three-month lifespan and then we'd move on to the next thing. You can see us experimenting with this, right? Claude Code started in a terminal, but now it's on the web, in the desktop app, in the iOS and Android apps, in Slack, and in GitHub, and there are VS Code and JetBrains extensions. We're always experimenting with different form factors to figure out what comes next. So far I've been wrong about the lifespan of the CLI, so I'm probably not the person to forecast it.

What about your advice to dev tool founders? If someone is building a dev tool company today, should they build for human engineers, or should they think more about what Claude thinks and wants, and build for the agent?

The way I would frame it: think about the thing the model wants to do, and figure out how to make that easier. That's something we saw when I first started hacking on Claude Code. I realized this thing just wants to use tools; it wants to interact with the world.


How do you enable that? The way you don't do it is by putting it in a box and saying, "Here is the API, here's how you interact with me, and here's how you interact with the world." The way you do it is by seeing what tools it wants to use and what it's trying to do, and enabling that exactly the way you would for human users. So if you're building a dev tool startup, I'd think about what problem you want to solve for the user; then, when you apply the model to that problem, what does the model want to do? And then, what technical and product solution serves the latent demand of both?

(Ad break) YC's next batch is now taking applications. Got a startup in you? Apply at ycombinator.com/apply. It's never too early, and filling out the app will level up your idea. Okay, back to the video.

Back in the day, more than 10 years ago, you were a very heavy TypeScript user and wrote a book about it, right before TypeScript was cool. This was in the early 2010s, when everyone was deep in JavaScript, right?

Yeah, something like that. This was before TypeScript was a thing. Back then JavaScript was a very weird language, and adding types to it was not the obvious move. Now it's clearly the right thing.


It feels like Claude Code in the terminal has a lot of parallels with early TypeScript. TypeScript made a lot of really weird language decisions. If you look at the type system, pretty much anything can be a literal type, for example. That's super weird, because even Haskell doesn't do this; it's just too extreme. Or it has conditional types, which I don't think any language had thought of at all. It was very strongly typed. The idea was that when Joe Pamer, Anders Hejlsberg, and the early team were building it, they thought, "Okay, we have these teams with big untyped JavaScript codebases, and we have to get types in, but we're not going to get engineers to change the way they code." You're not going to get JavaScript people to write 15 layers of class inheritance the way a Java programmer would. They're going to write code the way they write it. They're going to use reflection, mutation, and all these features that are traditionally very, very difficult to type.

They're very unsafe for any strong functional programmer, really.

That's exactly right. Instead of getting people to change the way they code, they built the type system around it.


It was brilliant, because nobody was thinking about these ideas, even in academia. It came out of practice: observing people and seeing how JavaScript programmers wanted to write code. Claude Code has some similar ideas. You can use it like a Unix utility: you can pipe into it and pipe out of it. In that way it is rigorous, but in almost every other way it's just the tool we wanted. I built the tool for myself, then the team built it for themselves, then for Anthropic employees, and then for users, and it just ended up being really useful. It's not a principled, academic thing. The proof is in the results.

Fast forward more than 15 years, and not many codebases are in Haskell, which is more academic, while there are tons in TypeScript, because it's far more practical. That's interesting, right? TypeScript solves a real problem.

One cool thing I'm not sure many people know is that the terminal app is one of the most beautiful terminal apps out there, and it's actually written with React.

When I first started building it, I'd already done frontend engineering for a while.


I'm sort of a hybrid: I do design, user research, write code, all that stuff. We love hiring engineers like this; we love generalists. For me it was, "Okay, I'm building a thing for the terminal. I'm actually a pretty shitty Vim user, so how do I build something for people like me who are going to work in a terminal?" The element of delight is so important. I think at YC this is something you talk about a lot: build something people love. If the product is useful but you don't fall in love with it, that's not great, so it has to do both. Designing for the terminal has honestly been hard. You get something like 80 by 100 characters, 256 colors, one font size, and no mouse interactions. There's all this stuff you can't do, and there are all these very hard trade-offs. A little-known fact, for example, is that you can actually enable mouse interactions in a terminal, so you can support clicking and so on.

How do you do that in Claude Code?

We don't have it in Claude Code. We prototyped it a few times and it felt really bad, because the trade-off is that you have to virtualize scrolling. There are all these weird trade-offs, because terminals have no DOM; there are just ANSI escape codes and these organically evolved specs from the 1960s.

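The constraints Boris lists map onto concrete escape sequences. This sketch shows the xterm-compatible sequences for a 256-color foreground, for turning mouse reporting on and off (DEC private modes 1000 and 1006), and for parsing an SGR mouse report; the helper names are hypothetical, and none of this is taken from Claude Code's source. It also shows where the scrolling trade-off comes from: once mouse reporting is on, most terminals forward wheel events to the app (as buttons 64 and 65) instead of scrolling, so the app has to implement scrolling itself.

```typescript
const ESC = "\x1b[";

// 256-color foreground via the xterm "38;5;n" SGR sequence, then reset attributes.
function fg256(text: string, color: number): string {
  if (!Number.isInteger(color) || color < 0 || color > 255) {
    throw new RangeError(`color must be an integer in 0..255, got ${color}`);
  }
  return `${ESC}38;5;${color}m${text}${ESC}0m`;
}

// Mouse reporting is opt-in: mode 1000 enables button press/release reports and
// mode 1006 switches them to the SGR encoding. Disable in reverse order on exit.
const MOUSE_ON = `${ESC}?1000h${ESC}?1006h`;
const MOUSE_OFF = `${ESC}?1006l${ESC}?1000l`;

interface MouseEvent { button: number; col: number; row: number; pressed: boolean }

// Parse an SGR mouse report such as "\x1b[<0;12;5M": button 0 pressed at
// column 12, row 5 (1-based). A trailing "m" instead of "M" means release.
function parseSgrMouse(seq: string): MouseEvent | null {
  const m = /^\x1b\[<(\d+);(\d+);(\d+)([Mm])$/.exec(seq);
  return m
    ? { button: Number(m[1]), col: Number(m[2]), row: Number(m[3]), pressed: m[4] === "M" }
    : null;
}
```

Writing `MOUSE_ON` to stdout is all it takes to start receiving clicks, which is why the feature is easy to prototype but costly to ship: the app then owns every scroll.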

It feels like BBS door games!

Oh my gosh, that's a great compliment.

Yeah, it feels like discovering Legend of the Red Dragon.

Fantastic! Oh my god. We've had to discover all these UX principles for building in the terminal, because nobody really writes about this stuff. If you look at the big terminal apps of the '80s, '90s, or 2000s, they used ncurses, with lots of windows and things like that, and by modern standards they look chunky: too heavy and complicated. So we had to reinvent a lot. For example, the terminal spinner has gone through probably 50, maybe 100 iterations at this point, and probably 80% of those didn't ship. We'd try one, it didn't feel good, and we'd move on to the next; try it, it didn't feel good, move on. That's one of the amazing things about Claude Code: you can write these prototypes back-to-back, see which one you like, and ship it, and the whole thing takes maybe a couple of hours. In the past you'd have had to use Origami or Framer or something like that. You could maybe do three prototypes, and it took two weeks; it just took much, much longer. So we had the luxury of discovering this new thing. We didn't know what the right endpoint was, but we could iterate toward it very quickly. That's what makes it easy and lets us build a product that's joyful and that people like to use.

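A terminal spinner is just a short list of frames redrawn in place, which is part of why dozens of variants are cheap to prototype. This sketch uses `\r` to return to column 0 and `\x1b[2K` to clear the line before each frame; the braille frames, the 80 ms interval, and the helper names are arbitrary assumptions, not Claude Code's spinner.

```typescript
// Arbitrary example frames; swapping this array is one way to try a variant.
const FRAMES = ["⠋", "⠙", "⠹", "⠸", "⠼", "⠴", "⠦", "⠧", "⠇", "⠏"];

// Build the string that redraws the spinner line for a given tick:
// "\r" moves to column 0 and "\x1b[2K" erases the whole line first.
function spinnerFrame(tick: number, label: string, frames: readonly string[] = FRAMES): string {
  return `\r\x1b[2K${frames[tick % frames.length]} ${label}`;
}

// Drive the spinner on a timer. `write` is injected so the logic can be tested
// without a real terminal (in Node you might pass s => process.stdout.write(s)).
// Returns a stop function that cancels the timer and erases the spinner line.
function startSpinner(label: string, write: (s: string) => void, intervalMs = 80): () => void {
  let tick = 0;
  const timer = setInterval(() => write(spinnerFrame(tick++, label)), intervalMs);
  return () => {
    clearInterval(timer);
    write("\r\x1b[2K");
  };
}
```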

You had other advice for builders, and we kept interrupting you because we have so many questions.

Okay, so maybe two pieces of advice that are kind of weird, because they're about building for the model. One is: don't build for the model of today; build for the model six months from now. This sounds odd, because you can't find PMF if the product doesn't work today. But it's actually what you should do, because otherwise you'll spend a lot of effort finding PMF for the product as it works right now, and then when a new model comes out a few months later, you'll get leapfrogged by someone who was building for it. Use the model, feel out the boundary of what it can do, and then build for the model you think will exist six months from now. The second thing is that in the Claude Code area where we sit, we have a framed copy of "The Bitter Lesson" on the wall. It's a post by Rich Sutton, and everyone who hasn't read it should. The idea is that the more general model will always beat the more specific model. There are a lot of corollaries, but essentially it boils down to: never bet against the model.

36 This is something we always think about. We could build a feature into Claude Code and make it better as a product. We call this scaffolding: all the code that isn't the model itself. Or we could just wait a couple of months, and the model will probably be able to do the thing by itself. There's a trade-off. You put in engineering work now, and you can extend the capability a little, maybe 10 or 20 percent on the spider chart of whatever you're trying to improve. Or you can just wait, and the next model will do it. Always think in terms of this trade-off: where do you actually want to invest? Assume that whatever scaffolding you build is going to be thrown out. How often do you rewrite the Claude Code codebase? Is it every six months, when some part of the scaffolding has to be deleted because the model has improved so much that you no longer need it? Yeah, all of Claude Code has been written and rewritten over and over. We unship tools every couple of weeks, and we add new tools every couple of weeks. No part of Claude Code that existed six months ago is still around; it's constantly being rewritten. So would you say most of the current Claude Code codebase, say 80 percent of it, is less than a couple of months old? Definitely. It might be even less than that.

37 That's great. The lifecycle of code itself is another piece of alpha: expect its shelf life to be just a couple of months. For the best founders, yes. Steve Yegge's post about how awesome it is to work at Anthropic has a line saying that an Anthropic engineer currently averages 1,000x the productivity of a Google engineer at Google's peak, which is an insane number. Honestly, 1,000x... three years ago we were still talking about 10x engineers, and now we're talking about 1,000x a Google engineer in their prime. That's unbelievable, honestly. Internally, if you look at technical employees, they all use Claude Code every day. Even non-technical employees use it; I think half the sales team uses Claude Code. They've started switching to the desktop app because it's a little easier to use, runs in a VM, and is a bit safer. We actually just pulled a stat: the team doubled in size last year, yet productivity per engineer still grew something like 70 percent, as measured by the simplest, stupidest metric, pull requests. We also cross-checked that against commits, the lifetime of commits, and things like that. Since Claude Code came out, productivity per engineer at Anthropic has grown 150 percent.
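
Headcount growth and per-engineer growth compound multiplicatively, so it helps to work the numbers above through once. This is a back-of-the-envelope sketch under simplifying assumptions: headcount exactly doubled, and pull requests alone stand in for output. The helper name is illustrative.

```python
def throughput_multiplier(headcount_growth: float, per_engineer_growth: float) -> float:
    """Multiplier on total output when headcount and per-engineer output both grow.

    Growth rates are fractions: 1.0 means +100% (doubling), 0.7 means +70%.
    """
    return (1 + headcount_growth) * (1 + per_engineer_growth)


# Team doubled (+100%) and PRs per engineer rose ~70%: about 3.4x total PRs.
last_year = throughput_multiplier(1.0, 0.7)

# A +150% increase per engineer means 2.5x the baseline per person, not 1.5x.
per_engineer_since_launch = 1 + 1.5
```

As stated, the 70% and 150% figures cover different windows (last year versus since Claude Code's launch). They should be read separately rather than multiplied together.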

38 Oh my god! This is crazy, because in my old life I was responsible for code quality at Meta: the quality of all our codebases, across every product, across Facebook, Instagram, WhatsApp, whatever. The team worked on improving productivity. Back then, a productivity gain of something like 2% was the result of a year of work by hundreds of people. So to see 150 percent is just unheard of, completely unheard of. What drew you to Anthropic? As a builder, you could go anywhere. What was the moment that made you say, "Actually, this is the set of people" or "this is the approach"? I was living in rural Japan and opened up Hacker News every morning to read the news, and at some point it was all AI stuff. I started using some of these early products, and I remember that the first couple of times I used them, it just took my breath away. It's very cheesy to say, but that was honestly the feeling. As a builder, I had never felt anything like it; this was in the Claude 2 days or so. I started talking to friends at AI labs just to see what was going on. I met Ben Mann, one of the founders of Anthropic, and he immediately won me over.

39 As soon as I met the rest of the team, it won me over in two ways. First, it operates as a research lab. The product work was teeny tiny; it's really all about building a safe model, and that's all that matters. Being very close to the model, in a place where product development isn't the most important thing because the model is, really resonated with me after building products for many years. Second was how mission-driven it is. I'm a huge sci-fi reader, and my bookshelf is full of sci-fi, so I know how badly this can go. When I think about what's going to happen this year, it's going to be totally insane, and in the worst case it could go very, very badly. I wanted to be at a place that really understood that and had internalized it. If you overhear conversations in the lunchroom or the hallway, people are talking about AI safety. It really is the thing everyone cares about more than anything else. I just wanted to be in a place like that. For me personally, the mission is that important.

40 What is going to happen this year? If you think back six months, what predictions were people making? Dario (Amodei, Anthropic's CEO) predicted that 90% of the code at Anthropic would be written by Claude. That has come true. For me personally, it's been 100% since Opus 4.5. I uninstalled my IDE. I don't edit a single line of code by hand; it's 100% Claude Code and Opus, and I put up 20 PRs a day, every day. Across Anthropic overall, it ranges from 70 to 90% depending on the team, and for a lot of people it's 100%. I remember predicting back in May, when we made Claude Code generally available (GA), that you wouldn't need an IDE to code anymore, and it sounded totally crazy. People in the audience gasped because it seemed like such a silly prediction at the time. But really, all it takes is tracing the exponential curve. This is deep in Anthropic's DNA, because three of our founders were co-authors of the scaling laws paper; they saw this very early. You trace the exponential curve, you see what's going to happen, and yes, it happened. So if we keep tracing the curve, I think coding will be generally solved for everyone.

41 Today, coding is practically solved for me, and I think it will be for everyone, regardless of domain. I think we're going to see the title "software engineer" start to go away; it will just be "builder" or "product manager." Maybe we'll keep the title as a vestige, but the work people do won't be just coding. Software engineers will also be writing specs, talking to users, and so on. We're starting to see this on our team right now, where engineers are very much generalists. Every single function on our team codes: our PMs code, our designers code, our EM (engineering manager) codes, our finance guy codes... everyone on our team codes. We're going to start seeing this everywhere. That's roughly the lower bound if the trend simply continues. The upper bound, I think, is a lot scarier: something like hitting ASL-4. At Anthropic, we talk about AI Safety Levels (ASLs). ASL-3 is where the models are right now. ASL-4 is when the model is recursively self-improving. If that happens, we have to meet a whole set of criteria before we can release a model.

42 The extreme case is that this happens, or that there's some kind of catastrophic misuse, like people using the model to design bioweapons or zero-day exploits. This is something we're working very actively to prevent. Honestly, it's been so exciting and humbling to see how people are using Claude Code. I just wanted to build a cool thing, and it ended up being really useful, which is so surprising and exciting. My impression from Twitter, as an outsider, is that everyone went away over the holidays, discovered Claude Code, and it's been crazy ever since. Was it like that for you internally? Were you having a nice Christmas break and then came back to see what had happened? Well, actually, I spent all of December traveling and took a coding vacation. We were traveling around and I was coding every day, which was really nice. I also started using Twitter at the time. I worked on Threads way back, so I'd been a Threads user for a while, and I wanted to see what other platforms people were on.

43 I think for a lot of people, that was the moment they discovered Opus 4.5. I kind of already knew. Internally, Claude Code had been on an exponential tear for many months, and the curve just got even steeper; that's what we saw. If you look at Claude Code now, there was a stat from Mercury that 70% of startups are choosing Claude as their model of choice. Another stat, from SemiAnalysis, says that 4% of all public commits are made by Claude Code. Companies of all sizes, from the biggest enterprises to the smallest startups, use Claude. It wrote code for Perseverance, the Mars rover! That's the coolest thing for me. We even printed posters, because the team was like, "Wow, it's so cool that NASA chose to use this thing!" Yeah, it's humbling, but it also feels like the very beginning. What's the relationship between Claude Code and the desktop app wrapper? Is it a fork of Claude Code? Did you have Claude Code look at Claude Code and say, "Let's make a new spec for non-technical people that keeps all the lessons," and then it went off for a couple of days and did that?

44 What's the genesis of that, and where do you think it goes? This is going to be my fifth time using the term "latent demand," and it was exactly that. We were looking at Twitter, and one guy was using Claude Code to monitor his tomato plants. Another person was using it to recover wedding photos from a corrupted hard drive. People were using it for finance. Internally at Anthropic, every designer uses it, the entire finance team uses it, and the entire data science team uses it, none of them for coding. People were jumping through hoops to install a terminal tool just so they could use this. So we had known for a while that we wanted to build something for them. We experimented with a bunch of different ideas, and the one that took off was just a small GUI wrapper around Claude Code in the desktop app. That's all it is: Claude Code under the hood, the exact same agent. Oh wow. Felix, an early Electron contributor, knows that stack really well. The team hacked on various ideas, and they built it in about 10 days.

45 It was 100% written by Claude Code, and it just felt ready to release. We had to build a lot of things specifically for non-technical users, so it's a bit different from the version for a technical audience. All the code runs in a virtual machine, and there are many protections against deletion and the like, along with plenty of permission prompts and other guardrails for users. Yeah, honestly, it was a pretty obvious move. Boris, thank you so much for making something that has taken away all my sleep, but in return has made me feel like I'm in creator mode and founder mode again. It's been an exhilarating three weeks, and I can't believe I waited from November until now to actually get into it. Thank you so much for being with us, and thank you for building what you're building. Yeah, thanks for having me in the sandbox.
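
The permission-prompt idea can be sketched roughly as follows: shell commands are checked before they run, and anything that looks destructive is held until the user approves it. This is a hypothetical illustration. It assumes the agent proposes commands as plain strings and that the UI supplies a confirmation callback. The command list, `needs_approval`, `run_with_guardrail`, and the `ask`/`execute` callbacks are all invented for this example and are not the desktop app's actual mechanism. A deny-list like this is easy to evade, which is why the VM isolation mentioned above matters more than any pattern check.

```python
import re
import shlex
from typing import Callable

# Hypothetical rule set: commands treated as destructive when they start a segment.
DESTRUCTIVE_COMMANDS = {"rm", "rmdir", "mv", "dd", "mkfs", "shred"}
# Split on common shell control operators so "cd x && rm y" is checked per segment.
SEPARATORS = re.compile(r"&&|\|\||;|\|")


def needs_approval(command: str) -> bool:
    """Return True if any segment of a shell command looks destructive."""
    for segment in SEPARATORS.split(command):
        try:
            tokens = shlex.split(segment)
        except ValueError:
            # Unparseable segment (e.g. a quoted ';' split apart): fail closed.
            return True
        while tokens and tokens[0] == "sudo":
            tokens = tokens[1:]  # look past a sudo prefix
        if tokens and tokens[0] in DESTRUCTIVE_COMMANDS:
            return True
    return False


def run_with_guardrail(
    command: str,
    ask: Callable[[str], bool],
    execute: Callable[[str], str],
) -> str:
    """Run `command` via `execute`, asking the user first if it looks destructive."""
    if needs_approval(command) and not ask(command):
        return "denied"
    return execute(command)
```

With this sketch, `rm -rf build` reaches `execute` only if `ask` returns True, while `ls -la` runs without a prompt. Failing closed on unparseable input trades a few extra prompts for never silently running something the checker could not read.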
