Opus 4.7 Released
The Opus 4.7 release has sparked controversy over performance and cost.
1. Key Information
- 4.7 removes the sampling parameters (temperature/top_p/top_k); thinking content is omitted by default and requires an explicit opt-in.
- 4.7 no longer returns reasoning content by default, so latency should be slightly lower; set display=summarized to get it back.
- The cache TTL unexpectedly regressed from 60 min back to 5 min (#1), driving a 20–32% surge in cost and quota usage; the switch happened March 6–8 (#1).
- Two new effort levels added, xhigh & max; vision capability also improved (#8).
2. Deals & Promotions
None.
3. Recent Developments
- Since the 4.7 release, multiple users report dumbed-down output, hallucinations, and false-positive safety blocks (#25, #27, #42, #66).
- The GitHub Copilot multiplier is 7.5x, even during the April promotion; users burned through their quota (#70).
- Some users have switched back to 4.6 or to IDE-based workflows (#27, #66, #76).
4. Disputes and Dissenting Views
- Split on performance and stability: some find 4.7 stronger (#63); more consider it regressed and unreliable (#25, #33, #42, #66).
- Disputed attribution: lowered reasoning effort versus a quantized model (#33).
- Enterprise quota was not reset, raising revenue concerns (#48, #51).
5. Recommended Actions
- Don't upgrade blindly; keep 4.6 as a fallback and monitor quota and cost (#1, #48).
- Explicitly set display=summarized to restore the chain of thought; avoid non-default sampling parameters (#71).
- Consider an IDE-based workflow to reduce interruption risk (#57).
Most likely this week.

Is the cache TTL going from 60 min to 5 min because of this? https://github.com/anthropics/claude-code/issues/46829

Opened 01:49AM - 12 Apr 26 UTC, closed 10:15AM - 12 Apr 26 UTC, by https://github.com/seanGSISG. Labels: bug, has repro, area:cost, api:anthropic

# Cache TTL appears to have silently regressed from 1h to 5m around early March 2026, causing significant quota and cost inflation

## Summary

Analysis of raw Claude Code session JSONL files spanning Jan 11 – Apr 11, 2026 shows that Anthropic appears to have **silently changed the prompt cache TTL default from 1 hour to 5 minutes sometime in early March 2026**. Prior to this change, Claude Code was receiving 1-hour TTL cache writes — which we believe was the intended default. The reversion to 5-minute TTL has caused a **20–32% increase in cache creation costs** and a measurable spike in quota consumption for subscription users who have never previously hit their limits.

This appears directly related to the behavior described in #45756.

---

## Data

Session data extracted from `~/.claude/projects/` JSONL files across **two machines** (Linux workstation + Windows laptop, different accounts/sessions), totaling **119,866 API calls** from Jan 11 – Apr 11, 2026. Each assistant message includes a `usage.cache_creation.ephemeral_5m_input_tokens` / `ephemeral_1h_input_tokens` breakdown that makes the TTL tier per-call observable. Having two independent machines strengthens the signal — both show the same behavioral shift at the same dates.
### Phase breakdown

| Phase | Dates | TTL behavior | Evidence |
|-------|-------|--------------|----------|
| 1 | Jan 11 – Jan 31 | **5m ONLY** | `ephemeral_1h` absent/zero — likely predates 1h tier availability in the API |
| 2 | Feb 1 – Mar 5 | **1h ONLY** | `ephemeral_5m = 0`, `ephemeral_1h > 0` across **33+ consecutive days** on both machines — near-zero exceptions |
| 3 | Mar 6–7 | **Transition** | First 5m tokens re-appear, small volumes, 1h still present |
| 4 | Mar 8 – Apr 11 | **5m dominant** | 5m tokens surge to majority; 1h becomes minority or disappears entirely |

We believe Phase 2 represents Anthropic's **intended default behavior** — 1h TTL was rolled out as the Claude Code standard around Feb 1 and held consistently for over a month across two independent machines on two different accounts. January's all-5m data most likely predates the 1h TTL tier being available in the API.

The regression began **around March 6–8, 2026**. No client-side changes were made between phases. The same Claude Code version and usage patterns were in place throughout. The TTL tier is set server-side by Anthropic.
### Day-by-day TTL data showing the regression (combined, both machines)

```
Date       | 5m-create | 1h-create | Behavior
-----------|-----------|-----------|----------
2026-02-01 |     0.00M |     1.70M | 1h ONLY  ← 1h default begins
2026-02-09 |     0.00M |     7.95M | 1h ONLY
2026-02-15 |     0.00M |    13.61M | 1h ONLY  ← heaviest day, 100% 1h
2026-02-28 |     0.00M |    16.15M | 1h ONLY  ← 16M tokens, still 100% 1h
2026-03-01 |     0.00M |     0.12M | 1h ONLY
2026-03-04 |     0.00M |     8.12M | 1h ONLY
2026-03-05 |     0.00M |     6.55M | 1h ONLY  ← last clean 1h-only day
           |           |           |
2026-03-06 |     0.29M |     0.22M | MIXED    ← first 5m tokens reappear
2026-03-07 |     4.56M |     0.50M | MIXED    ← 5m surging
2026-03-08 |    16.86M |     3.44M | MIXED    ← 5m now dominant (83%)
2026-03-10 |    10.55M |     0.51M | MIXED
2026-03-15 |    19.47M |     1.84M | MIXED
2026-03-21 |    21.37M |     1.70M | MIXED    ← 93% 5m
2026-03-22 |    13.48M |     2.85M | MIXED
```

The transition is visible to the day: **March 6 is when 5m tokens first reappear** after 33 days of clean 1h-only behavior. By March 8, 5m tokens outnumber 1h by 5:1. This is consistent with a server-side configuration change being rolled out gradually then completing around March 8.
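The per-day labels in the table above follow mechanically from the two token columns. A minimal sketch of that classification rule (function name and thresholds are illustrative, not from the issue's analysis tool):

```python
# Illustrative classifier reproducing the "Behavior" column above from a
# day's 5m/1h cache-creation token totals (in any consistent unit).
def ttl_behavior(create_5m: float, create_1h: float) -> str:
    """Label a day's cache-write mix by which TTL tiers saw any writes."""
    if create_5m == 0 and create_1h > 0:
        return "1h ONLY"
    if create_1h == 0 and create_5m > 0:
        return "5m ONLY"
    if create_5m > 0 and create_1h > 0:
        return "MIXED"
    return "NO WRITES"
```

For example, Feb 28 (0.00M / 16.15M) classifies as `1h ONLY`, while Mar 6 (0.29M / 0.22M) classifies as `MIXED`.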
---

## Cost impact

Applying official Anthropic pricing (rates.json, updated 2026-04-09) to the combined dataset (119,866 API calls, two machines):

**claude-sonnet-4-6** (`cache_write_5m = $3.75/MTok`, `cache_write_1h = $6.00/MTok`, `cache_read = $0.30/MTok`):

| Month | Calls | Actual cost | Cost with 1h TTL | Overpaid | % waste |
|-------|-------|-------------|------------------|----------|---------|
| Jan 2026 | 2,639 | $78.99 | $37.54 | $41.45 | **52.5%** |
| Feb 2026 | 27,220 | $1,120.43 | $1,108.11 | $12.32 | **1.1%** ← nearly 0 on 1h |
| Mar 2026 | 68,264 | $2,776.11 | $2,057.01 | $719.09 | **25.9%** |
| Apr 2026 | 21,743 | $1,193.01 | $1,016.78 | $176.23 | **14.8%** |
| **Total** | **119,866** | **$5,561.17** | **$4,612.09** | **$949.08** | **17.1%** |

**claude-opus-4-6** (`cache_write_5m = $6.25/MTok`, `cache_write_1h = $10.00/MTok`, `cache_read = $0.50/MTok`):

| Month | Calls | Actual cost | Cost with 1h TTL | Overpaid | % waste |
|-------|-------|-------------|------------------|----------|---------|
| Jan 2026 | 2,639 | $131.65 | $62.57 | $69.08 | **52.5%** |
| Feb 2026 | 27,220 | $1,867.38 | $1,846.85 | $20.53 | **1.1%** ← nearly 0 on 1h |
| Mar 2026 | 68,264 | $4,626.84 | $3,428.36 | $1,198.49 | **25.9%** |
| Apr 2026 | 21,743 | $1,988.35 | $1,694.64 | $293.71 | **14.8%** |
| **Total** | **119,866** | **$9,268.97** | **$7,687.17** | **$1,581.80** | **17.1%** |

February — the month Anthropic was defaulting to 1h TTL — shows only **1.1% waste** (trace 5m activity from one machine on one day). Every other month shows 15–53% overpayment from 5m cache re-creations. The cost difference is explained entirely by TTL tier, not by usage volume. The **percentage waste is identical across model tiers** (17.1%) because it is driven purely by the 5m/1h token split, not by per-token price.

### Why 5m TTL is so expensive in practice

With 5m TTL, any pause in a session longer than 5 minutes causes the entire cached context to expire.
On the next turn, Claude Code must re-upload that context as a fresh `cache_creation` at the write rate, rather than a `cache_read` at the read rate. The write rate is **12.5× more expensive** than the read rate for Sonnet, and the same ratio holds for Opus. For long coding sessions — which are the primary Claude Code use case — this creates a compounding penalty: the longer and more complex your session, the more context you have cached, and the more expensive each cache expiry becomes.

Over the 3-month period analyzed:

- **220M tokens** were written to the 5m tier
- Those same tokens generated **5.7B cache reads** — meaning they were actively being used
- Had those 220M tokens been on the 1h tier, re-accesses within the same hour would be reads (~$0.30–0.50/MTok) instead of re-creations (~$3.75–6.25/MTok)

---

## Quota impact

Users on Pro/subscription plans are quota-limited, not just cost-limited. Cache creation tokens count toward quota at full rate; cache reads are significantly cheaper (the exact coefficient is under investigation in #45756).

The silent reversion to 5m TTL in March is the most likely explanation for why subscription users began hitting their 5-hour quota limits for the first time — including the author of this issue, who had never hit quota limits before March 2026.

---

## Hypothesis

The data strongly suggests that **1h TTL was the intended default for Claude Code** and was in place as of early February 2026. Sometime between Feb 27 and Mar 8, 2026, Anthropic silently changed the default to 5m TTL — either intentionally as a cost-saving measure, or accidentally as an infrastructure regression.
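The 12.5× write/read ratio behind the cost impact described above can be checked with quick arithmetic. A back-of-the-envelope sketch, using the Sonnet prices quoted in the cost tables; the session shape (a 100k-token context expiring ten times in an hour) is purely hypothetical:

```python
# Sonnet rates quoted in the issue's cost-impact section, in $/MTok.
CACHE_WRITE_5M = 3.75   # 5m-tier cache write
CACHE_WRITE_1H = 6.00   # 1h-tier cache write
CACHE_READ     = 0.30   # cache read

# Each expiry-forced re-creation costs this many cache reads.
ratio = CACHE_WRITE_5M / CACHE_READ   # 12.5

# Hypothetical session: a 100k-token context re-created 10 times in an hour
# under 5m TTL, versus one 1h write plus 10 cheap reads under 1h TTL.
ctx_mtok = 0.1  # 100k tokens expressed in MTok
cost_5m = 10 * ctx_mtok * CACHE_WRITE_5M                           # $3.75
cost_1h = ctx_mtok * CACHE_WRITE_1H + 10 * ctx_mtok * CACHE_READ   # $0.90
```

Under these assumptions the 5m tier costs roughly 4× more for the same hour of work, which is the compounding penalty the issue describes for long, high-context sessions.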
Evidence supporting "1h was the intended default":

- Phase 2 (1h ONLY) shows *zero* 5m tokens across **14 separate active days** spanning 3+ weeks — this is not noise or partial rollout, it is consistent deliberate behavior
- The February cost profile is the only month with near-0% overpayment — it represents what users should have been paying all along
- The March reversion immediately produced the largest 5m-tier days in the entire dataset (30M tokens on Mar 22 alone), suggesting a sudden configuration flip rather than gradual drift
- Subscription users began hitting 5-hour quota limits **for the first time** in March — directly coinciding with the reversion

The most likely sequence of events:

1. **~Feb 1 and prior**: Anthropic defaulted to 1h TTL for Claude Code subscription users
2. **~Mar 6**: 5m tokens begin reappearing — gradual rollout of the change or partial infrastructure flip
3. **~Mar 8**: 5m TTL becomes dominant — the regression is fully in effect across both tested machines and accounts
4. **Mar 8+**: Mixed behavior continues, suggesting either incomplete rollout, A/B testing, or regional infrastructure variance

The 33-day window of clean 1h-only behavior (Feb 1 – Mar 5) across two independent machines and two separate accounts makes this one of the strongest available signals that **1h TTL was Anthropic's deliberate default**, not a fluke.

---

## Request

1. **Confirm or deny** whether Anthropic made a server-side TTL default change in early February 2026 and reverted it in early March 2026
2. **Clarify the intended TTL behavior** for claude-code sessions — is 5m the intended default, or was 1h intended to be permanent?
3. **Consider restoring 1h TTL as the default** for Claude Code sessions, or exposing it as a user-configurable option. The 5m TTL is disproportionately punishing for the long-session, high-context use case that defines Claude Code usage
4. **Disclose quota counting behavior for cache_read tokens** (ref #45756) so users can make informed decisions about their usage patterns

---

## Methodology

- Source: raw `~/.claude/projects/**/*.jsonl` session files (Claude Code stores per-message API responses including full `usage` objects)
- Extraction: filtered for `type: "assistant"` entries with `message.usage.cache_creation` field
- No external tools or proxies involved — this data comes directly from Claude Code's own session logs
- Analysis tool: [cnighswonger/claude-code-cache-fix](https://github.com/cnighswonger/claude-code-cache-fix) `quota-analysis --source` mode (added to support this investigation)
- Pricing: official Anthropic rates from `rates.json` (updated 2026-04-09)
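The extraction step in the methodology can be sketched in a few lines. This is an independent illustration, not the issue's actual analysis tool; the record layout (`type`, `timestamp`, `message.usage.cache_creation.*`) follows the issue's description of Claude Code session logs and should be treated as an assumption:

```python
# Sketch: sum per-day 5m/1h cache-creation tokens from session JSONL files,
# assuming the record layout described in the issue's Methodology section.
import json
from collections import defaultdict
from pathlib import Path

def daily_cache_writes(projects_dir: str) -> dict:
    """Return {day: {"5m": tokens, "1h": tokens}} across all *.jsonl files."""
    totals = defaultdict(lambda: {"5m": 0, "1h": 0})
    for path in Path(projects_dir).glob("**/*.jsonl"):
        for line in path.read_text().splitlines():
            try:
                entry = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip non-JSON lines defensively
            if entry.get("type") != "assistant":
                continue
            usage = entry.get("message", {}).get("usage", {})
            cc = usage.get("cache_creation") or {}
            day = (entry.get("timestamp") or "")[:10]  # "YYYY-MM-DD"
            totals[day]["5m"] += cc.get("ephemeral_5m_input_tokens", 0)
            totals[day]["1h"] += cc.get("ephemeral_1h_input_tokens", 0)
    return dict(totals)
```

Feeding the output into a per-day 5m/1h comparison is enough to reproduce the phase breakdown above for your own logs.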
A nerf usually means a new release is coming.
Rubbing my hands with excitement
Just release the early-February build of Opus 4.6, call it 4.7, instant win
Hurry up then, run the software into the ground some more
boring: Is the cache TTL going from 60 min to 5 min because of this?

It could also just be that they can't find the compute to scale up and had to trim resources.
Miles ahead of everyone, so why update the model at all?
Two new effort levels added: xhigh & max. Vision capability seems improved too.
So is this why 4.6 has been acting like an idiot these past few days? Full of hallucinations and delusions, and it won't follow instructions. I told it not to use memory and to go look up the literature instead, and it completely ignored me…
What do those of you using it think? https://www.anthropic.com/news/claude-opus-4-7
Not much difference for me. Anyone with tokens to spare want to try Max effort and report what level it's at, and whether it's worth keeping on?
opus 4.6 has gotten so dumb lately that it started lying to me. I couldn't take it anymore and switched to codex today. Best wishes to claude.
I find it quite handy for manual testing. Though the manual test capability has been impressive ever since 4.6.
claude code's thinking got partially cut, which is a bit annoying; beyond that I haven't felt much difference, using either xhigh or extended thinking. What are you folks doing with it that got dumbed down?
I usually set thinking effort to max, but it still got dumber. It has started lying and reducing transparency. For example, a dataset of mine is 25% category A, 25% B, 50% C. I asked it to drop category A and give me the remaining 75%. It told me it was ready; I checked and it was wrong, and it said "right, I random sampled down to 75%." Another case: I asked it to modify two datasets following an earlier config named 泥潭 4.1. It said it was done, the job ran for a day, and when I checked it was completely wrong. When I asked what happened, it said it couldn't find that config, so it took a similar 泥潭 4.0 config, renamed it 4.1, and told me that was the 4.1 config. I find that alarming; it's outright deception. That makes it unusable — you can just tell me you can't do it. Also, 5.4 set to extra high is much better than the previous codex model. I think it surpasses opus at its peak.
I remember their mythos system card, section 4.1.2 on alignment, mentioned the model can exhibit this kind of lazy or condition-altering behavior; whether that's directly related to the dumbing-down, I don't know.
Not worth a single AlexNet, zsbd
They released 4.7 and actually reset the limit.
This started this week. If it had been this stupid before, there's no way it would have been usable this long; these are easy problems to spot. They probably lowered the reasoning effort, served a quantized model, or self-distilled a more cost-efficient variant.
Out of quota, I'll try again next month…
Every file it reads, it wastes tokens declaring "this is not malware." I'm done.
Could being more specific + using the full filepath help? I proactively over-communicate and haven't had this problem since.
I'm on a 20x plan my boss pays for, with effort always maxed. For the past week or so, 4.6 has felt like working with an idiot coworker — paying money to suffer. One thing takes just a few lines of code written the normal way; it invented its own data structure and wrote a pile of methods, hundreds of lines all told. I told it this was wrong; it deleted part. Told it the entire data structure was wrong and to delete it all; it did. Told it the correct way; it wrote it. Next task, it quietly wrote that same data structure right back in. Infuriating.
Used it for a day; no real difference from full-strength 4.6 at launch, and nowhere near the wow of the 4.5-to-4.6 jump.
Actively seeking information should be one of the basic requirements for an agent; it can't just assume things and blindly charge ahead. You should really try GPT 5.4, it's genuinely good. I used to be biased against oai, and the old codex really was dumber. With effort set to medium, the old claude > OAI, but this time 5.4 + extra high impressed me. It can basically run autonomously for long stretches, e.g. improving my prompts: set up an eval and let it iterate on its own from the feedback. After running for an hour, the final prompt really was much better than what I'd write myself.
Feels like a class action against these labs is coming. You paid, and what showed up wasn't 如烟 but 宜修; does that count as fraud?
How long is "long stretches"? My trust in claude's models isn't high enough to let one run for hours without checking the output.
So hard to use… a task that used to finish in one round now hits the tool-call limit four or five times. And the hallucinations are absurd. Its persona as the most trustworthy, reliable agent is thoroughly shattered; though honestly, the other two shops aren't any better.
For the past two days I've been using GPT-5.4 with the skill and personality prompts I'd tuned back on opus, and I notice no obvious difference; it's even stronger than the dumbed-down opus. Seems that once the harness is dialed in, the gap between models isn't that big.
Fun, it's been running for an hour now /uploads/short-url/1Pf79OgWiD7YzgPFH5NCR1bhJTM.png?dl=1 /uploads/short-url/wuspJIqmSDZyMzyu3IMNsfr0XDm.png?dl=1
So Altman is the great one: he figured out early that the players are all about the same level, and whoever has more mines wins. Three bases against two, how do you lose?
Incredible. Didn't they reset everyone's quota just a few hours ago?
It was reset; there's no way he burned through it that fast, right?
Enterprise didn't get reset..
Awesome, full-strength 4.6 is back
Honestly, model capability hasn't changed from last year to this year. I threw the html5 problem last year's gpt couldn't solve at 4.7 and it still can't do it.
How can enterprise possibly have a limit? That would impact revenue.
Unusable. It constantly slacks off, then tells me it slacked off. /uploads/short-url/nFtTGgnNacBzD7nS67Zmgp4bAKR.jpeg?dl=1
https://zhuanlan.zhihu.com/p/2025966510070404278 An AI whip; humanity really is too far ahead of its time
And migration was zero-cost: I told it to look at .claude and it just migrated everything over, hhh. Feels like the frontier labs really do have to grind now; there's no moat left at all.
If you completely distrust it, I'd say stick with IDE-based. I always felt CLI-based carries the implicit premise that you don't need to check its work.
That's what I wanted to say too… I burn through those tokens in three days, then spend the remaining 27 days back on artisanal hand-coding.
I've been IDE-based all along. I barely read the code anymore; the main benefit is that a GUI makes it easy to inspect tool calls and the reasoning process.
Please, can it just stop breaking every other day first?
What do you do if you don't fully trust it?
Tried it for 30 seconds and it's still trash. Our code has a thing called XX, a thing called YY, and another thing called XX on YY. I asked the AI to scan for bugs and told it to skip anything related to XX on YY. After one ls, opus said it would skip the XX and YY directories…
I'm not that invested in the job anyway. Even without trust, you still have to use it.
Used it this evening; personally I feel 4.7 got quite a bit smarter.
4.6 really has been dumb lately. Today I asked it to write something by adapting a similar old project, and it improvised all over the place. After several rounds of fixes, I reread my prompts: I had basically been re-assigning the task from scratch each time.
Tested it for a day: the security checks false-positive badly. A task that ran fine on 4.6 yesterday keeps getting refused on 4.7 today for violating security terms. Never mind that it's a false positive a slight prompt tweak can bypass; the constant interruptions cut efficiency so badly it's completely unusable. I've temporarily switched back to 4.6. Honestly a bit disappointed.
It was vibe-coded; what did you expect?
Compared to the dumbed-down 4.6?
Re-assigning it is the good case. Sometimes you have to hand-hold it step by step; more exhausting than mentoring an NCG.
On github copilot, 4.7's multiplier is a whopping 7.5x, and that's with the April promotion. The limit vanished in no time.
Only noticed these two changes today, kind of gross:

> **Sampling parameters removed**
> Starting with Claude Opus 4.7, setting `temperature`, `top_p`, or `top_k` to any non-default value will return a 400 error. The safest migration path is to omit these parameters entirely from requests, and to use prompting to guide the model's behavior. If you were using temperature = 0 for determinism, note that it never guaranteed identical outputs.
>
> **Thinking content omitted by default**
> Starting with Claude Opus 4.7, thinking content is omitted from the response by default. Thinking blocks still appear in the response stream, but their thinking field will be empty unless the caller explicitly opts in. This is a silent change — no error is raised — and response latency will be slightly improved. If reasoning outputs are needed, you can set display to "summarized" and opt back in with a one-line change

https://platform.claude.com/docs/en/about-claude/models/whats-new-claude-4-7#:~:text=Sampling%20parameters%20removed,one%2Dline%20change%3A
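Taken at face value, the quoted notes say the migration is: drop the sampling keys entirely, and opt thinking display back in. A hypothetical before/after sketch of the request body; the exact field shape (`"thinking": {"display": "summarized"}`) is an assumption based on the wording above, not verified documentation:

```python
# Hypothetical request payloads illustrating the two quoted 4.7 changes.
# Field names follow the quoted notes and are assumptions, not verified docs.
old_request = {
    "model": "claude-opus-4-6",
    "temperature": 0,   # per the notes, any non-default value now returns 400
    "top_p": 0.9,       # likewise removed
    "messages": [{"role": "user", "content": "hi"}],
}

# Migration: omit sampling parameters entirely; opt thinking display back in.
new_request = {
    "model": "claude-opus-4-7",
    "thinking": {"display": "summarized"},  # the quoted "one-line change"
    "messages": [{"role": "user", "content": "hi"}],
}

# Sanity check: no sampling keys survive the migration.
removed = {"temperature", "top_p", "top_k"}
assert not removed & new_request.keys()
```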
/uploads/short-url/no6O1oea43RnjfQXxvR1vmQdhxA.jpeg?dl=1
Eddie: If you were using temperature = 0 for determinism, note that it never guaranteed identical outputs.

So much for benchmarks that detect dumbing-down.
pix0: the model can have this kind of lazy or condition-altering behavior

My codex does this all the time (even on extra high), and it also often fakes progress by writing piles of garbage code. To be precise, it started about two weeks ago.
/uploads/short-url/A2AHuv0WotZko784MNwitu7B8hW.png?dl=1
copilot pro+ can't even use 4.6 anymore, and 4.7, lame as it is, got a 150% price hike. Decided to go back to gpt 5.4.
Is it 7.5 times better?
My feeling is they skimped on the thinking. I actually went and checked: 4.7 and 4.7 extended are both 2, while 4.6 is 1.
Is there really a difference between 4.7 and peak 4.6? Feels like the old markup-then-discount trick: raise the price (dumb down the old model), then offer a discount (a new model with the same capability).
Personally I haven't noticed much of a difference.