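I've been using Gemini to help with coding-practice problems lately, and found that pointing out a bug can make the AI suddenly go brain-dead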

📈 Work (搬砖) · Author: yuxuppx · 2026-02-22 15:33 PST · Original post #485786

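Summary

Pointing out a bug in Gemini's code can make the model's logic fall apart.

1. Key points

After a bug was pointed out, Gemini offered convoluted fixes and even stubbornly denied the problem (#1, #5).
The OP was using 3.0 in Fast mode; only after switching to Thinking mode did it give the correct answer (#4).
3.1 adds a variant optimized for tool calling, but its quality may fluctuate (#14).
When challenged, a defensive pattern can make the model abandon its original reasoning path (#18).

2. Deals and discounts

None.

3. Latest news

Gemini 3.1 Pro Preview ships a customtools endpoint (gemini-3.1-pro-preview-customtools) optimized for agentic workflows (#14).

4. Disagreements

Model capability is uneven: the logic collapses in some scenarios and holds up in others (#2, #4, #7).
The models end up competing over which is least bad, each with its own strengths (#7, #20).

5. Suggested actions

Try switching to Thinking mode or to a different model.
Don't blindly trust AI-generated fixes; verify them by hand.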

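Original content

--- Post #1, reply from yuxuppx (2026-02-22 15:33:36 PST) ---

It's a bit like how, after a "divine move", the AI suddenly forgets how to play.

The problem was a binary search combined with counting how many lines a sentence wraps into. Gemini immediately produced the answer below, along with a confident, plausible-sounding mathematical analysis of the underlying logic.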
public class TableMinimizer {

    /**
     * Count the total number of lines the text takes at the given width.
     * Strategy: words are not broken; if a word is longer than the width,
     * that word occupies one or more lines on its own.
     */
    public static int calculateHeight(String text, int width) {
        if (width <= 0) return Integer.MAX_VALUE;
        String[] words = text.split("\\s+");
        int totalLines = 0;
        int currentLineWidth = 0;

        for (String word : words) {
            int len = word.length();

            // If a single word exceeds the total width, force-split it across the lines it needs
            if (len > width) {
                if (currentLineWidth > 0) totalLines++; // start a new line
                totalLines += (len / width);
                currentLineWidth = len % width;
                if (currentLineWidth == 0) totalLines--; // exactly divisible case
            }
            // If the word does not fit on the current line, wrap
            else if (currentLineWidth + (currentLineWidth == 0 ? 0 : 1) + len > width) {
                totalLines++;
                currentLineWidth = len;
            }
            // Place the word on the current line
            else {
                currentLineWidth += (currentLineWidth == 0 ? 0 : 1) + len;
            }
        }

        return currentLineWidth > 0 ? totalLines + 1 : totalLines;
    }

    public static int solve(String col1, String col2, int totalWidth) {
        int low = 1;
        int high = totalWidth - 1;
        int minHeight = Integer.MAX_VALUE;

        while (low <= high) {
            int w1 = low + (high - low) / 2;
            int w2 = totalWidth - w1;

            int h1 = calculateHeight(col1, w1);
            int h2 = calculateHeight(col2, w2);

            int currentMax = Math.max(h1, h2);
            minHeight = Math.min(minHeight, currentMax);

            // Decide the binary search direction
            if (h1 > h2) {
                // Left column is too tall: give it more width, move the divider right
                low = w1 + 1;
            } else if (h2 > h1) {
                // Right column is too tall: give it more width, move the divider left
                high = w1 - 1;
            } else {
                // h1 == h2: theoretical balance point reached, return immediately
                return currentMax;
            }
        }
        return minHeight;
    }
}

Then I noticed that the line if (currentLineWidth == 0) totalLines--; is logically wrong. After I reported it, the AI started acting up and gave two very convoluted fixes.

[Screenshot 2026-02-22 15.24.57]

I asked whether simply deleting that line would be enough, and the AI started making excuses.

[Screenshot 2026-02-22 15.27.57]

I asked the AI to give an example.

[Screenshot 2026-02-22 15.28.48]

[Screenshot 2026-02-22 15.29.15]
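
A minimal sketch of checking the one-line fix by hand, assuming the intended rule is that a word longer than the width is split across ceil(len / width) lines. Here calculateHeight is the posted method with only the totalLines-- line deleted; the class name WrapHeightCheck and the method referenceHeight are hypothetical, added as a character-by-character cross-check. For the single word "abcdefgh" at width 4, the posted version returns 1, while both versions below return 2.

```java
final class WrapHeightCheck {

    /** The posted line counter with the disputed decrement removed (the only change). */
    static int calculateHeight(String text, int width) {
        if (width <= 0) return Integer.MAX_VALUE;
        int totalLines = 0;
        int currentLineWidth = 0;
        for (String word : text.split("\\s+")) {
            int len = word.length();
            if (len > width) {
                if (currentLineWidth > 0) totalLines++; // close the partly filled line
                totalLines += len / width;              // lines the word fills completely
                currentLineWidth = len % width;         // leftover characters open a new line
            } else if (currentLineWidth + (currentLineWidth == 0 ? 0 : 1) + len > width) {
                totalLines++;
                currentLineWidth = len;
            } else {
                currentLineWidth += (currentLineWidth == 0 ? 0 : 1) + len;
            }
        }
        return currentLineWidth > 0 ? totalLines + 1 : totalLines;
    }

    /** Hypothetical reference: place characters one at a time and count the lines actually used. */
    static int referenceHeight(String text, int width) {
        if (width <= 0) return Integer.MAX_VALUE;
        int lines = 0;
        int used = 0; // characters on the line being filled; 0 means no line is open
        for (String word : text.split("\\s+")) {
            int len = word.length();
            if (len == 0) continue; // split() can yield an empty first token
            if (len > width) {
                if (used > 0) { lines++; used = 0; } // long words start on a fresh line
                for (int i = 0; i < len; i++) {
                    if (used == width) { lines++; used = 0; } // line is full, open the next one
                    used++;
                }
            } else if (used > 0 && used + 1 + len > width) {
                lines++;    // word does not fit: close the current line
                used = len;
            } else {
                used += (used == 0 ? 0 : 1) + len;
            }
        }
        return used > 0 ? lines + 1 : lines;
    }

    public static void main(String[] args) {
        String[] samples = {"abcdefgh", "abcdefgh xy", "a bb ccc dddd eeeee", "hello wonderful world"};
        for (String text : samples) {
            for (int width = 1; width <= 12; width++) {
                int got = calculateHeight(text, width);
                int want = referenceHeight(text, width);
                if (got != want) {
                    throw new AssertionError(text + " @ width " + width + ": " + got + " != " + want);
                }
            }
        }
        System.out.println(calculateHeight("abcdefgh", 4)); // prints 2
    }
}
```

The cross-check only covers the sample strings and widths listed in main, so it is a sanity check rather than a proof; widening the loops or generating random inputs would give more coverage.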

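--- Post #2, reply from 收束观测者 (2026-02-22 15:34:48 PST) ---

Gemini does have some problems.

3.0 often gets stuck in infinite loops.

I haven't used 3.1 much yet, so I don't know whether that has been fixed.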

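--- Post #3, reply from eyeshield21 (2026-02-22 15:35:53 PST) ---

First, I agree. Today I hit a case where a print statement caused an OOM and a test case failed, but because Gemini didn't know the cause (prompt: "test case failed"), it started analyzing at random and rewriting the algorithm.

Second, it feels like Chinese prompts make things worse for all of the big-three models.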

--- Post #4, reply from yuxuppx (2026-02-22 15:41:14 PST) ---

I was using 3.0.

It was in Fast mode, though. Re-asking in Thinking mode gave the correct answer. Pro mode works too, it's just slow.

--- Post #5, reply from yuxuppx (2026-02-22 15:44:03 PST) ---

Try telling the AI that this line is broken and see whether it digs in.

--- Post #6, reply from 258 (2026-02-22 15:47:43 PST) ---

Isn't Gemini an image generator?

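--- Post #7, reply from 收束观测者 (2026-02-23 00:18:15 PST) ---

codex 5.3's attention defects are so bad that if one prompt asks for two things, it can drop one of them.

So the big three end up competing over which is least bad. If you're lazy and want to use just one, it still has to be opus...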

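--- Post #8, reply from zpf0117b (2026-02-23 00:43:48 PST) ---

I found something even weirder today.

I asked it to draw a picture, but because I said "please make me a picture" instead of "Draw: xxx", Gemini sometimes doesn't call nano banana and just describes the picture in words.

Here's the weird part: after I changed the prompt to "Please use nano banana to draw: xxx", the generated images were packed with banana elements... (why on earth would a flowchart come with a bunch of banana icons?)

[Image: IMG_5153]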

--- Post #9, reply from 收束观测者 (2026-02-23 07:40:04 PST) ---

Calling nano banana is tool calling.

Gemini's tool calling clearly wasn't trained well.

--- Post #10, reply from EVA1 (2026-02-23 07:42:04 PST) ---

Use AI Studio instead; it's a bit smarter than Gemini.

--- Post #11, reply from rrrrz (2026-02-23 07:45:15 PST) ---

Doesn't it spout nonsense in English too?

--- Post #12, reply from bill (2026-02-23 08:15:54 PST) ---

It's the model. Switch to a different one.

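--- Post #13, reply from 非交换几何 (2026-02-23 10:31:38 PST) ---

First time?
[Quoted from Nik0major]:
Gemini 3.0 is out
Hard to keep a straight face. Gemini really does seem to have trouble understanding instructions. I asked it to draw a flowchart and it flatly refused; when I told it directly to use Nano Banana, it actually drew me a bunch of monkeys and bananas.
[Image: IMG_3404]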

--- Post #14, reply from 郁小南 (2026-02-23 23:11:28 PST) ---

Maybe because this problem was so serious, Gemini 3.1 shipped a version optimized for tool calling:

gemini-3.1-pro-preview-customtools

For those building with a mix of bash and custom tools, Gemini 3.1 Pro Preview comes with a separate endpoint available via the API called gemini-3.1-pro-preview-customtools. This endpoint is better at prioritizing your custom tools (for example view_file or search_code).

However:

Note that while gemini-3.1-pro-preview-customtools is optimized for agentic workflows that use custom tools and bash, you may see quality fluctuations in some use cases which don’t benefit from such tools.

You could call it robbing Peter to pay Paul.

--- Post #15, reply from devilevga (2026-02-23 23:19:05 PST) ---

Why still grind practice problems? Didn't Anthropic say programmers only have 12 months left?

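--- Post #16, reply from l1nv3ga (2026-02-23 23:40:12 PST) ---

[Quoted from 收束观测者]:
Feels like I need to hand-build my own agent front end (or have an agent do it): opus as the main agent, subagents chosen as needed.
Are you looking for: GitHub - code-yeongyu/oh-my-opencode: the best agent harness? Opus directs and plans, Codex does the work.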

--- Post #17, reply from 收束观测者 (2026-02-24 01:17:48 PST) ---

CLIs are still too hard to use. I'd rather build it myself (with an agent) inside vscode.

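--- Post #18, reply from harvey8 (2026-02-24 01:22:33 PST) ---

Many models trigger a defensive pattern when challenged, and abandon their original reasoning path in order to stay "self-consistent".

At bottom it's still token continuation, not real debugging.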

--- Post #19, reply from v_v (2026-02-24 01:24:04 PST) ---

It's like it's looking things up in a question bank.

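--- Post #20, reply from geminixiao (2026-02-24 01:25:03 PST) ---

I recently had six AIs answer a few questions about ranking credit card rewards. chatgpt was eliminated first, then grok and DeepSeek, then gemini, and finally doubao and kimi: a total wipeout.

chatgpt didn't even figure out what kind of card hsbc hk pulse is.

gemini didn't even get the points rules for hsbc us elite right.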

--- Post #21, reply from anasfc (2026-02-24 01:51:42 PST) ---

Judging by that order, it seems to depend on the language the questions were asked in, too.
