An error you may never have run into: "错误: 无法识别的字符: 4" ("Error: unrecognized character: 4") -- TypeError: unsupported operand type(s) for -: 'str' and 'float'

skywalk8163

Published 2025-04-08 10:38:45


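With AI assistance I wrote a small project: an attempt at a programming language that uses Chinese operators and homoiconicity. After I wrote up the requirements, the AI produced the architecture design and then the implementation code.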

While testing the project with unittest, I hit the following error:

.====x is 错误: 无法识别的字符: 4, x.type:<class 'str'>
E.
======================================================================
ERROR: test_math_functions (__main__.TestCNPLInterpreter)
测试数学函数
----------------------------------------------------------------------
Traceback (most recent call last):
  File "E:\work\cntrae\test_parser.py", line 26, in test_math_functions
  File "e:\py310\lib\unittest\case.py", line 876, in assertAlmostEqual
    diff = abs(first - second)
TypeError: unsupported operand type(s) for -: 'str' and 'float'

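In other words, the statement assertAlmostEqual(self.interpreter.interpret("4 开方"), 2.0)

failed during the equality check because of a type mismatch: the interpreter returned a str, and assertAlmostEqual tried to subtract the float 2.0 from it. ("开方" means square root.)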
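
The type mismatch can be reproduced without the interpreter at all. The following is a minimal sketch that passes the error string seen in the output above straight to assertAlmostEqual; the bare unittest.TestCase() instance is only a convenience for calling the assertion outside a test run.

```python
import unittest

# A bare TestCase instance is enough to call assertion methods directly.
case = unittest.TestCase()

message = None
try:
    # First argument is a str (the interpreter's error text), second is a float.
    case.assertAlmostEqual("错误: 无法识别的字符: 4", 2.0)
except TypeError as exc:
    # assertAlmostEqual computes abs(first - second), which fails for str - float.
    message = str(exc)

print(message)
```

So the TypeError comes from the assertion's own arithmetic, not from the interpreter; the interpreter had already failed quietly and handed back a string.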

Problem recap and fix

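Here is what happened. The math handling in the language has to recognize two-element and three-element operator expressions. A three-element expression is subject-predicate-object, and a two-element expression can be seen as subject-object. They look like this (加 = add, 乘 = multiply, 开方 = square root, 余弦 = cosine):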
"5 加 3"

"4 乘 2"

"4 开方"

"0 余弦"

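To make the two shapes concrete, here is a minimal sketch of an evaluator for such statements. It is not the project's implementation: the names BINARY_OPS, UNARY_OPS, and evaluate are hypothetical, and splitting on whitespace is a simplifying assumption that replaces a real lexer and parser.

```python
import math

# Hypothetical operator tables; the real project's tables are not shown in this post.
BINARY_OPS = {
    "加": lambda a, b: a + b,   # add
    "乘": lambda a, b: a * b,   # multiply
}
UNARY_OPS = {
    "开方": math.sqrt,          # square root
    "余弦": math.cos,           # cosine
}

def evaluate(statement):
    """Evaluate a whitespace-separated two- or three-element statement."""
    parts = statement.split()
    if len(parts) == 3:
        # Three elements: operand, operator, operand.
        left, op, right = parts
        return BINARY_OPS[op](float(left), float(right))
    if len(parts) == 2:
        # Two elements: operand followed by a single-operand function.
        operand, op = parts
        return UNARY_OPS[op](float(operand))
    raise ValueError(f"unsupported statement: {statement!r}")

print(evaluate("5 加 3"), evaluate("4 开方"))
```

Note that this sketch raises on bad input instead of returning an error string, so a caller can never mistake a failure for a result.
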
I used Trae and RooCode as coding assistants. Trae initially ran on the DeepSeek-v3 model. Because of the error above, I wrote a separate test script:

from cnpl_interpreter import CNPLInterpreter

interpreter = CNPLInterpreter()
x = interpreter.interpret("4 开方")
print(f"====x is {x}, x.type:{type(x)}")
print(x)

Running it produced this error:

PS E:\work\cntrae> python test_2d.py
====1词法分析器: input_str:4 开方
====2无法识别的字符: 4
====x is 错误: 无法识别的字符: 4, x.type:<class 'str'>
错误: 无法识别的字符: 4

There is an error, so let Trae fix it, right? It tried dozens of times and never got it right.

Trae's changes

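The problem lies in the order in which the lexer handles numbers and operators. I suggest modifying the lexer method so that numbers are recognized first and operators second. This requires adjusting the matching order of number and operator in token_types and adding special handling for single-operand math functions.

I will modify the lexer method and adjust the matching order of numbers and operators so that numbers are recognized first, in order to fix the error caused by statements like '4 开方'.

cnpl_interpreter.py

Failed

I will try again to modify the lexer method, making sure the matching order of numbers and operators is correct: numbers first, then operators.

cnpl_interpreter.py

+ 0 - 10

Review

I will adjust the matching order of numbers and operators in the lexer method so that numbers are recognized first, in order to fix the error caused by statements like '4 开方'.

cnpl_interpreter.py

+ 10 - 0

Review

The ordering problem between numbers and operators in the lexer has been fixed, and statements like '4 开方' are now handled correctly. The modified lexer method recognizes numbers first and then checks for operators, which avoids the earlier error.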

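The diagnosis was indeed correct, but after dozens of attempts the code was still wrong. My guess is that the model had gotten stuck in a loop.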

Light at the end of the tunnel

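As for me, I started by adding print statements by hand as makeshift breakpoints, but they printed nothing new, and I was at a loss as to how to debug further.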

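Later the error message gave me a sudden idea: could this 4 simply be a character that cannot be recognized? That was very close to the truth, but I wrongly assumed the 4 referred to the fourth character in the ASCII table, and that this was why it could not be recognized.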

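Later still, it finally dawned on me: the return value x was the whole string "错误: 无法识别的字符: 4". The interpreter had caught the lexer's exception and returned the error message as an ordinary string, which is why the test ended up comparing a str with a float.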

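Once I knew what the problem was, I pointed it out to the AI again, but it stayed stuck in the same wrong line of reasoning. So I made a decision that went against my habits: I replaced DeepSeek-v3 with Doubao (豆包). I was willing to let Doubao take over because I know the model is fairly dependable, and the gap between it and the top international models is not as large as some people assume.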

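Sure enough, after switching to Doubao, I sent it the problematic code segment together with this prompt (translated): "This code has a problem: a statement like "4 开方" causes the error 错误: 无法识别的字符: 4". Doubao made the change, and guess what? It passed on the first try!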

Here is the normal output of the test script:

PS E:\work\cntrae> python test_2d.py
====1词法分析器: input_str:4 开方
====11 理单操作数函数predicate:开方
====12执行单操作数函数: 开方(4)
====13操作数转换成功: 4 -> 4.0
====开方结果: 2.0
====x is 2.0, x.type:<class 'float'>
2.0

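As you can see, I had written five print checkpoints. Had I handed the problem to Doubao a day earlier, I might never have needed to write them by hand.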

Here is the output of the unittest run:

PS E:\work\cntrae> python .\test_parser.py
.====1词法分析器: input_str:4 开方
====11 理单操作数函数predicate:开方
====12执行单操作数函数: 开方(4)
====13操作数转换成功: 4 -> 4.0
====开方结果: 2.0
====1词法分析器: input_str:0 正弦
====11 理单操作数函数predicate:正弦
====12执行单操作数函数: 正弦(0)
====13操作数转换成功: 0 -> 0.0
====正弦结果: 0.0
====1词法分析器: input_str:0 余弦
====11 理单操作数函数predicate:余弦
====12执行单操作数函数: 余弦(0)
====13操作数转换成功: 0 -> 0.0
====余弦结果: 1.0
====1词法分析器: input_str:0 正切
====11 理单操作数函数predicate:正切
====12执行单操作数函数: 正切(0)
====13操作数转换成功: 0 -> 0.0
====正切结果: 0.0
..
----------------------------------------------------------------------      
Ran 3 tests in 0.006s

OK

The code looks like this:

    def lexer(self, input_str):
        """Lexer: split input_str into a list of token dicts."""
        tokens = []
        pos = 0
        print(f"====1词法分析器: input_str:{input_str}")
        
        while pos < len(input_str):
            # Skip whitespace
            match = re.match(r'\s+', input_str[pos:])
            if match:
                pos += len(match.group())
                continue
                
            if pos >= len(input_str):
                break
                
            # Check for a number first (numbers take priority because math-function statements may contain digits)
            # Original logic: match = re.match(self.token_types["number"], input_str[pos:])
            # Changed to:     match = re.match(r'^[0-9]+(\.[0-9]+)?', input_str[pos:])
            match = re.match(r'^[0-9]+(\.[0-9]+)?', input_str[pos:])
            if match:
                token = match.group()
                tokens.append({"type": "number", "value": token})
                pos += len(token)
                continue
                
            # Check for keywords, operators, and delimiters
            found = False
            for token_type in ["keywords", "operators", "delimiters"]:
                for token in self.token_types[token_type]:
                    if input_str[pos:].startswith(token):
                        tokens.append({"type": token_type[:-1], "value": token})
                        pos += len(token)
                        found = True
                        break
                if found:
                    break
            if found:
                continue
                
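            # Check for math functions such as '开方' (square root)
            # Note: operators were already tried in the loop above, so this block appears to be redundant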
            for op in self.token_types["operators"]:
                if input_str[pos:].startswith(op):
                    tokens.append({"type": "operator", "value": op})
                    pos += len(op)
                    found = True
                    break
            if found:
                continue
                
            # Check for an identifier
            match = re.match(self.token_types["identifier"], input_str[pos:])
            if match:
                token = match.group()
                tokens.append({"type": "identifier", "value": token})
                pos += len(token)
                continue
            print(f"====2无法识别的字符: {input_str[pos]}")    
            raise Exception(f"无法识别的字符: {input_str[pos]}")
            
        return tokens
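
The method above depends on self.token_types and on an import of re, neither of which is shown in the post. For readers who want to try the number-first matching order in isolation, here is a self-contained sketch. The TOKEN_TYPES table, the identifier pattern, and the function name lex are all assumptions made for illustration, not the project's actual definitions.

```python
import re

# Hypothetical token tables; the real CNPLInterpreter.token_types is not shown in the post.
TOKEN_TYPES = {
    "keywords": ["如果", "否则"],
    "operators": ["开方", "正弦", "余弦", "正切", "加", "乘"],
    "delimiters": ["(", ")"],
}
NUMBER = re.compile(r"[0-9]+(\.[0-9]+)?")
WHITESPACE = re.compile(r"\s+")
# Assumed identifier pattern: ASCII letters, underscore, or CJK characters.
IDENTIFIER = re.compile(r"[A-Za-z_\u4e00-\u9fff][A-Za-z0-9_\u4e00-\u9fff]*")

def lex(input_str):
    """Split input_str into token dicts, trying numbers before anything else."""
    tokens = []
    pos = 0
    while pos < len(input_str):
        ws = WHITESPACE.match(input_str, pos)
        if ws:
            pos = ws.end()
            continue
        num = NUMBER.match(input_str, pos)
        if num:
            tokens.append({"type": "number", "value": num.group()})
            pos = num.end()
            continue
        for token_type in ("keywords", "operators", "delimiters"):
            hit = next((t for t in TOKEN_TYPES[token_type]
                        if input_str.startswith(t, pos)), None)
            if hit is not None:
                # Strip the plural "s" to get the singular token type name.
                tokens.append({"type": token_type[:-1], "value": hit})
                pos += len(hit)
                break
        else:
            # No fixed token matched: fall back to an identifier or fail loudly.
            ident = IDENTIFIER.match(input_str, pos)
            if not ident:
                raise ValueError(f"无法识别的字符: {input_str[pos]}")
            tokens.append({"type": "identifier", "value": ident.group()})
            pos = ident.end()
    return tokens

print(lex("4 开方"))
```

Passing pos to the compiled pattern's match method avoids re-slicing the input on every iteration, and raising ValueError keeps a lexing failure from being mistaken for a result.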