How to fix the `json.decoder.JSONDecodeError: Extra data: line 2 column 1 (char 214)` error when loading a dataset in LLaMA-Factory





Error record

While working on a physics-related machine learning project, I downloaded an instruction-tuning dataset for the physics domain. Since the dataset was distributed in JSON format, it should in principle have been possible to preview it directly with common data-processing tools. However, when I tried to preview the dataset from code, the following error occurred.

(Screenshot: the JSONDecodeError raised during the preview attempt)
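This "Extra data" failure is easy to reproduce: Python's standard-library parser accepts exactly one top-level JSON value, so a file that stores one JSON object per line (JSON Lines format) fails the moment the parser reaches line 2. A minimal sketch; the two records below are made-up placeholders, not the actual dataset:

```python
import json

# Two records written back to back, one object per line (JSON Lines),
# instead of being wrapped in a single top-level JSON array.
jsonl_text = (
    '{"instruction": "State Newton\'s second law.", "output": "F = ma"}\n'
    '{"instruction": "Define momentum.", "output": "p = mv"}\n'
)

try:
    json.loads(jsonl_text)  # json.loads expects exactly ONE top-level value
except json.JSONDecodeError as e:
    # The parser finishes the first object, then trips over the second:
    print(e)  # Extra data: line 2 column 1 (char 66)
```

The reported line and column point at the start of the second record, which is exactly the signature seen in the error above.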

The error message indicated that the JSON file could not be parsed correctly, so something was clearly wrong with the data format. To investigate further, I decided to inspect the JSON file's contents by hand. Here is what I ran into when I tried to open the file.

```
Downloading Model from https://www.modelscope.cn to directory: /home/administrator/.cache/modelscope/hub/models/Qwen/Qwen3-8B
2025-05-20 09:11:20,638 - modelscope - WARNING - Using branch: master as version is unstable, use with caution
Traceback (most recent call last):
  File "/home/administrator/tools/LLM/LLaMA-Factory/src/llamafactory/model/loader.py", line 82, in load_tokenizer
    tokenizer = AutoTokenizer.from_pretrained(
  File "/home/administrator/anaconda3/envs/llm/lib/python3.11/site-packages/transformers/models/auto/tokenization_auto.py", line 946, in from_pretrained
    tokenizer_config = get_tokenizer_config(pretrained_model_name_or_path, **kwargs)
  File "/home/administrator/anaconda3/envs/llm/lib/python3.11/site-packages/transformers/models/auto/tokenization_auto.py", line 800, in get_tokenizer_config
    result = json.load(reader)
  File "/home/administrator/anaconda3/envs/llm/lib/python3.11/json/__init__.py", line 293, in load
    return loads(fp.read(),
  File "/home/administrator/anaconda3/envs/llm/lib/python3.11/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/home/administrator/anaconda3/envs/llm/lib/python3.11/json/decoder.py", line 340, in decode
    raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 240 column 1 (char 9732)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/administrator/anaconda3/envs/llm/lib/python3.11/site-packages/gradio/queueing.py", line 715, in process_events
    response = await route_utils.call_process_api(
  File "/home/administrator/anaconda3/envs/llm/lib/python3.11/site-packages/gradio/route_utils.py", line 322, in call_process_api
    output = await app.get_blocks().process_api(
  File "/home/administrator/anaconda3/envs/llm/lib/python3.11/site-packages/gradio/blocks.py", line 2137, in process_api
    result = await self.call_function(
  File "/home/administrator/anaconda3/envs/llm/lib/python3.11/site-packages/gradio/blocks.py", line 1675, in call_function
    prediction = await utils.async_iteration(iterator)
  File "/home/administrator/anaconda3/envs/llm/lib/python3.11/site-packages/gradio/utils.py", line 735, in async_iteration
    return await anext(iterator)
  File "/home/administrator/anaconda3/envs/llm/lib/python3.11/site-packages/gradio/utils.py", line 729, in __anext__
    return await anyio.to_thread.run_sync(
  File "/home/administrator/anaconda3/envs/llm/lib/python3.11/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/home/administrator/anaconda3/envs/llm/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 2470, in run_sync_in_worker_thread
    return await future
  File "/home/administrator/anaconda3/envs/llm/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 967, in run
    result = context.run(func, *args)
  File "/home/administrator/anaconda3/envs/llm/lib/python3.11/site-packages/gradio/utils.py", line 712, in run_sync_iterator_async
    return next(iterator)
  File "/home/administrator/anaconda3/envs/llm/lib/python3.11/site-packages/gradio/utils.py", line 873, in gen_wrapper
    response = next(iterator)
  File "/home/administrator/tools/LLM/LLaMA-Factory/src/llamafactory/webui/chatter.py", line 144, in load_model
    super().__init__(args)
  File "/home/administrator/tools/LLM/LLaMA-Factory/src/llamafactory/chat/chat_model.py", line 53, in __init__
    self.engine: BaseEngine = HuggingfaceEngine(model_args, data_args, finetuning_args, generating_args)
  File "/home/administrator/tools/LLM/LLaMA-Factory/src/llamafactory/chat/hf_engine.py", line 54, in __init__
    tokenizer_module = load_tokenizer(model_args)
  File "/home/administrator/tools/LLM/LLaMA-Factory/src/llamafactory/model/loader.py", line 90, in load_tokenizer
    tokenizer = AutoTokenizer.from_pretrained(
  File "/home/administrator/anaconda3/envs/llm/lib/python3.11/site-packages/transformers/models/auto/tokenization_auto.py", line 946, in from_pretrained
    tokenizer_config = get_tokenizer_config(pretrained_model_name_or_path, **kwargs)
  File "/home/administrator/anaconda3/envs/llm/lib/python3.11/site-packages/transformers/models/auto/tokenization_auto.py", line 800, in get_tokenizer_config
    result = json.load(reader)
  File "/home/administrator/anaconda3/envs/llm/lib/python3.11/json/__init__.py", line 293, in load
    return loads(fp.read(),
  File "/home/administrator/anaconda3/envs/llm/lib/python3.11/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/home/administrator/anaconda3/envs/llm/lib/python3.11/json/decoder.py", line 340, in decode
    raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 240 column 1 (char 9732)
```
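In my experience, this exception almost always means the file is JSON Lines (one object per line, usually named `.jsonl`) saved under a `.json` extension: every record is valid on its own, but the container is wrong. One fix is to rewrite the records as a single JSON array, which the plain `json.load` path shown in the traceback can parse. A minimal sketch, where the file paths are placeholders to substitute with your own dataset:

```python
import json

def jsonl_to_json_array(src_path: str, dst_path: str) -> int:
    """Read one JSON object per line and rewrite them as a single JSON array.

    Returns the number of records converted.
    """
    records = []
    with open(src_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    with open(dst_path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps non-ASCII text (e.g. Chinese) readable
        json.dump(records, f, ensure_ascii=False, indent=2)
    return len(records)

# Hypothetical paths; substitute your own dataset file:
# n = jsonl_to_json_array("physics_sft.json", "physics_sft_fixed.json")
```

Alternatively, many loaders accept JSON Lines directly if the file keeps a `.jsonl` extension; either way, the dataset still needs to be registered in LLaMA-Factory's `data/dataset_info.json` before it can be selected for training.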
Author: 羊城迷鹿