Python Programming Exercise: Replacing Specified Placeholders in a Sentence Template


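Goal: given a sentence template (containing placeholder strings), the candidate replacements, and the spans of the placeholders to replace, generate N samples by random replacement and save them to a JSON file.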

Python script

#!/usr/bin/env python
# -*- coding:utf-8 -*-

import json
import random
from loguru import logger


def get_data(syntax, data_map, slot_name_span, json_file_path, threshold=10):
    """
    Generate labeled samples by filling the placeholders in a sentence template.
    :param syntax: sentence template containing the placeholders
    :param data_map: dict mapping each slot name to its candidate replacement strings
    :param slot_name_span: list of (slot_name, start, end) placeholder spans in syntax,
                           sorted by start and non-overlapping
    :param json_file_path: output path; one JSON object is written per line
    :param threshold: number of samples to generate
    :return: None
    """
    with open(json_file_path, 'w', encoding='utf-8') as outfile:
        for i in range(threshold):
            text = syntax
            # offset: net change in text length caused by the replacements made so far
            offset, label_list = 0, []
            for slot_name, s_index, e_index in slot_name_span:
                entity = random.choice(data_map[slot_name])
                # Shift the original span by the accumulated offset
                new_start = s_index + offset
                text = text[:new_start] + entity + text[e_index + offset:]
                new_end = new_start + len(entity)
                offset += len(entity) - (e_index - s_index)
                label_list.append([new_start, new_end, slot_name, entity])
            res = {"text": text, "label_list": label_list}
            logger.info("Record {0}: {1}.".format(i + 1, res))

            json.dump(res, outfile, ensure_ascii=False)
            outfile.write("\n")


if __name__ == "__main__":
    get_data(syntax="time我想喝tea",
             data_map={"time": ["今天", "明天", "后天"],
                       "tea": ["茶百道", "喜茶", "奈雪的茶", "一点点", "瑞幸咖啡"]},
             slot_name_span=[("time", 0, 4), ("tea", 7, 10)],
             json_file_path="./data_demo.json")

Results:

(1) Running the script: each generated record is logged to the console.

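(2) Contents of the JSON file (one JSON object per line; the replacements are random, so the output differs between runs)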

{"text": "今天我想喝瑞幸咖啡", "label_list": [[0, 2, "time", "今天"], [5, 9, "tea", "瑞幸咖啡"]]}
{"text": "今天我想喝茶百道", "label_list": [[0, 2, "time", "今天"], [5, 8, "tea", "茶百道"]]}
{"text": "后天我想喝喜茶", "label_list": [[0, 2, "time", "后天"], [5, 7, "tea", "喜茶"]]}
{"text": "明天我想喝茶百道", "label_list": [[0, 2, "time", "明天"], [5, 8, "tea", "茶百道"]]}
{"text": "后天我想喝喜茶", "label_list": [[0, 2, "time", "后天"], [5, 7, "tea", "喜茶"]]}
{"text": "今天我想喝茶百道", "label_list": [[0, 2, "time", "今天"], [5, 8, "tea", "茶百道"]]}
{"text": "明天我想喝喜茶", "label_list": [[0, 2, "time", "明天"], [5, 7, "tea", "喜茶"]]}
{"text": "今天我想喝奈雪的茶", "label_list": [[0, 2, "time", "今天"], [5, 9, "tea", "奈雪的茶"]]}
{"text": "今天我想喝瑞幸咖啡", "label_list": [[0, 2, "time", "今天"], [5, 9, "tea", "瑞幸咖啡"]]}
{"text": "今天我想喝茶百道", "label_list": [[0, 2, "time", "今天"], [5, 8, "tea", "茶百道"]]}
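  • Use cases: data augmentation, data processing, and data cleaning.
  • Key detail: the tricky part is that each replacement changes the length of the text, so the index positions of the entities that follow it shift.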
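
The index shift can be traced by hand. Below is a minimal sketch, assuming the spans are sorted by start position and do not overlap. `fill_template` is a hypothetical helper that uses fixed replacements instead of random ones, so the result is reproducible.

```python
def fill_template(template, spans, replacements):
    """Replace each (slot_name, start, end) span and return (text, labels)."""
    text, offset, labels = template, 0, []
    for slot_name, start, end in spans:
        entity = replacements[slot_name]
        # Earlier replacements moved this span by `offset` characters
        new_start = start + offset
        text = text[:new_start] + entity + text[end + offset:]
        labels.append([new_start, new_start + len(entity), slot_name, entity])
        # Accumulate the length difference introduced by this replacement
        offset += len(entity) - (end - start)
    return text, labels


text, labels = fill_template(
    "time我想喝tea",
    [("time", 0, 4), ("tea", 7, 10)],
    {"time": "今天", "tea": "喜茶"},
)
print(text)    # 今天我想喝喜茶
print(labels)  # [[0, 2, 'time', '今天'], [5, 7, 'tea', '喜茶']]
```

Replacing "time" (4 characters) with "今天" (2 characters) shortens the text by 2, so the "tea" span that started at index 7 in the template starts at index 5 in the result.
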