DataX: MySQL overwrite to Hive

1. JSON job script

```json
{
  "job": {
    "setting": {
      "speed": {
        "channel": 3
      },
      "cipher": {
        "sfjm": "Y",
        "instance": "AES128"
      },
      "errorLimit": {
        "record": "-1"
      }
    },
    "content": [{
      "reader": {
        "name": "mysqlreader",
        "parameter": {
          "password": "123456",
          "username": "root",
          "connection": [{
            "jdbcUrl": ["jdbc:mysql://192.168.30.20:3306/test?useUnicode=true&characterEncoding=utf8&useSSL=false&serverTimezone=Asia/Shanghai"],
            "table": ["test.jkmzmxx_plqd"]
          }],
          "column": ["`xm`", "`zjhm`", "`lxdh`", "`city`", "`region`", "`street`", "`community`", "`dz`", "`sjly`", "`lysj`", "`zmyy`", "`zmqzt`", "`nzzt`", "`sffk`", "`jcfkrq`"]
        }
      },
      "writer": {
        "name": "hdfswriter",
        "parameter": {
          "password": "123456",
          "username": "hadoop",
          "jdbcUrl": "jdbc:hive2://bigcity1:2181,bigcity2:2181,bigcity3:2181/stg;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2",
          "analyzeSql": "ANALYZE TABLE stg.stg_szsj_jkmzmxx_plqd partition(day_id='20211204') COMPUTE STATISTICS",
          "defaultFS": "hdfs://bigdata:8020",
          "hadoopConfig": {
            "dfs.client.use.datanode.hostname": "true"
          },
          "fileType": "orc",
          "fieldDelimiter": ",",
          "fileName": "stg.stg_szsj_jkmzmxx_plqd",
          "path": "/warehouse/tablespace/managed/hive/stg.db/stg_szsj_jkmzmxx_plqd/day_id=20211204",
          "hdfsUser": "hdfs",
          "writeMode": "overwrite",
          "column": [{
            "name": "xm",
            "type": "VARCHAR"
          }, {
            "name": "zjhm",
            "type": "VARCHAR"
          }, {
            "name": "lxdh",
            "type": "VARCHAR"
          }, {
            "name": "city",
            "type": "VARCHAR"
          }, {
            "name": "region",
            "type": "VARCHAR"
          }, {
            "name": "street",
            "type": "VARCHAR"
          }, {
            "name": "community",
            "type": "VARCHAR"
          }, {
            "name": "dz",
            "type": "VARCHAR"
          }, {
            "name": "sjly",
            "type": "VARCHAR"
          }, {
            "name": "lysj",
            "type": "TIMESTAMP"
          }, {
            "name": "zmyy",
            "type": "VARCHAR"
          }, {
            "name": "zmqzt",
            "type": "VARCHAR"
          }, {
            "name": "nzzt",
            "type": "VARCHAR"
          }, {
            "name": "sffk",
            "type": "VARCHAR"
          }, {
            "name": "jcfkrq",
            "type": "TIMESTAMP"
          }],
          "originColumn": [{
            "name": "xm",
            "type": "VARCHAR"
          }, {
            "name": "zjhm",
            "type": "VARCHAR"
          }, {
            "name": "lxdh",
            "type": "VARCHAR"
          }, {
            "name": "city",
            "type": "VARCHAR"
          }, {
            "name": "region",
            "type": "VARCHAR"
          }, {
            "name": "street",
            "type": "VARCHAR"
          }, {
            "name": "community",
            "type": "VARCHAR"
          }, {
            "name": "dz",
            "type": "VARCHAR"
          }, {
            "name": "sjly",
            "type": "VARCHAR"
          }, {
            "name": "lysj",
            "type": "TIMESTAMP"
          }, {
            "name": "zmyy",
            "type": "VARCHAR"
          }, {
            "name": "zmqzt",
            "type": "VARCHAR"
          }, {
            "name": "nzzt",
            "type": "VARCHAR"
          }, {
            "name": "sffk",
            "type": "VARCHAR"
          }, {
            "name": "jcfkrq",
            "type": "TIMESTAMP"
          }]
        }
      }
    }]
  }
}
```
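The hdfswriter writes straight into the partition directory given in `path`, so the target Hive table and partition generally need to exist before the job runs. Below is a minimal sketch of what the DDL might look like, assuming a plain ORC table whose columns follow the writer's column list above (the table name, column names, and `day_id` partition come from the config; the use of `STRING` instead of `VARCHAR`, and all other storage details, are assumptions to adapt to your environment):

```sql
-- Hypothetical DDL for the target table; adjust types and properties to your actual schema.
CREATE TABLE IF NOT EXISTS stg.stg_szsj_jkmzmxx_plqd (
  xm        STRING,
  zjhm      STRING,
  lxdh      STRING,
  city      STRING,
  region    STRING,
  street    STRING,
  community STRING,
  dz        STRING,
  sjly      STRING,
  lysj      TIMESTAMP,
  zmyy      STRING,
  zmqzt     STRING,
  nzzt      STRING,
  sffk      STRING,
  jcfkrq    TIMESTAMP
)
PARTITIONED BY (day_id STRING)
STORED AS ORC;

-- Pre-create the partition the job overwrites so the HDFS path exists.
ALTER TABLE stg.stg_szsj_jkmzmxx_plqd
  ADD IF NOT EXISTS PARTITION (day_id='20211204');
```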
### Troubleshooting garbled Chinese characters during transfer

When using DataX to synchronize data from MySQL to HDFS, garbled Chinese characters usually mean the encoding is configured inconsistently somewhere along the pipeline. The likely causes and their fixes are listed below.

#### 1. MySQL character set configuration

Make sure the MySQL database defaults to UTF-8 or another encoding that can represent multi-byte characters. The database, table, and column character sets can be checked with:

```sql
SHOW VARIABLES LIKE 'character_set%';
SHOW FULL COLUMNS FROM t_user_test;
```

If the character set is not `utf8` or `utf8mb4`, convert the database or table:

```sql
ALTER DATABASE your_database_name CHARACTER SET utf8 COLLATE utf8_general_ci;
ALTER TABLE t_user_test CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
```

#### 2. Encoding settings in the DataX job file

DataX lets the JSON job file specify the character encoding used while reading and writing. For a MySQL-to-HDFS job, make sure the following parameters are defined correctly.

##### (a) Reader plugin encoding

In the reader section, besides the `column` and `connection` parameters, declare UTF-8 explicitly in the JDBC URL:

```json
{
  "name": "mysqlreader",
  "parameter": {
    "username": "root",
    "password": "",
    "connection": [
      {
        "jdbcUrl": ["jdbc:mysql://localhost:3306/your_db?useUnicode=true&characterEncoding=utf8"],
        "table": ["t_user_test"]
      }
    ]
  }
}
```

The key parameters in the JDBC URL:

- `useUnicode=true`: enables Unicode handling.
- `characterEncoding=utf8`: fixes the character set used between client and server to UTF-8[^4].

##### (b) Writer plugin encoding

The writer section also needs an explicit encoding. For example, when the target is an HDFS file:

```json
{
  "name": "hdfswriter",
  "parameter": {
    "defaultFS": "hdfs://localhost:9000",
    "fileType": "text",
    "path": "/user/hive/warehouse/t_user_test_output",
    "fileName": "output_data",
    "writeMode": "overwrite",
    "fieldDelimiter": ",",
    "encoding": "UTF-8"
  }
}
```

The field to watch here is `"encoding"`, which determines the encoding of the content written to HDFS. Setting it to `"UTF-8"` avoids garbled text caused by encoding mismatches[^3].

#### 3. JVM system properties

Because DataX is a Java tool, it inherits the JVM's locale settings by default. Before launching a DataX job, force the encoding through an environment variable:

```bash
export JAVA_OPTS="-Dfile.encoding=UTF-8"
python bin/datax.py job/mysql_to_hdfs.json
```

This keeps string handling on UTF-8 for the entire run[^1].

---

### Verification

After making these adjustments, rerun the DataX job and watch the log output for errors. With the configuration above in place, the pipeline should parse and store Chinese (and other multi-byte) text correctly; the queries below give a quick way to check.
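As a final check, a couple of quick Hive queries can confirm that the partition was loaded and that Chinese text survived the transfer intact. A minimal sketch, assuming the table and partition names from the job above (adapt the column list to whatever fields hold Chinese text in your data):

```sql
-- Row count for the freshly loaded partition.
SELECT COUNT(*)
FROM stg.stg_szsj_jkmzmxx_plqd
WHERE day_id = '20211204';

-- Spot-check a few rows; Chinese values in xm / dz should display correctly, not as '?' or mojibake.
SELECT xm, dz, sjly, lysj
FROM stg.stg_szsj_jkmzmxx_plqd
WHERE day_id = '20211204'
LIMIT 10;
```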