diff --git a/.gitignore b/.gitignore index c7f2bfd64e..09167d98ec 100644 --- a/.gitignore +++ b/.gitignore @@ -11,4 +11,4 @@ flinkconf/ hadoopconf/ /default_task_id_output /syncplugins -flinkx-test/ \ No newline at end of file +/ci/ diff --git a/.gitlab-ci.yml b/.gitlab-ci.yml deleted file mode 100644 index f29c0c55bc..0000000000 --- a/.gitlab-ci.yml +++ /dev/null @@ -1,11 +0,0 @@ -build: - stage: test - script: - - sh ci/install_jars.sh - - mvn clean org.jacoco:jacoco-maven-plugin:0.7.8:prepare-agent package -Dmaven.test.failure.ignore=true -q - - mvn sonar:sonar -Dsonar.projectKey="dt-insight-engine/flinkx" -Dsonar.host.url=http://172.16.100.198:9000 -Dsonar.jdbc.url=jdbc:postgresql://172.16.100.198:5432/sonar -Dsonar.java.binaries=target/sonar -Dsonar.login=11974c5e9a29625efa09fdc3c3fdc031efb1aab1 - - sh ci/sonar_notify.sh - only: - - 1.8_dev - tags: - - dt-insight-engine \ No newline at end of file diff --git a/README.md b/README.md index cd8aa4cea1..97be55aa15 100644 --- a/README.md +++ b/README.md @@ -3,39 +3,57 @@ FlinkX [![License](https://img.shields.io/badge/license-Apache%202-4EB1BA.svg)](https://www.apache.org/licenses/LICENSE-2.0.html) -English | [中文](README_CH.md) - -# Communication +[English](README_EN.md) | 中文 + +# 技术交流 + +- 招聘**Flink研发工程师**,如果有兴趣可以联系思枢(微信号:ysqwhiletrue)
+Flink development engineer JD:
+1. Develop 袋鼠云's Flink-based derivative frameworks: the data synchronization framework flinkx and the real-time computing framework flinkstreamsql;
+2. Research the latest big-data real-time computing technologies and bring the suitable ones into the platform to improve the product and its competitiveness;
+Requirements:
+1. Bachelor's degree or above with 3+ years of Flink development experience; proficient in Java; familiarity with Scala or Python is a plus;
+2. Familiar with Flink internals; experience doing secondary development on the Flink source code; contributors to the Flink source code on GitHub are preferred;
+3. Experience in machine learning or data mining is a plus;
+4. Able to pick up new technologies quickly, with high standards for code cleanliness;
+Bonus points:
+1. Open-source projects on GitHub or other platforms
+To apply, add WeChat ID ysqwhiletrue and mention "recruitment", or send your resume to [sishu@dtstack.com](mailto:sishu@dtstack.com) + +- We use [DingTalk](https://www.dingtalk.com/) to communicate; you can search for group number [**30537511**] or scan the QR code below to join the DingTalk group + +
+ +
-- We are recruiting **Big data platform development engineers**. If you want more information about the position, please add WeChat ID [**ysqwhiletrue**] or email your resume to [sishu@dtstack.com](mailto:sishu@dtstack.com). +# Introduction +* **FlinkX is a Flink-based distributed offline and real-time data synchronization framework in wide use inside 袋鼠云; it enables efficient data migration between many heterogeneous data sources.** -- We use [DingTalk](https://www.dingtalk.com/) to communicate; you can search the group number [**30537511**] or scan the QR code below to join the communication group - -
- -
+Different data sources are abstracted into different Reader plugins, and different data targets into different Writer plugins. In theory, the FlinkX framework can support data synchronization between any types of data sources. As an ecosystem, each newly connected data source immediately becomes interoperable with all existing data sources. -# Introduction +
+ +
-FlinkX is a data synchronization tool based on Flink. FlinkX can collect static data, such as MySQL, HDFS, etc, as well as real-time changing data, such as MySQL binlog, Kafka, etc. FlinkX currently includes the following features: +FlinkX是一个基于Flink的批流统一的数据同步工具,既可以采集静态的数据,比如MySQL,HDFS等,也可以采集实时变化的数据,比如MySQL binlog,Kafka等。FlinkX目前包含下面这些特性: -- Most plugins support concurrent reading and writing of data, which can greatly improve the speed of reading and writing; +- 大部分插件支持并发读写数据,可以大幅度提高读写速度; -- Some plug-ins support the function of failure recovery, which can restore tasks from the failed location and save running time; [Failure Recovery](docs/restore.md) +- 部分插件支持失败恢复的功能,可以从失败的位置恢复任务,节约运行时间;[失败恢复](docs/restore.md) -- The Reader plugin for relational databases supports interval polling. It can continuously collect changing data; [Interval Polling](docs/offline/reader/mysqlreader.md) +- 关系数据库的Reader插件支持间隔轮询功能,可以持续不断的采集变化的数据;[间隔轮询](docs/offline/reader/mysqlreader.md) -- Some databases support opening Kerberos security authentication; [Kerberos](docs/kerberos.md) +- 部分数据库支持开启Kerberos安全认证;[Kerberos](docs/kerberos.md) -- Limit the reading speed of Reader plugins and reduce the impact on business databases; +- 可以限制reader的读取速度,降低对业务数据库的影响; -- Save the dirty data when writing data; +- 可以记录writer插件写数据时产生的脏数据; -- Limit the maximum number of dirty data; +- 可以限制脏数据的最大数量; -- Multiple running modes: Local,Standalone,Yarn Session,Yarn Per; +- 支持多种运行模式; -The following databases are currently supported: +FlinkX目前支持下面这些数据库: | | Database Type | Reader | Writer | |:----------------------:|:--------------:|:-------------------------------:|:-------------------------------:| @@ -62,39 +80,48 @@ The following databases are currently supported: | | FTP | [doc](docs/offline/reader/ftpreader.md) | [doc](docs/offline/writer/ftpwriter.md) | | | HDFS | [doc](docs/offline/reader/hdfsreader.md) | [doc](docs/offline/writer/hdfswriter.md) | | | Carbondata | [doc](docs/offline/reader/carbondatareader.md) | [doc](docs/offline/writer/carbondatawriter.md) | -| | Stream | [doc](docs/offline/reader/streamreader.md) | [doc](docs/offline/writer/carbondatawriter.md) | +| | Stream | [doc](docs/offline/reader/streamreader.md) | [doc](docs/offline/writer/streamwriter.md) | | | Redis | | [doc](docs/offline/writer/rediswriter.md) | | | Hive | | [doc](docs/offline/writer/hivewriter.md) | +| | Alluxio | | [doc](docs/offline/writer/alluxiowriter.md) | | Stream Synchronization | Kafka | [doc](docs/realTime/reader/kafkareader.md) | [doc](docs/realTime/writer/kafkawriter.md) | | | EMQX | [doc](docs/realTime/reader/emqxreader.md) | [doc](docs/realTime/writer/emqxwriter.md) | -| | RestApi || [doc](docs/realTime/writer/restapiwriter.md) | +| | RestApi |[doc](docs/realTime/reader/restapireader.md) | [doc](docs/realTime/writer/restapiwriter.md) | | | MySQL Binlog | [doc](docs/realTime/reader/binlogreader.md) | | | | MongoDB Oplog | [doc](docs/realTime/reader/mongodboplogreader.md)| | | | PostgreSQL WAL | [doc](docs/realTime/reader/pgwalreader.md) | | +| | Oracle LogMiner | [doc](docs/realTime/reader/LogMiner.md) | | +| | Sqlserver CDC | [doc](docs/realTime/reader/sqlservercdc.md) | | + +# 基本原理 +在底层实现上,FlinkX依赖Flink,数据同步任务会被翻译成StreamGraph在Flink上执行,基本原理如下图: +
+ +
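To make the 基本原理 paragraph above concrete: a FlinkX sync job is, at bottom, a Flink program that wires a source to a sink, and Flink translates that pipeline into a StreamGraph before executing it. The sketch below is a hedged, minimal illustration of that shape; the class `SyncJobSketch`, the job name, and the inline source and sink are invented for this illustration and are not FlinkX code:

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.SinkFunction;
import org.apache.flink.streaming.api.functions.source.SourceFunction;
import org.apache.flink.types.Row;

public class SyncJobSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // A reader plugin ultimately contributes a source of Row records...
        DataStream<Row> source = env.addSource(new SourceFunction<Row>() {
            @Override
            public void run(SourceContext<Row> ctx) {
                Row row = new Row(2);
                row.setField(0, 1L);
                row.setField(1, "demo");
                ctx.collect(row);
            }

            @Override
            public void cancel() {
            }
        });

        // ...and a writer plugin contributes a sink. Flink compiles this
        // source -> sink pipeline into a StreamGraph and then executes it.
        source.addSink(new SinkFunction<Row>() {
            @Override
            public void invoke(Row value) {
                System.out.println(value);
            }
        });

        env.execute("sync-job-sketch");
    }
}
```

Because the reader and writer only meet through the DataStream between them, any Reader plugin can in principle be paired with any Writer plugin, which is the interoperability the introduction claims.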
-# Quick Start +# 快速开始 -Please click [Quick Start](docs/quickstart.md) +请点击[快速开始](docs/quickstart.md) -# General Configuration +# 通用配置 -Please click [General Configuration](docs/generalconfig.md) +请点击[插件通用配置](docs/generalconfig.md) -# Statistics Metric +# 统计指标 -Please click [Statistics Metric](docs/statistics.md) +请点击[统计指标](docs/statistics.md) # Kerberos -Please click [Kerberos](docs/kerberos.md) +请点击[Kerberos](docs/kerberos.md) # Questions -Please click [Questions](docs/questions.md) +请点击[Questions](docs/questions.md) -# How to contribute FlinkX +# 如何贡献FlinkX -Please click [Contribution](docs/contribution.md) +请点击[如何贡献FlinkX](docs/contribution.md) # License diff --git a/README_CH.md b/README_EN.md similarity index 58% rename from README_CH.md rename to README_EN.md index c181ce45c0..4eea74d6bb 100644 --- a/README_CH.md +++ b/README_EN.md @@ -3,50 +3,47 @@ FlinkX [![License](https://img.shields.io/badge/license-Apache%202-4EB1BA.svg)](https://www.apache.org/licenses/LICENSE-2.0.html) -[English](README.md) | 中文 - -# 技术交流 - -- 招聘**Flink研发工程师**,如果有兴趣可以联系思枢(微信号:ysqwhiletrue)
-Flink development engineer JD:
-1. Develop 袋鼠云's Flink-based derivative frameworks: the data synchronization framework flinkx and the real-time computing framework flinkstreamsql;
-2. Research the latest big-data real-time computing technologies and bring the suitable ones into the platform to improve the product and its competitiveness;
-Requirements:
-1. Bachelor's degree or above with 3+ years of Flink development experience; proficient in Java; familiarity with Scala or Python is a plus;
-2. Familiar with Flink internals; experience doing secondary development on the Flink source code; contributors to the Flink source code on GitHub are preferred;
-3. Experience in machine learning or data mining is a plus;
-4. Able to pick up new technologies quickly, with high standards for code cleanliness;
-Bonus points:
-1. Open-source projects on GitHub or other platforms
-To apply, add WeChat ID ysqwhiletrue and mention "recruitment", or send your resume to [sishu@dtstack.com](mailto:sishu@dtstack.com) - -- We use [DingTalk](https://www.dingtalk.com/) to communicate; you can search for group number [**30537511**] or scan the QR code below to join the DingTalk group +English | [中文](README.md) + +# Communication + +- We are recruiting **Big data platform development engineers**. If you want more information about the position, please add WeChat ID [**ysqwhiletrue**] or email your resume to [sishu@dtstack.com](mailto:sishu@dtstack.com). + +- We use [DingTalk](https://www.dingtalk.com/) to communicate; you can search the group number [**30537511**] or scan the QR code below to join the communication group -
- -
+
+ +
+# Introduction -# 介绍 +* **FlinkX is a Flink-based distributed offline and real-time data synchronization framework in wide use inside 袋鼠云; it enables efficient data migration between many heterogeneous data sources.** -FlinkX is a Flink-based, batch-and-stream-unified data synchronization tool: it can collect static data, such as MySQL and HDFS, as well as real-time changing data, such as MySQL binlog and Kafka. FlinkX currently includes the following features: +Different data sources are abstracted into different Reader plugins, and different data targets into different Writer plugins. In theory, the FlinkX framework can support data synchronization between any types of data sources. As an ecosystem, each newly connected data source immediately becomes interoperable with all existing data sources. -- Most plugins support concurrent reading and writing of data, which can greatly improve read and write speed; +
+ +
-- 部分插件支持失败恢复的功能,可以从失败的位置恢复任务,节约运行时间;[失败恢复](docs/restore.md) +FlinkX is a data synchronization tool based on Flink. FlinkX can collect static data, such as MySQL, HDFS, etc, as well as real-time changing data, such as MySQL binlog, Kafka, etc. FlinkX currently includes the following features: -- 关系数据库的Reader插件支持间隔轮询功能,可以持续不断的采集变化的数据;[间隔轮询](docs/offline/reader/mysqlreader.md) +- Most plugins support concurrent reading and writing of data, which can greatly improve the speed of reading and writing; -- 部分数据库支持开启Kerberos安全认证;[Kerberos](docs/kerberos.md) +- Some plug-ins support the function of failure recovery, which can restore tasks from the failed location and save running time; [Failure Recovery](docs/restore.md) -- 可以限制reader的读取速度,降低对业务数据库的影响; +- The Reader plugin for relational databases supports interval polling. It can continuously collect changing data; [Interval Polling](docs/offline/reader/mysqlreader.md) -- 可以记录writer插件写数据时产生的脏数据; +- Some databases support opening Kerberos security authentication; [Kerberos](docs/kerberos.md) -- 可以限制脏数据的最大数量; +- Limit the reading speed of Reader plugins and reduce the impact on business databases; -- 支持多种运行模式; +- Save the dirty data when writing data; -FlinkX目前支持下面这些数据库: +- Limit the maximum number of dirty data; + +- Multiple running modes: Local,Standalone,Yarn Session,Yarn Per; + +The following databases are currently supported: | | Database Type | Reader | Writer | |:----------------------:|:--------------:|:-------------------------------:|:-------------------------------:| @@ -73,39 +70,48 @@ FlinkX目前支持下面这些数据库: | | FTP | [doc](docs/offline/reader/ftpreader.md) | [doc](docs/offline/writer/ftpwriter.md) | | | HDFS | [doc](docs/offline/reader/hdfsreader.md) | [doc](docs/offline/writer/hdfswriter.md) | | | Carbondata | [doc](docs/offline/reader/carbondatareader.md) | [doc](docs/offline/writer/carbondatawriter.md) | -| | Stream | [doc](docs/offline/reader/streamreader.md) | [doc](docs/offline/writer/carbondatawriter.md) | +| | Stream | [doc](docs/offline/reader/streamreader.md) | [doc](docs/offline/writer/streamwriter.md) | | | Redis | | [doc](docs/offline/writer/rediswriter.md) | | | Hive | | [doc](docs/offline/writer/hivewriter.md) | +| | Alluxio | | [doc](docs/offline/writer/alluxiowriter.md) | | Stream Synchronization | Kafka | [doc](docs/realTime/reader/kafkareader.md) | [doc](docs/realTime/writer/kafkawriter.md) | | | EMQX | [doc](docs/realTime/reader/emqxreader.md) | [doc](docs/realTime/writer/emqxwriter.md) | -| | RestApi | | [doc](docs/realTime/writer/restapiwriter.md) | +| | RestApi || [doc](docs/realTime/writer/restapiwriter.md) | | | MySQL Binlog | [doc](docs/realTime/reader/binlogreader.md) | | | | MongoDB Oplog | [doc](docs/realTime/reader/mongodboplogreader.md)| | | | PostgreSQL WAL | [doc](docs/realTime/reader/pgwalreader.md) | | +| | Oracle LogMiner| [doc](docs/realTime/reader/LogMiner.md) | | +| | Sqlserver CDC| [doc](docs/realTime/reader/sqlservercdc.md) | | + +# Fundamental +In the underlying implementation, FlinkX relies on Flink, and the data synchronization task will be translated into StreamGraph and executed on Flink. The basic principle is as follows: +
+ +
-# 快速开始 +# Quick Start -请点击[快速开始](docs/quickstart.md) +Please click [Quick Start](docs/quickstart.md) -# 通用配置 +# General Configuration -请点击[插件通用配置](docs/generalconfig.md) +Please click [General Configuration](docs/generalconfig.md) -# 统计指标 +# Statistics Metric -请点击[统计指标](docs/statistics.md) +Please click [Statistics Metric](docs/statistics.md) # Kerberos -请点击[Kerberos](docs/kerberos.md) +Please click [Kerberos](docs/kerberos.md) # Questions -请点击[Questions](docs/questions.md) +Please click [Questions](docs/questions.md) -# 如何贡献FlinkX +# How to contribute FlinkX -请点击[如何贡献FlinkX](docs/contribution.md) +Please click [Contribution](docs/contribution.md) # License diff --git a/bin/flinkx b/bin/flinkx index 5af671e003..f31a8ae7f0 100755 --- a/bin/flinkx +++ b/bin/flinkx @@ -38,3 +38,4 @@ CLASS_NAME=com.dtstack.flinkx.launcher.Launcher echo "flinkx starting ..." nohup $JAVA_RUN -cp $JAR_DIR $CLASS_NAME $@ & +tail -f nohup.out \ No newline at end of file diff --git a/ci/install_jars.sh b/ci/install_jars.sh deleted file mode 100644 index 23a0e3850c..0000000000 --- a/ci/install_jars.sh +++ /dev/null @@ -1,13 +0,0 @@ -#!/usr/bin/env bash - -## db2 driver -mvn install:install-file -DgroupId=com.ibm.db2 -DartifactId=db2jcc -Dversion=3.72.44 -Dpackaging=jar -Dfile=jars/db2jcc-3.72.44.jar - -## oracle driver -mvn install:install-file -DgroupId=com.github.noraui -DartifactId=ojdbc8 -Dversion=12.2.0.1 -Dpackaging=jar -Dfile=jars/ojdbc8-12.2.0.1.jar - -## gbase driver -mvn install:install-file -DgroupId=com.esen.jdbc -DartifactId=gbase -Dversion=8.3.81.53 -Dpackaging=jar -Dfile=jars/gbase-8.3.81.53.jar - -## dm driver -mvn install:install-file -DgroupId=dm.jdbc.driver -DartifactId=dm7 -Dversion=18.0.0 -Dpackaging=jar -Dfile=jars/Dm7JdbcDriver18.jar \ No newline at end of file diff --git a/ci/sonar_notify.sh b/ci/sonar_notify.sh deleted file mode 100644 index 4ab9172260..0000000000 --- a/ci/sonar_notify.sh +++ /dev/null @@ -1,14 +0,0 @@ -#!/bin/bash -#参考钉钉文档 https://open-doc.dingtalk.com/microapp/serverapi2/qf2nxq - sonarreport=$(curl -s http://172.16.100.198:8082/?projectname=dt-insight-engine/flinkx) - curl -s "https://oapi.dingtalk.com/robot/send?access_token=e2718f7311243d2e58fa2695aa9c67a37760c7fce553311a32d53b3f092328ed" \ - -H "Content-Type: application/json" \ - -d "{ - \"msgtype\": \"markdown\", - \"markdown\": { - \"title\":\"sonar代码质量\", - \"text\": \"## sonar代码质量报告: \n -> [sonar地址](http://172.16.100.198:9000/dashboard?id=dt-insight-engine/flinkx) \n -> ${sonarreport} \n\" - } - }" \ No newline at end of file diff --git a/docs/contribution.md b/docs/contribution.md index 2063b96ddd..43ac77347a 100644 --- a/docs/contribution.md +++ b/docs/contribution.md @@ -10,11 +10,29 @@ 1. 如何合理且正确地使用框架; 1. 配置文件的规范; + +## PR规范 +1. 建立issue,描述相关问题信息 +1. 基于对应的release分支拉取开发分支 +1. commit 信息:[type-issueid] [module] msg + 1. type 类别 + 1. feat:表示是一个新功能(feature) + 1. hotfix:hotfix,修补bug + 1. docs:改动、增加文档 + 1. opt:修改代码风格及opt imports这些,不改动原有执行的代码 + 1. test:增加测试 +1. 多次提交使用rebase 合并成一个。 +1. 
pr 名称:[flinkx-issueid][module名称] 标题 + +eg: +- [hotfix-31280][core] 修复bigdecimal转decimal运行失败问题 +- [feat-31372][rdb] RDB结果表Upsert模式支持选择更新策略 + ## 开发环境 -- Flink集群: 1.4及以上(单机模式不需要安装Flink集群) -- Java: JDK8及以上 +- Flink集群: 版本与FlinkX版本对应(单机模式不需要安装Flink集群) +- Java: JDK8 - 操作系统:理论上不限,但是目前只编写了shell启动脚本,用户可以可以参考shell脚本编写适合特定操作系统的启动脚本。 开发之前,需要理解以下概念: @@ -41,13 +59,13 @@ ## 插件入口类 -插件的入口类需继承**DataReader**和**DataWriter**,在内部获取任务json传来的参数,通过相应的**Builder**构建对应**InputFormat**和**OutputFormat**实例 +插件的入口类需继承**BaseDataReader**和**BaseDataWriter**,在内部获取任务json传来的参数,通过相应的**Builder**构建对应**InputFormat**和**OutputFormat**实例 ### DataReader ```java -public class SomeReader extends DataReader { +public class SomeReader extends BaseDataReader { protected String oneParameter; public SomeReader(DataTransferConfig config, StreamExecutionEnvironment env) { super(config, env); @@ -59,7 +77,7 @@ public class SomeReader extends DataReader { } ``` -reader类需继承DataReader,同时重写readData方法。在构造函数中获取任务json中构建InputFormat所需要的参数,代码案例如下: +reader类需继承BaseDataReader,同时重写readData方法。在构造函数中获取任务json中构建InputFormat所需要的参数,代码案例如下: 构造方法 @@ -92,7 +110,7 @@ public DataStream readData() { ### DataWriter ```java -public class SomeWriter extends DataWriter { +public class SomeWriter extends BaseDataWriter { protected String oneParameter; public SomeWriter(DataTransferConfig config) { super(config); @@ -105,7 +123,7 @@ public class SomeWriter extends DataWriter { } ``` -和DataReader类似,writer需继承DataWriter,同时重写writeData方法。通常会创建一个ConfigKeys类,包含reader和writer所有需要的使用的任务json中参数的key。 +和DataReader类似,writer需继承BaseDataWriter,同时重写writeData方法。通常会创建一个ConfigKeys类,包含reader和writer所有需要的使用的任务json中参数的key。 构造方法 @@ -136,10 +154,10 @@ public DataStreamSink writeData(DataStream dataSet) { ### InputFormatBuilder的设计 -需继承**RichInputFormatBuilder** +需继承**BaseRichInputFormatBuilder** ```java -public class SomeInputFormatBuilder extends RichInputFormatBuilder { +public class SomeInputFormatBuilder extends BaseRichInputFormatBuilder { /** * 首先实例化一个InputFormat实例,通过构造函数传递,通过set方法设置参数 */ @@ -161,10 +179,10 @@ public class SomeInputFormatBuilder extends RichInputFormatBuilder { ### InputFormat的设计 -需继承**RichInputFormat**,根据任务逻辑分别实现 +需继承**BaseRichInputFormat**,根据任务逻辑分别实现 ```java -public class SomeInputFormat extends RichInputFormat { +public class SomeInputFormat extends BaseRichInputFormat { @override public void openInputFormat() { @@ -211,43 +229,43 @@ public class SomeInputFormat extends RichInputFormat { - 调用位置:configure方法会在JobManager里构建执行计划的时候和在TaskManager里初始化并发实例后各调用一次; - 作用:用于配置task的实例; - 注意事项:不要在这个方法里写耗时的逻辑,比如获取连接,运行sql等,否则可能会导致akka超 - + #### createInputSplits - 调用位置:在构建执行计划时调用; - 作用:调用子类的逻辑生成数据分片; - 注意事项:分片的数量和并发数没有严格对应关系,不要在这个方法里做耗时的操作,否则会导致akka超时异常; - + #### getInputSplitAssigner - 调用位置:创建分片后调用; - 作用:获取分片分配器,同步插件里使用的是DefaultInputSplitAssigner,按顺序返回分配给各个并发实例; - 注意事项:无; - + #### openInternal - 调用位置:开始读取分片时调用; - 作用:用于打开需要读取的数据源,并做一些初始化; - 注意事项:这个方法必须是可以重复调用的,因为同一个并发实例可能会处理多个分片; - + #### reachEnd和nextRecordInternal - 调用位置:任务运行时,读取每条数据时调用; - 作用:返回结束标识和下一条记录; - 注意事项:无 - + #### closeInternal - 调用位置:读取完一个分片后调用,至少调用一次; - 作用:关闭资源; - 注意事项:可重复调用,关闭资源做非null检查,因为程序遇到异常情况可能直接跳转到closeInternal; - + #### openInputFormat - 调用位置:创建分片之后调用; - 作用:对整个InpurFormat资源做初始化; - 注意事项:无; - + #### closeInputFormat - 调用位置:当所有切片都执行完之后调用; @@ -256,10 +274,10 @@ public class SomeInputFormat extends RichInputFormat { ### OutputFormatBuilder -需继承**RichOutputFormatBuilder**,和**InputFormatBuilder**相似 +需继承**BaseRichOutputFormatBuilder**,和**InputFormatBuilder**相似 ```java -public class SomeOutputFormatBuilder extends 
RichOutputFormatBuilder { +public class SomeOutputFormatBuilder extends BaseRichOutputFormatBuilder { /** * 首先实例化一个OutputFormat实例,通过构造函数传递,通过设计set方法设置参数 * 如下演示 @@ -282,10 +300,10 @@ public class SomeOutputFormatBuilder extends RichOutputFormatBuilder { ### OutputFormat -需继承**RichOutputFormat** +需继承**BaseRichOutputFormat** ```java -public class SomeOutputFormat extends RichOutputFormat { +public class SomeOutputFormat extends BaseRichOutputFormat { @Override protected void openInternal(int taskNumber, int numTasks) throws IOException {} @@ -312,13 +330,13 @@ openInternal -> writeSingleRecordInternal / writeMultipleRecordsInternal - 调用位置:开始写入使用 - 作用:用于打开需要读取的数据源,并做一些初始化; - 注意事项:无; - + #### writerSingleRecordInternal - 调用位置:openInernal之后调用,开始写入数据 - 作用:向数据源写入一条数据 - 注意事项:无; - + #### writerMultipleRecordsInternal - 调用位置:openInternal之后调用,开始写入多条数据 @@ -451,10 +469,9 @@ public class Row implements Serializable{ ## 加载原理 -1. 框架扫描`plugin/reader`和`plugin/writer`目录,加载每个插件的`plugin.json`文件。 -1. 以`plugin.json`文件中`name`为key,索引所有的插件配置。如果发现重名的插件或者不存在的插件,框架会异常退出。 -1. 用户在插件中在`reader`/`writer`配置的`name`字段指定插件名字。框架根据插件的类型(`reader`/`writer`)和插件名称去插件的路径下扫描所有的jar,加入`classpath`。 -1. 根据插件配置中定义的入口类,框架通过反射实例化对应的`Job`对象。 +1. 用户在插件中在`reader`/`writer`配置的`name`字段指定插件名字。 +2. 框架根据插件的类型(`reader`/`writer`)和插件名称去插件的路径下扫描所有的jar,加入`classpath`。 +3. 根据插件配置中定义的入口类,框架通过反射实例化对应的`Job`对象。 ## 统一的目录结构 @@ -465,26 +482,25 @@ public class Row implements Serializable{ ``` ${Flinkx_HOME} |-- bin -| -- flink -| -- flinkx.sh +| -- flinkx.sh | |-- flinkx-somePlugin - |-- flinkx-somePlugin-core - |-- common 一些插件共用的类 - |-- exception 异常处理类 - |-- pom.xml 插件公用依赖 - |-- flinkx-somePlugin-reader - |-- InputFormat - |-- SomePluginInputFormat - |-- SomePluginInputFormatBuiler - |-- reader - |-- SomePluginReader - |-- flinkx-somePlugin-writer - |-- OutputFormat - |-- SomePluginOutputFormat - |-- SomePluginOutputFormatBuiler - |-- reader - |-- SomePluginWriter +|-- flinkx-somePlugin-core +|-- common 一些插件共用的类 +|-- exception 异常处理类 +|-- pom.xml 插件公用依赖 +|-- flinkx-somePlugin-reader +|-- InputFormat +|-- SomePluginInputFormat +|-- SomePluginInputFormatBuiler +|-- reader +|-- SomePluginReader +|-- flinkx-somePlugin-writer +|-- OutputFormat +|-- SomePluginOutputFormat +|-- SomePluginOutputFormatBuiler +|-- reader +|-- SomePluginWriter ``` ``` @@ -511,4 +527,8 @@ unix平台 mvn clean package -DskipTests -Prelease -DscriptType=sh ``` -打包结束后,项目根目录下会产生bin目录和plugins目录,其中bin目录包含FlinkX的启动脚本,plugins目录下存放编译好的数据同步插件包,之后就可以提交开发平台测试啦! +打包结束后,项目根目录下会产生bin目录和plugins目录,其中bin目录包含FlinkX的启动脚本,syncplugins目录下存放编译好的数据同步插件包,之后就可以提交开发平台测试啦! 
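As a hedged sketch of the 加载原理 steps above (scan the jars under a plugin's directory, add them to the classpath, then instantiate the configured entry class by reflection), the following is illustrative only; `PluginLoaderSketch`, the flat jar directory, and the no-arg constructor call are assumptions for this example, not FlinkX's actual loader (the real entry classes take a config object in their constructors):

```java
import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

public class PluginLoaderSketch {

    /**
     * Scans pluginDir for jars, exposes them on a URLClassLoader,
     * and instantiates the configured entry class reflectively.
     */
    public static Object loadPlugin(String pluginDir, String entryClassName) throws Exception {
        List<URL> jarUrls = new ArrayList<>();
        File[] jars = new File(pluginDir).listFiles((dir, name) -> name.endsWith(".jar"));
        if (jars == null) {
            throw new IllegalArgumentException("no jars found under " + pluginDir);
        }
        for (File jar : jars) {
            jarUrls.add(jar.toURI().toURL());
        }

        // The framework would key the directory off the plugin type and name
        // (e.g. reader/mysqlreader); that lookup is elided here.
        URLClassLoader loader = new URLClassLoader(
                jarUrls.toArray(new URL[0]),
                PluginLoaderSketch.class.getClassLoader());

        Class<?> entryClass = Class.forName(entryClassName, true, loader);
        // Assumed no-arg constructor, purely for brevity.
        return entryClass.getDeclaredConstructor().newInstance();
    }
}
```

Giving each plugin its own classloader is what lets, say, two readers ship conflicting JDBC driver versions without clashing on a single classpath.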
+ + + + diff --git a/docs/example/LogMiner_hive.json b/docs/example/LogMiner_hive.json new file mode 100644 index 0000000000..c69d7492cf --- /dev/null +++ b/docs/example/LogMiner_hive.json @@ -0,0 +1,51 @@ +{ + "job" : { + "content" : [ { + "reader" : { + "parameter" : { + "schema" : "ROMA_LOGMINER", + "password" : "password", + "cat" : "insert,update,delete", + "jdbcUrl" : "jdbc:oracle:thin:@//localhost:1521/xib", + "readPosition" : "current", + "pavingData" : true, + "table" : [ "ROMA_LOGMINER.TEST" ], + "username" : "username" + }, + "name" : "oraclelogminerreader" + }, + "writer" : { + "parameter" : { + "writeMode" : "append", + "partitionType" : "DAY", + "tablesColumn" : "{\"TEST\":[{\"type\":\"VARCHAR2\",\"key\":\"before_ID\",\"comment\":\"\"},{\"comment\":\"\",\"type\":\"VARCHAR2\",\"key\":\"after_ID\"},{\"comment\":\"\",\"type\":\"varchar\",\"key\":\"type\"},{\"comment\":\"\",\"type\":\"varchar\",\"key\":\"schema\"},{\"comment\":\"\",\"type\":\"varchar\",\"key\":\"table\"},{\"comment\":\"\",\"type\":\"bigint\",\"key\":\"ts\"}]}", + "partition" : "pt", + "hadoopConfig" : { + "dfs.ha.namenodes.ns1" : "nn1,nn2", + "fs.defaultFS" : "hdfs://ns1", + "dfs.namenode.rpc-address.ns1.nn2" : "ip2:9000", + "dfs.client.failover.proxy.provider.ns1" : "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider", + "dfs.namenode.rpc-address.ns1.nn1" : "ip1:9000", + "dfs.nameservices" : "ns1", + "fs.hdfs.impl.disable.cache" : "true", + "fs.hdfs.impl" : "org.apache.hadoop.hdfs.DistributedFileSystem" + }, + "jdbcUrl" : "jdbc:hive2://ip3:8191/test", + "defaultFS" : "hdfs://ns1", + "fileType" : "parquet", + "charsetName" : "utf-8" + }, + "name" : "hivewriter" + } + } ], + "setting" : { + "restore" : { + "isRestore" : true, + "isStream" : true + }, + "speed" : { + "channel" : 1 + } + } + } +} \ No newline at end of file diff --git a/docs/example/binlog_hive.json b/docs/example/binlog_hive.json new file mode 100644 index 0000000000..1acd0e3b50 --- /dev/null +++ b/docs/example/binlog_hive.json @@ -0,0 +1,52 @@ +{ + "job": { + "content": [ + { + "reader": { + "parameter": { + "table": ["schema.table"], + "password": "passwd", + "database": "schema", + "port": 3306, + "cat": "insert,update,delete", + "host": "host", + "jdbcUrl": "jdbc:mysql://host:3306/schema", + "pavingData": true, + "username": "user" + }, + "name": "binlogreader" + }, + "writer": { + "parameter": { + "writeMode": "overwrite", + "partitionType": "DAY", + "tablesColumn" : "{\"CDC\":[{\"type\":\"VARCHAR2\",\"key\":\"before_ID\",\"comment\":\"\"},{\"comment\":\"\",\"type\":\"VARCHAR2\",\"key\":\"after_ID\"},{\"comment\":\"\",\"type\":\"varchar\",\"key\":\"type\"},{\"comment\":\"\",\"type\":\"varchar\",\"key\":\"schema\"},{\"comment\":\"\",\"type\":\"varchar\",\"key\":\"table\"},{\"comment\":\"\",\"type\":\"bigint\",\"key\":\"ts\"}]}", + "partition": "pt", + "hadoopConfig": { + "dfs.ha.namenodes.ns1": "nn1,nn2", + "dfs.namenode.rpc-address.ns1.nn2": "host1:9000", + "dfs.client.failover.proxy.provider.ns1": "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider", + "dfs.namenode.rpc-address.ns1.nn1": "host2:9000", + "dfs.nameservices": "ns1" + }, + "jdbcUrl": "jdbc:hive2://host1:10000/dev", + "defaultFS": "hdfs://ns1", + "fileType": "orc", + "charsetName": "utf-8", + "username": "admin" + }, + "name": "hivewriter" + } + } + ], + "setting": { + "restore": { + "isRestore": true, + "isStream": true + }, + "speed": { + "channel": 1 + } + } + } +} \ No newline at end of file diff --git 
a/docs/example/binlog_stream.json b/docs/example/binlog_stream.json new file mode 100644 index 0000000000..6606c89181 --- /dev/null +++ b/docs/example/binlog_stream.json @@ -0,0 +1,41 @@ +{ + "job": { + "content": [ + { + "reader": { + "parameter": { + "filter": "schema\\.table", + "password": "passwd", + "database": "database", + "port": 3306, + "start" : { + "journalName": "binlog.000031", + "timestamp": 1610525946000, + "position": 4 + }, + "cat": "DELETE,INSERT,UPDATE", + "host": "localhost", + "jdbcUrl": "jdbc:mysql://localhost:3306/schema", + "pavingData": true, + "username": "username" + }, + "name": "binlogreader" + }, + "writer": { + "parameter": { + "print": true + }, + "name": "streamwriter" + } + } + ], + "setting": { + "restore": { + "isStream": true + }, + "speed": { + "channel": 1 + } + } + } +} \ No newline at end of file diff --git a/docs/example/cassandra_stream.json b/docs/example/cassandra_stream.json new file mode 100644 index 0000000000..b764be874c --- /dev/null +++ b/docs/example/cassandra_stream.json @@ -0,0 +1,54 @@ +{ + "job" : { + "content" : [ { + "reader": { + "name": "cassandrareader", + "parameter": { + "host": "localhost", + "port": 9042, + "username":"", + "password":"", + "useSSL":false, + "column": [ + { + "name": "rowkey", + "type": "string" + }, + { + "name": "cf1:id", + "type": "string" + } + ] + } + }, + "writer": { + "name": "streamwriter", + "parameter": { + "print": true + } + } + } + ], + "setting": { + "speed": { + "channel": 1, + "bytes": 0 + }, + "errorLimit": { + "record": 1 + }, + "restore": { + "maxRowNumForCheckpoint": 0, + "isRestore": false, + "restoreColumnName": "", + "restoreColumnIndex": 0 + }, + "log" : { + "isLogger": false, + "level" : "debug", + "path" : "", + "pattern":"" + } + } + } +} \ No newline at end of file diff --git a/docs/example/clickhouse_stream.json b/docs/example/clickhouse_stream.json new file mode 100644 index 0000000000..eaa608cac7 --- /dev/null +++ b/docs/example/clickhouse_stream.json @@ -0,0 +1,52 @@ +{ + "job": { + "content": [{ + "reader": { + "parameter" : { + "column" : [ { + "name" : "id", + "type" : "int" + }, { + "name" : "user_id", + "type" : "int" + }, { + "name" : "name", + "type" : "string" + },{ + "name" : "eventDate", + "type" : "date" + } + ], + "username" : "username", + "password" : "password", + "connection" : [ { + "jdbcUrl" : [ "jdbc:clickhouse://localhost:8123/database" ], + "table" : [ "test" ] + } ], + "where": "id > 1", + "splitPk": "id", + "fetchSize": 1000, + "queryTimeOut": 1000, + "customSql": "", + "requestAccumulatorInterval": 2 + }, + "name" : "clickhousereader" + }, + "writer": { + "name": "streamwriter", + "parameter": { + "print": true + } + } + }], + "setting": { + "speed": { + "channel": 1, + "bytes": 0 + }, + "errorLimit": { + "record": 100 + } + } + } +} \ No newline at end of file diff --git a/docs/example/db2_stream.json b/docs/example/db2_stream.json new file mode 100644 index 0000000000..d835e22e86 --- /dev/null +++ b/docs/example/db2_stream.json @@ -0,0 +1,52 @@ +{ + "job": { + "content": [{ + "reader": { + "parameter" : { + "column" : [ { + "name" : "id", + "type" : "bigint" + },{ + "name" : "name", + "type" : "varchar" + } ], + "username" : "username", + "password" : "password", + "connection" : [ { + "jdbcUrl" : [ "jdbc:db2://localhost:50002/database" ], + "table" : [ "TEST" ] + } ], + "where": "", + "splitPk": "id", + "fetchSize": 1000, + "queryTimeOut": 1000, + "customSql": "", + "requestAccumulatorInterval": 2 + }, + "name" : "db2reader" + }, + "writer": { + 
"name": "streamwriter", + "parameter": { + "print": true + } + } + } + ], + "setting": { + "speed": { + "channel": 1, + "bytes": 0 + }, + "errorLimit": { + "record": 100 + }, + "restore": { + "maxRowNumForCheckpoint": 0, + "isRestore": false, + "restoreColumnName": "", + "restoreColumnIndex": 0 + } + } + } +} \ No newline at end of file diff --git a/docs/example/dm_stream.json b/docs/example/dm_stream.json new file mode 100644 index 0000000000..5ca44a0240 --- /dev/null +++ b/docs/example/dm_stream.json @@ -0,0 +1,55 @@ +{ + "job": { + "content": [ + { + "reader": { + "name": "dmreader", + "parameter": { + "column": [ + { + "name": "ID", + "type": "int" + }, + { + "name": "AGE", + "type": "int" + } + ], + "increColumn": "", + "startLocation": "", + "username": "username", + "password": "password", + "connection": [ + { + "jdbcUrl": ["jdbc:dm://localhost:5236"], + "table": ["TABLE"] + } + ], + "where": "" + } + }, + "writer": { + "name": "streamwriter", + "parameter": { + "print": true + } + } + } + ], + "setting": { + "speed": { + "channel": 1, + "bytes": 0 + }, + "errorLimit": { + "record": 100 + }, + "restore": { + "maxRowNumForCheckpoint": 0, + "isRestore": false, + "restoreColumnName": "", + "restoreColumnIndex": 0 + } + } + } +} \ No newline at end of file diff --git a/docs/example/gbase_stream.json b/docs/example/gbase_stream.json new file mode 100644 index 0000000000..a8c26230d5 --- /dev/null +++ b/docs/example/gbase_stream.json @@ -0,0 +1,48 @@ +{ + "job": { + "content": [{ + "reader": { + "parameter" : { + "column" : [ { + "name" : "id", + "type" : "bigint" + }, { + "name" : "user_id", + "type" : "bigint" + }, { + "name" : "name", + "type" : "varchar" + } ], + "username" : "username", + "password" : "password", + "connection" : [ { + "jdbcUrl" : [ "jdbc:gbase://0.0.0.1:5258/database" ], + "table" : [ "tableTest" ] + } ], + "where": "id > 1", + "splitPk": "id", + "fetchSize": 1000, + "queryTimeOut": 1000, + "customSql": "", + "requestAccumulatorInterval": 2 + }, + "name" : "gbasereader" + }, + "writer": { + "name": "streamwriter", + "parameter": { + "print": true + } + } + }], + "setting": { + "speed": { + "channel": 1, + "bytes": 0 + }, + "errorLimit": { + "record": 100 + } + } + } +} \ No newline at end of file diff --git a/docs/example/greenplum_stream.json b/docs/example/greenplum_stream.json new file mode 100644 index 0000000000..68ec13a786 --- /dev/null +++ b/docs/example/greenplum_stream.json @@ -0,0 +1,36 @@ +{ + "job": { + "content": [{ + "reader": { + "parameter" : { + "column" : [ {"name" : "id", "type": "int"}], + "username" : "username", + "password" : "password", + "connection" : [ { + "jdbcUrl" : ["jdbc:pivotal:greenplum://localhost:5432;DatabaseName=database"], + "table" : ["table"] + } ], + "where": "", + "customSql": "", + "requestAccumulatorInterval": 2 + }, + "name" : "greenplumreader" + }, + "writer": { + "name": "streamwriter", + "parameter": { + "print": true + } + } + }], + "setting": { + "speed": { + "channel": 1, + "bytes": 0 + }, + "errorLimit": { + "record": 100 + } + } + } +} \ No newline at end of file diff --git a/docs/example/hbase_stream.json b/docs/example/hbase_stream.json new file mode 100644 index 0000000000..3662beb674 --- /dev/null +++ b/docs/example/hbase_stream.json @@ -0,0 +1,65 @@ +{ + "job": { + "content": [ + { + "reader": { + "name": "hbasereader", + "parameter": { + "hbaseConfig": { + "hbase.zookeeper.property.clientPort": "2181", + "hbase.rootdir": "hdfs://ns1/hbase", + "hbase.cluster.distributed": "true", + "hbase.zookeeper.quorum": 
"node01,node02,node03", + "zookeeper.znode.parent": "/hbase" + }, + "table": "sb5", + "encodig": "utf-8", + "column": [ + { + "name": "rowkey", + "type": "string" + }, + { + "name": "cf1:id", + "type": "string" + } + ], + "range": { + "startRowkey": "", + "endRowkey": "", + "isBinaryRowkey": false + } + } + }, + "writer": { + "parameter": { + "print": true + }, + "name": "streamwriter" + } + } + ], + "setting": { + "speed": { + "channel": 1, + "bytes": 0 + }, + "errorLimit": { + "record": 100 + }, + "restore": { + "maxRowNumForCheckpoint": 0, + "isRestore": false, + "isStream": false, + "restoreColumnName": "", + "restoreColumnIndex": 0 + }, + "log": { + "isLogger": false, + "level": "debug", + "path": "", + "pattern": "" + } + } + } +} \ No newline at end of file diff --git a/docs/example/kafka09_stream.json b/docs/example/kafka09_stream.json deleted file mode 100644 index cb69ecbee6..0000000000 --- a/docs/example/kafka09_stream.json +++ /dev/null @@ -1,36 +0,0 @@ -{ - "job": { - "content": [ - { - "reader": { - "parameter": { - "topic": "kafka09", - "groupId": "default", - "codec": "text", - "encoding": "UTF-8", - "blankIgnore": false, - "consumerSettings": { - "zookeeper.connect": "localhost:2181/kafka09" - } - }, - "name": "kafka09reader" - }, - "writer": { - "parameter": { - "print": true - }, - "name": "streamwriter" - } - } - ], - "setting": { - "restore": { - "isRestore": false, - "isStream": true - }, - "speed": { - "channel": 1 - } - } - } -} \ No newline at end of file diff --git a/docs/example/kingbase_stream.json b/docs/example/kingbase_stream.json index db34cbfdfd..9566ba92c9 100644 --- a/docs/example/kingbase_stream.json +++ b/docs/example/kingbase_stream.json @@ -4,12 +4,12 @@ { "reader": { "parameter": { - "username": "dtstack", - "password": "abc123", + "username": "username", + "password": "password", "connection": [{ - "jdbcUrl": ["jdbc:kingbase8://localhost:54321/test"], - "table": ["kudu"], - "schema":"test" + "jdbcUrl": ["jdbc:kingbase8://localhost:54321/database"], + "table": ["table"], + "schema":"schema" }], "column": ["*"], "customSql": "", diff --git a/docs/example/mysql_mysql.json b/docs/example/mysql_mysql.json new file mode 100644 index 0000000000..37661ea9a2 --- /dev/null +++ b/docs/example/mysql_mysql.json @@ -0,0 +1,74 @@ + +{ + "job": { + "content": [{ + "reader": { + "parameter": { + "username": "root", + "password": "root@1298", + "connection": [{ + "jdbcUrl": ["jdbc:mysql://192.168.90.145:3306/china?useSSL=false&useUnicode=true&characterEncoding=utf8"], + "table": ["pipeline_source"] + }], + "column": [{ + "name": "id", + "type": "bigint" + }, { + "name": "user_id", + "type": "bigint" + }, { + "name": "name", + "type": "varchar" + }], + "customSql": "", + "where": "id > 2", + "splitPk": "id", + "increColumn": "id", + "startLocation": "", + "polling": true, + "pollingInterval": 5000, + "queryTimeOut": 1000, + "requestAccumulatorInterval": 2 + }, + "name": "mysqlreader" + }, + "writer": { + "name": "mysqlwriter", + "parameter": { + "username": "root", + "password": "root@1298", + "connection": [{ + "jdbcUrl": "jdbc:mysql://192.168.90.145:3306/china?useSSL=false&useUnicode=true&characterEncoding=utf8", + "table": ["pipeline_sink"] + }], + "preSql": ["truncate table pipeline_sink;"], + "postSql": [], + "writeMode": "insert", + "column": ["id", "user_id", "name"], + "batchSize": 2 + } + } + }], + "setting": { + "speed": { + "channel": 1, + "bytes": 0 + }, + "errorLimit": { + "record": 100 + }, + "restore": { + "maxRowNumForCheckpoint": 0, + "isRestore": 
true, + "restoreColumnName": "id", + "restoreColumnIndex": 1 + }, + "log": { + "isLogger": false, + "level": "debug", + "path": "", + "pattern": "" + } + } + } +} diff --git a/docs/example/mysql_stream.json b/docs/example/mysql_stream.json new file mode 100644 index 0000000000..a0702ed741 --- /dev/null +++ b/docs/example/mysql_stream.json @@ -0,0 +1,52 @@ +{ + "job": { + "content": [ + { + "reader": { + "parameter": { + "username": "username", + "password": "password", + "connection": [{ + "jdbcUrl": ["jdbc:mysql://0.0.0.1:3306/database?useUnicode=true&characterEncoding=utf8"], + "table": ["table"] + }], + "column": ["*"], + "customSql": "", + "where": "id < 100", + "splitPk": "", + "queryTimeOut": 1000, + "requestAccumulatorInterval": 2 + }, + "name": "mysqlreader" + }, + "writer": { + "name": "streamwriter", + "parameter": { + "print": true + } + } + } + ], + "setting": { + "speed": { + "channel": 1, + "bytes": 0 + }, + "errorLimit": { + "record": 100 + }, + "restore": { + "maxRowNumForCheckpoint": 0, + "isRestore": false, + "restoreColumnName": "", + "restoreColumnIndex": 0 + }, + "log" : { + "isLogger": false, + "level" : "debug", + "path" : "", + "pattern":"" + } + } + } +} \ No newline at end of file diff --git a/docs/example/oracle_stream.json b/docs/example/oracle_stream.json new file mode 100644 index 0000000000..2ba2f47d34 --- /dev/null +++ b/docs/example/oracle_stream.json @@ -0,0 +1,53 @@ +{ + "job": { + "content": [ + { + "reader": { + "parameter": { + "username": "username", + "password": "password", + "connection": [{ + "jdbcUrl": ["jdbc:oracle:thin:@0.0.0.1:1521:oracle"], + "table": ["TABLE"] + }], + "column": ["*"], + "customSql": "", + "where": "ID < 10000", + "splitPk": "", + "fetchSize": 1024, + "queryTimeOut": 1000, + "requestAccumulatorInterval": 2 + }, + "name": "oraclereader" + }, + "writer": { + "name": "streamwriter", + "parameter": { + "print": true + } + } + } + ], + "setting": { + "speed": { + "channel": 1, + "bytes": 0 + }, + "errorLimit": { + "record": 1 + }, + "restore": { + "maxRowNumForCheckpoint": 0, + "isRestore": false, + "restoreColumnName": "", + "restoreColumnIndex": 0 + }, + "log" : { + "isLogger": false, + "level" : "debug", + "path" : "", + "pattern":"" + } + } + } +} \ No newline at end of file diff --git a/flinkx-examples/examples/phoenix5_stream.json b/docs/example/phoenix5_stream.json similarity index 69% rename from flinkx-examples/examples/phoenix5_stream.json rename to docs/example/phoenix5_stream.json index 9747bfcd94..7d1d0df7bb 100644 --- a/flinkx-examples/examples/phoenix5_stream.json +++ b/docs/example/phoenix5_stream.json @@ -5,28 +5,28 @@ "parameter" : { "column" : [ { "name" : "id", - "type" : "int" + "type" : "bigint" + }, { + "name" : "user_id", + "type" : "bigint" }, { "name" : "name", "type" : "varchar" - }, { - "name" : "age", - "type" : "int" } ], - "username" : "root", - "password" : "abc123", + "username" : "", + "password" : "", "connection" : [ { - "jdbcUrl" : [ "jdbc:phoenix:172.16.100.68,172.16.100.142,172.16.100.205:2181" ], - "table" : [ "PERSON" ] + "jdbcUrl" : [ "jdbc:phoenix:node01,node02,node03:2181" ], + "table" : [ "tableTest" ] } ], - "where": "id > 0", + "where": "id > 1", "splitPk": "id", "fetchSize": 1000, "queryTimeOut": 1000, - "customSql": "select * from PERSON where id < 100", + "customSql": "", "requestAccumulatorInterval": 2 }, - "name" : "phoenix5reader" + "name" : "phoenixreader" }, "writer": { "name": "streamwriter", @@ -38,7 +38,7 @@ ], "setting": { "speed": { - "channel": 3, + "channel": 1, 
"bytes": 0 }, "errorLimit": { diff --git a/docs/example/polardb_stream.json b/docs/example/polardb_stream.json new file mode 100644 index 0000000000..78b69c0d5d --- /dev/null +++ b/docs/example/polardb_stream.json @@ -0,0 +1,52 @@ +{ + "job": { + "content": [ + { + "reader": { + "parameter": { + "username": "username", + "password": "password", + "connection": [{ + "jdbcUrl": ["jdbc:polardb://0.0.0.1:3306/database"], + "table": ["table"] + }], + "column": ["*"], + "customSql": "", + "where": "id < 100", + "splitPk": "", + "queryTimeOut": 1000, + "requestAccumulatorInterval": 2 + }, + "name": "polardbreader" + }, + "writer": { + "name": "streamwriter", + "parameter": { + "print": true + } + } + } + ], + "setting": { + "speed": { + "channel": 1, + "bytes": 0 + }, + "errorLimit": { + "record": 100 + }, + "restore": { + "maxRowNumForCheckpoint": 0, + "isRestore": false, + "restoreColumnName": "", + "restoreColumnIndex": 0 + }, + "log" : { + "isLogger": false, + "level" : "debug", + "path" : "", + "pattern":"" + } + } + } +} \ No newline at end of file diff --git a/docs/example/postgresql_stream.json b/docs/example/postgresql_stream.json new file mode 100644 index 0000000000..05278bb6d1 --- /dev/null +++ b/docs/example/postgresql_stream.json @@ -0,0 +1,48 @@ +{ + "job": { + "content": [{ + "reader": { + "parameter" : { + "column" : [ { + "name" : "id", + "type" : "bigint" + }, { + "name" : "user_id", + "type" : "bigint" + }, { + "name" : "name", + "type" : "varchar" + } ], + "username" : "username", + "password" : "password", + "connection" : [ { + "jdbcUrl" : [ "jdbc:postgresql://0.0.0.1:5432/postgres" ], + "table" : [ "tableTest" ] + } ], + "where": "id > 1", + "splitPk": "id", + "fetchSize": 1000, + "queryTimeOut": 1000, + "customSql": "", + "requestAccumulatorInterval": 2 + }, + "name" : "postgresqlreader" + }, + "writer": { + "name": "streamwriter", + "parameter": { + "print": true + } + } + }], + "setting": { + "speed": { + "channel": 1, + "bytes": 0 + }, + "errorLimit": { + "record": 100 + } + } + } +} \ No newline at end of file diff --git a/docs/example/saphana_stream.json b/docs/example/saphana_stream.json new file mode 100644 index 0000000000..084eaf48d1 --- /dev/null +++ b/docs/example/saphana_stream.json @@ -0,0 +1,87 @@ +{ + "job": { + "content": [ + { + "reader": { + "parameter": { + "username": "username", + "password": "password", + "connection": [ + { + "jdbcUrl": [ + "jdbc:sap://0.0.0.1:39017" + ], + "table": [ + "SYS.P_DPAPI_KEY_" + ] + } + ], + "column": [ + { + "name": "OID", + "type": "BIGINT" + }, + { + "name": "CALLER", + "type": "NVARCHAR" + }, + { + "name": "RECORD_ID", + "type": "NVARCHAR" + }, + { + "name": "KEY_ID", + "type": "INTEGER" + }, + { + "name": "KEY", + "type": "VARBINARY" + }, + { + "name": "CREATE_USER", + "type": "NVARCHAR" + }, + { + "name": "CREATE_TIME", + "type": "BIGINT" + }, + { + "name": "DATA_ENCRYPTION_ALGORITHM", + "type": "TINYINT" + }, + { + "name": "IS_CURRENT_KEY", + "type": "TINYINT" + } + ] + }, + "name": "saphanareader" + }, + + "writer": { + "parameter": { + "print": true + }, + "name": "streamwriter" + } + } + ], + "setting": { + "restore": { + "isRestore": false, + "isStream": false + }, + "errorLimit": {}, + "speed": { + "bytes": 0, + "channel": 1 + }, + "log": { + "isLogger": false, + "level": "trace", + "path": "", + "pattern": "" + } + } + } +} \ No newline at end of file diff --git a/docs/example/sqlserver_stream.json b/docs/example/sqlserver_stream.json new file mode 100644 index 0000000000..2e8ed1a2cc --- /dev/null +++ 
b/docs/example/sqlserver_stream.json @@ -0,0 +1,59 @@ +{ + "job": { + "content": [{ + "reader": { + "parameter" : { + "column" : [ { + "name" : "id", + "type" : "bigint" + }, { + "name" : "user_id", + "type" : "bigint" + }, { + "name" : "name", + "type" : "varchar" + } ], + "username" : "username", + "password" : "password", + "connection" : [ { + "jdbcUrl" : [ "jdbc:jtds:sqlserver://0.0.0.1:1433;DatabaseName=database" ], + "table" : [ "tableTest" ] + } ], + "where": "id > 1", + "splitPk": "id", + "fetchSize": 1000, + "queryTimeOut": 1000, + "customSql": "", + "requestAccumulatorInterval": 2 + }, + "name" : "sqlserverreader" + }, + "writer": { + "name": "streamwriter", + "parameter": { + "print": true + } + } + }], + "setting": { + "speed": { + "channel": 1, + "bytes": 0 + }, + "errorLimit": { + "record": 100 + }, + "restore": { + "maxRowNumForCheckpoint": 0, + "isRestore": false, + "restoreColumnName": "", + "restoreColumnIndex": 0 + }, + "log" : { + "isLogger": false, + "level" : "debug", + "path" : "", + "pattern":"" + } + } + } \ No newline at end of file diff --git a/docs/example/stream_cassandra.json b/docs/example/stream_cassandra.json new file mode 100644 index 0000000000..80b8e3cc41 --- /dev/null +++ b/docs/example/stream_cassandra.json @@ -0,0 +1,64 @@ +{ + "job" : { + "content" : [ { + "reader": { + "name": "streamreader", + "parameter": { + "column": [ + { + "name": "rowkey", + "type": "string" + }, + { + "name": "id", + "type": "string" + } + ], + "sliceRecordCount" : ["100"] + } + }, + "writer": { + "name": "cassandrawriterer", + "parameter": { + "host": "localhost", + "port": 9042, + "username":"", + "password":"", + "useSSL":false, + "column": [ + { + "name": "rowkey", + "type": "string" + }, + { + "name": "cf1:id", + "type": "string" + } + ] + } + } + } ], + "setting": { + "speed": { + "channel": 1, + "bytes": 0 + }, + "errorLimit": { + "record": 100 + }, + "restore": { + "maxRowNumForCheckpoint": 0, + "isRestore": false, + "isStream" : false, + "restoreColumnName": "", + "restoreColumnIndex": 0 + }, + "log" : { + "isLogger": false, + "level" : "debug", + "path" : "", + "pattern":"" + } + } + } +} \ No newline at end of file diff --git a/docs/example/stream_clickhouse.json b/docs/example/stream_clickhouse.json new file mode 100644 index 0000000000..77a65aa4f6 --- /dev/null +++ b/docs/example/stream_clickhouse.json @@ -0,0 +1,67 @@ +{ + "job": { + "content": [{ + "reader": { + "parameter": { + "sliceRecordCount": ["100"], + "column": [ + { + "name": "id", + "type": "int" + }, + { + "name": "user_id", + "type": "int" + }, + { + "name" : "name", + "type" : "string" + }, + { + "name" : "eventDate", + "type" : "date" + } + ] + }, + "name": "streamreader" + }, + "writer": { + "name": "clickhousewriter", + "parameter": { + "connection": [{ + "jdbcUrl": "jdbc:clickhouse://localhost:8123/database", + "table": ["test"] + }], + "username": "username", + "password": "password", + "column": ["id","user_id","name","eventDate"], + "writeMode": "insert", + "batchSize": 1024, + "preSql": [], + "postSql": [] + } + } + }], + "setting": { + "speed": { + "channel": 1, + "bytes": 0 + }, + "errorLimit": { + "record": 1 + }, + "restore": { + "maxRowNumForCheckpoint": 0, + "isRestore": false, + "restoreColumnName": "", + "restoreColumnIndex": 0 + }, + "log" : { + "isLogger": false, + "level" : "debug", + "path" : "", + "pattern":"" + } + } + } +} \ No newline at end of file diff --git a/docs/example/stream_db2.json b/docs/example/stream_db2.json new file mode 100644 index 0000000000..17370323c1 
--- /dev/null +++ b/docs/example/stream_db2.json @@ -0,0 +1,56 @@ +{ + "job": { + "content": [ + { + "reader": { + "parameter": { + "sliceRecordCount": ["1"], + "column": [ + { + "name" : "id", + "type" : "int", + "value": "2" + },{ + "name" : "name", + "type" : "String" + } + ] + }, + "name": "streamreader" + }, + "writer": { + "name": "db2writer", + "parameter": { + "connection": [{ + "jdbcUrl": "jdbc:db2://localhost:50002/database", + "table": ["TEST"] + }], + "username": "username", + "password": "password", + "column": ["id","name"], + "writeMode": "insert", + "batchSize": 1024, + "preSql": [], + "postSql": [], + "updateKey": {} + } + } + } + ], + "setting": { + "speed": { + "channel": 1, + "bytes": 0 + }, + "errorLimit": { + "record": 100 + }, + "restore": { + "maxRowNumForCheckpoint": 0, + "isRestore": false, + "restoreColumnName": "", + "restoreColumnIndex": 0 + } + } + } +} \ No newline at end of file diff --git a/docs/example/stream_dm.json b/docs/example/stream_dm.json new file mode 100644 index 0000000000..ec97eddb0a --- /dev/null +++ b/docs/example/stream_dm.json @@ -0,0 +1,56 @@ +{ + "job": { + "content": [ + { + "reader": { + "parameter": { + "sliceRecordCount": ["100"], + "column": [ + { + "name": "id", + "type": "int" + }, + { + "name": "age", + "type": "int" + } + ] + }, + "name": "streamreader" + }, + "writer": { + "name": "dmwriter", + "parameter": { + "username": "username", + "password": "password", + "connection": [ + { + "jdbcUrl": "jdbc:dm://localhost:5236", + "table": ["TABLE"] + } + ], + "preSql": [], + "postSql": [], + "writeMode": "insert", + "column": ["ID","AGE"] + } + } + } + ], + "setting": { + "speed": { + "channel": 1, + "bytes": 0 + }, + "errorLimit": { + "record": 100 + }, + "restore": { + "maxRowNumForCheckpoint": 0, + "isRestore": false, + "restoreColumnName": "", + "restoreColumnIndex": 0 + } + } + } +} \ No newline at end of file diff --git a/docs/example/stream_gbase.json b/docs/example/stream_gbase.json new file mode 100644 index 0000000000..76269b24d3 --- /dev/null +++ b/docs/example/stream_gbase.json @@ -0,0 +1,48 @@ +{ + "job": { + "content": [{ + "reader": { + "parameter": { + "sliceRecordCount": ["100"], + "column": [ + { + "name": "id", + "type": "int" + }, + { + "name": "age", + "type": "int" + } + ] + }, + "name": "streamreader" + }, + "writer": { + "name": "gbasewriter", + "parameter": { + "connection": [{ + "jdbcUrl": "jdbc:gbase://0.0.0.1:5258/database", + "table": ["tableTest"] + }], + "username": "username", + "password": "password", + "column": ["id","age"], + "writeMode": "insert", + "batchSize": 1024, + "preSql": [], + "postSql": [], + "updateKey": {} + } + } + }], + "setting": { + "speed": { + "channel": 1, + "bytes": 0 + }, + "errorLimit": { + "record": 100 + } + } + } +} \ No newline at end of file diff --git a/docs/example/stream_greenplum.json b/docs/example/stream_greenplum.json new file mode 100644 index 0000000000..04ee214b7a --- /dev/null +++ b/docs/example/stream_greenplum.json @@ -0,0 +1,45 @@ +{ + "job": { + "content": [{ + "reader": { + "parameter": { + "column": [ + { + "name": "id", + "type": "int", + "value": 1 + } + ], + "sliceRecordCount": ["100"] + }, + "name" : "streamreader" + }, + "writer": { + "name": "greenplumwriter", + "parameter": { + "connection": [{ + "jdbcUrl": "jdbc:pivotal:greenplum://localhost:5432;DatabaseName=database", + "table": ["tbl_pay_log_copy"] + }], + "username": "username", + "password": "password", + "column": ["id"], + "writeMode": "insert", + "insertSqlMode": "copy", + "batchSize": 
100, + "preSql": ["TRUNCATE tbl_pay_log_copy"], + "postSql": [] + } + } + }], + "setting": { + "speed": { + "channel": 1, + "bytes": 0 + }, + "errorLimit": { + "record": 100 + } + } + } +} \ No newline at end of file diff --git a/docs/example/stream_hbase.json b/docs/example/stream_hbase.json new file mode 100644 index 0000000000..6ae32d1461 --- /dev/null +++ b/docs/example/stream_hbase.json @@ -0,0 +1,104 @@ +{ + "job" : { + "content" : [ { + "reader": { + "name": "streamreader", + "parameter": { + "column": [ + { + "name": "id", + "type": "id" + }, + { + "name": "string", + "type": "string", + "value": "2020-01-01 01:01:02" + }, + { + "name": "boolean", + "type": "boolean" + }, + { + "name": "long", + "type": "long" + }, + { + "name": "double", + "type": "double" + }, + { + "name": "timestamp", + "type": "timestamp" + } + ], + "sliceRecordCount" : ["100"] + } + }, + "writer": { + "name": "hbasewriter", + "parameter": { + "hbaseConfig": { + "hbase.zookeeper.property.clientPort": "2181", + "hbase.rootdir": "hdfs://ns1/hbase", + "hbase.cluster.distributed": "true", + "hbase.zookeeper.quorum": "node01,node02,node03", + "zookeeper.znode.parent": "/hbase" + }, + "table": "t1", + "rowkeyColumn":"$(f1:col1)", + "versionColumn":{ + "value":"1610509763849" + }, + "column": [ + { + "name": "f1:col1", + "type": "int" + }, + { + "name": "f1:col2", + "type": "string" + }, + { + "name": "f1:col3", + "type": "boolean" + }, + { + "name": "f1:col4", + "type": "long" + }, + { + "name": "f1:col5", + "type": "double" + }, + { + "name": "f1:col6", + "type": "string" + } + ] + } + } + } ], + "setting": { + "speed": { + "channel": 1, + "bytes": 0 + }, + "errorLimit": { + "record": 0 + }, + "restore": { + "maxRowNumForCheckpoint": 0, + "isRestore": false, + "isStream" : false, + "restoreColumnName": "", + "restoreColumnIndex": 0 + }, + "log" : { + "isLogger": false, + "level" : "debug", + "path" : "", + "pattern":"" + } + } + } +} \ No newline at end of file diff --git a/docs/example/stream_kingbase.json b/docs/example/stream_kingbase.json index 5a34e7058b..1de3b32d8f 100644 --- a/docs/example/stream_kingbase.json +++ b/docs/example/stream_kingbase.json @@ -33,19 +33,7 @@ }], "username": "username", "password": "password", - "column": [ - { - "name": "id", - "type": "BIGINT" - }, - { - "name": "user_id", - "type": "BIGINT" - }, - { - "name": "name", - "type": "varchar" - }], + "column": ["id","user_id","name"], "writeMode": "insert", "batchSize": 1024, "preSql": [], @@ -63,4 +51,4 @@ } } } -} \ No newline at end of file +} diff --git a/docs/example/stream_mysql.json b/docs/example/stream_mysql.json new file mode 100644 index 0000000000..1a3f768667 --- /dev/null +++ b/docs/example/stream_mysql.json @@ -0,0 +1,67 @@ +{ + "job": { + "content": [ + { + "reader": { + "name": "streamreader", + "parameter": { + "column": [ + { + "name": "id", + "type": "id" + }, + { + "name": "user_id", + "type": "int" + }, + { + "name": "name", + "type": "string" + } + ], + "sliceRecordCount" : [ "100"] + } + }, + "writer": { + "name": "mysqlwriter", + "parameter": { + "username": "username", + "password": "password", + "connection": [ + { + "jdbcUrl": "jdbc:mysql://0.0.0.1:3306/database?useSSL=false", + "table": ["table"] + } + ], + "preSql": ["truncate table table"], + "postSql": ["update table set user_id = 1;"], + "writeMode": "insert", + "column": ["id","user_id","name"], + "batchSize": 1024 + } + } + } + ], + "setting": { + "speed": { + "channel": 1, + "bytes": 0 + }, + "errorLimit": { + "record": 100 + }, + "restore": { + 
"maxRowNumForCheckpoint": 0, + "isRestore": false, + "restoreColumnName": "", + "restoreColumnIndex": 0 + }, + "log" : { + "isLogger": false, + "level" : "debug", + "path" : "", + "pattern":"" + } + } + } +} \ No newline at end of file diff --git a/docs/example/stream_oracle.json b/docs/example/stream_oracle.json new file mode 100644 index 0000000000..5f4aa285a0 --- /dev/null +++ b/docs/example/stream_oracle.json @@ -0,0 +1,67 @@ +{ + "job": { + "content": [ + { + "reader": { + "name": "streamreader", + "parameter": { + "column": [ + { + "name": "id", + "type": "id" + }, + { + "name": "user_id", + "type": "int" + }, + { + "name": "name", + "type": "string" + } + ], + "sliceRecordCount" : [ "100"] + } + }, + "writer": { + "name": "oraclewriter", + "parameter": { + "username": "username", + "password": "password", + "connection": [ + { + "jdbcUrl": "jdbc:oracle:thin:@0.0.0.1:1521:oracle", + "table": ["TABLE"] + } + ], + "preSql": ["delete from TABLE"], + "postSql": ["update TABLE set USER_ID = 1"], + "writeMode": "insert", + "column": ["ID","USER_ID","NAME"], + "batchSize": 1024 + } + } + } + ], + "setting": { + "speed": { + "channel": 1, + "bytes": 0 + }, + "errorLimit": { + "record": 1 + }, + "restore": { + "maxRowNumForCheckpoint": 0, + "isRestore": false, + "restoreColumnName": "", + "restoreColumnIndex": 0 + }, + "log" : { + "isLogger": false, + "level" : "debug", + "path" : "", + "pattern":"" + } + } + } +} \ No newline at end of file diff --git a/flinkx-examples/examples/stream_phoenix5.json b/docs/example/stream_phoenix5.json similarity index 70% rename from flinkx-examples/examples/stream_phoenix5.json rename to docs/example/stream_phoenix5.json index 0d547e9cd5..c9b8643f90 100644 --- a/flinkx-examples/examples/stream_phoenix5.json +++ b/docs/example/stream_phoenix5.json @@ -3,47 +3,48 @@ "content": [{ "reader": { "parameter": { - "sliceRecordCount": ["100","100","100"], + "sliceRecordCount": ["1"], "column": [ { "name": "id", - "type": "int" + "type": "int", + "value": "400" }, { - "name": "name", - "type": "varchar" + "name": "user_id", + "type": "int" }, { - "name": "age", - "type": "int" + "name": "name", + "type": "string" } ] }, "name": "streamreader" }, "writer": { - "name": "phoenix5writer", + "name": "phoenixwriter", "parameter": { "connection": [{ - "jdbcUrl": "jdbc:phoenix:172.16.100.68,172.16.100.142,172.16.100.205:2181", + "jdbcUrl": "jdbc:phoenix:node01,node02,node03:2181", "table": [ - "PERSON" + "tableTest" ] }], - "username": "root", - "password": "abc123", + "username": "", + "password": "", "column": [ { "name": "id", - "type": "int" + "type": "BIGINT" }, { - "name": "name", - "type": "varchar" + "name": "user_id", + "type": "BIGINT" }, { - "name": "age", - "type": "int" + "name": "name", + "type": "varchar" }], "writeMode": "insert", "batchSize": 1024, @@ -55,7 +56,7 @@ }], "setting": { "speed": { - "channel": 3, + "channel": 1, "bytes": 0 }, "errorLimit": { diff --git a/docs/example/stream_polardb.json b/docs/example/stream_polardb.json new file mode 100644 index 0000000000..e55e2027c3 --- /dev/null +++ b/docs/example/stream_polardb.json @@ -0,0 +1,67 @@ +{ + "job": { + "content": [ + { + "reader": { + "name": "streamreader", + "parameter": { + "column": [ + { + "name": "id", + "type": "id" + }, + { + "name": "user_id", + "type": "int" + }, + { + "name": "name", + "type": "string" + } + ], + "sliceRecordCount" : [ "100"] + } + }, + "writer": { + "name": "polarwriter", + "parameter": { + "username": "username", + "password": "password", + "connection": [ + { + 
"jdbcUrl": "jdbc:polardb://0.0.0.1:3306/database", + "table": ["table"] + } + ], + "preSql": ["truncate table table;"], + "postSql": ["update table set user_id = 1;"], + "writeMode": "insert", + "column": ["id","user_id","name"], + "batchSize": 1024 + } + } + } + ], + "setting": { + "speed": { + "channel": 1, + "bytes": 0 + }, + "errorLimit": { + "record": 100 + }, + "restore": { + "maxRowNumForCheckpoint": 0, + "isRestore": false, + "restoreColumnName": "", + "restoreColumnIndex": 0 + }, + "log" : { + "isLogger": false, + "level" : "debug", + "path" : "", + "pattern":"" + } + } + } +} \ No newline at end of file diff --git a/docs/example/stream_postgresql.json b/docs/example/stream_postgresql.json new file mode 100644 index 0000000000..6e609e52bd --- /dev/null +++ b/docs/example/stream_postgresql.json @@ -0,0 +1,63 @@ +{ + "job": { + "content": [{ + "reader": { + "parameter": { + "sliceRecordCount": ["100"], + "column": [ + { + "name": "id", + "type": "int" + }, + { + "name": "user_id", + "type": "int" + }, + { + "name":"name", + "type":"string" + } + ] + }, + "name": "streamreader" + }, + "writer": { + "name": "postgresqlwriter", + "parameter": { + "connection": [{ + "jdbcUrl": "jdbc:postgresql://0.0.0.1:5432/postgres", + "table": ["tableTest"] + }], + "username": "username", + "password": "password", + "column": [ + { + "name": "id", + "type": "BIGINT" + }, + { + "name": "user_id", + "type": "BIGINT" + }, + { + "name": "name", + "type": "varchar" + }], + "writeMode": "insert", + "batchSize": 1024, + "preSql": [], + "postSql": [] + } + } + }], + "setting": { + "speed": { + "channel": 1, + "bytes": 0 + }, + "errorLimit": { + "record": 100 + } + } + } +} \ No newline at end of file diff --git a/docs/example/stream_redis.json b/docs/example/stream_redis.json new file mode 100644 index 0000000000..7c09b00f27 --- /dev/null +++ b/docs/example/stream_redis.json @@ -0,0 +1,65 @@ +{ + "job": { + "content": [ + { + "reader": { + "parameter": { + "column": [ + { + "name": "key1", + "type": "string" + }, + { + "name": "key2", + "type": "string" + }, + { + "name": "key3", + "type": "string" + }, + { + "name": "key4", + "type": "string" + } + ], + "sliceRecordCount": ["100"] + }, + "name": "streamreader" + }, + "writer": { + "parameter": { + "hostPort": "ip:6379", + "type": "string", + "mode": "set", + "keyIndexes": [0,1], + "password": "123456", + "database": 0, + "timeout": 30000 + }, + "name": "rediswriter" + } + } + ], + "setting": { + "restore": { + "maxRowNumForCheckpoint": 0, + "isRestore": false, + "restoreColumnName": "", + "restoreColumnIndex": 0 + }, + "errorLimit": { + "record": 100 + }, + "speed": { + "bytes": 0, + "channel": 1 + }, + "log": { + "isLogger": false, + "level": "debug", + "path": "", + "pattern": "" + } + } + } +} \ No newline at end of file diff --git a/docs/example/stream_restapi.json b/docs/example/stream_restapi.json new file mode 100644 index 0000000000..0e91831e34 --- /dev/null +++ b/docs/example/stream_restapi.json @@ -0,0 +1,59 @@ +{ + "job": { + "content": [ + { + "reader": { + "parameter": { + "column": [ + { + "name": "id", + "type": "id" + }, + { + "name": "data", + "type": "string" + } + ], + "sliceRecordCount": [ + "100" + ] + }, + "name": "streamreader" + }, + "writer": { + "parameter": { + "url": "http://kudu3/server/index.php?g=Web&c=Mock&o=mock&projectID=58&uri=/api/tiezhu/test/get", + "header": [], + "body": [], + "method": "post", + "params": {}, + "column": ["id","data"] + }, + "name": "restapiwriter" + } + } + ], + "setting": { + "restore": { + 
"maxRowNumForCheckpoint": 0, + "isRestore": false, + "isStream": true, + "restoreColumnName": "", + "restoreColumnIndex": 0 + }, + "errorLimit": { + "record": 100 + }, + "speed": { + "bytes": 0, + "channel": 1 + }, + "log": { + "isLogger": false, + "level": "debug", + "path": "", + "pattern": "" + } + } + } +} \ No newline at end of file diff --git a/docs/example/stream_saphana.json b/docs/example/stream_saphana.json new file mode 100644 index 0000000000..243c4277b4 --- /dev/null +++ b/docs/example/stream_saphana.json @@ -0,0 +1,67 @@ +{ + "job": { + "content": [ + { + "reader" : { + "parameter" : { + "column" : [ { + "name" : "id", + "type" : "id" + }, { + "name" : "CONTEXT", + "type" : "string" + } ], + "sliceRecordCount" : [ "100"] + }, + "name" : "streamreader" + }, + "writer": { + "name": "saphanawriter", + "parameter": { + "connection": [ + { + "jdbcUrl": "jdbc:sap://0.0.0.1:39017", + "table": ["SYS.P_ROLES_"] + } + ], + "username": "username", + "password": "password", + "column": [ + { + "name": "ROLE_ID", + "type": "BIGINT" + }, + { + "name": "CONTEXT", + "type": "NVARCHAR" + } + ], + "writeMode": "insert", + "batchSize": 1024 + } + } + } + ], + "setting": { + "speed": { + "channel": 1, + "bytes": 0 + }, + "errorLimit": { + "record": 1 + }, + "restore": { + "maxRowNumForCheckpoint": 0, + "isRestore": false, + "restoreColumnName": "", + "restoreColumnIndex": 0 + }, + "log" : { + "isLogger": false, + "level" : "debug", + "path" : "", + "pattern":"" + } + } + } +} \ No newline at end of file diff --git a/docs/example/stream_sqlserver.json b/docs/example/stream_sqlserver.json new file mode 100644 index 0000000000..f29512f363 --- /dev/null +++ b/docs/example/stream_sqlserver.json @@ -0,0 +1,64 @@ +{ + "job": { + "content": [{ + "reader": { + "parameter": { + "sliceRecordCount": ["100"], + "column": [ + { + "name": "id", + "type": "int" + }, + { + "name": "user_id", + "type": "int" + }, + { + "name":"name", + "type":"string" + } + ] + }, + "name": "streamreader" + }, + "writer": { + "name": "sqlserverwriter", + "parameter": { + "connection": [{ + "jdbcUrl": "jdbc:jtds:sqlserver://0.0.0.1:1433;DatabaseName=database", + "table": ["tableTest"] + }], + "username": "username", + "password": "password", + "column": [ + { + "name": "id", + "type": "BIGINT" + }, + { + "name": "user_id", + "type": "BIGINT" + }, + { + "name": "name", + "type": "varchar" + }], + "writeMode": "insert", + "batchSize": 1024, + "preSql": [], + "postSql": [], + "updateKey": {} + } + } + }], + "setting": { + "speed": { + "channel": 1, + "bytes": 0 + }, + "errorLimit": { + "record": 100 + } + } + } +} \ No newline at end of file diff --git a/docs/images/LogMiner/LogMiner1.png b/docs/images/LogMiner/LogMiner1.png new file mode 100644 index 0000000000..15b25f0958 Binary files /dev/null and b/docs/images/LogMiner/LogMiner1.png differ diff --git a/docs/images/LogMiner/LogMiner10.png b/docs/images/LogMiner/LogMiner10.png new file mode 100644 index 0000000000..d2c7615ca1 Binary files /dev/null and b/docs/images/LogMiner/LogMiner10.png differ diff --git a/docs/images/LogMiner/LogMiner11.png b/docs/images/LogMiner/LogMiner11.png new file mode 100644 index 0000000000..50ba3ab06f Binary files /dev/null and b/docs/images/LogMiner/LogMiner11.png differ diff --git a/docs/images/LogMiner/LogMiner12.png b/docs/images/LogMiner/LogMiner12.png new file mode 100644 index 0000000000..8faedb9e04 Binary files /dev/null and b/docs/images/LogMiner/LogMiner12.png differ diff --git a/docs/images/LogMiner/LogMiner13.png 
b/docs/images/LogMiner/LogMiner13.png new file mode 100644 index 0000000000..0dccc2e916 Binary files /dev/null and b/docs/images/LogMiner/LogMiner13.png differ diff --git a/docs/images/LogMiner/LogMiner14.png b/docs/images/LogMiner/LogMiner14.png new file mode 100644 index 0000000000..ff7501cda6 Binary files /dev/null and b/docs/images/LogMiner/LogMiner14.png differ diff --git a/docs/images/LogMiner/LogMiner15.png b/docs/images/LogMiner/LogMiner15.png new file mode 100644 index 0000000000..d2c7615ca1 Binary files /dev/null and b/docs/images/LogMiner/LogMiner15.png differ diff --git a/docs/images/LogMiner/LogMiner16.png b/docs/images/LogMiner/LogMiner16.png new file mode 100644 index 0000000000..2026ed18ae Binary files /dev/null and b/docs/images/LogMiner/LogMiner16.png differ diff --git a/docs/images/LogMiner/LogMiner17.png b/docs/images/LogMiner/LogMiner17.png new file mode 100644 index 0000000000..388b31e891 Binary files /dev/null and b/docs/images/LogMiner/LogMiner17.png differ diff --git a/docs/images/LogMiner/LogMiner18.png b/docs/images/LogMiner/LogMiner18.png new file mode 100644 index 0000000000..0dccc2e916 Binary files /dev/null and b/docs/images/LogMiner/LogMiner18.png differ diff --git a/docs/images/LogMiner/LogMiner19.png b/docs/images/LogMiner/LogMiner19.png new file mode 100644 index 0000000000..bd9003e0ef Binary files /dev/null and b/docs/images/LogMiner/LogMiner19.png differ diff --git a/docs/images/LogMiner/LogMiner2.png b/docs/images/LogMiner/LogMiner2.png new file mode 100644 index 0000000000..253f4bcc3e Binary files /dev/null and b/docs/images/LogMiner/LogMiner2.png differ diff --git a/docs/images/LogMiner/LogMiner20.png b/docs/images/LogMiner/LogMiner20.png new file mode 100644 index 0000000000..7a5f8ecab0 Binary files /dev/null and b/docs/images/LogMiner/LogMiner20.png differ diff --git a/docs/images/LogMiner/LogMiner21.png b/docs/images/LogMiner/LogMiner21.png new file mode 100644 index 0000000000..e394b1cac0 Binary files /dev/null and b/docs/images/LogMiner/LogMiner21.png differ diff --git a/docs/images/LogMiner/LogMiner22.png b/docs/images/LogMiner/LogMiner22.png new file mode 100644 index 0000000000..6ef4a82e0d Binary files /dev/null and b/docs/images/LogMiner/LogMiner22.png differ diff --git a/docs/images/LogMiner/LogMiner23.png b/docs/images/LogMiner/LogMiner23.png new file mode 100644 index 0000000000..5fe403e37e Binary files /dev/null and b/docs/images/LogMiner/LogMiner23.png differ diff --git a/docs/images/LogMiner/LogMiner3.png b/docs/images/LogMiner/LogMiner3.png new file mode 100644 index 0000000000..aa7457c602 Binary files /dev/null and b/docs/images/LogMiner/LogMiner3.png differ diff --git a/docs/images/LogMiner/LogMiner4.png b/docs/images/LogMiner/LogMiner4.png new file mode 100644 index 0000000000..999a89e861 Binary files /dev/null and b/docs/images/LogMiner/LogMiner4.png differ diff --git a/docs/images/LogMiner/LogMiner5.png b/docs/images/LogMiner/LogMiner5.png new file mode 100644 index 0000000000..b935758e20 Binary files /dev/null and b/docs/images/LogMiner/LogMiner5.png differ diff --git a/docs/images/LogMiner/LogMiner6.png b/docs/images/LogMiner/LogMiner6.png new file mode 100644 index 0000000000..053b4fb3a8 Binary files /dev/null and b/docs/images/LogMiner/LogMiner6.png differ diff --git a/docs/images/LogMiner/LogMiner7.png b/docs/images/LogMiner/LogMiner7.png new file mode 100644 index 0000000000..6be444e437 Binary files /dev/null and b/docs/images/LogMiner/LogMiner7.png differ diff --git a/docs/images/LogMiner/LogMiner8.png 
b/docs/images/LogMiner/LogMiner8.png new file mode 100644 index 0000000000..1cf9b8b6a8 Binary files /dev/null and b/docs/images/LogMiner/LogMiner8.png differ diff --git a/docs/images/LogMiner/LogMiner9.png b/docs/images/LogMiner/LogMiner9.png new file mode 100644 index 0000000000..ff7501cda6 Binary files /dev/null and b/docs/images/LogMiner/LogMiner9.png differ diff --git a/docs/images/SqlserverCDC/Sqlserver1.png b/docs/images/SqlserverCDC/Sqlserver1.png new file mode 100644 index 0000000000..8425562bf3 Binary files /dev/null and b/docs/images/SqlserverCDC/Sqlserver1.png differ diff --git a/docs/images/SqlserverCDC/Sqlserver10.png b/docs/images/SqlserverCDC/Sqlserver10.png new file mode 100644 index 0000000000..74626d1a80 Binary files /dev/null and b/docs/images/SqlserverCDC/Sqlserver10.png differ diff --git a/docs/images/SqlserverCDC/Sqlserver11.png b/docs/images/SqlserverCDC/Sqlserver11.png new file mode 100644 index 0000000000..13369a50ef Binary files /dev/null and b/docs/images/SqlserverCDC/Sqlserver11.png differ diff --git a/docs/images/SqlserverCDC/Sqlserver12.png b/docs/images/SqlserverCDC/Sqlserver12.png new file mode 100644 index 0000000000..1deb5c1060 Binary files /dev/null and b/docs/images/SqlserverCDC/Sqlserver12.png differ diff --git a/docs/images/SqlserverCDC/Sqlserver13.png b/docs/images/SqlserverCDC/Sqlserver13.png new file mode 100644 index 0000000000..d2235e4f40 Binary files /dev/null and b/docs/images/SqlserverCDC/Sqlserver13.png differ diff --git a/docs/images/SqlserverCDC/Sqlserver14.png b/docs/images/SqlserverCDC/Sqlserver14.png new file mode 100644 index 0000000000..f2d7c4cca6 Binary files /dev/null and b/docs/images/SqlserverCDC/Sqlserver14.png differ diff --git a/docs/images/SqlserverCDC/Sqlserver15.png b/docs/images/SqlserverCDC/Sqlserver15.png new file mode 100644 index 0000000000..3345aaadda Binary files /dev/null and b/docs/images/SqlserverCDC/Sqlserver15.png differ diff --git a/docs/images/SqlserverCDC/Sqlserver16.png b/docs/images/SqlserverCDC/Sqlserver16.png new file mode 100644 index 0000000000..bdd1ab0603 Binary files /dev/null and b/docs/images/SqlserverCDC/Sqlserver16.png differ diff --git a/docs/images/SqlserverCDC/Sqlserver17.png b/docs/images/SqlserverCDC/Sqlserver17.png new file mode 100644 index 0000000000..558a3bef5a Binary files /dev/null and b/docs/images/SqlserverCDC/Sqlserver17.png differ diff --git a/docs/images/SqlserverCDC/Sqlserver18.png b/docs/images/SqlserverCDC/Sqlserver18.png new file mode 100644 index 0000000000..5be43b703a Binary files /dev/null and b/docs/images/SqlserverCDC/Sqlserver18.png differ diff --git a/docs/images/SqlserverCDC/Sqlserver19.png b/docs/images/SqlserverCDC/Sqlserver19.png new file mode 100644 index 0000000000..6872934102 Binary files /dev/null and b/docs/images/SqlserverCDC/Sqlserver19.png differ diff --git a/docs/images/SqlserverCDC/Sqlserver2.png b/docs/images/SqlserverCDC/Sqlserver2.png new file mode 100644 index 0000000000..c0c0e8acf3 Binary files /dev/null and b/docs/images/SqlserverCDC/Sqlserver2.png differ diff --git a/docs/images/SqlserverCDC/Sqlserver3.png b/docs/images/SqlserverCDC/Sqlserver3.png new file mode 100644 index 0000000000..b6cbce625c Binary files /dev/null and b/docs/images/SqlserverCDC/Sqlserver3.png differ diff --git a/docs/images/SqlserverCDC/Sqlserver4.png b/docs/images/SqlserverCDC/Sqlserver4.png new file mode 100644 index 0000000000..7f863458c3 Binary files /dev/null and b/docs/images/SqlserverCDC/Sqlserver4.png differ diff --git a/docs/images/SqlserverCDC/Sqlserver5.png 
b/docs/images/SqlserverCDC/Sqlserver5.png new file mode 100644 index 0000000000..22b8b4e3b2 Binary files /dev/null and b/docs/images/SqlserverCDC/Sqlserver5.png differ diff --git a/docs/images/SqlserverCDC/Sqlserver6.png b/docs/images/SqlserverCDC/Sqlserver6.png new file mode 100644 index 0000000000..58f55cae82 Binary files /dev/null and b/docs/images/SqlserverCDC/Sqlserver6.png differ diff --git a/docs/images/SqlserverCDC/Sqlserver7.png b/docs/images/SqlserverCDC/Sqlserver7.png new file mode 100644 index 0000000000..d0d0b1bb08 Binary files /dev/null and b/docs/images/SqlserverCDC/Sqlserver7.png differ diff --git a/docs/images/SqlserverCDC/Sqlserver8.png b/docs/images/SqlserverCDC/Sqlserver8.png new file mode 100644 index 0000000000..2d790363a8 Binary files /dev/null and b/docs/images/SqlserverCDC/Sqlserver8.png differ diff --git a/docs/images/SqlserverCDC/Sqlserver9.png b/docs/images/SqlserverCDC/Sqlserver9.png new file mode 100644 index 0000000000..44e491ca8b Binary files /dev/null and b/docs/images/SqlserverCDC/Sqlserver9.png differ diff --git a/docs/images/ding.jpg b/docs/images/ding.jpg index 605928bae9..d069044f6e 100644 Binary files a/docs/images/ding.jpg and b/docs/images/ding.jpg differ diff --git a/docs/images/quick_1.png b/docs/images/quick_1.png index 6b1f4450c7..71f00c6a90 100644 Binary files a/docs/images/quick_1.png and b/docs/images/quick_1.png differ diff --git a/docs/images/quick_2.png b/docs/images/quick_2.png index 75848ca09f..936158d723 100644 Binary files a/docs/images/quick_2.png and b/docs/images/quick_2.png differ diff --git a/docs/images/quick_3.png b/docs/images/quick_3.png index 2345e5cd37..c065db1b4a 100644 Binary files a/docs/images/quick_3.png and b/docs/images/quick_3.png differ diff --git a/docs/images/quick_4.png b/docs/images/quick_4.png index e4159e74bc..702d3e2b05 100644 Binary files a/docs/images/quick_4.png and b/docs/images/quick_4.png differ diff --git a/docs/images/quick_5.png b/docs/images/quick_5.png index bf7746440c..18f155815a 100644 Binary files a/docs/images/quick_5.png and b/docs/images/quick_5.png differ diff --git a/docs/images/quick_6.png b/docs/images/quick_6.png index 56854b4219..e414267238 100644 Binary files a/docs/images/quick_6.png and b/docs/images/quick_6.png differ diff --git a/docs/images/quick_7.png b/docs/images/quick_7.png index 24e6c72ffd..f2b611bd35 100644 Binary files a/docs/images/quick_7.png and b/docs/images/quick_7.png differ diff --git a/docs/images/quick_8.png b/docs/images/quick_8.png index 6fcd58b974..3817f572e6 100644 Binary files a/docs/images/quick_8.png and b/docs/images/quick_8.png differ diff --git a/docs/offline/reader/carbondatareader.md b/docs/offline/reader/carbondatareader.md index 7c3ab76cc7..d8a1d8f34b 100644 --- a/docs/offline/reader/carbondatareader.md +++ b/docs/offline/reader/carbondatareader.md @@ -2,7 +2,7 @@ ## 一、插件名称 -名称:**carbondatareader**
** +名称:**carbondatareader**
## 二、支持的数据源版本 **Carbondata 1.5及以上**
@@ -13,30 +13,34 @@ - **path** - 描述:carbondata表的存储路径 - 必选:是 + - 字段类型:String - 默认值:无 - +
- **table** - 描述:carbondata表名 - 必选:否 + - 字段类型:String - 默认值:无 - +
- **database** - 描述:carbondata库名 - 必选:否 + - 字段类型:String - 默认值:无 - +
- **filter** - 描述:简单过滤器,目前只支持单条件的简单过滤,形式为 col op value,col为列名;op为关系运算符,包括=,>,>=,<,<=; value为字面值,如1234, "ssss" - 必选:否 + - 字段类型:String - 默认值:无 - +
- **column** @@ -56,22 +60,25 @@ value为字面值,如1234, "ssss" - type:字段类型,可以和数据库里的字段类型不一样,程序会做一次类型转换 - value:如果数据库里不存在指定的字段,则会把value的值作为常量列返回,如果指定的字段存在,当指定字段的值为null时,会以此value值作为默认值返回 - 必选:是 + - 字段类型:List - 默认值:无 - +
- **hadoopConfig** - 描述:集群HA模式时需要填写的namespace配置及其它配置 - 必选:是 + - 字段类型:Map - 默认值:无 - +
- **defaultFS** - 描述:Hadoop hdfs文件系统namenode节点地址。 - 必选:是 + - 字段类型:String - 默认值:无 - +
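为便于对照,下面给出一个仅由上述参数组合而成的 carbondatareader 配置片段示意(defaultFS、path、库表名、过滤条件等均为占位值,需按实际环境替换;writer 部分省略,非完整任务模板):
```json
"reader": {
  "name": "carbondatareader",
  "parameter": {
    "defaultFS": "hdfs://ns1",
    "hadoopConfig": {
      "dfs.nameservices": "ns1"
    },
    "path": "/user/carbon/store/database/table",
    "database": "database",
    "table": "table",
    "filter": "id > 1234",
    "column": [
      {"name": "id", "type": "bigint"},
      {"name": "name", "type": "string"}
    ]
  }
}
```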
diff --git a/docs/offline/reader/cassandrareader.md b/docs/offline/reader/cassandrareader.md index 4291d6f435..0881796853 100644 --- a/docs/offline/reader/cassandrareader.md +++ b/docs/offline/reader/cassandrareader.md @@ -15,91 +15,103 @@ - 描述:数据库地址 - 必选:是 - 默认值:无 - + - 字段类型:String +
- **port** - 描述:端口 - 必选:否 - 默认值:9042 - + - 字段类型:Integer +
- **username** - 描述:用户名 - 必选:否 - 默认值:无 - - + - 字段类型:String +
- **password** - 描述:密码 - 必选:否 - 默认值:无 - + - 字段类型:String +
- **useSSL**
  - 描述:是否使用SSL加密连接
  - 必选:否
  - 默认值:false
-
+  - 字段类型:boolean
+
- **column** - 描述:查询结果中被select出来的属性集合,为空则select * - 必选:否 - 默认值:无 - + - 字段类型:List +
- **keyspace** - 描述:需要同步的表所在的keyspace - 必选:是 - 默认值:无 - + - 字段类型:String +
- **table** - 描述:要查询的表 - 必选:是 - 默认值:无 - + - 字段类型:String +
- **where** - 描述:过滤条件where之后的表达式 - 必选:否 - 默认值:无 - + - 字段类型:String +
- **allowFiltering** - 描述:是否在服务端过滤数据 - 必选:否 - 默认值:false - + - 字段类型:boolean +
- **connecttionsPerHost** - 描述:分配给每个host的连接数 - 必选:否 - 默认值:8 - + - 字段类型:Integer +
- **maxPendingPerConnection**
  - 描述:每个连接允许的最大待处理请求数
  - 必选:否
  - 默认值:128
-
+  - 字段类型:Integer
+
- **consistancyLevel** - 描述:数据一致性级别。可选`ONE`、`QUORUM`、`LOCAL_QUORUM`、`EACH_QUORUM`、`ALL`、`ANY`、`TWO`、`THREE`、`LOCAL_ONE` - 必选:否 - 默认值:无 - + - 字段类型:String +
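基于上述参数,一个 cassandrareader 配置片段示意如下(主机、keyspace、表名等均为假设值;column 留空则等价于 select *,其具体填写形式以插件实际实现为准):
```json
"reader": {
  "name": "cassandrareader",
  "parameter": {
    "host": "localhost",
    "port": 9042,
    "username": "username",
    "password": "password",
    "keyspace": "keyspace",
    "table": "table",
    "column": ["id", "name"],
    "where": "id > 100",
    "allowFiltering": false,
    "consistancyLevel": "LOCAL_QUORUM"
  }
}
```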
diff --git a/docs/offline/reader/clickhousereader.md b/docs/offline/reader/clickhousereader.md index 67218c00dd..fadd2fa69c 100644 --- a/docs/offline/reader/clickhousereader.md +++ b/docs/offline/reader/clickhousereader.md @@ -9,62 +9,101 @@ ## 三、参数说明 +- **connection** + - 描述:数据库连接参数,包含jdbcUrl、schema、table等参数 + - 必选:是 + - 字段类型:List + - 示例:指定jdbcUrl、schema、table + ```json + "connection": [{ + "jdbcUrl": ["jdbc:clickhouse://localhost:8123/database"], + "table": ["table"], + "schema":"public" + }] + ``` + - 默认值:无 + +
+ - **jdbcUrl** - 描述:针对关系型数据库的jdbc连接字符串 -
jdbcUrl参考文档:[clickhouse-jdbc官方文档](https://github.com/ClickHouse/clickhouse-jdbc) +
jdbcUrl参考文档:[clickhouse-jdbc官方文档](https://github.com/ClickHouse/clickhouse-jdbc) - 必选:是 + - 字段类型:List + - 默认值:无 + +
+ + - **schema** + - 描述:数据库schema名 + - 必选:否 + - 字段类型:String + - 默认值:无 + +
+
+- **table**
+  - 描述:要读取的表名称。目前只支持配置单个表,后续会支持多表
+  - 必选:是
+  - 字段类型:List
+  - 默认值:无
+
- **username** - 描述:数据源的用户名 - 必选:是 + - 字段类型:String - 默认值:无 - +
- **password** - 描述:数据源指定用户名的密码 - 必选:是 + - 字段类型:String - 默认值:无 +
+ +- **fetchSize** + - 描述:一次性从数据库中读取多少条数据,jdbc默认一次将所有结果都读取到内存中,在数据量很大时可能会造成OOM,设置这个参数可以控制每次读取fetchSize条数据。 + - 注意:此参数的值不可设置过大,否则会读取超时,导致任务失败。 + - 必选:否 + - 字段类型:int + - 默认值:1000 +
- **where** - 描述:筛选条件,reader插件根据指定的column、table、where条件拼接SQL,并根据这个SQL进行数据抽取。在实际业务场景中,往往会选择当天的数据进行同步,可以将where条件指定为gmt_create > time。 - 注意:不可以将where条件指定为limit 10,limit不是SQL的合法where子句。 - 必选:否 + - 字段类型:String - 默认值:无 - +
- **splitPk**
  - 描述:当speed配置中的channel大于1时指定此参数,Reader插件根据并发数和此参数指定的字段拼接sql,使每个并发读取不同的数据,提升读取速率。
  - 注意:
    - 推荐splitPk使用表主键,因为表主键通常情况下比较均匀,因此切分出来的分片也不容易出现数据热点。
-    - 目前splitPk仅支持整形数据切分,`不支持浮点、字符串、日期等其他类型`。如果用户指定其他非支持类型,FlinkX将报错!
+    - 目前splitPk仅支持整型数据切分,不支持浮点、字符串、日期等其他类型。如果用户指定其他非支持类型,FlinkX将报错!
    - 如果channel大于1但是没有配置此参数,任务将置为失败。
  - 必选:否
+  - 字段类型:String
  - 默认值:无
-
+
- **queryTimeOut** - 描述:查询超时时间,单位秒。 - 注意:当数据量很大,或者从视图查询,或者自定义sql查询时,可通过此参数指定超时时间。 - 必选:否 + - 字段类型:int - 默认值:1000 - +
- **customSql** - 描述:自定义的查询语句,如果只指定字段不能满足需求时,可通过此参数指定查询的sql,可以是任意复杂的查询语句。 @@ -74,14 +113,16 @@ - 当指定了此参数时,connection里指定的table无效; - 当指定此参数时,column必须指定具体字段信息,不能以*号代替; - 必选:否 + - 字段类型:String - 默认值:无 - +
- **column** - 描述:需要读取的字段。 - 格式:支持3种格式 -
1.读取全部字段,如果字段数量很多,可以使用下面的写法: +
1.读取全部字段,如果字段数量很多,可以使用下面的写法: + ```bash "column":["*"] ``` @@ -99,35 +140,63 @@ }] ``` - - 属性说明: - - name:字段名称 - - type:字段类型,可以和数据库里的字段类型不一样,程序会做一次类型转换 - - format:如果字段是时间字符串,可以指定时间的格式,将字段类型转为日期格式返回 - - value:如果数据库里不存在指定的字段,则会报错。如果指定的字段存在,当指定字段的值为null时,会以此value值作为默认值返回 - - 必选:是 - - 默认值:无 - + - 属性说明: + - name:字段名称 + - type:字段类型,可以和数据库里的字段类型不一样,程序会做一次类型转换 + - format:如果字段是时间字符串,可以指定时间的格式,将字段类型转为日期格式返回 + - value:如果数据库里不存在指定的字段,则会报错。如果指定的字段存在,当指定字段的值为null时,会以此value值作为默认值返回 + - 必选:是 + - 字段类型:List + - 默认值:无 +
- **polling** - 描述:是否开启间隔轮询,开启后会根据`pollingInterval`轮询间隔时间周期性的从数据库拉取数据。开启间隔轮询还需配置参数`pollingInterval`,`increColumn`,可以选择配置参数`startLocation`。若不配置参数`startLocation`,任务启动时将会从数据库中查询增量字段最大值作为轮询的开始位置。 - 必选:否 + - 字段类型:Boolean - 默认值:false + +
+- **increColumn**
+  - 描述:增量字段,可以是对应的增量字段名,也可以是纯数字,表示增量字段在column中的顺序位置(从0开始)
+  - 必选:否
+  - 字段类型:String或int
+  - 默认值:无
+
- **pollingInterval** - 描述:轮询间隔时间,从数据库中拉取数据的间隔时间,默认为5000毫秒。 - 必选:否 + - 字段类型:long - 默认值:5000 +
+ +- **startLocation** + - 描述:增量查询起始位置 + - 必选:否 + - 字段类型:String + - 默认值:无 + +
+ +- **useMaxFunc** + - 描述:用于标记是否保存endLocation位置的一条或多条数据,true:不保存,false(默认):保存, 某些情况下可能出现最后几条数据被重复记录的情况,可以将此参数配置为true + - 必选:否 + - 字段类型:Boolean + - 默认值:false +
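上面的 polling、increColumn、pollingInterval、startLocation、useMaxFunc 通常配合使用,下面是一个开启间隔轮询的参数片段示意(增量字段与起始位置均为假设值;increColumn 也可写成字段在 column 中的序号,如 "0"):
```json
"parameter": {
  "polling": true,
  "pollingInterval": 5000,
  "increColumn": "id",
  "startLocation": "20",
  "useMaxFunc": false
}
```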
- **requestAccumulatorInterval** - 描述:发送查询累加器请求的间隔时间。 - 必选:否 + - 字段类型:int - 默认值:2 - + ## 四、配置示例 #### 1、基础配置 @@ -139,21 +208,18 @@ "parameter" : { "column" : [ { "name" : "id", - "type" : "bigint", - "key" : "id" + "type" : "bigint" }, { "name" : "user_id", - "type" : "bigint", - "key" : "user_id" + "type" : "bigint" }, { "name" : "name", - "type" : "varchar", - "key" : "name" + "type" : "varchar" } ], "username" : "username", "password" : "password", "connection" : [ { - "jdbcUrl" : [ "jdbc:clickhouse://0.0.0.1:8123/dtstack" ], + "jdbcUrl" : [ "jdbc:clickhouse://0.0.0.1:8123/database" ], "table" : [ "tableTest" ] } ], "where": "id > 1", @@ -194,21 +260,18 @@ "parameter" : { "column" : [ { "name" : "id", - "type" : "bigint", - "key" : "id" + "type" : "bigint" }, { "name" : "user_id", - "type" : "bigint", - "key" : "user_id" + "type" : "bigint" }, { "name" : "name", - "type" : "varchar", - "key" : "name" + "type" : "varchar" } ], "username" : "username", "password" : "password", "connection" : [ { - "jdbcUrl" : [ "jdbc:clickhouse://0.0.0.1:8123/dtstack" ], + "jdbcUrl" : [ "jdbc:clickhouse://0.0.0.1:8123/database" ], "table" : [ "tableTest" ] } ], "where": "id > 1", @@ -249,21 +312,18 @@ "parameter" : { "column" : [ { "name" : "id", - "type" : "bigint", - "key" : "id" + "type" : "bigint" }, { "name" : "user_id", - "type" : "bigint", - "key" : "user_id" + "type" : "bigint" }, { "name" : "name", - "type" : "varchar", - "key" : "name" + "type" : "varchar" } ], "username" : "username", "password" : "password", "connection" : [ { - "jdbcUrl" : [ "jdbc:clickhouse://0.0.0.1:8123/dtstack" ], + "jdbcUrl" : [ "jdbc:clickhouse://0.0.0.1:8123/database" ], "table" : [ "tableTest" ] } ], "where": "id > 1", @@ -304,21 +364,18 @@ "parameter" : { "column" : [ { "name" : "id", - "type" : "bigint", - "key" : "id" + "type" : "bigint" }, { "name" : "user_id", - "type" : "bigint", - "key" : "user_id" + "type" : "bigint" }, { "name" : "name", - "type" : "varchar", - "key" : "name" + "type" : "varchar" } ], "username" : "username", "password" : "password", "connection" : [ { - "jdbcUrl" : [ "jdbc:clickhouse://0.0.0.1:8123/dtstack" ], + "jdbcUrl" : [ "jdbc:clickhouse://0.0.0.1:8123/database" ], "table" : [ "tableTest" ] } ], "where": "id > 1", @@ -361,21 +418,18 @@ "parameter" : { "column" : [ { "name" : "id", - "type" : "bigint", - "key" : "id" + "type" : "bigint" }, { "name" : "user_id", - "type" : "bigint", - "key" : "user_id" + "type" : "bigint" }, { "name" : "name", - "type" : "varchar", - "key" : "name" + "type" : "varchar" } ], "username" : "username", "password" : "password", "connection" : [ { - "jdbcUrl" : [ "jdbc:clickhouse://0.0.0.1:8123/dtstack" ], + "jdbcUrl" : [ "jdbc:clickhouse://0.0.0.1:8123/database" ], "table" : [ "tableTest" ] } ], "where": "id > 1", diff --git a/docs/offline/reader/db2reader.md b/docs/offline/reader/db2reader.md index d04953cf60..dcd9c69b9a 100644 --- a/docs/offline/reader/db2reader.md +++ b/docs/offline/reader/db2reader.md @@ -10,54 +10,101 @@ ## 三、参数说明 +- **connection** + - 描述:数据库连接参数,包含jdbcUrl、schema、table等参数 + - 必选:是 + - 字段类型:List + - 示例:指定jdbcUrl、schema、table + ```json + "connection": [{ + "jdbcUrl": ["jdbc:db2://localhost:50000/database"], + "table": ["table"], + "schema":"public" + }] + ``` + - 默认值:无 + +
+ - **jdbcUrl** - 描述:针对关系型数据库的jdbc连接字符串
jdbcUrl参考文档:[db2官方文档](https://www.ibm.com/support/knowledgecenter/en/SSEPGG_11.1.0/com.ibm.db2.luw.apdv.java.doc/src/tpc/imjcc_rjv00004.htmlId=t14:12:14) - 必选:是 + - 字段类型:List - 默认值:无 + +
+- **schema** + - 描述:数据库schema名 + - 必选:否 + - 字段类型:String + - 默认值:无 +
+- **table**
+  - 描述:要读取的表名称。目前只支持配置单个表,后续会支持多表
+  - 必选:是
+  - 字段类型:List
+  - 默认值:无
+
+ - **username** - 描述:数据源的用户名 - 必选:是 + - 字段类型:String - 默认值:无 - +
- **password** - 描述:数据源指定用户名的密码 - 必选:是 + - 字段类型:String - 默认值:无 +
+ +- **fetchSize** + - 描述:一次性从数据库中读取多少条数据,jdbc默认一次将所有结果都读取到内存中,在数据量很大时可能会造成OOM,设置这个参数可以控制每次读取fetchSize条数据。 + - 注意:此参数的值不可设置过大,否则会读取超时,导致任务失败。 + - 必选:否 + - 字段类型:int + - 默认值:1000 +
- **where** - 描述:筛选条件,reader插件根据指定的column、table、where条件拼接SQL,并根据这个SQL进行数据抽取。在实际业务场景中,往往会选择当天的数据进行同步,可以将where条件指定为gmt_create > time。 - 注意:不可以将where条件指定为limit 10,limit不是SQL的合法where子句。 - 必选:否 + - 字段类型:String - 默认值:无 - +
- **splitPk**
  - 描述:当speed配置中的channel大于1时指定此参数,Reader插件根据并发数和此参数指定的字段拼接sql,使每个并发读取不同的数据,提升读取速率。
  - 注意:
    - 推荐splitPk使用表主键,因为表主键通常情况下比较均匀,因此切分出来的分片也不容易出现数据热点。
-    - 目前splitPk仅支持整形数据切分,`不支持浮点、字符串、日期等其他类型`。如果用户指定其他非支持类型,FlinkX将报错!
+    - 目前splitPk仅支持整型数据切分,不支持浮点、字符串、日期等其他类型。如果用户指定其他非支持类型,FlinkX将报错!
    - 如果channel大于1但是没有配置此参数,任务将置为失败。
  - 必选:否
+  - 字段类型:String
  - 默认值:空
-
+
- **queryTimeOut** - 描述:查询超时时间,单位秒。 - 注意:当数据量很大,或者从视图查询,或者自定义sql查询时,可通过此参数指定超时时间。 - 必选:否 + - 字段类型:int - 默认值:1000 - +
- **customSql** - 描述:自定义的查询语句,如果只指定字段不能满足需求时,可通过此参数指定查询的sql,可以是任意复杂的查询语句。 @@ -67,9 +114,10 @@ - 当指定了此参数时,connection里指定的table无效; - 当指定此参数时,column必须指定具体字段信息,不能以*号代替; - 必选:否 + - 字段类型:String - 默认值:空 - +
- **column** - 描述:需要读取的字段。 @@ -98,27 +146,55 @@ - format:如果字段是时间字符串,可以指定时间的格式,将字段类型转为日期格式返回 - value:如果数据库里不存在指定的字段,则会把value的值作为常量列返回,如果指定的字段存在,当指定字段的值为null时,会以此value值作为默认值返回 - 必选:是 + - 字段类型:List - 默认值:无 - +
- **polling** - 描述:是否开启间隔轮询,开启后会根据`pollingInterval`轮询间隔时间周期性的从数据库拉取数据。开启间隔轮询还需配置参数`pollingInterval`,`increColumn`,可以选择配置参数`startLocation`。若不配置参数`startLocation`,任务启动时将会从数据库中查询增量字段最大值作为轮询的开始位置。 - 必选:否 + - 字段类型:Boolean - 默认值:false +
-
+- **increColumn**
+  - 描述:增量字段,可以是对应的增量字段名,也可以是纯数字,表示增量字段在column中的顺序位置(从0开始)
+  - 必选:否
+  - 字段类型:String或int
+  - 默认值:无
+
+ - **pollingInterval** - 描述:轮询间隔时间,从数据库中拉取数据的间隔时间,默认为5000毫秒。 - 必选:否 + - 字段类型:long - 默认值:5000 +
+ +- **startLocation** + - 描述:增量查询起始位置 + - 必选:否 + - 字段类型:String + - 默认值:无 + +
+ +- **useMaxFunc** + - 描述:用于标记是否保存endLocation位置的一条或多条数据,true:不保存,false(默认):保存, 某些情况下可能出现最后几条数据被重复记录的情况,可以将此参数配置为true + - 必选:否 + - 字段类型:Boolean + - 默认值:false +
- **requestAccumulatorInterval** - 描述:发送查询累加器请求的间隔时间。 - 必选:否 + - 字段类型:int - 默认值:2 ** @@ -134,18 +210,16 @@ "parameter" : { "column" : [ { "name" : "id", - "type" : "bigint", - "key" : "id" + "type" : "bigint" },{ "name" : "name", - "type" : "varchar", - "key" : "name" + "type" : "varchar" } ], - "username" : "user", + "username" : "username", "password" : "password", "connection" : [ { - "jdbcUrl" : [ "jdbc:db2://localhost:50000/sample" ], - "table" : [ "staff" ] + "jdbcUrl" : [ "jdbc:db2://localhost:50000/database" ], + "table" : [ "table" ] } ], "where": "id > 1", "splitPk": "id", @@ -192,18 +266,16 @@ "parameter" : { "column" : [ { "name" : "id", - "type" : "bigint", - "key" : "id" + "type" : "bigint" },{ "name" : "name", - "type" : "varchar", - "key" : "name" + "type" : "varchar" } ], - "username" : "user", + "username" : "username", "password" : "password", "connection" : [ { - "jdbcUrl" : [ "jdbc:db2://localhost:50000/sample" ], - "table" : [ "staff" ] + "jdbcUrl" : [ "jdbc:db2://localhost:50000/database" ], + "table" : [ "table" ] } ], "where": "id > 1", "splitPk": "id", @@ -250,24 +322,22 @@ "parameter" : { "column" : [ { "name" : "id", - "type" : "bigint", - "key" : "id" + "type" : "bigint" },{ "name" : "name", - "type" : "varchar", - "key" : "name" + "type" : "varchar" } ], - "username" : "user", + "username" : "username", "password" : "password", "connection" : [ { - "jdbcUrl" : [ "jdbc:db2://localhost:50000/sample" ], - "table" : [ "staff" ] + "jdbcUrl" : [ "jdbc:db2://localhost:50000/database" ], + "table" : [ "table" ] } ], "where": "id > 1", "splitPk": "id", "fetchSize": 1000, "queryTimeOut": 1000, - "customSql":"select id, name from staff where id > 300", + "customSql":"select id, name from table where id > 300", "requestAccumulatorInterval": 2 }, "name" : "db2reader" @@ -308,18 +378,16 @@ "parameter" : { "column" : [ { "name" : "id", - "type" : "bigint", - "key" : "id" + "type" : "bigint" },{ "name" : "name", - "type" : "varchar", - "key" : "name" + "type" : "varchar" } ], - "username" : "user", + "username" : "username", "password" : "password", "connection" : [ { - "jdbcUrl" : [ "jdbc:db2://localhost:50000/sample" ], - "table" : [ "staff" ] + "jdbcUrl" : [ "jdbc:db2://localhost:50000/database" ], + "table" : [ "table" ] } ], "where": "id > 1", "splitPk": "id", @@ -368,18 +436,16 @@ "parameter" : { "column" : [ { "name" : "id", - "type" : "bigint", - "key" : "id" + "type" : "bigint" },{ "name" : "name", - "type" : "varchar", - "key" : "name" + "type" : "varchar" } ], - "username" : "user", + "username" : "username", "password" : "password", "connection" : [ { - "jdbcUrl" : [ "jdbc:db2://localhost:50000/sample" ], - "table" : [ "staff" ] + "jdbcUrl" : [ "jdbc:db2://localhost:50000/database" ], + "table" : [ "table" ] } ], "where": "id > 1", "splitPk": "id", diff --git a/docs/offline/reader/dmreader.md b/docs/offline/reader/dmreader.md index 79639771f2..6220b4ce0b 100644 --- a/docs/offline/reader/dmreader.md +++ b/docs/offline/reader/dmreader.md @@ -10,35 +10,80 @@ ## 三、参数说明 +- **connection** + - 描述:数据库连接参数,包含jdbcUrl、schema、table等参数 + - 必选:是 + - 字段类型:List + - 示例:指定jdbcUrl、schema、table + ```json + "connection": [{ + "jdbcUrl": ["jdbc:dm://localhost:5236"], + "table": ["table"], + "schema":"public" + }] + ``` + - 默认值:无 + +
+ - **jdbcUrl** - 描述:针对关系型数据库的jdbc连接字符串 -
jdbcUrl参考文档:[达梦官方文档](http://www.dameng.com/down.aspx?TypeId=12&FId=t14:12:14) +
jdbcUrl参考文档:[达梦官方文档](http://www.dameng.com/down.aspx?TypeId=12&FId=t14:12:14) - 必选:是 + - 字段类型:List + - 默认值:无 + +
+ +- **schema** + - 描述:数据库schema名 + - 必选:否 + - 字段类型:String - 默认值:无 +
+
+- **table**
+  - 描述:要读取的表名称。目前只支持配置单个表,后续会支持多表
+  - 必选:是
+  - 字段类型:List
+  - 默认值:无
+
- **username** - 描述:数据源的用户名 - 必选:是 + - 字段类型:String - 默认值:无 - +
- **password** - 描述:数据源指定用户名的密码 - 必选:是 + - 字段类型:String - 默认值:无 +
+ +- **fetchSize** + - 描述:一次性从数据库中读取多少条数据,jdbc默认一次将所有结果都读取到内存中,在数据量很大时可能会造成OOM,设置这个参数可以控制每次读取fetchSize条数据。 + - 注意:此参数的值不可设置过大,否则会读取超时,导致任务失败。 + - 必选:否 + - 字段类型:int + - 默认值:1000 +
- **where** - 描述:筛选条件,reader插件根据指定的column、table、where条件拼接SQL,并根据这个SQL进行数据抽取。在实际业务场景中,往往会选择当天的数据进行同步,可以将where条件指定为gmt_create > time。 - 注意:不可以将where条件指定为limit 10,limit不是SQL的合法where子句。 - 必选:否 + - 字段类型:String - 默认值:无 - +
- **splitPk** - 描述:当speed配置中的channel大于1时指定此参数,Reader插件根据并发数和此参数指定的字段拼接sql,使每个并发读取不同的数据,提升读取速率。 @@ -47,17 +92,19 @@ - 目前splitPk仅支持整形数据切分,`不支持浮点、字符串、日期等其他类型`。如果用户指定其他非支持类型,FlinkX将报错! - 如果channel大于1但是没有配置此参数,任务将置为失败。 - 必选:否 + - 字段类型:String - 默认值:无 - +
- **queryTimeOut** - 描述:查询超时时间,单位秒。 - 注意:当数据量很大,或者从视图查询,或者自定义sql查询时,可通过此参数指定超时时间。 - 必选:否 + - 字段类型:int - 默认值:3000 - +
- **customSql** - 描述:自定义的查询语句,如果只指定字段不能满足需求时,可通过此参数指定查询的sql,可以是任意复杂的查询语句。 @@ -67,9 +114,10 @@ - 当指定了此参数时,connection里指定的table无效; - 当指定此参数时,column必须指定具体字段信息,不能以*号代替; - 必选:否 + - 字段类型:String - 默认值:无 - +
- **column** - 描述:需要读取的字段。 @@ -98,27 +146,55 @@ - format:如果字段是时间字符串,可以指定时间的格式,将字段类型转为日期格式返回 - value:如果数据库里不存在指定的字段,则会把value的值作为常量列返回,如果指定的字段存在,当指定字段的值为null时,会以此value值作为默认值返回 - 必选:是 + - 字段类型:List - 默认值:无 - +
- **polling** - 描述:是否开启间隔轮询,开启后会根据`pollingInterval`轮询间隔时间周期性的从数据库拉取数据。开启间隔轮询还需配置参数`pollingInterval`,`increColumn`,可以选择配置参数`startLocation`。若不配置参数`startLocation`,任务启动时将会从数据库中查询增量字段最大值作为轮询的开始位置。 - 必选:否 + - 字段类型:Boolean - 默认值:false +
+- **increColumn**
+  - 描述:增量字段,可以是对应的增量字段名,也可以是纯数字,表示增量字段在column中的顺序位置(从0开始)
+  - 必选:否
+  - 字段类型:String或int
+  - 默认值:无
+
- **pollingInterval** - 描述:轮询间隔时间,从数据库中拉取数据的间隔时间,默认为5000毫秒。 - 必选:否 + - 字段类型:long - 默认值:5000 +
+ +- **startLocation** + - 描述:增量查询起始位置 + - 必选:否 + - 字段类型:String + - 默认值:无 + +
+ +- **useMaxFunc** + - 描述:用于标记是否保存endLocation位置的一条或多条数据,true:不保存,false(默认):保存, 某些情况下可能出现最后几条数据被重复记录的情况,可以将此参数配置为true + - 必选:否 + - 字段类型:Boolean + - 默认值:false +
- **requestAccumulatorInterval** - 描述:发送查询累加器请求的间隔时间。 - 必选:否 + - 字段类型:int - 默认值:2 ## 四、配置示例 @@ -144,16 +220,12 @@ ], "increColumn": "", "startLocation": "", - "username": "SYSDBA", - "password": "SYSDBA", + "username": "username", + "password": "password", "connection": [ { - "jdbcUrl": [ - "jdbc:dm://localhost:5236" - ], - "table": [ - "PERSON.STUDENT" - ] + "jdbcUrl": ["jdbc:dm://localhost:5236"], + "table": ["TABLE"] } ], "where": "" @@ -208,16 +280,12 @@ "splitPk": "ID", "increColumn": "", "startLocation": "", - "username": "SYSDBA", - "password": "SYSDBA", + "username": "username", + "password": "password", "connection": [ { - "jdbcUrl": [ - "jdbc:dm://localhost:5236" - ], - "table": [ - "PERSON.STUDENT" - ] + "jdbcUrl": ["jdbc:dm://localhost:5236"], + "table": ["TABLE"] } ], "where": "" @@ -271,17 +339,13 @@ ], "increColumn": "", "startLocation": "", - "customSql": "SELECT * FROM PERSON.STUDENT WHERE ID>30", - "username": "SYSDBA", - "password": "SYSDBA", + "customSql": "SELECT * FROM TABLE WHERE ID>30", + "username": "username", + "password": "password", "connection": [ { - "jdbcUrl": [ - "jdbc:dm://localhost:5236" - ], - "table": [ - "PERSON.STUDENT" - ] + "jdbcUrl": ["jdbc:dm://localhost:5236"], + "table": ["TABLE"] } ], "where": "" @@ -335,16 +399,12 @@ ], "increColumn": "ID", "startLocation": "20", - "username": "SYSDBA", - "password": "SYSDBA", + "username": "username", + "password": "password", "connection": [ { - "jdbcUrl": [ - "jdbc:dm://localhost:5236" - ], - "table": [ - "PERSON.STUDENT" - ] + "jdbcUrl": ["jdbc:dm://localhost:5236"], + "table": ["TABLE"] } ], "where": "" @@ -398,18 +458,14 @@ ], "increColumn": "", "startLocation": "", - "username": "SYSDBA", - "password": "SYSDBA", + "username": "username", + "password": "password", "polling": true, "pollingInterval": 3000, "connection": [ { - "jdbcUrl": [ - "jdbc:dm://localhost:5236" - ], - "table": [ - "PERSON.STUDENT" - ] + "jdbcUrl": ["jdbc:dm://localhost:5236"], + "table": ["TABLE"] } ], "where": "" diff --git a/docs/offline/reader/esreader.md b/docs/offline/reader/esreader.md index c9916083d5..bd36c06a9d 100644 --- a/docs/offline/reader/esreader.md +++ b/docs/offline/reader/esreader.md @@ -12,6 +12,7 @@ - **address** - 描述:Elasticsearch地址,单个节点地址采用host:port形式,多个节点的地址用逗号连接 - 必选:是 + - 字段类型:String - 默认值:无 @@ -19,6 +20,7 @@ - **username** - 描述:Elasticsearch认证用户名 - 必选:否 + - 字段类型:String - 默认值:无 @@ -26,6 +28,7 @@ - **password** - 描述:Elasticsearch认证密码 - 必选:否 + - 字段类型:String - 默认值:无 @@ -33,6 +36,7 @@ - **query** - 描述:Elasticsearch查询表达式,[查询表达式](https://www.elastic.co/guide/cn/elasticsearch/guide/current/query-dsl-intro.html) - 必选:否 + - 字段类型:json结构体 - 默认值:无,默认为全查询 @@ -40,6 +44,7 @@ - **batchSize** - 描述:每次读取数据条数 - 必选:否 + - 字段类型:int - 默认值:10 @@ -47,29 +52,46 @@ - **timeout** - 描述:连接超时时间 - 必选:否 + - 字段类型:int - 默认值:无 - **index** - - 描述:要查询的索引名称 - - 必选:否 + - 描述:要查询的索引名称,支持String和String[]两种类型 + - 必选:是 + - 字段类型:可以为String或者String[] - 默认值:无 - **type** - - 描述:要查询的类型 - - 必选:否 + - 描述:要查询的类型,支持String和String[]两种类型 + - 必选:是 + - 字段类型:可以为String或者String[] - 默认值:无 - **column** - 描述:读取elasticsearch的查询结果的若干个列,每列形式如下 - - name:字段名称,可使用多级格式查找 + - name:字段名称,可使用多级格式查找,多级查询时采用'.'作为间隔 - type:字段类型,当name没有指定时,则返回常量列,值为value指定 - - value:常量列的值 + - value:常量列的值 + 示例: + ```json + "column": [ + { + "name": "id", + "type": "integer" + },{ + "name": "user_id", + "type": "integer" + },{ + "name": "name", + "type": "string" + } +``` - 必选:是 - 默认值:无 @@ -84,7 +106,7 @@ "reader": { "name": "esreader", "parameter": { - "address": "kudu4:9200", + "address": "localhost:9200", 
"query": { "match_all": {} }, diff --git a/docs/offline/reader/ftpreader.md b/docs/offline/reader/ftpreader.md index e687c6e84d..3bb168fc71 100644 --- a/docs/offline/reader/ftpreader.md +++ b/docs/offline/reader/ftpreader.md @@ -1,10 +1,9 @@ # FTP Reader - ## 一、插件名称 -名称:**ftpreader**
+名称:**ftpreader** + - ## 二、数据源版本 | 协议 | 是否支持 | | --- | --- | @@ -13,94 +12,102 @@ - ## 三、数据源配置 -FTP服务搭建
windows:[地址](https://help.aliyun.com/document_detail/92046.html?spm=a2c4g.11186623.6.1185.6371dcd5DOfc5z)
linux:[地址](https://help.aliyun.com/document_detail/92048.html?spm=a2c4g.11186623.6.1184.7a9a2dbcRLDNlf)
sftp服务搭建
windows:[地址](http://www.freesshd.com/)
linux:[地址](https://yq.aliyun.com/articles/435356?spm=a2c4e.11163080.searchblog.102.576f2ec1BVgWY7)
+FTP服务搭建 +windows:[地址](https://help.aliyun.com/document_detail/92046.html?spm=a2c4g.11186623.6.1185.6371dcd5DOfc5z) +linux:[地址](https://help.aliyun.com/document_detail/92048.html?spm=a2c4g.11186623.6.1184.7a9a2dbcRLDNlf) +sftp服务搭建 +windows:[地址](http://www.freesshd.com/) +linux:[地址](https://yq.aliyun.com/articles/435356?spm=a2c4e.11163080.searchblog.102.576f2ec1BVgWY7) + - ## 四、参数说明 - **protocol** - 描述:ftp服务器协议,目前支持传输协议有`ftp`、`sftp` - 必选:是 + - 字段类型:string - 默认值:无 - +
- **host** - 描述:ftp服务器地址 - 必选:是 + - 字段类型:string - 默认值:无 - +
- **port** - 描述:ftp服务器端口 - 必选:否 + - 字段类型:int - 默认值:若传输协议是sftp协议,默认值是22;若传输协议是标准ftp协议,默认值是21 - - -- **connectPattern** - - 描述:协议为ftp时的连接模式,可选`pasv`,`port`,参数含义可参考:[模式说明](https://blog.csdn.net/qq_16038125/article/details/72851142) - - 必选:否 - - 默认值:`PASV` - - +
- **username** - 描述:ftp服务器访问用户名 - 必选:是 + - 字段类型:string - 默认值:无 - +
- **password** - 描述:ftp服务器访问密码 - 必选:否 + - 字段类型:string - 默认值:无 - +
- **path** - - 描述:远程FTP文件系统的路径信息,注意这里可以支持填写多个路径 + - 描述:远程FTP文件系统的路径信息,注意这里可以支持填写多个路径,多个路径以`,`隔开 - 必选:是 + - 字段类型:string - 默认值:无 - +
- **fieldDelimiter** - 描述:读取的字段分隔符 - 必选:是 + - 字段类型:string - 默认值:`,` - +
- **encoding** - 描述:读取文件的编码配置 - 必选:否 + - 字段类型:string - 默认值:`UTF-8` - +
- **isFirstLineHeader** - 描述:首行是否为标题行,如果是则不读取第一行 - 必选:否 + - 字段类型:boolean - 默认值:false - +
- **timeout** - 描述:连接超时时间,单位毫秒 - 必选:否 + - 字段类型:long - 默认值:5000 - +
- **column** - 描述:需要读取的字段 - - 格式:支持2中格式 -
1.读取全部字段,如果字段数量很多,可以使用下面的写法: + - 格式:支持2种格式 + +1.读取全部字段,如果字段数量很多,可以使用下面的写法: ``` "column":["*"] ``` @@ -115,20 +122,34 @@ FTP服务搭建
windows:[地址](https://help.aliyun.com/document_detail/ }] ``` - - 属性说明: - - index:字段索引 - - type:字段类型,ftp读取的为文本文件,本质上都是字符串类型,这里可以指定要转成的类型 - - format:如果字段是时间字符串,可以指定时间的格式,将字段类型转为日期格式返回 - - value:如果没有指定index,则会把value的值作为常量列返回,如果指定了index,当读取的字段的值为null时,会以此value值作为默认值返回 - - 必选:是 - - 默认值:无 +- 属性说明: + - index:字段索引 + - type:字段类型,ftp读取的为文本文件,本质上都是字符串类型,这里可以指定要转成的类型 + - format:如果字段是时间字符串,可以指定时间的格式,将字段类型转为日期格式返回 + - value:如果没有指定index,则会把value的值作为常量列返回,如果指定了index,当读取的字段的值为null时,会以此value值作为默认值返回 +- 必选:是 +- 字段类型:数组 +- 默认值:无 + +
+- **connectPattern** + - 描述:协议为ftp时的连接模式,可选`pasv`,`port`,参数含义可参考:[模式说明](https://blog.csdn.net/qq_16038125/article/details/72851142) + - 必选:否 + - 字段类型:string + - 默认值:`PASV` +
+ +- **privateKeyPath** + - 描述:私钥文件路径 + - 必选:否 + - 字段类型:string + - 默认值:无 - + ## 五、使用示例 - -#### 1、读取单个文件 +#### 1、sftp读取单个文件 ```json { "job": { @@ -190,8 +211,7 @@ FTP服务搭建
windows:[地址](https://help.aliyun.com/document_detail/ } } ``` - -#### 2、读取单个目录下的所有文件 +#### 2、sftp读取单个目录下的所有文件 ```json { "job": { @@ -236,25 +256,14 @@ FTP服务搭建
windows:[地址](https://help.aliyun.com/document_detail/ } ], "setting": { - "restore": { - "maxRowNumForCheckpoint": 0, - "isRestore": false, - "restoreColumnName": "", - "restoreColumnIndex": 0 - }, - "errorLimit": { - "record": 100 - }, "speed": { - "bytes": 0, "channel": 1 } } } } ``` - -#### 3、读取多个路径下的文件 +#### 3、sftp读取多个路径下的文件 ```json { "job": { @@ -266,7 +275,7 @@ FTP服务搭建
windows:[地址](https://help.aliyun.com/document_detail/ "protocol": "sftp", "port": 22, "isFirstLineHeader": true, - "host": "localhost", + "host": "host", "column": [ { "index": 0, @@ -299,22 +308,59 @@ FTP服务搭建
windows:[地址](https://help.aliyun.com/document_detail/ } ], "setting": { - "restore": { - "maxRowNumForCheckpoint": 0, - "isRestore": false, - "restoreColumnName": "", - "restoreColumnIndex": 0 - }, - "errorLimit": { - "record": 100 - }, "speed": { - "bytes": 0, "channel": 1 } } } } ``` - - +#### 4、ftp读取单个文件 +```json +{ + "job": { + "content": [ + { + "reader": { + "parameter": { + "isFirstLineHeader": false, + "column": [ + { + "index": 0, + "type": "STRING", + "key": 0 + }, + { + "index": 1, + "type": "STRING", + "key": 1 + } + ], + "fieldDelimiter": ",", + "encoding": "utf-8", + "path": "/data/a.csv", + "protocol": "ftp", + "password": "passwd", + "connectMode": "PORT", + "port": 21, + "host": "host", + "username": "usname" + }, + "name": "ftpreader" + }, + "writer": { + "parameter": { + "print": true + }, + "name": "streamwriter" + } + } + ], + "setting": { + "speed": { + "channel": 1 + } + } + } +} +``` diff --git a/docs/offline/reader/gbasereader.md b/docs/offline/reader/gbasereader.md index 605a762f0e..efd08e70bd 100644 --- a/docs/offline/reader/gbasereader.md +++ b/docs/offline/reader/gbasereader.md @@ -10,34 +10,70 @@ ## 三、参数说明 +- **connection** + - 描述:数据库连接参数,包含jdbcUrl、schema、table等参数 + - 必选:是 + - 字段类型:List + - 示例:指定jdbcUrl、schema、table + ```json + "connection": [{ + "jdbcUrl": ["jdbc:gbase://0.0.0.1:5258/database"], + "table": ["table"], + "schema":"public" + }] + ``` + - 默认值:无 + +
+ - **jdbcUrl** - 描述:针对关系型数据库的jdbc连接字符串,需要注意gbase有A、S、T三种发行版,jdbcUrl端口和驱动都有区别。
jdbcUrl参考文档:[gbase官方文档](https://help.finereport.com/doc-view-2569.html) - 必选:是 + - 字段类型:List + - 默认值:无 + +
+ +- **schema** + - 描述:数据库schema名 + - 必选:否 + - 字段类型:String - 默认值:无 +
+
+- **table**
+  - 描述:要读取的表名称。目前只支持配置单个表,后续会支持多表
+  - 必选:是
+  - 字段类型:List
+  - 默认值:无
+
- **username** - 描述:数据源的用户名 - 必选:是 + - 字段类型:String - 默认值:无 - +
- **password** - 描述:数据源指定用户名的密码 - 必选:是 + - 字段类型:String - 默认值:无 - +
- **where** - 描述:筛选条件,reader插件根据指定的column、table、where条件拼接SQL,并根据这个SQL进行数据抽取。在实际业务场景中,往往会选择当天的数据进行同步,可以将where条件指定为gmt_create > time。 - 注意:不可以将where条件指定为limit 10,limit不是SQL的合法where子句。 - 必选:否 + - 字段类型:String - 默认值:无 - +
- **splitPk** - 描述:当speed配置中的channel大于1时指定此参数,Reader插件根据并发数和此参数指定的字段拼接sql,使每个并发读取不同的数据,提升读取速率。 @@ -46,25 +82,28 @@ - 目前splitPk仅支持整形数据切分,`不支持浮点、字符串、日期等其他类型`。如果用户指定其他非支持类型,FlinkX将报错! - 如果channel大于1但是没有配置此参数,任务将置为失败。 - 必选:否 + - 字段类型:String - 默认值:无 - +
- **fetchSize** - - 描述:读取时每批次读取的数据条数。 + - 描述:一次性从数据库中读取多少条数据,jdbc默认一次将所有结果都读取到内存中,在数据量很大时可能会造成OOM,设置这个参数可以控制每次读取fetchSize条数据。 - 注意:此参数的值不可设置过大,否则会读取超时,导致任务失败。 - 必选:否 - - 默认值:1000 - + - 字段类型:int + - 默认值:Integer.MIN_VALUE +
- **queryTimeOut** - 描述:查询超时时间,单位秒。 - 注意:当数据量很大,或者从视图查询,或者自定义sql查询时,可通过此参数指定超时时间。 - 必选:否 + - 字段类型:int - 默认值:1000 - +
- **customSql** - 描述:自定义的查询语句,如果只指定字段不能满足需求时,可通过此参数指定查询的sql,可以是任意复杂的查询语句。 @@ -74,9 +113,10 @@ - 当指定了此参数时,connection里指定的table无效; - 当指定此参数时,column必须指定具体字段信息,不能以*号代替; - 必选:否 + - 字段类型:String - 默认值:无 - +
- **column** - 描述:需要读取的字段。 @@ -105,27 +145,55 @@ - format:如果字段是时间字符串,可以指定时间的格式,将字段类型转为日期格式返回 - value:如果数据库里不存在指定的字段,则会把value的值作为常量列返回,如果指定的字段存在,当指定字段的值为null时,会以此value值作为默认值返回 - 必选:是 + - 字段类型:List - 默认值:无 - +
- **polling** - 描述:是否开启间隔轮询,开启后会根据`pollingInterval`轮询间隔时间周期性的从数据库拉取数据。开启间隔轮询还需配置参数`pollingInterval`,`increColumn`,可以选择配置参数`startLocation`。若不配置参数`startLocation`,任务启动时将会从数据库中查询增量字段最大值作为轮询的开始位置。 - 必选:否 + - 字段类型:Boolean - 默认值:false +
-
+- **increColumn**
+  - 描述:增量字段,可以是对应的增量字段名,也可以是纯数字,表示增量字段在column中的顺序位置(从0开始)
+  - 必选:否
+  - 字段类型:String或int
+  - 默认值:无
+
+ - **pollingInterval** - 描述:轮询间隔时间,从数据库中拉取数据的间隔时间,默认为5000毫秒。 - 必选:否 + - 字段类型:long - 默认值:5000 +
+ +- **startLocation** + - 描述:增量查询起始位置 + - 必选:否 + - 字段类型:String + - 默认值:无 + +
+ +- **useMaxFunc** + - 描述:用于标记是否保存endLocation位置的一条或多条数据,true:不保存,false(默认):保存, 某些情况下可能出现最后几条数据被重复记录的情况,可以将此参数配置为true + - 必选:否 + - 字段类型:Boolean + - 默认值:false +
- **requestAccumulatorInterval** - 描述:发送查询累加器请求的间隔时间。 - 必选:否 + - 字段类型:int - 默认值:2 ## 四、配置示例 @@ -139,21 +207,18 @@ "parameter" : { "column" : [ { "name" : "id", - "type" : "bigint", - "key" : "id" + "type" : "bigint" }, { "name" : "user_id", - "type" : "bigint", - "key" : "user_id" + "type" : "bigint" }, { "name" : "name", - "type" : "varchar", - "key" : "name" + "type" : "varchar" } ], "username" : "username", "password" : "password", "connection" : [ { - "jdbcUrl" : [ "jdbc:gbase://0.0.0.1:5258/dtstack" ], + "jdbcUrl" : [ "jdbc:gbase://0.0.0.1:5258/database" ], "table" : [ "tableTest" ] } ], "where": "id > 1", @@ -165,10 +230,6 @@ }, "name" : "gbasereader" }, - "writer": { - - } - }], "writer": { "name": "streamwriter", "parameter": { @@ -198,21 +259,18 @@ "parameter" : { "column" : [ { "name" : "id", - "type" : "bigint", - "key" : "id" + "type" : "bigint" }, { "name" : "user_id", - "type" : "bigint", - "key" : "user_id" + "type" : "bigint" }, { "name" : "name", - "type" : "varchar", - "key" : "name" + "type" : "varchar" } ], "username" : "username", "password" : "password", "connection" : [ { - "jdbcUrl" : [ "jdbc:gbase://0.0.0.1:5258/dtstack" ], + "jdbcUrl" : [ "jdbc:gbase://0.0.0.1:5258/database" ], "table" : [ "tableTest" ] } ], "where": "id > 1", @@ -230,13 +288,6 @@ "print": true } } - }], - "writer": { - "name": "streamwriter", - "parameter": { - "print": true - } - } }], "setting": { "speed": { @@ -260,21 +311,18 @@ "parameter" : { "column" : [ { "name" : "id", - "type" : "bigint", - "key" : "id" + "type" : "bigint" }, { "name" : "user_id", - "type" : "bigint", - "key" : "user_id" + "type" : "bigint" }, { "name" : "name", - "type" : "varchar", - "key" : "name" + "type" : "varchar" } ], "username" : "username", "password" : "password", "connection" : [ { - "jdbcUrl" : [ "jdbc:gbase://0.0.0.1:5258/dtstack" ], + "jdbcUrl" : [ "jdbc:gbase://0.0.0.1:5258/database" ], "table" : [ "tableTest" ] } ], "where": "id > 1", @@ -292,13 +340,6 @@ "print": true } } - }], - "writer": { - "name": "streamwriter", - "parameter": { - "print": true - } - } }], "setting": { "speed": { @@ -322,21 +363,18 @@ "parameter" : { "column" : [ { "name" : "id", - "type" : "bigint", - "key" : "id" + "type" : "bigint" }, { "name" : "user_id", - "type" : "bigint", - "key" : "user_id" + "type" : "bigint" }, { "name" : "name", - "type" : "varchar", - "key" : "name" + "type" : "varchar" } ], "username" : "username", "password" : "password", "connection" : [ { - "jdbcUrl" : [ "jdbc:gbase://0.0.0.1:5258/dtstack" ], + "jdbcUrl" : [ "jdbc:gbase://0.0.0.1:5258/database" ], "table" : [ "tableTest" ] } ], "where": "id > 1", @@ -350,9 +388,6 @@ }, "name" : "gbasereader" }, - "writer": { - } - }], "writer": { "name": "streamwriter", "parameter": { @@ -382,21 +417,18 @@ "parameter" : { "column" : [ { "name" : "id", - "type" : "bigint", - "key" : "id" + "type" : "bigint" }, { "name" : "user_id", - "type" : "bigint", - "key" : "user_id" + "type" : "bigint" }, { "name" : "name", - "type" : "varchar", - "key" : "name" + "type" : "varchar" } ], "username" : "username", "password" : "password", "connection" : [ { - "jdbcUrl" : [ "jdbc:gbase://0.0.0.1:5258/dtstack" ], + "jdbcUrl" : [ "jdbc:gbase://0.0.0.1:5258/database" ], "table" : [ "tableTest" ] } ], "where": "id > 1", @@ -410,9 +442,6 @@ }, "name" : "gbasereader" }, - "writer": { - } - }], "writer": { "name": "streamwriter", "parameter": { diff --git a/docs/offline/reader/greenplumreader.md b/docs/offline/reader/greenplumreader.md index a8d89386e7..b8bdd841f2 100644 --- 
a/docs/offline/reader/greenplumreader.md +++ b/docs/offline/reader/greenplumreader.md @@ -10,34 +10,79 @@ ## 三、参数说明 +- **connection** + - 描述:数据库连接参数,包含jdbcUrl、schema、table等参数 + - 必选:是 + - 字段类型:List + - 示例:指定jdbcUrl、schema、table + ```json + "connection": [{ + "jdbcUrl": ["jdbc:pivotal:greenplum://localhost:5432;DatabaseName=database"], + "table": ["table"], + "schema":"public" + }] + ``` + - 默认值:无 + +
+ - **jdbcUrl** - 描述:针对关系型数据库的jdbc连接字符串
jdbcUrl参考文档:[greenplum官方文档](https://gpdb.docs.pivotal.io/590/datadirect/datadirect_jdbc.html) - 必选:是 + - 字段类型:List - 默认值:无 +
+ +- **schema** + - 描述:数据库schema名 + - 必选:否 + - 字段类型:String + - 默认值:无 + +
+
+- **table**
+  - 描述:要读取的表名称。目前只支持配置单个表,后续会支持多表
+  - 必选:是
+  - 字段类型:List
+  - 默认值:无
+
- **username** - 描述:数据源的用户名 - 必选:是 + - 字段类型:String - 默认值:无 - +
- **password** - 描述:数据源指定用户名的密码 - 必选:是 + - 字段类型:String - 默认值:无 +
+ +- **fetchSize** + - 描述:一次性从数据库中读取多少条数据,jdbc默认一次将所有结果都读取到内存中,在数据量很大时可能会造成OOM,设置这个参数可以控制每次读取fetchSize条数据。 + - 注意:此参数的值不可设置过大,否则会读取超时,导致任务失败。 + - 必选:否 + - 字段类型:int + - 默认值:0(不开启) +
- **where** - 描述:筛选条件,reader插件根据指定的column、table、where条件拼接SQL,并根据这个SQL进行数据抽取。在实际业务场景中,往往会选择当天的数据进行同步,可以将where条件指定为gmt_create > time。 - 注意:不可以将where条件指定为limit 10,limit不是SQL的合法where子句。 - 必选:否 + - 字段类型:String - 默认值:无 - +
- **splitPk** - 描述:当speed配置中的channel大于1时指定此参数,Reader插件根据并发数和此参数指定的字段拼接sql,使每个并发读取不同的数据,提升读取速率。 @@ -46,25 +91,28 @@ - 目前splitPk仅支持整形数据切分,`不支持浮点、字符串、日期等其他类型`。如果用户指定其他非支持类型,FlinkX将报错! - 如果channel大于1但是没有配置此参数,任务将置为失败。 - 必选:否 + - 字段类型:String - 默认值:无 - +
- **fetchSize** - 描述:读取时每批次读取的数据条数。 - 注意:此参数的值不可设置过大,否则会读取超时,导致任务失败。 - 必选:否 + - 字段类型:int - 默认值:1000 - +
- **queryTimeOut** - 描述:查询超时时间,单位秒。 - 注意:当数据量很大,或者从视图查询,或者自定义sql查询时,可通过此参数指定超时时间。 - 必选:否 + - 字段类型:int - 默认值:1000 - +
- **customSql** - 描述:自定义的查询语句,如果只指定字段不能满足需求时,可通过此参数指定查询的sql,可以是任意复杂的查询语句。 @@ -74,9 +122,10 @@ - 当指定了此参数时,connection里指定的table无效; - 当指定此参数时,column必须指定具体字段信息,不能以*号代替; - 必选:否 + - 字段类型:String - 默认值:无 - +
- **column** - 描述:需要读取的字段。 @@ -104,27 +153,55 @@ - format:如果字段是时间字符串,可以指定时间的格式,将字段类型转为日期格式返回 - value:如果数据库里不存在指定的字段,则会把value的值作为常量列返回,如果指定的字段存在,当指定字段的值为null时,会以此value值作为默认值返回 - 必选:是 + - 字段类型:List - 默认值:无 - +
- **polling** - 描述:是否开启间隔轮询,开启后会根据`pollingInterval`轮询间隔时间周期性的从数据库拉取数据。开启间隔轮询还需配置参数`pollingInterval`,`increColumn`,可以选择配置参数`startLocation`。若不配置参数`startLocation`,任务启动时将会从数据库中查询增量字段最大值作为轮询的开始位置。 - 必选:否 + - 字段类型:Boolean - 默认值:false +
+- **increColumn**
+  - 描述:增量字段,可以是对应的增量字段名,也可以是纯数字,表示增量字段在column中的顺序位置(从0开始)
+  - 必选:否
+  - 字段类型:String或int
+  - 默认值:无
+
- **pollingInterval** - 描述:轮询间隔时间,从数据库中拉取数据的间隔时间,默认为5000毫秒。 - 必选:否 + - 字段类型:long - 默认值:5000 +
+ +- **startLocation** + - 描述:增量查询起始位置 + - 必选:否 + - 字段类型:String + - 默认值:无 + +
+- **useMaxFunc** + - 描述:用于标记是否保存endLocation位置的一条或多条数据,true:不保存,false(默认):保存, 某些情况下可能出现最后几条数据被重复记录的情况,可以将此参数配置为true + - 必选:否 + - 字段类型:Boolean + - 默认值:false + +
- **requestAccumulatorInterval** - 描述:发送查询累加器请求的间隔时间。 - 必选:否 + - 字段类型:int - 默认值:2 ** @@ -139,11 +216,11 @@ "reader": { "parameter" : { "column" : [ {"name" : "id", "type": "int"}], - "username" : "gpadmin", - "password" : "gpadmin", + "username" : "username", + "password" : "password", "connection" : [ { - "jdbcUrl" : [ "jdbc:pivotal:greenplum://localhost:5432;DatabaseName=exampledb" ], - "table" : [ "performance" ] + "jdbcUrl" : [ "jdbc:pivotal:greenplum://localhost:5432;DatabaseName=database" ], + "table" : [ "table" ] } ], "where": "", "customSql": "", @@ -179,13 +256,14 @@ "reader": { "parameter" : { "column" : [ {"name" : "id", "type": "int"}], - "username" : "gpadmin", - "password" : "gpadmin", + "username" : "username", + "password" : "password", "connection" : [ { - "jdbcUrl" : [ "jdbc:pivotal:greenplum://localhost:5432;DatabaseName=exampledb" ], - "table" : [ "performance" ] + "jdbcUrl" : [ "jdbc:pivotal:greenplum://localhost:5432;DatabaseName=database" ], + "table" : [ "table" ] } ], "where": "", + "splitPk":"id", "customSql": "", "requestAccumulatorInterval": 2 }, @@ -219,14 +297,14 @@ "reader": { "parameter" : { "column" : [ {"name" : "id", "type": "int"}], - "username" : "gpadmin", - "password" : "gpadmin", + "username" : "username", + "password" : "password", "connection" : [ { - "jdbcUrl" : [ "jdbc:pivotal:greenplum://localhost:5432;DatabaseName=exampledb" ], - "table" : [ "performance" ] + "jdbcUrl" : [ "jdbc:pivotal:greenplum://localhost:5432;DatabaseName=database" ], + "table" : [ "table" ] } ], "where": "", - "customSql": "select id from performance", + "customSql": "select id from table", "requestAccumulatorInterval": 2 }, "name" : "greenplumreader" @@ -259,11 +337,11 @@ "reader": { "parameter" : { "column" : [ {"name" : "id", "type": "int"}], - "username" : "gpadmin", - "password" : "gpadmin", + "username" : "username", + "password" : "password", "connection" : [ { - "jdbcUrl" : [ "jdbc:pivotal:greenplum://localhost:5432;DatabaseName=exampledb" ], - "table" : [ "performance" ] + "jdbcUrl" : [ "jdbc:pivotal:greenplum://localhost:5432;DatabaseName=database" ], + "table" : [ "table" ] } ], "increColumn": "id", "startLocation": "20", @@ -301,10 +379,10 @@ "reader": { "parameter" : { "column" : [ {"name" : "id", "type": "int"}], - "username" : "gpadmin", - "password" : "gpadmin", + "username" : "username", + "password" : "password", "connection" : [ { - "jdbcUrl" : [ "jdbc:pivotal:greenplum://localhost:5432;DatabaseName=exampledb" ], + "jdbcUrl" : [ "jdbc:pivotal:greenplum://localhost:5432;DatabaseName=database" ], "table" : [ "performance" ] } ], "polling": true, diff --git a/docs/offline/reader/hbasereader.md b/docs/offline/reader/hbasereader.md index 5d4688aecc..d88dd4b133 100644 --- a/docs/offline/reader/hbasereader.md +++ b/docs/offline/reader/hbasereader.md @@ -5,82 +5,103 @@ 名称:**hbasereader** ## 二、支持的数据源版本 -**HBase 1.3及以上** +**HBase 1.2及以上** ## 三、参数说明 - **table** - 描述:hbase表名 - 必选:是 + - 字段类型:String - 默认值:无 - - +
- **hbaseConfig** - - 描述:hbase的连接配置,以json的形式组织 (见hbase-site.xml),key可以为以下七种: - -Kerberos;
hbase.security.authentication;
hbase.security.authorization;
hbase.master.kerberos.principal;
hbase.master.keytab.file;
hbase.regionserver.keytab.file;
hbase.regionserver.kerberos.principal - + - 描述:hbase的连接配置,以json的形式组织 (见hbase-site.xml) + - 基础配置 + ``` + "hbase.zookeeper.property.clientPort": "2181", + "hbase.rootdir": "hdfs://ns1/hbase", + "hbase.cluster.distributed": "true", + "hbase.zookeeper.quorum": "node01,node02,node03", + "zookeeper.znode.parent": "/hbase" + ``` + + - kerberos配置 + 在hbaseConfig中加入以下三条中的任一条即表明开启Kerberos配置: + ``` + "hbase.security.authentication" :"Kerberos", + "hbase.security.authorization" : "Kerberos", + "hbase.security.auth.enable" : true + ``` + 在开启kerberos后,需要根据自己的集群指定以下两个principal的value值 + ``` + "hbase.regionserver.kerberos.principal":"hbase/_HOST@DTSTACK.COM", + "hbase.master.kerberos.principal":"hbase/_HOST@DTSTACK.COM" + ``` + 还需要指定Kerberos相关文件的位置 + ``` + "principalFile": "path of keytab", + "java.security.krb5.conf": "path of krb5.conf" + ``` - 必选:是 + - 字段类型:Map - 默认值:无 - - +
+ - **range** - 描述:指定hbasereader读取的rowkey范围。 - startRowkey:指定开始rowkey; - endRowkey:指定结束rowkey; - - - - isBinaryRowkey:指定配置的startRowkey和endRowkey转换为byte[]时的方式,默认值为false,若为true,则调用Bytes.toBytesBinary(rowkey)方法进行转换;若为false:则调用Bytes.toBytes(rowkey),配置格式如下: -``` -"range": { - "startRowkey": "aaa", - "endRowkey": "ccc", - "isBinaryRowkey":false -} -``` - - + - isBinaryRowkey:指定配置的startRowkey和endRowkey转换为byte[]时的方式,默认值为false。 + 若为true,则调用Bytes.toBytesBinary(rowkey)方法进行转换;若为false:则调用Bytes.toBytes(rowkey) + 配置格式如下: + ``` + "range": { + "startRowkey": "aaa", + "endRowkey": "ccc", + "isBinaryRowkey":false + } + ``` - 注意:如果用户配置了 startRowkey 和 endRowkey,需要确保:startRowkey <= endRowkey - 必选:否 + - 字段类型:Map - 默认值:无 - - + +
- **encoding** - 描述:字符编码 - 必选:无 - - 默认值:无 - - + - 字段类型:String + - 默认值:UTF-8 + +
- **scanCacheSize**
-  - 描述:一次RPC请求批量读取的Results数量
+  - 描述:一次RPC请求批量读取的Results数量。cache值的设置需要做权衡:值越大,查询性能越高,但每次调用next()耗时也越长;一旦数据量超过客户端进程的最大堆内存,会报OutOfMemory异常;当传输rows数据到客户端花费时间过长时,则会抛出ScannerTimeoutException异常。
  - 必选:无
+  - 字段类型:String
  - 默认值:256
-
+
- -- **scanBatchSize** - - 描述:每一个result中的列的数量 - - 必选:无 - - 默认值:100 - -
+ +
- **column** - - 描述:要读取的hbase字段,normal 模式与multiVersionFixedColumn 模式下必填项。 - - name:指定读取的hbase列,除了rowkey外,必须为 列族:列名 的格式; + - 描述:要读取的hbase字段 + - name:指定读取的hbase列,除了rowkey外,必须为 列族:列名 的格式,注意rowkey区分大小写; - type:指定源数据的类型,format指定日期类型的格式,value指定当前类型为常量,不从hbase读取数据,而是根据value值自动生成对应的列。 - 必选:是 + - 字段类型:List - 默认值:无 - + +

## 四、配置示例 +未开启Kerberos的情况 ```json { "job": { @@ -148,5 +169,80 @@ Kerberos;
hbase.security.authentication;
hbase.security.authorizat } } ``` + +开启kerberos的情况 +```json +{ + "job": { + "content": [ + { + "reader": { + "name": "hbasereader", + "parameter": { + "hbaseConfig": { + "hbase.zookeeper.property.clientPort": "2181", + "hbase.rootdir": "hdfs://ns1/hbase", + "hbase.cluster.distributed": "true", + "hbase.zookeeper.quorum": "node01,node02,node03", + "zookeeper.znode.parent": "/hbase", + "hbase.security.auth.enable": true, + "hbase.regionserver.kerberos.principal":"hbase/host@DTSTACK.COM", + "hbase.master.kerberos.principal":"hbase/host@DTSTACK.COM", + "principalFile": "path of keytab", + "useLocalFile": "true", + "java.security.krb5.conf": "path of krb5.conf" + }, + "table": "sb5", + "encodig": "utf-8", + "column": [ + { + "name": "rowkey", + "type": "string" + }, + { + "name": "cf1:id", + "type": "string" + } + ], + "range": { + "startRowkey": "", + "endRowkey": "", + "isBinaryRowkey": false + } + } + }, + "writer": { + "parameter": { + "print": true + }, + "name": "streamwriter" + } + } + ], + "setting": { + "speed": { + "channel": 1, + "bytes": 0 + }, + "errorLimit": { + "record": 100 + }, + "restore": { + "maxRowNumForCheckpoint": 0, + "isRestore": false, + "isStream": false, + "restoreColumnName": "", + "restoreColumnIndex": 0 + }, + "log": { + "isLogger": false, + "level": "debug", + "path": "", + "pattern": "" + } + } + } +} +``` # diff --git a/docs/offline/reader/hdfsreader.md b/docs/offline/reader/hdfsreader.md index 5b4e92ffa6..88ae4b9772 100644 --- a/docs/offline/reader/hdfsreader.md +++ b/docs/offline/reader/hdfsreader.md @@ -1,10 +1,9 @@ # HDFS Reader - ## 一、插件名称 -名称:**hdfsreader**
+名称:**hdfsreader** + - ## 二、支持的数据源版本 | 协议 | 是否支持 | | --- | --- | @@ -13,40 +12,54 @@ - ## 三、数据源配置 -单机模式:[地址](http://hadoop.apache.org/docs/r2.7.6/hadoop-project-dist/hadoop-common/SingleCluster.html)
集群模式:[地址](http://hadoop.apache.org/docs/r2.7.6/hadoop-project-dist/hadoop-common/ClusterSetup.html)
+单机模式:[地址](http://hadoop.apache.org/docs/r2.7.6/hadoop-project-dist/hadoop-common/SingleCluster.html) +集群模式:[地址](http://hadoop.apache.org/docs/r2.7.6/hadoop-project-dist/hadoop-common/ClusterSetup.html) + - ## 四、参数说明 - **defaultFS** - 描述:Hadoop hdfs文件系统namenode节点地址。格式:hdfs://ip:端口;例如:hdfs://127.0.0.1:9 - 必选:是 + - 参数类型:string - 默认值:无 - +
- **hadoopConfig** - 描述:集群HA模式时需要填写的namespace配置及其它配置 - 必选:否 + - 参数类型:map - 默认值:无 - +
- **path** - 描述:数据文件的路径 + - 注意:真正读取的文件路径是 path+fileName - 必选:是 + - 参数类型:string - 默认值:无 +
+ +- **fileName** + - 描述:数据文件目录名称 + - 注意:不为空,则hdfs读取的路径为 path+filename + - 必选:否 + - 参数类型:string + - 默认值:无 +
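path 与 fileName 的拼接关系示意如下(路径均为假设值),实际读取路径为二者拼接结果,即 /user/hive/warehouse/dev.db/merge_text:
```json
"parameter": {
  "defaultFS": "hdfs://ns1",
  "path": "/user/hive/warehouse/dev.db",
  "fileName": "merge_text"
}
```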
- **filterRegex** - - 描述:文件过滤正则表达式 + - 描述:文件正则表达式,读取匹配到的文件 - 必选:否 + - 参数类型:string - 默认值:无 - +
- **fileType** - 描述:文件的类型,目前只支持用户配置为`text`、`orc`、`parquet` @@ -54,138 +67,142 @@ - orc:orcfile文件格式 - parquet:parquet文件格式 - 必选:否 + - 参数类型:string - 默认值:text - +
- **fieldDelimiter** - 描述:`fileType`为`text`时字段的分隔符 - 必选:否 + - 参数类型:string - 默认值:`\001` +
+ +- **column** + - 描述:需要读取的字段 + - 注意:不支持*格式 + - 格式: +```json +"column": [{ + "name": "col", + "type": "datetime", + "index":1, + "isPart":false, + "format": "yyyy-MM-dd hh:mm:ss", + "value": "value" +}] +``` +- 属性说明: + - name:必选,字段名称 + - type:必选,字段类型,可以和文件里的字段类型不一样,程序会做一次类型转换 + - index:非必选,字段在所有字段里的位置 从0开始计算,默认为-1 + - isPart:非必选,是否是分区字段,如果是分区字段,会自动从path上截取分区赋值,默认为fale + - format:非必选,如果字段是时间字符串,可以指定时间的格式,将字段类型转为日期格式返回 + - value:非必选,如果文件里不存在指定的字段,则会把value的值作为常量列返回 +- 必选:是 +- 参数类型:数组 +- 默认值:无 - ## 五、使用示例 - #### 1、读取text文件 ```json { - "job": { - "content": [ - { - "reader": { - "parameter": { - "path": "hdfs://ns1/flinkx/text", - "defaultFS": "hdfs://ns1", - "hadoopConfig": { - "dfs.ha.namenodes.ns1": "nn1,nn2", - "dfs.namenode.rpc-address.ns1.nn2": "flinkx02:9000", - "dfs.client.failover.proxy.provider.ns1": "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider", - "dfs.namenode.rpc-address.ns1.nn1": "flinkx01:9000", - "dfs.nameservices": "ns1" - }, - "column": [ - { - "name": "col1", - "index": 0, - "type": "string" - }, - { - "name": "col2", - "index": 1, - "type": "string" - }, - { - "name": "col3", - "index": 2, - "type": "int" - }, - { - "name": "col4", - "index": 3, - "type": "int" - } - ], - "fieldDelimiter": ",", - "fileType": "text" - }, - "name": "hdfsreader" - }, - "writer": { - "parameter": {}, - "name": "streamwriter" - } - } - ], - "setting": { - "speed": { - "bytes": 0, - "channel": 1 - } + "job": { + "content": [ + { "reader" : { + "parameter" : { + "path" : "/user/hive/warehouse/dev.db/merge_text", + "hadoopConfig" : { + "dfs.ha.namenodes.ns1" : "nn1,nn2", + "dfs.namenode.rpc-address.ns1.nn2" : "host1:9000", + "dfs.client.failover.proxy.provider.ns1" : "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider", + "dfs.namenode.rpc-address.ns1.nn1" : "host2:9000", + "dfs.nameservices" : "ns1" + }, + "column" : [ { + "name": "col1", + "index" : 0, + "type" : "STRING" + }, { + "name": "col2", + "index" : 1, + "type" : "STRING" + } ], + "defaultFS" : "hdfs://ns1", + "fieldDelimiter" : "\u0001", + "fileType" : "text" + }, + "name" : "hdfsreader" + }, + "writer": { + "name": "streamwriter", + "parameter": { + "print": true + } } + } + ], + "setting": { + "speed": { + "channel": 1 + }, + "restore": { + "isRestore": false + } } + } } ``` - #### 2、过滤文件名称 ```json { - "job": { - "content": [ - { - "reader": { - "parameter": { - "path": "hdfs://ns1/flinkx/text", - "filterRegex" : ".*\\.csv", - "defaultFS": "hdfs://ns1", - "hadoopConfig": { - "dfs.ha.namenodes.ns1": "nn1,nn2", - "dfs.namenode.rpc-address.ns1.nn2": "flinkx02:9000", - "dfs.client.failover.proxy.provider.ns1": "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider", - "dfs.namenode.rpc-address.ns1.nn1": "flinkx01:9000", - "dfs.nameservices": "ns1" - }, - "column": [ - { - "name": "col1", - "index": 0, - "type": "string" - }, - { - "name": "col2", - "index": 1, - "type": "string" - }, - { - "name": "col3", - "index": 2, - "type": "int" - }, - { - "name": "col4", - "index": 3, - "type": "int" - } - ], - "fieldDelimiter": ",", - "fileType": "text" - }, - "name": "hdfsreader" - }, - "writer": { - "parameter": {}, - "name": "streamwriter" - } - } - ], - "setting": { - "speed": { - "bytes": 1048576, - "channel": 1 - } + "job": { + "content": [ + { "reader" : { + "parameter" : { + "path" : "/user/hive/warehouse/dev.db/merge_orc", + "filterRegex" : "..*\\.snappy", + "hadoopConfig" : { + "dfs.ha.namenodes.ns1" : "nn1,nn2", + "dfs.namenode.rpc-address.ns1.nn2" : "host1:9000", 
+ "dfs.client.failover.proxy.provider.ns1" : "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider", + "dfs.namenode.rpc-address.ns1.nn1" : "host2:9000", + "dfs.nameservices" : "ns1" + }, + "column" : [ { + "name": "col1", + "index" : 0, + "type" : "STRING" + }, { + "name": "col2", + "index" : 1, + "type" : "STRING" + } ], + "defaultFS" : "hdfs://ns1", + "fileType" : "orc" + }, + "name" : "hdfsreader" + }, + "writer": { + "name": "streamwriter", + "parameter": { + "print": true + } } + } + ], + "setting": { + "speed": { + "channel": 1 + }, + "restore": { + "isRestore": false + } } + } } ``` diff --git a/docs/offline/reader/kingbasereader.md b/docs/offline/reader/kingbasereader.md index 2ed22425dd..44c517ae8e 100644 --- a/docs/offline/reader/kingbasereader.md +++ b/docs/offline/reader/kingbasereader.md @@ -7,15 +7,49 @@ **KingBase 8.2、8.3** ## 三、参数说明 -- jdbcUrl + +- **connection** + - 描述:数据库连接参数,包含jdbcUrl、schema、table等参数 + - 必选:是 + - 字段类型:List + - 示例:指定jdbcUrl、schema、table + ```json + "connection": [{ + "jdbcUrl": ["jdbc:kingbase8://localhost:54321/database"], + "table": ["table"], + "schema":"public" + }] + ``` + - 默认值:无 + +
+ +- **jdbcUrl** - 描述:针对KingBase数据库的jdbc连接字符串 - 必选:是 - - 字段类型:String + - 字段类型:List - 默认值:无
-- username +- **schema** + - 描述:数据库schema名 + - 必选:否 + - 字段类型:String + - 默认值:无 + +
+
+- **table**
+  - 描述:需要读取的表名。目前只支持配置单个表,后续会支持多表
+  - 必选:是
+  - 字段类型:List
+  - 默认值:无
+
+<br/>
+ +- **username** - 描述:数据源的用户名 - 必选:是 - 字段类型:String @@ -23,7 +57,7 @@
-- password +- **password** - 描述:数据源指定用户名的密码 - 必选:是 - 字段类型:String @@ -31,15 +65,16 @@
-- schema - - 描述:查询数据库所在schema - - 必选:是 - - 字段类型:String - - 默认值:无 +- **fetchSize** + - 描述:一次性从数据库中读取多少条数据,jdbc默认一次将所有结果都读取到内存中,在数据量很大时可能会造成OOM,设置这个参数可以控制每次读取fetchSize条数据。 + - 注意:此参数的值不可设置过大,否则会读取超时,导致任务失败。 + - 必选:否 + - 字段类型:int + - 默认值:1000
-- where +- **where** - 描述:筛选条件,reader插件根据指定的column、table、where条件拼接SQL,并根据这个SQL进行数据抽取。在实际业务场景中,往往会选择当天的数据进行同步,可以将where条件指定为gmt_create > time。 - 注意:不可以将where条件指定为limit 10,limit不是SQL的合法where子句。 - 必选:否 @@ -48,7 +83,7 @@
-- splitPk +- **splitPk** - 描述:当speed配置中的channel大于1时指定此参数,Reader插件根据并发数和此参数指定的字段拼接sql,使每个并发读取不同的数据,提升读取速率。 - 注意: 推荐splitPk使用表主键,因为表主键通常情况下比较均匀,因此切分出来的分片也不容易出现数据热点。 @@ -60,7 +95,7 @@
-- queryTimeOut +- **queryTimeOut** - 描述:查询超时时间,单位秒。 - 注意:当数据量很大,或者从视图查询,或者自定义sql查询时,可通过此参数指定超时时间。 - 必选:否 @@ -69,7 +104,7 @@
-- customSql +- **customSql** - 描述:自定义的查询语句,如果只指定字段不能满足需求时,可通过此参数指定查询的sql,可以是任意复杂的查询语句。 - 注意: 只能是查询语句,否则会导致任务失败; @@ -82,7 +117,7 @@
-- column +- **column** - 描述:需要读取的字段。 - 格式:支持3种格式 1.读取全部字段,如果字段数量很多,可以使用下面的写法: @@ -98,55 +133,80 @@ "value": "value" }] ``` -   属性说明: -   name:字段名称 -   type:字段类型,可以和数据库里的字段类型不一样,程序会做一次类型转换 -   format:如果字段是时间字符串,可以指定时间的格式,将字段类型转为日期格式返回 -   value:如果数据库里不存在指定的字段,则会把value的值作为常量列返回,如果指定的字段存在,当指定字段的值为null时,会以此value值作为默认值返回 + - 属性说明: + - name:字段名称 + - type:字段类型,可以和数据库里的字段类型不一样,程序会做一次类型转换 + - format:如果字段是时间字符串,可以指定时间的格式,将字段类型转为日期格式返回 + - value:如果数据库里不存在指定的字段,则会把value的值作为常量列返回,如果指定的字段存在,当指定字段的值为null时,会以此value值作为默认值返回 - 必选:是 - 字段类型:List - 默认值:无
-- polling +- **polling** - 描述:是否开启间隔轮询,开启后会根据pollingInterval轮询间隔时间周期性的从数据库拉取数据。开启间隔轮询还需配置参数pollingInterval,increColumn,可以选择配置参数startLocation。若不配置参数startLocation,任务启动时将会从数据库中查询增量字段最大值作为轮询的开始位置。 - 必选:否 + - 字段类型:Boolean - 默认值:false
+
+- **increColumn**
+  - 描述:增量字段,可以是对应的增量字段名,也可以是纯数字,表示增量字段在column中的顺序位置(从0开始)
+  - 必选:否
+  - 字段类型:String或int
+  - 默认值:无
+
+<br/>
-- pollingInterval +- **pollingInterval** - 描述:轮询间隔时间,从数据库中拉取数据的间隔时间,默认为5000毫秒。 - 必选:否 - - 字段类型:int + - 字段类型:long - 默认值:5000
-- requestAccumulatorInterval +- **startLocation** + - 描述:增量查询起始位置 + - 必选:否 + - 字段类型:String + - 默认值:无 + +
+ +- **useMaxFunc** + - 描述:用于标记是否保存endLocation位置的一条或多条数据,true:不保存,false(默认):保存, 某些情况下可能出现最后几条数据被重复记录的情况,可以将此参数配置为true + - 必选:否 + - 字段类型:Boolean + - 默认值:false + +
+ +- **requestAccumulatorInterval** - 描述:发送查询累加器请求的间隔时间。 - 必选:否 - 字段类型:int - 默认值:2
- + ## 四、配置示例 1、基础配置 -``` +```json { "job": { "content": [ { "reader": { "parameter": { - "username": "dtstack", - "password": "abc123", + "username": "username", + "password": "password", "connection": [{ - "jdbcUrl": ["jdbc:kingbase8://localhost:54321/test"], - "table": ["kudu"], - "schema":"test" + "jdbcUrl": ["jdbc:kingbase8://localhost:54321/database"], + "table": ["table"], + "schema":"schema" }], "column": ["*"], "customSql": "", @@ -190,19 +250,19 @@ } ``` 2、多通道 -``` +```json { "job": { "content": [ { "reader": { "parameter": { - "username": "dtstack", - "password": "abc123", + "username": "username", + "password": "password", "connection": [{ - "jdbcUrl": ["jdbc:kingbase8://localhost:54321/test"], - "table": ["kudu"], - "schema":"test" + "jdbcUrl": ["jdbc:kingbase8://localhost:54321/database"], + "table": ["table"], + "schema":"schema" }], "column": ["*"], "customSql": "", @@ -246,22 +306,22 @@ } ``` 3、指定customSql -``` +```json { "job": { "content": [ { "reader": { "parameter": { - "username": "dtstack", - "password": "abc123", + "username": "username", + "password": "password", "connection": [{ - "jdbcUrl": ["jdbc:kingbase8://localhost:54321/test"], - "table": ["kudu"], - "schema":"test" + "jdbcUrl": ["jdbc:kingbase8://localhost:54321/database"], + "table": ["table"], + "schema":"schema" }], "column": ["id","user_id","name"], - "customSql": "select * from kudu where id > 20", + "customSql": "select * from table where id > 20", "where": "id < 100", "splitPk": "", "queryTimeOut": 1000, @@ -302,19 +362,19 @@ } ``` 4、增量同步指定startLocation -``` +```json { "job": { "content": [ { "reader": { "parameter": { - "username": "dtstack", - "password": "abc123", + "username": "username", + "password": "password", "connection": [{ - "jdbcUrl": ["jdbc:kingbase8://localhost:54321/test"], - "table": ["kudu"], - "schema":"test" + "jdbcUrl": ["jdbc:kingbase8://localhost:54321/database"], + "table": ["table"], + "schema":"schema" }], "column": [{ "name": "id", @@ -369,19 +429,19 @@ } ``` 5、间隔轮询 -``` +```json { "job": { "content": [ { "reader": { "parameter": { - "username": "dtstack", - "password": "abc123", + "username": "username", + "password": "password", "connection": [{ - "jdbcUrl": ["jdbc:kingbase8://localhost:54321/test"], - "table": ["kudu"], - "schema":"test" + "jdbcUrl": ["jdbc:kingbase8://localhost:54321/database"], + "table": ["table"], + "schema":"schema" }], "column": [{ "name": "id", @@ -436,4 +496,4 @@ } } } -``` \ No newline at end of file +``` diff --git a/docs/offline/reader/kudureader.md b/docs/offline/reader/kudureader.md index bc8271299c..cf9ba2aae7 100644 --- a/docs/offline/reader/kudureader.md +++ b/docs/offline/reader/kudureader.md @@ -1,130 +1,144 @@ # Kudu Reader - ## 一、插件名称 -名称:**kudureader**
** - +名称:**kudureader** + ## 二、支持的数据源版本 -**kudu 1.10及以上**
+**kudu 1.10及以上**
+
 
-
 ## 三、参数说明
 
-
 - **column**
   - 描述:需要读取的字段
-  - 属性说明:
-    - name:字段名称;
-    - type:字段类型;
-  - 必选:是
-  - 默认值:无
+  - 格式:
+```json
+"column": [{
+    "name": "col",
+    "type": "string",
+    "value": "value"
+}]
+```
+- 属性说明:
+  - name:字段名称
+  - type:字段类型
+  - value:如果此字段不为空,会以此value值作为默认值返回
+- 必选:是
+- 字段类型:数组
+- 默认值:无
+
- **masterAddresses** - 描述: master节点地址:端口,多个以,隔开 - 必选:是 + - 字段类型:string - 默认值:无 - +
- **table** - 描述: kudu表名。 - 必选:是 + - 字段类型:string - 默认值:无 - +
- **readMode** - 描述: kudu读取模式: - - 1、read_latest
-默认的读取模式。
-该模式下,服务器将始终在收到请求时返回已提交的写操作。
-这种类型的读取不会返回快照时间戳,并且不可重复。
-用ACID术语表示,它对应于隔离模式:“读已提交”。 - - 2、read_at_snapshot
-该模式下,服务器将尝试在提供的时间戳上执行读取。
-如果未提供时间戳,则服务器将当前时间作为快照时间戳。
-在这种模式下,读取是可重复的,即将来所有在相同时间戳记下的读取将产生相同的数据。
-执行此操作的代价是等待时间戳小于快照的时间戳的正在进行的正在进行的事务,因此可能会导致延迟损失。用ACID术语,这本身就相当于隔离模式“可重复读取”。
-如果对已扫描tablet的所有写入均在外部保持一致,则这对应于隔离模式“严格可序列化”。
-注意:当前存在“空洞”,在罕见的边缘条件下会发生,通过这种空洞有时即使在采取措施使写入如此时,它们在外部也不一致。
-在这些情况下,隔离可能会退化为“读取已提交”模式。
+    - 1、read_latest
+      默认的读取模式。
+      该模式下,服务器将始终在收到请求时返回已提交的写操作。
+      这种类型的读取不会返回快照时间戳,并且不可重复。
+      用ACID术语表示,它对应于隔离模式:“读已提交”。
+    - 2、read_at_snapshot
+      该模式下,服务器将尝试在提供的时间戳上执行读取。
+      如果未提供时间戳,则服务器将当前时间作为快照时间戳。
+      在这种模式下,读取是可重复的,即将来所有在相同时间戳记下的读取将产生相同的数据。
+      执行此操作的代价是需要等待时间戳小于快照时间戳的正在进行的事务完成,因此可能会导致延迟损失。用ACID术语,这本身就相当于隔离模式“可重复读取”。
+      如果对已扫描tablet的所有写入均在外部保持一致,则这对应于隔离模式“严格可序列化”。
+      注意:当前存在“空洞”,在罕见的边缘条件下会发生;即使已采取措施使写入在外部保持一致,有时它们在外部仍不一致。
+      在这些情况下,隔离可能会退化为“读取已提交”模式。
   - 必选:是
+  - 字段类型:string
   - 默认值:无
-
+
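+
+下面给出一个假设性的配置片段(仅为示意,省略其它必填参数),演示如何将读取模式切换为可重复读取:
+```json
+"readMode": "read_at_snapshot"
+```
+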
- **authentication** - - 描述:认证方式,如:Kerberos - - 必选:否 - - 默认值:无 - - - -- **principal** - - 描述: 用户名 - - 必选:否 - - 默认值:无 - - - -- **keytabFile** - - 描述:keytab文件路径 + - 描述:认证方式,kudu开启kerberos时需要配置authentication为Kerberos - 必选:否 + - 字段类型:string - 默认值:无 - +
 - **workerCount**
   - 描述:worker线程数
   - 必选:否
-  - 默认值:默认为cpu*2
+  - 字段类型:int
+  - 默认值:cpu核心数*2
+
- **bossCount** - 描述:boss线程数 - 必选:否 + - 字段类型:int - 默认值:1 - +
- **operationTimeout** - - 描述:普通操作超时时间 + - 描述:普通操作超时时间,单位毫秒 - 必选:否 + - 字段类型:long - 默认值:30000 - +
- **adminOperationTimeout** - - 描述: 管理员操作(建表,删表)超时时间 + - 描述: 管理员操作(建表,删表)超时时间,单位毫秒 - 必选:否 - - 默认值:30000 - + - 字段类型:long + - 默认值:15000 +
- **queryTimeout** - - 描述:连接scan token的超时时间 + - 描述:连接scan token的超时时间,单位毫秒 - 必选:否 - - 默认值:与operationTimeout一致 - + - 字段类型:long + - 默认值:30000 +
- **where** - 描述:过滤条件字符串,多个以and连接 - 必选:否 + - 字段类型:string - 默认值:无 - +
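+
+示例:一个假设性的 where 配置(字段 id、name 仅作示意),多个条件以 and 连接:
+```json
+"where": "id >= 1 and name = 'flinkx'"
+```
+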
- **batchSizeBytes** - 描述: kudu scan一次性最大读取字节数 - 必选:否 + - 字段类型:int - 默认值:1048576 +
+ +- **hadoopConfig** + - 描述: kudu开启kerberos,需要配置kerberos相关参数 + - 必选:否 + - 字段类型:map + - 默认值:无 + - ## 四、配置示例 ```json { @@ -136,22 +150,27 @@ "column": [ { "name": "id", - "type": "long" + "type": "string" + }, { + "name": "name", + "type": "string" + }, { + "name": "age", + "type": "int" + }, { + "name": "sex", + "type": "int" } ], - "masterAddresses": "kudu1:7051,kudu2:7051,kudu3:7051", - "table": "kudu", + "masterAddresses": "host:7051", + "table": "table", "readMode": "read_latest", - "authentication": "", - "principal": "", - "keytabFile": "", "workerCount": 2, "bossCount": 1, "operationTimeout": 30000, "adminOperationTimeout": 30000, "queryTimeout": 30000, - "where": " id >= 1 ", - "batchSizeBytes": 1048576 + "where": " id >= 1 " } }, "writer" : { @@ -163,23 +182,10 @@ } ], "setting" : { "restore" : { - "maxRowNumForCheckpoint" : 0, - "isRestore" : false, - "restoreColumnName" : "", - "restoreColumnIndex" : 0 - }, - "errorLimit" : { - "record" : 100 + "isRestore" : false }, "speed" : { - "bytes" : 0, "channel" : 1 - }, - "log" : { - "isLogger": false, - "level" : "debug", - "path" : "", - "pattern":"" } } } diff --git a/docs/offline/reader/mongodbreader.md b/docs/offline/reader/mongodbreader.md index af8cb1a7ad..89c2aafbfa 100644 --- a/docs/offline/reader/mongodbreader.md +++ b/docs/offline/reader/mongodbreader.md @@ -12,60 +12,73 @@ - **url** - 描述:MongoDB数据库连接的URL字符串,详细请参考[MongoDB官方文档](https://docs.mongodb.com/manual/reference/connection-string/) - 必选:否 + - 字段类型:String - 默认值:无 +
- **hostPorts** - 描述:MongoDB的地址和端口,格式为 IP1:port,可填写多个地址,以英文逗号分隔 - 必选:否 + - 字段类型:String - 默认值:无 - +
- **username** - 描述:数据源的用户名 - 必选:否 + - 字段类型:String - 默认值:无 +
- **password** - 描述:数据源指定用户名的密码 - 必选:否 + - 字段类型:String - 默认值:无 +
- **database** - 描述:数据库名称 - 必选:否 + - 字段类型:String - 默认值:无 +
- **collectionName** - 描述:集合名称 - 必选:是 + - 字段类型:String - 默认值:无 - -
+
+ - **fetchSize** - 描述:每次读取的数据条数,通过调整此参数来优化读取速率 - 必选:否 + - 字段类型:int - 默认值:100 - +
 - **filter**
-  - 描述:过滤条件,通过该配置型来限制返回 MongoDB 数据范围,语法请参考[MongoDB查询语法](https://docs.mongodb.com/manual/crud/#read-operations)
+  - 描述:过滤条件,为json格式,通过该配置项来限制返回 MongoDB 数据范围,语法请参考[MongoDB查询语法](https://docs.mongodb.com/manual/crud/#read-operations)
   - 必选:否
+  - 字段类型:String
   - 默认值:无
-
+<br/>
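+
+示例:一个假设性的 filter 配置(字段 id 仅作示意),只读取 id 大于 50 的文档:
+```json
+"filter": "{\"id\": {\"$gt\": 50}}"
+```
+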
- **column** - 描述:需要读取的字段。 - - 格式:支持3中格式 + - 格式:支持三种格式
1.读取全部字段,如果字段数量很多,可以使用下面的写法: ```json {"column":["*"]} @@ -92,7 +105,9 @@ - value:如果数据库里不存在指定的字段,则会把value的值作为常量列返回,如果指定的字段存在,当指定字段的值为null时,会以此value值作为默认值返回 - splitter:因为 MongoDB 支持数组类型,所以 MongoDB 读出来的数组类型要通过这个分隔符合并成字符串 - 必选:是 + - 字段类型: List - 默认值:无 +
diff --git a/docs/offline/reader/mysqlreader.md b/docs/offline/reader/mysqlreader.md index 04ca11f99e..a9cc074c37 100644 --- a/docs/offline/reader/mysqlreader.md +++ b/docs/offline/reader/mysqlreader.md @@ -2,7 +2,7 @@ ## 一、插件名称 -名称:**mysqlreader**
** +名称:**mysqlreader**
## 二、支持的数据源版本 **MySQL 5.X**
@@ -10,35 +10,80 @@ ## 三、参数说明
+- **connection** + - 描述:数据库连接参数,包含jdbcUrl、schema、table等参数 + - 必选:是 + - 字段类型:List + - 示例:指定jdbcUrl、schema、table + ```json + "connection": [{ + "jdbcUrl": ["jdbc:mysql://0.0.0.1:3306/database?useUnicode=true&characterEncoding=utf8"], + "table": ["table"], + "schema":"public" + }] + ``` + - 默认值:无 + +
+ - **jdbcUrl** - 描述:针对关系型数据库的jdbc连接字符串
jdbcUrl参考文档:[MySQL官方文档](http://dev.mysql.com/doc/connector-j/en/connector-j-reference-configuration-properties.html) - 必选:是 + - 字段类型:List - 默认值:无 +
+ +- **schema** + - 描述:数据库schema名 + - 必选:否 + - 字段类型:String + - 默认值:无 +
+
+- **table**
+  - 描述:需要读取的表名。目前只支持配置单个表,后续会支持多表
+  - 必选:是
+  - 字段类型:List
+  - 默认值:无
+
+<br/>
- **username** - 描述:数据源的用户名 - 必选:是 + - 字段类型:String - 默认值:无 - +
- **password** - 描述:数据源指定用户名的密码 - 必选:是 + - 字段类型:String - 默认值:无 +
+- **fetchSize** + - 描述:一次性从数据库中读取多少条数据,jdbc默认一次将所有结果都读取到内存中,在数据量很大时可能会造成OOM,设置这个参数可以控制每次读取fetchSize条数据;开启fetchSize需要满足:数据库版本要高于5.0.2、连接参数useCursorFetch=true。 + - 注意:此参数的值不可设置过大,否则会读取超时,导致任务失败。 + - 必选:否 + - 字段类型:int + - 默认值:Integer.MIN_VALUE + +
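+
+示例:要开启fetchSize,可在jdbcUrl上追加连接参数useCursorFetch=true(示意片段,地址仅为占位):
+```json
+"jdbcUrl": ["jdbc:mysql://0.0.0.1:3306/database?useUnicode=true&characterEncoding=utf8&useCursorFetch=true"]
+```
+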
- **where** - 描述:筛选条件,reader插件根据指定的column、table、where条件拼接SQL,并根据这个SQL进行数据抽取。在实际业务场景中,往往会选择当天的数据进行同步,可以将where条件指定为gmt_create > time。 - 注意:不可以将where条件指定为limit 10,limit不是SQL的合法where子句。 - 必选:否 + - 字段类型:String - 默认值:无 - +
- **splitPk** - 描述:当speed配置中的channel大于1时指定此参数,Reader插件根据并发数和此参数指定的字段拼接sql,使每个并发读取不同的数据,提升读取速率。 @@ -47,17 +92,19 @@ - 目前splitPk仅支持整形数据切分,`不支持浮点、字符串、日期等其他类型`。如果用户指定其他非支持类型,FlinkX将报错! - 如果channel大于1但是没有配置此参数,任务将置为失败。 - 必选:否 + - 字段类型:String - 默认值:无 - +
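+
+示例:一个省略了其它必填参数的假设性片段(字段 id 仅作示意),splitPk配置在reader.parameter中,channel配置在setting.speed中,二者配合才能并发读取:
+```json
+"reader": {
+    "parameter": {
+        "splitPk": "id"
+    }
+},
+"setting": {
+    "speed": {
+        "channel": 3
+    }
+}
+```
+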
- **queryTimeOut** - 描述:查询超时时间,单位秒。 - 注意:当数据量很大,或者从视图查询,或者自定义sql查询时,可通过此参数指定超时时间。 - 必选:否 + - 字段类型:int - 默认值:1000 - +
- **customSql** - 描述:自定义的查询语句,如果只指定字段不能满足需求时,可通过此参数指定查询的sql,可以是任意复杂的查询语句。 @@ -67,9 +114,10 @@ - 当指定了此参数时,connection里指定的table无效; - 当指定此参数时,column必须指定具体字段信息,不能以*号代替; - 必选:否 + - 字段类型:String - 默认值:无 - +
- **column** - 描述:需要读取的字段。 @@ -98,27 +146,55 @@ - format:如果字段是时间字符串,可以指定时间的格式,将字段类型转为日期格式返回 - value:如果数据库里不存在指定的字段,则会把value的值作为常量列返回,如果指定的字段存在,当指定字段的值为null时,会以此value值作为默认值返回 - 必选:是 + - 字段类型:List - 默认值:无 - +
- **polling** - 描述:是否开启间隔轮询,开启后会根据`pollingInterval`轮询间隔时间周期性的从数据库拉取数据。开启间隔轮询还需配置参数`pollingInterval`,`increColumn`,可以选择配置参数`startLocation`。若不配置参数`startLocation`,任务启动时将会从数据库中查询增量字段最大值作为轮询的开始位置。 - 必选:否 + - 字段类型:Boolean - 默认值:false +
-
+- **increColumn**
+  - 描述:增量字段,可以是对应的增量字段名,也可以是纯数字,表示增量字段在column中的顺序位置(从0开始)
+  - 必选:否
+  - 字段类型:String或int
+  - 默认值:无
+
+<br/>
+ - **pollingInterval** - 描述:轮询间隔时间,从数据库中拉取数据的间隔时间,默认为5000毫秒。 - 必选:否 + - 字段类型:long - 默认值:5000 +
+ +- **startLocation** + - 描述:增量查询起始位置 + - 必选:否 + - 字段类型:String + - 默认值:无 + +
+ +- **useMaxFunc** + - 描述:用于标记是否保存endLocation位置的一条或多条数据,true:不保存,false(默认):保存, 某些情况下可能出现最后几条数据被重复记录的情况,可以将此参数配置为true + - 必选:否 + - 字段类型:Boolean + - 默认值:false +
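+
+示例:一个假设性的间隔轮询片段(增量字段 id 仅作示意,省略其它参数),演示polling、pollingInterval、increColumn、startLocation、useMaxFunc如何配合使用:
+```json
+"polling": true,
+"pollingInterval": 5000,
+"increColumn": "id",
+"startLocation": "0",
+"useMaxFunc": false
+```
+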
- **requestAccumulatorInterval** - 描述:发送查询累加器请求的间隔时间。 - 必选:否 + - 字段类型:int - 默认值:2 @@ -134,11 +210,11 @@ { "reader": { "parameter": { - "username": "dtstack", - "password": "abc123", + "username": "username", + "password": "password", "connection": [{ - "jdbcUrl": ["jdbc:mysql://kudu3:3306/tudou?useUnicode=true&characterEncoding=utf8"], - "table": ["kudu"] + "jdbcUrl": ["jdbc:mysql://0.0.0.1:3306/database?useUnicode=true&characterEncoding=utf8"], + "table": ["table"] }], "column": ["*"], "customSql": "", @@ -190,11 +266,11 @@ { "reader": { "parameter": { - "username": "dtstack", - "password": "abc123", + "username": "username", + "password": "password", "connection": [{ - "jdbcUrl": ["jdbc:mysql://kudu3:3306/tudou?useUnicode=true&characterEncoding=utf8"], - "table": ["kudu"] + "jdbcUrl": ["jdbc:mysql://0.0.0.1:3306/database?useUnicode=true&characterEncoding=utf8"], + "table": ["table"] }], "column": ["*"], "customSql": "", @@ -246,14 +322,14 @@ { "reader": { "parameter": { - "username": "dtstack", - "password": "abc123", + "username": "username", + "password": "password", "connection": [{ - "jdbcUrl": ["jdbc:mysql://kudu3:3306/tudou?useUnicode=true&characterEncoding=utf8"], - "table": ["kudu"] + "jdbcUrl": ["jdbc:mysql://0.0.0.1:3306/database?useUnicode=true&characterEncoding=utf8"], + "table": ["table"] }], "column": ["id","user_id","name"], - "customSql": "select * from kudu where id > 20", + "customSql": "select * from table where id > 20", "where": "id < 100", "splitPk": "", "queryTimeOut": 1000, @@ -302,11 +378,11 @@ { "reader": { "parameter": { - "username": "dtstack", - "password": "abc123", + "username": "username", + "password": "password", "connection": [{ - "jdbcUrl": ["jdbc:mysql://kudu3:3306/tudou?useUnicode=true&characterEncoding=utf8"], - "table": ["kudu"] + "jdbcUrl": ["jdbc:mysql://0.0.0.1:3306/database?useUnicode=true&characterEncoding=utf8"], + "table": ["table"] }], "column": [{ "name": "id", @@ -369,11 +445,11 @@ { "reader": { "parameter": { - "username": "dtstack", - "password": "abc123", + "username": "username", + "password": "password", "connection": [{ - "jdbcUrl": ["jdbc:mysql://kudu3:3306/tudou?useUnicode=true&characterEncoding=utf8"], - "table": ["kudu"] + "jdbcUrl": ["jdbc:mysql://0.0.0.1:3306/database?useUnicode=true&characterEncoding=utf8"], + "table": ["table"] }], "column": [{ "name": "id", diff --git a/docs/offline/reader/odpsreader.md b/docs/offline/reader/odpsreader.md index 57a0625031..f02f476d5d 100644 --- a/docs/offline/reader/odpsreader.md +++ b/docs/offline/reader/odpsreader.md @@ -1,62 +1,40 @@ # ODPS Reader - ## 一、插件名称 -名称:**odpsreader**
- -## 二、参数说明 - -- **accessId** - - 描述:ODPS系统登录ID - - 必选:是 - - 默认值:无 - - - -- **accessKey** - - 描述:ODPS系统登录Key - - 必选:是 - - 默认值:无 - - - -- **project** - - 描述:读取数据表所在的 ODPS 项目名称(大小写不敏感) - - 必选:是 - - 默认值:无 - +名称:**odpsreader** +## 二、参数说明 - **table** - 描述:读取数据表的表名称(大小写不敏感) - 必选:是 + - 字段类型:string - 默认值:无 - -
+
- **partition** - - 描述:读取数据所在的分区信息,支持linux shell通配符,包括 * 表示0个或多个字符,?代表任意一个字符。例如现在有分区表 test,其存在 pt=1,ds=hangzhou   pt=1,ds=shanghai   pt=2,ds=hangzhou   pt=2,ds=beijing 四个分区,如果你想读取 pt=1,ds=shanghai 这个分区的数据,那么你应该配置为: `"partition":["pt=1,ds=shanghai"]`; 如果你想读取 pt=1下的所有分区,那么你应该配置为: `"partition":["pt=1,ds=* "]`;如果你想读取整个 test 表的所有分区的数据,那么你应该配置为: `"partition":["pt=*,ds=*"]` - - 必选:如果表为分区表,则必填。如果表为非分区表,则不能填写 + - 描述:读取数据所在的分区信息,支持linux shell通配符,包括 * 表示0个或多个字符,?代表任意一个字符。例如现在有分区表 test,其存在 pt=1,ds=hangzhou   pt=1,ds=shanghai   pt=2,ds=hangzhou   pt=2,ds=beijing 四个分区,如果你想读取 pt=1,ds=shanghai 这个分区的数据,那么你应该配置为: `"partition":"pt=1,ds=shanghai"`; 如果你想读取 pt=1下的所有分区,那么你应该配置为: `"partition":"pt=1,ds=* "`;如果你想读取整个 test 表的所有分区的数据,那么你应该配置为: `"partition":"pt=*,ds=*"` + - 注意:如果表为分区表,则必填。如果表为非分区表,则不能填写 + - 必选:否 + - 字段类型:string - 默认值:无 - -
+
- **column** - 描述:需要读取的字段。 - - 格式:支持3中格式 -
1.读取全部字段,如果字段数量很多,可以使用下面的写法:
+  - 格式:支持3种格式
+
+1.读取全部字段,如果字段数量很多,可以使用下面的写法:
 ```
 "column":["*"]
 ```
-
-
2.只指定字段名称: +2.只指定字段名称: ``` "column":["id","name"] ``` - - -
3.指定具体信息: + 3.指定具体信息: ``` "column": [{ "name": "col", @@ -66,17 +44,60 @@ }] ``` - - 属性说明: - - name:字段名称 - - type:字段类型,可以和数据库里的字段类型不一样,程序会做一次类型转换 - - format:如果字段是时间字符串,可以指定时间的格式,将字段类型转为日期格式返回 - - value:如果数据库里不存在指定的字段,则会把value的值作为常量列返回,如果指定的字段存在,当指定字段的值为null时,会以此value值作为默认值返回 +- 属性说明: + - name:字段名称 + - type:字段类型,可以和数据库里的字段类型不一样,程序会做一次类型转换 + - format:如果字段是时间字符串,可以指定时间的格式,将字段类型转为日期格式返回 + - value:如果数据库里不存在指定的字段,则会把value的值作为常量列返回,如果指定的字段存在,当指定字段的值为null时,会以此value值作为默认值返回 +- 必选:是 +- 字段类型:数组 +- 默认值:无 + +
+ +- **odpsConfig** + - 描述:ODPS的配置信息 - 必选:是 + - 字段类型 map - 默认值:无 + - 可选配置: + - **odpsServer** + - 描述:odps服务地址 + - 必选:否 + - 字段类型 string + - 默认值:[http://service.odps.aliyun.com/api](http://service.odps.aliyun.com/api) + - **accessId** + - 描述:ODPS系统登录ID + - 必选:是 + - 字段类型 string + - 默认值:无 + - **accessKey** + - 描述:ODPS系统登录Key + - 必选:是 + - 字段类型 string + - 默认值:无 + - **project** + - 描述:读取数据表所在的 ODPS 项目名称(大小写不敏感) + - 必选:是 + - 字段类型 string + - 默认值:无 + - **packageAuthorizedProject** + - 描述:读取数据表所在的 ODPS 项目名称(大小写不敏感) + - 注意:当 **packageAuthorizedProject **不为空时,当前project取packageAuthorizedProject对应值 而不是 project 对应的值 + - 必选:否 + - 字段类型 string + - 默认值:无 + - **accountType** + - 描述:account类型 + - 注意:目前只支持 aliyun 类型 + - 必选:否 + - 字段类型 string + - 默认值:aliyun + + - ## 三、配置示例 ```json { @@ -110,23 +131,10 @@ } ], "setting" : { "restore" : { - "maxRowNumForCheckpoint" : 0, - "isRestore" : false, - "restoreColumnName" : "", - "restoreColumnIndex" : 0 - }, - "errorLimit" : { - "record" : 100 + "isRestore" : false }, "speed" : { - "bytes" : 0, "channel" : 1 - }, - "log" : { - "isLogger": false, - "level" : "debug", - "path" : "", - "pattern":"" } } } diff --git a/docs/offline/reader/oraclereader.md b/docs/offline/reader/oraclereader.md index df2f749531..7ade70a875 100644 --- a/docs/offline/reader/oraclereader.md +++ b/docs/offline/reader/oraclereader.md @@ -2,42 +2,78 @@ ## 一、插件名称 -名称:**oraclereader**
** +名称:**oraclereader**
## 二、支持的数据源版本 -**Oracle 9 及以上**
** +**Oracle 9 及以上**
## 三、参数说明
+- **connection** + - 描述:数据库连接参数,包含jdbcUrl、schema、table等参数 + - 必选:是 + - 字段类型:List + - 示例:指定jdbcUrl、schema、table + ```json + "connection": [{ + "jdbcUrl": ["jdbc:oracle:thin:@0.0.0.1:1521:oracle"], + "table": ["table"], + "schema":"public" + }] + ``` + - 默认值:无 + +
+ - **jdbcUrl** - 描述:针对关系型数据库的jdbc连接字符串
jdbcUrl参考文档:[Oracle官方文档](http://www.oracle.com/technetwork/database/enterprise-edition/documentation/index.html) - 必选:是 + - 字段类型:List + - 默认值:无 + +
+ +- **schema** + - 描述:数据库schema名 + - 必选:否 + - 字段类型:String - 默认值:无 +
+- **table**
+  - 描述:需要读取的表名。目前只支持配置单个表,后续会支持多表
+  - 必选:是
+  - 字段类型:List
+  - 默认值:无
+
+<br/>
- **username** - 描述:数据源的用户名 - 必选:是 + - 字段类型:String - 默认值:无 - +
- **password** - 描述:数据源指定用户名的密码 - 必选:是 + - 字段类型:String - 默认值:无 - +
- **where** - 描述:筛选条件,reader插件根据指定的column、table、where条件拼接SQL,并根据这个SQL进行数据抽取。在实际业务场景中,往往会选择当天的数据进行同步,可以将where条件指定为gmt_create > time。 - 注意:不可以将where条件指定为limit 10,limit不是SQL的合法where子句。 - 必选:否 + - 字段类型:String - 默认值:无 - +
- **splitPk** - 描述:当speed配置中的channel大于1时指定此参数,Reader插件根据并发数和此参数指定的字段拼接sql,使每个并发读取不同的数据,提升读取速率。 @@ -46,25 +82,28 @@ - 目前splitPk仅支持整形数据切分,`不支持浮点、字符串、日期等其他类型`。如果用户指定其他非支持类型,FlinkX将报错! - 如果channel大于1但是没有配置此参数,任务将置为失败。 - 必选:否 + - 字段类型:String - 默认值:无 - +
- **fetchSize** - - 描述:读取时每批次读取的数据条数。 + - 描述:一次性从数据库中读取多少条数据,jdbc默认一次将所有结果都读取到内存中,在数据量很大时可能会造成OOM,设置这个参数可以控制每次读取fetchSize条数据。 - 注意:此参数的值不可设置过大,否则会读取超时,导致任务失败。 - 必选:否 + - 字段类型:int - 默认值:1000 - +
- **queryTimeOut** - 描述:查询超时时间,单位秒。 - 注意:当数据量很大,或者从视图查询,或者自定义sql查询时,可通过此参数指定超时时间。 - 必选:否 + - 字段类型:int - 默认值:3000 - +
- **customSql** - 描述:自定义的查询语句,如果只指定字段不能满足需求时,可通过此参数指定查询的sql,可以是任意复杂的查询语句。 @@ -74,9 +113,10 @@ - 当指定了此参数时,connection里指定的table无效; - 当指定次参数时,column必须指定具体字段信息,不能以*号代替; - 必选:否 + - 字段类型:String - 默认值:无 - +
- **column** - 描述:需要读取的字段。 @@ -107,30 +147,58 @@ - format:如果字段是时间字符串,可以指定时间的格式,将字段类型转为日期格式返回 - value:如果数据库里不存在指定的字段,则会把value的值作为常量列返回,如果指定的字段存在,当指定字段的值为null时,会以此value值作为默认值返回 - 必选:是 + - 字段类型:List - 默认值:无 - +
- **polling** - 描述:是否开启间隔轮询,开启后会根据`pollingInterval`轮询间隔时间周期性的从数据库拉取数据。开启间隔轮询还需配置参数`pollingInterval`,`increColumn`,可以选择配置参数`startLocation`。若不配置参数`startLocation`,任务启动时将会从数据库中查询增量字段最大值作为轮询的开始位置。 - 必选:否 + - 字段类型:Boolean - 默认值:false - +
- **pollingInterval** - 描述:轮询间隔时间,从数据库中拉取数据的间隔时间,默认为5000毫秒。 - 必选:否 + - 字段类型:long - 默认值:5000 +
+- **increColumn**
+  - 描述:增量字段,可以是对应的增量字段名,也可以是纯数字,表示增量字段在column中的顺序位置(从0开始)
+  - 必选:否
+  - 字段类型:String或int
+  - 默认值:无
+
+ +- **startLocation** + - 描述:增量查询起始位置 + - 必选:否 + - 字段类型:String + - 默认值:无 + +
+ +- **useMaxFunc** + - 描述:用于标记是否保存endLocation位置的一条或多条数据,true:不保存,false(默认):保存, 某些情况下可能出现最后几条数据被重复记录的情况,可以将此参数配置为true + - 必选:否 + - 字段类型:Boolean + - 默认值:false + +
- **requestAccumulatorInterval** - 描述:发送查询累加器请求的间隔时间。 - 必选:否 + - 字段类型:int - 默认值:2 - +
## 四、配置示例 @@ -143,11 +211,11 @@ { "reader": { "parameter": { - "username": "tudou", - "password": "abc123", + "username": "username", + "password": "password", "connection": [{ - "jdbcUrl": ["jdbc:oracle:thin:@kudu5:1521:helowin"], - "table": ["TUDOU.KUDU"] + "jdbcUrl": ["jdbc:oracle:thin:@0.0.0.1:1521:oracle"], + "table": ["TABLE"] }], "column": ["*"], "customSql": "", @@ -200,11 +268,11 @@ { "reader": { "parameter": { - "username": "tudou", - "password": "abc123", + "username": "username", + "password": "password", "connection": [{ - "jdbcUrl": ["jdbc:oracle:thin:@kudu5:1521:helowin"], - "table": ["TUDOU.KUDU"] + "jdbcUrl": ["jdbc:oracle:thin:@0.0.0.1:1521:oracle"], + "table": ["TABLE"] }], "column": ["*"], "customSql": "", @@ -257,14 +325,14 @@ { "reader": { "parameter": { - "username": "tudou", - "password": "abc123", + "username": "username", + "password": "password", "connection": [{ - "jdbcUrl": ["jdbc:oracle:thin:@kudu5:1521:helowin"], - "table": ["TUDOU.KUDU"] + "jdbcUrl": ["jdbc:oracle:thin:@0.0.0.1:1521:oracle"], + "table": ["table"] }], "column": ["ID","USER_ID","NAME"], - "customSql": "select * from kudu where ID > 20", + "customSql": "select * from table where ID > 20", "where": "ID < 10000", "splitPk": "ID", "fetchSize": 1024, @@ -314,11 +382,11 @@ { "reader": { "parameter": { - "username": "tudou", - "password": "abc123", + "username": "username", + "password": "password", "connection": [{ - "jdbcUrl": ["jdbc:oracle:thin:@kudu5:1521:helowin"], - "table": ["TUDOU.KUDU"] + "jdbcUrl": ["jdbc:oracle:thin:@0.0.0.1:1521:oracle"], + "table": ["TABLE"] }], "column": [{ "name": "ID", @@ -382,11 +450,11 @@ { "reader": { "parameter": { - "username": "tudou", - "password": "abc123", + "username": "username", + "password": "password", "connection": [{ - "jdbcUrl": ["jdbc:oracle:thin:@kudu5:1521:helowin"], - "table": ["TUDOU.KUDU"] + "jdbcUrl": ["jdbc:oracle:thin:@0.0.0.1:1521:oracle"], + "table": ["TABLE"] }], "column": [{ "name": "ID", diff --git a/docs/offline/reader/phoenixreader.md b/docs/offline/reader/phoenixreader.md index 14ceca3099..1285561746 100644 --- a/docs/offline/reader/phoenixreader.md +++ b/docs/offline/reader/phoenixreader.md @@ -9,35 +9,72 @@ phoenix4.12.0-HBase-1.3及以上 ## 三、参数说明 +- **connection** + - 描述:数据库连接参数,包含jdbcUrl、schema、table等参数 + - 必选:是 + - 字段类型:List + - 示例:指定jdbcUrl、schema、table + ```json + "connection": [{ + "jdbcUrl": ["jdbc:phoenix:node01,node02,node03:2181"], + "table": ["table"], + "schema":"public" + }] + ``` + - 默认值:无 + +
+ +- **schema** + - 描述:数据库schema名 + - 必选:否 + - 字段类型:String + - 默认值:无 + +
+
+- **table**
+  - 描述:需要读取的表名。目前只支持配置单个表,后续会支持多表
+  - 必选:是
+  - 字段类型:List
+  - 默认值:无
+
+<br/>
+ + - **jdbcUrl** - 描述:针对关系型数据库的jdbc连接字符串
jdbcUrl参考文档:[phoenix官方文档](https://phoenix.apache.org/#)
   - 必选:是
+  - 字段类型:List
   - 默认值:无
-
+<br/>
- **username** - 描述:数据源的用户名 - 必选:是 + - 字段类型:String - 默认值:无 - +
- **password** - 描述:数据源指定用户名的密码 - 必选:是 + - 字段类型:String - 默认值:无 - +
- **where** - 描述:筛选条件,reader插件根据指定的column、table、where条件拼接SQL,并根据这个SQL进行数据抽取。在实际业务场景中,往往会选择当天的数据进行同步,可以将where条件指定为gmt_create > time。 - 注意:不可以将where条件指定为limit 10,limit不是SQL的合法where子句。 - 必选:否 + - 字段类型:String - 默认值:无 - +
- **splitPk** - 描述:当speed配置中的channel大于1时指定此参数,Reader插件根据并发数和此参数指定的字段拼接sql,使每个并发读取不同的数据,提升读取速率。 @@ -46,17 +83,28 @@ phoenix4.12.0-HBase-1.3及以上 - 目前splitPk仅支持整形数据切分,`不支持浮点、字符串、日期等其他类型`。如果用户指定其他非支持类型,FlinkX将报错! - 如果channel大于1但是没有配置此参数,任务将置为失败。 - 必选:否 + - 字段类型:String - 默认值:无 +
+ +- **fetchSize** + - 描述:一次性从数据库中读取多少条数据,jdbc默认一次将所有结果都读取到内存中,在数据量很大时可能会造成OOM,设置这个参数可以控制每次读取fetchSize条数据。 + - 注意:此参数的值不可设置过大,否则会读取超时,导致任务失败。 + - 必选:否 + - 字段类型:int + - 默认值:1000 +
- **queryTimeOut** - 描述:查询超时时间,单位秒。 - 注意:当数据量很大,或者从视图查询,或者自定义sql查询时,可通过此参数指定超时时间。 - 必选:否 + - 字段类型:int - 默认值:1000 - +
- **customSql** - 描述:自定义的查询语句,如果只指定字段不能满足需求时,可通过此参数指定查询的sql,可以是任意复杂的查询语句。 @@ -66,9 +114,10 @@ phoenix4.12.0-HBase-1.3及以上 - 当指定了此参数时,connection里指定的table无效; - 当指定此参数时,column必须指定具体字段信息,不能以*号代替; - 必选:否 + - 字段类型:String - 默认值:无 - +
- **column** - 描述:需要读取的字段。 @@ -97,27 +146,55 @@ phoenix4.12.0-HBase-1.3及以上 - format:如果字段是时间字符串,可以指定时间的格式,将字段类型转为日期格式返回 - value:如果数据库里不存在指定的字段,则会把value的值作为常量列返回,如果指定的字段存在,当指定字段的值为null时,会以此value值作为默认值返回 - 必选:是 + - 字段类型:List - 默认值:无 - +
- **polling** - 描述:是否开启间隔轮询,开启后会根据`pollingInterval`轮询间隔时间周期性的从数据库拉取数据。开启间隔轮询还需配置参数`pollingInterval`,`increColumn`,可以选择配置参数`startLocation`。若不配置参数`startLocation`,任务启动时将会从数据库中查询增量字段最大值作为轮询的开始位置。 - 必选:否 + - 字段类型:Boolean - 默认值:false - +
- **pollingInterval** - 描述:轮询间隔时间,从数据库中拉取数据的间隔时间,默认为5000毫秒。 - 必选:否 + - 字段类型:long - 默认值:5000 +
+
+- **increColumn**
+  - 描述:增量字段,可以是对应的增量字段名,也可以是纯数字,表示增量字段在column中的顺序位置(从0开始)
+  - 必选:否
+  - 字段类型:String或int
+  - 默认值:无
+ +- **startLocation** + - 描述:增量查询起始位置 + - 必选:否 + - 字段类型:String + - 默认值:无 + +
+ +- **useMaxFunc** + - 描述:用于标记是否保存endLocation位置的一条或多条数据,true:不保存,false(默认):保存, 某些情况下可能出现最后几条数据被重复记录的情况,可以将此参数配置为true + - 必选:否 + - 字段类型:Boolean + - 默认值:false + +
- **requestAccumulatorInterval** - 描述:发送查询累加器请求的间隔时间。 - 必选:否 + - 字段类型:int - 默认值:2 ## 四、配置示例 @@ -131,16 +208,13 @@ phoenix4.12.0-HBase-1.3及以上 "parameter" : { "column" : [ { "name" : "id", - "type" : "bigint", - "key" : "id" + "type" : "bigint" }, { "name" : "user_id", - "type" : "bigint", - "key" : "user_id" + "type" : "bigint" }, { "name" : "name", - "type" : "varchar", - "key" : "name" + "type" : "varchar" } ], "username" : "", "password" : "", @@ -199,16 +273,13 @@ phoenix4.12.0-HBase-1.3及以上 "parameter" : { "column" : [ { "name" : "id", - "type" : "bigint", - "key" : "id" + "type" : "bigint" }, { "name" : "user_id", - "type" : "bigint", - "key" : "user_id" + "type" : "bigint" }, { "name" : "name", - "type" : "varchar", - "key" : "name" + "type" : "varchar" } ], "username" : "", "password" : "", @@ -267,16 +338,13 @@ phoenix4.12.0-HBase-1.3及以上 "parameter" : { "column" : [ { "name" : "id", - "type" : "bigint", - "key" : "id" + "type" : "bigint" }, { "name" : "user_id", - "type" : "bigint", - "key" : "user_id" + "type" : "bigint" }, { "name" : "name", - "type" : "varchar", - "key" : "name" + "type" : "varchar" } ], "username" : "", "password" : "", @@ -333,16 +401,13 @@ phoenix4.12.0-HBase-1.3及以上 "parameter" : { "column" : [ { "name" : "id", - "type" : "bigint", - "key" : "id" + "type" : "bigint" }, { "name" : "user_id", - "type" : "bigint", - "key" : "user_id" + "type" : "bigint" }, { "name" : "name", - "type" : "varchar", - "key" : "name" + "type" : "varchar" } ], "username" : "", "password" : "", @@ -401,16 +466,13 @@ phoenix4.12.0-HBase-1.3及以上 "parameter" : { "column" : [ { "name" : "id", - "type" : "bigint", - "key" : "id" + "type" : "bigint" }, { "name" : "user_id", - "type" : "bigint", - "key" : "user_id" + "type" : "bigint" }, { "name" : "name", - "type" : "varchar", - "key" : "name" + "type" : "varchar" } ], "username" : "", "password" : "", @@ -471,16 +533,13 @@ phoenix4.12.0-HBase-1.3及以上 "parameter" : { "column" : [ { "name" : "id", - "type" : "bigint", - "key" : "id" + "type" : "bigint" }, { "name" : "user_id", - "type" : "bigint", - "key" : "user_id" + "type" : "bigint" }, { "name" : "name", - "type" : "varchar", - "key" : "name" + "type" : "varchar" } ], "username" : "", "password" : "", diff --git a/docs/offline/reader/polardbreader.md b/docs/offline/reader/polardbreader.md index fd9eed5962..f4a022cee7 100644 --- a/docs/offline/reader/polardbreader.md +++ b/docs/offline/reader/polardbreader.md @@ -10,35 +10,71 @@ ## 三、参数说明 +- **connection** + - 描述:数据库连接参数,包含jdbcUrl、schema、table等参数 + - 必选:是 + - 字段类型:List + - 示例:指定jdbcUrl、schema、table + ```json + "connection": [{ + "jdbcUrl": ["jdbc:polardb://0.0.0.1:3306/database"], + "table": ["table"], + "schema":"public" + }] + ``` + - 默认值:无 + +
+ - **jdbcUrl** - 描述:针对关系型数据库的jdbc连接字符串 -
jdbcUrl参考文档:[Mysql官方文档](http://dev.mysql.com/doc/connector-j/en/connector-j-reference-configuration-properties.html) +
jdbcUrl参考文档:[PolarDB官方文档](https://help.aliyun.com/document_detail/147247.html?spm=a2c4g.11186623.2.8.51346c98uYvnUU) - 必选:是 + - 字段类型:List - 默认值:无 +
+ +- **schema** + - 描述:数据库schema名 + - 必选:否 + - 字段类型:String + - 默认值:无 + +
+
+- **table**
+  - 描述:需要读取的表名。目前只支持配置单个表,后续会支持多表
+  - 必选:是
+  - 字段类型:List
+  - 默认值:无
+
- **username** - 描述:数据源的用户名 - 必选:是 + - 字段类型:String - 默认值:无 - +
- **password** - 描述:数据源指定用户名的密码 - 必选:是 + - 字段类型:String - 默认值:无 - +
- **where** - 描述:筛选条件,reader插件根据指定的column、table、where条件拼接SQL,并根据这个SQL进行数据抽取。在实际业务场景中,往往会选择当天的数据进行同步,可以将where条件指定为gmt_create > time。 - 注意:不可以将where条件指定为limit 10,limit不是SQL的合法where子句。 - 必选:否 + - 字段类型:String - 默认值:无 - +
- **splitPk** - 描述:当speed配置中的channel大于1时指定此参数,Reader插件根据并发数和此参数指定的字段拼接sql,使每个并发读取不同的数据,提升读取速率。 @@ -47,17 +83,28 @@ - 目前splitPk仅支持整形数据切分,`不支持浮点、字符串、日期等其他类型`。如果用户指定其他非支持类型,FlinkX将报错! - 如果channel大于1但是没有配置此参数,任务将置为失败。 - 必选:否 + - 字段类型:String - 默认值:无 +
+- **fetchSize** + - 描述:一次性从数据库中读取多少条数据,jdbc默认一次将所有结果都读取到内存中,在数据量很大时可能会造成OOM,设置这个参数可以控制每次读取fetchSize条数据。 + - 注意:此参数的值不可设置过大,否则会读取超时,导致任务失败。 + - 必选:否 + - 字段类型:int + - 默认值:1000 + +
- **queryTimeOut** - 描述:查询超时时间,单位秒。 - 注意:当数据量很大,或者从视图查询,或者自定义sql查询时,可通过此参数指定超时时间。 - 必选:否 + - 字段类型:int - 默认值:1000 - +
- **customSql** - 描述:自定义的查询语句,如果只指定字段不能满足需求时,可通过此参数指定查询的sql,可以是任意复杂的查询语句。 @@ -67,9 +114,10 @@ - 当指定了此参数时,connection里指定的table无效; - 当指定此参数时,column必须指定具体字段信息,不能以*号代替; - 必选:否 + - 字段类型:String - 默认值:无 - +
- **column** - 描述:需要读取的字段。 @@ -98,31 +146,57 @@ - format:如果字段是时间字符串,可以指定时间的格式,将字段类型转为日期格式返回 - value:如果数据库里不存在指定的字段,则会把value的值作为常量列返回,如果指定的字段存在,当指定字段的值为null时,会以此value值作为默认值返回 - 必选:是 + - 字段类型:List - 默认值:无 - +
- **polling** - 描述:是否开启间隔轮询,开启后会根据`pollingInterval`轮询间隔时间周期性的从数据库拉取数据。开启间隔轮询还需配置参数`pollingInterval`,`increColumn`,可以选择配置参数`startLocation`。若不配置参数`startLocation`,任务启动时将会从数据库中查询增量字段最大值作为轮询的开始位置。 - 必选:否 + - 字段类型:Boolean - 默认值:false - +
- **pollingInterval** - 描述:轮询间隔时间,从数据库中拉取数据的间隔时间,默认为5000毫秒。 - 必选:否 + - 字段类型:long - 默认值:5000 +
+- **increColumn**
+  - 描述:增量字段,可以是对应的增量字段名,也可以是纯数字,表示增量字段在column中的顺序位置(从0开始)
+  - 必选:否
+  - 字段类型:String或int
+  - 默认值:无
+
+<br/>
+ +- **startLocation** + - 描述:增量查询起始位置 + - 必选:否 + - 字段类型:String + - 默认值:无 + +
+ +- **useMaxFunc** + - 描述:用于标记是否保存endLocation位置的一条或多条数据,true:不保存,false(默认):保存, 某些情况下可能出现最后几条数据被重复记录的情况,可以将此参数配置为true + - 必选:否 + - 字段类型:Boolean + - 默认值:false + +
- **requestAccumulatorInterval** - 描述:发送查询累加器请求的间隔时间。 - 必选:否 + - 字段类型:int - 默认值:2 - - ## 四、配置示例 @@ -134,11 +208,11 @@ { "reader": { "parameter": { - "username": "dtstack", - "password": "abc123", + "username": "username", + "password": "password", "connection": [{ - "jdbcUrl": ["jdbc:mysql://kudu3:3306/tudou?useUnicode=true&characterEncoding=utf8"], - "table": ["kudu"] + "jdbcUrl": ["jdbc:polardb://0.0.0.1:3306/database"], + "table": ["table"] }], "column": ["*"], "customSql": "", @@ -190,11 +264,11 @@ { "reader": { "parameter": { - "username": "dtstack", - "password": "abc123", + "username": "username", + "password": "password", "connection": [{ - "jdbcUrl": ["jdbc:mysql://kudu3:3306/tudou?useUnicode=true&characterEncoding=utf8"], - "table": ["kudu"] + "jdbcUrl": ["jdbc:polardb://0.0.0.1:3306/database"], + "table": ["table"] }], "column": ["*"], "customSql": "", @@ -246,14 +320,14 @@ { "reader": { "parameter": { - "username": "dtstack", - "password": "abc123", + "username": "username", + "password": "password", "connection": [{ - "jdbcUrl": ["jdbc:mysql://kudu3:3306/tudou?useUnicode=true&characterEncoding=utf8"], - "table": ["kudu"] + "jdbcUrl": ["jdbc:polardb://0.0.0.1:3306/database"], + "table": ["table"] }], "column": ["id","user_id","name"], - "customSql": "select * from kudu where id > 20", + "customSql": "select * from table where id > 20", "where": "id < 100", "splitPk": "", "fetchSize": 0, @@ -302,11 +376,11 @@ { "reader": { "parameter": { - "username": "dtstack", - "password": "abc123", + "username": "username", + "password": "password", "connection": [{ - "jdbcUrl": ["jdbc:mysql://kudu3:3306/tudou?useUnicode=true&characterEncoding=utf8"], - "table": ["kudu"] + "jdbcUrl": ["jdbc:polardb://0.0.0.1:3306/database"], + "table": ["table"] }], "column": [{ "name": "id", @@ -369,11 +443,11 @@ { "reader": { "parameter": { - "username": "dtstack", - "password": "abc123", + "username": "username", + "password": "password", "connection": [{ - "jdbcUrl": ["jdbc:mysql://kudu3:3306/tudou?useUnicode=true&characterEncoding=utf8"], - "table": ["kudu"] + "jdbcUrl": ["jdbc:polardb://0.0.0.1:3306/database"], + "table": ["table"] }], "column": [{ "name": "id", diff --git a/docs/offline/reader/postgresqlreader.md b/docs/offline/reader/postgresqlreader.md index 302eb54d36..ea1234e44b 100644 --- a/docs/offline/reader/postgresqlreader.md +++ b/docs/offline/reader/postgresqlreader.md @@ -10,35 +10,71 @@ ## 三、参数说明
+- **connection** + - 描述:数据库连接参数,包含jdbcUrl、schema、table等参数 + - 必选:是 + - 字段类型:List + - 示例:指定jdbcUrl、schema、table + ```json + "connection": [{ + "jdbcUrl": ["jdbc:postgresql://0.0.0.1:5432/postgres"], + "table": ["table"], + "schema":"public" + }] + ``` + - 默认值:无 + +
+ - **jdbcUrl** - 描述:针对关系型数据库的jdbc连接字符串
jdbcUrl参考文档:[postgresql官方文档](https://jdbc.postgresql.org/documentation/head/connect.html#connection-parameters) - 必选:是 + - 字段类型:List - 默认值:无 +
+ +- **schema** + - 描述:数据库schema名 + - 必选:否 + - 字段类型:String + - 默认值:无 +
+
+- **table**
+  - 描述:需要读取的表名。目前只支持配置单个表,后续会支持多表
+  - 必选:是
+  - 字段类型:List
+  - 默认值:无
+
+<br/>
- **username** - 描述:数据源的用户名 - 必选:是 + - 字段类型:String - 默认值:无 - +
- **password** - 描述:数据源指定用户名的密码 - 必选:是 + - 字段类型:String - 默认值:无 - +
- **where** - 描述:筛选条件,reader插件根据指定的column、table、where条件拼接SQL,并根据这个SQL进行数据抽取。在实际业务场景中,往往会选择当天的数据进行同步,可以将where条件指定为gmt_create > time。 - 注意:不可以将where条件指定为limit 10,limit不是SQL的合法where子句。 - 必选:否 + - 字段类型:String - 默认值:无 - +
- **splitPk** - 描述:当speed配置中的channel大于1时指定此参数,Reader插件根据并发数和此参数指定的字段拼接sql,使每个并发读取不同的数据,提升读取速率。 @@ -47,25 +83,28 @@ - 目前splitPk仅支持整形数据切分,`不支持浮点、字符串、日期等其他类型`。如果用户指定其他非支持类型,FlinkX将报错! - 如果channel大于1但是没有配置此参数,任务将置为失败。 - 必选:否 + - 字段类型:String - 默认值:无 - +
- **fetchSize** - - 描述:读取时每批次读取的数据条数。 + - 描述:一次性从数据库中读取多少条数据,jdbc默认一次将所有结果都读取到内存中,在数据量很大时可能会造成OOM,设置这个参数可以控制每次读取fetchSize条数据。 - 注意:此参数的值不可设置过大,否则会读取超时,导致任务失败。 - 必选:否 + - 字段类型:int - 默认值:1000 - +
- **queryTimeOut** - 描述:查询超时时间,单位秒。 - 注意:当数据量很大,或者从视图查询,或者自定义sql查询时,可通过此参数指定超时时间。 - 必选:否 + - 字段类型:int - 默认值:1000 - +
- **customSql** - 描述:自定义的查询语句,如果只指定字段不能满足需求时,可通过此参数指定查询的sql,可以是任意复杂的查询语句。 @@ -75,9 +114,10 @@ - 当指定了此参数时,connection里指定的table无效; - 当指定此参数时,column必须指定具体字段信息,不能以*号代替; - 必选:否 + - 字段类型:String - 默认值:无 - +
- **column** - 描述:需要读取的字段。 @@ -106,27 +146,55 @@ - format:如果字段是时间字符串,可以指定时间的格式,将字段类型转为日期格式返回 - value:如果数据库里不存在指定的字段,则会把value的值作为常量列返回,如果指定的字段存在,当指定字段的值为null时,会以此value值作为默认值返回 - 必选:是 + - 字段类型:List - 默认值:无 - +
- **polling** - 描述:是否开启间隔轮询,开启后会根据`pollingInterval`轮询间隔时间周期性的从数据库拉取数据。开启间隔轮询还需配置参数`pollingInterval`,`increColumn`,可以选择配置参数`startLocation`。若不配置参数`startLocation`,任务启动时将会从数据库中查询增量字段最大值作为轮询的开始位置。 - 必选:否 + - 字段类型:Boolean - 默认值:false - +
- **pollingInterval** - 描述:轮询间隔时间,从数据库中拉取数据的间隔时间,默认为5000毫秒。 - 必选:否 + - 字段类型:long - 默认值:5000 +
+
+- **increColumn**
+  - 描述:增量字段,可以是对应的增量字段名,也可以是纯数字,表示增量字段在column中的顺序位置(从0开始)
+  - 必选:否
+  - 字段类型:String或int
+  - 默认值:无
+
+<br/>
+ +- **startLocation** + - 描述:增量查询起始位置 + - 必选:否 + - 字段类型:String + - 默认值:无 + +
+ +- **useMaxFunc** + - 描述:用于标记是否保存endLocation位置的一条或多条数据,true:不保存,false(默认):保存, 某些情况下可能出现最后几条数据被重复记录的情况,可以将此参数配置为true + - 必选:否 + - 字段类型:Boolean + - 默认值:false +
- **requestAccumulatorInterval** - 描述:发送查询累加器请求的间隔时间。 - 必选:否 + - 字段类型:int - 默认值:2 ** @@ -142,16 +210,13 @@ "parameter" : { "column" : [ { "name" : "id", - "type" : "bigint", - "key" : "id" + "type" : "bigint" }, { "name" : "user_id", - "type" : "bigint", - "key" : "user_id" + "type" : "bigint" }, { "name" : "name", - "type" : "varchar", - "key" : "name" + "type" : "varchar" } ], "username" : "username", "password" : "password", @@ -197,16 +262,13 @@ "parameter" : { "column" : [ { "name" : "id", - "type" : "bigint", - "key" : "id" + "type" : "bigint" }, { "name" : "user_id", - "type" : "bigint", - "key" : "user_id" + "type" : "bigint" }, { "name" : "name", - "type" : "varchar", - "key" : "name" + "type" : "varchar" } ], "username" : "username", "password" : "password", @@ -252,16 +314,13 @@ "parameter" : { "column" : [ { "name" : "id", - "type" : "bigint", - "key" : "id" + "type" : "bigint" }, { "name" : "user_id", - "type" : "bigint", - "key" : "user_id" + "type" : "bigint" }, { "name" : "name", - "type" : "varchar", - "key" : "name" + "type" : "varchar" } ], "username" : "username", "password" : "password", @@ -307,16 +366,13 @@ "parameter" : { "column" : [ { "name" : "id", - "type" : "bigint", - "key" : "id" + "type" : "bigint" }, { "name" : "user_id", - "type" : "bigint", - "key" : "user_id" + "type" : "bigint" }, { "name" : "name", - "type" : "varchar", - "key" : "name" + "type" : "varchar" } ], "username" : "username", "password" : "password", @@ -364,16 +420,13 @@ "parameter" : { "column" : [ { "name" : "id", - "type" : "bigint", - "key" : "id" + "type" : "bigint" }, { "name" : "user_id", - "type" : "bigint", - "key" : "user_id" + "type" : "bigint" }, { "name" : "name", - "type" : "varchar", - "key" : "name" + "type" : "varchar" } ], "username" : "username", "password" : "password", diff --git a/docs/offline/reader/saphanareader.md b/docs/offline/reader/saphanareader.md index aea38f3a40..3194d14bad 100644 --- a/docs/offline/reader/saphanareader.md +++ b/docs/offline/reader/saphanareader.md @@ -9,34 +9,70 @@ SAP HANA 2.0及以上
## 三、参数说明 +- **connection** + - 描述:数据库连接参数,包含jdbcUrl、schema、table等参数 + - 必选:是 + - 字段类型:List + - 示例:指定jdbcUrl、schema、table + ```json + "connection": [{ + "jdbcUrl": ["jdbc:sap://0.0.0.1:39017"], + "table": ["table"], + "schema":"public" + }] + ``` + - 默认值:无 + +
+ - **jdbcUrl** - 描述:jdbc连接字符串 - 必选:是 + - 字段类型:List - 默认值:无 - +
- **username** - 描述:数据源的用户名 - 必选:是 + - 字段类型:String - 默认值:无 - +
- **password** - 描述:数据源指定用户名的密码 - 必选:是 + - 字段类型:String - 默认值:无 +
+- **schema** + - 描述:数据库schema名 + - 必选:否 + - 字段类型:String + - 默认值:无 + +
+
+- **table**
+  - 描述:需要读取的表名。目前只支持配置单个表,后续会支持多表
+  - 必选:是
+  - 字段类型:List
+  - 默认值:无
+
+<br/>
- **where** - 描述:筛选条件,reader插件根据指定的column、table、where条件拼接SQL,并根据这个SQL进行数据抽取。在实际业务场景中,往往会选择当天的数据进行同步,可以将where条件指定为gmt_create > time。 - 注意:不可以将where条件指定为limit 10,limit不是SQL的合法where子句。 - 必选:否 + - 字段类型:String - 默认值:无 - +
- **splitPk** - 描述:当speed配置中的channel大于1时指定此参数,Reader插件根据并发数和此参数指定的字段拼接sql,使每个并发读取不同的数据,提升读取速率。 @@ -45,25 +81,28 @@ SAP HANA 2.0及以上
- 目前splitPk仅支持整形数据切分,`不支持浮点、字符串、日期等其他类型`。如果用户指定其他非支持类型,FlinkX将报错! - 如果channel大于1但是没有配置此参数,任务将置为失败。 - 必选:否 + - 字段类型:String - 默认值:无 - +
- **fetchSize** - - 描述:读取时每批次读取的数据条数。 + - 描述:一次性从数据库中读取多少条数据,jdbc默认一次将所有结果都读取到内存中,在数据量很大时可能会造成OOM,设置这个参数可以控制每次读取fetchSize条数据。 - 注意:此参数的值不可设置过大,否则会读取超时,导致任务失败。 - 必选:否 + - 字段类型:int - 默认值:1000 - +
- **queryTimeOut** - 描述:查询超时时间,单位秒。 - 注意:当数据量很大,或者从视图查询,或者自定义sql查询时,可通过此参数指定超时时间。 - 必选:否 - - 默认值:3000 - + - 字段类型:int + - 默认值:1000 +
- **customSql** - 描述:自定义的查询语句,如果只指定字段不能满足需求时,可通过此参数指定查询的sql,可以是任意复杂的查询语句。 @@ -73,9 +112,10 @@ SAP HANA 2.0及以上
- 当指定了此参数时,connection里指定的table无效; - 当指定次参数时,column必须指定具体字段信息,不能以*号代替; - 必选:否 + - 字段类型:String - 默认值:无 - +
- **column** - 描述:需要读取的字段。 @@ -106,31 +146,57 @@ SAP HANA 2.0及以上
- format:如果字段是时间字符串,可以指定时间的格式,将字段类型转为日期格式返回 - value:如果数据库里不存在指定的字段,则会把value的值作为常量列返回,如果指定的字段存在,当指定字段的值为null时,会以此value值作为默认值返回 - 必选:是 + - 字段类型:List - 默认值:无 - +
- **polling** - 描述:是否开启间隔轮询,开启后会根据`pollingInterval`轮询间隔时间周期性的从数据库拉取数据。开启间隔轮询还需配置参数`pollingInterval`,`increColumn`,可以选择配置参数`startLocation`。若不配置参数`startLocation`,任务启动时将会从数据库中查询增量字段最大值作为轮询的开始位置。 - 必选:否 + - 字段类型:Boolean - 默认值:false - +
- **pollingInterval** - 描述:轮询间隔时间,从数据库中拉取数据的间隔时间,默认为5000毫秒。 - 必选:否 + - 字段类型:long - 默认值:5000 +
+
+- **increColumn**
+  - 描述:增量字段,可以是对应的增量字段名,也可以是纯数字,表示增量字段在column中的顺序位置(从0开始)
+  - 必选:否
+  - 字段类型:String或int
+  - 默认值:无
+
+<br/>
+ +- **startLocation** + - 描述:增量查询起始位置 + - 必选:否 + - 字段类型:String + - 默认值:无 + +
+- **useMaxFunc** + - 描述:用于标记是否保存endLocation位置的一条或多条数据,true:不保存,false(默认):保存, 某些情况下可能出现最后几条数据被重复记录的情况,可以将此参数配置为true + - 必选:否 + - 字段类型:Boolean + - 默认值:false + +
- **requestAccumulatorInterval** - 描述:发送查询累加器请求的间隔时间。 - 必选:否 + - 字段类型:int - 默认值:2 - - ## 四、配置示例 ```json @@ -145,7 +211,7 @@ SAP HANA 2.0及以上
"connection": [ { "jdbcUrl": [ - "jdbc:sap://kudu3:39017" + "jdbc:sap://0.0.0.1:39017" ], "table": [ "SYS.P_DPAPI_KEY_" diff --git a/docs/offline/reader/sqlserverreader.md b/docs/offline/reader/sqlserverreader.md index e5cebc2803..9b465721da 100644 --- a/docs/offline/reader/sqlserverreader.md +++ b/docs/offline/reader/sqlserverreader.md @@ -10,34 +10,70 @@ ## 三、参数说明 +- **connection** + - 描述:数据库连接参数,包含jdbcUrl、schema、table等参数 + - 必选:是 + - 字段类型:List + - 示例:指定jdbcUrl、schema、table + ```json + "connection": [{ + "jdbcUrl": ["jdbc:jtds:sqlserver://0.0.0.1:1433;DatabaseName=database"], + "table": ["table"], + "schema":"public" + }] + ``` + - 默认值:无 + +
+ - **jdbcUrl** - 描述:使用开源的jtds驱动连接 而非Microsoft的官方驱动
jdbcUrl参考文档:[jtds驱动官方文档](http://jtds.sourceforge.net/faq.html) - 必选:是 + - 字段类型:List + - 默认值:无 + +
+ +- **schema** + - 描述:数据库schema名 + - 必选:否 + - 字段类型:String - 默认值:无 +
+
+- **table**
+  - 描述:需要读取的表名。目前只支持配置单个表,后续会支持多表
+  - 必选:是
+  - 字段类型:List
+  - 默认值:无
+
+<br/>
- **username** - 描述:数据源的用户名 - 必选:是 + - 字段类型:String - 默认值:无 - +
- **password** - 描述:数据源指定用户名的密码 - 必选:是 + - 字段类型:String - 默认值:无 - +
- **where** - 描述:筛选条件,reader插件根据指定的column、table、where条件拼接SQL,并根据这个SQL进行数据抽取。在实际业务场景中,往往会选择当天的数据进行同步,可以将where条件指定为gmt_create > time。 - 注意:不可以将where条件指定为limit 10,limit不是SQL的合法where子句。 - 必选:否 + - 字段类型:String - 默认值:无 - +
- **splitPk** - 描述:当speed配置中的channel大于1时指定此参数,Reader插件根据并发数和此参数指定的字段拼接sql,使每个并发读取不同的数据,提升读取速率。 @@ -46,25 +82,28 @@ - 目前splitPk仅支持整形数据切分,`不支持浮点、字符串、日期等其他类型`。如果用户指定其他非支持类型,FlinkX将报错! - 如果channel大于1但是没有配置此参数,任务将置为失败。 - 必选:否 + - 字段类型:String - 默认值:无 - +
- **fetchSize** - - 描述:读取时每批次读取的数据条数。 + - 描述:一次性从数据库中读取多少条数据,jdbc默认一次将所有结果都读取到内存中,在数据量很大时可能会造成OOM,设置这个参数可以控制每次读取fetchSize条数据。 - 注意:此参数的值不可设置过大,否则会读取超时,导致任务失败。 - 必选:否 + - 字段类型:int - 默认值:1000 - +
- **queryTimeOut** - 描述:查询超时时间,单位秒。 - 注意:当数据量很大,或者从视图查询,或者自定义sql查询时,可通过此参数指定超时时间。 - 必选:否 + - 字段类型:int - 默认值:1000 - +
- **customSql** - 描述:自定义的查询语句,如果只指定字段不能满足需求时,可通过此参数指定查询的sql,可以是任意复杂的查询语句。 @@ -74,9 +113,18 @@ - 当指定了此参数时,connection里指定的table无效; - 当指定此参数时,column必须指定具体字段信息,不能以*号代替; - 必选:否 + - 字段类型:String - 默认值:无 +
+- **withNoLock** + - 描述:是否在sql语句后面添加 with(nolock) + - 必选:否 + - 字段类型:Boolean + - 默认值:false + +
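+
+示例:开启后reader拼接的查询语句会带上with(nolock)提示(示意片段):
+```json
+"withNoLock": true
+```
+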
- **column** - 描述:需要读取的字段。 @@ -107,27 +155,56 @@ - format:如果字段是时间字符串,可以指定时间的格式,将字段类型转为日期格式返回 - value:如果数据库里不存在指定的字段,则会把value的值作为常量列返回,如果指定的字段存在,当指定字段的值为null时,会以此value值作为默认值返回 - 必选:是 + - 字段类型:List - 默认值:无 - +
- **polling** - 描述:是否开启间隔轮询,开启后会根据`pollingInterval`轮询间隔时间周期性的从数据库拉取数据。开启间隔轮询还需配置参数`pollingInterval`,`increColumn`,可以选择配置参数`startLocation`。若不配置参数`startLocation`,任务启动时将会从数据库中查询增量字段最大值作为轮询的开始位置。 - 必选:否 + - 字段类型:Boolean - 默认值:false - +
- **pollingInterval** - 描述:轮询间隔时间,从数据库中拉取数据的间隔时间,默认为5000毫秒。 - 必选:否 + - 字段类型:long - 默认值:5000 +
+
+- **increColumn**
+  - 描述:增量字段,可以是对应的增量字段名,也可以是纯数字,表示增量字段在column中的顺序位置(从0开始)
+  - 必选:否
+  - 字段类型:String或int
+  - 默认值:无
+
+<br/>
+ +- **startLocation** + - 描述:增量查询起始位置 + - 必选:否 + - 字段类型:String + - 默认值:无 + +
+ +- **useMaxFunc** + - 描述:用于标记是否保存endLocation位置的一条或多条数据,true:不保存,false(默认):保存, 某些情况下可能出现最后几条数据被重复记录的情况,可以将此参数配置为true + - 必选:否 + - 字段类型:Boolean + - 默认值:false + +
- **requestAccumulatorInterval** - 描述:发送查询累加器请求的间隔时间。 - 必选:否 + - 字段类型:int - 默认值:2 ** @@ -143,21 +220,18 @@ "parameter" : { "column" : [ { "name" : "id", - "type" : "bigint", - "key" : "id" + "type" : "bigint" }, { "name" : "user_id", - "type" : "bigint", - "key" : "user_id" + "type" : "bigint" }, { "name" : "name", - "type" : "varchar", - "key" : "name" + "type" : "varchar" } ], "username" : "username", "password" : "password", "connection" : [ { - "jdbcUrl" : [ "jdbc:jtds:sqlserver://0.0.0.1:1433;DatabaseName=DTstack" ], + "jdbcUrl" : [ "jdbc:jtds:sqlserver://0.0.0.1:1433;DatabaseName=database" ], "table" : [ "tableTest" ] } ], "where": "id > 1", @@ -209,21 +283,18 @@ "parameter" : { "column" : [ { "name" : "id", - "type" : "bigint", - "key" : "id" + "type" : "bigint" }, { "name" : "user_id", - "type" : "bigint", - "key" : "user_id" + "type" : "bigint" }, { "name" : "name", - "type" : "varchar", - "key" : "name" + "type" : "varchar" } ], "username" : "username", "password" : "password", "connection" : [ { - "jdbcUrl" : [ "jdbc:jtds:sqlserver://0.0.0.1:1433;DatabaseName=DTstack" ], + "jdbcUrl" : [ "jdbc:jtds:sqlserver://0.0.0.1:1433;DatabaseName=database" ], "table" : [ "tableTest" ] } ], "where": "id > 1", @@ -275,28 +346,25 @@ "parameter" : { "column" : [ { "name" : "id", - "type" : "bigint", - "key" : "id" + "type" : "bigint" }, { "name" : "user_id", - "type" : "bigint", - "key" : "user_id" + "type" : "bigint" }, { "name" : "name", - "type" : "varchar", - "key" : "name" + "type" : "varchar" } ], "username" : "username", "password" : "password", "connection" : [ { - "jdbcUrl" : [ "jdbc:jtds:sqlserver://0.0.0.1:1433;DatabaseName=DTstack" ], + "jdbcUrl" : [ "jdbc:jtds:sqlserver://0.0.0.1:1433;DatabaseName=database" ], "table" : [ "tableTest" ] } ], "where": "id > 1", "splitPk": "id", "fetchSize": 1000, "queryTimeOut": 1000, - "customSql": "select * from kudu where id > 20", + "customSql": "select * from table where id > 20", "requestAccumulatorInterval": 2 }, "name" : "sqlserverreader" @@ -341,21 +409,18 @@ "parameter" : { "column" : [ { "name" : "id", - "type" : "bigint", - "key" : "id" + "type" : "bigint" }, { "name" : "user_id", - "type" : "bigint", - "key" : "user_id" + "type" : "bigint" }, { "name" : "name", - "type" : "varchar", - "key" : "name" + "type" : "varchar" } ], "username" : "username", "password" : "password", "connection" : [ { - "jdbcUrl" : [ "jdbc:jtds:sqlserver://0.0.0.1:1433;DatabaseName=DTstack" ], + "jdbcUrl" : [ "jdbc:jtds:sqlserver://0.0.0.1:1433;DatabaseName=database" ], "table" : [ "tableTest" ] } ], "where": "id > 1", @@ -410,21 +475,18 @@ "parameter" : { "column" : [ { "name" : "id", - "type" : "bigint", - "key" : "id" + "type" : "bigint" }, { "name" : "user_id", - "type" : "bigint", - "key" : "user_id" + "type" : "bigint" }, { "name" : "name", - "type" : "varchar", - "key" : "name" + "type" : "varchar" } ], "username" : "username", "password" : "password", "connection" : [ { - "jdbcUrl" : [ "jdbc:jtds:sqlserver://0.0.0.1:1433;DatabaseName=DTstack" ], + "jdbcUrl" : [ "jdbc:jtds:sqlserver://0.0.0.1:1433;DatabaseName=database" ], "table" : [ "tableTest" ] } ], "where": "id > 1", diff --git a/docs/offline/reader/streamreader.md b/docs/offline/reader/streamreader.md index 697571098a..e7e29ac74b 100644 --- a/docs/offline/reader/streamreader.md +++ b/docs/offline/reader/streamreader.md @@ -12,7 +12,7 @@ ### 三、参数说明 - **sliceRecordCount** - - 描述:每个通道生成的数据条数,不配置此参数或者配置为0,程序会持续生成数据,不会停止 + - 描述:每个通道生成的数据条数,不配置此参数或者配置为0,程序会持续生成数据,不会停止。例如当配置通道数为5时,就需要填写5个数字。不同通道数写入数量可以不同 - 
必选:否 - 默认值:0 @@ -42,10 +42,10 @@ - float - double - date - - timestamp + - timestamp: 以当前时间戳作为模拟值,因此也是递增的 - bigdecimal - biginteger - - int[] + - int[]: 数组的长度也是随机的 - byte[] - boolean[] - char[],character[] diff --git a/docs/offline/writer/alluxiowriter.md b/docs/offline/writer/alluxiowriter.md new file mode 100644 index 0000000000..4a6cc3785f --- /dev/null +++ b/docs/offline/writer/alluxiowriter.md @@ -0,0 +1,319 @@ +# Alluxio Writer + +## 一、插件名称 +名称:**alluxiowriter** + + +## 二、数据源版本 +Alluxio 2.0.1-2.6.2 + + +## 三、参数说明 + +- **writeType** + - 描述:指定写入新文件时的数据写入行为,支持用户配置为`CACHE_THROUGH`、`MUST_CACHE`、`THROUGH`、`ASYNC_THROUGH` + - CACHE_THROUGH:数据同步写入alluxio worker和底层存储 + - MUST_CACHE:数据同步写入alluxio worker,但不会写入底层存储 + - THROUGH:数据同步写入底层存储,但不会写入alluxio worker + - ASYNC_THROUGH:数据同步写入alluxio worker,异步写入底层存储 + - 必选:否 + - 字段类型:string + - 默认值:THROUGH + +
+ +- **fileType** + - 描述:文件的类型,目前只支持用户配置为`text`、`orc`、`parquet` + - text:textfile文件格式 + - orc:orcfile文件格式 + - parquet:parquet文件格式 + - 必选:是 + - 字段类型:string + - 默认值:无 + +
+ +- **path** + - 描述:数据文件的路径 + - 必选:是 + - 字段类型:string + - 默认值:无 + +
+
+- **fileName**
+  - 描述:写入的目录名称
+  - 注意:不为空时,实际写入路径为 path+fileName
+  - 必选:否
+  - 字段类型:string
+  - 默认值:无
+
+ +- **fieldDelimiter** + - 描述:`fileType`为`text`时字段的分隔符 + - 必选:否 + - 字段类型:string + - 默认值:`\001` + +
+ +- **encoding** + - 描述:`fileType`为`text`时可配置编码格式 + - 必选:否 + - 字段类型:string + - 默认值:UTF-8 + +
+
+- **maxFileSize**
+  - 描述:写入alluxio单个文件最大大小,单位字节
+  - 必选:否
+  - 字段类型:long
+  - 默认值:1073741824(1G)
+
+
+- **compress**
+  - 描述:alluxio文件压缩类型
+    - text:支持`GZIP`、`BZIP2`格式
+    - orc:支持`SNAPPY`、`ZLIB`、`LZO`格式
+    - parquet:支持`SNAPPY`、`GZIP`、`LZO`格式
+  - 注意:`SNAPPY`格式需要用户安装**SnappyCodec**
+  - 必选:否
+  - 字段类型:string
+  - 默认值:
+    - text 默认不进行压缩
+    - orc 默认为ZLIB格式
+    - parquet 默认为SNAPPY格式
+
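+
+示例:一个假设性的片段,为orc文件指定ZLIB压缩(省略其它参数):
+```json
+"fileType": "orc",
+"compress": "ZLIB"
+```
+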
+ +- **writeMode** + - 描述:alluxiowriter写入前数据清理处理模式: + - append:追加 + - overwrite:覆盖 + - 注意:overwrite模式时会删除 alluxio当前目录下的所有文件 + - 必选:否 + - 字段类型:string + - 默认值:append + +
+
+- **column**
+  - 描述:需要写入的字段。
+  - 格式:指定具体信息:
+```json
+"column": [{
+    "name": "col",
+    "type": "datetime"
+}]
+```
+
+- 属性说明:
+  - name:字段名称
+  - type:字段类型,可以和源字段类型不一样,程序会做一次类型转换
+- 必选:是
+- 默认值:无
+
+
+- **fullColumnName**
+  - 描述:写入的字段名称
+  - 必选:否
+  - 字段类型:list
+  - 默认值:column的name集合
+
+<br/>
+
+- **fullColumnType**
+  - 描述:写入的字段类型
+  - 必选:否
+  - 字段类型:list
+  - 默认值:column的type集合
+
+<br/>
+
+- **rowGroupSize**
+  - 描述:parquet类型文件参数,指定row group的大小,单位字节
+  - 必选:否
+  - 字段类型:int
+  - 默认值:134217728(128M)
+
+ +- **enableDictionary** + - 描述:parquet类型文件参数,是否启动字典编码 + - 必须:否 + - 字段类型:boolean + - 默认值:true + +## 四、使用示例 +#### 1、写入text文件 +```json +{ + "job": { + "content": [ + { + "reader": { + "name": "streamreader", + "parameter": { + "column": [ + { + "name": "id", + "type": "id" + }, + { + "name": "name", + "type": "string" + } + ], + "sliceRecordCount" : ["100"] + } + }, + "writer": { + "parameter": { + "path": "alluxio://ip:port/mnt/alluxio_text", + "fileName": "pt=20211220", + "column": [ + { + "name": "id", + "index": 0, + "type": "bigint" + }, + { + "name": "name", + "index": 1, + "type": "string" + } + ], + "writeMode": "overwrite", + "fieldDelimiter": "|", + "encoding": "utf-8", + "fileType": "text" + }, + "name": "alluxiowriter" + } + } + ], + "setting": { + "restore": { + "isRestore": false + } + } + } +} +``` +#### 2、写入orc文件 +```json +{ + "job": { + "content": [ + { + "reader": { + "name": "streamreader", + "parameter": { + "column": [ + { + "name": "id", + "type": "id" + }, + { + "name": "name", + "type": "string" + } + ], + "sliceRecordCount" : ["100"] + } + }, + "writer": { + "parameter": { + "path": "alluxio://36.138.22.18:19998/mnt/alluxio_orc", + "fileName": "pt=20211220", + "column": [ + { + "name": "id", + "index": 0, + "type": "bigint" + }, + { + "name": "name", + "index": 1, + "type": "string" + } + ], + "writeMode": "overwrite", + "fieldDelimiter": "|", + "encoding": "utf-8", + "fileType": "orc" + }, + "name": "alluxiowriter" + } + } + ], + "setting": { + "restore": { + "isRestore": false + } + } + } +} +``` +#### 3、写入parquet文件 +```json +{ + "job": { + "content": [ + { + "reader": { + "name": "streamreader", + "parameter": { + "column": [ + { + "name": "id", + "type": "id" + }, + { + "name": "name", + "type": "string" + } + ], + "sliceRecordCount" : ["100"] + } + }, + "writer": { + "parameter": { + "path": "alluxio://36.138.22.18:19998/mnt/alluxio_parquet", + "fileName": "pt=20211220", + "column": [ + { + "name": "id", + "index": 0, + "type": "bigint" + }, + { + "name": "name", + "index": 1, + "type": "string" + } + ], + "writeMode": "overwrite", + "fieldDelimiter": "|", + "encoding": "utf-8", + "writeType": "", + "fileType": "parquet" + }, + "name": "alluxiowriter" + } + } + ], + "setting": { + "restore": { + "isRestore": false + } + } + } +} +``` + + diff --git a/docs/offline/writer/carbondatawriter.md b/docs/offline/writer/carbondatawriter.md index 2ad1b8e41e..ba4d954b46 100644 --- a/docs/offline/writer/carbondatawriter.md +++ b/docs/offline/writer/carbondatawriter.md @@ -2,7 +2,7 @@ ## 一、插件名称 -名称:**carbondatawriter**
** +名称:**carbondatawriter**
## 二、支持的数据源版本 **Carbondata 1.5及以上**
@@ -13,34 +13,50 @@ - **path** - 描述:carbondata表的存储路径 - 必选:是 + - 字段类型:String - 默认值:无 - +
- **table** - 描述:carbondata表名 - 必选:否 + - 字段类型:String - 默认值:无 - +
- **database** - 描述:carbondata库名 - 必选:否 + - 字段类型:String - 默认值:无 - +
 - **column**
   - 描述:所配置的表中需要同步的字段名列表
-  - 必选:是
+    字段包括表字段和常量字段,
+    表字段的格式:
+    - name:字段名称
+    - type:字段类型
+```json
+{
+  "name": "col1",
+  "type": "string"
+}
+```
+  - 必选:是
+  - 字段类型:List
   - 默认值:无
-
+<br/>
- **hadoopConfig** - 描述:集群HA模式时需要填写的namespace配置及其它配置 - 必选:是 + - 字段类型:Map - 默认值:无
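+
+示例:HA模式下的一个假设性hadoopConfig配置,键名与本文档hdfsreader示例中的HA配置一致(ns1、host1、host2仅为占位):
+```json
+"hadoopConfig": {
+    "dfs.ha.namenodes.ns1": "nn1,nn2",
+    "dfs.namenode.rpc-address.ns1.nn1": "host1:9000",
+    "dfs.namenode.rpc-address.ns1.nn2": "host2:9000",
+    "dfs.client.failover.proxy.provider.ns1": "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider",
+    "dfs.nameservices": "ns1"
+}
+```
+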
@@ -48,29 +64,33 @@ - **defaultFS** - 描述:Hadoop hdfs文件系统namenode节点地址 - 必选:是 + - 字段类型:String - 默认值:无 - +
- **writeMode** - 描述:写入模式,支持append和overwrite - 必填:否 + - 字段类型:String - 默认值:append - +
- **partition** - 描述:carbondata分区 - 必填:否 - - 默认值:append - + - 字段类型:String + - 默认值:无 +
- **batchSize** - 描述:批量提交条数 - 必填:否 + - 字段类型:int - 默认值:204800 - +
diff --git a/docs/offline/writer/cassandrawriter.md b/docs/offline/writer/cassandrawriter.md index 2e999d0242..e04786faca 100644 --- a/docs/offline/writer/cassandrawriter.md +++ b/docs/offline/writer/cassandrawriter.md @@ -13,91 +13,104 @@ - 描述:数据库地址 - 必选:是 - 默认值:无 - + - 字段类型:String +
- **port** - 描述:端口 - 必选:否 - 默认值:9042 - + - 字段类型:Integer +
- **username** - 描述:用户名 - 必选:否 - 默认值:无 - + - 字段类型:String +
- **password** - 描述:密码 - 必选:否 - 默认值:无 - + - 字段类型:String +
 - **useSSL**
   - 描述:是否使用SSL连接
   - 必选:否
   - 默认值:false
-
+  - 字段类型:Boolean
+
<br/>
 - **column**
   - 描述:需要写入的字段集合,为空则写入全部字段
   - 必选:否
   - 默认值:无
-
+  - 字段类型:List
+
<br/>
- **keyspace** - 描述:需要同步的表所在的keyspace - 必选:是 - 默认值:无 - + - 字段类型:String +
 - **table**
   - 描述:要写入的表
   - 必选:是
   - 默认值:无
-
+  - 字段类型:String
+
<br/>
- **batchSize** - 描述:异步写入的批次大小 - 必选:否 - 默认值:1 - + - 字段类型:Integer +
- **asyncWrite** - 描述:是否异步写入 - 必选:否 - 默认值:false - + - 字段类型:Boolean +
- **connecttionsPerHost** - 描述:分配给每个host的连接数 - 必选:否 - 默认值:8 - + - 字段类型:Integer +
 - **maxPendingPerConnection**
   - 描述:每个连接最大的未完成(pending)请求数
   - 必选:否
   - 默认值:128
-
+  - 字段类型:Integer
+
<br/>
- **consistancyLevel** - 描述:数据一致性级别。可选`ONE`、`QUORUM`、`LOCAL_QUORUM`、`EACH_QUORUM`、`ALL`、`ANY`、`TWO`、`THREE`、`LOCAL_ONE` - 必选:否 - 默认值:无 - + - 字段类型:String +
diff --git a/docs/offline/writer/clickhousewriter.md b/docs/offline/writer/clickhousewriter.md index b6fc22df1c..f985c243c9 100644 --- a/docs/offline/writer/clickhousewriter.md +++ b/docs/offline/writer/clickhousewriter.md @@ -10,71 +10,113 @@ ## 三、参数说明 +- **connection** + - 描述:数据库连接参数,包含jdbcUrl、schema、table等参数 + - 必选:是 + - 字段类型:List + - 示例:指定jdbcUrl、schema、table + ```json + "connection": [{ + "jdbcUrl": "jdbc:clickhouse://localhost:8123/database", + "table": ["table"], + "schema":"public" + }] + ``` + - 默认值:无 + +
+ - **jdbcUrl** - 描述:针对关系型数据库的jdbc连接字符串 - 必选:是 + - 字段类型:String - 默认值:无 + +
+- **schema** + - 描述:数据库schema名 + - 必选:否 + - 字段类型:String + - 默认值:无 +
+ +- **table** + - 描述:目的表的表名称。目前只支持配置单个表,后续会支持多表 + - 必选:是 + - 字段类型:List + - 默认值:无 + +
- **username** - 描述:数据源的用户名 - 必选:是 + - 字段类型:String - 默认值:无 - +
- **password** - 描述:数据源指定用户名的密码 - 必选:是 + - 字段类型:String - 默认值:无 - +
- **column** - 描述:目的表需要写入数据的字段,字段之间用英文逗号分隔。例如: "column": ["id","name","age"]。 - 必选:是 - 默认值:否 + - 字段类型:List - 默认值:无 +
+- **fullcolumn**
+  - 描述:目的表中的所有字段,字段之间用英文逗号分隔。例如: "fullcolumn": ["id","name","age","hobby"],如果不配置,将在系统表中获取
+  - 必选:否
+  - 字段类型:List
+  - 默认值:无
+
+<br />
- **preSql**
  - 描述:写入数据到目的表前,会先执行这里的一组标准语句
  - 必选:否
+  - 字段类型:List
  - 默认值:无
-
+<br />
- **postSql**
  - 描述:写入数据到目的表后,会执行这里的一组标准语句
  - 必选:否
+  - 字段类型:List
  - 默认值:无
-
-
-- **table**
-  - 描述:目的表的表名称。目前只支持配置单个表,后续会支持多表
-  - 必选:是
-  - 默认值:无
-
-
+<br />
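+preSql 与 postSql 均以JSON数组形式配置,数组的每个元素为一条完整的SQL语句。下面是一个片段示意(表名 test、test_bak 均为假设值):
+```json
+"preSql": ["truncate table test"],
+"postSql": ["insert into test_bak select * from test"]
+```
+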
- **writeMode** - 描述:控制写入数据到目标表采用 `insert into` 语句,只支持insert操作 - 必选:是 - 所有选项:insert + - 字段类型:String - 默认值:insert - +
- **batchSize** - 描述:一次性批量提交的记录数大小,该值可以极大减少FlinkX与数据库的网络交互次数,并提升整体吞吐量。但是该值设置过大可能会造成FlinkX运行进程OOM情况 - 必选:否 + - 字段类型:int - 默认值:1024 + +
-
** ## 四、配置示例 ```json @@ -90,8 +132,16 @@ "type": "int" }, { - "name": "age", + "name": "user_id", "type": "int" + }, + { + "name" : "name", + "type" : "string" + }, + { + "name" : "eventDate", + "type" : "date" } ] }, @@ -101,29 +151,12 @@ "name": "clickhousewriter", "parameter": { "connection": [{ - "jdbcUrl": "jdbc:clickhouse://0.0.0.1:8123/dtstack", - "table": [ - "tableTest" - ] + "jdbcUrl": "jdbc:clickhouse://localhost:8123/database", + "table": ["test"] }], "username": "username", "password": "password", - "column": [ - { - "name": "id", - "type": "BIGINT", - "key": "id" - }, - { - "name": "user_id", - "type": "BIGINT", - "key": "user_id" - }, - { - "name": "name", - "type": "varchar", - "key": "name" - }], + "column": ["id","user_id","name","eventDate"], "writeMode": "insert", "batchSize": 1024, "preSql": [], diff --git a/docs/offline/writer/db2writer.md b/docs/offline/writer/db2writer.md index e7b66dd319..f7d8ecf731 100644 --- a/docs/offline/writer/db2writer.md +++ b/docs/offline/writer/db2writer.md @@ -2,83 +2,126 @@ ## 一、插件名称 -名称:**db2writer**
** +名称:**db2writer**
## 二、支持的数据源版本 -**DB2 9、10**
** +**DB2 9、10**
## 三、参数说明 -- **jdbcUrl** - - 描述:针对关系型数据库的jdbc连接字符串 +- **connection** + - 描述:数据库连接参数,包含jdbcUrl、schema、table等参数 - 必选:是 + - 字段类型:List + - 示例:指定jdbcUrl、schema、table + ```json + "connection": [{ + "jdbcUrl": "jdbc:db2://localhost:50000/database", + "table": ["table"], + "schema":"public" + }] + ``` - 默认值:无 +
+- **jdbcUrl** + - 描述:针对关系型数据库的jdbc连接字符串 + - 必选:是 + - 字段类型:String + - 默认值:无 + +
+ +- **schema** + - 描述:数据库schema名 + - 必选:否 + - 字段类型:String + - 默认值:无 +
+ +- **table** + - 描述:目的表的表名称。目前只支持配置单个表,后续会支持多表 + - 必选:是 + - 字段类型:List + - 默认值:无 + +
+ - **username** - 描述:数据源的用户名 - 必选:是 + - 字段类型:String - 默认值:无 - +
- **password** - 描述:数据源指定用户名的密码 - 必选:是 + - 字段类型:String - 默认值:无 - +
- **column** - 描述:目的表需要写入数据的字段,字段之间用英文逗号分隔。例如: "column": ["id","name","age"] - 必选:是 - 默认值:否 + - 字段类型:List - 默认值:无 +
+- **fullcolumn**
+  - 描述:目的表中的所有字段,字段之间用英文逗号分隔。例如: "fullcolumn": ["id","name","age","hobby"],如果不配置,将在系统表中获取
+  - 必选:否
+  - 字段类型:List
+  - 默认值:无
+
+<br />
- **preSql**
  - 描述:写入数据到目的表前,会先执行这里的一组标准语句
  - 必选:否
+  - 字段类型:List
  - 默认值:无
-
+<br />
- **postSql**
  - 描述:写入数据到目的表后,会执行这里的一组标准语句
  - 必选:否
+  - 字段类型:List
  - 默认值:无
-
-
-- **table**
-  - 描述:目的表的表名称。目前只支持配置单个表,后续会支持多表
-  - 必选:是
-  - 默认值:无
-
-
+<br />
- **writeMode** - 描述:控制写入数据到目标表采用 `insert into` 或者 `merge into`  语句 - 必选:是 - - 所有选项:insert/update + - 所有选项:insert、update + - 字段类型:String - 默认值:insert - +
- **batchSize** - 描述:一次性批量提交的记录数大小,该值可以极大减少FlinkX与数据库的网络交互次数,并提升整体吞吐量。但是该值设置过大可能会造成FlinkX运行进程OOM情况 - 必选:否 + - 字段类型:int - 默认值:1024 - +
- **updateKey** - - 描述:当写入模式为update时,需要指定此参数的值为唯一索引字段 + - 描述:当写入模式为update时,需要指定此参数的值为`唯一索引字段` - 注意: - 采用`merge into`语法,对目标表进行匹配查询,匹配成功时更新,不成功时插入; - 必选:否 + - 字段类型:Map + - 示例:"updateKey": {"key": ["id"]} - 默认值:无 ** @@ -120,33 +163,12 @@ "name": "db2writer", "parameter": { "connection": [{ - "jdbcUrl": "jdbc:db2://localhost:50000/sample", - "table": [ - "staff" - ] + "jdbcUrl": "jdbc:db2://localhost:50000/database", + "table": ["table"] }], - "username": "user", + "username": "username", "password": "password", - "column": [ - { - "name": "id", - "type": "SMALLINT", - "key": "id" - }, - { - "name": "name", - "type": "VARCHAR", - "key": "user_id" - }, - { - "name": "dept", - "type": "SMALLINT", - "key": "name" - },{ - "name": "job", - "type": "VARCHAR" - } - ], + "column": ["id","name","dept","job"], "writeMode": "insert", "batchSize": 1024, "preSql": [], @@ -210,39 +232,17 @@ "name": "db2writer", "parameter": { "connection": [{ - "jdbcUrl": "jdbc:db2://localhost:50000/sample", - "table": [ - "staff" - ] + "jdbcUrl": "jdbc:db2://localhost:50000/database", + "table": ["table"] }], - "username": "user", + "username": "username", "password": "password", - "column": [ - { - "name": "id", - "type": "SMALLINT", - "key": "id" - }, - { - "name": "name", - "type": "VARCHAR", - "key": "user_id" - }, - { - "name": "dept", - "type": "SMALLINT", - "key": "name" - },{ - "name": "job", - "type": "VARCHAR" - } - ], + "column": ["id","name","dept","job"], "writeMode": "update", "updateKey": {"key": ["id"]}, "batchSize": 1024, "preSql": [], - "postSql": [], - "updateKey": {} + "postSql": [] } } } diff --git a/docs/offline/writer/dmwriter.md b/docs/offline/writer/dmwriter.md index ec37a9f433..15d2893281 100644 --- a/docs/offline/writer/dmwriter.md +++ b/docs/offline/writer/dmwriter.md @@ -9,80 +9,125 @@ ## 三、参数说明 +- **connection** + - 描述:数据库连接参数,包含jdbcUrl、schema、table等参数 + - 必选:是 + - 字段类型:List + - 示例:指定jdbcUrl、schema、table + ```json + "connection": [{ + "jdbcUrl": "jdbc:dm://localhost:5236", + "table": ["table"], + "schema":"public" + }] + ``` + - 默认值:无 + +
+ - **jdbcUrl** - - 描述:针对关系型数据库的jdbc连接字符串 + - 描述:针对关系型数据库的jdbc连接字符串 - 必选:是 + - 字段类型:String + - 默认值:无 + +
+ +- **schema** + - 描述:数据库schema名 + - 必选:否 + - 字段类型:String - 默认值:无 + +
+ +- **table** + - 描述:目的表的表名称。目前只支持配置单个表,后续会支持多表 + - 必选:是 + - 字段类型:List + - 默认值:无 +
- **username** - 描述:数据源的用户名 - 必选:是 + - 字段类型:String - 默认值:无 - +
- **password** - 描述:数据源指定用户名的密码 - 必选:是 + - 字段类型:String - 默认值:无 - +
- **column** - 描述:目的表需要写入数据的字段,字段之间用英文逗号分隔。例如: "column": ["id","name","age"] - 必选:是 - 默认值:否 + - 字段类型:List - 默认值:无 +
+
+- **fullcolumn**
+  - 描述:目的表中的所有字段,字段之间用英文逗号分隔。例如: "fullcolumn": ["id","name","age","hobby"],如果不配置,将在系统表中获取
+  - 必选:否
+  - 字段类型:List
+  - 默认值:无
+<br />
- **preSql**
  - 描述:写入数据到目的表前,会先执行这里的一组标准语句
  - 必选:否
+  - 字段类型:List
  - 默认值:无
-
+<br />
- **postSql**
  - 描述:写入数据到目的表后,会执行这里的一组标准语句
  - 必选:否
+  - 字段类型:List
  - 默认值:无
-
-
-- **table**
-  - 描述:目的表的表名称。目前只支持配置单个表,后续会支持多表
-  - 必选:是
-  - 默认值:无
-
-
+<br />
- **writeMode** - 描述:控制写入数据到目标表采用 `insert into` 或者 `merge into` 语句 - 必选:是 - 所有选项:insert/update + - 字段类型:String - 默认值:insert - +
- **batchSize** - 描述:一次性批量提交的记录数大小,该值可以极大减少FlinkX与数据库的网络交互次数,并提升整体吞吐量。但是该值设置过大可能会造成FlinkX运行进程OOM情况 - 必选:否 + - 字段类型:int - 默认值:1024 - +
- **updateKey** - 描述:当写入模式为update时,需要指定此参数的值为唯一索引字段 - 注意: - 采用`merge into`语法,对目标表进行匹配查询,匹配成功时更新,不成功时插入; - 必选:否 + - 字段类型:Map + - 示例:"updateKey": {"key": ["id"]} - 默认值:无 -** + + ## 四、配置示例 #### 1、insert @@ -110,30 +155,18 @@ "writer": { "name": "dmwriter", "parameter": { - "username": "SYSDBA", - "password": "SYSDBA", + "username": "username", + "password": "password", "connection": [ { "jdbcUrl": "jdbc:dm://localhost:5236", - "table": [ - "PERSON.STUDENT" - ] + "table": ["table"] } ], - "session": [], "preSql": [], "postSql": [], - "writeMode": "insert", - "column": [ - { - "name": "ID", - "type": "int" - }, - { - "name": "AGE", - "type": "int" - } - ] + "mode": "insert", + "column": ["ID","AGE"] } } } @@ -184,31 +217,19 @@ "writer": { "name": "dmwriter", "parameter": { - "username": "SYSDBA", - "password": "SYSDBA", + "username": "username", + "password": "password", "connection": [ { "jdbcUrl": "jdbc:dm://localhost:5236", - "table": [ - "PERSON.STUDENT" - ] + "table": ["table"] } ], - "session": [], "preSql": [], "postSql": [], - "writeMode": "update", + "mode": "update", "updateKey": {"key": ["ID"]}, - "column": [ - { - "name": "ID", - "type": "int" - }, - { - "name": "AGE", - "type": "int" - } - ] + "column": ["ID","AGE"] } } } diff --git a/docs/offline/writer/eswriter.md b/docs/offline/writer/eswriter.md index 35029a6c80..e815047b1e 100644 --- a/docs/offline/writer/eswriter.md +++ b/docs/offline/writer/eswriter.md @@ -12,36 +12,42 @@ - **address** - 描述:Elasticsearch地址,单个节点地址采用host:port形式,多个节点的地址用逗号连接 - 必选:是 + - 字段类型:String - 默认值:无 +
- **username** - 描述:Elasticsearch认证用户名 - 必选:否 + - 字段类型:String - 默认值:无 - +
- **password** - 描述:Elasticsearch认证密码 - 必选:否 + - 字段类型:String - 默认值:无 - +
- **index** - 描述:Elasticsearch 索引值 - 必选:是 + - 字段类型:String - 默认值:无 - +
- **type** - 描述:Elasticsearch 索引类型 - 必选:是 + - 字段类型:String - 默认值:无 - +
- **column** @@ -54,11 +60,12 @@ ``` - 必选:是 + - 字段类型:List - 默认值:无 -- **idColumns** +- **idColumn** - 描述:用于构造文档id的若干个列,每列形式如下 普通列 @@ -78,24 +85,27 @@ - 必选:否 - 注意: - - 如果不指定idColumns属性,则会随机产生文档id + - 如果不指定idColumn属性,则会随机产生文档id - 如果指定的字段值存在重复或者指定了常数,按照es的逻辑,同样值的doc只会保留一份 + - 字段类型:List - 默认值:无 - +
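+idColumn 的一个配置片段示意,用第0列、常量“_”和第1列拼接生成文档id(列序号与常量均为假设值):
+```json
+"idColumn": [
+  {"index": 0, "type": "int"},
+  {"value": "_", "type": "string"},
+  {"index": 1, "type": "string"}
+]
+```
+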
- **bulkAction** - 描述:批量写入的记录条数 - 必选:是 + - 字段类型:int - 默认值:100 - +
- **timeout** - 描述:连接超时时间,如果bulkAction指定的数值过大,写入数据可能会超时,这时可以配置超时时间 - 必选:否 + - 字段类型:int - 默认值:无 - +
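+bulkAction 调大时可同时配置 timeout,下面是一个片段示意(数值均为假设值,应按实际数据量调整):
+```json
+"bulkAction": 1000,
+"timeout": 30000
+```
+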
@@ -128,7 +138,7 @@ "writer": { "name": "eswriter", "parameter": { - "address": "172.16.8.193:9200", + "address": "localhost:9200", "username": "elastic", "password": "abc123", "index": "tudou", diff --git a/docs/offline/writer/ftpwriter.md b/docs/offline/writer/ftpwriter.md index 9b6995ef01..dca718c91b 100644 --- a/docs/offline/writer/ftpwriter.md +++ b/docs/offline/writer/ftpwriter.md @@ -1,10 +1,9 @@ # FTP Writer - ## 一、插件名称 -名称:**ftpwriter**
+名称:**ftpwriter** + - ## 二、数据源版本 | 协议 | 是否支持 | | --- | --- | @@ -13,220 +12,222 @@ - ## 三、数据源配置 -FTP服务搭建
windows:[地址](https://help.aliyun.com/document_detail/92046.html?spm=a2c4g.11186623.6.1185.6371dcd5DOfc5z)
linux:[地址](https://help.aliyun.com/document_detail/92048.html?spm=a2c4g.11186623.6.1184.7a9a2dbcRLDNlf)
sftp服务搭建
windows:[地址](http://www.freesshd.com/)
linux:[地址](https://yq.aliyun.com/articles/435356?spm=a2c4e.11163080.searchblog.102.576f2ec1BVgWY7)
+FTP服务搭建 +windows:[地址](https://help.aliyun.com/document_detail/92046.html?spm=a2c4g.11186623.6.1185.6371dcd5DOfc5z) +linux:[地址](https://help.aliyun.com/document_detail/92048.html?spm=a2c4g.11186623.6.1184.7a9a2dbcRLDNlf) +sftp服务搭建 +windows:[地址](http://www.freesshd.com/) +linux:[地址](https://yq.aliyun.com/articles/435356?spm=a2c4e.11163080.searchblog.102.576f2ec1BVgWY7) + - ## 四、参数说明 - **protocol** - 描述:ftp服务器协议,目前支持传输协议有`ftp`、`sftp` - 必选:是 + - 字段类型:string - 默认值:无 - +
- **host** - 描述:ftp服务器地址 - 必选:是 + - 字段类型:string - 默认值:无 - +
- **port** - 描述:ftp服务器端口 - 必选:否 + - 字段类型:int - 默认值:若传输协议是sftp协议,默认值是22;若传输协议是标准ftp协议,默认值是21 - +
- **connectPattern** - 描述:协议为ftp时的连接模式,可选`pasv`,`port`,参数含义可参考:[模式说明](https://blog.csdn.net/qq_16038125/article/details/72851142) - 必选:否 + - 字段类型:string - 默认值:`PASV` - +
- **username** - 描述:ftp服务器访问用户名 - 必选:是 + - 字段类型:string - 默认值:无 - +
- **password** - 描述:ftp服务器访问密码 - 必选:否 + - 字段类型:string - 默认值:无 - +
- **path**
  - 描述:写入的远程FTP文件系统路径信息
  - 必选:是
+  - 字段类型:string
  - 默认值:无
-
+<br />
- **fieldDelimiter**
  - 描述:写入的字段分隔符
  - 必选:是
+  - 字段类型:string
  - 默认值:`,`
-
+<br />
- **encoding** - 描述:读取文件的编码配置 - 必选:否 + - 字段类型:string - 默认值:`UTF-8` - +
- **privateKeyPath** - 描述:私钥文件路径 - 必选:否 + - 字段类型:string - 默认值:无 -
+
- **writeMode** - 描述:ftpwriter写入前数据清理处理模式: - append:追加 - overwrite:覆盖 - - 注意:overwrite模式时会删除dtp当前目录下的所有文件 + - 注意:overwrite模式时会删除ftp当前目录下的所有文件 - 必选:否 + - 字段类型:string - 默认值:append - +
- **isFirstLineHeader**
  - 描述:首行是否为标题行,如果是则将字段名称写入首行
  - 必选:否
+  - 字段类型:boolean
  - 默认值:false
-
+<br />
- **timeout** - 描述:连接超时时间,单位毫秒 - 必选:否 + - 字段类型:long - 默认值:5000 - +
- **maxFileSize** - - 描述:写入hdfs单个文件最大大小,单位字节 + - 描述:写入ftp单个文件最大大小,单位字节 - 必须:否 + - 字段类型:long - 默认值:1073741824‬(1G) - +
- **column** - 描述:需要读取的字段 - 格式:指定具体信息: -```json +``` "column": [{ "name": "col1", "type": "datetime" }] ``` - - 属性说明: - - name:字段名称 - - type:字段类型,ftp读取的为文本文件,本质上都是字符串类型,这里可以指定要转成的类型 - - 必选:是 - - 默认值:无 +- 属性说明: + - name:字段名称 + - type:字段类型,ftp读取的为文本文件,本质上都是字符串类型,这里可以指定要转成的类型 +- 必选:是 +- 字段类型:数组 +- 默认值:无 - ## 五、使用示例 - -#### 1、append模式写入 +#### 1、sftp append模式写入 ```json { - "job": { - "content": [ - { - "reader": { - "parameter": { - "column": [ - { - "name": "col1", - "type": "string" - }, - { - "name": "col2", - "type": "string" - }, - { - "name": "col3", - "type": "int" - }, - { - "name": "col4", - "type": "int" - } - ], - "sliceRecordCount": [ - "100" - ] - }, - "name": "streamreader" - }, - "writer": { - "parameter": { - "path": "/data/ftp/flinkx", - "protocol": "sftp", - "port": 22, - "writeMode": "append", - "host": "localhost", - "column": [ - { - "name": "col1", - "type": "string" - }, - { - "name": "col2", - "type": "string" - }, - { - "name": "col3", - "type": "int" - }, - { - "name": "col4", - "type": "int" - } - ], - "password": "pass", - "fieldDelimiter": ",", - "encoding": "utf-8", - "username": "user" - }, - "name": "ftpwriter" - } - } - ], - "setting": { - "restore": { - "maxRowNumForCheckpoint": 0, - "isRestore": false, - "restoreColumnName": "", - "restoreColumnIndex": 0 - }, - "errorLimit": { - "record": 100 - }, - "speed": { - "bytes": 0, - "channel": 1 - } + "job": { + "content": [ + { + "reader": { + "parameter": { + "column": [ + { + "name": "name", + "type": "string" + }, + { + "name": "name", + "type": "string" + }, + { + "name": "name", + "type": "string" + } + ], + "sliceRecordCount": [ + "100" + ] + }, + "name": "streamreader" + }, + "writer": { + "parameter": { + "path": "/data/ftp", + "protocol": "sftp", + "port": 22, + "writeMode": "append", + "host": "host", + "column": [ + { + "name": "name", + "type": "string" + }, + { + "name": "name", + "type": "string" + }, + { + "name": "name", + "type": "string" + } + ], + "username": "name", + "password": "passwd", + "fieldDelimiter": ",", + "encoding": "utf-8" + }, + "name": "ftpwriter" } + } + ], + "setting": { + "restore": { + "isRestore": false + }, + "speed": { + "bytes": 0, + "channel": 1 + } } + } } ``` - #### 2、指定文件大小 ```json { @@ -265,7 +266,7 @@ FTP服务搭建
windows:[地址](https://help.aliyun.com/document_detail/ "protocol": "sftp", "port": 22, "writeMode": "append", - "host": "localhost", + "host": "host", "column": [ { "name": "col1", @@ -296,16 +297,9 @@ FTP服务搭建
windows:[地址](https://help.aliyun.com/document_detail/ ], "setting": { "restore": { - "maxRowNumForCheckpoint": 0, - "isRestore": false, - "restoreColumnName": "", - "restoreColumnIndex": 0 - }, - "errorLimit": { - "record": 100 + "isRestore": false }, "speed": { - "bytes": 0, "channel": 1 } } diff --git a/docs/offline/writer/gbasewriter.md b/docs/offline/writer/gbasewriter.md index e1ed14c344..d154e98b4d 100644 --- a/docs/offline/writer/gbasewriter.md +++ b/docs/offline/writer/gbasewriter.md @@ -10,76 +10,120 @@ ## 三、参数说明 +- **connection** + - 描述:数据库连接参数,包含jdbcUrl、schema、table等参数 + - 必选:是 + - 字段类型:List + - 示例:指定jdbcUrl、schema、table + ```json + "connection": [{ + "jdbcUrl": "jdbc:gbase://0.0.0.1:5258/database", + "table": ["table"], + "schema":"public" + }] + ``` + - 默认值:无 + +
+ - **jdbcUrl** - 描述:针对关系型数据库的jdbc连接字符串 - 必选:是 + - 字段类型:String - 默认值:无 +
+ +- **schema** + - 描述:数据库schema名 + - 必选:否 + - 字段类型:String + - 默认值:无 + +
+ + +- **table** + - 描述:目的表的表名称。目前只支持配置单个表,后续会支持多表 + - 必选:是 + - 字段类型:List + - 默认值:无 +
- **username** - 描述:数据源的用户名 - 必选:是 + - 字段类型:String - 默认值:无 - +
- **password** - 描述:数据源指定用户名的密码 - 必选:是 + - 字段类型:String - 默认值:无 - +
- **column** - 描述:目的表需要写入数据的字段,字段之间用英文逗号分隔。例如: "column": ["id","name","age"] - 必选:是 - 默认值:否 + - 字段类型:List - 默认值:无 +
+- **fullcolumn**
+  - 描述:目的表中的所有字段,字段之间用英文逗号分隔。例如: "fullcolumn": ["id","name","age","hobby"],如果不配置,将在系统表中获取
+  - 必选:否
+  - 字段类型:List
+  - 默认值:无
+
+<br />
- **preSql**
  - 描述:写入数据到目的表前,会先执行这里的一组标准语句
  - 必选:否
+  - 字段类型:List
  - 默认值:无
-
+<br />
- **postSql**
  - 描述:写入数据到目的表后,会执行这里的一组标准语句
  - 必选:否
+  - 字段类型:List
  - 默认值:无
-
-
-- **table**
-  - 描述:目的表的表名称。目前只支持配置单个表,后续会支持多表
-  - 必选:是
-  - 默认值:无
-
-
+<br />
- **writeMode** - 描述:控制写入数据到目标表采用 `insert into` 或者` merge into` 语句 - 必选:是 - 所有选项:insert/update + - 字段类型:String - 默认值:insert - +
- **updateKey** - 描述:当写入模式为update时,需要指定此参数的值为唯一索引字段 - 注意: - 采用`merge into`语法,对目标表进行匹配查询,匹配成功时更新,不成功时插入; - 必选:否 + - 字段类型:Map + - 示例:"updateKey": {"key": ["id"]} - 默认值:无 - +
- **batchSize** - 描述:一次性批量提交的记录数大小,该值可以极大减少FlinkX与数据库的网络交互次数,并提升整体吞吐量。但是该值设置过大可能会造成FlinkX运行进程OOM情况 - 必选:否 + - 字段类型:int - 默认值:1024 ** @@ -111,22 +155,12 @@ "name": "gbasewriter", "parameter": { "connection": [{ - "jdbcUrl": "jdbc:gbase://0.0.0.1:5258/dtstack", - "table": [ - "tableTest" - ] + "jdbcUrl": "jdbc:gbase://0.0.0.1:5258/database", + "table": ["tableTest"] }], "username": "username", "password": "password", - "column": [ - { - "name": "id", - "type": "INT" - }, - { - "name": "age", - "type": "INT" - }], + "column": ["id","age"], "writeMode": "insert", "batchSize": 1024, "preSql": [], @@ -173,22 +207,12 @@ "name": "gbasewriter", "parameter": { "connection": [{ - "jdbcUrl": "jdbc:gbase://0.0.0.1:5258/dtstack", - "table": [ - "tableTest" - ] + "jdbcUrl": "jdbc:gbase://0.0.0.1:5258/database", + "table": ["tableTest"] }], "username": "username", "password": "password", - "column": [ - { - "name": "id", - "type": "INT" - }, - { - "name": "age", - "type": "INT" - }], + "column": ["id","age"], "writeMode": "update", "batchSize": 1024, "updateKey": {"key": ["id"]}, diff --git a/docs/offline/writer/greenplumwriter.md b/docs/offline/writer/greenplumwriter.md index 57a069ce88..2731ea73ba 100644 --- a/docs/offline/writer/greenplumwriter.md +++ b/docs/offline/writer/greenplumwriter.md @@ -9,62 +9,101 @@ ## 三、参数说明 +- **connection** + - 描述:数据库连接参数,包含jdbcUrl、schema、table等参数 + - 必选:是 + - 字段类型:List + - 示例:指定jdbcUrl、schema、table + ```json + "connection": [{ + "jdbcUrl": "jdbc:pivotal:greenplum://localhost:5432;DatabaseName=database", + "table": ["table"], + "schema":"public" + }] + ``` + - 默认值:无 + +
+ - **jdbcUrl** - 描述:针对关系型数据库的jdbc连接字符串 - 必选:是 + - 字段类型:String - 默认值:无 +
+ +- **schema** + - 描述:数据库schema名 + - 必选:否 + - 字段类型:String + - 默认值:无 +
+ +- **table** + - 描述:目的表的表名称。目前只支持配置单个表,后续会支持多表 + - 必选:是 + - 字段类型:List + - 默认值:无 + +
- **username** - 描述:数据源的用户名 - 必选:是 + - 字段类型:String - 默认值:无 - +
- **password** - 描述:数据源指定用户名的密码 - 必选:是 + - 字段类型:String - 默认值:无 - +
- **column** - 描述:目的表需要写入数据的字段,字段之间用英文逗号分隔。例如: "column": ["id","name","age"] - 必选:是 - - 默认值:否 + - 字段类型:List - 默认值:无 +
+- **fullcolumn**
+  - 描述:目的表中的所有字段,字段之间用英文逗号分隔。例如: "fullcolumn": ["id","name","age","hobby"],如果不配置,将在系统表中获取
+  - 必选:否
+  - 字段类型:List
+  - 默认值:无
+
+<br />
- **preSql**
  - 描述:写入数据到目的表前,会先执行这里的一组标准语句
  - 必选:否
+  - 字段类型:List
  - 默认值:无
-
+<br />
- **postSql**
  - 描述:写入数据到目的表后,会执行这里的一组标准语句
  - 必选:否
+  - 字段类型:List
  - 默认值:无
-
-
-- **table**
-  - 描述:目的表的表名称。目前只支持配置单个表,后续会支持多表
-  - 必选:是
-  - 默认值:无
-
-
+<br />
- **writeMode**
  - 描述:仅支持`insert`操作,可以搭配insertSqlMode使用
  - 必选:是
+  - 字段类型:String
  - 默认值:无
-
+<br />
- **insertSqlMode** - 描述:控制写入数据到目标表采用  `COPY table_name [ ( column_name [, ...] ) ] FROM STDIN DELIMITER 'delimiter_character'`语句,提高数据的插入效率 @@ -72,13 +111,15 @@ - 为了避免`insert`过慢带来的问题,此参数被固定为`copy` - 当指定此参数时,writeMode的值必须为 `insert`,否则设置无效 - 必选:否 - - 默认值:无 - + - 字段类型:String + - 默认值:copy +
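+writeMode 与 insertSqlMode 搭配使用的片段示意:
+```json
+"writeMode": "insert",
+"insertSqlMode": "copy"
+```
+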
- **batchSize** - 描述:一次性批量提交的记录数大小,该值可以极大减少FlinkX与数据库的网络交互次数,并提升整体吞吐量。但是该值设置过大可能会造成FlinkX运行进程OOM情况 - 必选:否 + - 字段类型:int - 默认值:1024 ** @@ -107,20 +148,16 @@ "name": "greenplumwriter", "parameter": { "connection": [{ - "jdbcUrl": "jdbc:pivotal:greenplum://localhost:5432;DatabaseName=exampledb", - "table": ["tbl_pay_log_copy"] + "jdbcUrl": "jdbc:pivotal:greenplum://localhost:5432;DatabaseName=database", + "table": ["table"] }], - "username": "gpadmin", - "password": "gpadmin", - "column": [ - { - "name": "id", - "type": "int" - }], + "username": "username", + "password": "password", + "column": ["id"], "writeMode": "insert", "insertSqlMode": "copy", "batchSize": 100, - "preSql": ["TRUNCATE tbl_pay_log_copy"], + "preSql": ["TRUNCATE table"], "postSql": [] } } diff --git a/docs/offline/writer/hbasewriter.md b/docs/offline/writer/hbasewriter.md index 3429e34bea..a0c3db9ee1 100644 --- a/docs/offline/writer/hbasewriter.md +++ b/docs/offline/writer/hbasewriter.md @@ -5,128 +5,119 @@ 名称:**hbasewriter**
## 二、支持的数据源版本 -**HBase 1.3及以上**
+**HBase 1.2及以上**
## 三、参数说明 - **tablename** - 描述:hbase表名 - 必选:是 + - 字段类型:String - 默认值:无 - - - -- **hbaseConfig** - - 描述:hbase的连接配置,以json的形式组织 (见hbase-site.xml),key可以为以下七种: - -Kerberos;
hbase.security.authentication;
hbase.security.authorization;
hbase.master.kerberos.principal;
hbase.master.keytab.file;
hbase.regionserver.keytab.file;
hbase.regionserver.kerberos.principal - +
+ + - kerberos配置 + 在hbaseConfig中加入以下三条中的任一条即表明开启Kerberos配置: + ``` + "hbase.security.authentication" :"Kerberos", + "hbase.security.authorization" : "Kerberos", + "hbase.security.auth.enable" : true + ``` + 在开启kerberos后,需要根据自己的集群指定以下两个principal的value值 + ``` + "hbase.regionserver.kerberos.principal":"hbase/_HOST@DTSTACK.COM", + "hbase.master.kerberos.principal":"hbase/_HOST@DTSTACK.COM" + ``` + 还需要指定Kerberos相关文件的位置 + ``` + "principalFile": "path of keytab", + "java.security.krb5.conf": "path of krb5.conf" + ``` - 必选:是 + - 字段类型:Map - 默认值:无 - +
- **nullMode** - 描述:读取的null值时,如何处理。支持两种方式: - skip:表示不向hbase写这列; - empty:写入HConstants.EMPTY_BYTE_ARRAY,即new byte [0] - 必选:否 + - 字段类型:String - 默认值:skip - - + +
- **encoding**
  - 描述:字符编码
  - 必选:否
+  - 字段类型:String
  - 默认值:UTF-8
-
-<br />
+ +
- **walFlag** - - 描述:在HBae client向集群中的RegionServer提交数据时(Put/Delete操作),首先会先写WAL(Write Ahead Log)日志(即HLog,一个RegionServer上的所有Region共享一个HLog),只有当WAL日志写成功后,再接着写MemStore,然后客户端被通知提交数据成功;如果写WAL日志失败,客户端则被通知提交失败。关闭(false)放弃写WAL日志,从而提高数据写入的性能 + - 描述:在HBase client向集群中的RegionServer提交数据时(Put/Delete操作),首先会先写WAL(Write Ahead Log)日志(即HLog,一个RegionServer上的所有Region共享一个HLog),只有当WAL日志写成功后,再接着写MemStore,然后客户端被通知提交数据成功;如果写WAL日志失败,客户端则被通知提交失败。关闭(false)放弃写WAL日志,从而提高数据写入的性能 - 必选:否 + - 字段类型:Boolean - 默认值:false - - + +
- **writeBufferSize**
  - 描述:设置HBase client的写buffer大小,单位字节。配合autoflush使用。autoflush,开启(true)表示HBase client在写的时候有一条put就执行一次更新;关闭(false),表示HBase client在写的时候只有当put填满客户端写缓存时,才实际向HBase服务端发起写请求
  - 必选:否
+  - 字段类型:long
  - 默认值:8388608(8M)
-
-
-
-- **scanCacheSize**
-  - 描述:一次RPC请求批量读取的Results数量
-  - 必选:无
-  - 默认值:256
-
-
-
-- **scanBatchSize**
-  - 描述:每一个result中的列的数量
-  - 必选:无
-  - 默认值:100
-
-
+<br />
- **column** - - 描述:要读取的hbase字段,normal 模式与multiVersionFixedColumn 模式下必填项。 - - name:指定读取的hbase列,除了rowkey外,必须为 列族:列名 的格式; - - type:指定源数据的类型,format指定日期类型的格式,value指定当前类型为常量,不从hbase读取数据,而是根据value值自动生成对应的列。 + - 描述:要写入的hbase字段。 + - name:指定写入的hbase列,必须为 列族:列名 的格式; + - type:指定源数据的类型,format指定日期类型的格式 - 必选:是 + - 字段类型:List - 默认值:无 - - + +
- **rowkeyColumn** - - 描述:用于构造rowkey的描述信息,支持两种格式,每列形式如下 - - 字符串格式 -
字符串格式为:$(cf:col),可以多个字段组合:$(cf:col1)_$(cf:col2), -
可以使用md5函数:md5($(cf:col)) - - 数组格式 - - 普通列 -``` -{ - "index": 0, // 该列在column属性中的序号,从0开始 - "type": "string" 列的类型,默认为string -} -``` - - - 常数列 -``` -{ - "value": "ffff", // 常数值 - "type": "string" // 常数列的类型,默认为string -} -``` - - - 必选:否 -
如果不指定idColumns属性,则会随机产生文档id + - 描述:用于构造rowkey的描述信息,每列形式如下 + 字符串格式为:$(cf:col),可以多个字段组合:$(cf:col1)_$(cf:col2), + 可以使用md5函数:md5($(cf:col)) + - 必选: 是 + - 字段类型:String - 默认值:无 - - + +
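+rowkeyColumn 的一个配置示意,将 cf1:name 的md5值与 cf1:id 组合生成rowkey(列族与列名均为假设值):
+```json
+"rowkeyColumn": "md5($(cf1:name))_$(cf1:id)"
+```
+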
- **versionColumn** - - 描述:指定写入hbase的时间戳。支持:当前时间、指定时间列,指定时间,三者选一。若不配置表示用当前时间。index:指定对应reader端column的索引,从0开始,需保证能转换为long,若是Date类型,会尝试用yyyy-MM-dd HH:mm:ss和yyyy-MM-dd HH:mm:ss SSS去解析;若不指定index;value:指定时间的值,类型为字符串。配置格式如下: -``` -"versionColumn":{ -"index":1 -} -``` - - -
或者 -``` -"versionColumn":{ -"value":"123456789" -} -``` + - 描述:指定写入hbase的时间戳。支持:当前时间、指定时间列,指定时间,三者选一。若不配置表示用当前时间。index:指定对应reader端column的索引,从0开始,需保证能转换为long,若是Date类型,会尝试用yyyy-MM-dd HH:mm:ss和yyyy-MM-dd HH:mm:ss SSS去解析;若不指定index;value:指定时间的值,类型为字符串。注意,在hbase中查询默认会显示时间戳最大的数据,因此简单查询可能会出现看不到更新的情况,需要加过滤条件筛选。 + 配置格式如下: + ``` + "versionColumn":{ + "index":1 + } + ``` + 或者 + ``` + "versionColumn":{ + "value":"123456789" + } + ``` + - 必选: 否 + - 字段类型:Map + - 默认值:当前时间
## 三、配置示例 -```json + +未开启Kerberos的情况 +``` { "job" : { "content" : [ { @@ -204,6 +195,88 @@ Kerberos;
hbase.security.authentication;
hbase.security.authorizat } } ``` - -
- +开启了Kerberos的情况 +``` +{ + "job" : { + "content" : [ { + "reader": { + "name": "streamreader", + "parameter": { + "column": [ + { + "name": "id", + "type": "id" + }, + { + "name": "user_id", + "type": "int" + }, + { + "name": "name", + "type": "string" + } + ], + "sliceRecordCount" : ["100"] + } + }, + "writer": { + "name": "hbasewriter", + "parameter": { + "hbaseConfig": { + "hbase.zookeeper.property.clientPort": "2181", + "hbase.rootdir": "hdfs://ns1/hbase", + "hbase.cluster.distributed": "true", + "hbase.zookeeper.quorum": "node01,node02,node03", + "zookeeper.znode.parent": "/hbase", + "hbase.security.auth.enable": true, + "hbase.regionserver.kerberos.principal":"hbase/host@DTSTACK.COM", + "hbase.master.kerberos.principal":"hbase/host@DTSTACK.COM", + "principalFile": "path of keytab", + "useLocalFile": "true", + "java.security.krb5.conf": "path of krb5.conf" + }, + "table": "tb1", + "rowkeyColumn": "col1#col2", + "column": [ + { + "name": "cf1:id", + "type": "int" + }, + { + "name": "cf1:user_id", + "type": "int" + }, + { + "name": "cf1:name", + "type": "string" + } + ] + } + } + } ], + "setting": { + "speed": { + "channel": 1, + "bytes": 0 + }, + "errorLimit": { + "record": 100 + }, + "restore": { + "maxRowNumForCheckpoint": 0, + "isRestore": false, + "isStream" : false, + "restoreColumnName": "", + "restoreColumnIndex": 0 + }, + "log" : { + "isLogger": false, + "level" : "debug", + "path" : "", + "pattern":"" + } + } + } +} +``` diff --git a/docs/offline/writer/hdfswriter.md b/docs/offline/writer/hdfswriter.md index 97390a0a3d..6a8350785f 100644 --- a/docs/offline/writer/hdfswriter.md +++ b/docs/offline/writer/hdfswriter.md @@ -1,10 +1,9 @@ # HDFS Writer - ## 一、插件名称 -名称:**hdfswriter**
+名称:**hdfswriter** + - ## 二、数据源版本 | 协议 | 是否支持 | | --- | --- | @@ -13,100 +12,104 @@ - ## 三、数据源配置 -单机模式:[地址](http://hadoop.apache.org/docs/r2.7.6/hadoop-project-dist/hadoop-common/SingleCluster.html)
集群模式:[地址](http://hadoop.apache.org/docs/r2.7.6/hadoop-project-dist/hadoop-common/ClusterSetup.html)
+单机模式:[地址](http://hadoop.apache.org/docs/r2.7.6/hadoop-project-dist/hadoop-common/SingleCluster.html) +集群模式:[地址](http://hadoop.apache.org/docs/r2.7.6/hadoop-project-dist/hadoop-common/ClusterSetup.html) + - ## 四、参数说明 - **defaultFS** - 描述:Hadoop hdfs文件系统namenode节点地址。格式:hdfs://ip:端口;例如:hdfs://127.0.0.1:9 - 必选:是 + - 字段类型:string - 默认值:无 - +
- **hadoopConfig** - 描述:集群HA模式时需要填写的namespace配置及其它配置 - 必选:否 + - 字段类型:map - 默认值:无 +
+- **fileType** + - 描述:文件的类型,目前只支持用户配置为`text`、`orc`、`parquet` + - text:textfile文件格式 + - orc:orcfile文件格式 + - parquet:parquet文件格式 + - 必选:是 + - 字段类型:string + - 默认值:无 + + +
- **path** - 描述:数据文件的路径 - 必选:是 + - 字段类型:string - 默认值:无 +
+- **fileName**
+  - 描述:写入的目录名称
+  - 注意:不为空时,写入的路径为 path+fileName
+  - 必须:否
+  - 字段类型:string
+  - 默认值:无
+<br />
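+path 与 fileName 组合出实际写入目录的片段示意(路径与分区名均为假设值):
+```json
+"path": "hdfs://ns1/user/hive/warehouse/dev.db/test_table",
+"fileName": "pt=20201214"
+```
+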
- **filterRegex** - 描述:文件过滤正则表达式 - 必选:否 + - 字段类型:string - 默认值:无 - - -- **fileType** - - 描述:文件的类型,目前只支持用户配置为`text`、`orc`、`parquet` - - text:textfile文件格式 - - orc:orcfile文件格式 - - parquet:parquet文件格式 - - 必选:否 - - 默认值:text - - +
- **fieldDelimiter** - 描述:`fileType`为`text`时字段的分隔符 - 必选:否 + - 字段类型:string - 默认值:`\001` - +
- **encoding** - 描述:`fileType`为`text`时可配置编码格式 - 必选:否 + - 字段类型:string - 默认值:UTF-8 - +
- **maxFileSize** - 描述:写入hdfs单个文件最大大小,单位字节 - 必须:否 + - 字段类型:long - 默认值:1073741824‬(1G) - +
- **compress** - - 描述:hdfs文件压缩类型,默认不填写意味着没有压缩 + - 描述:hdfs文件压缩类型 - text:支持`GZIP`、`BZIP2`格式 - - orc:支持`SNAPPY`、`GZIP`、`BZIP`、`LZ4`格式 - - parquet:支持`SNAPPY`、`GZIP`、`LZO` + - orc:支持`SNAPPY`、`ZLIB`、`LZO`格式 + - parquet:支持`SNAPPY`、`GZIP`、`LZO`格式 - 注意:`SNAPPY`格式需要用户安装**SnappyCodec** - 必选:否 - - 默认值:无 - - - -- **compress** - - 描述:hdfs文件压缩类型,默认不填写意味着没有压缩 - - text:支持`GZIP`、`BZIP2`格式 - - orc:支持`SNAPPY`、`GZIP`、`BZIP`、`LZ4`格式 - - parquet:支持`SNAPPY`、`GZIP`、`LZO` - - 注意:`SNAPPY`格式需要用户安装**SnappyCodec** - - 必选:否 - - 默认值:无 - - - -- **fileName** - - 描述:写入的目录名称 - - 必须:否 - - 默认值:无 - + - 字段类型:string + - 默认值: + - text 默认 不进行压缩 + - orc 默认为ZLIB格式 + - parquet 默认为SNAPPY格式 +
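+例如写入orc文件并显式指定SNAPPY压缩的片段示意(SNAPPY需要用户已安装SnappyCodec):
+```json
+"fileType": "orc",
+"compress": "SNAPPY"
+```
+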
- **writeMode** - 描述:hdfswriter写入前数据清理处理模式: @@ -114,9 +117,10 @@ - overwrite:覆盖 - 注意:overwrite模式时会删除hdfs当前目录下的所有文件 - 必选:否 + - 字段类型:string - 默认值:append - +
- **column** - 描述:需要读取的字段。 @@ -128,261 +132,232 @@ }] ``` - - 属性说明: - - name:字段名称 - - type:字段类型,可以和数据库里的字段类型不一样,程序会做一次类型转换 - - 必选:是 - - 默认值:无 +- 属性说明: + - name:字段名称 + - type:字段类型,可以和源字段类型不一样,程序会做一次类型转换 +- 必选:是 +- 默认值:无 + +
+ +- **fullColumnName** + - 描述:写入的字段名称 + - 必须:否 + - 字段类型:list + - 默认值:column的name集合 +
+- **fullColumnType** + - 描述:写入的字段类型 + - 必须:否 + - 字段类型:list + - 默认值:column的type集合 + +
+- **rowGroupSize**
+  - 描述:parquet类型文件参数,指定row group的大小,单位字节
+  - 必须:否
+  - 字段类型:int
+  - 默认值:134217728(128M)
+
+<br />
+ +- **enableDictionary** + - 描述:parquet类型文件参数,是否启动字典编码 + - 必须:否 + - 字段类型:boolean + - 默认值:true - ## 五、使用示例 - #### 1、写入text文件 ```json { - "job": { - "content": [ - { - "reader": { - "parameter": { - "column": [ - { - "name": "col1", - "type": "string" - }, - { - "name": "col2", - "type": "string" - }, - { - "name": "col3", - "type": "int" - }, - { - "name": "col4", - "type": "int" - } - ], - "sliceRecordCount": [ - "100" - ] - }, - "name": "streamreader" - }, - "writer": { - "parameter": { - "path": "hdfs://ns1/flinkx/text", - "defaultFS": "hdfs://ns1", - "hadoopConfig": { - "dfs.ha.namenodes.ns1": "nn1,nn2", - "dfs.namenode.rpc-address.ns1.nn2": "flinkx02:9000", - "dfs.client.failover.proxy.provider.ns1": "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider", - "dfs.namenode.rpc-address.ns1.nn1": "flinkx01:9000", - "dfs.nameservices": "ns1" - }, - "column": [ - { - "name": "col1", - "index": 0, - "type": "string" - }, - { - "name": "col2", - "index": 1, - "type": "string" - }, - { - "name": "col3", - "index": 2, - "type": "int" - }, - { - "name": "col4", - "index": 3, - "type": "int" - } - ], - "fieldDelimiter": ",", - "fileType": "text", - "writeMode": "append" - }, - "name": "hdfswriter" - } - } - ], - "setting": { - "speed": { - "bytes": 0, - "channel": 1 - } + "job": { + "content": [ + { + "reader": { + "name": "streamreader", + "parameter": { + "column": [ + { + "name": "id", + "type": "id" + }, + { + "name": "name", + "type": "string" + } + ], + "sliceRecordCount" : ["100"] + } + }, + "writer": { + "parameter": { + "path": "hdfs://ns1/user/hive/warehouse/dev.db/test_text", + "fileName": "pt=20201214", + "column": [ + { + "name": "id", + "index": 0, + "type": "bigint" + }, + { + "name": "name", + "index": 1, + "type": "string" + } + ], + "writeMode": "overwrite", + "fieldDelimiter": "\u0001", + "encoding": "utf-8", + "defaultFS": "hdfs://ns1", + "hadoopConfig": { + "dfs.ha.namenodes.ns1" : "nn1,nn2", + "dfs.namenode.rpc-address.ns1.nn2" : "host1:9000", + "dfs.client.failover.proxy.provider.ns1" : "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider", + "dfs.namenode.rpc-address.ns1.nn1" : "host2:9000", + "dfs.nameservices" : "ns1" + }, + "fileType": "text" + }, + "name": "hdfswriter" } + } + ], + "setting": { + "restore": { + "isRestore": false + } } + } } ``` - #### 2、写入orc文件 ```json { - "job": { - "content": [ - { - "reader": { - "parameter": { - "column": [ - { - "name": "col1", - "type": "string" - }, - { - "name": "col2", - "type": "string" - }, - { - "name": "col3", - "type": "int" - }, - { - "name": "col4", - "type": "int" - } - ], - "sliceRecordCount": [ - "100" - ] - }, - "name": "streamreader" - }, - "writer": { - "parameter": { - "path": "hdfs://ns1/flinkx/text", - "defaultFS": "hdfs://ns1", - "hadoopConfig": { - "dfs.ha.namenodes.ns1": "nn1,nn2", - "dfs.namenode.rpc-address.ns1.nn2": "flinkx02:9000", - "dfs.client.failover.proxy.provider.ns1": "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider", - "dfs.namenode.rpc-address.ns1.nn1": "flinkx01:9000", - "dfs.nameservices": "ns1" - }, - "column": [ - { - "name": "col1", - "index": 0, - "type": "string" - }, - { - "name": "col2", - "index": 1, - "type": "string" - }, - { - "name": "col3", - "index": 2, - "type": "int" - }, - { - "name": "col4", - "index": 3, - "type": "int" - } - ], - "fileType": "orc", - "writeMode": "append" - }, - "name": "hdfswriter" - } - } - ], - "setting": { - "speed": { - "bytes": 0, - "channel": 1 - } + "job": { + "content": [ + { + 
"reader": { + "name": "streamreader", + "parameter": { + "column": [ + { + "name": "id", + "type": "id" + }, + { + "name": "name", + "type": "string" + } + ], + "sliceRecordCount" : ["100"] + } + }, + "writer": { + "parameter": { + "path": "hdfs://ns1/user/hive/warehouse/dev.db/test_orc", + "fileName": "pt=20201214", + "column": [ + { + "name": "id", + "index": 0, + "type": "bigint" + }, + { + "name": "name", + "index": 1, + "type": "string" + } + ], + "writeMode": "overwrite", + "fieldDelimiter": "\u0001", + "encoding": "utf-8", + "defaultFS": "hdfs://ns1", + "hadoopConfig": { + "dfs.ha.namenodes.ns1" : "nn1,nn2", + "dfs.namenode.rpc-address.ns1.nn2" : "host1:9000", + "dfs.client.failover.proxy.provider.ns1" : "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider", + "dfs.namenode.rpc-address.ns1.nn1" : "host2:9000", + "dfs.nameservices" : "ns1" + }, + "fileType": "orc" + }, + "name": "hdfswriter" } + } + ], + "setting": { + "restore": { + "isRestore": false + } } + } } ``` - #### 3、写入parquet文件 ```json { - "job": { - "content": [ - { - "reader": { - "parameter": { - "column": [ - { - "name": "col1", - "type": "string" - }, - { - "name": "col2", - "type": "string" - }, - { - "name": "col3", - "type": "int" - }, - { - "name": "col4", - "type": "int" - } - ], - "sliceRecordCount": [ - "100" - ] - }, - "name": "streamreader" - }, - "writer": { - "parameter": { - "path": "hdfs://ns1/flinkx/text", - "defaultFS": "hdfs://ns1", - "hadoopConfig": { - "dfs.ha.namenodes.ns1": "nn1,nn2", - "dfs.namenode.rpc-address.ns1.nn2": "flinkx02:9000", - "dfs.client.failover.proxy.provider.ns1": "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider", - "dfs.namenode.rpc-address.ns1.nn1": "flinkx01:9000", - "dfs.nameservices": "ns1" - }, - "column": [ - { - "name": "col1", - "index": 0, - "type": "string" - }, - { - "name": "col2", - "index": 1, - "type": "string" - }, - { - "name": "col3", - "index": 2, - "type": "int" - }, - { - "name": "col4", - "index": 3, - "type": "int" - } - ], - "fileType": "parquet", - "writeMode": "append" - }, - "name": "hdfswriter" - } - } - ], - "setting": { - "speed": { - "bytes": 0, - "channel": 1 - } + "job": { + "content": [ + { + "reader": { + "name": "streamreader", + "parameter": { + "column": [ + { + "name": "id", + "type": "id" + }, + { + "name": "name", + "type": "string" + } + ], + "sliceRecordCount" : ["100"] + } + }, + "writer": { + "parameter": { + "path": "hdfs://ns1/user/hive/warehouse/dev.db/test_parquet", + "fileName": "pt=20201214", + "column": [ + { + "name": "id", + "index": 0, + "type": "bigint" + }, + { + "name": "name", + "index": 1, + "type": "string" + } + ], + "writeMode": "overwrite", + "fieldDelimiter": "\u0001", + "encoding": "utf-8", + "defaultFS": "hdfs://ns1", + "hadoopConfig": { + "dfs.ha.namenodes.ns1" : "nn1,nn2", + "dfs.namenode.rpc-address.ns1.nn2" : "host1:9000", + "dfs.client.failover.proxy.provider.ns1" : "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider", + "dfs.namenode.rpc-address.ns1.nn1" : "host2:9000", + "dfs.nameservices" : "ns1" + }, + "fileType": "parquet" + }, + "name": "hdfswriter" } + } + ], + "setting": { + "restore": { + "isRestore": false + } } + } } ``` diff --git a/docs/offline/writer/hivewriter.md b/docs/offline/writer/hivewriter.md index 49c9842da8..58b476d8ca 100644 --- a/docs/offline/writer/hivewriter.md +++ b/docs/offline/writer/hivewriter.md @@ -1,34 +1,37 @@ # Hive Writer - ## 一、插件名称 -名称:**hivewriter**
- +名称:**hivewriter** + ## 二、支持的数据源版本 -**Hive 2.X**
- -## 三、参数说明
+**Hive 2.X** + +## 三、参数说明 + - **jdbcUrl** - 描述:连接Hive JDBC的字符串 - 必选:是 + - 字段类型:string - 默认值:无 - +
- **username** - 描述:Hive认证用户名 - 必选:否 + - 字段类型:string - 默认值:无 - +
- **password** - 描述:Hive认证密码 - 必选:否 + - 字段类型:string - 默认值:无 - +
- **fileType** - 描述:文件的类型,目前只支持用户配置为`text`、`orc`、`parquet` @@ -36,17 +39,19 @@ - orc:orcfile文件格式 - parquet:parquet文件格式 - 必选:是 + - 字段类型:string - 默认值:无 - +
- **fieldDelimiter** - 描述:hivewriter中`fileType`为`text`时字段的分隔符, - 注意:用户需要保证与创建的Hive表的字段分隔符一致,否则无法在Hive表中查到数据 - - 必选:是 + - 必选:否 + - 字段类型:string - 默认值:`\u0001` - +
- **writeMode** - 描述:hivewriter写入前数据清理处理模式: @@ -54,9 +59,10 @@ - overwrite:覆盖 - 注意:overwrite模式时会删除Hive当前分区下的所有文件 - 必选:否 + - 字段类型:string - 默认值:append - +
- **compress** - 描述:hdfs文件压缩类型,默认不填写意味着没有压缩 @@ -65,26 +71,37 @@ - parquet:支持`SNAPPY`、`GZIP`、`LZO` - 注意:`SNAPPY`格式需要用户安装**SnappyCodec** - 必选:否 - - 默认值:无 + - 字段类型:string + - 默认值: + - text 默认 不进行压缩 + - orc 默认为ZLIB格式 + - parquet 默认为SNAPPY格式 +
- **charsetName** - 描述:写入text文件的编码配置 - 必选:否 + - 字段类型:string - 默认值:UTF-8 - +
- **maxFileSize** - 描述:写入hdfs单个文件最大大小,单位字节 - 必须:否 + - 字段类型:string - 默认值:1073741824‬(1G) - +
- **tablesColumn** - - 描述:写入hive表的表结构信息,**若表不存在则会自动建表**。示例: + - 描述:写入hive表的表结构信息,**若表不存在则会自动建表**。 + - 必选:是 + - 字段类型:json + - 默认值:无 + - 示例: ```json { "kudu":[ @@ -104,45 +121,114 @@ } ``` - - 必选:是 - - 默认值:无 +
+- **distributeTable** + - 描述:如果数据来源于各个CDC数据,则将不同的表进行聚合,多张表的数据写入同一个hive表 + - 必选:否 + - 字段类型:json + - 默认值:无 + - 示例: +```json + "distributeTable" : "{\"fenzu1\":[\"table1\"],\"fenzu2\":[\"table2\",\"table3\"]}" +``` +table1的数据将写入hive表fenzu1里,table2和table3的数据将写入fenzu2里,如果配置distributeTable,则tablesColumn需要配置为如下格式: +```json +{ + "fenzu1":[ + { + "key":"id", + "type":"int" + }, + { + "key":"user_id", + "type":"int" + }, + { + "key":"name", + "type":"string" + } + ], + "fenzu2":[ + { + "key":"id", + "type":"int" + }, + { + "key":"user_id", + "type":"int" + }, + { + "key":"name", + "type":"string" + } + ] +} +``` +
- **partition** - 描述:分区字段名称 - - 必选:是 + - 必选:否 + - 字段类型:string - 默认值:`pt` - +
- **partitionType** - 描述:分区类型,包括 DAY、HOUR、MINUTE三种。**若分区不存在则会自动创建,自动创建的分区时间以当前任务运行的服务器时间为准** - - DAY:天分区,分区示例:pt=202000101 + - DAY:天分区,分区示例:pt=20200101 - HOUR:小时分区,分区示例:pt=2020010110 - MINUTE:分钟分区,分区示例:pt=202001011027 - - 必选:是 + - 必选:否 + - 字段类型:string - 默认值:无 - +
- **defaultFS** - - 描述:Hadoop hdfs文件系统namenode节点地址。格式:hdfs://ip:端口;例如:hdfs://127.0.0.1:9 + - 描述:Hadoop hdfs文件系统namenode节点地址。取core-site.xml文件里fs.defaultFS配置值 - 必选:是 + - 字段类型:string - 默认值:无 - +
- **hadoopConfig** - - 描述:集群HA模式时需要填写的namespace配置及其它配置 + - 描述:集群HA模式时需要填写的namespace配置及其它hive配置 + - 必选:否 + - 字段类型:map + - 默认值:无 + +
+
+- **rowGroupSize**
+  - 描述: parquet格式文件的row group的大小,单位字节
+  - 必选:否
+  - 字段类型:int
+  - 默认值:134217728(128M)
+
+<br />
+ +- **analyticalRules** + - 描述: 建表的动态规则获取表名,按照${XXXX}的占位符,从待写入数据(map结构)里根据key XXX 获取值进行替换,创建对应的表,并将数据写入对应的表 + - 示例:stream_${schema}_${table} + - 必选:否 + - 字段类型:string + - 默认值:无 + +
+ +- **schema** + - 描述: 自动建表时,analyticalRules里如果指定schema占位符,schema将此schema参数值进行替换 - 必选:否 + - 字段类型:string - 默认值:无 - ## 四、配置示例 - #### 1、写入text ```json { @@ -186,13 +272,56 @@ "partitionType" : "DAY", "defaultFS" : "hdfs://ns1", "hadoopConfig" : { - "dfs.ha.namenodes.ns1" : "nn1,nn2", - "dfs.namenode.rpc-address.ns1.nn2" : "kudu2:9000", - "dfs.client.failover.proxy.provider.ns1" : "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider", - "dfs.namenode.rpc-address.ns1.nn1" : "kudu1:9000", - "dfs.nameservices" : "ns1", - "fs.hdfs.impl.disable.cache" : "true", - "fs.hdfs.impl" : "org.apache.hadoop.hdfs.DistributedFileSystem" + "javax.jdo.option.ConnectionDriverName": "com.mysql.jdbc.Driver", + "dfs.replication": "2", + "dfs.ha.fencing.ssh.private-key-files": "~/.ssh/id_rsa", + "dfs.nameservices": "ns1", + "fs.hdfs.impl.disable.cache": "true", + "dfs.safemode.threshold.pct": "0.5", + "dfs.ha.namenodes.ns1": "nn1,nn2", + "dfs.journalnode.rpc-address": "0.0.0.0:8485", + "dfs.journalnode.http-address": "0.0.0.0:8480", + "dfs.namenode.rpc-address.ns1.nn2": "kudu2new:9000", + "dfs.namenode.rpc-address.ns1.nn1": "kudu1new:9000", + "hive.metastore.warehouse.dir": "/user/hive/warehouse", + "hive.server2.webui.host": "172.16.10.34", + "hive.metastore.schema.verification": "false", + "hive.server2.support.dynamic.service.discovery": "true", + "javax.jdo.option.ConnectionPassword": "abc123", + "hive.metastore.uris": "thrift://kudu1new:9083", + "hive.exec.dynamic.partition.mode": "nonstrict", + "hadoop.proxyuser.admin.hosts": "*", + "hive.zookeeper.quorum": "kudu1new:2181,kudu2new:2181,kudu3new:2181", + "ha.zookeeper.quorum": "kudu1new:2181,kudu2new:2181,kudu3new:2181", + "hive.server2.thrift.min.worker.threads": "200", + "hive.server2.webui.port": "10002", + "fs.defaultFS": "hdfs://ns1", + "hadoop.proxyuser.admin.groups": "*", + "dfs.ha.fencing.methods": "sshfence", + "dfs.client.failover.proxy.provider.ns1": "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider", + "typeName": "yarn2-hdfs2-hadoop2", + "hadoop.proxyuser.root.groups": "*", + "javax.jdo.option.ConnectionURL": "jdbc:mysql://kudu2new:3306/ide?useSSL=false", + "dfs.qjournal.write-txns.timeout.ms": "60000", + "fs.trash.interval": "30", + "hadoop.proxyuser.root.hosts": "*", + "dfs.namenode.shared.edits.dir": "qjournal://kudu1new:8485;kudu2new:8485;kudu3new:8485/namenode-ha-data", + "javax.jdo.option.ConnectionUserName": "dtstack", + "hive.server2.thrift.port": "10000", + "fs.hdfs.impl": "org.apache.hadoop.hdfs.DistributedFileSystem", + "ha.zookeeper.session-timeout.ms": "5000", + "hadoop.tmp.dir": "/data/hadoop_root", + "dfs.journalnode.edits.dir": "/data/dtstack/hadoop/journal", + "hive.server2.zookeeper.namespace": "hiveserver2", + "hive.server2.enable.doAs": "/false", + "dfs.namenode.http-address.ns1.nn2": "kudu2new:50070", + "dfs.namenode.http-address.ns1.nn1": "kudu1new:50070", + "hive.exec.scratchdir": "/user/hive/warehouse", + "hive.server2.webui.max.threads": "100", + "datanucleus.schema.autoCreateAll": "true", + "hive.exec.dynamic.partition": "true", + "hive.server2.thrift.bind.host": "kudu1", + "dfs.ha.automatic-failover.enabled": "true" } } } @@ -217,7 +346,6 @@ } } ``` - #### 2、写入orc ```json { @@ -261,13 +389,56 @@ "partitionType" : "DAY", "defaultFS" : "hdfs://ns1", "hadoopConfig" : { - "dfs.ha.namenodes.ns1" : "nn1,nn2", - "dfs.namenode.rpc-address.ns1.nn2" : "kudu2:9000", - "dfs.client.failover.proxy.provider.ns1" : "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider", 
- "dfs.namenode.rpc-address.ns1.nn1" : "kudu1:9000", - "dfs.nameservices" : "ns1", - "fs.hdfs.impl.disable.cache" : "true", - "fs.hdfs.impl" : "org.apache.hadoop.hdfs.DistributedFileSystem" + "javax.jdo.option.ConnectionDriverName": "com.mysql.jdbc.Driver", + "dfs.replication": "2", + "dfs.ha.fencing.ssh.private-key-files": "~/.ssh/id_rsa", + "dfs.nameservices": "ns1", + "fs.hdfs.impl.disable.cache": "true", + "dfs.safemode.threshold.pct": "0.5", + "dfs.ha.namenodes.ns1": "nn1,nn2", + "dfs.journalnode.rpc-address": "0.0.0.0:8485", + "dfs.journalnode.http-address": "0.0.0.0:8480", + "dfs.namenode.rpc-address.ns1.nn2": "kudu2new:9000", + "dfs.namenode.rpc-address.ns1.nn1": "kudu1new:9000", + "hive.metastore.warehouse.dir": "/user/hive/warehouse", + "hive.server2.webui.host": "172.16.10.34", + "hive.metastore.schema.verification": "false", + "hive.server2.support.dynamic.service.discovery": "true", + "javax.jdo.option.ConnectionPassword": "abc123", + "hive.metastore.uris": "thrift://kudu1new:9083", + "hive.exec.dynamic.partition.mode": "nonstrict", + "hadoop.proxyuser.admin.hosts": "*", + "hive.zookeeper.quorum": "kudu1new:2181,kudu2new:2181,kudu3new:2181", + "ha.zookeeper.quorum": "kudu1new:2181,kudu2new:2181,kudu3new:2181", + "hive.server2.thrift.min.worker.threads": "200", + "hive.server2.webui.port": "10002", + "fs.defaultFS": "hdfs://ns1", + "hadoop.proxyuser.admin.groups": "*", + "dfs.ha.fencing.methods": "sshfence", + "dfs.client.failover.proxy.provider.ns1": "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider", + "typeName": "yarn2-hdfs2-hadoop2", + "hadoop.proxyuser.root.groups": "*", + "javax.jdo.option.ConnectionURL": "jdbc:mysql://kudu2new:3306/ide?useSSL=false", + "dfs.qjournal.write-txns.timeout.ms": "60000", + "fs.trash.interval": "30", + "hadoop.proxyuser.root.hosts": "*", + "dfs.namenode.shared.edits.dir": "qjournal://kudu1new:8485;kudu2new:8485;kudu3new:8485/namenode-ha-data", + "javax.jdo.option.ConnectionUserName": "dtstack", + "hive.server2.thrift.port": "10000", + "fs.hdfs.impl": "org.apache.hadoop.hdfs.DistributedFileSystem", + "ha.zookeeper.session-timeout.ms": "5000", + "hadoop.tmp.dir": "/data/hadoop_root", + "dfs.journalnode.edits.dir": "/data/dtstack/hadoop/journal", + "hive.server2.zookeeper.namespace": "hiveserver2", + "hive.server2.enable.doAs": "/false", + "dfs.namenode.http-address.ns1.nn2": "kudu2new:50070", + "dfs.namenode.http-address.ns1.nn1": "kudu1new:50070", + "hive.exec.scratchdir": "/user/hive/warehouse", + "hive.server2.webui.max.threads": "100", + "datanucleus.schema.autoCreateAll": "true", + "hive.exec.dynamic.partition": "true", + "hive.server2.thrift.bind.host": "kudu1", + "dfs.ha.automatic-failover.enabled": "true" } } } @@ -292,7 +463,6 @@ } } ``` - #### 3、写入parquet ```json { @@ -336,13 +506,56 @@ "partitionType" : "DAY", "defaultFS" : "hdfs://ns1", "hadoopConfig" : { - "dfs.ha.namenodes.ns1" : "nn1,nn2", - "dfs.namenode.rpc-address.ns1.nn2" : "kudu2:9000", - "dfs.client.failover.proxy.provider.ns1" : "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider", - "dfs.namenode.rpc-address.ns1.nn1" : "kudu1:9000", - "dfs.nameservices" : "ns1", - "fs.hdfs.impl.disable.cache" : "true", - "fs.hdfs.impl" : "org.apache.hadoop.hdfs.DistributedFileSystem" + "javax.jdo.option.ConnectionDriverName": "com.mysql.jdbc.Driver", + "dfs.replication": "2", + "dfs.ha.fencing.ssh.private-key-files": "~/.ssh/id_rsa", + "dfs.nameservices": "ns1", + "fs.hdfs.impl.disable.cache": "true", + "dfs.safemode.threshold.pct": "0.5", + 
"dfs.ha.namenodes.ns1": "nn1,nn2", + "dfs.journalnode.rpc-address": "0.0.0.0:8485", + "dfs.journalnode.http-address": "0.0.0.0:8480", + "dfs.namenode.rpc-address.ns1.nn2": "kudu2new:9000", + "dfs.namenode.rpc-address.ns1.nn1": "kudu1new:9000", + "hive.metastore.warehouse.dir": "/user/hive/warehouse", + "hive.server2.webui.host": "172.16.10.34", + "hive.metastore.schema.verification": "false", + "hive.server2.support.dynamic.service.discovery": "true", + "javax.jdo.option.ConnectionPassword": "abc123", + "hive.metastore.uris": "thrift://kudu1new:9083", + "hive.exec.dynamic.partition.mode": "nonstrict", + "hadoop.proxyuser.admin.hosts": "*", + "hive.zookeeper.quorum": "kudu1new:2181,kudu2new:2181,kudu3new:2181", + "ha.zookeeper.quorum": "kudu1new:2181,kudu2new:2181,kudu3new:2181", + "hive.server2.thrift.min.worker.threads": "200", + "hive.server2.webui.port": "10002", + "fs.defaultFS": "hdfs://ns1", + "hadoop.proxyuser.admin.groups": "*", + "dfs.ha.fencing.methods": "sshfence", + "dfs.client.failover.proxy.provider.ns1": "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider", + "typeName": "yarn2-hdfs2-hadoop2", + "hadoop.proxyuser.root.groups": "*", + "javax.jdo.option.ConnectionURL": "jdbc:mysql://kudu2new:3306/ide?useSSL=false", + "dfs.qjournal.write-txns.timeout.ms": "60000", + "fs.trash.interval": "30", + "hadoop.proxyuser.root.hosts": "*", + "dfs.namenode.shared.edits.dir": "qjournal://kudu1new:8485;kudu2new:8485;kudu3new:8485/namenode-ha-data", + "javax.jdo.option.ConnectionUserName": "dtstack", + "hive.server2.thrift.port": "10000", + "fs.hdfs.impl": "org.apache.hadoop.hdfs.DistributedFileSystem", + "ha.zookeeper.session-timeout.ms": "5000", + "hadoop.tmp.dir": "/data/hadoop_root", + "dfs.journalnode.edits.dir": "/data/dtstack/hadoop/journal", + "hive.server2.zookeeper.namespace": "hiveserver2", + "hive.server2.enable.doAs": "/false", + "dfs.namenode.http-address.ns1.nn2": "kudu2new:50070", + "dfs.namenode.http-address.ns1.nn1": "kudu1new:50070", + "hive.exec.scratchdir": "/user/hive/warehouse", + "hive.server2.webui.max.threads": "100", + "datanucleus.schema.autoCreateAll": "true", + "hive.exec.dynamic.partition": "true", + "hive.server2.thrift.bind.host": "kudu1", + "dfs.ha.automatic-failover.enabled": "true" } } } diff --git a/docs/offline/writer/kingbasewriter.md b/docs/offline/writer/kingbasewriter.md index 5844b52b90..62eac805a8 100644 --- a/docs/offline/writer/kingbasewriter.md +++ b/docs/offline/writer/kingbasewriter.md @@ -6,15 +6,48 @@ **KingBase 8.2及8.3**
## 三、参数说明 -- jdbcUrl + +- **connection** + - 描述:数据库连接参数,包含jdbcUrl、schema、table等参数 + - 必选:是 + - 字段类型:List + - 示例:指定jdbcUrl、schema、table + ```json + "connection": [{ + "jdbcUrl": "jdbc:kingbase8://localhost:54321/database", + "table": ["table"], + "schema":"public" + }] + ``` + - 默认值:无 + +
+ +- **jdbcUrl** - 描述:针对KingBase数据库的jdbc连接字符串 - 必选:是 - 字段类型:String - 默认值:无
+ + - **schema** + - 描述:数据库schema名 + - 必选:否 + - 字段类型:String + - 默认值:无 + +
+ +- **table** + - 描述:目的表的表名称。目前只支持配置单个表,后续会支持多表 + - 必选:是 + - 字段类型:List + - 默认值:无 + +
-- username +- **username** - 描述:数据源的用户名 - 必选:是 - 字段类型:String @@ -22,23 +55,15 @@
-- password +- **password** - 描述:数据源指定用户名的密码 - 必选:是 - 字段类型:String - 默认值:无
- -- schema - - 描述:写入数据库所在schema - - 必选:是 - - 字段类型:String - - 默认值:无 - -
- -- column + +- **column** - 描述:目的表需要写入数据的字段,字段之间用英文逗号分隔。例如: "column": ["id","name","age"] - 必选:是 - 默认值:否 @@ -46,32 +71,32 @@ - 默认值:无
+ + - **fullcolumn** + - 描述:目的表中的所有字段,字段之间用英文逗号分隔。例如: "column": ["id","name","age","hobby"],如果不配置,将在系统表中获取 + - 必选:否 + - 字段类型:List + - 默认值:无 + +
-- preSql +- **preSql** - 描述:写入数据到目的表前,会先执行这里的一组标准语句 - 必选:否 - - 字段类型:List + - 字段类型:String - 默认值:无
-- postSql +- **postSql** - 描述:写入数据到目的表后,会执行这里的一组标准语句 - 必选:否 - - 字段类型:List - - 默认值:无 - -
- -- table - - 描述:目的表的表名称。目前只支持配置单个表,后续会支持多表 - - 必选:是 - 字段类型:String - 默认值:无
-- writeMode +- **writeMode** - 描述:仅支持insert、update操作,可以搭配insertSqlMode使用 - 必选:是 - 字段类型:String @@ -79,16 +104,16 @@
-- insertSqlMode +- **insertSqlMode** - 描述:控制写入数据到目标表采用 COPY table_name [ ( column_name [, ...] ) ] FROM STDIN DELIMITER 'delimiter_character'语句,提高数据的插入效率 - 注意: 目前该参数值固定传入 copy,否则抛出提示为not support insertSqlMode的RuntimeException。当指定此参数时,writeMode的值必须为 insert,否则设置无效 - 必选:否 - 字段类型:String - - 默认值:无 + - 默认值:copy
-- batchSize +- **batchSize** - 描述:一次性批量提交的记录数大小,该值可以极大减少FlinkX与数据库的网络交互次数,并提升整体吞吐量。但是该值设置过大可能会造成FlinkX运行进程OOM情况 - 必选:否 - 字段类型:int @@ -96,9 +121,18 @@
+- **updateKey** + - 描述:当写入模式为update时,需要指定此参数的值为唯一索引字段 + - 注意: + - 采用`merge into`语法,对目标表进行匹配查询,匹配成功时更新,不成功时插入; + - 必选:否 + - 字段类型:Map + - 示例:"updateKey": {"key": ["id"]} + - 默认值:无 + ## 四、配置示例 1、insert -``` +```json { "job": { "content": [{ @@ -134,19 +168,7 @@ }], "username": "username", "password": "password", - "column": [ - { - "name": "id", - "type": "BIGINT" - }, - { - "name": "user_id", - "type": "BIGINT" - }, - { - "name": "name", - "type": "varchar" - }], + "column": ["id","user_id","name"], "writeMode": "insert", "batchSize": 1024, "preSql": [], @@ -167,7 +189,7 @@ } ``` 2、 insert with copy mode -``` +```json { "job": { "content": [{ @@ -203,19 +225,7 @@ }], "username": "username", "password": "password", - "column": [ - { - "name": "id", - "type": "BIGINT" - }, - { - "name": "user_id", - "type": "BIGINT" - }, - { - "name": "name", - "type": "varchar" - }], + "column": ["id","user_id","name"], "writeMode": "insert", "batchSize": 1024, "preSql": [], @@ -237,7 +247,7 @@ } ``` 3、 update -``` +```json { "job": { "content": [{ @@ -265,30 +275,15 @@ "name": "kingbasewriter", "parameter": { "connection": [{ - "jdbcUrl": "jdbc:kingbase8://localhost:54321/ide", - "table": [ - "tableTest" - ], + "jdbcUrl": "jdbc:kingbase8://localhost:54321/database", + "table": ["tableTest"], "schema":"test" }], "username": "username", "password": "password", - "column": [ - { - "name": "id", - "type": "BIGINT" - }, - { - "name": "user_id", - "type": "BIGINT" - }, - { - "name": "name", - "type": "varchar" - }], + "column": ["id","user_id","name"], "writeMode": "update", "updateKey": {"key": ["id"]}, - "column": ["id","user_id","name"], "batchSize": 1024, "preSql": [], "postSql": [] diff --git a/docs/offline/writer/kuduwriter.md b/docs/offline/writer/kuduwriter.md index 8bba48abf3..74597501a3 100644 --- a/docs/offline/writer/kuduwriter.md +++ b/docs/offline/writer/kuduwriter.md @@ -1,38 +1,49 @@ # Kudu Writer - ## 一、插件名称 -名称:**kuduwriter**
- -## 二、支持的数据源版本 -**kudu 1.10及以上**
+名称:**kuduwriter** - -## 三、参数说明 +## 二、支持的数据源版本 +**kudu 1.10及以上** -- **column** - - 描述:需要生成的字段 - - 属性说明: - - name:字段名称; - - type:字段类型; - - 必选:是 - - 默认值:无 +## 三、参数说明 - **masterAddresses** - 描述: master节点地址:端口,多个以,隔开 - 必选:是 + - 参数类型:string - 默认值:无 - +
- **table** - 描述: kudu表名 - 必选:是 + - 参数类型:string - 默认值:无 +
+ +- **column** + - 描述:需要生成的字段 + - 格式 +```json +"column": [{ + "name": "col", + "type": "string" +}] +``` + +- 属性说明: + - name:字段名称 + - type:字段类型 +- 必选:是 +- 参数类型:数组 +- 默认值:无 +
- **writeMode** - 描述: kudu数据写入模式: @@ -40,153 +51,139 @@ - 2、update - 3、upsert - 必选:是 + - 参数类型:string - 默认值:无 - +
- **flushMode** - 描述: kudu session刷新模式: - - 1、auto_flush_sync - - 2、auto_flush_background - - 3、manual_flush + - 1、auto_flush_sync 同步刷新 + - 2、auto_flush_background 后台自动刷新 + - 3、manual_flush 手动刷新 - 必选:否 + - 参数类型:string - 默认值:auto_flush_sync - +
- **batchInterval** - 描述: 单次批量写入数据条数 - 必选:否 + - 参数类型:int - 默认值:1 - +
- **authentication** - - 描述: 认证方式,如:Kerberos + - 描述:认证方式,kudu开启kerberos时需要配置authentication为Kerberos - 必选:否 + - 参数类型:string - 默认值:无 - - -- **principal** - - 描述: 用户名。 - - 必选:否 - - 默认值:无 - - - -- **keytabFile** - - 描述: keytab文件路径 - - 必选:否 - - 默认值:无 - - +
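+开启kerberos认证时的一个片段示意(principal与文件路径均为假设值,hadoopConfig内的具体键名请以实际使用的版本为准):
+```json
+"authentication": "Kerberos",
+"hadoopConfig": {
+  "principal": "kudu/_HOST@EXAMPLE.COM",
+  "principalFile": "path of keytab",
+  "java.security.krb5.conf": "path of krb5.conf"
+}
+```
+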
- **workerCount** - - 描述: worker线程数 + - 描述:worker线程数 - 必选:否 - - 默认值:默认为cpu*2 - + - 字段类型:int + - 默认值:默认为cpu核心数*2 +
- **bossCount** - - 描述: boss线程数 + - 描述:boss线程数 - 必选:否 + - 字段类型:int - 默认值:1 - +
- **operationTimeout** - - 描述: 普通操作超时时间 + - 描述:普通操作超时时间,单位毫秒 - 必选:否 + - 字段类型:long - 默认值:30000 - +
- **adminOperationTimeout** - - 描述: 管理员操作(建表,删表)超时时间 + - 描述: 管理员操作(建表,删表)超时时间,单位毫秒 - 必选:否 - - 默认值:30000 + - 字段类型:long + - 默认值:15000 +
- -- **queryTimeout** - - 描述: 连接scan token的超时时间 - - 必选:否 - - 默认值:与operationTimeout一致 - - - -- **batchSizeBytes** - - 描述: kudu scan一次性最大读取字节数 +- **hadoopConfig** + - 描述: kudu开启kerberos,需要配置kerberos相关参数 - 必选:否 - - 默认值:1048576 + - 字段类型:map + - 默认值:无 - ## 四、配置示例 ```json { "job" : { "content" : [ { - "reader" : { - "parameter" : { - "column" : [ { - "name" : "id", - "type" : "id" - }, { - "name" : "data", - "type" : "string" - } ], - "sliceRecordCount" : [ "100"] - }, - "name" : "streamreader" + "reader": { + "name": "streamreader", + "parameter": { + "column": [ + { + "name": "id", + "type": "string" + }, { + "name": "name", + "type": "string" + }, { + "name": "age", + "type": "int" + }, { + "name": "sex", + "type": "int" + } + ], + "sliceRecordCount" : [100] + } }, "writer" : { "parameter": { "column": [ { "name": "id", - "type": "long" + "type": "string" + }, { + "name": "name", + "type": "string" + }, { + "name": "age", + "type": "int" + }, { + "name": "sex", + "type": "int" } ], - "masterAddresses": "kudu1:7051,kudu2:7051,kudu3:7051", - "table": "kudu", + "masterAddresses": "host:7051", + "table": "student", "writeMode": "insert", "flushMode": "manual_flush", "batchInterval": 10000, - "authentication": "", - "principal": "", - "keytabFile": "", "workerCount": 2, - "bossCount": 1, - "operationTimeout": 30000, - "adminOperationTimeout": 30000, - "queryTimeout": 30000, - "batchSizeBytes": 1048576 - } + "bossCount": 1 + }, + "name": "kuduwriter" } } ], - "setting" : { - "restore" : { - "maxRowNumForCheckpoint" : 0, - "isRestore" : false, - "restoreColumnName" : "", - "restoreColumnIndex" : 0 - }, - "errorLimit" : { - "record" : 100 - }, - "speed" : { - "bytes" : 0, - "channel" : 1 + "setting": { + "speed": { + "channel": 1 }, - "log" : { - "isLogger": false, - "level" : "debug", - "path" : "", - "pattern":"" + "restore": { + "isRestore": false, + "isStream" : false } } } diff --git a/docs/offline/writer/mongodbwriter.md b/docs/offline/writer/mongodbwriter.md index fc49b2cd74..ffc892853c 100644 --- a/docs/offline/writer/mongodbwriter.md +++ b/docs/offline/writer/mongodbwriter.md @@ -12,75 +12,95 @@ - **url** - 描述:MongoDB数据库连接的URL字符串,详细请参考[MongoDB官方文档](https://docs.mongodb.com/manual/reference/connection-string/) - 必选:否 + - 字段类型:String - 默认值:无 +
- **hostPorts** - 描述:MongoDB的地址和端口,格式为 IP1:port,可填写多个地址,以英文逗号分隔 - 必选:是 + - 字段类型:String - 默认值:无 - +
- **username** - 描述:数据源的用户名 - 必选:否 + - 字段类型:String - 默认值:无 - +
- **password** - 描述:数据源指定用户名的密码 - 必选:否 + - 字段类型:String - 默认值:无 - +
- **database** - 描述:数据库名称 - 必选:否 + - 字段类型:String - 默认值:无 - +
- **collectionName** - 描述:集合名称 - 必选:是 + - 字段类型:String - 默认值:无 - +
- **column** - 描述:MongoDB 的文档列名,配置为数组形式表示 MongoDB 的多个列 - name:Column 的名字 - type:Column 的类型 - - splitter:特殊分隔符,当且仅当要处理的字符串要用分隔符分隔为字符数组 Array 时,才使用这个参数。通过这个参数指定的分隔符,将字符串分隔存储到 MongoDB 的数组中 + - splitter:特殊分隔符,当且仅当要处理的字符串要用分隔符分隔为字符数组 Array 时,才使用这个参数。通过这个参数指定的分隔符,将字符串分隔存储到 MongoDB 的数组中 + 当指定了这个参数,写入mongodb的数组类型只能为string + 示例 + ``` + "column": [{ + "name": "col", + "type": "Array", + "splitter":"," + }] + ``` - 必选:是 + - 字段类型:List - 默认值:无 - +
- **replaceKey**
  - 描述:replaceKey 指定了每行记录的业务主键,用来做覆盖时使用(不支持 replaceKey为多个键,一般是指MongoDB中的主键)
  - 必选:否
+  - 字段类型:String
  - 默认值:无
-
+<br />
- **writeMode** - 描述:写入模式,当 batchSize > 1 时不支持 replace 和 update 模式 - 必选:是 - 所有选项:insert/replace/update + - 字段类型:String - 默认值:insert - +
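+writeMode 为 replace 时搭配 replaceKey 的片段示意(字段名为假设值;此时 batchSize 需保持为1):
+```json
+"writeMode": "replace",
+"replaceKey": "id",
+"batchSize": 1
+```
+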
- **batchSize** - 描述:一次性批量提交的记录数大小,该值可以极大减少FlinkX与MongoDB的网络交互次数,并提升整体吞吐量。但是该值设置过大可能会造成FlinkX运行进程OOM情况 - 必选:否 + - 字段类型:int - 默认值:1 - +
diff --git a/docs/offline/writer/mysqlwriter.md b/docs/offline/writer/mysqlwriter.md index c7342458cb..39d1bd2ff0 100644 --- a/docs/offline/writer/mysqlwriter.md +++ b/docs/offline/writer/mysqlwriter.md @@ -9,70 +9,110 @@ ## 三、参数说明
+- **connection** + - 描述:数据库连接参数,包含jdbcUrl、schema、table等参数 + - 必选:是 + - 字段类型:List + - 示例:指定jdbcUrl、schema、table + ```json + "connection": [{ + "jdbcUrl": "jdbc:mysql://0.0.0.1:3306/database?useSSL=false", + "table": ["table"], + "schema":"public" + }] + ``` + - 默认值:无 + +
+ - **jdbcUrl** - 描述:针对关系型数据库的jdbc连接字符串 - 必选:是 + - 字段类型:String + - 默认值:无 + +
+ +- **schema** + - 描述:数据库schema名 + - 必选:否 + - 字段类型:String - 默认值:无 +
+- **table** + - 描述:目的表的表名称。目前只支持配置单个表,后续会支持多表 + - 必选:是 + - 字段类型:List + - 默认值:无 + +
- **username** - 描述:数据源的用户名 - 必选:是 + - 字段类型:String - 默认值:无 - +
- **password** - 描述:数据源指定用户名的密码 - 必选:是 + - 字段类型:String - 默认值:无 - +
- **column** - 描述:目的表需要写入数据的字段,字段之间用英文逗号分隔。例如: "column": ["id","name","age"] - 必选:是 - - 默认值:否 + - 字段类型:List - 默认值:无 +
+- **fullcolumn** + - 描述:目的表中的所有字段,字段之间用英文逗号分隔。例如: "column": ["id","name","age","hobby"],如果不配置,将在系统表中获取 + - 必选:否 + - 字段类型:List + - 默认值:无 + +
- **preSql** - 描述:写入数据到目的表前,会先执行这里的一组标准语句 - 必选:否 + - 字段类型:String - 默认值:无 - +
- **postSql** - 描述:写入数据到目的表后,会执行这里的一组标准语句 - 必选:否 + - 字段类型:String - 默认值:无 - - -- **table** - - 描述:目的表的表名称。目前只支持配置单个表,后续会支持多表 - - 必选:是 - - 默认值:无 - - +
- **writeMode** - 描述:控制写入数据到目标表采用 `insert into` 或者 `replace into` 或者 `ON DUPLICATE KEY UPDATE` 语句 - 必选:是 - 所有选项:insert/replace/update + - 字段类型:String - 默认值:insert - +
- **batchSize** - 描述:一次性批量提交的记录数大小,该值可以极大减少FlinkX与数据库的网络交互次数,并提升整体吞吐量。但是该值设置过大可能会造成FlinkX运行进程OOM情况 - 必选:否 + - 字段类型:int - 默认值:1024 - +
- **updateKey** - 描述:当写入模式为update和replace时,需要指定此参数的值为唯一索引字段 @@ -80,6 +120,8 @@ - 如果此参数为空,并且写入模式为update和replace时,应用会自动获取数据库中的唯一索引; - 如果数据表没有唯一索引,但是写入模式配置为update和replace,应用会以insert的方式写入数据; - 必选:否 + - 字段类型:Map + - 示例:"updateKey": {"key": ["id"]} - 默认值:无 @@ -116,16 +158,16 @@ "writer": { "name": "mysqlwriter", "parameter": { - "username": "dtstack", - "password": "abc123", + "username": "username", + "password": "password", "connection": [ { - "jdbcUrl": "jdbc:mysql://kudu3:3306/tudou?useSSL=false", - "table": ["kudu"] + "jdbcUrl": "jdbc:mysql://0.0.0.1:3306/database?useSSL=false", + "table": ["table"] } ], - "preSql": ["truncate table kudu;"], - "postSql": ["update kudu set user_id = 1;"], + "preSql": ["truncate table table;"], + "postSql": ["update table set user_id = 1;"], "writeMode": "insert", "column": ["id","user_id","name"], "batchSize": 1024 @@ -187,12 +229,12 @@ "writer": { "name": "mysqlwriter", "parameter": { - "username": "dtstack", - "password": "abc123", + "username": "username", + "password": "password", "connection": [ { - "jdbcUrl": "jdbc:mysql://kudu3:3306/tudou?useSSL=false", - "table": ["kudu"] + "jdbcUrl": "jdbc:mysql://0.0.0.1:3306/database?useSSL=false", + "table": ["table"] } ], "preSql": [], @@ -259,12 +301,12 @@ "writer": { "name": "mysqlwriter", "parameter": { - "username": "dtstack", - "password": "abc123", + "username": "username", + "password": "password", "connection": [ { - "jdbcUrl": "jdbc:mysql://kudu3:3306/tudou?useSSL=false", - "table": ["kudu"] + "jdbcUrl": "jdbc:mysql://0.0.0.1:3306/database?useSSL=false", + "table": ["table"] } ], "preSql": [], diff --git a/docs/offline/writer/odpswriter.md b/docs/offline/writer/odpswriter.md index 64e14e4c4e..76f5c80ca4 100644 --- a/docs/offline/writer/odpswriter.md +++ b/docs/offline/writer/odpswriter.md @@ -1,63 +1,104 @@ # ODPS Writer - ## 一、插件名称 -名称:**odpswriter**
-
-## 二、参数说明
-
-- **odpsConfig**
-  - 描述:ODPS系统配置参数,包含以下参数
-    - accessId:ODPS系统登录ID
-    - accessKey:ODPS系统登录Key
-    - project:读取数据表所在的 ODPS 项目名称(大小写不敏感)
-    - packageAuthorizedProject:ODPS认证项目,不填默认为project值
-    - accountType:ODPS账户类型,默认为aliyun
-    - odpsServer:ODPS服务URL,默认为[http://service.odps.aliyun.com/api](http://service.odps.aliyun.com/api)
-  - 必选:是
-  - 默认值:无
+名称:**odpswriter**

+## 二、参数说明
- **table**
  - 描述:写入数据表的表名称(大小写不敏感)
  - 必选:是
+  - 字段类型:string
  - 默认值:无
-
-
+
- **partition** - 描述:需要写入数据表的分区信息,必须指定到最后一级分区。把数据写入一个三级分区表,必须配置到最后一级分区,例如pt=20150101/type=1/biz=2。 - - 必选:**如果是分区表,该选项必填,如果非分区表,该选项不可填写。** - - 默认值:空 - + - 注意:**如果是分区表,该选项必填,如果非分区表,该选项不可填写** + - 必选:否 + - 字段类型:string + - 默认值:无 -
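+
+例如,向上文所述的三级分区表写入时,partition 可以这样配置(分区值为示意):
+```json
+"partition": "pt=20150101/type=1/biz=2"
+```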
+
- **column**
-  - 描述:需要导入的字段列表,当导入全部字段时,可以配置为"column": ["*"], 当需要插入部分odps列填写部分列,例如"column": ["id", "name"]。ODPSWriter支持列筛选、列换序,例如表有a,b,c三个字段,用户只同步c,b两个字段。可以配置成["c","b"], 在导入过程中,字段a自动补空,设置为null。
-  - 必选:否
-  - 默认值:无
+  - 描述:需要写入的字段
+  - 格式:
+```json
+"column": [{
+    "name": "col",
+    "type": "datetime"
+}]
+```
+  - 属性说明:
+    - name:字段名称,必填
+    - type:字段类型,可以和数据库里的字段类型不一样,程序会做一次类型转换,必填
+  - 必选:是
+  - 字段类型:List
+  - 默认值:无
-
+
- **writeMode** - 描述:写入模式,支持append和overwrite - 必填:否 + - 字段类型:string - 默认值:append - +
- **bufferSize**
-  - 描述:写入缓存大小,单位兆,odps写入数据时会先缓存,达到一定值后才会写入数据,如果写入数据时出现内存溢出,可以降低此参数的值。
+  - 描述:写入缓存大小,单位M,odps写入数据时会先缓存,达到一定值后才会写入数据,如果写入数据时出现内存溢出,可以降低此参数的值。
   - 必填:否
-  - 默认值:64
+  - 字段类型:long
+  - 默认值:64
+
+<br/>
+
+- **odpsConfig**
+  - 描述:ODPS的配置信息
+  - 必选:是
+  - 字段类型:map
+  - 默认值:无
+  - 可选配置:
+    - **odpsServer**
+      - 描述:odps服务地址
+      - 必选:否
+      - 字段类型:string
+      - 默认值:[http://service.odps.aliyun.com/api](http://service.odps.aliyun.com/api)
+    - **accessId**
+      - 描述:ODPS系统登录ID
+      - 必选:是
+      - 字段类型:string
+      - 默认值:无
+    - **accessKey**
+      - 描述:ODPS系统登录Key
+      - 必选:是
+      - 字段类型:string
+      - 默认值:无
+    - **project**
+      - 描述:写入数据表所在的 ODPS 项目名称(大小写不敏感)
+      - 必选:是
+      - 字段类型:string
+      - 默认值:无
+    - **packageAuthorizedProject**
+      - 描述:ODPS认证项目
+      - 注意:当 **packageAuthorizedProject** 不为空时,当前project取packageAuthorizedProject对应的值,而不是project对应的值
+      - 必选:否
+      - 字段类型:string
+      - 默认值:无
+    - **accountType**
+      - 描述:odps账户类型
+      - 注意:目前只支持 aliyun 类型
+      - 必选:否
+      - 字段类型:string
+      - 默认值:aliyun
-

## 三、配置示例

```json
{
@@ -94,23 +135,10 @@
   } ],
     "setting" : {
       "restore" : {
-        "maxRowNumForCheckpoint" : 0,
-        "isRestore" : false,
-        "restoreColumnName" : "",
-        "restoreColumnIndex" : 0
-      },
-      "errorLimit" : {
-        "record" : 100
+        "isRestore" : false
       },
       "speed" : {
-        "bytes" : 0,
         "channel" : 1
       }
     }
}
diff --git a/docs/offline/writer/oraclewriter.md b/docs/offline/writer/oraclewriter.md
index 3161b3dd3d..1c21510781 100644
--- a/docs/offline/writer/oraclewriter.md
+++ b/docs/offline/writer/oraclewriter.md
@@ -9,70 +9,110 @@
 ## 三、参数说明

+- **connection**
+  - 描述:数据库连接参数,包含jdbcUrl、schema、table等参数
+  - 必选:是
+  - 字段类型:List
+  - 示例:指定jdbcUrl、schema、table
+    ```json
+    "connection": [{
+     "jdbcUrl": "jdbc:oracle:thin:@0.0.0.1:1521:oracle",
+     "table": ["table"],
+     "schema":"public"
+    }]
+    ```
+  - 默认值:无
+
+<br/>
+ - **jdbcUrl** - 描述:针对关系型数据库的jdbc连接字符串 - 必选:是 + - 字段类型:String + - 默认值:无 + +
+ +- **schema** + - 描述:数据库schema名 + - 必选:否 + - 字段类型:String - 默认值:无 +
+ +- **table** + - 描述:目的表的表名称。目前只支持配置单个表,后续会支持多表 + - 必选:是 + - 字段类型:List + - 默认值:无 +
- **username** - 描述:数据源的用户名 - 必选:是 + - 字段类型:String - 默认值:无 - +
- **password** - 描述:数据源指定用户名的密码 - 必选:是 + - 字段类型:String - 默认值:无 - +
- **column** - 描述:目的表需要写入数据的字段,字段之间用英文逗号分隔。例如: "column": ["id","name","age"] - 必选:是 - - 默认值:否 + - 字段类型:List - 默认值:无 +
+- **fullcolumn** + - 描述:目的表中的所有字段,字段之间用英文逗号分隔。例如: "column": ["id","name","age","hobby"],如果不配置,将在系统表中获取 + - 必选:否 + - 字段类型:List + - 默认值:无 + +
- **preSql** - 描述:写入数据到目的表前,会先执行这里的一组标准语句 - 必选:否 + - 字段类型:String - 默认值:无 - +
- **postSql** - 描述:写入数据到目的表后,会执行这里的一组标准语句 - 必选:否 + - 字段类型:String - 默认值:无 - - -- **table** - - 描述:目的表的表名称。目前只支持配置单个表,后续会支持多表 - - 必选:是 - - 默认值:无 - - +
- **writeMode** - 描述:控制写入数据到目标表采用 `insert into` 或者 `replace into` 或者 `ON DUPLICATE KEY UPDATE` 语句 - 必选:是 - 所有选项:insert/update + - 字段类型:String - 默认值:insert - +
- **batchSize** - 描述:一次性批量提交的记录数大小,该值可以极大减少FlinkX与数据库的网络交互次数,并提升整体吞吐量。但是该值设置过大可能会造成FlinkX运行进程OOM情况 - 必选:否 + - 字段类型:int - 默认值:1024 - +
- **updateKey**
  - 描述:当写入模式为update和replace时,需要指定此参数的值为唯一索引字段
@@ -80,6 +120,8 @@
   - 如果此参数为空,并且写入模式为update和replace时,应用会自动获取数据库中的唯一索引;
   - 如果数据表没有唯一索引,但是写入模式配置为update和replace,应用会以insert的方式写入数据;
   - 必选:否
+  - 字段类型:Map
+  - 示例:"updateKey": {"key": ["id"]}
   - 默认值:无

## 四、配置示例

@@ -115,16 +157,16 @@
       "writer": {
         "name": "oraclewriter",
         "parameter": {
-          "username": "tudou",
-          "password": "abc123",
+          "username": "username",
+          "password": "password",
           "connection": [
             {
-              "jdbcUrl": "jdbc:oracle:thin:@kudu5:1521:helowin",
-              "table": ["TUDOU.KUDU"]
+              "jdbcUrl": "jdbc:oracle:thin:@0.0.0.1:1521:oracle",
+              "table": ["TABLE"]
             }
           ],
-          "preSql": ["delete from TUDOU.KUDU"],
-          "postSql": ["update TUDOU.KUDU set USER_ID = 1"],
+          "preSql": ["delete from TABLE"],
+          "postSql": ["update TABLE set USER_ID = 1"],
           "writeMode": "insert",
           "column": ["ID","USER_ID","NAME"],
           "batchSize": 1024
@@ -186,12 +228,12 @@
       "writer": {
         "name": "oraclewriter",
         "parameter": {
-          "username": "tudou",
-          "password": "abc123",
+          "username": "username",
+          "password": "password",
           "connection": [
             {
-              "jdbcUrl": "jdbc:oracle:thin:@kudu5:1521:helowin",
-              "table": ["TUDOU.KUDU"]
+              "jdbcUrl": "jdbc:oracle:thin:@0.0.0.1:1521:oracle",
+              "table": ["TABLE"]
             }
           ],
           "preSql": [],
diff --git a/docs/offline/writer/phoenixwriter.md b/docs/offline/writer/phoenixwriter.md
index 88100c2ac3..e65be49e7d 100644
--- a/docs/offline/writer/phoenixwriter.md
+++ b/docs/offline/writer/phoenixwriter.md
@@ -9,80 +9,121 @@
 phoenix4.12.0-HBase-1.3及以上
<br/>
## 三、参数说明 +- **connection** + - 描述:数据库连接参数,包含jdbcUrl、schema、table等参数 + - 必选:是 + - 字段类型:List + - 示例:指定jdbcUrl、schema、table + ```json + "connection": [{ + "jdbcUrl": "jdbc:phoenix:node01,node02,node03:2181", + "table": ["table"], + "schema":"public" + }] + ``` + - 默认值:无 + +
+ - **jdbcUrl** - 描述:针对关系型数据库的jdbc连接字符串 - 必选:是 + - 字段类型:String + - 默认值:无 + +
+ +- **schema** + - 描述:数据库schema名 + - 必选:否 + - 字段类型:String - 默认值:无 +
+ +- **table** + - 描述:目的表的表名称。目前只支持配置单个表,后续会支持多表 + - 必选:是 + - 字段类型:List + - 默认值:无 +
- **username** - 描述:数据源的用户名 - 必选:是 + - 字段类型:String - 默认值:无 - +
- **password** - 描述:数据源指定用户名的密码 - 必选:是 + - 字段类型:String - 默认值:无 - +
- **column**
  - 描述:目的表需要写入数据的字段,字段之间用英文逗号分隔。例如: "column": ["id","name","age"]
  - 必选:是
-  - 默认值:否
+  - 字段类型:List
  - 默认值:无
+<br/>
+ +- **fullcolumn** + - 描述:目的表中的所有字段,字段之间用英文逗号分隔。例如: "column": ["id","name","age","hobby"],如果不配置,将在系统表中获取 + - 必选:否 + - 字段类型:List + - 默认值:无 +
- **preSql** - 描述:写入数据到目的表前,会先执行这里的一组标准语句 - 必选:否 + - 字段类型:String - 默认值:无 - +
- **postSql** - 描述:写入数据到目的表后,会执行这里的一组标准语句 - 必选:否 + - 字段类型:String - 默认值:无 - - -- **table** - - 描述:目的表的表名称。目前只支持配置单个表,后续会支持多表 - - 必选:是 - - 默认值:无 - - +
- **writeMode** - 描述:控制写入数据到目标表采用 `insert into` 或者 `merge into` 语句 - 必选:是 - 所有选项:insert/update + - 字段类型:String - 默认值:insert - +
- **batchSize** - 描述:一次性批量提交的记录数大小,该值可以极大减少FlinkX与数据库的网络交互次数,并提升整体吞吐量。但是该值设置过大可能会造成FlinkX运行进程OOM情况 - 必选:否 + - 字段类型:int - 默认值:1024 - +
- **updateKey** - 描述:当写入模式为update时,需要指定此参数的值为唯一索引字段 - 注意: - 采用`merge into`语法,对目标表进行匹配查询,匹配成功时更新,不成功时插入; - 必选:否 + - 字段类型:Map + - 示例:"updateKey": {"key": ["id"]} - 默认值:无 - - ## 四、配置示例 @@ -126,18 +167,15 @@ phoenix4.12.0-HBase-1.3及以上
"column": [ { "name": "id", - "type": "BIGINT", - "key": "id" + "type": "BIGINT" }, { "name": "user_id", - "type": "BIGINT", - "key": "user_id" + "type": "BIGINT" }, { "name": "name", - "type": "varchar", - "key": "name" + "type": "varchar" }], "writeMode": "insert", "batchSize": 1024, @@ -206,25 +244,21 @@ phoenix4.12.0-HBase-1.3及以上
"column": [ { "name": "id", - "type": "BIGINT", - "key": "id" + "type": "BIGINT" }, { "name": "user_id", - "type": "BIGINT", - "key": "user_id" + "type": "BIGINT" }, { "name": "name", - "type": "varchar", - "key": "name" + "type": "varchar" }], "writeMode": "update", "updateKey": {"key": ["id"]}, "batchSize": 1024, "preSql": [], - "postSql": [], - "updateKey": {} + "postSql": [] } } }], diff --git a/docs/offline/writer/polardbwriter.md b/docs/offline/writer/polardbwriter.md index 4028d820fc..26e4b0ca01 100644 --- a/docs/offline/writer/polardbwriter.md +++ b/docs/offline/writer/polardbwriter.md @@ -9,70 +9,110 @@ ## 三、参数说明 +- **connection** + - 描述:数据库连接参数,包含jdbcUrl、schema、table等参数 + - 必选:是 + - 字段类型:List + - 示例:指定jdbcUrl、schema、table + ```json + "connection": [{ + "jdbcUrl": "jdbc:polardb://0.0.0.1:3306/database", + "table": ["table"], + "schema":"public" + }] + ``` + - 默认值:无 + +
+ - **jdbcUrl** - 描述:针对关系型数据库的jdbc连接字符串 - 必选:是 + - 字段类型:String - 默认值:无 +
+ +- **schema** + - 描述:数据库schema名 + - 必选:否 + - 字段类型:String + - 默认值:无 +
+ +- **table** + - 描述:目的表的表名称。目前只支持配置单个表,后续会支持多表 + - 必选:是 + - 字段类型:List + - 默认值:无 + +
- **username** - 描述:数据源的用户名 - 必选:是 + - 字段类型:String - 默认值:无 - +
- **password** - 描述:数据源指定用户名的密码 - 必选:是 + - 字段类型:String - 默认值:无 - +
- **column** - 描述:目的表需要写入数据的字段,字段之间用英文逗号分隔。例如: "column": ["id","name","age"]。 - 必选:是 - - 默认值:否 + - 字段类型:List - 默认值:无 +
+- **fullcolumn** + - 描述:目的表中的所有字段,字段之间用英文逗号分隔。例如: "column": ["id","name","age","hobby"],如果不配置,将在系统表中获取 + - 必选:否 + - 字段类型:List + - 默认值:无 + +
- **preSql** - 描述:写入数据到目的表前,会先执行这里的一组标准语句 - 必选:否 + - 字段类型:String - 默认值:无 - +
- **postSql** - 描述:写入数据到目的表后,会执行这里的一组标准语句 - 必选:否 + - 字段类型:String - 默认值:无 - - -- **table** - - 描述:目的表的表名称。目前只支持配置单个表,后续会支持多表 - - 必选:是 - - 默认值:无 - - +
- **writeMode** - 描述:控制写入数据到目标表采用 `insert into` 或者 `replace into` 或者 `ON DUPLICATE KEY UPDATE` 语句 - 必选:是 - 所有选项:insert/replace/update + - 字段类型:String - 默认值:insert - +
- **batchSize** - 描述:一次性批量提交的记录数大小,该值可以极大减少FlinkX与数据库的网络交互次数,并提升整体吞吐量。但是该值设置过大可能会造成FlinkX运行进程OOM情况 - 必选:否 + - 字段类型:int - 默认值:1024 - +
- **updateKey**
  - 描述:当写入模式为update和replace时,需要指定此参数的值为唯一索引字段
@@ -80,10 +120,10 @@
   - 如果此参数为空,并且写入模式为update和replace时,应用会自动获取数据库中的唯一索引;
   - 如果数据表没有唯一索引,但是写入模式配置为update和replace,应用会以insert的方式写入数据;
   - 必选:否
+  - 字段类型:Map
+  - 示例:"updateKey": {"key": ["id"]}
   - 默认值:无

## 四、配置示例

@@ -116,16 +156,16 @@
       "writer": {
         "name": "polardbwriter",
         "parameter": {
-          "username": "dtstack",
-          "password": "abc123",
+          "username": "username",
+          "password": "password",
           "connection": [
             {
-              "jdbcUrl": "jdbc:mysql://kudu3:3306/tudou?useSSL=false",
-              "table": ["kudu"]
+              "jdbcUrl": "jdbc:polardb://0.0.0.1:3306/database",
+              "table": ["table"]
             }
           ],
-          "preSql": ["truncate table kudu;"],
-          "postSql": ["update kudu set user_id = 1;"],
+          "preSql": ["truncate table table;"],
+          "postSql": ["update table set user_id = 1;"],
           "writeMode": "insert",
           "column": ["id","user_id","name"],
           "batchSize": 1024
@@ -187,12 +227,12 @@
       "writer": {
         "name": "polardbwriter",
         "parameter": {
-          "username": "dtstack",
-          "password": "abc123",
+          "username": "username",
+          "password": "password",
           "connection": [
             {
-              "jdbcUrl": "jdbc:mysql://kudu3:3306/tudou?useSSL=false",
-              "table": ["kudu"]
+              "jdbcUrl": "jdbc:polardb://0.0.0.1:3306/database",
+              "table": ["table"]
             }
           ],
           "preSql": [],
@@ -259,12 +299,12 @@
       "writer": {
         "name": "polardbwriter",
         "parameter": {
-          "username": "dtstack",
-          "password": "abc123",
+          "username": "username",
+          "password": "password",
           "connection": [
             {
-              "jdbcUrl": "jdbc:mysql://kudu3:3306/tudou?useSSL=false",
-              "table": ["kudu"]
+              "jdbcUrl": "jdbc:polardb://0.0.0.1:3306/database",
+              "table": ["table"]
             }
           ],
           "preSql": [],
diff --git a/docs/offline/writer/postgresqlwriter.md b/docs/offline/writer/postgresqlwriter.md
index 4df7a7c9c5..69a008b25c 100644
--- a/docs/offline/writer/postgresqlwriter.md
+++ b/docs/offline/writer/postgresqlwriter.md
@@ -9,62 +9,101 @@
 ## 三、参数说明
<br/>
+- **connection** + - 描述:数据库连接参数,包含jdbcUrl、schema、table等参数 + - 必选:是 + - 字段类型:List + - 示例:指定jdbcUrl、schema、table + ```json + "connection": [{ + "jdbcUrl": "jdbc:postgresql://0.0.0.1:5432/postgres", + "table": ["table"], + "schema":"public" + }] + ``` + - 默认值:无 + +
+ - **jdbcUrl** - 描述:针对关系型数据库的jdbc连接字符串 - 必选:是 + - 字段类型:String - 默认值:无 +
+- **schema** + - 描述:数据库schema名 + - 必选:否 + - 字段类型:String + - 默认值:无 + +
+ +- **table** + - 描述:目的表的表名称。目前只支持配置单个表,后续会支持多表 + - 必选:是 + - 字段类型:List + - 默认值:无 + +
- **username** - 描述:数据源的用户名 - 必选:是 + - 字段类型:String - 默认值:无 - +
- **password** - 描述:数据源指定用户名的密码 - 必选:是 + - 字段类型:String - 默认值:无 - +
- **column** - 描述:目的表需要写入数据的字段,字段之间用英文逗号分隔。例如: "column": ["id","name","age"] - 必选:是 - - 默认值:否 + - 字段类型:List - 默认值:无 +
+ +- **fullcolumn** + - 描述:目的表中的所有字段,字段之间用英文逗号分隔。例如: "column": ["id","name","age","hobby"],如果不配置,将在系统表中获取 + - 必选:否 + - 字段类型:List + - 默认值:无 +
- **preSql** - 描述:写入数据到目的表前,会先执行这里的一组标准语句 - 必选:否 + - 字段类型:String - 默认值:无 - +
- **postSql** - 描述:写入数据到目的表后,会执行这里的一组标准语句 - 必选:否 + - 字段类型:String - 默认值:无 - - -- **table** - - 描述:目的表的表名称。目前只支持配置单个表,后续会支持多表 - - 必选:是 - - 默认值:无 - - +
- **writeMode** - 描述:仅支持`insert`操作,可以搭配insertSqlMode使用 - 必选:是 + - 字段类型:String - 默认值:无 - +
- **insertSqlMode** - 描述:控制写入数据到目标表采用  `COPY table_name [ ( column_name [, ...] ) ] FROM STDIN DELIMITER 'delimiter_character'`语句,提高数据的插入效率 @@ -73,14 +112,17 @@ - 目前该参数值固定传入 `copy`,否则抛出提示为`not support insertSqlMode`的`RuntimeException` - 当指定此参数时,writeMode的值必须为 `insert`,否则设置无效 - 必选:否 + - 字段类型:String - 默认值:无 - +
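+
+例如,需要通过 COPY 方式提高插入效率时,这两个参数按如下方式搭配(示意):
+```json
+"writeMode": "insert",
+"insertSqlMode": "copy"
+```
+
+<br/>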
- **batchSize**
  - 描述:一次性批量提交的记录数大小,该值可以极大减少FlinkX与数据库的网络交互次数,并提升整体吞吐量。但是该值设置过大可能会造成FlinkX运行进程OOM情况
  - 必选:否
+  - 字段类型:int
  - 默认值:1024
+

diff --git a/docs/offline/writer/rediswriter.md b/docs/offline/writer/rediswriter.md
index 416981a19a..79b2a65ece 100644
--- a/docs/offline/writer/rediswriter.md
+++ b/docs/offline/writer/rediswriter.md
@@ -13,35 +13,40 @@
   - 描述:Redis的IP地址和端口
   - 必选:是
   - 默认值:localhost:6379
-
+  - 字段类型:String
+<br/>
- **password** - 描述:数据源指定用户名的密码 - 必选:是 - 默认值:无 - + - 字段类型:String +
- **database**
  - 描述:要写入的Redis数据库
  - 必选:否
  - 默认值:0
-
+  - 字段类型:Integer
+<br/>
- **keyFieldDelimiter** - 描述:写入 Redis 的 key 分隔符。比如: key=key1\u0001id,如果 key 有多个需要拼接时,该值为必填项,如果 key 只有一个则可以忽略该配置项。 - 必选:否 - 默认值:\u0001 - + - 字段类型:String +
- **dateFormat** - 描述:写入 Redis 时,Date 的时间格式:”yyyy-MM-dd HH:mm:ss” - 必选:否 - 默认值:将日期以long类型写入 - + - 字段类型:String +
- **expireTime** @@ -49,7 +54,8 @@ - 注意:如果过期时间的秒数大于 60_60_24*30(即 30 天),则服务端认为是 Unix 时间,该时间指定了到未来某个时刻数据失效。否则为相对当前时间的秒数,该时间指定了从现在开始多长时间后数据失效。 - 必选:否 - 默认值:0(0 表示永久有效) - + - 字段类型:Long +
- **timeout** @@ -57,7 +63,8 @@ - 单位:毫秒 - 必选:否 - 默认值:30000 - + - 字段类型:Long +
- **type和mode** @@ -71,17 +78,18 @@ | set | 字符串集合 | sadd | 向 set 集合中存储这个数据,如果已经存在则覆盖 | | | zset | 有序字符串集合 | zadd | 向 zset 有序集合中存储这个数据,如果已经存在则覆盖 | 当 value 类型是 zset 时,数据源的每一行记录需要遵循相应的规范,即每一行记录除 key 以外,只能有一对 score 和 value,并且 score 必须在 value 前面,rediswriter 方能解析出哪一个 column 是 score,哪一个 column 是 value。 | | hash | 哈希 | hset | 向 hash 有序集合中存储这个数据,如果已经存在则覆盖 | 当 value 类型是 hash 时,数据源的每一行记录需要遵循相应的规范,即每一行记录除 key 以外,只能有一对 attribute 和 value,并且 attribute 必须在 value 前面,Rediswriter 方能解析出哪一个 column 是 attribute,哪一个 column 是 value。 | - - 必选:是 - 默认值:无 - + - 字段类型:String +
- **valueFieldDelimiter** - 描述:该配置项是考虑了当源数据每行超过两列的情况(如果您的源数据只有两列即 key 和 value 时,那么可以忽略该配置项,不用填写),value 类型是 string 时,value 之间的分隔符,比如 value1\u0001value2\u0001value3。 - 必选:否 - 默认值:\u0001 - + - 字段类型:String +
- **keyIndexes** @@ -89,10 +97,10 @@ - 注意:配置 keyIndexes 后,Redis Writer 会将其余的列作为 value,如果您只想同步源表的某几列作为 key,某几列作为 value,不需要同步所有字段,那么在 Reader 插件端就指定好 column 作好列筛选即可。例如:Redis中的数据为 "test,redis,First,Second",keyIndexes = [0,1] ,因此得到的key为 "test\\u0001redis", value为 "First\\u0001Second" - 必选:是 - 默认值:无 + - 字段类型:List +
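+
+结合上述参数,下面给出一个 parameter 配置示意(这里假设 type 与 mode 为两个独立字段,各取值均为示意,实际字段名以本文参数说明为准):
+```json
+"parameter": {
+    "database": 0,
+    "type": "string",
+    "mode": "set",
+    "keyIndexes": [0],
+    "keyFieldDelimiter": "\u0001",
+    "expireTime": 0
+}
+```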
-
- ## 四、 使用示例 ```json diff --git a/docs/offline/writer/saphanawriter.md b/docs/offline/writer/saphanawriter.md index 487c69e0b7..86daa8c031 100644 --- a/docs/offline/writer/saphanawriter.md +++ b/docs/offline/writer/saphanawriter.md @@ -9,70 +9,110 @@ SAP HANA 2.0及以上
## 三、参数说明 +- **connection** + - 描述:数据库连接参数,包含jdbcUrl、schema、table等参数 + - 必选:是 + - 字段类型:List + - 示例:指定jdbcUrl、schema、table + ```json + "connection": [{ + "jdbcUrl": "jdbc:sap://0.0.0.1:39017", + "table": ["table"], + "schema":"public" + }] + ``` + - 默认值:无 + +
+ - **jdbcUrl** - 描述:针对关系型数据库的jdbc连接字符串 - 必选:是 + - 字段类型:String + - 默认值:无 + +
+ +- **schema** + - 描述:数据库schema名 + - 必选:否 + - 字段类型:String - 默认值:无 +
+- **table** + - 描述:目的表的表名称。目前只支持配置单个表,后续会支持多表 + - 必选:是 + - 字段类型:List + - 默认值:无 + +
- **username** - 描述:数据源的用户名 - 必选:是 + - 字段类型:String - 默认值:无 - +
- **password** - 描述:数据源指定用户名的密码 - 必选:是 + - 字段类型:String - 默认值:无 - +
- **column** - 描述:目的表需要写入数据的字段,字段之间用英文逗号分隔。例如: "column": ["id","name","age"] - 必选:是 - - 默认值:否 + - 字段类型:List - 默认值:无 +
+ +- **fullcolumn** + - 描述:目的表中的所有字段,字段之间用英文逗号分隔。例如: "column": ["id","name","age","hobby"],如果不配置,将在系统表中获取 + - 必选:否 + - 字段类型:List + - 默认值:无 +
- **preSql** - 描述:写入数据到目的表前,会先执行这里的一组标准语句 - 必选:否 + - 字段类型:String - 默认值:无 - +
- **postSql** - 描述:写入数据到目的表后,会执行这里的一组标准语句 - 必选:否 + - 字段类型:String - 默认值:无 - - -- **table** - - 描述:目的表的表名称。目前只支持配置单个表,后续会支持多表 - - 必选:是 - - 默认值:无 - - +
- **writeMode** - 描述:控制写入数据到目标表采用 `insert into` 或者 `replace into` 或者 `ON DUPLICATE KEY UPDATE` 语句 - 必选:是 - 所有选项:insert/update + - 字段类型:String - 默认值:insert - +
- **batchSize** - 描述:一次性批量提交的记录数大小,该值可以极大减少FlinkX与数据库的网络交互次数,并提升整体吞吐量。但是该值设置过大可能会造成FlinkX运行进程OOM情况 - 必选:否 + - 字段类型:int - 默认值:1024 - +
- **updateKey** - 描述:当写入模式为update和replace时,需要指定此参数的值为唯一索引字段 @@ -80,10 +120,10 @@ SAP HANA 2.0及以上
- 如果此参数为空,并且写入模式为update和replace时,应用会自动获取数据库中的唯一索引; - 如果数据表没有唯一索引,但是写入模式配置为update和replace,应用会以insert的方式写入数据; - 必选:否 + - 字段类型:Map + - 示例:"updateKey": {"key": ["id"]} - 默认值:无 - - ## 四、配置示例 ```json @@ -109,9 +149,9 @@ SAP HANA 2.0及以上
"parameter": { "connection": [ { - "jdbcUrl": "jdbc:sap://kudu3:39017", + "jdbcUrl": "jdbc:sap://0.0.0.1:39017", "table": [ - "SYS.P_ROLES_" + "TABLE" ] } ], diff --git a/docs/offline/writer/sqlserverwriter.md b/docs/offline/writer/sqlserverwriter.md index 4c3317ab49..95181476be 100644 --- a/docs/offline/writer/sqlserverwriter.md +++ b/docs/offline/writer/sqlserverwriter.md @@ -9,38 +9,82 @@ ## 三、参数说明 +- **connection** + - 描述:数据库连接参数,包含jdbcUrl、schema、table等参数 + - 必选:是 + - 字段类型:List + - 示例:指定jdbcUrl、schema、table + ```json + "connection": [{ + "jdbcUrl": "jdbc:jtds:sqlserver://0.0.0.1:1433;DatabaseName=DTstack", + "table": ["table"], + "schema":"public" + }] + ``` + - 默认值:无 + +
+ - **jdbcUrl** - 描述:使用开源的jtds驱动连接 而非Microsoft的官方驱动
jdbcUrl参考文档:[jtds驱动官方文档](http://jtds.sourceforge.net/faq.html) - 必选:是 + - 字段类型:String - 默认值:无 +
+- **schema** + - 描述:数据库schema名 + - 必选:否 + - 字段类型:String + - 默认值:无 + +
+ +- **table** + - 描述:目的表的表名称。目前只支持配置单个表,后续会支持多表 + - 必选:是 + - 字段类型:List + - 默认值:无 + +
- **username** - 描述:数据源的用户名 - 必选:是 + - 字段类型:String - 默认值:无 - +
- **password** - 描述:数据源指定用户名的密码 - 必选:是 + - 字段类型:String - 默认值:无 - +
- **column**
  - 描述:目的表需要写入数据的字段,字段之间用英文逗号分隔。例如: "column": ["id","name","age"]
  - 必选:是
-  - 默认值:否
+  - 字段类型:List
  - 默认值:无

+<br/>
+- **fullcolumn**
+  - 描述:目的表中的所有字段,字段之间用英文逗号分隔。例如: "column": ["id","name","age","hobby"],如果不配置,将在系统表中获取
+  - 必选:否
+  - 字段类型:List
+  - 默认值:无
+
+<br/>
- **presql** - 描述:写入数据到目的表前,会先执行这里的一组标准语句 - 必选:否 + - 字段类型:String - 默认值:无
@@ -48,37 +92,43 @@ - **postSql** - 描述:写入数据到目的表后,会执行这里的一组标准语句 - 必选:否 + - 字段类型:String - 默认值:无 +
+- **withNoLock** + - 描述:是否在sql语句后面添加 with(nolock) + - 必选:否 + - 字段类型:Boolean + - 默认值:false -- **table** - - 描述:目的表的表名称。目前只支持配置单个表,后续会支持多表 - - 必选:是 - - 默认值:无 - - +
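+
+例如,需要在查询语句后追加 with(nolock) 时(示意):
+```json
+"withNoLock": true
+```
+
+<br/>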
- **writeMode** - 描述:控制写入数据到目标表采用 `insert into` 或者` merge into` 语句 - 必选:是 - 所有选项:insert/update + - 字段类型:String - 默认值:insert - +
- **updateKey** - 描述:当写入模式为update时,需要指定此参数的值为唯一索引字段 - 注意: - 采用`merge into`语法,对目标表进行匹配查询,匹配成功时更新,不成功时插入; - 必选:否 + - 字段类型:Map + - 示例:"updateKey": {"key": ["id"]} - 默认值:无 - +
- **batchSize** - 描述:一次性批量提交的记录数大小,该值可以极大减少FlinkX与数据库的网络交互次数,并提升整体吞吐量。但是该值设置过大可能会造成FlinkX运行进程OOM情况 - 必选:否 + - 字段类型:int - 默认值:1024 diff --git a/docs/questions.md b/docs/questions.md index 64073f9c6c..9d707038ba 100644 --- a/docs/questions.md +++ b/docs/questions.md @@ -12,9 +12,12 @@ ./install_jars.sh ``` -### 2.FlinkX版本需要与Flink版本保持一致 -1.8_release版本对应flink1.8 -1.10_release版本对应flink1.10 版本 +### 2.FlinkX版本需要与Flink版本保持一致,最好小版本也保持一致 +| FlinkX分支 | Flink版本 | +| --- | --- | +| 1.8_release | Flink1.8.3 | +| 1.10_release | Flink1.10.1 | +| 1.11_release | Flink1.11.3 | 不对应在standalone和yarn session模式提交时,会报错: Caused by: java.io.InvalidClassException: org.apache.flink.api.common.operators.ResourceSpec; incompatible types for field cpuCores diff --git a/docs/quickstart.md b/docs/quickstart.md index 7fda36c0d9..59467c8445 100644 --- a/docs/quickstart.md +++ b/docs/quickstart.md @@ -1,3 +1,19 @@ +目录: + + +- [下载代码](#下载代码) +- [编译插件](#编译插件) + - [1.编译找不到DB2、达梦、gbase、ojdbc8、kingbase、vertica等驱动包](#1编译找不到db2达梦gbaseojdbc8kingbasevertica等驱动包) + - [2.编译报错找不到其他包](#2编译报错找不到其他包) +- [运行任务](#运行任务) + - [Local模式运行任务](#local模式运行任务) + - [Standalone模式运行](#standalone模式运行) + - [以Yarn Session模式运行任务](#以yarn-session模式运行任务) + - [以Yarn Perjob模式运行任务](#以yarn-perjob模式运行任务) +- [参数说明](#参数说明) + + + ## 下载代码 1.使用git工具把项目clone到本地 @@ -10,16 +26,9 @@ cd flinkx 2.直接下载源码 ``` -wget https://github.com/DTStack/flinkx/archive/1.10_release.zip -unzip 1.10_release.zip -cd 1.10_release -``` - -3.直接下载源码和编译好的插件包(推荐) -``` -wget https://github.com/DTStack/flinkx/releases/download/1.10.4/flinkx.7z -7za x flinkx.7z -cd flinkx +wget https://github.com/DTStack/flinkx/archive/1.11_release.zip +unzip 1.11_release.zip +cd 1.11_release ``` ## 编译插件 @@ -28,11 +37,66 @@ cd flinkx mvn clean package -DskipTests ``` -## 常见问题 +对于不需要的插件,可以修改$FLINKX_HOME目录下的pom文件,可以将不需要的模块和`flinkx-test`模块注释掉,在编译时将不会编译该插件,这样可以缩短编译时间. 
+
+注:**部分模块有依赖关系,请注意**。若因此编译报错,请根据maven报错提示,将对应依赖的模块取消注释。
+
+```xml
+<modules>
+    <module>flinkx-core</module>
+
+    <module>flinkx-launcher</module>
+    <module>flinkx-test</module>
+    <module>flinkx-stream</module>
+
+    <module>flinkx-rdb</module>
+    <module>flinkx-mysql</module>
+    <module>flinkx-polardb</module>
+    <module>flinkx-oracle</module>
+    <module>flinkx-sqlserver</module>
+    <module>flinkx-postgresql</module>
+    <module>flinkx-db2</module>
+    <module>flinkx-dm</module>
+    <module>flinkx-gbase</module>
+    <module>flinkx-clickhouse</module>
+    <module>flinkx-saphana</module>
+    <module>flinkx-teradata</module>
+    <module>flinkx-greenplum</module>
+    <module>flinkx-kingbase</module>
+
+    <module>flinkx-hdfs</module>
+    <module>flinkx-hive</module>
+    <module>flinkx-es</module>
+    <module>flinkx-ftp</module>
+    <module>flinkx-odps</module>
+    <module>flinkx-hbase</module>
+    <module>flinkx-phoenix5</module>
+    <module>flinkx-carbondata</module>
+    <module>flinkx-kudu</module>
+    <module>flinkx-cassandra</module>
+
+    <module>flinkx-redis</module>
+    <module>flinkx-mongodb</module>
+
+    <module>flinkx-binlog</module>
+    <module>flinkx-kb</module>
+    <module>flinkx-kafka09</module>
+    <module>flinkx-kafka10</module>
+    <module>flinkx-kafka11</module>
+    <module>flinkx-kafka</module>
+    <module>flinkx-emqx</module>
+    <module>flinkx-pulsar</module>
+    <module>flinkx-pgwal</module>
+    <module>flinkx-restapi</module>
+    <module>flinkx-oraclelogminer</module>
+</modules>
+```

-### 1.编译找不到DB2、达梦、gbase、ojdbc8等驱动包
+### 1.编译找不到DB2、达梦、gbase、ojdbc8、kingbase、vertica等驱动包

-解决办法:在$FLINKX_HOME/jars目录下有这些驱动包,可以手动安装,也可以使用插件提供的脚本安装:
+解决办法:在$FLINKX_HOME/jars目录下有这些驱动包,可以手动安装,也可以使用$FLINKX_HOME/bin目录下的脚本安装:

```bash
## windows平台
@@ -42,6 +106,30 @@
 ./install_jars.sh
 ```

+### 2.编译报错找不到其他包
+
+解决办法:在$FLINKX_HOME/jars目录下有maven的setting文件,内容如下,**修改仓库路径**后替换本地maven的setting文件,重新安装[步骤一](#1编译找不到db2达梦gbaseojdbc8kingbasevertica等驱动包)中的驱动包,然后再编译插件
+```xml
+<?xml version="1.0" encoding="UTF-8"?>
+<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0"
+          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+          xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0 http://maven.apache.org/xsd/settings-1.0.0.xsd">
+
+  <!-- 修改为本地的maven仓库路径 -->
+  <localRepository>/home/apache-maven-3.6.1/repository</localRepository>
+
+  <mirrors>
+    <mirror>
+      <id>alimaven</id>
+      <name>aliyun maven</name>
+      <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
+      <mirrorOf>central</mirrorOf>
+    </mirror>
+  </mirrors>
+
+</settings>
+```

## 运行任务

首先准备要运行的任务json,这里以stream插件为例:

```json
{
  "job" : {
    "content" : [ {
-      "reader": {
-        "name": "streamreader",
-        "parameter": {
-          "column": [
-            {
-              "name": "id",
-              "type": "int"
-            },
-            {
-              "name": "name",
-              "type": "string"
-            }
-          ]
-        }
+      "reader" : {
+        "parameter" : {
+          "column" : [ {
+            "name": "id",
+            "type" : "id"
+          }, {
+            "name": "string",
+            "type" : "string"
+          } ],
+          "sliceRecordCount" : [ "10"]
+        },
+        "name" : "streamreader"
       },
       "writer" : {
         "parameter" : {
-          "print": false
+          "print" : true
         },
         "name" : "streamwriter"
       }
     } ],
     "setting" : {
-      "restore" : {
-        "isRestore" : false,
-        "isStream" : false
-      },
-      "errorLimit" : {
-      },
       "speed" : {
         "channel" : 1
       }
     }
  }
}
```
+<br/>
+ ### Local模式运行任务 命令模板: ```bash bin/flinkx \ - -mode local \ - -job docs/example/stream_stream.json \ - -pluginRoot syncplugins + -mode local \ + -job docs/example/stream_stream.json \ + -pluginRoot syncplugins \ + -flinkconf flinkconf ``` -可以在flink的配置文件里配置端口: - +修改flink配置文件,指定web UI端口 ```bash +vi flinkconf/flink-conf.yaml +``` + +```yml ## web服务端口,不指定的话会随机生成一个 rest.bind-port: 8888 ``` @@ -109,9 +195,10 @@ rest.bind-port: 8888 ```bash bin/flinkx \ - -mode local \ - -job docs/example/stream_stream.json \ - -pluginRoot syncplugins + -mode local \ + -job docs/example/stream_stream.json \ + -pluginRoot syncplugins \ + -flinkconf flinkconf ``` 任务运行后可以通过8888端口访问flink界面查看任务运行情况: @@ -120,6 +207,8 @@ bin/flinkx \ +
+ ### Standalone模式运行 命令模板: @@ -161,6 +250,8 @@ $FLINK_HOME/bin/start-cluster.sh +
+ ### 以Yarn Session模式运行任务 命令示例: @@ -175,10 +266,26 @@ bin/flinkx \ -confProp "{\"flink.checkpoint.interval\":60000}" ``` -首先确保yarn集群是可用的,然后手动启动一个yarn session: +[下载](https://mvnrepository.com/artifact/org.apache.flink/flink-shaded-hadoop-2-uber)对应Hadoop版本的flink shade包,放入$FLINK_HOME/lib目录下(从flink1.11开始官方不再提供打包好的flink shade包,需要自行[下载](https://github.com/apache/flink-shaded)打包) + +[下载](https://mvnrepository.com/artifact/org.apache.flink/flink-metrics-prometheus)对应版本的flink prometheus包,放入$FLINK_HOME/lib目录下 +修改flink配置文件,指定flink类加载方式 ```bash -$FLINK_HOME/bin/yarn-session.sh -n 1 -s 2 -jm 1024 -tm 1024 +vi ../conf/flink-conf.yaml +``` + +```yml +## flink类加载方式,指定为父类优先 +classloader.resolve-order: parent-first +``` + +确保yarn集群是可用的,然后手动启动一个yarn session: + +注:-ship: 启动flink session时上传FlinkX插件包,这样只需要在提交FlinkX任务的节点部署FlinkX插件包,其他服务器节点不需要部署,同时更换FlinkX插件包后需要重启yarn session,需要配合修改flink的类加载方式。 + +```bash +nohup $FLINK_HOME/bin/yarn-session.sh -qu a -ship $FLINKX_HOME/syncplugins/ & ```
@@ -196,7 +303,8 @@ bin/flinkx \ -mode yarn \ -job docs/example/stream_stream.json \ -flinkconf $FLINK_HOME/conf \ - -yarnconf $HADOOP_HOME/etc/hadoop + -yarnconf $HADOOP_HOME/etc/hadoop \ + -queue a ``` 然后在flink界面查看任务运行情况: @@ -205,6 +313,8 @@ bin/flinkx \
+
+
+### 以Yarn Perjob模式运行任务

命令示例:

```bash
bin/flinkx \
    -mode yarnPer \
    -job docs/example/stream_stream.json \
    -pluginRoot syncplugins \
    -yarnconf $HADOOP_HOME/etc/hadoop \
    -flinkLibJar $FLINK_HOME/lib \
    -confProp "{\"flink.checkpoint.interval\":60000}" \
-    -queue default \
-    -pluginLoadMode classpath
+    -queue default
```

首先确保yarn集群是可用的,启动一个Yarn Application运行任务:

```bash
bin/flinkx \
    -mode yarnPer \
    -job docs/example/stream_stream.json \
-    -pluginRoot syncplugins \
+    -pluginRoot $FLINKX_HOME/syncplugins \
    -yarnconf $HADOOP_HOME/etc/hadoop \
    -flinkLibJar $FLINK_HOME/lib \
-    -pluginLoadMode classpath
+    -queue a
```

然后在集群上查看任务运行情况

@@ -250,18 +359,18 @@
| ------------------ | ------------------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---- | ----------------------- |
| **mode** | 执行模式,也就是flink集群的工作模式 | 1.**local**: 本地模式<br />
2.**standalone**: 独立部署模式的flink集群
3.**yarn**: yarn模式的flink集群,需要提前在yarn上启动一个flink session,使用默认名称"Flink session cluster"
4.**yarnPer**: yarn模式的flink集群,单独为当前任务启动一个flink session,使用默认名称"Flink per-job cluster" | 否 | local | | **job** | 数据同步任务描述文件的存放路径;该描述文件中使用json字符串存放任务信息 | 无 | 是 | 无 | -| **jobid** | 任务名称 | 无 | 否 | Flink Job | +| **jobid** | 指定flink任务名称 | 无 | 否 | Flink Job | | **pluginRoot** | 插件根目录地址,也就是打包后产生的pluginRoot目录。 | 无 | 否 | $FLINKX_HOME/syncplugins | | **flinkconf** | flink配置文件所在的目录(单机模式下不需要) | $FLINK_HOME/conf | 否 | $FLINK_HOME/conf | -| **flinkLibJar** | flink lib所在的目录(单机模式下不需要),如/opt/dtstack/flink-1.10.1/lib | $FLINK_HOME/lib | 否 | $FLINK_HOME/lib | +| **flinkLibJar** | flink lib所在的目录(单机模式下不需要),如/opt/dtstack/flink-1.11.3/lib | $FLINK_HOME/lib | 否 | $FLINK_HOME/lib | | **yarnconf** | Hadoop配置文件(包括hdfs和yarn)所在的目录 | $HADOOP_HOME/etc/hadoop | 否 | $HADOOP_HOME/etc/hadoop | | **queue** | yarn队列,如default | 无 | 否 | default | | **pluginLoadMode** | yarn session模式插件加载方式 | 1.**classpath**:提交任务时不上传插件包,需要在yarn-node节点pluginRoot目录下部署插件包,但任务启动速度较快
2.**shipfile**:提交任务时上传pluginRoot目录下的插件包,yarn-node节点不需要部署插件包,任务启动速度取决于插件包的大小及网络环境 | 否 | shipfile |
-| **confProp** | checkpoint配置 | **flink.checkpoint.interval**:快照生产频率<br />
**flink.checkpoint.stateBackend**:快照存储路径 | 否 | 无 |
-| **s** | checkpoint快照路径 |  | 否 | 无 |
+| **confProp** | flink额外配置,如checkpoint、内存 | **flink.checkpoint.interval**:快照生成频率(毫秒)<br />
**flink.checkpoint.timeout**:快照超时时间(毫秒)
**jobmanager.memory.mb**:perJob模式下jobmanager内存设置
**taskmanager.memory.mb**:perJob模式下taskmanager内存设置
**taskmanager.slots**:perJob模式下taskmanager slots个数设置 | 否 | 无 |
| **s** | checkpoint快照路径,设置后从该快照恢复任务 |  | 否 | 无 |
| **p** | 自定义入参,用于替换脚本中的占位符,如脚本中存在占位符${pt1},${pt2},则该参数可配置为pt1=20200101,pt2=20200102|  | 否 | 无 |
-| **appId** | yarn模式下,提交到指定的的flink session的application Id |  | 否 | 无 |
-| **krb5conf** | 提交到开启kerberos的Hadoop集群的krb5文件路径  |  | 否 | 无 |
-| **keytab** | 提交到开启kerberos的Hadoop集群的keytab文件路径 |  | 否 | 无 |
-| **principal** | kerberos认证的principal |  | 否 | 无 |
+| **appId** | yarn session模式下,提交到指定的flink session的application Id |  | 否 | 无 |
+| **krb5conf** | 提交到开启kerberos的Hadoop集群的krb5文件路径 |  | 否 | 无 |
+| **keytab** | 提交到开启kerberos的Hadoop集群的keytab文件路径 |  | 否 | 无 |
+| **principal** | kerberos认证的principal |  | 否 | 无 |


diff --git "a/docs/realTime/other/LogMiner\345\216\237\347\220\206.md" "b/docs/realTime/other/LogMiner\345\216\237\347\220\206.md"
new file mode 100644
index 0000000000..818851d50c
--- /dev/null
+++ "b/docs/realTime/other/LogMiner\345\216\237\347\220\206.md"
@@ -0,0 +1,392 @@
+# FlinkX Oracle LogMiner实时采集基本原理
+
+本文主要介绍Logminer的基本原理与使用方式,以及FlinkX与Logminer的集成
+通过本文你可以了解到:
+
+- Logminer是什么
+- Logminer的使用
+- FlinkX如何与Logminer集成
+
+# Logminer是什么?
+LogMiner 是Oracle公司从产品8i以后提供的一个实际非常有用的分析工具,使用该工具可以轻松获得Oracle 重做日志文件(归档日志文件)中的具体内容,LogMiner分析工具实际上是由一组PL/SQL包和一些动态视图组成,它作为Oracle数据库的一部分来发布,是oracle公司提供的一个完全免费的工具。
+
+
+具体的说:
+对用户数据或数据库字典所做的所有更改都记录在Oracle重做日志文件RedoLog中,Logminer就是一个解析RedoLog的工具,通过Logminer解析RedoLog可以得到对应的SQL数据。
+
+Oracle 中的RedoLog写入流程:
+Oracle重做日志采用**循环写入**的方式,每一个Oracle实例至少拥有**2组日志组**。Oracle重做日志一般由Oracle自动切换,重做日志文件在当LGWR进程停止写入并开始写入下一个日志组时发生切换,或在用户发出ALTER SYSTEM SWITCH LOGFILE命令时发生切换。如果Oracle数据库开启了归档功能,则在日志组发生切换的时候,上一个日志组的日志文件会被归档到归档目录里
+
+
+从上面可知,Oracle里的RedoLog文件分为两种:
+
+- 当前写的日志组的文件,可通过 v$log 和 v$logfile 得到
+- 归档的redoLog文件,可通过 v$archived_log 得到
+
+v$log 文档
+[https://docs.oracle.com/cd/B19306_01/server.102/b14237/dynviews_1150.htm#REFRN30127](https://docs.oracle.com/cd/B19306_01/server.102/b14237/dynviews_1150.htm#REFRN30127)
+
+
+v$logfile 文档
+[https://docs.oracle.com/cd/B28359_01/server.111/b28320/dynviews_2031.htm#REFRN30129](https://docs.oracle.com/cd/B28359_01/server.111/b28320/dynviews_2031.htm#REFRN30129)
+
+
+v$archived_log 文档
+[https://docs.oracle.com/cd/E18283_01/server.112/e17110/dynviews_1016.htm](https://docs.oracle.com/cd/E18283_01/server.112/e17110/dynviews_1016.htm)
+
+
+**通过循环查找到最新符合要求的RedoLog并让Logminer加载分析,分析的数据在视图 v$logmnr_contents 里,通过读取 v$logmnr_contents 就可以得到 Oracle的实时数据**
+
+# Logminer的使用
+
+
+## Logminer的配置与开启
+[Oracle配置LogMiner](LogMiner配置.md)
+
+## Logminer的使用
+
+1. 指定LogMiner字典。
+
+1. 指定重做日志文件列表以进行分析。
+   使用 `DBMS_LOGMNR.ADD_LOGFILE` 过程,或指示LogMiner在启动LogMiner时自动创建要分析的日志文件列表(在步骤3中)。
+
+1. 启动LogMiner。
+   使用 `DBMS_LOGMNR.START_LOGMNR` 程序。
+
+1. 请求感兴趣的重做数据。
+   查询`V$LOGMNR_CONTENTS`视图。(您必须具有`SELECT ANY TRANSACTION`查询此视图的权限)
+
+1. <br>
结束LogMiner会话。 + 使用 `DBMS_LOGMNR.END_LOGMNR` 程序 + + + + +### Logminer字典 +#### LogMiner字典作用 +Oracle数据字典记录当前所有表的信息,字段的信息等等。LogMiner使用字典将内部对象标识符和数据类型转换为对象名称和外部数据格式。如果没有字典,LogMiner将返回内部对象ID,并将数据显示为二进制数 +```sql +INSERT INTO HR.JOBS(JOB_ID, JOB_TITLE, MIN_SALARY, MAX_SALARY) VALUES('IT_WT','Technical Writer', 4000, 11000); + +``` +没有字典,LogMiner将显示: +```sql +insert into "UNKNOWN"."OBJ# 45522"("COL 1","COL 2","COL 3","COL 4") values +(HEXTORAW('45465f4748'),HEXTORAW('546563686e6963616c20577269746572'), +HEXTORAW('c229'),HEXTORAW('c3020b')); +``` + + +#### Logminer字典选项 +LogMiner字典的选项支持三种: + +- [Using the Online Catalog](https://docs.oracle.com/cd/B19306_01/server.102/b14215/logminer.htm#i1014720) + Oracle recommends that you use this option when you will have access to the source database from which the redo log files were created and when no changes to the column definitions in the tables of interest are anticipated. This is the most efficient and easy-to-use option. + +- [Extracting a LogMiner Dictionary to the Redo Log Files](https://docs.oracle.com/cd/B19306_01/server.102/b14215/logminer.htm#i1014735) + Oracle recommends that you use this option when you do not expect to have access to the source database from which the redo log files were created, or if you anticipate that changes will be made to the column definitions in the tables of interest. + +- [Extracting the LogMiner Dictionary to a Flat File](https://docs.oracle.com/cd/B19306_01/server.102/b14215/logminer.htm#i1014763) + This option is maintained for backward compatibility with previous releases. This option does not guarantee transactional consistency. Oracle recommends that you use either the online catalog or extract the dictionary from redo log files instead. + + + +翻译: + +- [使用在线目录](https://docs.oracle.com/cd/B19306_01/server.102/b14215/logminer.htm#i1014720) + 当您可以访问从其创建重做日志文件的源数据库并且预计不会对目标表中的列定义进行任何更改时,Oracle建议您使用此选项。这是最有效和易于使用的选项。 + +- [将LogMiner字典提取到重做日志文件](https://docs.oracle.com/cd/B19306_01/server.102/b14215/logminer.htm#i1014735) + 如果您不希望访问创建重做日志文件的源数据库,或者希望对感兴趣的表中的列定义进行更改,则Oracle建议您使用此选项。 + +- [将LogMiner字典提取到平面文件](https://docs.oracle.com/cd/B19306_01/server.102/b14215/logminer.htm#i1014763) + 维护此选项是为了与以前的版本向后兼容。此选项不能保证事务的一致性。Oracle建议您使用联机目录或从重做日志文件中提取字典。 + + + + + + +### 指定Logminer重做日志文件 +要启动新的重做日志文件列表,需要使用  DBMS_LOGMNR.NEW 以表明这是新列表的开始 +```sql +EXECUTE DBMS_LOGMNR.ADD_LOGFILE( + LOGFILENAME => '/oracle/logs/log1.f', + OPTIONS => DBMS_LOGMNR.NEW); +``` +可以使用下列语句额外再添加日志文件 +```sql +EXECUTE DBMS_LOGMNR.ADD_LOGFILE( + LOGFILENAME => '/oracle/logs/log2.f', + OPTIONS => DBMS_LOGMNR.ADDFILE); +``` + + +### 启动LogMiner +使用 `DBMS_LOGMNR.START_LOGMN` 启动Logminer。可以指定参数: + +- 指定LogMiner如何过滤返回的数据(例如,通过开始和结束时间或SCN值) + +- 指定用于格式化LogMiner返回的数据的选项 + +- 指定要使用的LogMiner词典 + + + + +主要的参数有: +```text + OPTIONS参数说明: + * DBMS_LOGMNR.SKIP_CORRUPTION - 跳过出错的redlog + * DBMS_LOGMNR.NO_SQL_DELIMITER - 不使用 ';'分割redo sql + * DBMS_LOGMNR.NO_ROWID_IN_STMT - 默认情况下,用于UPDATE和DELETE操作的SQL_REDO和SQL_UNDO语句在where子句中包含“ ROWID =”。 + * 但是,这对于想要重新执行SQL语句的应用程序是不方便的。设置此选项后,“ ROWID”不会放置在重构语句的末尾 + * DBMS_LOGMNR.DICT_FROM_ONLINE_CATALOG - 使用在线字典 + * DBMS_LOGMNR.CONTINUOUS_MINE - 需要在生成重做日志的同一实例中使用日志 + * DBMS_LOGMNR.COMMITTED_DATA_ONLY - 指定此选项时,LogMiner将属于同一事务的所有DML操作分组在一起。事务按提交顺序返回。 + * DBMS_LOGMNR.STRING_LITERALS_IN_STMT - 默认情况下,格式化格式化的SQL语句时,SQL_REDO和SQL_UNDO语句会使用数据库会话的NLS设置 + * 例如NLS_DATE_FORMAT,NLS_NUMERIC_CHARACTERS等)。使用此选项,将使用ANSI / ISO字符串文字格式对重构的SQL语句进行格式化。 +``` + + +示例 + +```sql +EXECUTE DBMS_LOGMNR.START_LOGMNR( + STARTTIME => '01-Jan-2003 08:30:00', + ENDTIME => 
'01-Jan-2003 08:45:00', + OPTIONS => DBMS_LOGMNR.DICT_FROM_ONLINE_CATALOG + + DBMS_LOGMNR.CONTINUOUS_MINE); +``` + + +### 在V$ LOGMNR_CONTENTS中查询感兴趣的重做数据 +Logminer会解析redoLog里的日志加载到 v$LOGMNR_CONTENTS 视图里,我们只需要使用 sql查询 即可获取对应数据 +v$LOGMNR_CONTENTS视图相关字段 +[https://docs.oracle.com/cd/B19306_01/server.102/b14237/dynviews_1154.htm](https://docs.oracle.com/cd/B19306_01/server.102/b14237/dynviews_1154.htm) + + +主要字段有: + +| 列 | 数据类型 | 描述 | +| --- | --- | --- | +| SCN | NUMBER | oracle为每个已提交的事务分配唯一的scn | +| OPERATION | VARCHAR2(32) | INSERT UPDATE DELETE DDL COMMIT ROLLBACK.....| +| SEG_OWNER | VARCHAR2(32) | schema | +| TABLE_NAME | VARCHAR2(32) | 表名 | +| TIMESTAMP | DATE | 数据库变动时间戳 | +| SQL_REDO | VARCHAR2(4000) | 重建的SQL语句,该语句等效于进行更改的原始SQL语句 | + + + +示例 + +```sql +SELECT + scn, + timestamp, + operation, + seg_owner, + table_name, + sql_redo, + row_id, + csf +FROM + v$logmnr_contents +WHERE + scn > ? +``` + + +查询出来的数据示例: + +
+ +
+ +# Flinkx如何使用Logminer + + +使用Logminer在于关键2步骤: + +- 找到需要分析的Redolog日志,加载到Logminer +- 开启Logminer,在 v$LOGMNR_CONTENTS 查询感兴趣数据 +### 1. 查找RedoLog文件 +从上面介绍中 我们可以知道 Redolog来源于日志组和归档日志里,所以flinkx 根据SCN号查询日志组以及归档日志获取到对应的文件 +```sql +SELECT + MIN(name) name, + first_change# +FROM + ( + SELECT + MIN(member) AS name, + first_change#, + 281474976710655 AS next_change# + FROM + v$log l + INNER JOIN v$logfile f ON l.group# = f.group# + WHERE l.STATUS = 'CURRENT' OR l.STATUS = 'ACTIVE' + GROUP BY + first_change# + UNION + SELECT + name, + first_change#, + next_change# + FROM + v$archived_log + WHERE + name IS NOT NULL + ) +WHERE + first_change# >= ? + OR ? < next_change# +GROUP BY + first_change# +ORDER BY + first_change# +``` +查询出来的数据示例: +
+ +
+注意: +如果Logminer的处理速度比Oracle产生数据速度快,那么理论上Flinkx只需要加载日志组文件不需要加载归档日志文件,而Logminer加载文件会比较消耗资源,所以会先进行RedoLog文件的查找,如果本次查找的文件和上次的没有区别,说明Logminer不需要加载新的日志文件,只需要重新再从视图里查询数据即可 + + +### 2. 加载文件到Logminer +通过一个存储过程 查询到日志文件之后 加载到Logminer里 并开启Logminer +```sql +DECLARE + st BOOLEAN := true; + start_scn NUMBER := ?; +BEGIN + FOR l_log_rec IN ( + SELECT + MIN(name) name, + first_change# + FROM + ( + SELECT + MIN(member) AS name, + first_change#, + 281474976710655 AS next_change# + FROM + v$log l + INNER JOIN v$logfile f ON l.group# = f.group# + WHERE l.STATUS = 'CURRENT' OR l.STATUS = 'ACTIVE' + GROUP BY + first_change# + UNION + SELECT + name, + first_change#, + next_change# + FROM + v$archived_log + WHERE + name IS NOT NULL + ) + WHERE + first_change# >= start_scn + OR start_scn < next_change# + GROUP BY + first_change# + ORDER BY + first_change# + ) LOOP IF st THEN + SYS.DBMS_LOGMNR.add_logfile(l_log_rec.name, SYS.DBMS_LOGMNR.new); + st := false; + ELSE + SYS.DBMS_LOGMNR.add_logfile(l_log_rec.name); + END IF; + END LOOP; + + SYS.DBMS_LOGMNR.start_logmnr( options => SYS.DBMS_LOGMNR.skip_corruption + SYS.DBMS_LOGMNR.no_sql_delimiter + SYS.DBMS_LOGMNR.no_rowid_in_stmt + + SYS.DBMS_LOGMNR.dict_from_online_catalog + SYS.DBMS_LOGMNR.string_literals_in_stmt ); +END; +``` +### 3. 查询数据 +```sql +SELECT + scn, + timestamp, + operation, + seg_owner, + table_name, + sql_redo, + row_id, + csf +FROM + v$logmnr_contents +WHERE + scn > ? +``` + + +Flinkx就是在一个循环里 执行上述sql语句查询数据。 查询日志文件,加载到logminer,开启logminer,读取数据,更新当前最新SCN号,当数据读取完毕,代表本次加载的日志文件加载完了,通过SCN号寻找后续日志文件,重复上述操作 + +
+ +
+ + +从 v$logmnr_contents获取到数据之后,Flinkx 使用 net.sf.jsqlparser.parser.CCJSqlParserUtil 来解析 sql_redo 值 +获取到的sql_redo语句格式示例: +```json +insert into "TUDOU"."CDC"("ID","USER_ID","NAME","date1") values ('19','1','b',TO_DATE('2021-01-29 11:25:50', 'YYYY-MM-DD HH24:MI:SS')) +``` +使用net.sf.jsqlparser.parser.CCJSqlParserUtil 解析之后,flinkx根据paving参数对数据进行操作, +当pavingData为true时,数据为 +```json +{ + "scn": 1977762, + "type": "INSERT", + "schema": "TUDOU", + "table": "CDC", + "ts": 6762187276702322688, + "opTime": "2021-01-29 11:52:02.0", + "after_ID": "19", + "after_USER_ID": "1", + "after_NAME": "b", + "after_date1": "2021-01-29 11:25:50" +} +``` +当paving为false时,数据为 +```json +{ + "message": { + "scn": 1977679, + "type": "INSERT", + "schema": "TUDOU", + "table": "CDC", + "ts": 6762186352151891968, + "opTime": "2021-01-29 11:52:02.0", + "before": {}, + "after": { + "ID": "19", + "USER_ID": "1", + "NAME": "b", + "date1": "2021-01-29 11:25:50" + } + } +} +``` + + +### Oracle10 和Oracle11的部分区别 + +1. v$LOGMNR_CONTENTS 里Oracle10 比 Oracle11 少了 commit_scn字段 +1. 日志组字段里没有next_change#字段 +1. 如果Sql里含有ToDate函数,Logminer10的sql_redo加载的是ToDate函数日期格式默认是DD-MON-RR格式,而Logminer11则是Todate函数执行后的值,所以Logminer10会在获取连接的时候,执行下列SQL,设置日期格式,FLinkx再对其进行正则匹配,替换得到最终的值。 +```sql + //修改当前会话的date日期格式 + public final static String SQL_ALTER_DATE_FORMAT ="ALTER SESSION SET NLS_DATE_FORMAT = 'YYYY-MM-DD HH24:MI:SS'"; + + //修改当前会话的timestamp日期格式 + public final static String NLS_TIMESTAMP_FORMAT ="ALTER SESSION SET NLS_TIMESTAMP_FORMAT = 'YYYY-MM-DD HH24:MI:SS.FF6'"; + +``` + + diff --git "a/docs/realTime/other/LogMiner\351\205\215\347\275\256.md" "b/docs/realTime/other/LogMiner\351\205\215\347\275\256.md" new file mode 100644 index 0000000000..519c7fe9ee --- /dev/null +++ "b/docs/realTime/other/LogMiner\351\205\215\347\275\256.md" @@ -0,0 +1,434 @@ +# Oracle配置LogMiner + +目录: + + +- [Oracle配置LogMiner](#oracle配置logminer) + - [一、Oracle 10g(单机版)](#一oracle-10g单机版) + - [1、查询Oracle版本信息,这里配置的是`Oracle 10g`](#1查询oracle版本信息这里配置的是oracle-10g) + - [2、通过命令行方式登录Oracle,查看是否开启日志归档](#2通过命令行方式登录oracle查看是否开启日志归档) + - [3、开启日志归档,开启日志归档需要重启数据库,请注意](#3开启日志归档开启日志归档需要重启数据库请注意) + - [a、配置归档日志保存的路径](#a配置归档日志保存的路径) + - [b、关闭数据库](#b关闭数据库) + - [c、开启日志归档](#c开启日志归档) + - [d、开启扩充日志](#d开启扩充日志) + - [e、开启数据库](#e开启数据库) + - [4、配置日志组](#4配置日志组) + - [a、查询默认日志组信息](#a查询默认日志组信息) + - [b、查询日志组储存路径](#b查询日志组储存路径) + - [c、新增日志组与删除原有日志组](#c新增日志组与删除原有日志组) + - [d、查询创建的日志组](#d查询创建的日志组) + - [5、检查是否安装LogMiner工具](#5检查是否安装logminer工具) + - [6、创建LogMiner角色并赋权](#6创建logminer角色并赋权) + - [7、创建LogMiner用户并赋权](#7创建logminer用户并赋权) + - [8、验证用户权限](#8验证用户权限) + - [二、Oracle 11g(单机版)](#二oracle-11g单机版) + - [1、查询Oracle版本信息,这里配置的是`Oracle 11g`](#1查询oracle版本信息这里配置的是oracle-11g) + - [2、通过命令行方式登录Oracle,查看是否开启日志归档](#2通过命令行方式登录oracle查看是否开启日志归档-1) + - [3、开启日志归档,开启日志归档需要重启数据库,请注意](#3开启日志归档开启日志归档需要重启数据库请注意-1) + - [a、配置归档日志保存的路径](#a配置归档日志保存的路径-1) + - [b、关闭数据库](#b关闭数据库-1) + - [c、开启日志归档](#c开启日志归档-1) + - [d、开启扩充日志](#d开启扩充日志-1) + - [e、开启数据库](#e开启数据库-1) + - [4、检查是否安装LogMiner工具](#4检查是否安装logminer工具) + - [5、创建LogMiner角色并赋权](#5创建logminer角色并赋权) + - [6、创建LogMiner用户并赋权](#6创建logminer用户并赋权) + - [7、验证用户权限](#7验证用户权限) + - [三、Oracle 12c(单机版非CBD)](#三oracle-12c单机版非cbd) + - [1、查询Oracle版本信息,这里配置的是`Oracle 12c`](#1查询oracle版本信息这里配置的是oracle-12c) + - [2、通过命令行方式登录Oracle,查看是否开启日志归档](#2通过命令行方式登录oracle查看是否开启日志归档-2) + - [3、开启日志归档,开启日志归档需要重启数据库,请注意](#3开启日志归档开启日志归档需要重启数据库请注意-2) + - [a、配置归档日志保存的路径](#a配置归档日志保存的路径-2) + - [b、关闭数据库](#b关闭数据库-2) + - [c、开启日志归档](#c开启日志归档-2) + - [d、开启扩充日志](#d开启扩充日志-2) + - [e、开启数据库](#e开启数据库-2) + - [4、创建LogMiner角色并赋权](#4创建logminer角色并赋权) + - 
[5、创建LogMiner用户并赋权](#5创建logminer用户并赋权) + - [6、验证用户权限](#6验证用户权限) + + + +注意: + +1、某个Oracle数据源能同时运行的任务数量取决于该Oracle的内存大小 + +2、若数据量太大导致日志组频繁切换需要增加日志组数量,增大单个日志组存储大小 + +## 一、Oracle 10g(单机版) +### 1、查询Oracle版本信息,这里配置的是`Oracle 10g` +```sql +--查看oracle版本 +select * from v$version; +``` +
+ +
+本章Oracle的版本如上图所示。 + + +### 2、通过命令行方式登录Oracle,查看是否开启日志归档 +```sql +--查询数据库归档模式 +archive log list; +``` +
+ +
+图中显示`No Archive Mode`表示未开启日志归档。 + + +### 3、开启日志归档,开启日志归档需要重启数据库,请注意 +#### a、配置归档日志保存的路径 +根据自身环境配置归档日志保存路径,需要提前创建相应目录及赋予相应访问权限 +```shell +# 创建归档日志保存目录 +mkdir -p /data/oracle/archivelog + +# 进入Oracle目录 +cd $ORACLE_HOME + +# 查看Oracle权限组,本章权限组如下图所示 +ls -l + +# 对归档日志保存目录赋予相应权限 +chown -R 下图中的用户名:下图中的组名 /data/oracle/ +``` +
+ +
+ +```sql +--配置归档日志保存的路径 +alter system set log_archive_dest_1='location=/data/oracle/archivelog' scope=spfile; +``` +#### b、关闭数据库 +```sql +shutdown immediate; +startup mount; +``` +#### c、开启日志归档 +```sql +--开启日志归档 +alter database archivelog; +``` +#### d、开启扩充日志 +```sql +--开启扩充日志 +alter database add supplemental log data (all) columns; +``` +#### e、开启数据库 +```sql +alter database open; +``` +再次查询数据库归档模式,`Archive Mode`表示已开启归档模式,`Archive destination`表示归档日志储存路径。 +
+ +
+ +### 4、配置日志组 +#### a、查询默认日志组信息 +```sql +SELECT * FROM v$log; +``` +
+ +
+ +如上图所示,日志组的默认数量为2组,大小为4194304/1024/1024 = 4MB,这意味着日志大小每达到4MB就会进行日志组的切换,切换太过频繁会导致查询出错,因此需要增加日志组数量及大小。 +#### b、查询日志组储存路径 +```sql +SELECT * FROM v$logfile; +``` +
+ +
+ +如上图所示,默认路径为`/usr/lib/oracle/xe/app/oracle/flash_recovery_area/XE/onlinelog/`。 +#### c、新增日志组与删除原有日志组 +请与DBA联系,决定是否可以删除原有日志组。 +```sql +--增加两组日志组 +alter database add logfile group 3 ('/usr/lib/oracle/xe/app/oracle/flash_recovery_area/XE/onlinelog/redo3.log') size 200m; +alter database add logfile group 4 ('/usr/lib/oracle/xe/app/oracle/flash_recovery_area/XE/onlinelog/redo4.log') size 200m; +``` +```sql +--删除原有两组日志组,并继续新增两组日志组 +alter system checkpoint; +alter system switch logfile; +alter database drop logfile group 1; +alter database drop logfile group 2; +alter database add logfile group 1 ('/usr/lib/oracle/xe/app/oracle/flash_recovery_area/XE/onlinelog/redo1.log') size 200m; +alter database add logfile group 2 ('/usr/lib/oracle/xe/app/oracle/flash_recovery_area/XE/onlinelog/redo2.log') size 200m; +``` +#### d、查询创建的日志组 +```sql +SELECT * FROM v$log; +SELECT * FROM v$logfile; +``` +
+ +
+ +
+ +
+ +### 5、检查是否安装LogMiner工具 +Oracle10g默认已安装LogMiner工具包,通过以下命令查询: +```sql +desc DBMS_LOGMNR; +desc DBMS_LOGMNR_D; +``` +若无信息打印,则执行下列SQL初始化LogMiner工具包: +```sql +@$ORACLE_HOME/rdbms/admin/dbmslm.sql; +@$ORACLE_HOME/rdbms/admin/dbmslmd.sql; +``` + + +### 6、创建LogMiner角色并赋权 +其中`roma_logminer_privs`为角色名称,可根据自身需求修改。 +```sql +create role roma_logminer_privs; +grant create session,execute_catalog_role,select any transaction,flashback any table,select any table,lock any table,select any dictionary to roma_logminer_privs; +grant select on SYSTEM.LOGMNR_COL$ to roma_logminer_privs; +grant select on SYSTEM.LOGMNR_OBJ$ to roma_logminer_privs; +grant select on SYSTEM.LOGMNR_USER$ to roma_logminer_privs; +grant select on SYSTEM.LOGMNR_UID$ to roma_logminer_privs; +grant select_catalog_role to roma_logminer_privs; +``` + + +### 7、创建LogMiner用户并赋权 +其中`roma_logminer`为用户名,`password`为密码,请根据自身需求修改。 +```sql +create user roma_logminer identified by password default tablespace users; +grant roma_logminer_privs to roma_logminer; +grant execute_catalog_role to roma_logminer; +alter user roma_logminer quota unlimited on users; +``` + + +### 8、验证用户权限 +以创建的LogMiner用户登录Oracle数据库,执行以下SQL查询权限,结果如图所示: +```sql + SELECT * FROM USER_ROLE_PRIVS; +``` +
+ +
+ +```sql +SELECT * FROM SESSION_PRIVS; +``` +
+ +
+ +至此,Oracle 10g数据库LogMiner实时采集配置完毕。 + + +## 二、Oracle 11g(单机版) +### 1、查询Oracle版本信息,这里配置的是`Oracle 11g` +```sql +--查看oracle版本 +select * from v$version; +``` +
+ +
+本章Oracle的版本如上图所示。 + + +### 2、通过命令行方式登录Oracle,查看是否开启日志归档 +```sql +--查询数据库归档模式 +archive log list; +``` +
+ +
+图中显示`No Archive Mode`表示未开启日志归档。 + + +### 3、开启日志归档,开启日志归档需要重启数据库,请注意 +#### a、配置归档日志保存的路径 +根据自身环境配置归档日志保存路径,需要提前创建相应目录及赋予相应访问权限 +```sql + alter system set log_archive_dest_1='location=/data/oracle/archivelog' scope=spfile; +``` +#### b、关闭数据库 +```sql +shutdown immediate; +startup mount; +``` +#### c、开启日志归档 +```sql +--开启日志归档 +alter database archivelog; +``` +#### d、开启扩充日志 +```sql +--开启扩充日志 +alter database add supplemental log data (all) columns; +``` +#### e、开启数据库 +```sql +alter database open; +``` +再次查询数据库归档模式,`Archive Mode`表示已开启归档模式,`Archive destination`表示归档日志储存路径。 +
+ +
+ +### 4、检查是否安装LogMiner工具 +Oracle11g默认已安装LogMiner工具包,通过以下命令查询: +```sql +desc DBMS_LOGMNR; +desc DBMS_LOGMNR_D; +``` +若无信息打印,则执行下列SQL初始化LogMiner工具包: +```sql +@$ORACLE_HOME/rdbms/admin/dbmslm.sql; +@$ORACLE_HOME/rdbms/admin/dbmslmd.sql; +``` + + +### 5、创建LogMiner角色并赋权 +其中`roma_logminer_privs`为角色名称,可根据自身需求修改。 +```sql +create role roma_logminer_privs; +grant create session,execute_catalog_role,select any transaction,flashback any table,select any table,lock any table,select any dictionary to roma_logminer_privs; +grant select on SYSTEM.LOGMNR_COL$ to roma_logminer_privs; +grant select on SYSTEM.LOGMNR_OBJ$ to roma_logminer_privs; +grant select on SYSTEM.LOGMNR_USER$ to roma_logminer_privs; +grant select on SYSTEM.LOGMNR_UID$ to roma_logminer_privs; +grant select_catalog_role to roma_logminer_privs; +``` + + +### 6、创建LogMiner用户并赋权 +其中`roma_logminer`为用户名,`password`为密码,请根据自身需求修改。 +```sql +create user roma_logminer identified by password default tablespace users; +grant roma_logminer_privs to roma_logminer; +grant execute_catalog_role to roma_logminer; +alter user roma_logminer quota unlimited on users; +``` + + +### 7、验证用户权限 +以创建的LogMiner用户登录Oracle数据库,执行以下SQL查询权限,结果如图所示: +```sql + SELECT * FROM USER_ROLE_PRIVS; +``` +
+ +
+ +```sql +SELECT * FROM SESSION_PRIVS; +``` +
+ +
+ +至此,Oracle 11g数据库LogMiner实时采集配置完毕。 + + +## 三、Oracle 12c(单机版非CBD) +### 1、查询Oracle版本信息,这里配置的是`Oracle 12c` +```sql +--查看oracle版本 +select BANNER from v$version; +``` +
+ +
+本章Oracle的版本如上图所示。 + + +### 2、通过命令行方式登录Oracle,查看是否开启日志归档 +```sql +--查询数据库归档模式 +archive log list; +``` +
+ +
+图中显示`No Archive Mode`表示未开启日志归档。 + + +### 3、开启日志归档,开启日志归档需要重启数据库,请注意 +#### a、配置归档日志保存的路径 +根据自身环境配置归档日志保存路径,需要提前创建相应目录及赋予相应访问权限 +```sql + alter system set log_archive_dest_1='location=/data/oracle/archivelog' scope=spfile; +``` +#### b、关闭数据库 +```sql +shutdown immediate; +startup mount; +``` +#### c、开启日志归档 +```sql +--开启日志归档 +alter database archivelog; +``` +#### d、开启扩充日志 +```sql +--开启扩充日志 +alter database add supplemental log data (all) columns; +``` +#### e、开启数据库 +```sql +alter database open; +``` +再次查询数据库归档模式,`Archive Mode`表示已开启归档模式,`Archive destination`表示归档日志储存路径。 +
+ +
+ +### 4、创建LogMiner角色并赋权 +其中`roma_logminer_privs`为角色名称,可根据自身需求修改。 +```sql +create role roma_logminer_privs; +grant create session,execute_catalog_role,select any transaction,flashback any table,select any table,lock any table,logmining,select any dictionary to roma_logminer_privs; +grant select on SYSTEM.LOGMNR_COL$ to roma_logminer_privs; +grant select on SYSTEM.LOGMNR_OBJ$ to roma_logminer_privs; +grant select on SYSTEM.LOGMNR_USER$ to roma_logminer_privs; +grant select on SYSTEM.LOGMNR_UID$ to roma_logminer_privs; +grant select_catalog_role to roma_logminer_privs; +grant LOGMINING to roma_logminer_privs; +``` + + +### 5、创建LogMiner用户并赋权 +其中`roma_logminer`为用户名,`password`为密码,请根据自身需求修改。 +```sql +create user roma_logminer identified by password default tablespace users; +grant roma_logminer_privs to roma_logminer; +grant execute_catalog_role to roma_logminer; +alter user roma_logminer quota unlimited on users; +``` + + +### 6、验证用户权限 +以创建的LogMiner用户登录Oracle数据库,执行以下SQL查询权限,结果如图所示: +```sql + SELECT * FROM USER_ROLE_PRIVS; +``` +
+ +
+ +```sql +SELECT * FROM SESSION_PRIVS; +``` +
+ +
+ +至此,Oracle 12c数据库LogMiner实时采集配置完毕。 diff --git "a/docs/realTime/other/PgWal\345\216\237\347\220\206\345\217\212\351\205\215\347\275\256.md" "b/docs/realTime/other/PgWal\345\216\237\347\220\206\345\217\212\351\205\215\347\275\256.md" new file mode 100644 index 0000000000..13eb4ea428 --- /dev/null +++ "b/docs/realTime/other/PgWal\345\216\237\347\220\206\345\217\212\351\205\215\347\275\256.md" @@ -0,0 +1,237 @@ +# FlinkX PostgreSQL WAL实时采集基本原理 + + + +- [FlinkX PostgreSQL WAL实时采集基本原理](#flinkx-postgresql-wal实时采集基本原理) + - [版本限制](#版本限制) + - [主要涉及模块说明](#主要涉及模块说明) + - [逻辑复制](#逻辑复制) + - [创建发布](#创建发布) + - [WAL日志](#wal日志) + - [WAL何时被写入](#wal何时被写入) + - [WAL主要配置](#wal主要配置) + - [复制槽](#复制槽) + - [局限性](#局限性) + - [FlinkX PostgreSQL WAL实时采集配置](#flinkx-postgresql-wal实时采集配置) + - [postgresql.conf设置](#postgresqlconf设置) + - [部分核心代码分析](#部分核心代码分析) + - [执行发布SQL](#执行发布sql) + - [创建一个逻辑复制流](#创建一个逻辑复制流) + - [业务处理](#业务处理) + + + +
+ +PostgreSQL 实时采集是基于 PostgreSQL的逻辑复制以及逻辑解码功能来完成的。逻辑复制同步数据的原理是,在wal日志产生的数据库上,由逻辑解析模块对wal日志进行初步的解析,它的解析结果为ReorderBufferChange(可以简单理解为HeapTupleData),再由pgoutput plugin对中间结果进行过滤和消息化拼接后,然后将其发送到订阅端,订阅端通过逻辑解码功能进行解析。 + +## 版本限制 +逻辑复制是pgsql10.0版本之后才支持的,因此此方案只支持10.0之后版本 + + +## 主要涉及模块说明 +| Logical Decoding | PostgreSQL 的逻辑日志来源于解析物理 WAL 日志。
解析 WAL 成为逻辑数据的过程叫 Logical Decoding。 | +| :--- | :--- | +| Replication Slots | 保存逻辑或物理流复制的基础信息。类似 Mysql 的位点信息。
一个 逻辑 slot 创建后,它的相关信息可以通过 pg_replication_slots 系统视图获取。
如果它在 active 状态,则可以通过系统视图 pg_stat_replication 看到一些 slot 的实时的状态信息。 | +| Output Plugins | PostgreSQL 的逻辑流复制协议开放一组可编程接口,用于自定义输数据到客户端的逻辑数据的格式。
这部分实现使用插件的方式被内核集成和使用,称作 Output Plugins。 | +| Exported Snapshots | 当一个逻辑流复制 slot 被创建时,系统会产生一个快照。客户端可以通过它订阅到数据库任意时间点的数据变化。 | + + + +对于修改一条数据之后 ,pgsql订阅端decode解析后的数据格式为 +```json +{"id":"schema1.test1", + "schema":"schema1", +"table":"test1", + "columnList":[ + {"name":"id","type":"int4","index":0}, + {"name":"name","type":"varchar","index":1} + ], + "oldData":["2","23"], + "newData":["2","name1"], + "type":"UPDATE", + "currentLsn":23940928, + "ts":1596358573614 +} +``` +主要包含schema table以及类型`INSERT`, `UPDATE`和`DELETE`以及WAL日志id等相关信息
+
+ + +## 逻辑复制 +逻辑复制使用_发布_和_订阅_模型, 其中一个或多个_订阅者_订阅_发布者_ 节点上的一个或多个_发布_。 订阅者从他们订阅的发布中提取数据,逻辑复制是根据复制标识(通常是主键)复制数据对象及其更改的一种方法,因此在上面订阅端收到消息数据实例中可以发现 具备数据库以及表信息外 还具备修改前数据,修改后数据信息以及执行的type和对应的WAL日志ID + +发布可以选择将它们所产生的改变限制在`INSERT`, `UPDATE`和`DELETE`的任意组合上, 类似于触发器被特定事件类型触发。默认情况下,复制所有操作类型。
已发布的table必须配置一个“副本标识”以便能够复制 `UPDATE`和`DELETE`操作, 这样可以在订阅者端识别适当的行来更新或删除。默认情况下,这是主键, 如果有的话。另外唯一的索引(有一些额外的要求)也可以被设置为副本标识。 如果表没有任何合适的键,那么它可以设置为复制标识“full”, 这意味着整个行成为键。但是,这是非常低效的, 并且只能在没有其他可能的解决方案时用作后备
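+
+作为示意,可以用下面的SQL为表设置副本标识(表名沿用上文示例中的 schema1.test1,属假设值):
+```sql
+-- 默认使用主键作为副本标识
+ALTER TABLE schema1.test1 REPLICA IDENTITY DEFAULT;
+-- 没有合适的键时可退化为 FULL,整行作为键,效率较低
+ALTER TABLE schema1.test1 REPLICA IDENTITY FULL;
+```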
+ + +## 创建发布 +为哪些表设置创建一个发布 +```sql +CREATE PUBLICATION name + [ FOR TABLE [ ONLY ] table_name [ * ] [, ...] + | FOR ALL TABLES ] + [ WITH ( publication_parameter [= value] [, ... ] ) ] +``` + + + +## WAL日志 +WAL 是 Write Ahead Log的缩写,中文称之为预写式日志。WAL log也被简称为xlog,每一次change操作都是先写日志再写数据,保证了事务持久性和数据完整性同时又尽量地避免了频繁IO对性能的影响。WAL的中心概念是**数据文件(存储着表和索引)的修改必须在这些动作被日志记录之后才被写入**
WAL日志保存在pg_xlog下,每个xlog文件默认是16MB,为了满足恢复需求,在xlog目录下会产生多个WAL日志,不需要的WAL日志将会被覆盖
WAL具备归档功能,通过归档的WAL文件可以恢复数据库到WAL日志覆盖时间内的任意一个时间点的状态并且有了WAL日志之后,逻辑复制就可以在WAL日志生成之后,对其进行一系列操作之后传递给订阅客户端,使得订阅客户端能实时获取到源服务器上的修改数据
+ + +### WAL何时被写入 +WAL也有个内存缓冲区WAL Buffer,WAL都是先写入缓存中,对于事务操作,缓存的WAL日志是在事务提交的时候写入磁盘的,对于非事务型的由一个异步线程追加进日志文件或者在checkPoint(数据脏页缓存写入磁盘需要先刷新WAL缓存)的时候写入。
+ + +### WAL主要配置 +``` +wal_level 可以选择为minimal, replica, or logical 使用逻辑复制需要设置为logical + +fsync boolean类型 表示是否使用fsync()系统调用把WAL文件刷新到物理磁盘,确保数据库在操作系统或硬件奔溃的情况下可恢复到最终状态 默认是on + +synchronous_commit boolean类型 声明提交一个事务是否需要等待其把WAL日志写入磁盘后再返回,默认值是’on’ + +on:默认值,为on且没有开启同步备库的时候,会当wal日志真正刷新到磁盘永久存储后才会返回客户端事务已提交成功, + 当为on且开启了同步备库的时候(设置了synchronous_standby_names),必须要等事务日志刷新到本地磁盘,并且还要等远程备库也提交到磁盘才能返回客户端已经提交. + +remote_apply:提交将等待, 直到来自当前同步备用数据库的回复表明它们已收到事务的提交记录并应用它, 以便它对备用数据库上的查询可见。 + +remote_write:提交将等待,直到来自当前同步的后备服务器的一个回复指示该服务器已经收到了该事务的提交记录并且已经把该记录写出到后备服务器的操作系统。 + +local:当事务提交时,仅写入本地磁盘即可返回客户端事务提交成功,而不管是否有同步备库。 + +off:写到缓存中就会向客户端返回提交成功,但也不是一直不刷到磁盘,延迟写入磁盘,延迟的时间为最大3倍的wal_writer_delay参数的(默认200ms)的时间,所有如果即使关闭synchronous_commit,也只会造成最多600ms的事务丢失 可能会造成一些最近已提交的事务丢失,但数据库状态是一致的,就像这些事务已经被干净地中止。但对高并发的小事务系统来说,性能来说提升较大。 + + +wal_sync_method enum类型 用来指定向磁盘强制更新WAL日志数据的方法open_datasync fdatasync fsync_writethrough fsync open_sync + + + +Wal_writer_delay 指定wal writer process 把WAL日志写入磁盘的周期 在每个周期中会先把缓存中的WAL日志刷到磁盘 + +``` + + + +## 复制槽 +每个订阅都将通过一个复制槽接收更改,记录某个订阅者的WAL接收情况。
在源数据库写入修改频繁、WAL日志产生速度很快,或者订阅者接收日志很慢、消费远远小于生产的时候,源数据库上的WAL日志可能还没有传递到订阅端就被回卷覆盖掉了;如果被覆盖掉的WAL日志文件又没有归档备份,那么订阅者就再也无法消费到此数据。<br/>
复制槽则保存了此订阅的接收信息,使得未被接收的WAL日志不会被回收

注意<br/>
数据库会记录slot的wal复制位点,并在wal文件夹中保留所有未发送的wal文件,如果客户创建了slot但是后期不再使用,就有可能导致数据库的wal日志大量堆积、占满磁盘,需要及时删除不用的slot<br/>
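+
+例如,可以通过下面的SQL删除不再使用的复制槽(槽名为假设值):
+```sql
+SELECT pg_drop_replication_slot('flinkx_pgwal_slot');
+```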
+
可通过以下SQL获取相关信息 +```sql +select * from pg_replication_slots; +``` +字段含义 +```text +Name Type References Description +slot_name name 复制槽的唯一的集群范围标识符 +plugin name 正在使用的包含逻辑槽输出插件的共享对象的基本名称,对于物理插槽则为null。 +slot_type text 插槽类型 - 物理或逻辑 +datoid oid 该插槽所关联的数据库的OID,或为空。 只有逻辑插槽才具有关联的数据库。 +database text 该插槽所关联的数据库的名称,或为空。 只有逻辑插槽才具有关联的数据库。 +active boolean 如果此插槽当前正在使用,则为真 +active_pid integer 如果当前正在使用插槽,则使用此插槽的会话的进程ID。 NULL如果不活动。 +xmin xid 此插槽需要数据库保留的最早事务。 VACUUM无法删除任何后来的事务删除的元组。 +catalog_xmin xid 影响该插槽需要数据库保留的系统目录的最早的事务。 VACUUM不能删除任何后来的事务删除的目录元组。 +restart_lsn pg_lsn 最老的WAL的地址(LSN)仍然可能是该插槽的使用者所需要的,因此在检查点期间不会被自动移除 +``` + + + +## 局限性 + +- 不复制数据库模式和DDL命令。初始模式可以使用`pg_dump --schema-only` 手动复制。后续的模式更改需要手动保持同步。(但是请注意, 两端的架构不需要完全相同。)当实时数据库中的模式定义更改时,逻辑复制是健壮的: 当模式在发布者上发生更改并且复制的数据开始到达订阅者但不符合表模式, 复制将错误,直到模式更新。在很多情况下, 间歇性错误可以通过首先将附加模式更改应用于订阅者来避免。
+- 不复制序列数据。由序列支撑的serial列或标识列中的数据当然会作为表的一部分被复制,但序列本身在订阅者端仍然会显示起始值。如果订阅者被用作只读数据库,那么这通常不成问题。但是,如果打算对订阅者数据库进行某种切换或故障切换,则需要将序列更新为最新值,方法是从发布者复制当前数据(可能使用`pg_dump`),或者从表中确定一个足够高的值(同步序列值的SQL示意见本列表之后)。<br/>
+- 不复制`TRUNCATE`命令。当然,可以通过使用`DELETE` 来解决。为了避免意外的`TRUNCATE`调用,可以撤销表的 `TRUNCATE`权限。
+- 不复制大对象。没有什么解决办法,只能改为在普通表中存储数据。
+- 复制只能从基表到基表。也就是说,发布和订阅端的表必须是普通表,而不是视图、物化视图、分区根表或外部表。对于分区,您可以一对一地复制分区层次结构,但目前不能复制到不同的分区设置。尝试复制基表以外的表将导致错误。
+
+
<br/>
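+
+针对上面提到的序列不同步问题,切换前可以参考如下SQL在订阅端手工对齐序列值(表名、序列名仅为假设的示例):
+```sql
+-- 以表中当前最大ID为基准,把序列推进到足够高的值
+SELECT setval('test1_id_seq', (SELECT max(id) FROM schema1.test1));
+```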
+
+
+## FlinkX PostgreSQL WAL实时采集配置
+
+### postgresql.conf设置
+```
+wal_level = logical
+```
+
+
+用于复制连接的角色必须具有`REPLICATION`属性(或者是超级用户),并需要在pg_hba.conf中做出如下配置:
+```
+host    replication     all     10.0.3.0/24     md5
+```
+
+
+## 部分核心代码分析
+
+
+
+### 执行发布SQL
+逻辑复制流是发布/订阅模型,因此生成流之前先进行发布:
+```java
+public static final String PUBLICATION_NAME = "dtstack_flinkx";
+public static final String CREATE_PUBLICATION = "CREATE PUBLICATION %s FOR ALL TABLES;";
+public static final String QUERY_PUBLICATION = "SELECT COUNT(1) FROM pg_publication WHERE pubname = '%s';";
+
+// 先执行查询SQL,判断是否已存在名为 dtstack_flinkx 的 PUBLICATION
+// 如果不存在,再执行创建SQL语句
+conn.createStatement()
+        .execute(String.format(CREATE_PUBLICATION, PUBLICATION_NAME));
+```
+
+
+
+### 创建一个逻辑复制流
+```java
+ChainedLogicalStreamBuilder builder = conn.getReplicationAPI()
+        .replicationStream() //定义一个逻辑复制流
+        .logical() //级别是logical
+        .withSlotName(format.getSlotName()) //复制槽名称
+        //协议版本,当前仅支持版本1
+        .withSlotOption("proto_version", "1") //槽版本号
+        //逗号分隔的要订阅的发布名称列表(接收更改),单个发布名称被视为标准对象名称,并可根据需要引用
+        .withSlotOption("publication_names", PgWalUtil.PUBLICATION_NAME) //关联的发布名称
+        .withStatusInterval(format.getStatusInterval(), TimeUnit.MILLISECONDS);
+long lsn = format.getStartLsn();
+if (lsn != 0) {
+    builder.withStartPosition(LogSequenceNumber.valueOf(lsn));
+}
+stream = builder.start();
+```
+
+### 业务处理
+逻辑复制流接收到订阅的消息后进行解码,获取到相应信息再做处理:
+```java
+public void run() {
+    LOG.info("PgWalListener start running.....");
+    try {
+        init();
+        while (format.isRunning()) {
+            //接收到流对象
+            ByteBuffer buffer = stream.readPending();
+            if (buffer == null) {
+                continue;
+            }
+            //解码为Table对象,具体信息为库、表、字段信息、WAL id等
+            //然后就可以对其进行处理了
+            Table table = decoder.decode(buffer);
+            if (StringUtils.isBlank(table.getId())) {
+                continue;
+            }
+            String type = table.getType().name().toLowerCase();
+            if (!cat.contains(type)) {
+                continue;
+            }
+            if (!tableSet.contains(table.getId())) {
+                continue;
+            }
+            LOG.trace("table = {}", gson.toJson(table));
+            // ......(省略:将解析结果发往下游等后续处理)
+        }
+    } catch (Exception e) {
+        // 异常处理(源码中此处还包含资源清理等逻辑,此处省略)
+    }
+}
+```
+
<br/>
\ No newline at end of file
diff --git "a/docs/realTime/other/SqlserverCDC\345\216\237\347\220\206.md" "b/docs/realTime/other/SqlserverCDC\345\216\237\347\220\206.md"
new file mode 100644
index 0000000000..acf1e21283
--- /dev/null
+++ "b/docs/realTime/other/SqlserverCDC\345\216\237\347\220\206.md"
@@ -0,0 +1,277 @@
+# SqlServer CDC实时采集原理
+
+
+- [SqlServer CDC实时采集原理](#sqlserver-cdc实时采集原理)
+- [一、基础](#一基础)
+- [二、配置](#二配置)
+- [三、原理](#三原理)
+  - [1、SQL Server Agent](#1sql-server-agent)
+  - [2、数据库CDC开启前后对比](#2数据库cdc开启前后对比)
+  - [3、业务表CDC开启前后对比](#3业务表cdc开启前后对比)
+  - [4、采集原理](#4采集原理)
+    - [1、insert/delete](#1insertdelete)
+    - [2、update](#2update)
+    - [3、流程图](#3流程图)
+    - [4、数据格式](#4数据格式)
+
+
+# 一、基础
+SqlServer官方从SqlServer 2008版本开始支持CDC,文档链接如下:
+[https://docs.microsoft.com/zh-cn/sql/relational-databases/track-changes/about-change-data-capture-sql-server?view=sql-server-ver15](https://docs.microsoft.com/zh-cn/sql/relational-databases/track-changes/about-change-data-capture-sql-server?view=sql-server-ver15)
+
+
+# 二、配置
+配置文档链接如下:
+[SqlServer配置CDC](../other/SqlserverCDC配置.md)
+
+# 三、原理
+### 1、SQL Server Agent
+SQL Server Agent代理服务是sql server的一个标准服务,作用是代理执行所有sql的自动化任务,以及数据库事务性复制等无人值守任务。这个服务在默认安装情况下是停止状态,需要手动启动,或改为自动运行,否则sql的自动化任务都不会执行,另外还要注意服务的启动帐户。
+简单地说,就是启动了这个服务,捕获进程才会处理事务日志并将条目写入CDC表。
+[https://docs.microsoft.com/zh-cn/sql/ssms/agent/sql-server-agent?view=sql-server-ver15](https://docs.microsoft.com/zh-cn/sql/ssms/agent/sql-server-agent?view=sql-server-ver15)
+
+
+### 2、数据库CDC开启前后对比
+开启前:
+<div align=center>
+ +
+
+开启后,执行:
+`EXEC sys.sp_cdc_enable_db;`
+<div align=center>
+ +
+
+
+我们首先观察到dbo下新增了一张**systranschemas**表,**systranschemas**表用于跟踪事务发布和快照发布所发布项目中的架构更改。
+
+| 列名称 | 数据类型 | 说明 |
+| --- | --- | --- |
+| tabid | int | 标识发生了架构更改的表项目。 |
+| startlsn | binary | 架构更改开始时的 LSN 值。 |
+| endlsn | binary | 架构更改结束时的 LSN 值。 |
+| typeid | int | 架构更改的类型。 |
+
+
+
+数据库下新增了名为cdc的schema,同时也新增了cdc用户。cdc下新增了以下五张表:
+<br/>
+**1、captured_columns**
+为在捕获实例中跟踪的每一列返回一行。 默认情况下,将捕获源表中的所有列。 但是,如果为变更数据捕获启用了源表,则可以通过指定列列表将列包括在捕获范围内或排除在捕获范围之外。
+当没有任何业务表开启了CDC时,该表为空。
+
+| 列名称 | 数据类型 | 说明 |
+| --- | --- | --- |
+| object_id | int | 捕获的列所属的更改表的 ID。 |
+| column_name | sysname | 捕获的列的名称。 |
+| column_id | int | 捕获的列在源表内的 ID。 |
+| column_type | sysname | 捕获的列的类型。 |
+| column_ordinal | int | 更改表中的列序号(从 1 开始)。 将排除更改表中的元数据列。 序号 1 将分配给捕获到的第一个列。 |
+| is_computed | bit | 表示捕获到的列是源表中计算所得的列。 |
+
+
+
+**2、change_tables**
+为数据库中的每个更改表返回一行。 对源表启用变更数据捕获时,将创建一个更改表。
+当没有任何业务表开启了CDC时,该表为空。
+
+| 列名称 | 数据类型 | 说明 |
+| --- | --- | --- |
+| object_id | int | 更改表的 ID。 在数据库中是唯一的。 |
+| version | int | 标识为仅供参考。 不支持。 不保证以后的兼容性。对于 SQL Server 2012 (11.x),此列始终返回 0。 |
+| source_object_id | int | 为变更数据捕获启用的源表的 ID。 |
+| capture_instance | sysname | 用于命名特定于实例的跟踪对象的捕获实例的名称。 默认情况下,该名称从源架构名称加上源表名称派生,格式 schemaname_sourcename。 |
+| start_lsn | binary(10) | 日志序列号 (LSN),表示查询更改表中的更改数据时的低端点。NULL = 尚未建立低端点。 |
+| end_lsn | binary(10) | 标识为仅供参考。 不支持。 不保证以后的兼容性。对于 SQL Server 2008,此列始终返回 NULL。 |
+| supports_net_changes | bit | 对更改表启用了查询净更改支持。 |
+| has_drop_pending | bit | 捕获进程收到关于源表已被删除的通知。 |
+| role_name | sysname | 用于访问更改数据的数据库角色的名称。NULL = 未使用角色。 |
+| index_name | sysname | 用于唯一标识源表中的行的索引名称。 index_name 为源表的主键索引的名称,或者在对源表启用了变更数据捕获时指定的唯一索引的名称。NULL = 在变更数据捕获启用时,源表无主键,且未指定唯一索引。注意:如果对具有主键的表启用了变更数据捕获,则不管是否启用了净更改,"变更数据捕获" 功能都将使用索引。 启用变更数据捕获之后,将不允许对主键进行修改。 如果该表没有主键,则仍可以启用变更数据捕获,但是只能将净更改设置为 False。 启用变更数据捕获之后,即可以创建主键。 由于变更数据捕获功能不使用主键,因此还可以修改主键。 |
+| filegroup_name | sysname | 更改表所驻留的文件组的名称。 NULL = 更改表在数据库的默认文件组中。|
+| create_date | datetime | 启用源表的日期。 |
+| partition_switch | bit | 指示是否可以对启用了变更数据捕获的表执行 ALTER TABLE 的 SWITCH PARTITION 命令。 0 指示分区切换被阻止。 未分区表始终返回 1。 |
+
+
+
+**3、ddl_history**
+为对启用了变更数据捕获的表所做的每一项数据定义语言 (DDL) 更改返回一行。 可以使用此表来确定源表发生 DDL 更改的时间以及更改的内容。 此表中不包含未发生 DDL 更改的源表的任何条目。
+当没有任何开启了CDC的业务表的表结构发生变更时,该表为空。
+
+| 列名称 | 数据类型 | 说明 |
+| --- | --- | --- |
+| source_object_id | int | 应用 DDL 更改的源表的 ID。 |
+| object_id | int | 与源表的捕获实例相关联的更改表的 ID。 |
+| required_column_update | bit | 指示在源表中修改了捕获列的数据类型。 此修改改变了更改表中的列。 |
+| ddl_command | nvarchar(max) | 应用于源表的 DDL 语句。 |
+| ddl_lsn | binary(10) | 与 DDL 修改的提交相关联的日志序列号 (LSN)。 |
+| ddl_time | datetime | 对源表所做的 DDL 更改的日期和时间。 |
+
+
+
+**4、index_columns**
+为与更改表关联的每个索引列返回一行。 变更数据捕获使用这些索引列来唯一标识源表中的行。 默认情况下,将包括源表的主键列。 但是,如果在对源表启用变更数据捕获时指定了源表的唯一索引,则将改用该索引中的列。 如果启用净更改跟踪,则该源表需要主键或唯一索引。
+当没有任何开启了CDC的业务表存在索引列时,该表为空。
+
+| 列名称 | 数据类型 | 说明 |
+| --- | --- | --- |
+| object_id | int | 更改表的 ID。 |
+| column_name | sysname | 索引列的名称。 |
+| index_ordinal | tinyint | 索引中的列序号(从 1 开始)。 |
+| column_id | int | 源表中的列 ID。 |
+
+
+
+**5、lsn_time_mapping**
+为每个在更改表中存在行的事务返回一行。 该表用于在日志序列号 (LSN) 提交值和提交事务的时间之间建立映射。 没有对应更改表项的条目也会被记录下来,以便在表的变更活动少或者无变更活动期间,记录 LSN 处理的完成进度。
+
+| 列名称 | 数据类型 | 说明 |
+| --- | --- | --- |
+| start_lsn | binary(10) | 提交的事务的 LSN。 |
+| tran_begin_time | datetime | 与 LSN 关联的事务开始的时间。 |
+| tran_end_time | datetime | 事务结束的时间。 |
+| tran_id | varbinary(10) | 事务的 ID。 |
+
+
+
+cdc下新增以下函数:
+<div align=center>
+ +**1、fn_cdc_get_all_changes_** +为在指定日志序列号 (LSN) 范围内应用到源表的每项更改返回一行。 如果源行在该间隔内有多项更改,则每项更改都会表示在返回的结果集中。 除了返回更改数据外,四个元数据列还提供了将更改应用到另一个数据源所需的信息。 行筛选选项可控制元数据列的内容以及结果集中返回的行。 当指定“all”行筛选选项时,针对每项更改将只有一行来标识该更改。 当指定“all update old”选项时,更新操作会表示为两行:一行包含更新之前已捕获列的值,另一行包含更新之后已捕获列的值。此枚举函数是在对源表启用变更数据捕获时创建的。 函数名称是派生的,并使用 **cdc.fn_cdc_get_all_changes_**_capture_instance_ 格式,其中 _capture_instance_ 是在源表启用变更数据捕获时为捕获实例指定的值。 + +| 列名称 | 数据类型 | 说明 | +| --- | --- | --- | +| __$start_lsn | binary(10) | 与更改关联的提交 LSN,用于保留更改的提交顺序。 在同一事务中提交的更改将共享同一个提交 LSN 值。 | +| __$seqval | binary(10) | 用于对某事务内的行更改进行排序的序列值。 | +| __$operation | int | 标识将更改数据行应用到目标数据源所需的数据操作语言 (DML) 操作。 可以是以下值之一:
1 = 删除
2 = 插入
3 = 更新(捕获的列值是执行更新操作前的值)。 仅当指定了行筛选选项“all update old”时才应用此值。
4 = 更新(捕获的列值是执行更新操作后的值)。 |
+| __$update_mask | varbinary(128) | 位掩码,为捕获实例标识的每个已捕获列均对应于一个位。 当 __$operation = 1 或 2 时,该值将所有已定义的位设置为1。 当 __$operation = 3 或 4 时,只有与更改的列相对应的位设置为1。 |
+| \ | 多种多样 | 函数返回的其余列是在创建捕获实例时标识的已捕获列。 如果已捕获列的列表中未指定任何列,则将返回源表中的所有列。 |
+
+
+
+**2、fn_cdc_get_net_changes_**
+为指定日志序列号 (LSN) 范围内的每个源行返回一个净更改行,返回格式跟上面一样。
+
+### 3、业务表CDC开启前后对比
+开启前与上一张图一致。
+
+
+开启SQL:
+```sql
+sys.sp_cdc_enable_table
+-- 表所属的架构名
+[ @source_schema = ] 'source_schema',
+
+-- 表名
+[ @source_name = ] 'source_name' ,
+
+-- 用于控制更改数据访问的数据库角色的名称
+[ @role_name = ] 'role_name' ,
+
+-- 用于命名变更数据捕获对象的捕获实例的名称,这个名称在后面的存储过程和函数中需要经常用到。
+[,[ @capture_instance = ] 'capture_instance' ]
+
+-- 指示是否对此捕获实例启用净更改查询支持。如果此表有主键,或者有已使用 @index_name 参数进行标识的唯一索引,则此参数的默认值为 1。否则,此参数默认为 0。
+[,[ @supports_net_changes = ] supports_net_changes ]
+
+-- 用于唯一标识源表中的行的唯一索引的名称。index_name 为 sysname,并且可以为 NULL。
+-- 如果指定,则 index_name 必须是源表的唯一有效索引。如果指定 index_name,则标识的索引列优先于任何定义的主键列,就像表的唯一行标识符一样。
+[,[ @index_name = ] 'index_name' ]
+
+-- 需要对哪些列进行捕获。captured_column_list 的数据类型为 nvarchar(max),并且可以为 NULL。如果为 NULL,则所有列都将包括在更改表中。
+[,[ @captured_column_list = ] 'captured_column_list' ]
+
+-- 要用于为捕获实例创建的更改表的文件组。
+[,[ @filegroup_name = ] 'filegroup_name' ]
+
+-- 指示是否可以对启用了变更数据捕获的表执行 ALTER TABLE 的 SWITCH PARTITION 命令。
+-- allow_partition_switch 为 bit,默认值为 1。
+[,[ @partition_switch = ] 'partition_switch' ]
+```
+开启后:
+
+<div align=center>
+ +
+
+此时,cdc下新增了一张名为dbo_kudu_CT的表,对于任意开启CDC的业务表而言,都会在其对应的cdc schema下创建一张格式为${schema}_${table}_CT的表。
+
+**1、dbo_kudu_CT:**
+对源表启用变更数据捕获时创建的更改表。 该表为对源表执行的每个插入和删除操作返回一行,为对源表执行的每个更新操作返回两行。 如果在启用源表时未指定更改表的名称,则会使用一个派生的名称,格式为 cdc.capture_instance_CT,其中 capture_instance 由源表的架构名称和源表名称按 schema_table 格式组成。 例如,如果对 AdventureWorks 示例数据库中的表 Person.Address 启用了变更数据捕获,则派生的更改表名称将为 cdc.Person_Address_CT。
+
+| 列名称 | 数据类型 | 说明 |
+| --- | --- | --- |
+| __$start_lsn | binary(10) | 与相应更改的提交事务关联的日志序列号 (LSN)。在同一事务中提交的所有更改将共享同一个提交 LSN。 例如,如果对源表的 delete 操作删除两行,则更改表将包含两行,每行都具有相同的 __$start_lsn 值。 |
+| __$end_lsn | binary(10) | 标识为仅供参考。 不支持。 不保证以后的兼容性。在 SQL Server 2012 (11.x) 中,此列始终为 NULL。 |
+| __$seqval | binary(10) | 用于对事务内的行更改进行排序的序列值。 |
+| __$operation | int | 标识与相应更改关联的数据操作语言 (DML) 操作。 可以是以下值之一:<br>
1 = 删除
2 = 插入
3 = 更新(旧值)列数据中具有执行更新语句之前的行值。
4 = 更新(新值)列数据中具有执行更新语句之后的行值。 |
+| __$update_mask | varbinary(128) | 基于更改表的列序号的位掩码,用于标识那些发生更改的列。 |
+| \ | 多种多样 | 更改表中的其余列是在创建捕获实例时源表中标识为已捕获列的那些列。 如果已捕获列的列表中未指定任何列,则源表中的所有列将包括在此表中。 |
+| __$command_id | int | 跟踪事务中的操作顺序。 |
+
+
+**2、captured_columns:**
+<div align=center>
+ +
+
+ +**3、change_tables:** + +
+ +
+
+ + +### 4、采集原理 +#### 1、insert/delete +对于insert和delete类型的数据变更,对于每一行变更都会在对应的${schema}_${table}_CT表中增加一行记录。对于insert,id,user_id,name记录的是insert之后的value值;对于delete,id,user_id,name记录的是delete之前的value值; +
+ +
+
+
+#### 2、update
+a、更新了主键
+此时,SqlServer数据库的做法是在同一事务内,先将原来的记录删除,然后再重新插入。
+执行如下SQL,日志表如图所示:
+`UPDATE [dbo].[kudu] SET [id] = 2, [user_id] = '2', [name] = 'b' WHERE [id] = 1;`
+<div align=center>
+ +
+
+b、未更新主键
+此时,SqlServer数据库的做法是直接更新字段信息。
+执行如下SQL,日志表如图所示:
+`UPDATE [dbo].[kudu] SET [user_id] = '3', [name] = 'c' WHERE [id] = 2;`
+
+<div align=center>
+ +
+
+ + +#### 3、流程图 + + +
+ +
+
对于FlinkX SqlServer CDC实时采集插件,其基本原理便是以轮询的方式,循环调用fn_cdc_get_all_changes_函数,获取上次结束时的lsn与当前数据库最大lsn值之间的数据。对于insert/delete类型的数据,获取并解析一行;对于update类型,获取并解析两行。解析完成后把数据传递到下游,并记录当前解析到的数据的lsn,为下次轮询做准备。
+
+#### 4、数据格式
+```json
+{
+    "type":"update",
+    "schema":"dbo",
+    "table":"tb1",
+    "lsn":"00000032:00002038:0005",
+    "ts": 6760525407742726144,
+    "before_id":1,
+    "after_id":2
+}
+```
diff --git "a/docs/realTime/other/SqlserverCDC\351\205\215\347\275\256.md" "b/docs/realTime/other/SqlserverCDC\351\205\215\347\275\256.md"
new file mode 100644
index 0000000000..cee0058106
--- /dev/null
+++ "b/docs/realTime/other/SqlserverCDC\351\205\215\347\275\256.md"
@@ -0,0 +1,128 @@
+# SqlServer配置CDC
+
+
+- [SqlServer配置CDC](#sqlserver配置cdc)
+  - [1、查询SqlServer数据库版本](#1查询sqlserver数据库版本)
+  - [2、查询当前用户权限,必须为 sysadmin 固定服务器角色的成员才允许对数据库启用CDC(变更数据捕获)功能](#2查询当前用户权限必须为 sysadmin 固定服务器角色的成员才允许对数据库启用cdc变更数据捕获功能)
+  - [3、查询数据库是否已经启用CDC(变更数据捕获)功能](#3查询数据库是否已经启用cdc变更数据捕获功能)
+  - [4、对数据库启用CDC(变更数据捕获)功能](#4对数据库启用cdc变更数据捕获功能)
+  - [5、查询表是否已经启用CDC(变更数据捕获)功能](#5查询表是否已经启用cdc变更数据捕获功能)
+  - [6、对表启用CDC(变更数据捕获)功能](#6对表启用cdc变更数据捕获功能)
+  - [7、确认CDC agent 是否正常启动](#7确认cdc-agent-是否正常启动)
+
+注:SqlServer自2008版本开始支持CDC(变更数据捕获)功能,本文基于SqlServer 2017编写。
+
+
+
+#### 1、查询SqlServer数据库版本
+SQL:`SELECT @@VERSION`
+结果:
+<div align=center>
+ +
+ + +#### 2、查询当前用户权限,必须为 sysadmin 固定服务器角色的成员才允许对数据库启用CDC(变更数据捕获)功能 +SQL:`exec sp_helpsrvrolemember 'sysadmin'` +结果: +
+ +
+ + +#### 3、查询数据库是否已经启用CDC(变更数据捕获)功能 +SQL:`select is_cdc_enabled, name from sys.databases where name = 'tudou'` +结果: +
+ +
+
+0:未启用;1:启用
+
+
+#### 4、对数据库启用CDC(变更数据捕获)功能
+SQL:
+```sql
+USE tudou
+GO
+EXEC sys.sp_cdc_enable_db
+GO
+```
+
+
+重复第三步操作,确认数据库已经启用CDC(变更数据捕获)功能。
+
+<div align=center>
+ +
+ + +#### 5、查询表是否已经启用CDC(变更数据捕获)功能 +SQL:`select name,is_tracked_by_cdc from sys.tables where name = 'test';` +结果: +
+ +
+0:未启用;1:启用
+
+
+#### 6、对表启用CDC(变更数据捕获)功能
+SQL:
+```sql
+EXEC sys.sp_cdc_enable_table
+@source_schema = 'dbo',
+@source_name = 'test',
+@role_name = NULL,
+@supports_net_changes = 0;
+```
+source_schema:表所在的schema名称
+source_name:表名
+role_name:访问控制角色名称,此处为null表示不设置访问控制
+supports_net_changes:是否为捕获实例生成一个净更改函数,0:否;1:是
+
+
+重复第五步操作,确认表已经启用CDC(变更数据捕获)功能。
+<div align=center>
+ +
+至此,表`test`启用CDC(变更数据捕获)功能的配置完成。
+
+#### 7、确认CDC agent 是否正常启动
+```sql
+EXEC master.dbo.xp_servicecontrol N'QUERYSTATE', N'SQLSERVERAGENT'
+```
+<div align=center>
+ +
如显示上图状态,则需要启动对应的agent。
+
+**Windows 环境操作开启 CDC agent**
+点击下图位置开启代理:
+<div align=center>
+ +
+ +**重新启动数据库** +
+ +
+ +**再次查询agent 状态,确认状态变更为running** +
+ +
+至此,表`test`启用CDC(变更数据捕获)功能的配置完成。
+
+**docker 环境操作开启 CDC agent**
+
+**开启mssql-server的代理服务**
+```shell
+docker exec -it sqlserver bash
+/opt/mssql/bin/mssql-conf set sqlagent.enabled true
+docker stop sqlserver
+docker start sqlserver
+```
+
+参考阅读:[https://docs.microsoft.com/zh-cn/sql/relational-databases/track-changes/enable-and-disable-change-data-capture-sql-server?view=sql-server-2017](https://docs.microsoft.com/zh-cn/sql/relational-databases/track-changes/enable-and-disable-change-data-capture-sql-server?view=sql-server-2017)
diff --git a/docs/realTime/reader/LogMiner.md b/docs/realTime/reader/LogMiner.md
new file mode 100644
index 0000000000..8e159692e3
--- /dev/null
+++ b/docs/realTime/reader/LogMiner.md
@@ -0,0 +1,236 @@
+# Oracle LogMiner Reader
+
+
+
+- [Oracle LogMiner Reader](#oracle-logminer-reader)
+  - [一、插件名称](#一插件名称)
+  - [二、支持的数据源版本](#二支持的数据源版本)
+  - [三、数据库配置](#三数据库配置)
+  - [四、基本原理](#四基本原理)
+  - [五、参数说明](#五参数说明)
+  - [六、配置示例](#六配置示例)
+
+
+
<br/>
+ +## 一、插件名称 +名称:**oraclelogminerreader** + +
+ +## 二、支持的数据源版本 +**支持Oracle 10,Oracle 11以及Oracle12单机版,不支持RAC模式,暂不支持Oracle18、Oracle19** + +
+ +## 三、数据库配置 +[Oracle配置LogMiner](../other/LogMiner配置.md) + +
+ +## 四、基本原理 +[FlinkX Oracle LogMiner实时采集基本原理](../other/LogMiner原理.md) + +
+ +## 五、参数说明 + +- **jdbcUrl** + - 描述:Oracle数据库的JDBC URL链接 + - 必选:是 + - 字段类型:String + - 默认值:无 + +
+ +- **username** + - 描述: 用户名 + - 必选:是 + - 字段类型:String + - 默认值:无 + +
+ +- **password** + - 描述: 密码 + - 必选:是 + - 字段类型:String + - 默认值:无 + +
+ +- **table** + - 描述: 需要监听的表,格式为:schema.table,多个以,分割,schema不能配置为\*,但table可以配置\*监听指定库下所有的表,如:schema1.table1,schema1.table2,schema2.\* + - 必选:否,不配置则监听除`SYS`库以外的所有库的所有表变更信息 + - 字段类型:String + - 默认值:无 + +
+ +- **cat** + - 描述:需要监听的操作数据操作类型,有UPDATE,INSERT,DELETE三种可选,大小写不敏感,多个以,分割 + - 必选:否 + - 字段类型:String + - 默认值:UPDATE,INSERT,DELETE + +
+ +- **readPosition** + - 描述:Oracle实时采集的采集起点 + - 可选值: + - all: 从Oracle数据库中最早的归档日志组开始采集(不建议使用) + - current:从任务运行时开始采集 + - time: 从指定时间点开始采集 + - scn: 从指定SCN号处开始采集 + - 必选:否 + - 字段类型:String + - 默认值:current + +
+ +- **startTime** + - 描述: 指定采集起点的毫秒级时间戳 + - 必选:当`readPosition`为`time`时,该参数必填 + - 字段类型:Long(毫秒级时间戳) + - 默认值:无 + +
+ +- **startSCN** + - 描述: 指定采集起点的SCN号 + - 必选:当`readPosition`为`scn`时,该参数必填 + - 字段类型:String + - 默认值:无 + +
+ +- **fetchSize** + - 描述: 批量从v$logmnr_contents视图中拉取的数据条数,对于大数据量的数据变更,调大该值可一定程度上增加任务的读取速度 + - 必选:否 + - 字段类型:Integer + - 默认值:1000 + +
+ +- **queryTimeout** + - 描述: LogMiner执行查询SQL的超时参数,单位秒 + - 必选:否 + - 字段类型:Long + - 默认值:300 + +
+ +- **supportAutoAddLog** + - 描述:启动LogMiner是否自动添加日志组 + - 必选:否 + - 字段类型:Boolean + - 默认值:false + +
+ +- **pavingData** + - 描述:是否将解析出的json数据拍平 + - 必选:否 + - 字段类型:String + - 默认值:false(一般配置成true比较好) + - 示例:假设解析的表为CDC,数据库schema为TUDOU,对CDC中的NAME字段做update操作,NAME原来的值为a,更新后为b,则pavingData为true时数据格式为: + + ```json + { + "scn": 1807399, + "type": "UPDATE", + "schema": "TUDOU", + "table": "CDC", + "ts": 6760525407742726144, + "opTime": "2021-01-28 11:52:02.0", + "after_NAME": "b", + "after_ID": "1", + "after_USER_ID": "1", + "before_ID": "1", + "before_USER_ID": "1", + "before_NAME": "a" + } + ``` + - pavingData为false时: + ```json + { + "message": { + "scn": 1807399, + "type": "UPDATE", + "schema": "TUDOU", + "table": "CDC", + "ts": 6760525407742726144, + "opTime": "2021-01-28 11:52:02.0", + "before": { + "ID": "1", + "USER_ID": "1", + "NAME": "a" + }, + "after": { + "NAME": "b", + "ID": "1", + "USER_ID": "1" + } + } + } + ``` + 其中: + + 1、scn:Oracle数据库变更记录对应的scn号 + 2、type:变更类型,INSERT,UPDATE、DELETE + 3、opTime:Oracle数据库中数据的变更时间 + 4、ts:自增ID,不重复,可用于排序,解码后为FlinkX的事件时间,解码规则如下: + + ```java + long id = Long.parseLong("6760525407742726144"); + long res = id >> 22; + DateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss"); + System.out.println(sdf.format(res)); //2021-01-28 19:54:21 + ``` + +
+ +## 六、配置示例 +```json +{ + "job": { + "content": [ + { + "reader": { + "parameter": { + "jdbcUrl": "jdbc:oracle:thin:@127.0.0.1:1521:xe", + "username": "kminer", + "password": "kminerpass", + "table": "SCHEMA1.*", + "cat": "UPDATE,INSERT,DELETE", + "startSCN": "482165", + "readPosition": "current", + "startTime": 1576540477000, + "pavingData": true, + "queryTimeout": 300 + }, + "name": "oraclelogminerreader" + }, + "writer": { + "parameter": { + "print": true + }, + "name": "streamwriter" + } + } + ], + "setting": { + "restore": { + "isRestore" : true, + "isStream" : true + }, + "speed": { + "channel": 1 + } + } + } +} +``` + + diff --git a/docs/realTime/reader/binlogreader.md b/docs/realTime/reader/binlogreader.md index 5a7269f3cc..fdfe5a680e 100644 --- a/docs/realTime/reader/binlogreader.md +++ b/docs/realTime/reader/binlogreader.md @@ -1,16 +1,37 @@ # MySQL Binlog Reader - + + +- [MySQL Binlog Reader](#mysql-binlog-reader) + - [一、插件名称](#一插件名称) + - [二、支持的数据源版本](#二支持的数据源版本) + - [三、数据库配置](#三数据库配置) + - [1.修改配置文件](#1修改配置文件) + - [2.添加权限](#2添加权限) + - [四、参数说明](#四参数说明) + - [五、配置示例](#五配置示例) + - [1、单表监听](#1单表监听) + - [2、多表监听](#2多表监听) + - [3、正则监听](#3正则监听) + - [4、指定起始位置](#4指定起始位置) + + + +
+ ## 一、插件名称 -名称:**binlogreader**
+名称:**binlogreader** + +
- ## 二、支持的数据源版本 -**MySQL 5.X**
+**MySQL 5.1.5及以上**
+
+<br/>
- ## 三、数据库配置 -**1.修改配置文件** +### 1.修改配置文件 +binlog_format需要修改为 ROW 格式,在/etc/my.cnf文件里[mysqld]下添加下列配置 ```sql server_id=109 log_bin = /var/lib/mysql/mysql-bin @@ -18,110 +39,153 @@ binlog_format = ROW expire_logs_days = 30 ``` -
**2.添加权限**
+
+### 2.添加权限
+MySQL binlog采集需要SELECT、REPLICATION SLAVE、REPLICATION CLIENT三个权限
```sql
GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'canal'@'%' IDENTIFIED BY 'canal';
```
-
- -## 四、参数说明
+
+
+- 缺乏SELECT权限时,报错为
+```
+com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException:
+Access denied for user 'canal'@'%' to database 'binlog'
+```
+
+- 缺乏REPLICATION SLAVE权限时,报错为
+```
+java.io.IOException:
+Error When doing Register slave:ErrorPacket [errorNumber=1045, fieldCount=-1, message=Access denied for user 'canal'@'%'
+```
+
+- 缺乏REPLICATION CLIENT权限时,报错为
+```
+ com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException:
+ Access denied; you need (at least one of) the SUPER, REPLICATION CLIENT privilege(s) for this operation
+```
+
+
+binlog为什么需要这些权限:
+
+- Select权限代表允许从表中查看数据
+- Replication client权限代表允许执行show master status,show slave status,show binary logs命令
+- Replication slave权限代表允许slave主机通过此用户连接master以便建立主从复制关系
+
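+
+授权后,可以用如下SQL确认账号已具备上述权限(账号以canal为例):
+```sql
+SHOW GRANTS FOR 'canal'@'%';
+```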
+ +## 四、参数说明 + - **jdbcUrl** - 描述:MySQL数据库的jdbc连接字符串,参考文档:[Mysql官方文档](http://dev.mysql.com/doc/connector-j/en/connector-j-reference-configuration-properties.html) - 必选:是 + - 字段类型:string - 默认值:无 - +
- **username** - 描述:数据源的用户名 - 必选:是 + - 字段类型:string - 默认值:无 - +
- **password** - 描述:数据源指定用户名的密码 - 必选:是 + - 字段类型:string - 默认值:无 - +
- **host** - 描述:启动MySQL slave的机器ip - 必选:是 + - 字段类型:string - 默认值:无 - +
- **port** - 描述:启动MySQL slave的端口 - 必选:否 + - 字段类型:int - 默认值:3306 - +
- **table** - 描述:需要解析的数据表。 - - 注意:指定此参数后filter参数将无效 + - 注意:指定此参数后filter参数将无效,table和filter都为空,监听jdbcUrl里的schema下所有表 - 必选:否 + - 字段类型:list - 默认值:无 - +
- **filter** - 描述:过滤表名的Perl正则表达式 + - 注意:table和filter都为空,监听jdbcUrl里的schema下所有表 + - 必选:否 + - 字段类型:string + - 默认值:无 - 例子: - 所有表:`_.*_` - canal schema下所有表: `canal\..*` - canal下的以canal打头的表:`canal\.canal.*` - canal schema下的一张表:`canal\.test1` - - 必选:否 - - 默认值:无 - +
- **cat** - 描述:需要解析的数据更新类型,包括insert、update、delete三种 - - 注意:以英文逗号分割的格式填写。 + - 注意:以英文逗号分割的格式填写。如果为空,解析所有数据更新类型 - 必选:否 + - 字段类型:string - 默认值:无 - +
- **start** - 描述:要读取的binlog文件的开始位置 + - 注意:为空,则从当前position处消费,timestamp的优先级高于 journalName+position - 参数: - - journalName:采集起点按文件开始时的文件名称; - - timestamp:采集起点按时间开始时的时间戳; + - timestamp:时间戳,采集起点从指定的时间戳处消费; + - journalName:文件名,采集起点从指定文件的起始处消费; + - position:文件的指定位置,采集起点从指定文件的指定位置处消费 + - 字段类型:map - 默认值:无 - +
- **pavingData** - 描述:是否将解析出的json数据拍平 - - 示例:假设解析的表为tb1,数据库为test,对tb1中的id字段做update操作,id原来的值为1,更新后为2,则pavingData为true时数据格式为: + - 必选:否 + - 字段类型:boolean + - 默认值:true + - 示例:假设解析的表为tb1,数据库为test,对tb1中的id字段做update操作,id原来的值为1,更新后为2,则pavingData为true时,数据格式为: ```json { "type":"update", "schema":"test", "table":"tb1", - "ts":1231232, - "ingestion":123213, + "ts":6760525407742726144, "before_id":1, "after_id":2 } ``` - -
pavingData为false时: +pavingData为false时: ```json { "message":{ "type":"update", "schema":"test", "table":"tb1", - "ts":1231232, - "ingestion":123213, + "ts":6760525407742726144, "before":{ "id":1 }, @@ -131,210 +195,248 @@ GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'canal'@'%' IDENTI } } ``` -其中”ts“是数据变更时间,ingestion是插件解析这条数据的纳秒时间 +- type:变更类型,INSERT,UPDATE、DELETE +- ts:自增ID,不重复,可用于排序,解码后为FlinkX的事件时间,解码规则如下: +```java + long id = Long.parseLong("6760525407742726144"); + long res = id >> 22; + DateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss"); + System.out.println(sdf.format(res)); //2021-01-28 19:54:21 +``` + +
+ +- **slaveId** + - 描述:从服务器的ID + - 注意:同一个MYSQL复制组内不能重复 - 必选:否 - - 默认值:false + - 字段类型:long + - 默认值:new Object().hashCode() + +
+- **connectionCharset** + - 描述:编码信息 + - 必选:否 + - 字段类型:string + - 默认值:UTF-8 + +
+- **detectingEnable** + - 描述:是否开启心跳 + - 必选:否 + - 字段类型:boolean + - 默认值:true + +
+ +- **detectingSQL** + - 描述:心跳SQL + - 必选:否 + - 字段类型:string + - 默认值:SELECT CURRENT_DATE + +
+ +- **enableTsdb** + - 描述:是否开启时序表结构能力 + - 必选:否 + - 字段类型:boolean + - 默认值:true + +
- **bufferSize** - 描述:并发缓存大小 - 注意:必须为2的幂 - 必选:否 - 默认值:1024 - + +
+ +- **parallel** + - 描述:是否开启并行解析binlog日志 + - 必选:否 + - 字段类型:boolean + - 默认值:true + +
+ +- **parallelThreadSize** + - 描述:并行解析binlog日志线程数 + - 注意:只有 paraller 设置为true才生效 + - 必选:否 + - 字段类型:int + - 默认值:2 + +
+ +- **isGTIDMode** + - 描述:是否开启gtid模式 + - 必选:否 + - 字段类型:boolean + - 默认值:false + +
+ ## 五、配置示例 - -#### 1、单表监听 +### 1、单表监听 ```json { - "job" : { - "content" : [ { - "reader" : { - "parameter" : { - "schema" : "tudou", - "password" : "abc123", - "cat" : "insert,delete,update", - "jdbcUrl" : "jdbc:mysql://kudu3:3306/tudou", - "host" : "kudu3", - "start" : { + "job": { + "content": [ + { + "reader": { + "parameter": { + "table": ["table"], + "password": "passwd", + "database": "database", + "port": 3306, + "cat": "DELETE,INSERT,UPDATE", + "host": "host", + "jdbcUrl": "jdbc:mysql://host:port/schema", + "pavingData": true, + "username": "name" }, - "table" : [ "binlog" ], - "pavingData" : true, - "username" : "dtstack" - }, - "name" : "binlogreader" - }, - "writer" : { - "parameter" : { - "print" : true + "name": "binlogreader" }, - "name" : "streamwriter" + "writer": { + "parameter": { + "print": true + }, + "name": "streamwriter" + } } - } ], - "setting" : { - "restore" : { - "isRestore" : false, - "isStream" : true - }, - "errorLimit" : { }, - "speed" : { - "bytes" : 0, - "channel" : 1 + ], + "setting": { + "restore": { + "isStream": true }, - "log" : { - "isLogger": false, - "level" : "trace", - "path" : "", - "pattern":"" + "speed": { + "channel": 1 } } } } ``` - -#### 2、多表监听 +### 2、多表监听 ```json { - "job" : { - "content" : [ { - "reader" : { - "parameter" : { - "schema" : "tudou", - "password" : "abc123", - "cat" : "insert,delete,update", - "jdbcUrl" : "jdbc:mysql://kudu3:3306/tudou", - "host" : "kudu3", - "start" : { + "job": { + "content": [ + { + "reader": { + "parameter": { + "table": ["table1","table2"], + "password": "passwd", + "database": "database", + "port": 3306, + "cat": "DELETE,INSERT,UPDATE", + "host": "host", + "jdbcUrl": "jdbc:mysql://host:port/schema", + "pavingData": true, + "username": "name" }, - "table" : ["kudu1", "kudu2"], - "filter" : "", - "pavingData" : true, - "username" : "dtstack" - }, - "name" : "binlogreader" - }, - "writer" : { - "parameter" : { - "print" : true + "name": "binlogreader" }, - "name" : "streamwriter" + "writer": { + "parameter": { + "print": true + }, + "name": "streamwriter" + } } - } ], - "setting" : { - "restore" : { - "isRestore" : false, - "isStream" : true - }, - "errorLimit" : { }, - "speed" : { - "bytes" : 0, - "channel" : 1 + ], + "setting": { + "restore": { + "isStream": true }, - "log" : { - "isLogger": false, - "level" : "trace", - "path" : "", - "pattern":"" + "speed": { + "channel": 1 } } } } ``` - -#### 3、正则监听 +### 3、正则监听 ```json { - "job" : { - "content" : [ { - "reader" : { - "parameter" : { - "schema" : "tudou", - "password" : "abc123", - "cat" : "insert,delete,update", - "jdbcUrl" : "jdbc:mysql://kudu3:3306/tudou", - "host" : "kudu3", - "start" : { + "job": { + "content": [ + { + "reader": { + "parameter": { + "filter": "schema\\..*", + "password": "passwd", + "database": "database", + "port": 3306, + "cat": "DELETE,INSERT,UPDATE", + "host": "host", + "jdbcUrl": "jdbc:mysql://host:port/schema", + "pavingData": true, + "username": "name" }, - "filter" : "tudou\\.kudu.*", - "pavingData" : true, - "username" : "dtstack" + "name": "binlogreader" }, - "name" : "binlogreader" - }, - "writer" : { - "parameter" : { - "print" : true - }, - "name" : "streamwriter" + "writer": { + "parameter": { + "print": true + }, + "name": "streamwriter" + } } - } ], - "setting" : { - "restore" : { - "isRestore" : false, - "isStream" : true + ], + "setting": { + "restore": { + "isStream": true }, - "errorLimit" : { }, - "speed" : { - "bytes" : 0, - "channel" : 1 - }, - "log" : { - "isLogger": false, - "level" : "trace", - "path" : 
"", - "pattern":"" + "speed": { + "channel": 1 } } } } ``` - -#### 4、指定起始位置 +### 4、指定起始位置 ```json { - "job" : { - "content" : [ { - "reader" : { - "parameter" : { - "schema" : "tudou", - "password" : "abc123", - "cat" : "insert,delete,update", - "jdbcUrl" : "jdbc:mysql://kudu3:3306/tudou", - "host" : "kudu3", - "start" : { - "journalName": "mysql-bin.000002", - "timestamp" : 1589353414000 + "job": { + "content": [ + { + "reader": { + "parameter": { + "filter": "schema\\..*", + "password": "passwd", + "database": "database", + "port": 3306, + "start" : { + "journalName": "binlog.000031", + "position": 4 + }, + "cat": "DELETE,INSERT,UPDATE", + "host": "host", + "jdbcUrl": "jdbc:mysql://host:port/schema", + "pavingData": true, + "username": "name" }, - "table" : ["kudu"], - "pavingData" : true, - "username" : "dtstack" - }, - "name" : "binlogreader" - }, - "writer" : { - "parameter" : { - "print" : true + "name": "binlogreader" }, - "name" : "streamwriter" + "writer": { + "parameter": { + "print": true + }, + "name": "streamwriter" + } } - } ], - "setting" : { - "restore" : { - "isRestore" : false, - "isStream" : true - }, - "errorLimit" : { }, - "speed" : { - "bytes" : 0, - "channel" : 1 + ], + "setting": { + "restore": { + "isStream": true }, - "log" : { - "isLogger": false, - "level" : "trace", - "path" : "", - "pattern":"" + "speed": { + "channel": 1 } } } @@ -342,6 +444,3 @@ GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'canal'@'%' IDENTI ``` - -## 六、问题排查 -采集mysql binlog 发现采集不到数据
1、查看binlog是否开启
       `show variables like '%log_bin%' ; ` 
2、binlog_format 是否设置为ROW
        注意 binlog_format 必须设置为 ROW, 因为在 STATEMENT 或 MIXED 模式下, Binlog 只会记录和传输 SQL 语句(以减少日志大小),而不包含具体数据,我们也就无法保存了。
3、从节点通过一个专门的账号连接主节点,这个账号需要拥有全局的 REPLICATION 权限。我们可以使用 GRANT 命令创建这样的账号:
     GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT
    ON *.* TO 'canal'@'%' IDENTIFIED BY 'canal';
  参考:[https://blog.csdn.net/zjerryj/article/details/77152226](https://blog.csdn.net/zjerryj/article/details/77152226) diff --git a/docs/realTime/reader/emqxreader.md b/docs/realTime/reader/emqxreader.md index a4b6ae60b2..aec21267ba 100644 --- a/docs/realTime/reader/emqxreader.md +++ b/docs/realTime/reader/emqxreader.md @@ -1,41 +1,61 @@ # Emqx Reader + + +- [Emqx Reader](#emqx-reader) + - [一、插件名称](#一插件名称) + - [二、支持的数据源版本](#二支持的数据源版本) + - [三、参数说明
](#三参数说明br-) + - [四、配置示例](#四配置示例) + + + +
## 一、插件名称 名称:**emqxreader**
+
+ ## 二、支持的数据源版本 **Emqx 4.0及以上**
+ +
+ ## 三、参数说明
- **broker** - 描述:连接URL信息 - 必选:是 + - 字段类型:String - 默认值:无 - +
- **topic** - 描述:订阅主题 - 必选:是 + - 字段类型:String - 默认值:无 - +
- **username** - 描述:认证用户名 - 必选:否 + - 字段类型:String - 默认值:无 - +
- **password** - 描述:认证密码 - 必选:否 + - 字段类型:String - 默认值:无 - +
- **isCleanSession** @@ -43,8 +63,9 @@ - false:MQTT服务器保存于客户端会话的的主题与确认位置; - true:MQTT服务器不保存于客户端会话的的主题与确认位置 - 必选:否 + - 字段类型:boolean - 默认值:true - +
- **qos** @@ -53,8 +74,9 @@ - 1:AT_LEAST_ONCE,至少一次; - 2:EXACTLY_ONCE,精准一次; - 必选:否 + - 字段类型:int - 默认值:2 - +
- **codec** @@ -66,9 +88,10 @@ - 当其中不包含message字段时,增加一个key为message,value为原始消息字符串的键值对,如:`{"key": "key", "value": "value", "message": "{\"key\": \"key\", \"value\": \"value\"}"}` - 若改字符串不为json格式,则按照plain类型进行处理 - 必选:否 + -字段类型:String - 默认值:plain - +
## 四、配置示例 @@ -78,13 +101,17 @@ "content": [{ "reader": { "parameter" : { - "broker" : "tcp://0.0.0.1:1883", + "broker" : "tcp://localhost:1883", "topic" : "test", - "username" : "username", - "password" : "password", + "username" : "admin", + "password" : "public", "isCleanSession": true, "qos": 2, - "codec": "plain" + "codec": "plain", + "column" : [ { + "name": "message", + "type" : "string" + }] }, "name" : "emqxreader" }, diff --git a/docs/realTime/reader/kafkareader.md b/docs/realTime/reader/kafkareader.md index bd94bc72ad..faf28c2a48 100644 --- a/docs/realTime/reader/kafkareader.md +++ b/docs/realTime/reader/kafkareader.md @@ -1,16 +1,31 @@ # Kafka Reader + + +- [Kafka Reader](#kafka-reader) + - [一、插件名称](#一插件名称) + - [二、参数说明](#二参数说明) + - [三、配置示例](#三配置示例) + - [1、kafka10](#1kafka10) + - [2、kafka11](#2kafka11) + - [3、kafka](#3kafka) + - [4、kafka->Hive](#4kafka-hive) + + + +
+ ## 一、插件名称 -kafka插件存在四个版本,根据kafka版本的不同,插件名称也略有不同。具体对应关系如下表所示: +kafka插件存在三个版本,根据kafka版本的不同,插件名称也略有不同。具体对应关系如下表所示: | kafka版本 | 插件名称 | | --- | --- | -| kafka 0.9 | kafka09reader | | kafka 0.10 | kafka10reader | | kafka 0.11 | kafka11reader | | kafka 1.0及以后 | kafkareader | +注:从FlinkX1.11版本开始不再支持kafka 0.9 - +
## 二、参数说明 @@ -78,7 +93,7 @@ kafka插件存在四个版本,根据kafka版本的不同,插件名称也略 ```json [ { - "message":"{\"key\": \"key\", \"value\": \"value\"}" + "message":"{\"key\": \"key\", \"message\": \"value\"}" } ] ``` @@ -125,9 +140,7 @@ kafka插件存在四个版本,根据kafka版本的不同,插件名称也略 - 必选:是 - 字段类型:Map - 默认值:无 - - 注意: - - kafka09 reader插件: consumerSettings必须至少包含`zookeeper.connect`参数 - - kafka09 reader以外的插件:consumerSettings必须至少包含`bootstrap.servers`参数 + - 注意:consumerSettings必须至少包含`bootstrap.servers`参数 - 如: ```json { @@ -137,46 +150,12 @@ kafka插件存在四个版本,根据kafka版本的不同,插件名称也略 } ``` +
+ + ## 三、配置示例 -#### 1、kafka09 -```json -{ - "job" : { - "content" : [ { - "reader" : { - "parameter" : { - "topic" : "kafka09", - "groupId" : "default", - "codec" : "text", - "encoding": "UTF-8", - "blankIgnore": false, - "consumerSettings" : { - "zookeeper.connect" : "localhost:2181/kafka09" - } - }, - "name" : "kafka09reader" - }, - "writer" : { - "parameter" : { - "print" : true - }, - "name" : "streamwriter" - } - } ], - "setting" : { - "restore" : { - "isRestore" : false, - "isStream" : true - }, - "speed" : { - "channel" : 1 - } - } - } -} -``` -#### 2、kafka10 +### 1、kafka10 ```json { "job": { @@ -215,7 +194,7 @@ kafka插件存在四个版本,根据kafka版本的不同,插件名称也略 } } ``` -#### 3、kafka11 +### 2、kafka11 ```json { "job" : { @@ -252,7 +231,7 @@ kafka插件存在四个版本,根据kafka版本的不同,插件名称也略 } } ``` -#### 4、kafka +### 3、kafka ```json { "job" : { @@ -291,7 +270,7 @@ kafka插件存在四个版本,根据kafka版本的不同,插件名称也略 } } ``` -#### 5、kafka->Hive +### 4、kafka->Hive ```json { "job": { diff --git a/docs/realTime/reader/mongodboplogreader.md b/docs/realTime/reader/mongodboplogreader.md index ece557f274..8b7ad0a590 100644 --- a/docs/realTime/reader/mongodboplogreader.md +++ b/docs/realTime/reader/mongodboplogreader.md @@ -1,13 +1,30 @@ # MongoDB Oplog Reader + + +- [MongoDB Oplog Reader](#mongodb-oplog-reader) + - [一、插件名称](#一插件名称) + - [二、数据源版本](#二数据源版本) + - [三、数据源配置](#三数据源配置) + - [四、参数说明](#四参数说明) + - [五、使用示例](#五使用示例) + - [采集test库下的所有集合](#采集test库下的所有集合) + + + +
## 一、插件名称 名称:**mongodboplogreader**
+
+ ## 二、数据源版本 **MongoDB 4.0及以上**
+
+ ## 三、数据源配置 [MongoDB 4.0副本集搭建](https://dtstack.yuque.com/rd-center/udi643/gufhya)
@@ -20,63 +37,63 @@ - 必选:是 - 默认值:无 - +
- **username** - 描述: 用户名 - 必选:是 - 默认值:无 - +
- **password** - 描述: 密码 - 必选:是 - 默认值:无 - +
- **authenticationMechanism** - 描述: 认证机制,可选:GSSAPI、PLAIN、MONGODB-X509、MONGODB-CR、SCRAM-SHA-1、SCRAM-SHA-256 - 必选:否 - 默认值:无 - +
- **clusterMode** - 描述: 集群模式,可选:REPLICA_SET、MASTER_SLAVE - 必选:是 - 默认值:无 - +
- **monitorDatabases** - 描述: 要监听的库 - 必选:否 - 默认值:无 - +
- **monitorCollections** - 描述:要监听的集合 - 必选:否 - 默认值:无 - +
- **operateType** - 描述:要监听的操作类型,可选:insert、update、delete - 必选:否 - 默认值:无 - +
- **excludeDocId** - 描述:是否排除_id字段 - 必选:否 - 默认值:false - +
- **pavingData** - 描述:是否将解析出的json数据拍平 diff --git a/docs/realTime/reader/pgwalreader.md b/docs/realTime/reader/pgwalreader.md index 9390b88ec2..28940e095a 100644 --- a/docs/realTime/reader/pgwalreader.md +++ b/docs/realTime/reader/pgwalreader.md @@ -1,47 +1,73 @@ # PostgreSQL WAL Reader + + + + +- [PostgreSQL WAL Reader](#postgresql-wal-reader) + - [一、插件名称](#一插件名称) + - [二、数据源版本](#二数据源版本) + - [三、数据库原理及配置](#三数据库原理及配置) + - [四、使用说明](#四使用说明) + - [五、参数说明
](#五参数说明br-) + - [五、配置示例](#五配置示例) + + + +
## 一、插件名称 名称:**pgwalreader**
+
+ ## 二、数据源版本 **PostgreSQL数据库版本至少为10.0及以上**
+
+ +## 三、数据库原理及配置 +[FlinkX PostgreSQL WAL实时采集基本原理及配置](../other/PgWal原理及配置.md) + +
+ -## 三、使用说明 +## 四、使用说明 1、预写日志级别(wal_level)必须为logical
2、该插件基于PostgreSQL逻辑复制及逻辑解码功能实现的,因此PostgreSQL账户至少拥有replication权限,若允许创建slot,则至少拥有超级管理员权限
3、详细原理请参见[PostgreSQL官方文档](http://postgres.cn/docs/10/index.html)
+
+ -## 四、参数说明
+## 五、参数说明
- **jdbcUrl** - 描述:PostgreSQL数据库的jdbc连接字符串,参考文档:[PostgreSQL官方文档](https://jdbc.postgresql.org/documentation/head/connect.html) - 必选:是 - 默认值:无 - +
- **username** - 描述:数据源的用户名 - 必选:是 - 默认值:无 - +
- **password** - 描述:数据源指定用户名的密码 - 必选:是 - 默认值:无 - +
- **tableList** - 描述:需要解析的数据表,格式为schema.table - 必选:否 - 默认值:无 - +
- **cat** - 描述:需要解析的数据更新类型,包括insert、update、delete三种 @@ -49,21 +75,21 @@ - 必选:是 - 默认值:无 - +
- **statusInterval** - 描述:复制期间,数据库和使用者定期交换ping消息。如果数据库或客户端在配置的超时时间内未收到ping消息,则复制被视为已停止,并且将引发异常,并且数据库将释放资源。在PostgreSQL中,ping超时由属性wal_sender_timeout配置(默认= 60秒)。可以将pgjdc中的复制流配置为在需要时或按时间间隔发送反馈(ping)。建议比配置的wal_sender_timeout更频繁地向数据库发送反馈(ping)。在生产环境中,我使用等于wal_sender_timeout / 3的值。它避免了网络潜在的问题,并且可以在不因超时而断开连接的情况下传输更改 - 必选:否 - 默认值:2000 - +
- **lsn** - 描述:要读取PostgreSQL WAL日志序列号的开始位置 - 必选:否 - 默认值:0 - +
- **slotName** - 描述:复制槽名称,根据该值去寻找或创建复制槽 @@ -71,21 +97,21 @@ - 必选:否 - 默认值:无 - +
- **allowCreateSlot** - 描述:是否允许创建复制槽 - 必选:否 - 默认值:true - +
- **temporary** - 描述:复制槽是否为临时性的,true:是;false:否 - 必选:否 - 默认值:true - +
- **pavingData** - 描述:是否将解析出的json数据拍平 @@ -124,7 +150,7 @@ pavingData为false时: - 必选:否 - 默认值:false - +
## 五、配置示例 @@ -182,221 +208,4 @@ pavingData为false时: } } } -``` - -## PostgreSQL实时采集原理 -PostgreSQL 实时采集是基于 PostgreSQL的逻辑复制以及逻辑解码功能来完成的。逻辑复制同步数据的原理是,在wal日志产生的数据库上,由逻辑解析模块对wal日志进行初步的解析,它的解析结果为ReorderBufferChange(可以简单理解为HeapTupleData),再由pgoutput plugin对中间结果进行过滤和消息化拼接后,然后将其发送到订阅端,订阅端通过逻辑解码功能进行解析。 - -### 版本限制 -逻辑复制是pgsql10.0版本之后才支持的,因此此方案只支持10.0之后版本 - - -### 主要涉及模块说明 -| Logical Decoding | PostgreSQL 的逻辑日志来源于解析物理 WAL 日志。
解析 WAL 成为逻辑数据的过程叫 Logical Decoding。 | -| :--- | :--- | -| Replication Slots | 保存逻辑或物理流复制的基础信息。类似 Mysql 的位点信息。
一个 逻辑 slot 创建后,它的相关信息可以通过 pg_replication_slots 系统视图获取。
如果它在 active 状态,则可以通过系统视图 pg_stat_replication 看到一些 slot 的实时的状态信息。 | -| Output Plugins | PostgreSQL 的逻辑流复制协议开放一组可编程接口,用于自定义输数据到客户端的逻辑数据的格式。
这部分实现使用插件的方式被内核集成和使用,称作 Output Plugins。 | -| Exported Snapshots | 当一个逻辑流复制 slot 被创建时,系统会产生一个快照。客户端可以通过它订阅到数据库任意时间点的数据变化。 | - - - -对于修改一条数据之后 ,pgsql订阅端decode解析后的数据格式为 -```json -{"id":"schema1.test1", - "schema":"schema1", -"table":"test1", - "columnList":[ - {"name":"id","type":"int4","index":0}, - {"name":"name","type":"varchar","index":1} - ], - "oldData":["2","23"], - "newData":["2","name1"], - "type":"UPDATE", - "currentLsn":23940928, - "ts":1596358573614 -} -``` -主要包含schema table以及类型`INSERT`, `UPDATE`和`DELETE`以及WAL日志id等相关信息
-
- - -### 逻辑复制 -逻辑复制使用_发布_和_订阅_模型, 其中一个或多个_订阅者_订阅_发布者_ 节点上的一个或多个_发布_。 订阅者从他们订阅的发布中提取数据,逻辑复制是根据复制标识(通常是主键)复制数据对象及其更改的一种方法,因此在上面订阅端收到消息数据实例中可以发现 具备数据库以及表信息外 还具备修改前数据,修改后数据信息以及执行的type和对应的WAL日志ID - -发布可以选择将它们所产生的改变限制在`INSERT`, `UPDATE`和`DELETE`的任意组合上, 类似于触发器被特定事件类型触发。默认情况下,复制所有操作类型。
已发布的table必须配置一个“副本标识”以便能够复制 `UPDATE`和`DELETE`操作, 这样可以在订阅者端识别适当的行来更新或删除。默认情况下,这是主键, 如果有的话。另外唯一的索引(有一些额外的要求)也可以被设置为副本标识。 如果表没有任何合适的键,那么它可以设置为复制标识“full”, 这意味着整个行成为键。但是,这是非常低效的, 并且只能在没有其他可能的解决方案时用作后备
- - -### 创建发布 -为哪些表设置创建一个发布 -```sql -CREATE PUBLICATION name - [ FOR TABLE [ ONLY ] table_name [ * ] [, ...] - | FOR ALL TABLES ] - [ WITH ( publication_parameter [= value] [, ... ] ) ] -``` - - - -### WAL日志 -WAL 是 Write Ahead Log的缩写,中文称之为预写式日志。WAL log也被简称为xlog,每一次change操作都是先写日志再写数据,保证了事务持久性和数据完整性同时又尽量地避免了频繁IO对性能的影响。WAL的中心概念是**数据文件(存储着表和索引)的修改必须在这些动作被日志记录之后才被写入**
WAL日志保存在pg_xlog下,每个xlog文件默认是16MB,为了满足恢复需求,在xlog目录下会产生多个WAL日志,不需要的WAL日志将会被覆盖
WAL具备归档功能,通过归档的WAL文件可以恢复数据库到WAL日志覆盖时间内的任意一个时间点的状态并且有了WAL日志之后,逻辑复制就可以在WAL日志生成之后,对其进行一系列操作之后传递给订阅客户端,使得订阅客户端能实时获取到源服务器上的修改数据
- - -#### WAL何时被写入 -WAL也有个内存缓冲区WAL Buffer,WAL都是先写入缓存中,对于事务操作,缓存的WAL日志是在事务提交的时候写入磁盘的,对于非事务型的由一个异步线程追加进日志文件或者在checkPoint(数据脏页缓存写入磁盘需要先刷新WAL缓存)的时候写入。
- - -#### WAL主要配置 -``` -wal_level 可以选择为minimal, replica, or logical 使用逻辑复制需要设置为logical - -fsync boolean类型 表示是否使用fsync()系统调用把WAL文件刷新到物理磁盘,确保数据库在操作系统或硬件奔溃的情况下可恢复到最终状态 默认是on - -synchronous_commit boolean类型 声明提交一个事务是否需要等待其把WAL日志写入磁盘后再返回,默认值是’on’ - -on:默认值,为on且没有开启同步备库的时候,会当wal日志真正刷新到磁盘永久存储后才会返回客户端事务已提交成功, - 当为on且开启了同步备库的时候(设置了synchronous_standby_names),必须要等事务日志刷新到本地磁盘,并且还要等远程备库也提交到磁盘才能返回客户端已经提交. - -remote_apply:提交将等待, 直到来自当前同步备用数据库的回复表明它们已收到事务的提交记录并应用它, 以便它对备用数据库上的查询可见。 - -remote_write:提交将等待,直到来自当前同步的后备服务器的一个回复指示该服务器已经收到了该事务的提交记录并且已经把该记录写出到后备服务器的操作系统。 - -local:当事务提交时,仅写入本地磁盘即可返回客户端事务提交成功,而不管是否有同步备库。 - -off:写到缓存中就会向客户端返回提交成功,但也不是一直不刷到磁盘,延迟写入磁盘,延迟的时间为最大3倍的wal_writer_delay参数的(默认200ms)的时间,所有如果即使关闭synchronous_commit,也只会造成最多600ms的事务丢失 可能会造成一些最近已提交的事务丢失,但数据库状态是一致的,就像这些事务已经被干净地中止。但对高并发的小事务系统来说,性能来说提升较大。 - - -wal_sync_method enum类型 用来指定向磁盘强制更新WAL日志数据的方法open_datasync fdatasync fsync_writethrough fsync open_sync - - - -Wal_writer_delay 指定wal writer process 把WAL日志写入磁盘的周期 在每个周期中会先把缓存中的WAL日志刷到磁盘 - -``` - - - -### 复制槽 -每个订阅都将通过一个复制槽接收更改,记录某个订阅者的WAL接收情况。
在源数据库写入修改频繁导致WAL日志的写入速度很快,导致大量WAL日志生成,或者订阅者接受日志很慢,在消费远远小于生产的时候,会导致源数据库上的WAL日志还没有传递到备库就被回卷覆盖掉了,如果被覆盖掉的WAL日志文件又没有归档备份,那么订阅者就再也无法消费到此数据。
复制槽则保存了此订阅的接收信息,使得未被接收的WAL日日志不会被回收 - -注意
数据库会记录slot的wal复制位点,并在wal文件夹中保留所有未发送的wal文件,如果客户创建了slot但是后期不再使用就有可能导致数据库的wal日志爆仓,需要及时删除不用的slot
-
可通过以下SQL获取相关信息 -```sql -select * from pg_replication_slots; -``` -字段含义 -```text -Name Type References Description -slot_name name 复制槽的唯一的集群范围标识符 -plugin name 正在使用的包含逻辑槽输出插件的共享对象的基本名称,对于物理插槽则为null。 -slot_type text 插槽类型 - 物理或逻辑 -datoid oid 该插槽所关联的数据库的OID,或为空。 只有逻辑插槽才具有关联的数据库。 -database text 该插槽所关联的数据库的名称,或为空。 只有逻辑插槽才具有关联的数据库。 -active boolean 如果此插槽当前正在使用,则为真 -active_pid integer 如果当前正在使用插槽,则使用此插槽的会话的进程ID。 NULL如果不活动。 -xmin xid 此插槽需要数据库保留的最早事务。 VACUUM无法删除任何后来的事务删除的元组。 -catalog_xmin xid 影响该插槽需要数据库保留的系统目录的最早的事务。 VACUUM不能删除任何后来的事务删除的目录元组。 -restart_lsn pg_lsn 最老的WAL的地址(LSN)仍然可能是该插槽的使用者所需要的,因此在检查点期间不会被自动移除 -``` - - - -### 局限性 - -- 不复制数据库模式和DDL命令。初始模式可以使用`pg_dump --schema-only` 手动复制。后续的模式更改需要手动保持同步。(但是请注意, 两端的架构不需要完全相同。)当实时数据库中的模式定义更改时,逻辑复制是健壮的: 当模式在发布者上发生更改并且复制的数据开始到达订阅者但不符合表模式, 复制将错误,直到模式更新。在很多情况下, 间歇性错误可以通过首先将附加模式更改应用于订阅者来避免。
-- 不复制序列数据。序列支持的序列或标识列中的数据当然会作为表的一部分被复制, 但序列本身仍然会显示订阅者的起始值。如果订阅者被用作只读数据库, 那么这通常不成问题。但是,如果打算对订阅者数据库进行某种切换或故障切换, 则需要将序列更新为最新值,方法是从发布者复制当前数据 (可能使用`pg_dump`)或者从表中确定足够高的值。
-- 不复制`TRUNCATE`命令。当然,可以通过使用`DELETE` 来解决。为了避免意外的`TRUNCATE`调用,可以撤销表的 `TRUNCATE`权限。
-- 不复制大对象 没有什么解决办法,除非在普通表中存储数据。 -- 复制只能从基表到基表。也就是说,发布和订阅端的表必须是普通表,而不是视图, 物化视图,分区根表或外部表。对于分区,您可以一对一地复制分区层次结构, 但目前不能复制到不同的分区设置。尝试复制基表以外的表将导致错误 - - - - -### PostgreSQL实时采集配置 - -#### postgresql.conf设置 -``` -wal_level = logical -``` - - -用于复制链接的角色必须具有`REPLICATION`属性(或者是超级用户) 需要在pg_hba.conf做出如下配置 -``` -host replication all 10.0.3.0/24 md5 -``` - - -### 部分核心代码分析 - - - -#### 执行发布SQL -逻辑复制流是发布/订阅模型,因此生成流之前 先进行发布 -```java -public static final String PUBLICATION_NAME = "dtstack_flinkx"; -public static final String CREATE_PUBLICATION = "CREATE PUBLICATION %s FOR ALL TABLES;"; -public static final String QUERY_PUBLICATION = "SELECT COUNT(1) FROM pg_publication WHERE pubname = '%s';"; - -先执行查找sql 判断是否存在 dtstack_flinkx 的 PUBLICATION -如果不存在 执行创建sql语句 -conn.createStatement() - .execute(String.format(CREATE_PUBLICATION, PUBLICATION_NAME)); -``` - - - -#### 创建一个逻辑复制流 -```java - ChainedLogicalStreamBuilder builder = conn.getReplicationAPI() - .replicationStream() //定义一个逻辑复制流 - .logical() //级别是logical - .withSlotName(format.getSlotName())//复制槽名称 - //协议版本。当前仅支持版本1 - .withSlotOption("proto_version", "1")//槽版本号 - //逗号分隔的要订阅的发布名称列表(接收更改)。 单个发布名称被视为标准对象名称,并可根据需要引用 - .withSlotOption("publication_names", PgWalUtil.PUBLICATION_NAME)//关联的发布名称 - .withStatusInterval(format.getStatusInterval(), TimeUnit.MILLISECONDS); - long lsn = format.getStartLsn(); - if(lsn != 0){ - builder.withStartPosition(LogSequenceNumber.valueOf(lsn)); - } - stream = builder.start(); -``` - -#### 业务处理 -逻辑复制流接收到订阅的消息后 进行编码 获取到相应信息处理 -```java - public void run() { - LOG.info("PgWalListener start running....."); - try { - init(); - while (format.isRunning()) { - //接收到流对象 - ByteBuffer buffer = stream.readPending(); - if (buffer == null) { - continue; - } - //解码为table对象 具体信息为库 表 字段信息 WAL id等 - //然后就可以对其进行处理了 - Table table = decoder.decode(buffer); - if(StringUtils.isBlank(table.getId())){ - continue; - } - String type = table.getType().name().toLowerCase(); - if(!cat.contains(type)){ - continue; - } - if(!tableSet.contains(table.getId())){ - continue; - } - LOG.trace("table = {}",gson.toJson(table)); - ............... - } - } - } -``` - -
- - - - +``` \ No newline at end of file diff --git a/docs/realTime/reader/restapireader.md b/docs/realTime/reader/restapireader.md index 7c23263698..a327fd1d95 100644 --- a/docs/realTime/reader/restapireader.md +++ b/docs/realTime/reader/restapireader.md @@ -1,4 +1,12 @@ # Restapi Reader + + +- [Restapi Reader](#restapi-reader) + - [一、插件名称](#一插件名称) + - [二、参数说明](#二参数说明) + - [三、配置示例](#三配置示例) + + ## 一、插件名称 名称:restapireader @@ -6,66 +14,54 @@ ## 二、参数说明 -- protocol - - 描述:http请求协议 - - 必选:否 - - 字段类型:string - - 默认值:https - -
- - url - - 描述:http请求地址 - - 必选:是 - - 字段类型:string + - 描述:http请求地址 + - 必选:是 + - 字段类型:string
- requestMode - - 描述:http请求方式 - - 必选:是 - - 字段类型:string - - 可选值:post get + - 描述:http请求方式 + - 必选:是 + - 字段类型:string + - 可选值:post get
- header - - 描述: 请求的header - - 注意: 当请求方式为post时,Content-Type需为application/json,目前只支持post请求的json格式,不支持表单提交 - - 必选: 否 - - 字段类型:数组 + - 描述: 请求的header + - 注意: 当请求方式为post时,Content-Type需为application/json,目前只支持post请求的json格式,不支持表单提交 + - 必选: 否,如果requestMode配置未post,header会自动添加 'application/json' head头 + - 字段类型:数组 ```json "header": [ { "name": "token", - "value": "${uuid}" - }, - { - "name": "Content-Type", - "value": "application/json" + "value": " ${uuid}" } - ], + ] ``` - 参数解析 - - name 请求的key 必选 - - value key的值 必选 + - name 请求的key 必选 + - value key的值 必选
- body - - 描述:对应post请求的body参数 - - 注意:参数支持动态参数替换,内置变量以及动态变量的加减(只支持动态变量的一次加减运算), - - 内置变量 - - ${currentTime}当前时间,获取当前时间,格式为yyyy-MM-dd HH:mm:ss类型 - - ${intervalTime}间隔时间,代表参数 intervalTime 的值 - - ${uuid} 随机字符串 32位的随机字符串 - - param/body/response变量 - - ${param.key} 对应get请求param参数里key对应的值 - - ${body.key}对应post请求的body参数里key对应的值 - - ${response.key} 对应返回值里的key对应的值 - - 必选:否 - - 字段类型:数组 + - 描述:对应post请求的body参数 + - 注意:参数支持动态参数替换,内置变量以及动态变量的加减(只支持动态变量的一次加减运算), + - 内置变量 + - ${currentTime}当前时间,获取当前时间,格式为yyyy-MM-dd HH:mm:ss类型 + - ${intervalTime}间隔时间,代表参数 intervalTime 的值 + - ${uuid} 随机字符串 32位的随机字符串 + - param/body/response变量 + - ${param.key} 对应get请求param参数里key对应的值 + - ${body.key}对应post请求的body参数里key对应的值 + - ${response.key} 对应返回值里的key对应的值 + - 必选:否 + - 字段类型:数组 ```json "body": [ { @@ -79,30 +75,30 @@ "value": "${body.stime}+${intervalTime}", "format": "yyyy-mm-dd hh:mm:ss" } - ], + ] ``` - 参数解析 - - name 请求的key 必选 - - value key的值 必选 - - nextValue 除第一次请求之外,key对应的值 非必选 - - format 格式化模板 非必选,如果要求请求格式是日期格式,必须填写 + - name 请求的key 必选 + - value key的值 必选 + - nextValue 除第一次请求之外,key对应的值 非必选 + - format 格式化模板 非必选,如果要求请求格式是日期格式,必须填写
- param - - 描述:对应get请求参数 - - 注意:参数支持动态参数替换,内置变量以及动态变量的加减(只支持动态变量的一次加减运算) - - 内置变量 - - ${currentTime}当前时间,获取当前时间,格式为yyyy-MM-dd HH:mm:ss类型 - - ${intervalTime}间隔时间,代表参数 intervalTime 的值 - - ${uuid} 随机字符串 32位的随机字符串 - - param/body/response变量 - - ${param.key} 对应get请求param参数里key对应的值 - - ${body.key}对应post请求的body参数里key对应的值 - - ${response.key} 对应返回值里的key对应的值 - - 必选:否 - - 字段类型:数组 + - 描述:对应get请求参数 + - 注意:参数支持动态参数替换,内置变量以及动态变量的加减(只支持动态变量的一次加减运算) + - 内置变量 + - ${currentTime}当前时间,获取当前时间,格式为yyyy-MM-dd HH:mm:ss类型 + - ${intervalTime}间隔时间,代表参数 intervalTime 的值 + - ${uuid} 随机字符串 32位的随机字符串 + - param/body/response变量 + - ${param.key} 对应get请求param参数里key对应的值 + - ${body.key}对应post请求的body参数里key对应的值 + - ${response.key} 对应返回值里的key对应的值 + - 必选:否 + - 字段类型:数组 ```json "param": [ { @@ -116,29 +112,29 @@ "value": "${body.stime}+${intervalTime}", "format": "yyyy-mm-dd hh:mm:ss" } - ], + ] ``` - 参数解析 - - name 请求的key 必选 - - value key的值 必选 - - nextValue 除第一次请求之外,key对应的值 非必选 - - format 格式化模板 非必选,如果要求请求格式是日期格式,必须填写 + - name 请求的key 必选 + - value key的值 必选 + - nextValue 除第一次请求之外,key对应的值 非必选 + - format 格式化模板 非必选,如果要求请求格式是日期格式,必须填写
- decode - - 描述 解码器 返回数据是作为json格式还是text格式处理 - - 必选:否 - - 字段类型:string + - 描述 解码器 返回数据是作为json格式还是text格式处理 + - 必选:否 + - 字段类型:string ```json "deocode":"json" ``` - 默认值:text - 可选值:text json - - text 不做任何处理,返回值直接丢出去 - - json 可以进行定制化输出,指定输出的key,则对返回值解析,获取对应的key以及值 组装新的json数据丢出去 + - text 不做任何处理,返回值直接丢出去 + - json 可以进行定制化输出,指定输出的key,则对返回值解析,获取对应的key以及值 组装新的json数据丢出去
@@ -149,7 +145,7 @@ - 字段类型:string - 示例 -fileds值为 +fields值为 ``` "fields": "msg.key1,msg.key2.key3", ``` @@ -160,7 +156,7 @@ fileds值为 "key1": "value1", "key2": { "key3": "value2", - "key4": "value3", + "key4": "value3" }, "key5": 2 } @@ -180,10 +176,10 @@ fileds值为
- strategy - - 描述 定义的key的实际值与value指定值相等时进行对应的逻辑处理 - - 必选 否 - - 字段类型:数组 - - 描述:针对返回类型为json的数据,用户会指定key以及对应的value和处理方式。如果返回数据的对应的key的值正好和用户配置的value相等,则执行对应逻辑。同时用户指定的key可以来自返回值也可以来自param参数值 + - 描述 定义的key的实际值与value指定值相等时进行对应的逻辑处理 + - 必选 否 + - 字段类型:数组 + - 描述:针对返回类型为json的数据,用户会指定key以及对应的value和处理方式。如果返回数据的对应的key的值正好和用户配置的value相等,则执行对应逻辑。同时用户指定的key可以来自返回值也可以来自param参数值 ```json "strategy": [ @@ -196,35 +192,35 @@ fileds值为 ``` - 参数解析 - - key 选择对应参数的key,支持的格式为 - - 变量 - - ${param.key} - - ${body.key} - - ${response.key} - - 内置变量 - - ${currentTime}当前时间,获取当前时间,格式为yyyy-MM-dd HH:mm:ss类型 - - ${intervalTime}间隔时间,代表参数 intervalTime 的值 - - ${uuid} 随机字符串 32位的随机字符串 - - value 匹配的值,支持的格式为 - - 常量 - - 变量: - - ${param.key} - - ${body.key} - - ${response.key} - - 内置变量 - - ${currentTime}当前时间,获取当前时间,格式为yyyy-MM-dd HH:mm:ss类型 - - ${intervalTime}间隔时间,代表参数 intervalTime 的值 - - ${uuid} 随机字符串 32位的随机字符串 - - handle 对应处理逻辑 - - stop 停止任务 - - retry 重试,如果重试三次都失败 任务结束 + - key 选择对应参数的key,支持的格式为 + - 变量 + - ${param.key} + - ${body.key} + - ${response.key} + - 内置变量 + - ${currentTime}当前时间,获取当前时间,格式为yyyy-MM-dd HH:mm:ss类型 + - ${intervalTime}间隔时间,代表参数 intervalTime 的值 + - ${uuid} 随机字符串 32位的随机字符串 + - value 匹配的值,支持的格式为 + - 常量 + - 变量: + - ${param.key} + - ${body.key} + - ${response.key} + - 内置变量 + - ${currentTime}当前时间,获取当前时间,格式为yyyy-MM-dd HH:mm:ss类型 + - ${intervalTime}间隔时间,代表参数 intervalTime 的值 + - ${uuid} 随机字符串 32位的随机字符串 + - handle 对应处理逻辑 + - stop 停止任务 + - retry 重试,如果重试三次都失败 任务结束
- intervalTime - - 描述: 每次请求间隔时间,单位毫秒 - - 必选:是 - - 字段类型:long + - 描述: 每次请求间隔时间,单位毫秒 + - 必选:是 + - 字段类型:long @@ -236,7 +232,6 @@ fileds值为 { "reader": { "parameter": { - "protocol": "http", "url": "http://wwww.a.com", "requestMode": "post", "decode": "json", @@ -266,7 +261,7 @@ fileds值为 } ], "strategy": [ - { + { "key": "${response.status}", "value": "3000", "handle": "stop" @@ -282,13 +277,13 @@ fileds值为 }, "writer": { "parameter": { - "print": false + "print": true }, "name": "streamwriter" } } ], - "setting": { + "setting": { "restore": { "isRestore": true, "isStream": true diff --git a/docs/realTime/reader/sqlservercdc.md b/docs/realTime/reader/sqlservercdc.md new file mode 100644 index 0000000000..a74fafa598 --- /dev/null +++ b/docs/realTime/reader/sqlservercdc.md @@ -0,0 +1,184 @@ +# SqlServer CDC Reader + + +- [SqlServer CDC Reader](#sqlserver-cdc-reader) + - [一、插件名称](#一插件名称) + - [二、数据源版本](#二数据源版本) + - [三、数据源配置](#三数据源配置) + - [四、基本原理](#四基本原理) + - [五、参数说明](#五参数说明) + - [六、配置示例](#六配置示例) + + + +## 一、插件名称 +名称:**sqlservercdcreader** + + +## 二、数据源版本 +SqlServer 2012及以上 + +## 三、数据源配置 +[SqlServer配置CDC](../other/SqlserverCDC配置.md) + +## 四、基本原理 +[FlinkX Sqlserver CDC实时采集基本原理](../other/SqlserverCDC原理.md) + +## 五、参数说明 + + +- **url** + - 描述:SqlServer数据库的jdbc连接字符串,参考文档:[SqlServer官方文档](https://docs.microsoft.com/zh-cn/sql/connect/jdbc/overview-of-the-jdbc-driver?view=sql-server-2017) + - 必选:是 + - 字段类型:string + - 默认值:无 + +
+ +- **username** + - 描述:数据源的用户名 + - 必选:是 + - 字段类型:string + - 默认值:无 + +
+ +- **password** + - 描述:数据源指定用户名的密码 + - 必选:是 + - 字段类型:string + - 默认值:无 + +
+ +- **databaseName** + - 描述:监听的数据库 + - 必选:是 + - 字段类型:string + - 默认值:无 + +
+ +- **tableList** + - 描述:需要解析的数据表,表必须已启用CDC,格式为schema.table + - 必选:否 + - 字段类型:list + - 默认值:无 + +
+ +- **cat** + - 描述:需要解析的数据更新类型,包括insert、update、delete三种 + - 注意:以英文逗号分割的格式填写。 + - 必选:是 + - 字段类型:string + - 默认值:无 + +
+ +- **pollInterval** + - 描述:监听拉取SqlServer CDC数据库间隔时间 + - 注意:该值越小,采集延迟时间越小,给数据库的访问压力越大 + - 必选:否 + - 字段类型:long + - 默认值:1000 + +
+ +- **lsn** + - 描述:要读取SqlServer CDC日志序列号的开始位置 + - 必选:否 + - 字段类型:string + - 默认值:无 + +
+ + +- **pavingData** + - 描述:是否将解析出的json数据拍平 + - 示例:假设解析的表为tb1,schema为dbo,对tb1中的id字段做update操作,id原来的值为1,更新后为2,则pavingData为true时数据格式为: +```json +{ + "type":"update", + "schema":"dbo", + "table":"tb1", + "lsn":"00000032:00002038:0005", + "ts": 6760525407742726144, + "before_id":1, + "after_id":2 +} +``` +pavingData为false时: +```json +{ + "type":"update", + "schema":"dbo", + "table":"tb1", + "lsn":"00000032:00004a38:0007", + "ts": 6760525407742726144, + "before":{ + "id":1 + }, + "after":{ + "id":2 + } +} +``` +- type:变更类型,INSERT,UPDATE、DELETE +- lsn:Sqlserver数据库变更记录对应的lsn号 +- ts:自增ID,不重复,可用于排序,解码后为FlinkX的事件时间,解码规则如下: + +```java +long id = Long.parseLong("6760525407742726144"); + long res = id >> 22; + DateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss"); + System.out.println(sdf.format(res)); //2021-01-28 19:54:21 +``` + + +- 必选:否 +- 字段类型:boolean +- 默认值:false + + + + +## 六、配置示例 +```json +{ + "job" : { + "content" : [ { + "reader" : { + "parameter" : { + "username" : "uname", + "password" : "passwd", + "url": "jdbc:sqlserver://host:1433;database=databaseName", + "databaseName":"databaseName", + "tableList": ["dbo.cdc"], + "lsn": "00000025:00000bc0:0003", + "cat": "insert,update,delete" + }, + + "name" : "sqlservercdcreader" + }, + "writer": { + "name": "streamwriter", + "parameter": { + "print": true + } + } + } + ], + "setting" : { + "restore": { + "isStream": true + }, + "speed" : { + "channel" : 1 + } + } + } +} +``` + + diff --git a/docs/realTime/writer/emqxwriter.md b/docs/realTime/writer/emqxwriter.md index b5774574bd..8275c78fa7 100644 --- a/docs/realTime/writer/emqxwriter.md +++ b/docs/realTime/writer/emqxwriter.md @@ -1,4 +1,15 @@ # Emqx Writer + + +- [Emqx Writer](#emqx-writer) + - [一、插件名称](#一插件名称) + - [二、支持的数据源版本](#二支持的数据源版本) + - [三、参数说明
](#三参数说明br-) + - [四、配置示例](#四配置示例) + + + +
## 一、插件名称 @@ -12,29 +23,33 @@ - **broker** - 描述:连接URL信息。 - 必选:是 + - 字段类型:String - 默认值:无 - +
- **topic** - 描述:订阅主题 - 必选:是 + - 字段类型:String - 默认值:无 - +
- **username** - 描述:认证用户名 - 必选:否 + - 字段类型:String - 默认值:无 - +
- **password** - 描述:认证密码 - 必选:否 + - 字段类型:String - 默认值:无 - +
- **isCleanSession** @@ -42,8 +57,9 @@ - false:MQTT服务器保存于客户端会话的的主题与确认位置 - true:MQTT服务器不保存于客户端会话的的主题与确认位置 - 必选:否 + - 字段类型:boolean - 默认值:true - +
- **qos** @@ -52,8 +68,9 @@ - 1:AT_LEAST_ONCE,至少一次; - 2:EXACTLY_ONCE,精准一次; - 必选:否 + - 字段类型:int - 默认值:2 - +
@@ -63,37 +80,35 @@ "job": { "content": [{ "reader": { - "name": "streamreader", - "parameter": { - "column": [ - { - "name": "id", - "type": "id" - }, - { - "name": "user_id", - "type": "int" - }, - { - "name": "name", - "type": "string" - } - ], - "sliceRecordCount" : [ "100"] - } - }, - "writer": { - "writer" : { - "parameter" : { - "broker" : "tcp://0.0.0.1:1883", - "topic" : "test", - "username" : "username", - "password" : "password", - "isCleanSession": true, - "qos": 2 - }, - "name" : "emqxwriter" + "name": "streamreader", + "parameter": { + "column": [ + { + "name": "id", + "type": "id" + }, + { + "name": "user_id", + "type": "int" + }, + { + "name": "name", + "type": "string" + } + ], + "sliceRecordCount" : [ "100"] } + }, + "writer": { + "parameter" : { + "broker" : "tcp://localhost:1883", + "topic" : "test", + "username" : "admin", + "password" : "public", + "isCleanSession": true, + "qos": 2 + }, + "name" : "emqxwriter" } } ], @@ -103,7 +118,7 @@ "bytes": 0 }, "errorLimit": { - "record": 100 + "record": 1 }, "restore": { "maxRowNumForCheckpoint": 0, diff --git a/docs/realTime/writer/hudiwriter.md b/docs/realTime/writer/hudiwriter.md new file mode 100644 index 0000000000..364ce95985 --- /dev/null +++ b/docs/realTime/writer/hudiwriter.md @@ -0,0 +1,181 @@ +# Hudi Writer + + + +- [一、插件名称](#一插件名称) +- [二、参数说明](#二参数说明) +- [三、配置示例](#三配置示例) + - [1、kafka2hudi](#1kafka2hudi) + + + +
+ +## 一、插件名称 + +**名称:hudiwriter**
+ +
+ +## 二、参数说明 + +- **batchInterval** + - 描述:单次批量写入数据条数,建议配置大于1 + - 必选:否 + - 字段类型:int + - 默认值:1 + +
+ +- **tableName** + - 描述:库表名 + - 必选:是 + - 字段类型:String + - 默认值:无 + - 注意:英文点号分隔的 {Database}.{Table} + +
+ +- **path** + - 描述:表所在HDFS路径 + - 必选:是 + - 字段类型:String + - 默认值:无 + +
+ +- **defaultFS** + - 描述:Hadoop hdfs文件系统namenode节点地址。格式:hdfs://ip:端口;例如:hdfs://127.0.0.1:9 + - 必选:是 + - 字段类型:String + - 默认值:无 + +
+ +- **hadoopConfig** + - 描述:集群HA模式时需要填写的namespace配置及其它配置 + - 必选:否 + - 字段类型:Map + - 默认值:无 + +
+ +- **column** + - 描述:目的表需要写入数据的字段,字段之间用英文逗号分隔。例如: "column": ["id","name","age"]。 + - 必选:是 + - 默认值:无 + - 字段类型:List + +
+ +- **hiveJdbcUrl** + - 描述:Hive jdbc链接地址,例如jdbc:hive2://127.0.0.1:9093 + - 必选:是 + - 默认值:无 + - 字段类型:String + +
+ +- **hiveMetastore** + - 描述:Hive Metastore元数据地址,例如thrift://127.0.0.1:9083 + - 必选:是 + - 默认值:无 + - 字段类型:String + +
+ +- **hiveUser** + - 描述:Hive用户名 + - 必选:否 + - 默认值:无 + - 字段类型:String + +- **hivePass** + - 描述:Hive用户对应密码 + - 必选:否 + - 默认值:无 + - 字段类型:String + +## 三、配置示例 + +### 1、kafka2hudi + +```json +{ + "job": { + "content": [ + { + "reader": { + "name": "kafkareader", + "parameter": { + "blankIgnore": true, + "codec": "JSON", + "consumerSettings": { + "bootstrap.servers": "100.100.100.1:6667,100.100.100.2:6667" + }, + "groupId": "flink", + "metaColumns": [ + { + "name": "id", + "type": "int" + }, + { + "name": "name", + "type": "string" + }, + { + "name": "age", + "type": "int" + } + ], + "mode": "latest-offset", + "topic": "flinkx01" + } + }, + "writer": { + "name": "hudiwriter", + "parameter": { + "batchInterval": 10, + "column": [ + { + "name": "id", + "type": "int" + }, + { + "name": "name", + "type": "string" + }, + { + "name": "age", + "type": "int" + } + ], + "defaultFS": "hdfs://ns1", + "hadoopConfig": { + "dfs.nameservices": "ns1", + "dfs.ha.namenodes.ns1": "nn1,nn2", + "dfs.client.failover.proxy.provider.ns1": "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider", + "dfs.namenode.rpc-address.ns1.nn2": "100.100.100.1:8020", + "dfs.namenode.rpc-address.ns1.nn1": "100.100.100.2:8020", + "fs.file.impl": "org.apache.hadoop.fs.LocalFileSystem", + "fs.hdfs.impl": "org.apache.hadoop.hdfs.DistributedFileSystem", + "HADOOP_USER_NAME": "hive" + }, + "hiveJdbcUrl": "jdbc:hive2://localhost:9083", + "hiveMetastore": "thrift://localhost:9083", + "hiveUser": "hive", + "path": "hdfs://ns1/spark_1/lakehouse", + "tableName": "test.flinkx01" + } + } + } + ], + "setting": { + "speed": { + "bytes": 0, + "channel": 1 + } + } + } +} +``` diff --git a/docs/realTime/writer/kafkawriter.md b/docs/realTime/writer/kafkawriter.md index fa85abaf92..597dfc1607 100644 --- a/docs/realTime/writer/kafkawriter.md +++ b/docs/realTime/writer/kafkawriter.md @@ -1,16 +1,30 @@ # Kafka Writer + + +- [一、插件名称](#一插件名称) +- [二、参数说明](#二参数说明) +- [三、配置示例](#三配置示例) + - [1、kafka10](#1kafka10) + - [2、kafka11](#2kafka11) + - [3、kafka](#3kafka) + - [4、MySQL->kafka](#4mysql-kafka) + + + +
+ ## 一、插件名称 -kafka插件存在四个版本,根据kafka版本的不同,插件名称也略有不同。具体对应关系如下表所示: +kafka插件存在三个版本,根据kafka版本的不同,插件名称也略有不同。具体对应关系如下表所示: | kafka版本 | 插件名称 | | --- | --- | -| kafka 0.9 | kafka09writer | | kafka 0.10 | kafka10writer | | kafka 0.11 | kafka11writer | | kafka 1.0及以后 | kafkawriter | +注:从FlinkX1.11版本开始不再支持kafka 0.9 - +
## 二、参数说明 @@ -30,29 +44,12 @@ kafka插件存在四个版本,根据kafka版本的不同,插件名称也略
-- **encoding** - - 描述:编码 - - 注意:该参数只对kafka09reader插件有效 - - 必选:否 - - 字段类型:String - - 默认值:UTF-8 - -
- -- **brokerList** - - 描述:kafka broker地址列表 - - 注意:该参数只对kafka09writer插件有效 - - 必选:kafka09writer必选,其它kafka writer插件不用填 - - 字段类型:String - - 默认值:无 - -
- - **producerSettings** - 描述:kafka连接配置,支持所有`org.apache.kafka.clients.producer.ProducerConfig`中定义的配置 - - 必选:对于非kafka09 writer插件,该参数必填,且producerSettings中至少包含`bootstrap.servers`参数 + - 必选:是 - 字段类型:Map - 默认值:无 + - 注意:producerSettings中至少包含`bootstrap.servers`参数
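  示意片段(地址为占位值):除`bootstrap.servers`外,`acks`、`retries`等`org.apache.kafka.clients.producer.ProducerConfig`中定义的键也可以直接写在producerSettings中:

```json
"producerSettings": {
  "bootstrap.servers": "localhost:9092",
  "acks": "all",
  "retries": "3"
}
```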
@@ -64,75 +61,27 @@ kafka插件存在四个版本,根据kafka版本的不同,插件名称也略 - 必选:否 - 字段类型:String[] - 默认值:无 - -
-- **partitionAssignColumns** - - 描述:根据用户自定义的字段值来将相同key值的数据发向同一个 topic partition(目前只支持kafka 1.0以后的版本) - - 必选:否 - - 字段类型:List - - 默认值:无 -
+
-- **dataCompelOrder** - - 描述:是否强制要求kafka topic接受数据保证顺序一致性(目前只支持kafka 1.0以后的版本) - - 必选:否 - - 字段类型:Boolean - - 默认值:false +- **partitionAssignColumns** + - 描述:根据用户自定义的字段值来将相同key值的数据发向同一个 topic partition(目前只支持kafka 1.0以后的版本) + - 必选:否 + - 字段类型:List + - 默认值:无 +
+- **dataCompelOrder** + - 描述:是否强制要求kafka topic接受数据保证顺序一致性(目前只支持kafka 1.0以后的版本) + - 必选:否 + - 字段类型:Boolean + - 默认值:false ## 三、配置示例 -#### 1、kafka09 -```json -{ - "job": { - "content": [{ - "reader": { - "name": "streamreader", - "parameter": { - "column": [ - { - "name": "id", - "type": "id" - }, - { - "name": "user_id", - "type": "int" - }, - { - "name": "name", - "type": "string" - } - ], - "sliceRecordCount" : ["100"] - } - }, - "writer" : { - "parameter": { - "timezone": "UTC", - "topic": "kafka09", - "encoding": "UTF-8", - "brokerList": "0.0.0.1:9092", - "tableFields": ["id","user_id","name"] - }, - "name": "kafka09writer" - } - } ], - "setting": { - "restore" : { - "isStream" : true - }, - "speed" : { - "channel" : 1 - } - } - } -} -``` -#### 2、kafka10 +### 1、kafka10 ```json { "job": { @@ -180,7 +129,7 @@ kafka插件存在四个版本,根据kafka版本的不同,插件名称也略 } } ``` -#### 3、kafka11 +### 2、kafka11 ```json { "job": { @@ -229,7 +178,7 @@ kafka插件存在四个版本,根据kafka版本的不同,插件名称也略 } } ``` -#### 4、kafka +### 3、kafka ```json { "job": { @@ -261,9 +210,7 @@ kafka插件存在四个版本,根据kafka版本的不同,插件名称也略 "producerSettings": { "bootstrap.servers" : "0.0.0.1:9092" }, - "tableFields": ["id","user_id","name"], - "partitionAssignColumns": ["id"], - "dataCompelOrder": false + "tableFields": ["id","user_id","name"] }, "name": "kafkawriter" } @@ -279,7 +226,7 @@ kafka插件存在四个版本,根据kafka版本的不同,插件名称也略 } } ``` -#### 5、MySQL->kafka +### 4、MySQL->kafka ```json { "job" : { diff --git a/docs/realTime/writer/restapiwriter.md b/docs/realTime/writer/restapiwriter.md index 6a515b9a1e..9064126e98 100644 --- a/docs/realTime/writer/restapiwriter.md +++ b/docs/realTime/writer/restapiwriter.md @@ -10,42 +10,48 @@ - 描述:连接的url - 必选:是 - 默认值:无 - + - 字段类型:String +
- **method** - 描述:request的类型,`post`、`get` - 必选:是 - 默认值:无 - + - 字段类型:String +
- **header** - 描述:需要添加的报头信息 - 必选:否 - 默认值:无 - + - 字段类型:Map +
- **body** - 描述:以请求体(body)形式发送的数据 - 必选:否 - 默认值:无 - + - 字段类型:Map +<br/>
- **params** - 描述:以URL参数(params)形式发送的数据 - 必选:否 - 默认值:无 - + - 字段类型:Map +<br/>
- **column** - 描述:如果column不为空,那么将数据和字段名一一对应。如果column为空,则返回每个数据的第一个字段。 - 必选:否 - 默认值:无 - + - 字段类型:List +
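  结合上述参数,下面给出writer部分的一个配置示意(url、报头与字段名均为占位值):

```json
"writer": {
  "name": "restapiwriter",
  "parameter": {
    "url": "http://localhost:8080/api/data",
    "method": "post",
    "header": {
      "Content-Type": "application/json"
    },
    "column": ["id", "name"]
  }
}
```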
diff --git a/docs/restore.md b/docs/restore.md index fc2c607197..82df956b5f 100644 --- a/docs/restore.md +++ b/docs/restore.md @@ -111,7 +111,7 @@ checkpoint触发后,两个reader先生成Snapshot记录读取状态,通道0 > > Writer_1:id=无法确定 -任务状态会记录到配置的HDFS目录/flinkx/checkpoint/abc123下。因为每个Writer会接收两个Reader的数据,以及各个通道的数据读写速率可能不一样,所以导致writer接收到的数据顺序是不确定的,但是这不影响数据的准确性,因为读取数据时只需要Reader记录的状态就可以构造查询sql,我们只要确保这些数据真的写到HDF就行了。在Writer生成Snapshot之前,会做一系列操作保证接收到的数据全部写入HDFS: +任务状态会记录到配置的HDFS目录/flinkx/checkpoint/abc123下。因为每个Writer会接收两个Reader的数据,以及各个通道的数据读写速率可能不一样,所以导致writer接收到的数据顺序是不确定的,但是这不影响数据的准确性,因为读取数据时只需要Reader记录的状态就可以构造查询sql,我们只要确保这些数据真的写到HDFS就行了。在Writer生成Snapshot之前,会做一系列操作保证接收到的数据全部写入HDFS: - close写入HDFS文件的数据流,这时候会在/data_test/.data目录下生成两个两个文件: diff --git a/flinkx-alluxio/flinkx-alluxio-core/pom.xml b/flinkx-alluxio/flinkx-alluxio-core/pom.xml new file mode 100644 index 0000000000..e1e85b1685 --- /dev/null +++ b/flinkx-alluxio/flinkx-alluxio-core/pom.xml @@ -0,0 +1,112 @@ + + + + flinkx-alluxio + com.dtstack.flinkx + 1.6 + + 4.0.0 + + flinkx-alluxio-core + + + 3.1.2 + + + + + org.alluxio + alluxio-shaded-client + 2.6.2 + + + + org.apache.hive + hive-exec + ${hive.version} + + + calcite-core + org.apache.calcite + + + calcite-avatica + org.apache.calcite + + + derby + org.apache.derby + + + org.xerial.snappy + snappy-java + + + com.fasterxml.jackson.core + jackson-databind + + + com.fasterxml.jackson.core + jackson-annotations + + + com.fasterxml.jackson.core + jackson-core + + + + + + org.apache.hive + hive-serde + ${hive.version} + + + org.apache.hadoop + hadoop-common + + + org.apache.hadoop + hadoop-yarn-api + + + org.xerial.snappy + snappy-java + + + com.fasterxml.jackson.core + jackson-databind + + + com.fasterxml.jackson.core + jackson-annotations + + + com.fasterxml.jackson.core + jackson-core + + + + + + parquet-hadoop + org.apache.parquet + 1.8.3 + + + org.xerial.snappy + snappy-java + + + + + + org.xerial.snappy + snappy-java + 1.1.4 + + + + \ No newline at end of file diff --git a/flinkx-alluxio/flinkx-alluxio-core/src/main/java/com/dtstack/flinkx/alluxio/AlluxioConfigKeys.java b/flinkx-alluxio/flinkx-alluxio-core/src/main/java/com/dtstack/flinkx/alluxio/AlluxioConfigKeys.java new file mode 100644 index 0000000000..88a01ccb8f --- /dev/null +++ b/flinkx-alluxio/flinkx-alluxio-core/src/main/java/com/dtstack/flinkx/alluxio/AlluxioConfigKeys.java @@ -0,0 +1,43 @@ +package com.dtstack.flinkx.alluxio; + +/** + * @author wuzhongjian_yewu@cmss.chinamobile.com + * @date 2021-12-06 + */ +public class AlluxioConfigKeys { + + public static final String KEY_FIELD_DELIMITER = "fieldDelimiter"; + + public static final String KEY_PATH = "path"; + + public static final String KEY_FILTER = "filterRegex"; + + public static final String KEY_FILE_TYPE = "fileType"; + + public static final String KEY_WRITE_MODE = "writeMode"; + + public static final String KEY_FULL_COLUMN_NAME_LIST = "fullColumnName"; + + public static final String KEY_FULL_COLUMN_TYPE_LIST = "fullColumnType"; + + public static final String KEY_COLUMN_NAME = "name"; + + public static final String KEY_COLUMN_TYPE = "type"; + + public static final String KEY_COMPRESS = "compress"; + + public static final String KEY_FILE_NAME = "fileName"; + + public static final String KEY_ENCODING = "encoding"; + + public static final String KEY_ROW_GROUP_SIZE = "rowGroupSize"; + + public static final String KEY_MAX_FILE_SIZE = "maxFileSize"; + + public static final String KEY_FLUSH_INTERVAL = "flushInterval"; + + public static final String KEY_ENABLE_DICTIONARY = "enableDictionary"; + + 
public static final String KEY_WRITE_TYPE = "writeType"; + +} diff --git a/flinkx-alluxio/flinkx-alluxio-core/src/main/java/com/dtstack/flinkx/alluxio/AlluxioUtil.java b/flinkx-alluxio/flinkx-alluxio-core/src/main/java/com/dtstack/flinkx/alluxio/AlluxioUtil.java new file mode 100644 index 0000000000..a4545b33f9 --- /dev/null +++ b/flinkx-alluxio/flinkx-alluxio-core/src/main/java/com/dtstack/flinkx/alluxio/AlluxioUtil.java @@ -0,0 +1,180 @@ +package com.dtstack.flinkx.alluxio; + +import com.dtstack.flinkx.enums.ColumnType; +import org.apache.hadoop.hive.common.type.HiveDecimal; +import org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe; +import org.apache.hadoop.hive.serde2.io.DateWritable; +import org.apache.hadoop.hive.serde2.io.HiveDecimalWritable; +import org.apache.hadoop.hive.serde2.io.TimestampWritable; +import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector; +import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory; +import org.apache.hadoop.io.*; +import org.apache.parquet.io.api.Binary; + +/** + * @author wuzhongjian_yewu@cmss.chinamobile.com + * @date 2021-12-06 + */ +public class AlluxioUtil { + public static final String NULL_VALUE = "\\N"; + + private static final long NANO_SECONDS_PER_DAY = 86400_000_000_000L; + + private static final long JULIAN_EPOCH_OFFSET_DAYS = 2440588; + + private static final double SCALE_TWO = 2.0; + private static final double SCALE_TEN = 10.0; + private static final int BIT_SIZE = 8; + + public static Object getWritableValue(Object writable) { + Class clz = writable.getClass(); + Object ret = null; + + if (clz == IntWritable.class) { + ret = ((IntWritable) writable).get(); + } else if (clz == Text.class) { + ret = ((Text) writable).toString(); + } else if (clz == LongWritable.class) { + ret = ((LongWritable) writable).get(); + } else if (clz == ByteWritable.class) { + ret = ((ByteWritable) writable).get(); + } else if (clz == DateWritable.class) { + ret = ((DateWritable) writable).get(); + } else if (writable instanceof DoubleWritable) { + ret = ((DoubleWritable) writable).get(); + } else if (writable instanceof TimestampWritable) { + ret = ((TimestampWritable) writable).getTimestamp(); + } else if (writable instanceof DateWritable) { + ret = ((DateWritable) writable).get(); + } else if (writable instanceof FloatWritable) { + ret = ((FloatWritable) writable).get(); + } else if (writable instanceof BooleanWritable) { + ret = ((BooleanWritable) writable).get(); + } else { + ret = writable.toString(); + } + return ret; + } + + public static ObjectInspector columnTypeToObjectInspetor(ColumnType columnType) { + ObjectInspector objectInspector = null; + switch (columnType) { + case TINYINT: + objectInspector = ObjectInspectorFactory.getReflectionObjectInspector(Byte.class, ObjectInspectorFactory.ObjectInspectorOptions.JAVA); + break; + case SMALLINT: + objectInspector = ObjectInspectorFactory.getReflectionObjectInspector(Short.class, ObjectInspectorFactory.ObjectInspectorOptions.JAVA); + break; + case INT: + objectInspector = ObjectInspectorFactory.getReflectionObjectInspector(Integer.class, ObjectInspectorFactory.ObjectInspectorOptions.JAVA); + break; + case BIGINT: + objectInspector = ObjectInspectorFactory.getReflectionObjectInspector(Long.class, ObjectInspectorFactory.ObjectInspectorOptions.JAVA); + break; + case FLOAT: + objectInspector = ObjectInspectorFactory.getReflectionObjectInspector(Float.class, ObjectInspectorFactory.ObjectInspectorOptions.JAVA); + break; + case DOUBLE: + objectInspector = 
ObjectInspectorFactory.getReflectionObjectInspector(Double.class, ObjectInspectorFactory.ObjectInspectorOptions.JAVA); + break; + case DECIMAL: + objectInspector = ObjectInspectorFactory.getReflectionObjectInspector(HiveDecimalWritable.class, ObjectInspectorFactory.ObjectInspectorOptions.JAVA); + break; + case TIMESTAMP: + objectInspector = ObjectInspectorFactory.getReflectionObjectInspector(java.sql.Timestamp.class, ObjectInspectorFactory.ObjectInspectorOptions.JAVA); + break; + case DATE: + objectInspector = ObjectInspectorFactory.getReflectionObjectInspector(java.sql.Date.class, ObjectInspectorFactory.ObjectInspectorOptions.JAVA); + break; + case STRING: + case VARCHAR: + case CHAR: + objectInspector = ObjectInspectorFactory.getReflectionObjectInspector(String.class, ObjectInspectorFactory.ObjectInspectorOptions.JAVA); + break; + case BOOLEAN: + objectInspector = ObjectInspectorFactory.getReflectionObjectInspector(Boolean.class, ObjectInspectorFactory.ObjectInspectorOptions.JAVA); + break; + case BINARY: + objectInspector = ObjectInspectorFactory.getReflectionObjectInspector(BytesWritable.class, ObjectInspectorFactory.ObjectInspectorOptions.JAVA); + break; + default: + throw new IllegalArgumentException("You should not be here"); + } + return objectInspector; + } + + + public static Binary decimalToBinary(final HiveDecimal hiveDecimal, int prec, int scale) { + byte[] decimalBytes = hiveDecimal.setScale(scale).unscaledValue().toByteArray(); + + // Estimated number of bytes needed. + int precToBytes = ParquetHiveSerDe.PRECISION_TO_BYTE_COUNT[prec - 1]; + if (precToBytes == decimalBytes.length) { + // No padding needed. + return Binary.fromReusedByteArray(decimalBytes); + } + + byte[] tgt = new byte[precToBytes]; + if (hiveDecimal.signum() == -1) { + // For negative number, initializing bits to 1 + for (int i = 0; i < precToBytes; i++) { + tgt[i] |= 0xFF; + } + } + + // Padding leading zeroes/ones. 
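+        // decimalBytes holds the big-endian two's-complement value; it is copied to the low-order end of tgt,
+        // so the leading pad bytes stay 0x00 (positive) or 0xFF (negative) as sign extension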
+ System.arraycopy(decimalBytes, 0, tgt, precToBytes - decimalBytes.length, decimalBytes.length); + return Binary.fromReusedByteArray(tgt); + } + + public static int computeMinBytesForPrecision(int precision) { + int numBytes = 1; + while (Math.pow(SCALE_TWO, BIT_SIZE * numBytes - 1.0) < Math.pow(SCALE_TEN, precision)) { + numBytes += 1; + } + return numBytes; + } + + public static byte[] longToByteArray(long data) { + long nano = data * 1000_000; + + int julianDays = (int) ((nano / NANO_SECONDS_PER_DAY) + JULIAN_EPOCH_OFFSET_DAYS); + byte[] julianDaysBytes = getBytes(julianDays); + flip(julianDaysBytes); + + long lastDayNanos = nano % NANO_SECONDS_PER_DAY; + byte[] lastDayNanosBytes = getBytes(lastDayNanos); + flip(lastDayNanosBytes); + + byte[] dst = new byte[12]; + + System.arraycopy(lastDayNanosBytes, 0, dst, 0, 8); + System.arraycopy(julianDaysBytes, 0, dst, 8, 4); + + return dst; + } + + private static byte[] getBytes(long i) { + byte[] bytes = new byte[8]; + bytes[0] = (byte) ((i >> 56) & 0xFF); + bytes[1] = (byte) ((i >> 48) & 0xFF); + bytes[2] = (byte) ((i >> 40) & 0xFF); + bytes[3] = (byte) ((i >> 32) & 0xFF); + bytes[4] = (byte) ((i >> 24) & 0xFF); + bytes[5] = (byte) ((i >> 16) & 0xFF); + bytes[6] = (byte) ((i >> 8) & 0xFF); + bytes[7] = (byte) (i & 0xFF); + return bytes; + } + + /** + * @param bytes + */ + private static void flip(byte[] bytes) { + for (int i = 0, j = bytes.length - 1; i < j; i++, j--) { + byte t = bytes[i]; + bytes[i] = bytes[j]; + bytes[j] = t; + } + } +} diff --git a/flinkx-alluxio/flinkx-alluxio-core/src/main/java/com/dtstack/flinkx/alluxio/ECompressType.java b/flinkx-alluxio/flinkx-alluxio-core/src/main/java/com/dtstack/flinkx/alluxio/ECompressType.java new file mode 100644 index 0000000000..d7ff48ad00 --- /dev/null +++ b/flinkx-alluxio/flinkx-alluxio-core/src/main/java/com/dtstack/flinkx/alluxio/ECompressType.java @@ -0,0 +1,80 @@ +package com.dtstack.flinkx.alluxio; + +import org.apache.commons.lang.StringUtils; + +/** + * @author wuzhongjian_yewu@cmss.chinamobile.com + * @date 2021-12-06 + */ +public enum ECompressType { + + /** + * text file + */ + TEXT_GZIP("GZIP", "text", ".gz", 0.331F), + TEXT_BZIP2("BZIP2", "text", ".bz2", 0.259F), + TEXT_LZO("LZO", "text", ".lzo", 1.0F), + TEXT_NONE("NONE", "text", "", 0.637F), + + /** + * orc file + */ + ORC_SNAPPY("SNAPPY", "orc", ".snappy", 0.233F), + ORC_GZIP("GZIP", "orc", ".gz", 1.0F), + ORC_BZIP("BZIP", "orc", ".bz", 1.0F), + ORC_LZ4("LZ4", "orc", ".lz4", 1.0F), + ORC_NONE("NONE", "orc", "", 0.233F), + + /** + * parquet file + */ + PARQUET_SNAPPY("SNAPPY", "parquet", ".snappy", 0.274F), + PARQUET_GZIP("GZIP", "parquet", ".gz", 1.0F), + PARQUET_LZO("LZO", "parquet", ".lzo", 1.0F), + PARQUET_NONE("NONE", "parquet", "", 1.0F); + + private String type; + + private String fileType; + + private String suffix; + + private float deviation; + + ECompressType(String type, String fileType, String suffix, float deviation) { + this.type = type; + this.fileType = fileType; + this.suffix = suffix; + this.deviation = deviation; + } + + public static ECompressType getByTypeAndFileType(String type, String fileType) { + if (StringUtils.isEmpty(type)) { + type = "NONE"; + } + + for (ECompressType value : ECompressType.values()) { + if (value.getType().equalsIgnoreCase(type) && value.getFileType().equalsIgnoreCase(fileType)) { + return value; + } + } + + throw new IllegalArgumentException("No enum constant " + type); + } + + public String getType() { + return type; + } + + public String getFileType() { + return fileType; + } + 
+ public String getSuffix() { + return suffix; + } + + public float getDeviation() { + return deviation; + } +} diff --git a/flinkx-alluxio/flinkx-alluxio-core/src/main/java/com/dtstack/flinkx/alluxio/util/StringUtil.java b/flinkx-alluxio/flinkx-alluxio-core/src/main/java/com/dtstack/flinkx/alluxio/util/StringUtil.java new file mode 100644 index 0000000000..706ac49f6a --- /dev/null +++ b/flinkx-alluxio/flinkx-alluxio-core/src/main/java/com/dtstack/flinkx/alluxio/util/StringUtil.java @@ -0,0 +1,152 @@ +package com.dtstack.flinkx.alluxio.util; + +/** + * @author wuzhongjian_yewu@cmss.chinamobile.com + * @date 2021-12-06 + */ +public class StringUtil { + /** + *

Check if a String starts with a specified prefix.

+ * + *

nulls are handled without exceptions. Two null + * references are considered to be equal. The comparison is case sensitive.

+ * + *
+     * StringUtils.startsWith(null, null)      = true
+     * StringUtils.startsWith(null, "abc")     = false
+     * StringUtils.startsWith("abcdef", null)  = false
+     * StringUtils.startsWith("abcdef", "abc") = true
+     * StringUtils.startsWith("ABCDEF", "abc") = false
+     * 
+ * + * @param str the String to check, may be null + * @param prefix the prefix to find, may be null + * @return true if the String starts with the prefix, case sensitive, or + * both null + * @see String#startsWith(String) + * @since 2.4 + */ + public static boolean startsWith(String str, String prefix) { + return startsWith(str, prefix, false); + } + + /** + *

Case insensitive check if a String starts with a specified prefix.

+ * + *

nulls are handled without exceptions. Two null + * references are considered to be equal. The comparison is case insensitive.

+ * + *
+     * StringUtils.startsWithIgnoreCase(null, null)      = true
+     * StringUtils.startsWithIgnoreCase(null, "abc")     = false
+     * StringUtils.startsWithIgnoreCase("abcdef", null)  = false
+     * StringUtils.startsWithIgnoreCase("abcdef", "abc") = true
+     * StringUtils.startsWithIgnoreCase("ABCDEF", "abc") = true
+     * 
+ * + * @param str the String to check, may be null + * @param prefix the prefix to find, may be null + * @return true if the String starts with the prefix, case insensitive, or + * both null + * @see String#startsWith(String) + * @since 2.4 + */ + public static boolean startsWithIgnoreCase(String str, String prefix) { + return startsWith(str, prefix, true); + } + + /** + *

Check if a String starts with a specified prefix (optionally case insensitive).

+     *
+     * @param str  the String to check, may be null
+     * @param prefix  the prefix to find, may be null
+     * @param ignoreCase  indicates whether the compare should ignore case
+     *  (case insensitive) or not.
+     * @return true if the String starts with the prefix or
+     *  both null
+     * @see String#startsWith(String)
+     */
+    private static boolean startsWith(String str, String prefix, boolean ignoreCase) {
+        if (str == null || prefix == null) {
+            return (str == null && prefix == null);
+        }
+        if (prefix.length() > str.length()) {
+            return false;
+        }
+        return str.regionMatches(ignoreCase, 0, prefix, 0, prefix.length());
+    }
+
+    /**
+     *

Check if a String ends with a specified suffix.

+ * + *

nulls are handled without exceptions. Two null + * references are considered to be equal. The comparison is case sensitive.

+ * + *
+     * StringUtils.endsWith(null, null)      = true
+     * StringUtils.endsWith(null, "def")     = false
+     * StringUtils.endsWith("abcdef", null)  = false
+     * StringUtils.endsWith("abcdef", "def") = true
+     * StringUtils.endsWith("ABCDEF", "def") = false
+     * StringUtils.endsWith("ABCDEF", "cde") = false
+     * 
+ * + * @param str the String to check, may be null + * @param suffix the suffix to find, may be null + * @return true if the String ends with the suffix, case sensitive, or + * both null + * @see String#endsWith(String) + * @since 2.4 + */ + public static boolean endsWith(String str, String suffix) { + return endsWith(str, suffix, false); + } + + /** + *

Case insensitive check if a String ends with a specified suffix.

+ * + *

nulls are handled without exceptions. Two null + * references are considered to be equal. The comparison is case insensitive.

+ * + *
+     * StringUtils.endsWithIgnoreCase(null, null)      = true
+     * StringUtils.endsWithIgnoreCase(null, "def")     = false
+     * StringUtils.endsWithIgnoreCase("abcdef", null)  = false
+     * StringUtils.endsWithIgnoreCase("abcdef", "def") = true
+     * StringUtils.endsWithIgnoreCase("ABCDEF", "def") = true
+     * StringUtils.endsWithIgnoreCase("ABCDEF", "cde") = false
+     * 
+ * + * @param str the String to check, may be null + * @param suffix the suffix to find, may be null + * @return true if the String ends with the suffix, case insensitive, or + * both null + * @see String#endsWith(String) + * @since 2.4 + */ + public static boolean endsWithIgnoreCase(String str, String suffix) { + return endsWith(str, suffix, true); + } + + /** + *

Check if a String ends with a specified suffix (optionally case insensitive).

+ * + * @param str the String to check, may be null + * @param suffix the suffix to find, may be null + * @param ignoreCase inidicates whether the compare should ignore case + * (case insensitive) or not. + * @return true if the String starts with the prefix or + * both null + * @see String#endsWith(String) + */ + private static boolean endsWith(String str, String suffix, boolean ignoreCase) { + if (str == null || suffix == null) { + return (str == null && suffix == null); + } + if (suffix.length() > str.length()) { + return false; + } + int strOffset = str.length() - suffix.length(); + return str.regionMatches(ignoreCase, strOffset, suffix, 0, suffix.length()); + } +} diff --git a/flinkx-metadata-oracle/flinkx-metadata-oracle-reader/pom.xml b/flinkx-alluxio/flinkx-alluxio-writer/pom.xml similarity index 71% rename from flinkx-metadata-oracle/flinkx-metadata-oracle-reader/pom.xml rename to flinkx-alluxio/flinkx-alluxio-writer/pom.xml index 1c3859e36a..8861f27cab 100644 --- a/flinkx-metadata-oracle/flinkx-metadata-oracle-reader/pom.xml +++ b/flinkx-alluxio/flinkx-alluxio-writer/pom.xml @@ -1,30 +1,49 @@ - - flinkx-metadata-oracle + flinkx-alluxio com.dtstack.flinkx 1.6 4.0.0 - flinkx-metadata-oracle-reader + flinkx-alluxio-writer + - com.dtstack.flinkx - flinkx-metadata-core - 1.6 + org.alluxio + alluxio-shaded-client + 2.6.2 + com.dtstack.flinkx - flinkx-metadata-reader + flinkx-alluxio-core 1.6 + + + httpcore + org.apache.httpcomponents + + + httpclient + org.apache.httpcomponents + + + + + + httpcore + org.apache.httpcomponents + 4.4.5 + - com.github.noraui - ojdbc8 - 12.2.0.1 + httpclient + org.apache.httpcomponents + 4.5.2 @@ -45,8 +64,11 @@ org.slf4j:slf4j-api - log4j:log4j ch.qos.logback:* + com.google.code.gson:* + com.data-artisans:* + org.scala-lang:* + io.netty:* @@ -60,10 +82,6 @@ - - io.netty - shade.metadataoraclereader.io.netty - com.google.common shade.core.com.google.common @@ -91,20 +109,20 @@ - + - + + - \ No newline at end of file diff --git a/flinkx-alluxio/flinkx-alluxio-writer/src/main/java/com/dtstack/flinkx/alluxio/writer/AlluxioOrcOutputFormat.java b/flinkx-alluxio/flinkx-alluxio-writer/src/main/java/com/dtstack/flinkx/alluxio/writer/AlluxioOrcOutputFormat.java new file mode 100644 index 0000000000..a18da2f56c --- /dev/null +++ b/flinkx-alluxio/flinkx-alluxio-writer/src/main/java/com/dtstack/flinkx/alluxio/writer/AlluxioOrcOutputFormat.java @@ -0,0 +1,275 @@ +package com.dtstack.flinkx.alluxio.writer; + +import com.dtstack.flinkx.alluxio.AlluxioUtil; +import com.dtstack.flinkx.alluxio.ECompressType; +import com.dtstack.flinkx.enums.ColumnType; +import com.dtstack.flinkx.exception.WriteRecordException; +import com.dtstack.flinkx.util.ColumnTypeUtil; +import com.dtstack.flinkx.util.DateUtil; +import com.dtstack.flinkx.util.StringUtil; +import org.apache.flink.types.Row; +import org.apache.hadoop.hive.common.type.HiveDecimal; +import org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat; +import org.apache.hadoop.hive.ql.io.orc.OrcSerde; +import org.apache.hadoop.hive.serde2.io.HiveDecimalWritable; +import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector; +import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory; +import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector; +import org.apache.hadoop.io.BytesWritable; +import org.apache.hadoop.io.NullWritable; +import org.apache.hadoop.io.compress.*; +import org.apache.hadoop.mapred.FileOutputFormat; +import org.apache.hadoop.mapred.JobConf; +import 
org.apache.hadoop.mapred.RecordWriter; +import org.apache.hadoop.mapred.Reporter; + +import java.io.IOException; +import java.math.BigDecimal; +import java.math.BigInteger; +import java.nio.charset.StandardCharsets; +import java.sql.Timestamp; +import java.text.SimpleDateFormat; +import java.util.ArrayList; +import java.util.HashMap; +import java.util.List; +import java.util.Map; + +/** + * @author wuzhongjian_yewu@cmss.chinamobile.com + * @date 2021-12-06 + */ +public class AlluxioOrcOutputFormat extends BaseAlluxioOutputFormat { + + private RecordWriter recordWriter; + private OrcSerde orcSerde; + private StructObjectInspector inspector; + private FileOutputFormat outputFormat; + private JobConf jobConf; + + private static final ColumnTypeUtil.DecimalInfo ORC_DEFAULT_DECIMAL_INFO = new ColumnTypeUtil.DecimalInfo(HiveDecimal.SYSTEM_DEFAULT_PRECISION, HiveDecimal.SYSTEM_DEFAULT_SCALE); + + @Override + protected void openSource() throws IOException { + super.openSource(); + orcSerde = new OrcSerde(); + outputFormat = new OrcOutputFormat(); + jobConf = new JobConf(conf); + FileOutputFormat.setOutputCompressorClass(jobConf, getCompressType()); + + List fullColTypeList = new ArrayList<>(); + decimalColInfo = new HashMap<>((fullColumnTypes.size() << 2) / 3); + for (int i = 0; i < fullColumnTypes.size(); i++) { + String columnType = fullColumnTypes.get(i); + if (ColumnTypeUtil.isDecimalType(columnType)) { + ColumnTypeUtil.DecimalInfo decimalInfo = ColumnTypeUtil.getDecimalInfo(columnType, ORC_DEFAULT_DECIMAL_INFO); + decimalColInfo.put(fullColumnNames.get(i), decimalInfo); + } + ColumnType type = ColumnType.getType(columnType); + fullColTypeList.add(AlluxioUtil.columnTypeToObjectInspetor(type)); + } + + this.inspector = ObjectInspectorFactory + .getStandardStructObjectInspector(fullColumnNames, fullColTypeList); + } + + private Class getCompressType() { + ECompressType compressType = ECompressType.getByTypeAndFileType(compress, "orc"); + if (ECompressType.ORC_SNAPPY.equals(compressType)) { + return SnappyCodec.class; + } else if (ECompressType.ORC_BZIP.equals(compressType)) { + return BZip2Codec.class; + } else if (ECompressType.ORC_GZIP.equals(compressType)) { + return GzipCodec.class; + } else if (ECompressType.ORC_LZ4.equals(compressType)) { + return Lz4Codec.class; + } else { + return DefaultCodec.class; + } + } + + @Override + protected String getExtension() { + ECompressType compressType = ECompressType.getByTypeAndFileType(compress, "orc"); + return compressType.getSuffix(); + } + + @Override + public float getDeviation() { + ECompressType compressType = ECompressType.getByTypeAndFileType(compress, "orc"); + return compressType.getDeviation(); + } + + @Override + protected void nextBlock() { + super.nextBlock(); + + if (recordWriter != null) { + return; + } + + try { + String currentBlockTmpPath = tmpPath + SP + currentBlockFileName; + recordWriter = outputFormat.getRecordWriter(null, jobConf, currentBlockTmpPath, Reporter.NULL); + blockIndex++; + + LOG.info("nextBlock:Current block writer record:" + rowsOfCurrentBlock); + LOG.info("Current block file name:" + currentBlockTmpPath); + } catch (Exception e) { + throw new RuntimeException(e); + } + } + + @Override + protected void writeSingleRecordToFile(Row row) throws WriteRecordException { + if (recordWriter == null) { + nextBlock(); + } + + List recordList = new ArrayList<>(); + int i = 0; + try { + for (; i < fullColumnNames.size(); ++i) { + getData(recordList, i, row); + } + } catch (Exception e) { + if (e instanceof 
WriteRecordException) { + throw (WriteRecordException) e; + } else { + throw new WriteRecordException(recordConvertDetailErrorMessage(i, row), e, i, row); + } + } + + try { + this.recordWriter.write(NullWritable.get(), this.orcSerde.serialize(recordList, this.inspector)); + rowsOfCurrentBlock++; + + if (restoreConfig.isRestore()) { + lastRow = row; + } + } catch (IOException e) { + throw new WriteRecordException(String.format("数据写入alluxio异常,row:{%s}", row), e); + } + } + + @Override + protected void flushDataInternal() throws IOException { + LOG.info("Close current orc record writer, write data size:[{}]", bytesWriteCounter.getLocalValue()); + + if (recordWriter != null) { + recordWriter.close(Reporter.NULL); + recordWriter = null; + } + } + + private void getData(List recordList, int index, Row row) throws WriteRecordException { + int j = colIndices[index]; + if (j == -1) { + recordList.add(null); + return; + } + + Object column = row.getField(j); + if (column == null) { + recordList.add(null); + return; + } + + ColumnType columnType = ColumnType.fromString(columnTypes.get(j)); + String rowData = column.toString(); + if (rowData == null || (rowData.length() == 0 && !ColumnType.isStringType(columnType))) { + recordList.add(null); + return; + } + + switch (columnType) { + case TINYINT: + recordList.add(Byte.valueOf(rowData)); + break; + case SMALLINT: + recordList.add(Short.valueOf(rowData)); + break; + case INT: + recordList.add(Integer.valueOf(rowData)); + break; + case BIGINT: + recordList.add(getBigint(column, rowData)); + break; + case FLOAT: + recordList.add(Float.valueOf(rowData)); + break; + case DOUBLE: + recordList.add(Double.valueOf(rowData)); + break; + case DECIMAL: + recordList.add(getDecimalWritable(index, rowData)); + break; + case STRING: + case VARCHAR: + case CHAR: + if (column instanceof Timestamp) { + SimpleDateFormat fm = DateUtil.getDateTimeFormatterForMillisencond(); + recordList.add(fm.format(column)); + } else if (column instanceof Map || column instanceof List) { + recordList.add(gson.toJson(column)); + } else { + recordList.add(rowData); + } + break; + case BOOLEAN: + recordList.add(StringUtil.parseBoolean(rowData)); + break; + case DATE: + recordList.add(DateUtil.columnToDate(column, null)); + break; + case TIMESTAMP: + recordList.add(DateUtil.columnToTimestamp(column, null)); + break; + case BINARY: + recordList.add(new BytesWritable(rowData.getBytes(StandardCharsets.UTF_8))); + break; + default: + throw new IllegalArgumentException(); + } + } + + private Object getBigint(Object column, String rowData) { + if (column instanceof Timestamp) { + column = ((Timestamp) column).getTime(); + return column; + } + + BigInteger data = new BigInteger(rowData); + if (data.compareTo(new BigInteger(String.valueOf(Long.MAX_VALUE))) > 0) { + return data; + } else { + return Long.valueOf(rowData); + } + } + + private HiveDecimalWritable getDecimalWritable(int index, String rowData) throws WriteRecordException { + ColumnTypeUtil.DecimalInfo decimalInfo = decimalColInfo.get(fullColumnNames.get(index)); + HiveDecimal hiveDecimal = HiveDecimal.create(new BigDecimal(rowData)); + hiveDecimal = HiveDecimal.enforcePrecisionScale(hiveDecimal, decimalInfo.getPrecision(), decimalInfo.getScale()); + if (hiveDecimal == null) { + String msg = String.format("第[%s]个数据数据[%s]precision和scale和元数据不匹配:decimal(%s, %s)", + index, decimalInfo.getPrecision(), decimalInfo.getScale(), rowData); + throw new WriteRecordException(msg, new IllegalArgumentException()); + } + return new 
HiveDecimalWritable(hiveDecimal); + } + + @Override + protected String recordConvertDetailErrorMessage(int pos, Row row) { + return "\nAlluxioOrcOutputFormat [" + jobName + "] writeRecord error: when converting field[" + fullColumnNames.get(pos) + "] in Row(" + row + ")"; + } + + @Override + protected void closeSource() throws IOException { + RecordWriter rw = this.recordWriter; + if (rw != null) { + LOG.info("close:Current block writer record:" + rowsOfCurrentBlock); + rw.close(Reporter.NULL); + this.recordWriter = null; + } + } +} diff --git a/flinkx-alluxio/flinkx-alluxio-writer/src/main/java/com/dtstack/flinkx/alluxio/writer/AlluxioOutputFormatBuilder.java b/flinkx-alluxio/flinkx-alluxio-writer/src/main/java/com/dtstack/flinkx/alluxio/writer/AlluxioOutputFormatBuilder.java new file mode 100644 index 0000000000..fb7ba478ba --- /dev/null +++ b/flinkx-alluxio/flinkx-alluxio-writer/src/main/java/com/dtstack/flinkx/alluxio/writer/AlluxioOutputFormatBuilder.java @@ -0,0 +1,79 @@ +package com.dtstack.flinkx.alluxio.writer; + +import com.dtstack.flinkx.constants.ConstantValue; +import com.dtstack.flinkx.outputformat.FileOutputFormatBuilder; + +import java.util.List; + +/** + * @author wuzhongjian_yewu@cmss.chinamobile.com + * @date 2021-12-06 + */ +public class AlluxioOutputFormatBuilder extends FileOutputFormatBuilder { + + private final BaseAlluxioOutputFormat format; + + public AlluxioOutputFormatBuilder(String type) { + switch (type.toUpperCase()) { + case "TEXT": + format = new AlluxioTextOutputFormat(); + break; + case "ORC": + format = new AlluxioOrcOutputFormat(); + break; + case "PARQUET": + format = new AlluxioParquetOutputFormat(); + break; + default: + throw new IllegalArgumentException("Unsupported Alluxio file type: " + type); + } + + super.setFormat(format); + } + + public void setColumnNames(List columnNames) { + format.columnNames = columnNames; + } + + public void setColumnTypes(List columnTypes) { + format.columnTypes = columnTypes; + } + + public void setFullColumnNames(List fullColumnNames) { + format.fullColumnNames = fullColumnNames; + } + + public void setDelimiter(String delimiter) { + format.delimiter = delimiter; + } + + public void setRowGroupSize(int rowGroupSize) { + format.rowGroupSize = rowGroupSize; + } + + public void setFullColumnTypes(List fullColumnTypes) { + format.fullColumnTypes = fullColumnTypes; + } + + public void setEnableDictionary(boolean enableDictionary) { + format.enableDictionary = enableDictionary; + } + + public void setWriteType(String writeType) { + format.writeType = writeType; + } + + @Override + protected void checkFormat() { + super.checkFormat(); + + if (super.format.getPath() == null || super.format.getPath().length() == 0) { + throw new IllegalArgumentException("No valid path supplied."); + } + + if (!super.format.getPath().startsWith(ConstantValue.PROTOCOL_ALLUXIO)) { + throw new IllegalArgumentException("Path should start with alluxio://"); + } + } + +} diff --git a/flinkx-alluxio/flinkx-alluxio-writer/src/main/java/com/dtstack/flinkx/alluxio/writer/AlluxioParquetOutputFormat.java b/flinkx-alluxio/flinkx-alluxio-writer/src/main/java/com/dtstack/flinkx/alluxio/writer/AlluxioParquetOutputFormat.java new file mode 100644 index 0000000000..276e16f419 --- /dev/null +++ b/flinkx-alluxio/flinkx-alluxio-writer/src/main/java/com/dtstack/flinkx/alluxio/writer/AlluxioParquetOutputFormat.java @@ -0,0 +1,318 @@ +package com.dtstack.flinkx.alluxio.writer; + +import com.dtstack.flinkx.alluxio.AlluxioUtil; +import 
com.dtstack.flinkx.alluxio.ECompressType; +import com.dtstack.flinkx.enums.ColumnType; +import com.dtstack.flinkx.exception.WriteRecordException; +import com.dtstack.flinkx.util.ColumnTypeUtil; +import com.dtstack.flinkx.util.DateUtil; +import com.dtstack.flinkx.util.StringUtil; +import org.apache.commons.lang.StringUtils; +import org.apache.flink.types.Row; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.hive.common.type.HiveDecimal; +import org.apache.hadoop.hive.serde2.io.DateWritable; +import org.apache.parquet.column.ParquetProperties; +import org.apache.parquet.example.data.Group; +import org.apache.parquet.example.data.simple.SimpleGroupFactory; +import org.apache.parquet.hadoop.ParquetFileWriter; +import org.apache.parquet.hadoop.ParquetWriter; +import org.apache.parquet.hadoop.example.ExampleParquetWriter; +import org.apache.parquet.hadoop.example.GroupWriteSupport; +import org.apache.parquet.hadoop.metadata.CompressionCodecName; +import org.apache.parquet.io.api.Binary; +import org.apache.parquet.schema.MessageType; +import org.apache.parquet.schema.OriginalType; +import org.apache.parquet.schema.PrimitiveType; +import org.apache.parquet.schema.Types; + +import java.io.IOException; +import java.math.BigDecimal; +import java.sql.Timestamp; +import java.util.Date; +import java.util.HashMap; +import java.util.List; +import java.util.Map; + +/** + * @author wuzhongjian_yewu@cmss.chinamobile.com + * @date 2021-12-06 + */ +public class AlluxioParquetOutputFormat extends BaseAlluxioOutputFormat { + + private SimpleGroupFactory groupFactory; + + private ParquetWriter writer; + + private MessageType schema; + + private static final ColumnTypeUtil.DecimalInfo PARQUET_DEFAULT_DECIMAL_INFO = new ColumnTypeUtil.DecimalInfo(10, 0); + + + @Override + protected void openSource() throws IOException { + super.openSource(); + + schema = buildSchema(); + GroupWriteSupport.setSchema(schema, conf); + groupFactory = new SimpleGroupFactory(schema); + } + + @Override + protected void nextBlock() { + super.nextBlock(); + + if (writer != null) { + return; + } + + try { + String currentBlockTmpPath = tmpPath + SP + currentBlockFileName; + Path writePath = new Path(currentBlockTmpPath); + ExampleParquetWriter.Builder builder = ExampleParquetWriter.builder(writePath) + .withWriteMode(ParquetFileWriter.Mode.CREATE) + .withWriterVersion(ParquetProperties.WriterVersion.PARQUET_1_0) + .withCompressionCodec(getCompressType()) + .withConf(conf) + .withType(schema) + .withDictionaryEncoding(enableDictionary) + .withRowGroupSize(rowGroupSize); + + writer = builder.build(); + blockIndex++; + } catch (Exception e) { + LOG.error(e.getMessage(), e); + throw new RuntimeException(e); + } + } + + private CompressionCodecName getCompressType() { + // Compatible with old code + if (StringUtils.isEmpty(compress)) { + compress = ECompressType.PARQUET_SNAPPY.getType(); + } + ECompressType compressType = ECompressType.getByTypeAndFileType(compress, "parquet"); + if (ECompressType.PARQUET_SNAPPY.equals(compressType)) { + return CompressionCodecName.SNAPPY; + } else if (ECompressType.PARQUET_GZIP.equals(compressType)) { + return CompressionCodecName.GZIP; + } else if (ECompressType.PARQUET_LZO.equals(compressType)) { + return CompressionCodecName.LZO; + } else { + return CompressionCodecName.UNCOMPRESSED; + } + } + + @Override + protected String getExtension() { + ECompressType compressType = ECompressType.getByTypeAndFileType(compress, "parquet"); + return compressType.getSuffix(); + } + + @Override + protected void 
flushDataInternal() throws IOException { + LOG.info("Close current parquet record writer, write data size:[{}]", bytesWriteCounter.getLocalValue()); + + if (writer != null) { + writer.close(); + writer = null; + } + } + + @Override + public float getDeviation() { + ECompressType compressType = ECompressType.getByTypeAndFileType(compress, "parquet"); + return compressType.getDeviation(); + } + + @Override + protected void writeSingleRecordToFile(Row row) throws WriteRecordException { + + if (writer == null) { + nextBlock(); + } + + Group group = groupFactory.newGroup(); + int i = 0; + try { + for (; i < fullColumnNames.size(); i++) { + int colIndex = colIndices[i]; + if (colIndex > -1) { + Object valObj = row.getField(colIndex); + if (valObj == null || (valObj.toString().length() == 0 && !ColumnType.isStringType(fullColumnTypes.get(i)))) { + continue; + } + + addDataToGroup(group, valObj, i); + } + } + } catch (Exception e) { + if (e instanceof WriteRecordException) { + throw (WriteRecordException) e; + } else { + throw new WriteRecordException(recordConvertDetailErrorMessage(i, row), e, i, row); + } + } + + try { + writer.write(group); + rowsOfCurrentBlock++; + + if (restoreConfig.isRestore()) { + lastRow = row; + } + } catch (IOException e) { + throw new WriteRecordException(String.format("数据写入alluxio异常,row:{%s}", row), e); + } + } + + private void addDataToGroup(Group group, Object valObj, int i) throws Exception { + String colName = fullColumnNames.get(i); + String colType = fullColumnTypes.get(i); + colType = ColumnType.fromString(colType).name().toLowerCase(); + + String val = valObj.toString(); + + switch (colType) { + case "tinyint": + case "smallint": + case "int": + if (valObj instanceof Timestamp) { + ((Timestamp) valObj).getTime(); + group.add(colName, (int) ((Timestamp) valObj).getTime()); + } else if (valObj instanceof Date) { + group.add(colName, (int) ((Date) valObj).getTime()); + } else { + group.add(colName, Integer.parseInt(val)); + } + break; + case "bigint": + if (valObj instanceof Timestamp) { + group.add(colName, ((Timestamp) valObj).getTime()); + } else if (valObj instanceof Date) { + group.add(colName, ((Date) valObj).getTime()); + } else { + group.add(colName, Long.parseLong(val)); + } + break; + case "float": + group.add(colName, Float.parseFloat(val)); + break; + case "double": + group.add(colName, Double.parseDouble(val)); + break; + case "binary": + group.add(colName, Binary.fromString(val)); + break; + case "char": + case "varchar": + case "string": + if (valObj instanceof Timestamp) { + val = DateUtil.getDateTimeFormatterForMillisencond().format(valObj); + group.add(colName, val); + } else if (valObj instanceof Map || valObj instanceof List) { + group.add(colName, gson.toJson(valObj)); + } else { + group.add(colName, val); + } + break; + case "boolean": + group.add(colName, StringUtil.parseBoolean(val)); + break; + case "timestamp": + Timestamp ts = DateUtil.columnToTimestamp(valObj, null); + byte[] dst = AlluxioUtil.longToByteArray(ts.getTime()); + group.add(colName, Binary.fromConstantByteArray(dst)); + break; + case "decimal": + ColumnTypeUtil.DecimalInfo decimalInfo = decimalColInfo.get(colName); + + HiveDecimal hiveDecimal = HiveDecimal.create(new BigDecimal(val)); + hiveDecimal = HiveDecimal.enforcePrecisionScale(hiveDecimal, decimalInfo.getPrecision(), decimalInfo.getScale()); + if (hiveDecimal == null) { + String msg = String.format("第[%s]个数据数据[%s]precision和scale和元数据不匹配:decimal(%s, %s)", i, decimalInfo.getPrecision(), decimalInfo.getScale(), valObj); 
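+                    // enforcePrecisionScale returns null when the value cannot be represented as decimal(precision, scale)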
+ throw new WriteRecordException(msg, new IllegalArgumentException()); + } + + group.add(colName, AlluxioUtil.decimalToBinary(hiveDecimal, decimalInfo.getPrecision(), decimalInfo.getScale())); + break; + case "date": + Date date = DateUtil.columnToDate(valObj, null); + group.add(colName, DateWritable.dateToDays(new java.sql.Date(date.getTime()))); + break; + default: + group.add(colName, val); + break; + } + } + + @Override + protected String recordConvertDetailErrorMessage(int pos, Row row) { + return "\nAlluxioParquetOutputFormat [" + jobName + "] writeRecord error: when converting field[" + fullColumnNames.get(pos) + "] in Row(" + row + ")"; + } + + @Override + protected void closeSource() throws IOException { + if (writer != null) { + writer.close(); + } + } + + private MessageType buildSchema() { + decimalColInfo = new HashMap<>(16); + Types.MessageTypeBuilder typeBuilder = Types.buildMessage(); + for (int i = 0; i < fullColumnNames.size(); i++) { + String name = fullColumnNames.get(i); + String colType = fullColumnTypes.get(i).toLowerCase(); + switch (colType) { + case "tinyint": + case "smallint": + case "int": + typeBuilder.optional(PrimitiveType.PrimitiveTypeName.INT32).named(name); + break; + case "bigint": + typeBuilder.optional(PrimitiveType.PrimitiveTypeName.INT64).named(name); + break; + case "float": + typeBuilder.optional(PrimitiveType.PrimitiveTypeName.FLOAT).named(name); + break; + case "double": + typeBuilder.optional(PrimitiveType.PrimitiveTypeName.DOUBLE).named(name); + break; + case "binary": + typeBuilder.optional(PrimitiveType.PrimitiveTypeName.BINARY).named(name); + break; + case "char": + case "varchar": + case "string": + typeBuilder.optional(PrimitiveType.PrimitiveTypeName.BINARY).as(OriginalType.UTF8).named(name); + break; + case "boolean": + typeBuilder.optional(PrimitiveType.PrimitiveTypeName.BOOLEAN).named(name); + break; + case "timestamp": + typeBuilder.optional(PrimitiveType.PrimitiveTypeName.INT96).named(name); + break; + case "date": + typeBuilder.optional(PrimitiveType.PrimitiveTypeName.INT32).as(OriginalType.DATE).named(name); + break; + default: + if (ColumnTypeUtil.isDecimalType(colType)) { + ColumnTypeUtil.DecimalInfo decimalInfo = ColumnTypeUtil.getDecimalInfo(colType, PARQUET_DEFAULT_DECIMAL_INFO); + typeBuilder.optional(PrimitiveType.PrimitiveTypeName.FIXED_LEN_BYTE_ARRAY) + .as(OriginalType.DECIMAL) + .precision(decimalInfo.getPrecision()) + .scale(decimalInfo.getScale()) + .length(AlluxioUtil.computeMinBytesForPrecision(decimalInfo.getPrecision())) + .named(name); + + decimalColInfo.put(name, decimalInfo); + } else { + typeBuilder.optional(PrimitiveType.PrimitiveTypeName.BINARY).named(name); + } + break; + } + } + return typeBuilder.named("Pair"); + } +} diff --git a/flinkx-alluxio/flinkx-alluxio-writer/src/main/java/com/dtstack/flinkx/alluxio/writer/AlluxioTextOutputFormat.java b/flinkx-alluxio/flinkx-alluxio-writer/src/main/java/com/dtstack/flinkx/alluxio/writer/AlluxioTextOutputFormat.java new file mode 100644 index 0000000000..2ab5e7d97a --- /dev/null +++ b/flinkx-alluxio/flinkx-alluxio-writer/src/main/java/com/dtstack/flinkx/alluxio/writer/AlluxioTextOutputFormat.java @@ -0,0 +1,231 @@ +package com.dtstack.flinkx.alluxio.writer; + +import com.dtstack.flinkx.alluxio.AlluxioUtil; +import com.dtstack.flinkx.alluxio.ECompressType; +import com.dtstack.flinkx.enums.ColumnType; +import com.dtstack.flinkx.exception.WriteRecordException; +import com.dtstack.flinkx.util.DateUtil; +import com.dtstack.flinkx.util.StringUtil; +import 
org.apache.commons.compress.compressors.bzip2.BZip2CompressorOutputStream; +import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream; +import org.apache.flink.types.Row; +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.hive.common.type.HiveDecimal; +import org.apache.hadoop.io.compress.CompressionCodecFactory; + +import java.io.IOException; +import java.io.OutputStream; +import java.math.BigDecimal; +import java.math.BigInteger; +import java.sql.Timestamp; +import java.text.SimpleDateFormat; +import java.util.Date; +import java.util.List; +import java.util.Map; + +/** + * @author wuzhongjian_yewu@cmss.chinamobile.com + * @date 2021-12-06 + */ +public class AlluxioTextOutputFormat extends BaseAlluxioOutputFormat { + + private static final int NEWLINE = 10; + private transient OutputStream stream; + + private static final int BUFFER_SIZE = 1000; + + @Override + protected void flushDataInternal() throws IOException { + LOG.info("Close current text stream, write data size:[{}]", bytesWriteCounter.getLocalValue()); + + if (stream != null) { + stream.flush(); + stream.close(); + stream = null; + } + } + + @Override + public float getDeviation() { + ECompressType compressType = ECompressType.getByTypeAndFileType(compress, "text"); + return compressType.getDeviation(); + } + + @Override + protected String getExtension() { + ECompressType compressType = ECompressType.getByTypeAndFileType(compress, "text"); + return compressType.getSuffix(); + } + + @Override + protected void nextBlock() { + super.nextBlock(); + + if (stream != null) { + return; + } + + try { + String currentBlockTmpPath = tmpPath + SP + currentBlockFileName; + Path p = new Path(currentBlockTmpPath); + + ECompressType compressType = ECompressType.getByTypeAndFileType(compress, "text"); + if (ECompressType.TEXT_NONE.equals(compressType)) { + stream = fs.create(p); + } else { + p = new Path(currentBlockTmpPath); + if (compressType == ECompressType.TEXT_GZIP) { + stream = new GzipCompressorOutputStream(fs.create(p)); + } else if (compressType == ECompressType.TEXT_BZIP2) { + stream = new BZip2CompressorOutputStream(fs.create(p)); + } else if (compressType == ECompressType.TEXT_LZO) { + CompressionCodecFactory factory = new CompressionCodecFactory(new Configuration()); + stream = factory.getCodecByClassName("com.hadoop.compression.lzo.LzopCodec").createOutputStream(fs.create(p)); + } + } + + LOG.info("subtask:[{}] create block file:{}", taskNumber, currentBlockTmpPath); + + blockIndex++; + } catch (Exception e) { + throw new RuntimeException(e); + } + } + + @Override + protected void writeSingleRecordToFile(Row row) throws WriteRecordException { + if (stream == null) { + nextBlock(); + } + + StringBuilder sb = new StringBuilder(); + int i = 0; + try { + int cnt = fullColumnNames.size(); + for (; i < cnt; ++i) { + int j = colIndices[i]; + if (j == -1) { + continue; + } + + if (i != 0) { + sb.append(delimiter); + } + + appendDataToString(sb, row.getField(j), ColumnType.fromString(columnTypes.get(j))); + } + } catch (Exception e) { + if (i < row.getArity()) { + throw new WriteRecordException(recordConvertDetailErrorMessage(i, row), e, i, row); + } + throw new WriteRecordException(e.getMessage(), e); + } + + try { + byte[] bytes = sb.toString().getBytes(this.charsetName); + this.stream.write(bytes); + this.stream.write(NEWLINE); + rowsOfCurrentBlock++; + + if (restoreConfig.isRestore()) { + lastRow = row; + } + + if (rowsOfCurrentBlock % BUFFER_SIZE == 0) { + 
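+                // flush every BUFFER_SIZE rows so buffered bytes are periodically pushed to the Alluxio stream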
this.stream.flush(); + } + } catch (IOException e) { + LOG.error(e.getMessage(), e); + throw new WriteRecordException(String.format("数据写入Alluxio异常,row:{%s}", row), e); + } + } + + private void appendDataToString(StringBuilder sb, Object column, ColumnType columnType) { + if (column == null) { + sb.append(AlluxioUtil.NULL_VALUE); + return; + } + + String rowData = column.toString(); + if (rowData.length() == 0) { + sb.append(""); + } else { + switch (columnType) { + case TINYINT: + sb.append(Byte.valueOf(rowData)); + break; + case SMALLINT: + sb.append(Short.valueOf(rowData)); + break; + case INT: + sb.append(Integer.valueOf(rowData)); + break; + case BIGINT: + if (column instanceof Timestamp) { + column = ((Timestamp) column).getTime(); + sb.append(column); + break; + } + + BigInteger data = new BigInteger(rowData); + if (data.compareTo(new BigInteger(String.valueOf(Long.MAX_VALUE))) > 0) { + sb.append(data); + } else { + sb.append(Long.valueOf(rowData)); + } + break; + case FLOAT: + sb.append(Float.valueOf(rowData)); + break; + case DOUBLE: + sb.append(Double.valueOf(rowData)); + break; + case DECIMAL: + sb.append(HiveDecimal.create(new BigDecimal(rowData))); + break; + case STRING: + case VARCHAR: + case CHAR: + if (column instanceof Timestamp) { + SimpleDateFormat fm = DateUtil.getDateTimeFormatterForMillisencond(); + sb.append(fm.format(column)); + } else if (column instanceof Map || column instanceof List) { + sb.append(gson.toJson(column)); + } else { + sb.append(rowData); + } + break; + case BOOLEAN: + sb.append(StringUtil.parseBoolean(rowData)); + break; + case DATE: + column = DateUtil.columnToDate(column, null); + sb.append(DateUtil.dateToString((Date) column)); + break; + case TIMESTAMP: + column = DateUtil.columnToTimestamp(column, null); + sb.append(DateUtil.timestampToString((Date) column)); + break; + default: + throw new IllegalArgumentException("Unsupported column type: " + columnType); + } + } + } + + @Override + protected String recordConvertDetailErrorMessage(int pos, Row row) { + return "\nAlluxioTextOutputFormat [" + jobName + "] writeRecord error: when converting field[" + columnNames.get(pos) + "] in Row(" + row + ")"; + } + + @Override + public void closeSource() throws IOException { + OutputStream s = this.stream; + if (s != null) { + s.flush(); + this.stream = null; + s.close(); + } + } + +} diff --git a/flinkx-alluxio/flinkx-alluxio-writer/src/main/java/com/dtstack/flinkx/alluxio/writer/AlluxioWriter.java b/flinkx-alluxio/flinkx-alluxio-writer/src/main/java/com/dtstack/flinkx/alluxio/writer/AlluxioWriter.java new file mode 100644 index 0000000000..1dc04d3ec4 --- /dev/null +++ b/flinkx-alluxio/flinkx-alluxio-writer/src/main/java/com/dtstack/flinkx/alluxio/writer/AlluxioWriter.java @@ -0,0 +1,161 @@ +package com.dtstack.flinkx.alluxio.writer; + +import com.dtstack.flinkx.alluxio.util.StringUtil; +import com.dtstack.flinkx.config.DataTransferConfig; +import com.dtstack.flinkx.config.WriterConfig; +import com.dtstack.flinkx.constants.ConstantValue; +import com.dtstack.flinkx.enums.WriteType; +import com.dtstack.flinkx.writer.BaseDataWriter; +import org.apache.flink.streaming.api.datastream.DataStream; +import org.apache.flink.streaming.api.datastream.DataStreamSink; +import org.apache.flink.types.Row; +import org.apache.parquet.hadoop.ParquetWriter; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.ArrayList; +import java.util.List; +import java.util.Map; + +import static com.dtstack.flinkx.alluxio.AlluxioConfigKeys.*; +import static 
com.dtstack.flinkx.alluxio.AlluxioConfigKeys.KEY_COLUMN_NAME; +import static com.dtstack.flinkx.alluxio.AlluxioConfigKeys.KEY_COLUMN_TYPE; +import static com.dtstack.flinkx.alluxio.AlluxioConfigKeys.KEY_COMPRESS; +import static com.dtstack.flinkx.alluxio.AlluxioConfigKeys.KEY_ENABLE_DICTIONARY; +import static com.dtstack.flinkx.alluxio.AlluxioConfigKeys.KEY_ENCODING; +import static com.dtstack.flinkx.alluxio.AlluxioConfigKeys.KEY_FIELD_DELIMITER; +import static com.dtstack.flinkx.alluxio.AlluxioConfigKeys.KEY_FILE_NAME; +import static com.dtstack.flinkx.alluxio.AlluxioConfigKeys.KEY_FILE_TYPE; +import static com.dtstack.flinkx.alluxio.AlluxioConfigKeys.KEY_FLUSH_INTERVAL; +import static com.dtstack.flinkx.alluxio.AlluxioConfigKeys.KEY_FULL_COLUMN_NAME_LIST; +import static com.dtstack.flinkx.alluxio.AlluxioConfigKeys.KEY_FULL_COLUMN_TYPE_LIST; +import static com.dtstack.flinkx.alluxio.AlluxioConfigKeys.KEY_MAX_FILE_SIZE; +import static com.dtstack.flinkx.alluxio.AlluxioConfigKeys.KEY_PATH; +import static com.dtstack.flinkx.alluxio.AlluxioConfigKeys.KEY_ROW_GROUP_SIZE; +import static com.dtstack.flinkx.alluxio.AlluxioConfigKeys.KEY_WRITE_MODE; +import static com.dtstack.flinkx.alluxio.AlluxioConfigKeys.KEY_WRITE_TYPE; + +/** + * @author wuzhongjian_yewu@cmss.chinamobile.com + * @date 2021-12-06 + */ +public class AlluxioWriter extends BaseDataWriter { + + protected final Logger LOG = LoggerFactory.getLogger(getClass()); + + protected String fileType; + + protected String path; + + protected String fieldDelimiter; + + protected String compress; + + protected String fileName; + + protected List columnName; + + protected List columnType; + + protected String charSet; + + protected List fullColumnName; + + protected List fullColumnType; + + protected int rowGroupSize; + + protected long maxFileSize; + + protected long flushInterval; + + protected boolean enableDictionary; + + protected String writerType; + + public AlluxioWriter(DataTransferConfig config) { + super(config); + WriterConfig writerConfig = config.getJob().getContent().get(0).getWriter(); + List columns = writerConfig.getParameter().getColumn(); + fileType = writerConfig.getParameter().getStringVal(KEY_FILE_TYPE); + path = writerConfig.getParameter().getStringVal(KEY_PATH); + fieldDelimiter = writerConfig.getParameter().getStringVal(KEY_FIELD_DELIMITER); + charSet = writerConfig.getParameter().getStringVal(KEY_ENCODING); + rowGroupSize = writerConfig.getParameter().getIntVal(KEY_ROW_GROUP_SIZE, ParquetWriter.DEFAULT_BLOCK_SIZE); + maxFileSize = writerConfig.getParameter().getLongVal(KEY_MAX_FILE_SIZE, ConstantValue.STORE_SIZE_G); + flushInterval = writerConfig.getParameter().getLongVal(KEY_FLUSH_INTERVAL, 0); + enableDictionary = writerConfig.getParameter().getBooleanVal(KEY_ENABLE_DICTIONARY, true); + writerType = writerConfig.getParameter().getStringVal(KEY_WRITE_TYPE, WriteType.THROUGH.name()); + + if (fieldDelimiter == null || fieldDelimiter.length() == 0) { + fieldDelimiter = "\001"; + } else { + fieldDelimiter = com.dtstack.flinkx.util.StringUtil.convertRegularExpr(fieldDelimiter); + } + + compress = writerConfig.getParameter().getStringVal(KEY_COMPRESS); + fileName = writerConfig.getParameter().getStringVal(KEY_FILE_NAME, ""); + if (columns != null && columns.size() > 0) { + columnName = new ArrayList<>(); + columnType = new ArrayList<>(); + for (Object column : columns) { + Map sm = (Map) column; + columnName.add((String) sm.get(KEY_COLUMN_NAME)); + columnType.add((String) sm.get(KEY_COLUMN_TYPE)); + } + } + + fullColumnName 
= (List) writerConfig.getParameter().getVal(KEY_FULL_COLUMN_NAME_LIST); + fullColumnType = (List) writerConfig.getParameter().getVal(KEY_FULL_COLUMN_TYPE_LIST); + + mode = writerConfig.getParameter().getStringVal(KEY_WRITE_MODE); + } + + @Override + public DataStreamSink writeData(DataStream dataSet) { + AlluxioOutputFormatBuilder builder = new AlluxioOutputFormatBuilder(fileType); + builder.setPath(formatPath(path)); + builder.setFileName(fileName); + builder.setWriteMode(mode); + builder.setColumnNames(columnName); + builder.setColumnTypes(columnType); + builder.setCompress(compress); + builder.setMonitorUrls(monitorUrls); + builder.setErrors(errors); + builder.setErrorRatio(errorRatio); + builder.setFullColumnNames(fullColumnName); + builder.setFullColumnTypes(fullColumnType); + builder.setDirtyPath(dirtyPath); + builder.setDirtyHadoopConfig(dirtyHadoopConfig); + builder.setSrcCols(srcCols); + builder.setCharSetName(charSet); + builder.setDelimiter(fieldDelimiter); + builder.setRowGroupSize(rowGroupSize); + builder.setRestoreConfig(restoreConfig); + builder.setMaxFileSize(maxFileSize); + builder.setFlushBlockInterval(flushInterval); + builder.setEnableDictionary(enableDictionary); + builder.setWriteType(writerType); + return createOutput(dataSet, builder.finish()); + } + + private String formatPath(String path) { + String pathAfterFormat = path; + if (!StringUtil.startsWith(path, "alluxio://")) { + if (StringUtil.startsWith(path, "//")) { + pathAfterFormat = "alluxio:" + path; + } else if (StringUtil.startsWith(path, "/")) { + pathAfterFormat = "alluxio:/" + path; + } else { + pathAfterFormat = "alluxio://" + path; + } + } + + if (!StringUtil.endsWith(pathAfterFormat, "/")) { + pathAfterFormat = pathAfterFormat + "/"; + } + + LOG.debug("Path = " + pathAfterFormat); + return pathAfterFormat; + } +} \ No newline at end of file diff --git a/flinkx-alluxio/flinkx-alluxio-writer/src/main/java/com/dtstack/flinkx/alluxio/writer/BaseAlluxioOutputFormat.java b/flinkx-alluxio/flinkx-alluxio-writer/src/main/java/com/dtstack/flinkx/alluxio/writer/BaseAlluxioOutputFormat.java new file mode 100644 index 0000000000..4b71429bf6 --- /dev/null +++ b/flinkx-alluxio/flinkx-alluxio-writer/src/main/java/com/dtstack/flinkx/alluxio/writer/BaseAlluxioOutputFormat.java @@ -0,0 +1,331 @@ +package com.dtstack.flinkx.alluxio.writer; + +import com.dtstack.flinkx.constants.ConstantValue; +import com.dtstack.flinkx.outputformat.BaseFileOutputFormat; +import com.dtstack.flinkx.util.ColumnTypeUtil; +import com.dtstack.flinkx.util.SysUtil; +import com.google.gson.Gson; +import org.apache.commons.lang3.StringUtils; +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.FileStatus; +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.fs.PathFilter; + +import java.io.IOException; +import java.util.List; +import java.util.Map; + +/** + * @author wuzhongjian_yewu@cmss.chinamobile.com + * @date 2021-12-02 + */ +public abstract class BaseAlluxioOutputFormat extends BaseFileOutputFormat { + + private static final int FILE_NAME_PART_SIZE = 3; + + protected int rowGroupSize; + + protected FileSystem fs; + + protected List columnTypes; + + protected List columnNames; + + protected List fullColumnNames; + + protected List fullColumnTypes; + + protected String delimiter; + + protected int[] colIndices; + + protected Configuration conf; + + protected boolean enableDictionary; + + protected String writeType; + + protected transient Map decimalColInfo; + + /** + * 
If the value of a string-typed key is a Map or a List, it will be converted to a JSON string with Gson before being stored + */ + protected transient Gson gson; + + @Override + protected void openInternal(int taskNumber, int numTasks) throws IOException { + gson = new Gson(); + + initColIndices(); + super.openInternal(taskNumber, numTasks); + } + + @Override + protected void checkOutputDir() { + try { + Path dir = new Path(outputFilePath); + + if (fs.exists(dir)) { + if (fs.getFileStatus(dir).isFile()) { + throw new RuntimeException("Can't write new files under a regular file: " + dir + "\n" + + "One can only write new files under directories"); + } + } else { + if (!makeDir) { + throw new RuntimeException("Output path does not exist:" + outputFilePath); + } + } + } catch (IOException e) { + throw new RuntimeException("Check output path error", e); + } + } + + @Override + protected void createActionFinishedTag() { + try { + if (fs.createNewFile(new Path(actionFinishedTag))) { + LOG.info("Succeed to create action finished tag:{}", actionFinishedTag); + } else { + LOG.warn("Failed to create action finished tag:{}", actionFinishedTag); + } + } catch (Exception e) { + throw new RuntimeException("Create action finished tag error:", e); + } + } + + @Override + protected void waitForActionFinishedBeforeWrite() { + try { + Path path = new Path(actionFinishedTag); + boolean readyWrite = fs.exists(path); + int n = 0; + while (!readyWrite) { + if (n > SECOND_WAIT) { + throw new RuntimeException("Wait action finished before write timeout"); + } + + SysUtil.sleep(1000); + readyWrite = fs.exists(path); + n++; + } + } catch (Exception e) { + LOG.warn("Call method waitForActionFinishedBeforeWrite error", e); + } + } + + @Override + protected void cleanDirtyData() { + int fileIndex = formatState.getFileIndex(); + String lastJobId = formatState.getJobId(); + LOG.info("start to cleanDirtyData, fileIndex = {}, lastJobId = {}", fileIndex, lastJobId); + if (StringUtils.isBlank(lastJobId)) { + return; + } + + PathFilter filter = path -> { + String fileName = path.getName(); + if (!fileName.contains(lastJobId)) { + return false; + } + + String[] splits = fileName.split("\\."); + if (splits.length == FILE_NAME_PART_SIZE) { + return Integer.parseInt(splits[2]) > fileIndex; + } + + return false; + }; + + try { + FileStatus[] dirtyData = fs.listStatus(new Path(outputFilePath), filter); + if (dirtyData != null && dirtyData.length > 0) { + for (FileStatus dirtyDatum : dirtyData) { + fs.delete(dirtyDatum.getPath(), false); + LOG.info("Delete dirty data file:{}", dirtyDatum.getPath()); + } + } + } catch (Exception e) { + LOG.error("Clean dirty data error:", e); + throw new RuntimeException(e); + } + } + + @Override + protected void openSource() throws IOException { + try { + conf = new Configuration(); + conf.set("fs.alluxio.impl", "alluxio.hadoop.FileSystem"); + conf.set("fs.AbstractFileSystem.alluxio.impl", "alluxio.hadoop.AlluxioFileSystem"); + //By default, data is written synchronously to the under storage system (hdfs or eos) and not to the Alluxio workers. + conf.set("alluxio.user.file.writetype.default", writeType); + fs = new Path(path).getFileSystem(conf); + } catch (Exception e) { + LOG.error("Failed to get AlluxioFileSystem with exception : " + e.getMessage()); + throw new RuntimeException("Failed to get AlluxioFileSystem with exception", e); + } + } + + private void initColIndices() { + if (fullColumnNames == null || fullColumnNames.size() == 0) { + fullColumnNames = columnNames; + } + + if (fullColumnTypes == null || fullColumnTypes.size() == 0) { + fullColumnTypes = columnTypes; + } + + colIndices = new int[fullColumnNames.size()]; + for (int i 
= 0; i < fullColumnNames.size(); ++i) { + int j = 0; + for (; j < columnNames.size(); ++j) { + if (fullColumnNames.get(i).equalsIgnoreCase(columnNames.get(j))) { + colIndices[i] = j; + break; + } + } + if (j == columnNames.size()) { + colIndices[i] = -1; + } + } + } + + @Override + protected void moveTemporaryDataBlockFileToDirectory() { + try { + if (currentBlockFileName != null && currentBlockFileName.startsWith(ConstantValue.POINT_SYMBOL)) { + Path src = new Path(tmpPath + SP + currentBlockFileName); + if (!fs.exists(src)) { + LOG.warn("block file {} not exists", currentBlockFileName); + return; + } + + String dataFileName = currentBlockFileName.replaceFirst("\\.", ""); + Path dist = new Path(tmpPath + SP + dataFileName); + + if (fs.rename(src, dist)) { + LOG.info("Rename temporary data block file:{} to:{}", src, dist); + } else { + LOG.info("Failed to rename temporary data block file:{} to:{}", src, dist); + } + } + } catch (Exception e) { + LOG.error("Failed to rename file with exception : " + e.getMessage()); + throw new RuntimeException(e); + } + } + + @Override + protected void clearTemporaryDataFiles() throws IOException { + Path finishedDir, tmpDir; + if (outputFilePath.endsWith(SP)) { + finishedDir = new Path(outputFilePath, FINISHED_SUBDIR); + tmpDir = new Path(outputFilePath, DATA_SUBDIR); + } else { + finishedDir = new Path(outputFilePath + SP + FINISHED_SUBDIR); + tmpDir = new Path(outputFilePath + SP + DATA_SUBDIR); + } + + if (fs.delete(finishedDir, true)) { + LOG.info("Succeed to delete .finished dir:{}", finishedDir); + } else { + LOG.warn("Failed to delete .finished dir:{}", finishedDir); + } + + if (fs.delete(tmpDir, true)) { + LOG.info("Succeed to delete .data dir:{}", tmpDir); + } else { + LOG.warn("Failed to delete .data dir:{}", tmpDir); + } + } + + @Override + protected void closeSource() throws IOException { + if (fs != null) { + fs.close(); + } + } + + @Override + protected void createFinishedTag() throws IOException { + if (fs != null) { + fs.createNewFile(new Path(finishedPath)); + LOG.info("Create finished tag dir:{}", finishedPath); + } + } + + @Override + protected void waitForAllTasksToFinish() throws IOException { + Path finishedDir = new Path(outputFilePath + SP + FINISHED_SUBDIR); + final int maxRetryTime = 100; + int i = 0; + for (; i < maxRetryTime; ++i) { + if (fs.listStatus(finishedDir).length == numTasks) { + break; + } + SysUtil.sleep(3000); + } + + if (i == maxRetryTime) { + String subTaskDataPath = outputFilePath + SP + DATA_SUBDIR; + fs.delete(new Path(subTaskDataPath), true); + LOG.info("waitForAllTasksToFinish: delete path:[{}]", subTaskDataPath); + + fs.delete(finishedDir, true); + LOG.info("waitForAllTasksToFinish: delete finished dir:[{}]", finishedDir); + + throw new RuntimeException("timeout when gathering finish tags for each subtasks"); + } + } + + @Override + protected void coverageData() throws IOException { + LOG.info("Overwrite the original data"); + + Path dir = new Path(outputFilePath); + if (!fs.exists(dir)) { + return; + } + + fs.delete(dir, true); + fs.mkdirs(dir); + } + + @Override + protected void moveTemporaryDataFileToDirectory() throws IOException { + PathFilter pathFilter = path -> path.getName().startsWith(String.valueOf(taskNumber)); + Path dir = new Path(outputFilePath); + Path tmpDir = new Path(tmpPath); + + FileStatus[] dataFiles = fs.listStatus(tmpDir, pathFilter); + for (FileStatus dataFile : dataFiles) { + if (fs.rename(dataFile.getPath(), new Path(dir, dataFile.getPath().getName()))) { + LOG.info("Rename temp 
file:{} to dir:{}", dataFile.getPath(), dir); + } else { + LOG.warn("Failed to rename temp file:{} to dir:{}", dataFile.getPath(), dir); + } + } + } + + @Override + protected void moveAllTemporaryDataFileToDirectory() throws IOException { + PathFilter pathFilter = path -> !path.getName().startsWith("."); + Path dir = new Path(outputFilePath); + Path tmpDir = new Path(tmpPath); + + FileStatus[] dataFiles = fs.listStatus(tmpDir, pathFilter); + for (FileStatus dataFile : dataFiles) { + if (fs.rename(dataFile.getPath(), new Path(dir, dataFile.getPath().getName()))) { + LOG.info("Rename temp file:{} to dir:{}", dataFile.getPath(), dir); + } else { + LOG.warn("Failed to rename temp file:{} to dir:{}", dataFile.getPath(), dir); + } + } + } + + @Override + protected void writeMultipleRecordsInternal() throws Exception { + notSupportBatchWrite("AlluxioWriter"); + } + +} \ No newline at end of file diff --git a/flinkx-metadata-hbase/pom.xml b/flinkx-alluxio/pom.xml similarity index 85% rename from flinkx-metadata-hbase/pom.xml rename to flinkx-alluxio/pom.xml index faa5c8dadb..21099d1356 100644 --- a/flinkx-metadata-hbase/pom.xml +++ b/flinkx-alluxio/pom.xml @@ -9,10 +9,11 @@ 4.0.0 - flinkx-metadata-hbase + flinkx-alluxio pom - flinkx-metadata-hbase-reader + flinkx-alluxio-core + flinkx-alluxio-writer diff --git a/flinkx-binlog/flinkx-binlog-reader/pom.xml b/flinkx-binlog/flinkx-binlog-reader/pom.xml index c45520a141..ac71d5c557 100644 --- a/flinkx-binlog/flinkx-binlog-reader/pom.xml +++ b/flinkx-binlog/flinkx-binlog-reader/pom.xml @@ -130,9 +130,8 @@ - + tofile="${basedir}/../../syncplugins/binlogreader/${project.name}-${package.name}.jar" /> diff --git a/flinkx-carbondata/flinkx-carbondata-reader/pom.xml b/flinkx-carbondata/flinkx-carbondata-reader/pom.xml index b40fb3e0a8..55c16b2f6f 100644 --- a/flinkx-carbondata/flinkx-carbondata-reader/pom.xml +++ b/flinkx-carbondata/flinkx-carbondata-reader/pom.xml @@ -93,7 +93,7 @@ + tofile="${basedir}/../../syncplugins/carbondatareader/${project.name}-${package.name}.jar" /> diff --git a/flinkx-carbondata/flinkx-carbondata-writer/pom.xml b/flinkx-carbondata/flinkx-carbondata-writer/pom.xml index 40229f2a25..374673eaeb 100644 --- a/flinkx-carbondata/flinkx-carbondata-writer/pom.xml +++ b/flinkx-carbondata/flinkx-carbondata-writer/pom.xml @@ -95,7 +95,7 @@ + tofile="${basedir}/../../syncplugins/carbondatawriter/${project.name}-${package.name}.jar" /> diff --git a/flinkx-cassandra/flinkx-cassandra-reader/pom.xml b/flinkx-cassandra/flinkx-cassandra-reader/pom.xml index f7f601084c..b9da54be21 100644 --- a/flinkx-cassandra/flinkx-cassandra-reader/pom.xml +++ b/flinkx-cassandra/flinkx-cassandra-reader/pom.xml @@ -83,9 +83,8 @@ - + tofile="${basedir}/../../syncplugins/cassandrareader/${project.name}-${package.name}.jar" /> diff --git a/flinkx-cassandra/flinkx-cassandra-writer/pom.xml b/flinkx-cassandra/flinkx-cassandra-writer/pom.xml index 0450175f24..88a44bd22f 100644 --- a/flinkx-cassandra/flinkx-cassandra-writer/pom.xml +++ b/flinkx-cassandra/flinkx-cassandra-writer/pom.xml @@ -83,9 +83,8 @@ - + tofile="${basedir}/../../syncplugins/cassandrawriter/${project.name}-${package.name}.jar" /> diff --git a/flinkx-clickhouse/flinkx-clickhouse-reader/pom.xml b/flinkx-clickhouse/flinkx-clickhouse-reader/pom.xml index 28685f3ab9..8f1e494eb8 100644 --- a/flinkx-clickhouse/flinkx-clickhouse-reader/pom.xml +++ b/flinkx-clickhouse/flinkx-clickhouse-reader/pom.xml @@ -90,7 +90,7 @@ + tofile="${basedir}/../../syncplugins/clickhousereader/${project.name}-${package.name}.jar" /> 
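For reference, a minimal `writer` block for a FlinkX job file using the new Alluxio plugin might look like the sketch below. It is an illustration only, not part of this patch: the plugin name `alluxiowriter` follows the naming convention of the other writer plugins in this repo, the parameter names assume each `AlluxioConfigKeys` constant resolves to the camelCase form of its identifier (e.g. `KEY_FILE_TYPE` → `"fileType"`, `KEY_WRITE_TYPE` → `"writeType"`), and the values are placeholders except `writeType`, whose default in `AlluxioWriter` is `THROUGH`.

```json
{
  "writer": {
    "name": "alluxiowriter",
    "parameter": {
      "fileType": "text",
      "path": "alluxio://host:19998/user/flinkx/output",
      "fileName": "result",
      "writeMode": "append",
      "fieldDelimiter": ",",
      "encoding": "UTF-8",
      "writeType": "THROUGH",
      "maxFileSize": 1073741824,
      "column": [
        { "name": "id", "type": "BIGINT" },
        { "name": "name", "type": "VARCHAR" }
      ]
    }
  }
}
```

Note that `AlluxioWriter.formatPath` tolerates a missing scheme: per the code above, `//host:19998/out` and `host:19998/out` are both normalized to `alluxio://host:19998/out/`, and a trailing slash is always appended before use.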
diff --git a/flinkx-clickhouse/flinkx-clickhouse-writer/pom.xml b/flinkx-clickhouse/flinkx-clickhouse-writer/pom.xml index 31c7a81506..8427d72b1b 100644 --- a/flinkx-clickhouse/flinkx-clickhouse-writer/pom.xml +++ b/flinkx-clickhouse/flinkx-clickhouse-writer/pom.xml @@ -10,6 +10,7 @@ 4.0.0 flinkx-clickhouse-writer + com.dtstack.flinkx @@ -89,7 +90,7 @@ + tofile="${basedir}/../../syncplugins/clickhousewriter/${project.name}-${package.name}.jar" /> diff --git a/flinkx-core/pom.xml b/flinkx-core/pom.xml index 05e885bd61..446cc59e25 100644 --- a/flinkx-core/pom.xml +++ b/flinkx-core/pom.xml @@ -24,12 +24,31 @@ 2.7 + + + org.apache.logging.log4j + log4j-core + 2.12.1 + + + + org.apache.logging.log4j + log4j-slf4j-impl + 2.12.1 + + org.slf4j - slf4j-log4j12 - 1.7.10 + slf4j-api + 1.7.30 + + + + + + ch.qos.logback logback-classic @@ -49,28 +68,23 @@ ${flink.version} - org.apache.flink - flink-streaming-java_2.11 + flink-streaming-java_${scala.binary.version} ${flink.version} - - - org.xerial.snappy - snappy-java - - org.apache.flink - flink-clients_2.11 + flink-clients_${scala.binary.version} ${flink.version} + + org.apache.flink - flink-hadoop-compatibility_2.11 + flink-hadoop-compatibility_${scala.binary.version} ${flink.version} @@ -80,27 +94,25 @@ - - commons-cli - commons-cli - 1.2 - - org.apache.flink - flink-yarn_2.11 + flink-yarn_${scala.binary.version} ${flink.version} flink-shaded-hadoop2 org.apache.flink + + org.apache.hadoop + hadoop-common + org.apache.flink - flink-queryable-state-runtime_2.11 + flink-queryable-state-runtime_${scala.binary.version} ${flink.version} @@ -117,7 +129,7 @@ com.fasterxml.jackson.core jackson-databind - 2.9.10.1 + 2.9.10.7 io.prometheus @@ -221,9 +233,8 @@ - + tofile="${basedir}/../syncplugins/flinkx-${package.name}.jar" /> diff --git a/flinkx-core/src/main/java/com/dtstack/flinkx/Main.java b/flinkx-core/src/main/java/com/dtstack/flinkx/Main.java index 1c866ff281..d629c94ae9 100644 --- a/flinkx-core/src/main/java/com/dtstack/flinkx/Main.java +++ b/flinkx-core/src/main/java/com/dtstack/flinkx/Main.java @@ -33,7 +33,7 @@ import com.dtstack.flinkx.writer.DataWriterFactory; import com.fasterxml.jackson.databind.ObjectMapper; import org.apache.commons.io.Charsets; -import org.apache.commons.lang.StringUtils; +import org.apache.commons.lang3.StringUtils; import org.apache.flink.api.common.JobExecutionResult; import org.apache.flink.api.common.restartstrategy.RestartStrategies; import org.apache.flink.api.common.time.Time; @@ -115,7 +115,6 @@ public static void main(String[] args) throws Exception { PluginUtil.registerPluginUrlToCachedFile(config, env); env.setParallelism(speedConfig.getChannel()); - env.setRestartStrategy(RestartStrategies.noRestart()); BaseDataReader dataReader = DataReaderFactory.getDataReader(config, env); DataStream dataStream = dataReader.readData(); if(speedConfig.getReaderChannel() > 0){ diff --git a/flinkx-core/src/main/java/com/dtstack/flinkx/constants/ConstantValue.java b/flinkx-core/src/main/java/com/dtstack/flinkx/constants/ConstantValue.java index b996539e7f..1ba9aa661d 100644 --- a/flinkx-core/src/main/java/com/dtstack/flinkx/constants/ConstantValue.java +++ b/flinkx-core/src/main/java/com/dtstack/flinkx/constants/ConstantValue.java @@ -47,10 +47,12 @@ public class ConstantValue { public static final String KEY_HTTP = "http"; + public static final String PROTOCOL_S3A = "s3a://"; public static final String PROTOCOL_HTTP = "http://"; public static final String PROTOCOL_HTTPS = "https://"; public static final String PROTOCOL_HDFS = 
"hdfs://"; public static final String PROTOCOL_JDBC_MYSQL = "jdbc:mysql://"; + public static final String PROTOCOL_ALLUXIO = "alluxio://"; public static final String SYSTEM_PROPERTIES_KEY_OS = "os.name"; public static final String SYSTEM_PROPERTIES_KEY_USER_DIR = "user.dir"; diff --git a/flinkx-core/src/main/java/com/dtstack/flinkx/enums/EDatabaseType.java b/flinkx-core/src/main/java/com/dtstack/flinkx/enums/EDatabaseType.java index 7bc687fc92..a7d40a7ad6 100644 --- a/flinkx-core/src/main/java/com/dtstack/flinkx/enums/EDatabaseType.java +++ b/flinkx-core/src/main/java/com/dtstack/flinkx/enums/EDatabaseType.java @@ -18,8 +18,6 @@ package com.dtstack.flinkx.enums; -import org.apache.commons.net.ftp.FTP; - /** * Database type * @@ -40,6 +38,7 @@ public enum EDatabaseType { MongoDB, Redis, ES, + TeraData, /** * contains ftp and sftp diff --git a/flinkx-core/src/main/java/com/dtstack/flinkx/enums/WriteType.java b/flinkx-core/src/main/java/com/dtstack/flinkx/enums/WriteType.java new file mode 100644 index 0000000000..9ed388a46d --- /dev/null +++ b/flinkx-core/src/main/java/com/dtstack/flinkx/enums/WriteType.java @@ -0,0 +1,30 @@ +package com.dtstack.flinkx.enums; + +/** + * @author wuzhongjian_yewu@cmss.chinamobile.com + * @date 2021-12-16 + */ +public enum WriteType { + + /** + * Data is written synchronously to a Alluxio worker and the under storage system. + */ + CACHE_THROUGH, + + /** + * Data is written synchronously to a Alluxio worker. + * No data will be written to the under storage. This is the default write type. + */ + MUST_CACHE, + + /** + * Default,Data is written synchronously to the under storage. No data will be written to Alluxio. + */ + THROUGH, + + /** + * Data is written synchronously to a Alluxio worker and asynchronously to the under storage system. Experimental. 
+ */ + ASYNC_THROUGH + +} diff --git a/flinkx-core/src/main/java/com/dtstack/flinkx/inputformat/BaseRichInputFormat.java b/flinkx-core/src/main/java/com/dtstack/flinkx/inputformat/BaseRichInputFormat.java index 885dbbac4b..5c707926ed 100644 --- a/flinkx-core/src/main/java/com/dtstack/flinkx/inputformat/BaseRichInputFormat.java +++ b/flinkx-core/src/main/java/com/dtstack/flinkx/inputformat/BaseRichInputFormat.java @@ -114,12 +114,12 @@ public final void configure(Configuration parameters) { @Override public void openInputFormat() throws IOException { - showConfig(); initJobInfo(); initPrometheusReporter(); startTime = System.currentTimeMillis(); DtLogger.config(logConfig, jobId); + showConfig(); } @Override diff --git a/flinkx-core/src/main/java/com/dtstack/flinkx/log/DtLogger.java b/flinkx-core/src/main/java/com/dtstack/flinkx/log/DtLogger.java index bc4f34b898..8bd3363018 100644 --- a/flinkx-core/src/main/java/com/dtstack/flinkx/log/DtLogger.java +++ b/flinkx-core/src/main/java/com/dtstack/flinkx/log/DtLogger.java @@ -18,7 +18,6 @@ package com.dtstack.flinkx.log; import ch.qos.logback.classic.Level; -import ch.qos.logback.classic.LoggerContext; import ch.qos.logback.classic.encoder.PatternLayoutEncoder; import ch.qos.logback.classic.filter.ThresholdFilter; import ch.qos.logback.core.rolling.RollingFileAppender; @@ -26,9 +25,17 @@ import ch.qos.logback.core.util.FileSize; import ch.qos.logback.core.util.OptionHelper; import com.dtstack.flinkx.config.LogConfig; -import org.apache.commons.lang.StringUtils; -import org.apache.log4j.PatternLayout; -import org.apache.log4j.varia.LevelRangeFilter; +import org.apache.commons.lang3.StringUtils; +import org.apache.logging.log4j.LogManager; +import org.apache.logging.log4j.core.Appender; +import org.apache.logging.log4j.core.Filter; +import org.apache.logging.log4j.core.LoggerContext; +import org.apache.logging.log4j.core.appender.rolling.SizeBasedTriggeringPolicy; +import org.apache.logging.log4j.core.config.Configuration; +import org.apache.logging.log4j.core.config.LoggerConfig; +import org.apache.logging.log4j.core.filter.LevelRangeFilter; +import org.apache.logging.log4j.core.layout.PatternLayout; +import org.apache.logging.log4j.spi.StandardLevel; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import org.slf4j.impl.StaticLoggerBinder; @@ -45,14 +52,14 @@ public class DtLogger { private static Logger LOG = LoggerFactory.getLogger(DtLogger.class); private static boolean init = false; + public static final String LOG4J = "org.apache.logging.slf4j.Log4jLoggerFactory"; public static final String APPEND_NAME = "flinkx"; public static final String LOGGER_NAME = "com.dtstack"; - public static final String LOG4J = "org.slf4j.impl.Log4jLoggerFactory"; + private static boolean isLog4j2; public static final String LOGBACK = "ch.qos.logback.classic.util.ContextSelectorStaticBinder"; public static int LEVEL_INT = Integer.MAX_VALUE; - public static void config(LogConfig logConfig, String jobId) { if (logConfig == null || !logConfig.isLogger() || init) { return; @@ -67,58 +74,73 @@ public static void config(LogConfig logConfig, String jobId) { } String type = StaticLoggerBinder.getSingleton().getLoggerFactoryClassStr(); + LOG.info("current log type is {}", type); if (LOG4J.equalsIgnoreCase(type)) { configLog4j(logConfig, jobId); + isLog4j2 = true; } else if (LOGBACK.equalsIgnoreCase(type)) { configLogback(logConfig, jobId); + isLog4j2 = false; + } else { + LOG.warn("log type {} is neither [org.apache.logging.slf4j.Log4jLoggerFactory] nor 
[ch.qos.logback.classic.util.ContextSelectorStaticBinder]", type); } init = true; } } - } + } private static void configLog4j(LogConfig logConfig, String jobId) { - org.apache.log4j.Logger logger = org.apache.log4j.Logger.getLogger(LOGGER_NAME); - org.apache.log4j.Level level = org.apache.log4j.Level.toLevel(logConfig.getLevel()); - LEVEL_INT = level.toInt(); + LOG.info("start to config log4j..."); + LoggerContext loggerContext = (LoggerContext) LogManager.getContext(); + Configuration config = loggerContext.getConfiguration(); + + org.apache.logging.log4j.Level level = org.apache.logging.log4j.Level.toLevel(logConfig.getLevel()); + LEVEL_INT = level.intLevel(); String pattern = logConfig.getPattern(); String path = logConfig.getPath(); - logger.removeAllAppenders(); - logger.setAdditivity(true); - logger.setLevel(level); - - org.apache.log4j.RollingFileAppender appender = new org.apache.log4j.RollingFileAppender(); - PatternLayout layout = new PatternLayout(); - if (StringUtils.isNotBlank(pattern)) { - layout.setConversionPattern(pattern); - } else { - layout.setConversionPattern(LogConfig.DEFAULT_LOG4J_PATTERN); + if (StringUtils.isBlank(pattern)) { + pattern = LogConfig.DEFAULT_LOG4J_PATTERN; } - LevelRangeFilter filter = new LevelRangeFilter(); - filter.setLevelMin(level); - appender.addFilter(filter); - appender.setLayout(layout); - appender.setFile(path + jobId + ".log"); - appender.setEncoding(StandardCharsets.UTF_8.name()); - appender.setMaxFileSize("1GB"); - appender.setMaxBackupIndex(1); - appender.setAppend(true); - appender.activateOptions(); - appender.setName(APPEND_NAME); + PatternLayout layout = PatternLayout.newBuilder() + .withCharset(StandardCharsets.UTF_8) + .withConfiguration(config) + .withPattern(pattern) + .build(); + + Filter filter = LevelRangeFilter.createFilter(org.apache.logging.log4j.Level.ERROR, + level, + Filter.Result.ACCEPT, + Filter.Result.DENY); + + Appender appender = org.apache.logging.log4j.core.appender.RollingFileAppender.newBuilder() + .withAppend(true) +// .setFilter(filter) + .withFileName(path + File.separator + jobId + ".log") + .withFilePattern(path + File.separator + jobId + ".%i.log") + .setName(APPEND_NAME) + .withPolicy(SizeBasedTriggeringPolicy.createPolicy("1GB")) + .setLayout(layout) + .setConfiguration(config) + .build(); + appender.start(); - logger.removeAllAppenders(); - logger.addAppender(appender); + for (final LoggerConfig loggerConfig : config.getLoggers().values()) { + loggerConfig.addAppender(appender, level, filter); + loggerConfig.setAdditive(false); + loggerConfig.setLevel(level); + } - logger.info("DtLogger config successfully, current log is [log4j]"); + LOG.info("DtLogger config successfully, current log is [log4j]"); } @SuppressWarnings("unchecked") private static void configLogback(LogConfig logConfig, String jobId) { - LoggerContext context = (LoggerContext) LoggerFactory.getILoggerFactory(); - ch.qos.logback.classic.Logger logger = context.getLogger(LOGGER_NAME); + LOG.info("start to config logback..."); + final ch.qos.logback.classic.LoggerContext context = (ch.qos.logback.classic.LoggerContext) LoggerFactory.getILoggerFactory(); + final ch.qos.logback.classic.Logger logger = context.getLogger(LOGGER_NAME); Level level = Level.toLevel(logConfig.getLevel()); LEVEL_INT = level.toInt(); @@ -166,14 +188,22 @@ private static void configLogback(LogConfig logConfig, String jobId) { logger.setAdditive(true); logger.addAppender(appender); - logger.info("DtLogger config successfully, current log is [logback]"); + 
LOG.info("DtLogger config successfully, current log is [logback]"); } public static boolean isEnableTrace(){ - return Level.TRACE_INT >= LEVEL_INT; + if(isLog4j2){ + return StandardLevel.TRACE.intLevel() >= LEVEL_INT; + }else{ + return Level.TRACE.levelInt >= LEVEL_INT; + } } public static boolean isEnableDebug(){ - return Level.DEBUG_INT >= LEVEL_INT; + if(isLog4j2){ + return StandardLevel.DEBUG.intLevel() >= LEVEL_INT; + }else{ + return Level.DEBUG.levelInt >= LEVEL_INT; + } } } diff --git a/flinkx-core/src/main/java/com/dtstack/flinkx/metrics/AccumulatorCollector.java b/flinkx-core/src/main/java/com/dtstack/flinkx/metrics/AccumulatorCollector.java index feb64ae62f..a1f67a983f 100644 --- a/flinkx-core/src/main/java/com/dtstack/flinkx/metrics/AccumulatorCollector.java +++ b/flinkx-core/src/main/java/com/dtstack/flinkx/metrics/AccumulatorCollector.java @@ -25,7 +25,7 @@ import com.google.common.collect.Lists; import com.google.gson.Gson; import com.google.gson.internal.LinkedTreeMap; -import org.apache.commons.lang.StringUtils; +import org.apache.commons.lang3.StringUtils; import org.apache.flink.api.common.accumulators.LongCounter; import org.apache.flink.api.common.functions.RuntimeContext; import org.apache.http.impl.client.CloseableHttpClient; diff --git a/flinkx-core/src/main/java/com/dtstack/flinkx/metrics/BaseMetric.java b/flinkx-core/src/main/java/com/dtstack/flinkx/metrics/BaseMetric.java index e0843e7323..942e254f8c 100644 --- a/flinkx-core/src/main/java/com/dtstack/flinkx/metrics/BaseMetric.java +++ b/flinkx-core/src/main/java/com/dtstack/flinkx/metrics/BaseMetric.java @@ -54,7 +54,7 @@ public void addMetric(String metricName, LongCounter counter){ public void addMetric(String metricName, LongCounter counter, boolean meterView){ metricCounters.put(metricName, counter); - flinkxOutput.gauge(metricName, new SimpleAccumulatorGauge(counter)); + flinkxOutput.gauge(metricName, new SimpleAccumulatorGauge<>(counter)); if (meterView){ flinkxOutput.meter(metricName + Metrics.SUFFIX_RATE, new SimpleLongCounterMeterView(counter, 20)); } diff --git a/flinkx-core/src/main/java/com/dtstack/flinkx/options/Options.java b/flinkx-core/src/main/java/com/dtstack/flinkx/options/Options.java index 0296baa0dc..68b59e1624 100644 --- a/flinkx-core/src/main/java/com/dtstack/flinkx/options/Options.java +++ b/flinkx-core/src/main/java/com/dtstack/flinkx/options/Options.java @@ -73,6 +73,9 @@ public class Options { @OptionRequired(description = "env properties") private String confProp = "{}"; + @OptionRequired(description = "json modify") + private String p = ""; + @OptionRequired(description = "savepoint path") private String s; @@ -246,6 +249,14 @@ public void setRemotePluginPath(String remotePluginPath) { this.remotePluginPath = remotePluginPath; } + public String getP() { + return p; + } + + public void setP(String p) { + this.p = p; + } + public String getKrb5conf() { return krb5conf; } @@ -285,6 +296,7 @@ public String toString() { ", queue='" + queue + '\'' + ", flinkLibJar='" + flinkLibJar + '\'' + ", confProp='" + confProp + '\'' + + ", p='" + p + '\'' + ", s='" + s + '\'' + ", pluginLoadMode='" + pluginLoadMode + '\'' + ", appId='" + appId + '\'' + diff --git a/flinkx-core/src/main/java/com/dtstack/flinkx/reader/BaseDataReader.java b/flinkx-core/src/main/java/com/dtstack/flinkx/reader/BaseDataReader.java index 3de055c843..dfd9e07929 100644 --- a/flinkx-core/src/main/java/com/dtstack/flinkx/reader/BaseDataReader.java +++ b/flinkx-core/src/main/java/com/dtstack/flinkx/reader/BaseDataReader.java @@ 
-92,7 +92,8 @@ public void setSrcCols(List srcCols) { protected BaseDataReader(DataTransferConfig config, StreamExecutionEnvironment env) { this.env = env; this.dataTransferConfig = config; - this.numPartitions = config.getJob().getSetting().getSpeed().getChannel(); + this.numPartitions = Math.max(config.getJob().getSetting().getSpeed().getChannel(), + config.getJob().getSetting().getSpeed().getReaderChannel()); this.bytes = config.getJob().getSetting().getSpeed().getBytes(); this.monitorUrls = config.getMonitorUrls(); this.restoreConfig = config.getJob().getSetting().getRestoreConfig(); diff --git a/flinkx-core/src/main/java/com/dtstack/flinkx/util/ClassUtil.java b/flinkx-core/src/main/java/com/dtstack/flinkx/util/ClassUtil.java index b073524412..7fb6aae459 100644 --- a/flinkx-core/src/main/java/com/dtstack/flinkx/util/ClassUtil.java +++ b/flinkx-core/src/main/java/com/dtstack/flinkx/util/ClassUtil.java @@ -38,6 +38,7 @@ public class ClassUtil { public static void forName(String clazz, ClassLoader classLoader) { synchronized (LOCK_STR){ try { + LOG.info("className = " + clazz); Class.forName(clazz, true, classLoader); DriverManager.setLoginTimeout(10); } catch (Exception e) { diff --git a/flinkx-core/src/main/java/com/dtstack/flinkx/util/DateUtil.java b/flinkx-core/src/main/java/com/dtstack/flinkx/util/DateUtil.java index 660afb3f2f..935a654128 100644 --- a/flinkx-core/src/main/java/com/dtstack/flinkx/util/DateUtil.java +++ b/flinkx-core/src/main/java/com/dtstack/flinkx/util/DateUtil.java @@ -23,6 +23,10 @@ import java.sql.Timestamp; import java.text.ParseException; import java.text.SimpleDateFormat; +import java.time.LocalDate; +import java.time.LocalDateTime; +import java.time.ZoneId; +import java.time.ZoneOffset; import java.util.Calendar; import java.util.Date; import java.util.HashMap; @@ -66,32 +70,32 @@ public class DateUtil { public final static int LENGTH_NANOSECOND = 19; public static ThreadLocal> datetimeFormatter = ThreadLocal.withInitial(() -> { - TimeZone timeZone = TimeZone.getTimeZone(TIME_ZONE); + TimeZone timeZone = TimeZone.getTimeZone(TIME_ZONE); - Map formatterMap = new HashMap<>(); - SimpleDateFormat standardDatetimeFormatter = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss"); - standardDatetimeFormatter.setTimeZone(timeZone); - formatterMap.put(STANDARD_DATETIME_FORMAT,standardDatetimeFormatter); + Map formatterMap = new HashMap<>(); + SimpleDateFormat standardDatetimeFormatter = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss"); + standardDatetimeFormatter.setTimeZone(timeZone); + formatterMap.put(STANDARD_DATETIME_FORMAT,standardDatetimeFormatter); - SimpleDateFormat unStandardDatetimeFormatter = new SimpleDateFormat("yyyyMMddHHmmss"); - unStandardDatetimeFormatter.setTimeZone(timeZone); - formatterMap.put(UN_STANDARD_DATETIME_FORMAT,unStandardDatetimeFormatter); + SimpleDateFormat unStandardDatetimeFormatter = new SimpleDateFormat("yyyyMMddHHmmss"); + unStandardDatetimeFormatter.setTimeZone(timeZone); + formatterMap.put(UN_STANDARD_DATETIME_FORMAT,unStandardDatetimeFormatter); - SimpleDateFormat dateFormatter = new SimpleDateFormat("yyyy-MM-dd"); - dateFormatter.setTimeZone(timeZone); - formatterMap.put(DATE_FORMAT,dateFormatter); + SimpleDateFormat dateFormatter = new SimpleDateFormat("yyyy-MM-dd"); + dateFormatter.setTimeZone(timeZone); + formatterMap.put(DATE_FORMAT,dateFormatter); - SimpleDateFormat timeFormatter = new SimpleDateFormat("HH:mm:ss"); - timeFormatter.setTimeZone(timeZone); - formatterMap.put(TIME_FORMAT,timeFormatter); + SimpleDateFormat timeFormatter = 
new SimpleDateFormat("HH:mm:ss"); + timeFormatter.setTimeZone(timeZone); + formatterMap.put(TIME_FORMAT,timeFormatter); - SimpleDateFormat yearFormatter = new SimpleDateFormat("yyyy"); - yearFormatter.setTimeZone(timeZone); - formatterMap.put(YEAR_FORMAT,yearFormatter); + SimpleDateFormat yearFormatter = new SimpleDateFormat("yyyy"); + yearFormatter.setTimeZone(timeZone); + formatterMap.put(YEAR_FORMAT,yearFormatter); - SimpleDateFormat standardDatetimeFormatterOfMillisecond = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS"); - standardDatetimeFormatterOfMillisecond.setTimeZone(timeZone); - formatterMap.put(STANDARD_DATETIME_FORMAT_FOR_MILLISECOND,standardDatetimeFormatterOfMillisecond); + SimpleDateFormat standardDatetimeFormatterOfMillisecond = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS"); + standardDatetimeFormatterOfMillisecond.setTimeZone(timeZone); + formatterMap.put(STANDARD_DATETIME_FORMAT_FOR_MILLISECOND,standardDatetimeFormatterOfMillisecond); return formatterMap; }); @@ -99,9 +103,9 @@ public class DateUtil { private DateUtil() {} public static java.sql.Date columnToDate(Object column,SimpleDateFormat customTimeFormat) { - if(column == null) { + if (column == null) { return null; - } else if(column instanceof String) { + } else if (column instanceof String) { if (((String) column).length() == 0){ return null; } @@ -119,18 +123,26 @@ public static java.sql.Date columnToDate(Object column,SimpleDateFormat customTi return new java.sql.Date(getMillSecond(rawData.toString())); } else if (column instanceof java.sql.Date) { return (java.sql.Date) column; - } else if(column instanceof Timestamp) { + } else if (column instanceof Timestamp) { Timestamp ts = (Timestamp) column; return new java.sql.Date(ts.getTime()); - } else if(column instanceof Date) { + } else if (column instanceof Date) { Date d = (Date)column; return new java.sql.Date(d.getTime()); + } else if (column instanceof LocalDate) { + LocalDate localDate = (LocalDate) column; + return new java.sql.Date( + localDate.atStartOfDay(ZoneId.systemDefault()).toInstant().toEpochMilli()); + } else if (column instanceof LocalDateTime) { + LocalDateTime localDateTime = (LocalDateTime) column; + return new java.sql.Date( + localDateTime.toInstant(ZoneOffset.of("+8")).toEpochMilli()); } throw new IllegalArgumentException("Can't convert " + column.getClass().getName() + " to Date"); } - public static java.sql.Timestamp columnToTimestamp(Object column,SimpleDateFormat customTimeFormat) { + public static java.sql.Timestamp columnToTimestamp(Object column, SimpleDateFormat customTimeFormat) { if (column == null) { return null; } else if(column instanceof String) { @@ -153,25 +165,33 @@ public static java.sql.Timestamp columnToTimestamp(Object column,SimpleDateForma return new java.sql.Timestamp(((java.sql.Date) column).getTime()); } else if(column instanceof Timestamp) { return (Timestamp) column; - } else if(column instanceof Date) { + } else if (column instanceof Date) { Date d = (Date)column; return new java.sql.Timestamp(d.getTime()); + } else if (column instanceof LocalDateTime) { + LocalDateTime localDateTime = (LocalDateTime) column; + return new java.sql.Timestamp( + localDateTime.toInstant(ZoneOffset.of("+8")).toEpochMilli()); + } else if (column instanceof LocalDate) { + LocalDate localDate = (LocalDate) column; + return new java.sql.Timestamp( + localDate.atStartOfDay(ZoneId.systemDefault()).toInstant().toEpochMilli()); } throw new IllegalArgumentException("Can't convert " + column.getClass().getName() + " to Date"); } - public 
static long getMillSecond(String data){ + public static long getMillSecond(String data) { long time = Long.parseLong(data); - if(data.length() == LENGTH_SECOND){ + if (data.length() == LENGTH_SECOND) { time = Long.parseLong(data) * 1000; - } else if(data.length() == LENGTH_MILLISECOND){ + } else if (data.length() == LENGTH_MILLISECOND) { time = Long.parseLong(data); - } else if(data.length() == LENGTH_MICROSECOND){ + } else if (data.length() == LENGTH_MICROSECOND) { time = Long.parseLong(data) / 1000; - } else if(data.length() == LENGTH_NANOSECOND){ + } else if (data.length() == LENGTH_NANOSECOND) { time = Long.parseLong(data) / 1000000 ; - } else if(data.length() < LENGTH_SECOND){ + } else if (data.length() < LENGTH_SECOND) { try { long day = Long.parseLong(data); Date date = datetimeFormatter.get().get(DATE_FORMAT).parse(START_TIME); @@ -179,18 +199,18 @@ public static long getMillSecond(String data){ long addMill = date.getTime() + day * 24 * 3600 * 1000; cal.setTimeInMillis(addMill); time = cal.getTimeInMillis(); - } catch (Exception ignore){ + } catch (Exception ignore) { } } return time; } - public static Date stringToDate(String strDate,SimpleDateFormat customTimeFormat) { - if(strDate == null || strDate.trim().length() == 0) { + public static Date stringToDate(String strDate, SimpleDateFormat customTimeFormat) { + if (strDate == null || strDate.trim().length() == 0) { return null; } - if(customTimeFormat != null){ + if (customTimeFormat != null) { try { return customTimeFormat.parse(strDate); } catch (ParseException ignored) { @@ -270,51 +290,51 @@ public static SimpleDateFormat buildDateFormatter(String timeFormat){ * @return String DateFormat字符串如:yyyy-MM-dd HH:mm:ss */ public static String getDateFormat(String str) { - if(StringUtils.isBlank(str)){ + if (StringUtils.isBlank(str)) { return null; } boolean year = false; Pattern pattern = Pattern.compile("^[-\\+]?[\\d]*$"); - if(pattern.matcher(str.substring(0, 4)).matches()) { + if (pattern.matcher(str.substring(0, 4)).matches()) { year = true; } StringBuilder sb = new StringBuilder(); int index = 0; - if(!year) { - if(str.contains("月") || str.contains("-") || str.contains("/")) { - if(Character.isDigit(str.charAt(0))) { + if (!year) { + if (str.contains("月") || str.contains("-") || str.contains("/")) { + if (Character.isDigit(str.charAt(0))) { index = 1; } - }else { + } else { index = 3; } } for (int i = 0; i < str.length(); i++) { char chr = str.charAt(i); - if(Character.isDigit(chr)) { - if(index==0) { + if (Character.isDigit(chr)) { + if (index==0) { sb.append("y"); } - if(index==1) { + if (index==1) { sb.append("M"); } - if(index==2) { + if (index==2) { sb.append("d"); } - if(index==3) { + if (index==3) { sb.append("H"); } - if(index==4) { + if (index==4) { sb.append("m"); } - if(index==5) { + if (index==5) { sb.append("s"); } - if(index==6) { + if (index==6) { sb.append("S"); } - }else { - if(i>0) { + } else { + if (i > 0) { char lastChar = str.charAt(i-1); if(Character.isDigit(lastChar)) { index++; diff --git a/flinkx-core/src/main/java/com/dtstack/flinkx/util/StringUtil.java b/flinkx-core/src/main/java/com/dtstack/flinkx/util/StringUtil.java index 473e2816a0..76ff4e99a8 100644 --- a/flinkx-core/src/main/java/com/dtstack/flinkx/util/StringUtil.java +++ b/flinkx-core/src/main/java/com/dtstack/flinkx/util/StringUtil.java @@ -339,4 +339,19 @@ public static String splitIgnoreQuotaAndJoinByPoint(String table) { } return stringBuffer.toString(); } + + public static Boolean parseBoolean(String str) { + if (null == str || 
"null".equalsIgnoreCase(str)) { + return Boolean.FALSE; + } + + if ("1".equals(str)) { + return Boolean.TRUE; + } else if ("0".equals(str)) { + return Boolean.FALSE; + } else { + return Boolean.parseBoolean(str); + } + } + } diff --git a/flinkx-db2/flinkx-db2-reader/pom.xml b/flinkx-db2/flinkx-db2-reader/pom.xml index 787c353a68..fe6415dab9 100644 --- a/flinkx-db2/flinkx-db2-reader/pom.xml +++ b/flinkx-db2/flinkx-db2-reader/pom.xml @@ -96,7 +96,7 @@ + tofile="${basedir}/../../syncplugins/db2reader/${project.name}-${package.name}.jar" /> diff --git a/flinkx-db2/flinkx-db2-writer/pom.xml b/flinkx-db2/flinkx-db2-writer/pom.xml index 6a80f1fa8b..a01e3ee391 100644 --- a/flinkx-db2/flinkx-db2-writer/pom.xml +++ b/flinkx-db2/flinkx-db2-writer/pom.xml @@ -96,7 +96,7 @@ + tofile="${basedir}/../../syncplugins/db2writer/${project.name}-${package.name}.jar" /> diff --git a/flinkx-dm/flinkx-dm-reader/pom.xml b/flinkx-dm/flinkx-dm-reader/pom.xml index 99fad125f2..2aec50182f 100644 --- a/flinkx-dm/flinkx-dm-reader/pom.xml +++ b/flinkx-dm/flinkx-dm-reader/pom.xml @@ -96,7 +96,7 @@ + tofile="${basedir}/../../syncplugins/dmreader/${project.name}-${package.name}.jar" /> diff --git a/flinkx-dm/flinkx-dm-writer/pom.xml b/flinkx-dm/flinkx-dm-writer/pom.xml index a224b7ef3c..d1061ba007 100644 --- a/flinkx-dm/flinkx-dm-writer/pom.xml +++ b/flinkx-dm/flinkx-dm-writer/pom.xml @@ -96,7 +96,7 @@ + tofile="${basedir}/../../syncplugins/dmwriter/${project.name}-${package.name}.jar" /> diff --git a/flinkx-emqx/flinkx-emqx-reader/pom.xml b/flinkx-emqx/flinkx-emqx-reader/pom.xml index deed70db09..1db1b443de 100644 --- a/flinkx-emqx/flinkx-emqx-reader/pom.xml +++ b/flinkx-emqx/flinkx-emqx-reader/pom.xml @@ -88,7 +88,7 @@ + tofile="${basedir}/../../syncplugins/emqxreader/${project.name}-${package.name}.jar" /> diff --git a/flinkx-emqx/flinkx-emqx-writer/pom.xml b/flinkx-emqx/flinkx-emqx-writer/pom.xml index f8548ba5f0..68ddb5ac70 100644 --- a/flinkx-emqx/flinkx-emqx-writer/pom.xml +++ b/flinkx-emqx/flinkx-emqx-writer/pom.xml @@ -88,7 +88,7 @@ + tofile="${basedir}/../../syncplugins/emqxwriter/${project.name}-${package.name}.jar" /> diff --git a/flinkx-es/flinkx-es-reader/pom.xml b/flinkx-es/flinkx-es-reader/pom.xml index c531fe00f5..2fc873788c 100644 --- a/flinkx-es/flinkx-es-reader/pom.xml +++ b/flinkx-es/flinkx-es-reader/pom.xml @@ -92,7 +92,7 @@ + tofile="${basedir}/../../syncplugins/esreader/${project.name}-${package.name}.jar" /> diff --git a/flinkx-es/flinkx-es-writer/pom.xml b/flinkx-es/flinkx-es-writer/pom.xml index d50b680fe0..3004e63002 100644 --- a/flinkx-es/flinkx-es-writer/pom.xml +++ b/flinkx-es/flinkx-es-writer/pom.xml @@ -93,7 +93,7 @@ + tofile="${basedir}/../../syncplugins/eswriter/${project.name}-${package.name}.jar" /> diff --git a/flinkx-examples/examples/sqlserver_kafka.json b/flinkx-examples/examples/sqlserver_kafka.json deleted file mode 100644 index 2b49eae799..0000000000 --- a/flinkx-examples/examples/sqlserver_kafka.json +++ /dev/null @@ -1,43 +0,0 @@ -{ - "job" : { - "content" : [ { - "reader" : { - "parameter" : { - "username" : "sa", - "password" : "Password!", - "url" : "jdbc:sqlserver://kudu4:1433;databaseName=testDB", - "databaseName" : "testDB", - "cat" : "insert,update", - "tableList" : [ - "dbo.customers", - "dbo.orders" - ], - "pavingData" : true - }, - "name" : "sqlservercdcreader" - }, - "writer" : { - "parameter" : { - "producerSettings" : { - "zookeeper.connect" : "kudu1:2181/kafka100", - "bootstrap.servers" : "kudu5:9092" - }, - "topic" : "tudou" - }, - "name" : "kafkawriter" - } - } 
], - "setting" : { - "restore" : { - "isRestore" : true, - "isStream" : true - }, - "errorLimit" : { - }, - "speed" : { - "bytes" : -1048576, - "channel" : 1 - } - } - } -} \ No newline at end of file diff --git a/flinkx-ftp/flinkx-ftp-reader/pom.xml b/flinkx-ftp/flinkx-ftp-reader/pom.xml index fde7375b4c..29e56eb80f 100644 --- a/flinkx-ftp/flinkx-ftp-reader/pom.xml +++ b/flinkx-ftp/flinkx-ftp-reader/pom.xml @@ -106,7 +106,7 @@ under the License. + tofile="${basedir}/../../syncplugins/ftpreader/${project.name}-${package.name}.jar"/> diff --git a/flinkx-ftp/flinkx-ftp-reader/src/main/java/com/dtstack/flinkx/ftp/reader/FtpInputFormatBuilder.java b/flinkx-ftp/flinkx-ftp-reader/src/main/java/com/dtstack/flinkx/ftp/reader/FtpInputFormatBuilder.java index db022437a1..c4f8508343 100644 --- a/flinkx-ftp/flinkx-ftp-reader/src/main/java/com/dtstack/flinkx/ftp/reader/FtpInputFormatBuilder.java +++ b/flinkx-ftp/flinkx-ftp-reader/src/main/java/com/dtstack/flinkx/ftp/reader/FtpInputFormatBuilder.java @@ -3,6 +3,7 @@ import com.dtstack.flinkx.ftp.FtpConfig; import com.dtstack.flinkx.inputformat.BaseRichInputFormatBuilder; import com.dtstack.flinkx.reader.MetaColumn; +import org.apache.commons.lang3.StringUtils; import java.util.List; @@ -30,5 +31,9 @@ protected void checkFormat() { if (format.getRestoreConfig() != null && format.getRestoreConfig().isRestore()){ throw new UnsupportedOperationException("This plugin not support restore from failed state"); } + + if (StringUtils.isEmpty(format.ftpConfig.getPath())) { + throw new IllegalArgumentException("The property [path] cannot be empty or null"); + } } } diff --git a/flinkx-ftp/flinkx-ftp-writer/pom.xml b/flinkx-ftp/flinkx-ftp-writer/pom.xml index 7f54dee34f..beb6d76b2a 100644 --- a/flinkx-ftp/flinkx-ftp-writer/pom.xml +++ b/flinkx-ftp/flinkx-ftp-writer/pom.xml @@ -107,7 +107,7 @@ under the License. 
+ tofile="${basedir}/../../syncplugins/ftpwriter/${project.name}-${package.name}.jar" /> diff --git a/flinkx-gbase/flinkx-gbase-reader/pom.xml b/flinkx-gbase/flinkx-gbase-reader/pom.xml index cc436c8260..bb90458df0 100644 --- a/flinkx-gbase/flinkx-gbase-reader/pom.xml +++ b/flinkx-gbase/flinkx-gbase-reader/pom.xml @@ -95,7 +95,7 @@ + tofile="${basedir}/../../syncplugins/gbasereader/${project.name}-${package.name}.jar" /> diff --git a/flinkx-gbase/flinkx-gbase-writer/pom.xml b/flinkx-gbase/flinkx-gbase-writer/pom.xml index 6a11467b41..077c8947e4 100644 --- a/flinkx-gbase/flinkx-gbase-writer/pom.xml +++ b/flinkx-gbase/flinkx-gbase-writer/pom.xml @@ -95,7 +95,7 @@ + tofile="${basedir}/../../syncplugins/gbasewriter/${project.name}-${package.name}.jar" /> diff --git a/flinkx-greenplum/flinkx-greenplum-core/pom.xml b/flinkx-greenplum/flinkx-greenplum-core/pom.xml index cd4830df20..3a79da783d 100644 --- a/flinkx-greenplum/flinkx-greenplum-core/pom.xml +++ b/flinkx-greenplum/flinkx-greenplum-core/pom.xml @@ -1,6 +1,6 @@ - flinkx-greenplum diff --git a/flinkx-greenplum/flinkx-greenplum-reader/pom.xml b/flinkx-greenplum/flinkx-greenplum-reader/pom.xml index 008ba92ec2..72a28b084e 100644 --- a/flinkx-greenplum/flinkx-greenplum-reader/pom.xml +++ b/flinkx-greenplum/flinkx-greenplum-reader/pom.xml @@ -1,6 +1,6 @@ - flinkx-greenplum @@ -100,7 +100,7 @@ + tofile="${basedir}/../../syncplugins/greenplumreader/${project.name}-${package.name}.jar" /> diff --git a/flinkx-greenplum/flinkx-greenplum-writer/pom.xml b/flinkx-greenplum/flinkx-greenplum-writer/pom.xml index 29a0de69d0..dd5f2bf06f 100644 --- a/flinkx-greenplum/flinkx-greenplum-writer/pom.xml +++ b/flinkx-greenplum/flinkx-greenplum-writer/pom.xml @@ -1,6 +1,6 @@ - flinkx-greenplum @@ -97,9 +97,8 @@ - + tofile="${basedir}/../../syncplugins/greenplumwriter/${project.name}-${package.name}.jar" /> diff --git a/flinkx-greenplum/pom.xml b/flinkx-greenplum/pom.xml index 3281a852dc..5c3fb85de9 100644 --- a/flinkx-greenplum/pom.xml +++ b/flinkx-greenplum/pom.xml @@ -1,6 +1,6 @@ - flinkx-all diff --git a/flinkx-hbase/flinkx-hbase-reader/pom.xml b/flinkx-hbase/flinkx-hbase-reader/pom.xml index 3203ee401a..49429b3834 100644 --- a/flinkx-hbase/flinkx-hbase-reader/pom.xml +++ b/flinkx-hbase/flinkx-hbase-reader/pom.xml @@ -105,7 +105,7 @@ + tofile="${basedir}/../../syncplugins/hbasereader/${project.name}-${package.name}.jar" /> diff --git a/flinkx-hbase/flinkx-hbase-writer/pom.xml b/flinkx-hbase/flinkx-hbase-writer/pom.xml index faa8d4c22c..05ce45986e 100644 --- a/flinkx-hbase/flinkx-hbase-writer/pom.xml +++ b/flinkx-hbase/flinkx-hbase-writer/pom.xml @@ -94,7 +94,7 @@ + tofile="${basedir}/../../syncplugins/hbasewriter/${project.name}-${package.name}.jar" /> diff --git a/flinkx-hbase2/flinkx-hbase-core2/pom.xml b/flinkx-hbase2/flinkx-hbase-core2/pom.xml new file mode 100644 index 0000000000..2da3a4e592 --- /dev/null +++ b/flinkx-hbase2/flinkx-hbase-core2/pom.xml @@ -0,0 +1,45 @@ + + + + flinkx-hbase2 + com.dtstack.flinkx + 1.6 + + 4.0.0 + + flinkx-hbase-core2 + + + + org.apache.hbase + hbase-client + 2.2.4 + + + log4j + log4j + + + + + + org.apache.hbase + hbase-common + 2.2.4 + + + log4j + log4j + + + + + com.dtstack.flinkx + flinkx-core + 1.6 + compile + + + \ No newline at end of file diff --git a/flinkx-hbase2/flinkx-hbase-core2/src/main/java/com/dtstack/flinkx/hbase2/HbaseConfigConstants.java b/flinkx-hbase2/flinkx-hbase-core2/src/main/java/com/dtstack/flinkx/hbase2/HbaseConfigConstants.java new file mode 100644 index 0000000000..fef2c0e013 --- /dev/null +++ 
b/flinkx-hbase2/flinkx-hbase-core2/src/main/java/com/dtstack/flinkx/hbase2/HbaseConfigConstants.java @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.dtstack.flinkx.hbase2; + +/** + * The class containing Hbase configuration constants + * + * Company: cmss + * @author wangyulei_yewu@cmss.chinamobile.com + */ +public class HbaseConfigConstants { + + public static final int DEFAULT_SCAN_CACHE_SIZE = 256; + + public static final int MAX_SCAN_CACHE_SIZE = 1000; + + public static final int MIN_SCAN_CACHE_SIZE = 1; + + public static final String DEFAULT_ENCODING = "UTF-8"; + + public static final String DEFAULT_DATA_FORMAT = "yyyy-MM-dd HH:mm:ss"; + + public static final String DEFAULT_NULL_MODE = "skip"; + + public static final long DEFAULT_WRITE_BUFFER_SIZE = 8 * 1024 * 1024L; + + public static final boolean DEFAULT_WAL_FLAG = false; + +} diff --git a/flinkx-hbase2/flinkx-hbase-core2/src/main/java/com/dtstack/flinkx/hbase2/HbaseConfigKeys.java b/flinkx-hbase2/flinkx-hbase-core2/src/main/java/com/dtstack/flinkx/hbase2/HbaseConfigKeys.java new file mode 100644 index 0000000000..344ff7749a --- /dev/null +++ b/flinkx-hbase2/flinkx-hbase-core2/src/main/java/com/dtstack/flinkx/hbase2/HbaseConfigKeys.java @@ -0,0 +1,72 @@ + +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package com.dtstack.flinkx.hbase2; + +/** + * This class defines configuration keys for HbaseReader and HbaseWriter + * + * Company: cmss + * @author wangyulei_yewu@cmss.chinamobile.com + */ +public class HbaseConfigKeys { + + public static final String KEY_SCAN_CACHE_SIZE = "scanCacheSize"; + + public static final String KEY_SCAN_BATCH_SIZE = "scanBatchSize"; + + public static final String KEY_TABLE = "table"; + + public static final String KEY_HBASE_CONFIG = "hbaseConfig"; + + public static final String KEY_START_ROW_KEY = "startRowkey"; + + public static final String KEY_END_ROW_KEY = "endRowkey"; + + public static final String KEY_IS_BINARY_ROW_KEY = "isBinaryRowkey"; + + public static final String KEY_ENCODING = "encoding"; + + public static final String KEY_RANGE = "range"; + + public static final String KEY_COLUMN_NAME = "name"; + + public static final String KEY_COLUMN_TYPE = "type"; + + public static final String KEY_ROW_KEY_COLUMN = "rowkeyColumn"; + + public static final String KEY_ROW_KEY_COLUMN_INDEX = "index"; + + public static final String KEY_ROW_KEY_COLUMN_TYPE = "type"; + + public static final String KEY_ROW_KEY_COLUMN_VALUE = "value"; + + public static final String KEY_NULL_MODE = "nullMode"; + + public static final String KEY_WAL_FLAG = "walFlag"; + + public static final String KEY_VERSION_COLUMN = "versionColumn"; + + public static final String KEY_WRITE_BUFFER_SIZE = "writeBufferSize"; + + public static final String KEY_VERSION_COLUMN_INDEX = "index"; + + public static final String KEY_VERSION_COLUMN_VALUE = "value"; + + +} diff --git a/flinkx-metadata-hbase/flinkx-metadata-hbase-reader/src/main/java/com/dtstack/flinkx/metadatahbase/util/HbaseHelper.java b/flinkx-hbase2/flinkx-hbase-core2/src/main/java/com/dtstack/flinkx/hbase2/HbaseHelper.java similarity index 59% rename from flinkx-metadata-hbase/flinkx-metadata-hbase-reader/src/main/java/com/dtstack/flinkx/metadatahbase/util/HbaseHelper.java rename to flinkx-hbase2/flinkx-hbase-core2/src/main/java/com/dtstack/flinkx/hbase2/HbaseHelper.java index 2e718ad308..180085b480 100644 --- a/flinkx-metadata-hbase/flinkx-metadata-hbase-reader/src/main/java/com/dtstack/flinkx/metadatahbase/util/HbaseHelper.java +++ b/flinkx-hbase2/flinkx-hbase-core2/src/main/java/com/dtstack/flinkx/hbase2/HbaseHelper.java @@ -7,7 +7,7 @@ * "License"); you may not use this file except in compliance * with the License. You may obtain a copy of the License at * - * http://www.apache.org/licenses/LICENSE-2.0 + * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, @@ -16,17 +16,19 @@ * limitations under the License. 
*/ -package com.dtstack.flinkx.metadatahbase.util; +package com.dtstack.flinkx.hbase2; import com.dtstack.flinkx.authenticate.KerberosUtil; import com.dtstack.flinkx.util.FileSystemUtil; import org.apache.commons.collections.MapUtils; +import org.apache.commons.lang3.StringUtils; import org.apache.commons.lang3.Validate; import org.apache.hadoop.conf.Configuration; import org.apache.hadoop.hbase.HBaseConfiguration; -import org.apache.hadoop.hbase.client.Admin; -import org.apache.hadoop.hbase.client.Connection; -import org.apache.hadoop.hbase.client.ConnectionFactory; +import org.apache.hadoop.hbase.HConstants; +import org.apache.hadoop.hbase.TableName; +import org.apache.hadoop.hbase.client.*; +import org.apache.hadoop.hbase.util.Bytes; import org.apache.hadoop.security.UserGroupInformation; import org.slf4j.Logger; import org.slf4j.LoggerFactory; @@ -50,11 +52,8 @@ public class HbaseHelper { private final static String KEY_HBASE_SECURITY_AUTHORIZATION = "hbase.security.authorization"; private final static String KEY_HBASE_SECURITY_AUTH_ENABLE = "hbase.security.auth.enable"; - private HbaseHelper(){ - } - - public static org.apache.hadoop.hbase.client.Connection getHbaseConnection(Map hbaseConfigMap) { - Validate.isTrue(MapUtils.isNotEmpty(hbaseConfigMap), "[hadoopConfig] couldn't be empty!"); + public static Connection getHbaseConnection(Map hbaseConfigMap) { + Validate.isTrue(MapUtils.isNotEmpty(hbaseConfigMap), "hbaseConfig map cannot be empty!"); if(openKerberos(hbaseConfigMap)){ return getConnectionWithKerberos(hbaseConfigMap); @@ -69,17 +68,20 @@ public static org.apache.hadoop.hbase.client.Connection getHbaseConnection(Map hbaseConfigMap){ + private static Connection getConnectionWithKerberos(Map hbaseConfigMap){ try { setKerberosConf(hbaseConfigMap); UserGroupInformation ugi = getUgi(hbaseConfigMap); - return ugi.doAs((PrivilegedAction) () -> { - try { - Configuration hConfiguration = getConfig(hbaseConfigMap); - return ConnectionFactory.createConnection(hConfiguration); - } catch (IOException e) { - LOG.error("Get connection fail with config:{}", hbaseConfigMap); - throw new RuntimeException(e); + return ugi.doAs(new PrivilegedAction() { + @Override + public Connection run() { + try { + Configuration hConfiguration = getConfig(hbaseConfigMap); + return ConnectionFactory.createConnection(hConfiguration); + } catch (IOException e) { + LOG.error("Get connection fail with config:{}", hbaseConfigMap); + throw new RuntimeException(e); + } } }); } catch (Exception e){ @@ -129,7 +131,7 @@ public static boolean openKerberos(Map hbaseConfigMap){ /** * Set the fixed parameters required for an hbase connection with kerberos enabled - * @param hbaseConfigMap params + * @param hbaseConfigMap */ public static void setKerberosConf(Map hbaseConfigMap){ hbaseConfigMap.put(KEY_HBASE_SECURITY_AUTHORIZATION, AUTHENTICATION_TYPE); @@ -137,6 +139,41 @@ public static void setKerberosConf(Map hbaseConfigMap){ hbaseConfigMap.put(KEY_HBASE_SECURITY_AUTH_ENABLE, true); } + public static RegionLocator getRegionLocator(Connection hConnection, String userTable){ + TableName hTableName = TableName.valueOf(userTable); + Admin admin = null; + RegionLocator regionLocator = null; + try { + admin = hConnection.getAdmin(); + HbaseHelper.checkHbaseTable(admin,hTableName); + regionLocator = hConnection.getRegionLocator(hTableName); + } catch (Exception e) { + HbaseHelper.closeRegionLocator(regionLocator); + HbaseHelper.closeAdmin(admin); + HbaseHelper.closeConnection(hConnection); + throw new RuntimeException(e); + } + return regionLocator; + + } + + public static byte[] 
convertRowkey(String rowkey, boolean isBinaryRowkey) { + if(StringUtils.isBlank(rowkey)) { + return HConstants.EMPTY_BYTE_ARRAY; + } else { + return HbaseHelper.stringToBytes(rowkey, isBinaryRowkey); + } + } + + private static byte[] stringToBytes(String rowkey, boolean isBinaryRowkey) { + if (isBinaryRowkey) { + return Bytes.toBytesBinary(rowkey); + } else { + return Bytes.toBytes(rowkey); + } + } + + public static void closeConnection(Connection hConnection){ try { if(null != hConnection) { @@ -147,9 +184,10 @@ public static void closeConnection(Connection hConnection){ } } + public static void closeAdmin(Admin admin){ try { - if( null != admin) { + if(null != admin) { admin.close(); } } catch (IOException e) { @@ -157,4 +195,36 @@ public static void closeAdmin(Admin admin){ } } + + public static void closeRegionLocator(RegionLocator regionLocator){ + try { + if(null != regionLocator) { + regionLocator.close(); + } + } catch (IOException e) { + throw new RuntimeException(e); + } + } + + public static void checkHbaseTable(Admin admin, TableName table) throws IOException { + if(!admin.tableExists(table)){ + throw new IllegalArgumentException("hbase table " + table + " does not exist."); + } + if(!admin.isTableAvailable(table)){ + throw new RuntimeException("hbase table " + table + " is not available."); + } + if(admin.isTableDisabled(table)){ + throw new RuntimeException("hbase table " + table + " is disabled"); + } + } + + public static void closeBufferedMutator(BufferedMutator bufferedMutator){ + try { + if(null != bufferedMutator){ + bufferedMutator.close(); + } + } catch (IOException e) { + throw new RuntimeException(e); + } + } } diff --git a/flinkx-socket/flinkx-socket-reader/pom.xml b/flinkx-hbase2/flinkx-hbase-reader2/pom.xml similarity index 82% rename from flinkx-socket/flinkx-socket-reader/pom.xml rename to flinkx-hbase2/flinkx-hbase-reader2/pom.xml index 428c512a62..f6b44e9746 100644 --- a/flinkx-socket/flinkx-socket-reader/pom.xml +++ b/flinkx-hbase2/flinkx-hbase-reader2/pom.xml @@ -3,18 +3,22 @@ xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"> - flinkx-socket + flinkx-hbase2 com.dtstack.flinkx 1.6 4.0.0 - flinkx-socket-reader - + flinkx-hbase-reader2 - flinkx-socket-core + com.google.guava + guava + 12.0.1 + + com.dtstack.flinkx + flinkx-hbase-core2 1.6 @@ -35,8 +39,9 @@ false + com.data-artisans:* + org.scala-lang:* org.slf4j:slf4j-api - log4j:log4j ch.qos.logback:* @@ -51,17 +56,13 @@ - - io.netty - shade.socket.io.netty - com.google.common - shade.socket.com.google.common + shade.hbase.com.google.common com.google.thirdparty - shade.socket.com.google.thirdparty + shade.hbase.com.google.thirdparty @@ -82,14 +83,14 @@ - + - + diff --git a/flinkx-hbase2/flinkx-hbase-reader2/src/main/java/com/dtstack/flinkx/hbase2/reader/Hbase2Reader.java b/flinkx-hbase2/flinkx-hbase-reader2/src/main/java/com/dtstack/flinkx/hbase2/reader/Hbase2Reader.java new file mode 100644 index 0000000000..afa87203a8 --- /dev/null +++ b/flinkx-hbase2/flinkx-hbase-reader2/src/main/java/com/dtstack/flinkx/hbase2/reader/Hbase2Reader.java @@ -0,0 +1,118 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. 
The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.dtstack.flinkx.hbase2.reader; + +import com.dtstack.flinkx.config.DataTransferConfig; +import com.dtstack.flinkx.config.ReaderConfig; +import com.dtstack.flinkx.hbase2.HbaseConfigConstants; +import com.dtstack.flinkx.hbase2.HbaseConfigKeys; +import com.dtstack.flinkx.reader.BaseDataReader; +import org.apache.flink.streaming.api.datastream.DataStream; +import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; +import org.apache.flink.types.Row; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.ArrayList; +import java.util.List; +import java.util.Map; + +/** + * The reader plugin of Hbase + * + * Company: cmss + * @author wangyulei_yewu@cmss.chinamobile.com + */ +public class Hbase2Reader extends BaseDataReader { + + private static Logger LOG = LoggerFactory.getLogger(Hbase2Reader.class); + + protected List columnName; + protected List columnType; + protected List columnValue; + protected List columnFormat; + protected String encoding; + protected Map hbaseConfig; + protected String startRowkey; + protected String endRowkey; + protected boolean isBinaryRowkey; + protected String tableName; + protected int scanCacheSize; + + public Hbase2Reader(DataTransferConfig config, StreamExecutionEnvironment env) { + super(config, env); + ReaderConfig readerConfig = config.getJob().getContent().get(0).getReader(); + tableName = readerConfig.getParameter().getStringVal(HbaseConfigKeys.KEY_TABLE); + hbaseConfig = (Map) readerConfig.getParameter().getVal(HbaseConfigKeys.KEY_HBASE_CONFIG); + + Map range = (Map) readerConfig.getParameter().getVal(HbaseConfigKeys.KEY_RANGE); + if(range != null) { + startRowkey = (String) range.get(HbaseConfigKeys.KEY_START_ROW_KEY); + endRowkey = (String) range.get(HbaseConfigKeys.KEY_END_ROW_KEY); + isBinaryRowkey = (Boolean) range.get(HbaseConfigKeys.KEY_IS_BINARY_ROW_KEY); + } + + encoding = readerConfig.getParameter().getStringVal(HbaseConfigKeys.KEY_ENCODING); + scanCacheSize = readerConfig.getParameter().getIntVal(HbaseConfigKeys.KEY_SCAN_CACHE_SIZE, HbaseConfigConstants.DEFAULT_SCAN_CACHE_SIZE); + + List columns = readerConfig.getParameter().getColumn(); + if(columns != null && columns.size() > 0) { + columnName = new ArrayList<>(); + columnType = new ArrayList<>(); + columnValue = new ArrayList<>(); + columnFormat = new ArrayList<>(); + for(int i = 0; i < columns.size(); ++i) { + Map sm = (Map) columns.get(i); + columnName.add((String) sm.get("name")); + columnType.add((String) sm.get("type")); + columnValue.add((String) sm.get("value")); + columnFormat.add((String) sm.get("format")); + } + + LOG.info("init column finished"); + } else{ + throw new IllegalArgumentException("column argument error"); + } + } + + @Override + public DataStream readData() { + HbaseInputFormatBuilder builder = new HbaseInputFormatBuilder(); + builder.setDataTransferConfig(dataTransferConfig); + builder.setColumnFormats(columnFormat); + 
builder.setColumnNames(columnName);
+        builder.setColumnTypes(columnType);
+        builder.setColumnValues(columnValue);
+        builder.setEncoding(encoding);
+        builder.setEndRowkey(endRowkey);
+        builder.setHbaseConfig(hbaseConfig);
+        builder.setStartRowkey(startRowkey);
+        builder.setIsBinaryRowkey(isBinaryRowkey);
+        builder.setTableName(tableName);
+        builder.setBytes(bytes);
+        builder.setMonitorUrls(monitorUrls);
+        builder.setScanCacheSize(scanCacheSize);
+        builder.setTestConfig(testConfig);
+        builder.setLogConfig(logConfig);
+
+        return createInput(builder.finish());
+    }
+
+}
diff --git a/flinkx-hbase2/flinkx-hbase-reader2/src/main/java/com/dtstack/flinkx/hbase2/reader/HbaseInputFormat.java b/flinkx-hbase2/flinkx-hbase-reader2/src/main/java/com/dtstack/flinkx/hbase2/reader/HbaseInputFormat.java
new file mode 100644
index 0000000000..41efc7623f
--- /dev/null
+++ b/flinkx-hbase2/flinkx-hbase-reader2/src/main/java/com/dtstack/flinkx/hbase2/reader/HbaseInputFormat.java
@@ -0,0 +1,365 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
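
The isBinaryRowkey flag that Hbase2Reader wires into the builder above controls how the configured range boundaries become bytes: Bytes.toBytesBinary interprets \xNN escape sequences, while Bytes.toBytes encodes the literal characters. A standalone illustration using only the HBase client API:

```java
import org.apache.hadoop.hbase.util.Bytes;

public class RowkeyEncodingSketch {
    public static void main(String[] args) {
        String key = "row\\x00\\x01";

        // isBinaryRowkey = false: the escape sequences stay literal characters.
        byte[] plain = Bytes.toBytes(key);        // 11 bytes: r,o,w,\,x,0,0,\,x,0,1
        // isBinaryRowkey = true: \x00 and \x01 collapse to single bytes.
        byte[] binary = Bytes.toBytesBinary(key); // 5 bytes: r,o,w,0x00,0x01

        System.out.println(plain.length + " vs " + binary.length); // prints: 11 vs 5
    }
}
```
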
+ */
+
+package com.dtstack.flinkx.hbase2.reader;
+
+import com.dtstack.flinkx.hbase2.HbaseHelper;
+import com.dtstack.flinkx.inputformat.BaseRichInputFormat;
+import com.google.common.collect.Maps;
+import org.apache.commons.lang.ArrayUtils;
+import org.apache.commons.lang.StringUtils;
+import org.apache.commons.lang3.math.NumberUtils;
+import org.apache.commons.lang3.time.DateUtils;
+import org.apache.flink.core.io.InputSplit;
+import org.apache.flink.types.Row;
+import org.apache.hadoop.hbase.HConstants;
+import org.apache.hadoop.hbase.TableName;
+import org.apache.hadoop.hbase.client.*;
+import org.apache.hadoop.hbase.util.Bytes;
+import org.apache.hadoop.hbase.util.Pair;
+import org.apache.hadoop.security.UserGroupInformation;
+
+import java.io.IOException;
+import java.math.BigDecimal;
+import java.nio.charset.StandardCharsets;
+import java.security.PrivilegedAction;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Locale;
+import java.util.Map;
+
+
+/**
+ * The InputFormat implementation used by HbaseReader
+ *
+ * Company: cmss
+ * @author wangyulei_yewu@cmss.chinamobile.com
+ */
+public class HbaseInputFormat extends BaseRichInputFormat {
+
+    public static final String KEY_ROW_KEY = "rowkey";
+
+    protected Map<String, Object> hbaseConfig;
+    protected String tableName;
+    protected String startRowkey;
+    protected String endRowkey;
+    protected List<String> columnNames;
+    protected List<String> columnValues;
+    protected List<String> columnFormats;
+    protected List<String> columnTypes;
+    protected boolean isBinaryRowkey;
+    protected String encoding;
+    /**
+     * Number of rows fetched per client RPC (scan caching)
+     */
+    protected int scanCacheSize;
+    private transient Connection connection;
+    private transient Scan scan;
+    private transient Table table;
+    private transient ResultScanner resultScanner;
+    private transient Result next;
+    private transient Map<String, byte[][]> nameMaps;
+
+    private boolean openKerberos = false;
+
+    @Override
+    public void openInputFormat() throws IOException {
+        super.openInputFormat();
+
+        LOG.info("HbaseInputFormat openInputFormat start");
+        nameMaps = Maps.newConcurrentMap();
+
+        connection = HbaseHelper.getHbaseConnection(hbaseConfig);
+
+        LOG.info("HbaseInputFormat openInputFormat end");
+    }
+
+    @Override
+    public InputSplit[] createInputSplitsInternal(int minNumSplits) throws IOException {
+        try (Connection connection = HbaseHelper.getHbaseConnection(hbaseConfig)) {
+            if(HbaseHelper.openKerberos(hbaseConfig)) {
+                UserGroupInformation ugi = HbaseHelper.getUgi(hbaseConfig);
+                return ugi.doAs(new PrivilegedAction<HbaseInputSplit[]>() {
+                    @Override
+                    public HbaseInputSplit[] run() {
+                        return split(connection, tableName, startRowkey, endRowkey, isBinaryRowkey);
+                    }
+                });
+            } else {
+                return split(connection, tableName, startRowkey, endRowkey, isBinaryRowkey);
+            }
+        }
+    }
+
+    public HbaseInputSplit[] split(Connection hConn, String tableName, String startKey, String endKey, boolean isBinaryRowkey) {
+        byte[] startRowkeyByte = HbaseHelper.convertRowkey(startKey, isBinaryRowkey);
+        byte[] endRowkeyByte = HbaseHelper.convertRowkey(endKey, isBinaryRowkey);
+
+        /* If both startRowkey and endRowkey are configured, make sure startRowkey <= endRowkey */
+        if (startRowkeyByte.length != 0 && endRowkeyByte.length != 0
+                && Bytes.compareTo(startRowkeyByte, endRowkeyByte) > 0) {
+            throw new IllegalArgumentException("startRowKey can't be bigger than endRowkey");
+        }
+
+        RegionLocator regionLocator = HbaseHelper.getRegionLocator(hConn, tableName);
+        List<HbaseInputSplit> resultSplits;
+        try {
+            Pair<byte[][], byte[][]> regionRanges = regionLocator.getStartEndKeys();
+            if (null ==
regionRanges) {
+                throw new RuntimeException("Failed to retrieve rowkey range");
+            }
+            resultSplits = doSplit(startRowkeyByte, endRowkeyByte, regionRanges);
+
+            LOG.info("HBaseReader split job into {} tasks.", resultSplits.size());
+            return resultSplits.toArray(new HbaseInputSplit[resultSplits.size()]);
+        } catch (Exception e) {
+            throw new RuntimeException("Failed to split hbase table", e);
+        } finally {
+            HbaseHelper.closeRegionLocator(regionLocator);
+        }
+    }
+
+    private List<HbaseInputSplit> doSplit(byte[] startRowkeyByte,
+                                          byte[] endRowkeyByte, Pair<byte[][], byte[][]> regionRanges) {
+
+        List<HbaseInputSplit> configurations = new ArrayList<>();
+
+        for (int i = 0; i < regionRanges.getFirst().length; i++) {
+
+            byte[] regionStartKey = regionRanges.getFirst()[i];
+            byte[] regionEndKey = regionRanges.getSecond()[i];
+
+            // A region with an empty end key is the last region of the table.
+            // If that last region's start key is greater than the user's endKey, skip it.
+            // Note: an empty userEndKey means "read to the end of the table", so this
+            // check must not fire in that case.
+            boolean isSkip = Bytes.compareTo(regionEndKey, HConstants.EMPTY_BYTE_ARRAY) == 0
+                    && (endRowkeyByte.length != 0 && (Bytes.compareTo(
+                    regionStartKey, endRowkeyByte) > 0));
+            if (isSkip) {
+                continue;
+            }
+
+            // If this is not the last region and the user's startKey is at or beyond
+            // this region's end key, the region is out of range and is skipped.
+            if ((Bytes.compareTo(regionEndKey, HConstants.EMPTY_BYTE_ARRAY) != 0)
+                    && (Bytes.compareTo(startRowkeyByte, regionEndKey) >= 0)) {
+                continue;
+            }
+
+            // If the user's endKey is at or before this region's start key, the region
+            // is out of range as well. Again, an empty userEndKey means "read to the
+            // end of the table" and must not trigger this check.
+            if (endRowkeyByte.length != 0
+                    && (Bytes.compareTo(endRowkeyByte, regionStartKey) <= 0)) {
+                continue;
+            }
+
+            String thisStartKey = getStartKey(startRowkeyByte, regionStartKey);
+            String thisEndKey = getEndKey(endRowkeyByte, regionEndKey);
+            HbaseInputSplit hbaseInputSplit = new HbaseInputSplit(thisStartKey, thisEndKey);
+            configurations.add(hbaseInputSplit);
+        }
+
+        return configurations;
+    }
+
+    private String getEndKey(byte[] endRowkeyByte, byte[] regionEndKey) {
+        // Already validated upstream, so endRowkeyByte can never be null here.
+        if (endRowkeyByte == null) {
+            throw new IllegalArgumentException("userEndKey should not be null!");
+        }
+
+        byte[] tempEndRowkeyByte;
+
+        if (endRowkeyByte.length == 0) {
+            tempEndRowkeyByte = regionEndKey;
+        } else if (Bytes.compareTo(regionEndKey, HConstants.EMPTY_BYTE_ARRAY) == 0) {
+            // This is the last region
+            tempEndRowkeyByte = endRowkeyByte;
+        } else {
+            if (Bytes.compareTo(endRowkeyByte, regionEndKey) > 0) {
+                tempEndRowkeyByte = regionEndKey;
+            } else {
+                tempEndRowkeyByte = endRowkeyByte;
+            }
+        }
+
+        return Bytes.toStringBinary(tempEndRowkeyByte);
+    }
+
+    private String getStartKey(byte[] startRowkeyByte, byte[] regionStartKey) {
+        // Already validated upstream, so startRowkeyByte can never be null here.
+        if (startRowkeyByte == null) {
+            throw new IllegalArgumentException(
+                    "userStartKey should not be null!");
+        }
+
+        byte[] tempStartRowkeyByte;
+
+        if (Bytes.compareTo(startRowkeyByte, regionStartKey) < 0) {
+            tempStartRowkeyByte = regionStartKey;
+        } else {
+            tempStartRowkeyByte = startRowkeyByte;
+        }
+        return Bytes.toStringBinary(tempStartRowkeyByte);
+    }
+
+    @Override
+    public void openInternal(InputSplit inputSplit) throws IOException {
+        HbaseInputSplit hbaseInputSplit = (HbaseInputSplit) inputSplit;
+        byte[] startRow = Bytes.toBytesBinary(hbaseInputSplit.getStartkey());
+        byte[] stopRow = Bytes.toBytesBinary(hbaseInputSplit.getEndKey());
+
+        if(null == connection || connection.isClosed()){
+            connection = HbaseHelper.getHbaseConnection(hbaseConfig);
+        }
+
+        openKerberos = HbaseHelper.openKerberos(hbaseConfig);
+
+        table =
connection.getTable(TableName.valueOf(tableName)); + scan = new Scan(); + scan.setStartRow(startRow); + scan.setStopRow(stopRow); + scan.setCaching(scanCacheSize); + resultScanner = table.getScanner(scan); + } + + @Override + public boolean reachedEnd() throws IOException { + next = resultScanner.next(); + return next == null; + } + + @Override + public Row nextRecordInternal(Row row) throws IOException { + row = new Row(columnTypes.size()); + + for (int i = 0; i < columnTypes.size(); ++i) { + String columnType = columnTypes.get(i); + String columnName = columnNames.get(i); + String columnFormat = columnFormats.get(i); + String columnValue = columnValues.get(i); + Object col = null; + byte[] bytes; + + try { + if (StringUtils.isNotEmpty(columnValue)) { + // 常量 + col = convertValueToAssignType(columnType, columnValue, columnFormat); + } else { + if (KEY_ROW_KEY.equals(columnName)) { + bytes = next.getRow(); + } else { + byte [][] arr = nameMaps.get(columnName); + if(arr == null){ + arr = new byte[2][]; + String[] arr1 = columnName.split(":"); + arr[0] = arr1[0].trim().getBytes(StandardCharsets.UTF_8); + arr[1] = arr1[1].trim().getBytes(StandardCharsets.UTF_8); + nameMaps.put(columnName,arr); + } + bytes = next.getValue(arr[0], arr[1]); + } + col = convertBytesToAssignType(columnType, bytes, columnFormat); + } + row.setField(i, col); + } catch(Exception e) { + throw new IOException("Couldn't read data:",e); + } + } + return row; + } + + @Override + public void closeInternal() throws IOException { + HbaseHelper.closeConnection(connection); + } + + public Object convertValueToAssignType(String columnType, String constantValue,String dateformat) throws Exception { + Object column = null; + if(org.apache.commons.lang3.StringUtils.isEmpty(constantValue)) { + return column; + } + + switch (columnType.toUpperCase()) { + case "BOOLEAN": + column = Boolean.valueOf(constantValue); + break; + case "SHORT": + case "INT": + case "LONG": + column = NumberUtils.createBigDecimal(constantValue).toBigInteger(); + break; + case "FLOAT": + case "DOUBLE": + column = new BigDecimal(constantValue); + break; + case "STRING": + column = constantValue; + break; + case "DATE": + column = DateUtils.parseDate(constantValue, new String[]{dateformat}); + break; + default: + throw new IllegalArgumentException("Unsupported columnType: " + columnType); + } + + return column; + } + + public Object convertBytesToAssignType(String columnType, byte[] byteArray,String dateformat) throws Exception { + Object column = null; + if(ArrayUtils.isEmpty(byteArray)) { + return null; + } + String bytesToString = new String(byteArray, encoding); + switch (columnType.toUpperCase(Locale.ENGLISH)) { + case "BOOLEAN": + column = Boolean.valueOf(bytesToString); + break; + case "SHORT": + column = Short.valueOf(bytesToString); + break; + case "INT": + column = Integer.valueOf(bytesToString); + break; + case "LONG": + column = Long.valueOf(bytesToString); + break; + case "FLOAT": + column = Float.valueOf(bytesToString); + break; + case "DOUBLE": + column = Double.valueOf(bytesToString); + break; + case "STRING": + column = bytesToString; + break; + case "BINARY_STRING": + column = Bytes.toStringBinary(byteArray); + break; + case "DATE": + String dateValue = Bytes.toStringBinary(byteArray); + column = DateUtils.parseDate(dateValue, new String[]{dateformat}); + break; + default: + throw new IllegalArgumentException("Unsupported column type: " + columnType); + } + return column; + } + +} diff --git 
a/flinkx-hbase2/flinkx-hbase-reader2/src/main/java/com/dtstack/flinkx/hbase2/reader/HbaseInputFormatBuilder.java b/flinkx-hbase2/flinkx-hbase-reader2/src/main/java/com/dtstack/flinkx/hbase2/reader/HbaseInputFormatBuilder.java new file mode 100644 index 0000000000..14a99e40ec --- /dev/null +++ b/flinkx-hbase2/flinkx-hbase-reader2/src/main/java/com/dtstack/flinkx/hbase2/reader/HbaseInputFormatBuilder.java @@ -0,0 +1,106 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package com.dtstack.flinkx.hbase2.reader; + +import com.dtstack.flinkx.hbase2.HbaseConfigConstants; +import com.dtstack.flinkx.inputformat.BaseRichInputFormatBuilder; +import org.apache.commons.lang.StringUtils; +import org.apache.flink.util.Preconditions; + +import java.util.List; +import java.util.Map; + +/** + * The builder of HbaseInputFormat + * + * Company: cmss + * @author wangyulei_yewu@cmss.chinamobile.com + */ +public class HbaseInputFormatBuilder extends BaseRichInputFormatBuilder { + + private HbaseInputFormat format; + + public HbaseInputFormatBuilder() { + super.format = format = new HbaseInputFormat(); + } + + public void setHbaseConfig(Map hbaseConfig) { + format.hbaseConfig = hbaseConfig; + } + + public void setTableName(String tableName) { + format.tableName = tableName; + } + + public void setStartRowkey(String startRowkey) { + format.startRowkey = startRowkey; + } + + public void setEndRowkey(String endRowkey) { + format.endRowkey = endRowkey; + } + + public void setColumnNames(List columnNames) { + format.columnNames = columnNames; + } + + public void setColumnValues(List columnValues) { + format.columnValues = columnValues; + } + + public void setColumnTypes(List columnTypes) { + format.columnTypes = columnTypes; + } + + public void setIsBinaryRowkey(boolean isBinaryRowkey) { + format.isBinaryRowkey = isBinaryRowkey; + } + + public void setEncoding(String encoding) { + format.encoding = StringUtils.isEmpty(encoding) ? 
"utf-8" : encoding; + } + + public void setColumnFormats(List columnFormats) { + format.columnFormats = columnFormats; + } + + public void setScanCacheSize(int scanCacheSize) { + format.scanCacheSize = scanCacheSize; + } + + @Override + protected void checkFormat() { + Preconditions.checkNotNull(format.columnTypes); + Preconditions.checkNotNull(format.columnFormats); + Preconditions.checkNotNull(format.columnValues); + Preconditions.checkNotNull(format.columnNames); + + Preconditions.checkArgument(format.scanCacheSize <= HbaseConfigConstants.MAX_SCAN_CACHE_SIZE && format.scanCacheSize >= HbaseConfigConstants.MIN_SCAN_CACHE_SIZE, + "scanCacheSize should be between " + HbaseConfigConstants.MIN_SCAN_CACHE_SIZE + " and " + HbaseConfigConstants.MAX_SCAN_CACHE_SIZE); + + for(int i = 0; i < format.columnTypes.size(); ++i) { + Preconditions.checkArgument(StringUtils.isNotEmpty(format.columnTypes.get(i))); + Preconditions.checkArgument(StringUtils.isNotEmpty(format.columnNames.get(i)) + || StringUtils.isNotEmpty(format.columnTypes.get(i)) ); + } + + if (format.getRestoreConfig() != null && format.getRestoreConfig().isRestore()){ + throw new UnsupportedOperationException("This plugin not support restore from failed state"); + } + } +} diff --git a/flinkx-metadata-hbase/flinkx-metadata-hbase-reader/src/main/java/com/dtstack/flinkx/metadatahbase/inputformat/MetadatahbaseInputFormatBuilder.java b/flinkx-hbase2/flinkx-hbase-reader2/src/main/java/com/dtstack/flinkx/hbase2/reader/HbaseInputSplit.java similarity index 55% rename from flinkx-metadata-hbase/flinkx-metadata-hbase-reader/src/main/java/com/dtstack/flinkx/metadatahbase/inputformat/MetadatahbaseInputFormatBuilder.java rename to flinkx-hbase2/flinkx-hbase-reader2/src/main/java/com/dtstack/flinkx/hbase2/reader/HbaseInputSplit.java index 690857d4fa..56343797cc 100644 --- a/flinkx-metadata-hbase/flinkx-metadata-hbase-reader/src/main/java/com/dtstack/flinkx/metadatahbase/inputformat/MetadatahbaseInputFormatBuilder.java +++ b/flinkx-hbase2/flinkx-hbase-reader2/src/main/java/com/dtstack/flinkx/hbase2/reader/HbaseInputSplit.java @@ -16,33 +16,36 @@ * limitations under the License. 
*/ -package com.dtstack.flinkx.metadatahbase.inputformat; +package com.dtstack.flinkx.hbase2.reader; -import com.dtstack.flinkx.metadata.inputformat.MetadataInputFormatBuilder; +import org.apache.flink.core.io.InputSplit; -import java.util.Map; - -/** 帮助配置hbase读取参数 - * @author kunni@dtstack.com +/** + * The Class describing each InputSplit of HBase + * + * Company: cmss + * @author wangyulei_yewu@cmss.chinamobile.com */ -public class MetadatahbaseInputFormatBuilder extends MetadataInputFormatBuilder { +public class HbaseInputSplit implements InputSplit { - protected MetadatahbaseInputFormat format; + private String startkey; + private String endKey; - public MetadatahbaseInputFormatBuilder(MetadatahbaseInputFormat format) { - super(format); - this.format = format; + public HbaseInputSplit(String startKey, String endKey) { + this.startkey = startKey; + this.endKey = endKey; } - public void setHadoopConfig(Map hadoopConfig){ - format.setHadoopConfig(hadoopConfig); + public String getStartkey() { + return startkey; } - public void setPath(String path){ - format.setPath(path); + public String getEndKey() { + return endKey; } @Override - protected void checkFormat() { + public int getSplitNumber() { + return 0; } } diff --git a/flinkx-metadata-tidb/flinkx-metadata-tidb-reader/pom.xml b/flinkx-hbase2/flinkx-hbase-writer2/pom.xml similarity index 80% rename from flinkx-metadata-tidb/flinkx-metadata-tidb-reader/pom.xml rename to flinkx-hbase2/flinkx-hbase-writer2/pom.xml index 906bdd0b0d..99ab947403 100644 --- a/flinkx-metadata-tidb/flinkx-metadata-tidb-reader/pom.xml +++ b/flinkx-hbase2/flinkx-hbase-writer2/pom.xml @@ -3,24 +3,23 @@ xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"> - flinkx-metadata-tidb + flinkx-hbase2 com.dtstack.flinkx 1.6 4.0.0 - flinkx-metadata-tidb-reader - + flinkx-hbase-writer2 - com.dtstack.flinkx - flinkx-metadata-reader - 1.6 + com.google.guava + guava + 12.0.1 - mysql - mysql-connector-java - 5.1.46 + com.dtstack.flinkx + flinkx-hbase-core2 + 1.6 @@ -40,6 +39,10 @@ false + + com.google.code.gson:* + com.data-artisans:* + org.scala-lang:* org.slf4j:slf4j-api log4j:log4j ch.qos.logback:* @@ -56,17 +59,13 @@ - - io.netty - shade.metadatatidbreader.io.netty - com.google.common - shade.core.com.google.common + shade.hbase.com.google.common com.google.thirdparty - shade.core.com.google.thirdparty + shade.hbase.com.google.thirdparty @@ -87,14 +86,14 @@ - + - + diff --git a/flinkx-hbase2/flinkx-hbase-writer2/src/main/java/com/dtstack/flinkx/hbase2/writer/Hbase2Writer.java b/flinkx-hbase2/flinkx-hbase-writer2/src/main/java/com/dtstack/flinkx/hbase2/writer/Hbase2Writer.java new file mode 100644 index 0000000000..4e2974a6c8 --- /dev/null +++ b/flinkx-hbase2/flinkx-hbase-writer2/src/main/java/com/dtstack/flinkx/hbase2/writer/Hbase2Writer.java @@ -0,0 +1,151 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. 
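
HbaseInputSplit carries its boundaries as strings rather than byte arrays, presumably so the split serializes cleanly when Flink ships it to the readers: split() encodes with Bytes.toStringBinary and openInternal() decodes with Bytes.toBytesBinary. The round trip is lossless:

```java
import org.apache.hadoop.hbase.util.Bytes;

public class SplitKeyRoundTripSketch {
    public static void main(String[] args) {
        byte[] regionStart = new byte[]{'r', 0x00, 0x7f};

        // What split() stores in the HbaseInputSplit:
        String encoded = Bytes.toStringBinary(regionStart);  // "r\x00\x7F"
        // What openInternal() reconstructs for the Scan:
        byte[] decoded = Bytes.toBytesBinary(encoded);

        System.out.println(Bytes.equals(regionStart, decoded)); // prints: true
    }
}
```
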
You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+
+package com.dtstack.flinkx.hbase2.writer;
+
+import com.dtstack.flinkx.config.DataTransferConfig;
+import com.dtstack.flinkx.config.WriterConfig;
+import com.dtstack.flinkx.util.ValueUtil;
+import com.dtstack.flinkx.writer.BaseDataWriter;
+import org.apache.commons.collections.CollectionUtils;
+import org.apache.commons.lang.StringUtils;
+import org.apache.flink.streaming.api.datastream.DataStream;
+import org.apache.flink.streaming.api.datastream.DataStreamSink;
+import org.apache.flink.types.Row;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+
+import static com.dtstack.flinkx.hbase2.HbaseConfigConstants.DEFAULT_WAL_FLAG;
+import static com.dtstack.flinkx.hbase2.HbaseConfigConstants.DEFAULT_WRITE_BUFFER_SIZE;
+import static com.dtstack.flinkx.hbase2.HbaseConfigKeys.*;
+
+/**
+ * The Writer plugin of HBase
+ *
+ * Company: www.dtstack.com
+ * @author huyifan.zju@163.com
+ */
+public class Hbase2Writer extends BaseDataWriter {
+
+    private String tableName;
+    private Map<String, Object> hbaseConfig;
+    private String encoding;
+    private String nullMode;
+    private Boolean walFlag;
+    private long writeBufferSize;
+
+    private List<String> columnTypes;
+    private List<String> columnNames;
+    private String rowkeyExpress;
+
+    private Integer versionColumnIndex;
+    private String versionColumnValue;
+
+    public Hbase2Writer(DataTransferConfig config) {
+        super(config);
+        WriterConfig writerConfig = config.getJob().getContent().get(0).getWriter();
+
+        tableName = writerConfig.getParameter().getStringVal(KEY_TABLE);
+        hbaseConfig = (Map) writerConfig.getParameter().getVal(KEY_HBASE_CONFIG);
+        encoding = writerConfig.getParameter().getStringVal(KEY_ENCODING);
+        nullMode = writerConfig.getParameter().getStringVal(KEY_NULL_MODE);
+        walFlag = writerConfig.getParameter().getBooleanVal(KEY_WAL_FLAG, DEFAULT_WAL_FLAG);
+        writeBufferSize = writerConfig.getParameter().getLongVal(KEY_WRITE_BUFFER_SIZE, DEFAULT_WRITE_BUFFER_SIZE);
+
+        List columns = writerConfig.getParameter().getColumn();
+        if(CollectionUtils.isNotEmpty(columns)) {
+            columnTypes = new ArrayList<>();
+            columnNames = new ArrayList<>();
+            for(int i = 0; i < columns.size(); ++i) {
+                Map sm = (Map) columns.get(i);
+                columnNames.add((String) sm.get(KEY_COLUMN_NAME));
+                columnTypes.add((String) sm.get(KEY_COLUMN_TYPE));
+            }
+        }
+
+        // getVal (not getStringVal): the old list-style config must survive as a List
+        // so that buildRowKeyExpress can detect it below.
+        Object rowKeyInfo = writerConfig.getParameter().getVal(KEY_ROW_KEY_COLUMN);
+        rowkeyExpress = buildRowKeyExpress(rowKeyInfo);
+
+        Map versionColumn = (Map) writerConfig.getParameter().getVal(KEY_VERSION_COLUMN);
+        if(versionColumn != null) {
+            versionColumnIndex = (Integer) versionColumn.get(KEY_VERSION_COLUMN_INDEX);
+            versionColumnValue = (String) versionColumn.get(KEY_VERSION_COLUMN_VALUE);
+        }
+    }
+
+    /**
+     * Compatible with old formats
+     */
+    private String buildRowKeyExpress(Object rowKeyInfo){
+        if (rowKeyInfo == null){
+            return null;
+        }
+
+        if(rowKeyInfo instanceof String){
+            return rowKeyInfo.toString();
+        }
+
+        if(!(rowKeyInfo instanceof List)){
+            return null;
+        }
+
+        StringBuilder expressBuilder = new StringBuilder();
+
+        for (Map item : ((List<Map>) rowKeyInfo)) {
+            Integer index =
ValueUtil.getInt(item.get(KEY_ROW_KEY_COLUMN_INDEX)); + if (index != null && index != -1) { + expressBuilder.append(String.format("$(%s)", columnNames.get(index))); + continue; + } + + String value = (String) item.get(KEY_ROW_KEY_COLUMN_VALUE); + if (StringUtils.isNotEmpty(value)) { + expressBuilder.append(value); + } + } + + return expressBuilder.toString(); + } + + @Override + public DataStreamSink writeData(DataStream dataSet) { + HbaseOutputFormatBuilder builder = new HbaseOutputFormatBuilder(); + builder.setHbaseConfig(hbaseConfig); + builder.setTableName(tableName); + builder.setEncoding(encoding); + builder.setNullMode(nullMode); + builder.setWalFlag(walFlag); + builder.setWriteBufferSize(writeBufferSize); + builder.setColumnNames(columnNames); + builder.setColumnTypes(columnTypes); + builder.setRowkeyExpress(rowkeyExpress); + builder.setVersionColumnIndex(versionColumnIndex); + builder.setVersionColumnValues(versionColumnValue); + builder.setMonitorUrls(monitorUrls); + builder.setErrorRatio(errorRatio); + builder.setErrors(errors); + builder.setDirtyPath(dirtyPath); + builder.setDirtyHadoopConfig(dirtyHadoopConfig); + builder.setSrcCols(srcCols); + + return createOutput(dataSet, builder.finish()); + } +} diff --git a/flinkx-hbase2/flinkx-hbase-writer2/src/main/java/com/dtstack/flinkx/hbase2/writer/HbaseOutputFormat.java b/flinkx-hbase2/flinkx-hbase-writer2/src/main/java/com/dtstack/flinkx/hbase2/writer/HbaseOutputFormat.java new file mode 100644 index 0000000000..2852f79a21 --- /dev/null +++ b/flinkx-hbase2/flinkx-hbase-writer2/src/main/java/com/dtstack/flinkx/hbase2/writer/HbaseOutputFormat.java @@ -0,0 +1,556 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
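
To make the old-format compatibility above concrete: a list-style rowkeyColumn config is flattened into the $(column) expression syntax that the function parser understands. A self-contained walkthrough of that mapping (column names invented; this re-implements the loop in buildRowKeyExpress, which is private):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RowKeyExpressSketch {
    public static void main(String[] args) {
        List<String> columnNames = Arrays.asList("id", "name");

        // Old list-style config: [{"index": 0}, {"index": -1, "value": "_"}, {"index": 1}]
        Map<String, Object> ref1 = new HashMap<>();
        ref1.put("index", 0);
        Map<String, Object> sep = new HashMap<>();
        sep.put("index", -1);
        sep.put("value", "_");
        Map<String, Object> ref2 = new HashMap<>();
        ref2.put("index", 1);

        StringBuilder express = new StringBuilder();
        for (Map<String, Object> item : Arrays.asList(ref1, sep, ref2)) {
            Integer index = (Integer) item.get("index");
            if (index != null && index != -1) {
                // index >= 0 references a column: emit "$(columnName)"
                express.append(String.format("$(%s)", columnNames.get(index)));
            } else if (item.get("value") != null) {
                // otherwise emit the literal value
                express.append(item.get("value"));
            }
        }
        System.out.println(express); // prints: $(id)_$(name)
    }
}
```
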
+ */ + +package com.dtstack.flinkx.hbase2.writer; + +import com.dtstack.flinkx.constants.ConstantValue; +import com.dtstack.flinkx.enums.ColumnType; +import com.dtstack.flinkx.exception.WriteRecordException; +import com.dtstack.flinkx.hbase2.HbaseHelper; +import com.dtstack.flinkx.hbase2.writer.function.FunctionParser; +import com.dtstack.flinkx.hbase2.writer.function.FunctionTree; +import com.dtstack.flinkx.outputformat.BaseRichOutputFormat; +import com.dtstack.flinkx.util.DateUtil; +import com.google.common.collect.Lists; +import com.google.common.collect.Maps; +import org.apache.commons.lang.StringUtils; +import org.apache.commons.lang3.Validate; +import org.apache.flink.configuration.Configuration; +import org.apache.flink.types.Row; +import org.apache.hadoop.hbase.HConstants; +import org.apache.hadoop.hbase.TableName; +import org.apache.hadoop.hbase.client.*; +import org.apache.hadoop.hbase.util.Bytes; +import org.apache.hadoop.security.UserGroupInformation; + +import java.io.IOException; +import java.nio.charset.Charset; +import java.nio.charset.StandardCharsets; +import java.security.PrivilegedAction; +import java.sql.Timestamp; +import java.text.ParseException; +import java.text.SimpleDateFormat; +import java.util.Date; +import java.util.HashMap; +import java.util.List; +import java.util.Map; + +/** + * The Hbase Implementation of OutputFormat + * + * Company: www.dtstack.com + * @author huyifan.zju@163.com + */ +public class HbaseOutputFormat extends BaseRichOutputFormat { + + protected Map hbaseConfig; + + protected String tableName; + + protected String encoding; + + protected String nullMode; + + protected boolean walFlag; + + protected long writeBufferSize; + + protected List columnTypes; + + protected List columnNames; + + protected String rowkeyExpress; + + protected Integer versionColumnIndex; + + protected String versionColumnValue; + + private transient Connection connection; + + private transient BufferedMutator bufferedMutator; + + private transient FunctionTree functionTree; + + protected List rowKeyColumns = Lists.newArrayList(); + protected List rowKeyColumnIndex = Lists.newArrayList(); + + private transient Map nameMaps; + + private transient Map nameByteMaps ; + + private transient ThreadLocal timeSecondFormatThreadLocal; + + private transient ThreadLocal timeMillisecondFormatThreadLocal; + + private boolean openKerberos = false; + + @Override + public void configure(Configuration parameters) { + } + + @Override + public void openInternal(int taskNumber, int numTasks) throws IOException { + openKerberos = HbaseHelper.openKerberos(hbaseConfig); + if (openKerberos) { + sleepRandomTime(); + + UserGroupInformation ugi = HbaseHelper.getUgi(hbaseConfig); + ugi.doAs(new PrivilegedAction() { + @Override + public Object run() { + openConnection(); + return null; + } + }); + } else { + openConnection(); + } + } + + private void sleepRandomTime() { + try { + Thread.sleep(5000L + (long)(10000 * Math.random())); + } catch (Exception exception) { + LOG.warn("", exception); + } + } + + public void openConnection() { + LOG.info("HbaseOutputFormat configure start"); + nameMaps = Maps.newConcurrentMap(); + nameByteMaps = Maps.newConcurrentMap(); + timeSecondFormatThreadLocal = new ThreadLocal(); + timeMillisecondFormatThreadLocal = new ThreadLocal(); + Validate.isTrue(hbaseConfig != null && hbaseConfig.size() !=0, "hbaseConfig不能为空Map结构!"); + + try { + org.apache.hadoop.conf.Configuration hConfiguration = HbaseHelper.getConfig(hbaseConfig); + connection = 
ConnectionFactory.createConnection(hConfiguration); + + /** + * 写缓存 + */ + bufferedMutator = connection.getBufferedMutator( + new BufferedMutatorParams(TableName.valueOf(tableName)) + .pool(HTable.getDefaultExecutor(hConfiguration)) + .writeBufferSize(writeBufferSize)); + } catch (Exception e) { + HbaseHelper.closeBufferedMutator(bufferedMutator); + HbaseHelper.closeConnection(connection); + throw new IllegalArgumentException(e); + } + + functionTree = FunctionParser.parse(rowkeyExpress); + rowKeyColumns = FunctionParser.parseRowKeyCol(rowkeyExpress); + for (String rowKeyColumn : rowKeyColumns) { + int index = columnNames.indexOf(rowKeyColumn); + if(index == -1){ + throw new RuntimeException("Can not get row key column from columns:" + rowKeyColumn); + } + rowKeyColumnIndex.add(index); + } + + LOG.info("HbaseOutputFormat configure end"); + } + + @Override + public void writeSingleRecordInternal(Row record) throws WriteRecordException { + int i = 0; + try { + byte[] rowkey = getRowkey(record); + Put put; + if(versionColumnIndex == null) { + put = new Put(rowkey); + if(!walFlag) { + put.setDurability(Durability.SKIP_WAL); + } + } else { + long timestamp = getVersion(record); + put = new Put(rowkey,timestamp); + } + + for (; i < record.getArity(); ++i) { + if(rowKeyColumnIndex.contains(i)){ + continue; + } + + String type = columnTypes.get(i); + String name = columnNames.get(i); + String[] cfAndQualifier = nameMaps.get(name); + byte[][] cfAndQualifierBytes = nameByteMaps.get(name); + if(cfAndQualifier == null || cfAndQualifierBytes == null){ + cfAndQualifier = name.split(":"); + if(cfAndQualifier.length == 2 + && StringUtils.isNotBlank(cfAndQualifier[0]) + && StringUtils.isNotBlank(cfAndQualifier[1])){ + nameMaps.put(name,cfAndQualifier); + cfAndQualifierBytes = new byte[2][]; + cfAndQualifierBytes[0] = Bytes.toBytes(cfAndQualifier[0]); + cfAndQualifierBytes[1] = Bytes.toBytes(cfAndQualifier[1]); + nameByteMaps.put(name,cfAndQualifierBytes); + } else { + throw new IllegalArgumentException("Hbasewriter 中,column 的列配置格式应该是:列族:列名. 
您配置的列错误:" + name);
+                }
+            }
+
+            ColumnType columnType = ColumnType.getType(type);
+            byte[] columnBytes = getColumnByte(columnType, record.getField(i));
+            // A null columnBytes means "skip this column" (see the nullMode handling in getColumnByte)
+            if(null != columnBytes){
+                put.addColumn(
+                        cfAndQualifierBytes[0],
+                        cfAndQualifierBytes[1],
+                        columnBytes);
+            }
+        }
+
+        bufferedMutator.mutate(put);
+        } catch(Exception ex) {
+            if(i < record.getArity()) {
+                throw new WriteRecordException(recordConvertDetailErrorMessage(i, record), ex, i, record);
+            }
+            throw new WriteRecordException(ex.getMessage(), ex);
+        }
+    }
+
+    private SimpleDateFormat getSimpleDateFormat(String sign){
+        SimpleDateFormat format;
+        if(ConstantValue.TIME_SECOND_SUFFIX.equals(sign)){
+            format = timeSecondFormatThreadLocal.get();
+            if(format == null){
+                format = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
+                timeSecondFormatThreadLocal.set(format);
+            }
+        } else {
+            format = timeMillisecondFormatThreadLocal.get();
+            if(format == null){
+                format = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss SSS");
+                timeMillisecondFormatThreadLocal.set(format);
+            }
+        }
+
+        return format;
+    }
+
+    @Override
+    protected String recordConvertDetailErrorMessage(int pos, Row row) {
+        return "\nHbaseOutputFormat [" + jobName + "] writeRecord error: when converting field[" + columnNames.get(pos) + "] in Row(" + row + ")";
+    }
+
+    @Override
+    protected void writeMultipleRecordsInternal() throws Exception {
+        notSupportBatchWrite("HbaseWriter");
+    }
+
+    private byte[] getRowkey(Row record) throws Exception{
+        Map<String, Object> nameValueMap = new HashMap<>((rowKeyColumnIndex.size()<<2)/3);
+        for (Integer keyColumnIndex : rowKeyColumnIndex) {
+            nameValueMap.put(columnNames.get(keyColumnIndex), record.getField(keyColumnIndex));
+        }
+
+        String rowKeyStr = functionTree.evaluate(nameValueMap);
+        return rowKeyStr.getBytes(StandardCharsets.UTF_8);
+    }
+
+    public long getVersion(Row record){
+        // Keep the Integer as-is (no unboxing), otherwise the null check below can never fire.
+        Integer index = versionColumnIndex;
+        long timestamp;
+        if(index == null){
+            // A fixed timestamp is used as the version
+            timestamp = Long.valueOf(versionColumnValue);
+            if(timestamp < 0){
+                throw new IllegalArgumentException("Illegal timestamp to construct versionColumn: " + timestamp);
+            }
+        }else{
+            // A column supplies the version: Long/Double values are used directly, other types
+            // are parsed with yyyy-MM-dd HH:mm:ss or yyyy-MM-dd HH:mm:ss SSS
+            if(index >= record.getArity() || index < 0){
+                throw new IllegalArgumentException("version column index out of range: " + index);
+            }
+            if(record.getField(index) == null){
+                throw new IllegalArgumentException("null version column!");
+            }
+            SimpleDateFormat dfSeconds = getSimpleDateFormat(ConstantValue.TIME_SECOND_SUFFIX);
+            SimpleDateFormat dfMs = getSimpleDateFormat(ConstantValue.TIME_MILLISECOND_SUFFIX);
+            Object column = record.getField(index);
+            if(column instanceof Long){
+                Long longValue = (Long) column;
+                timestamp = longValue;
+            } else if (column instanceof Double){
+                Double doubleValue = (Double) column;
+                timestamp = doubleValue.longValue();
+            } else if (column instanceof String){
+                Date date;
+                try{
+                    date = dfMs.parse((String) column);
+                }catch (ParseException e){
+                    try {
+                        date = dfSeconds.parse((String) column);
+                    } catch (ParseException e1) {
+                        LOG.info(String.format("Column [%s] is configured as the hbase version column, but parsing it with yyyy-MM-dd HH:mm:ss and yyyy-MM-dd HH:mm:ss SSS both failed, please check and fix the data", index));
+                        throw new RuntimeException(e1);
+                    }
+                }
+                timestamp = date.getTime();
+            } else if (column instanceof Date) {
+                timestamp = ((Date) column).getTime();
+            } else {
+                throw new RuntimeException("Incompatible version column type: " + column.getClass());
+            }
+        }
+        return timestamp;
+    }
+
+    public byte[]
getValueByte(ColumnType columnType, String value){ + byte[] bytes; + if(value != null){ + switch (columnType) { + case INT: + bytes = Bytes.toBytes(Integer.parseInt(value)); + break; + case LONG: + bytes = Bytes.toBytes(Long.parseLong(value)); + break; + case DOUBLE: + bytes = Bytes.toBytes(Double.parseDouble(value)); + break; + case FLOAT: + bytes = Bytes.toBytes(Float.parseFloat(value)); + break; + case SHORT: + bytes = Bytes.toBytes(Short.parseShort(value)); + break; + case BOOLEAN: + bytes = Bytes.toBytes(Boolean.parseBoolean(value)); + break; + case STRING: + bytes = value.getBytes(Charset.forName(encoding)); + break; + default: + throw new IllegalArgumentException("Unsupported column type: " + columnType); + } + }else{ + bytes = HConstants.EMPTY_BYTE_ARRAY; + } + return bytes; + } + + public byte[] getColumnByte(ColumnType columnType, Object column){ + byte[] bytes; + if(column != null){ + switch (columnType) { + case INT: + bytes = intToBytes(column); + break; + case LONG: + bytes = longToBytes(column); + break; + case DOUBLE: + bytes = doubleToBytes(column); + break; + case FLOAT: + bytes = floatToBytes(column); + break; + case SHORT: + bytes = shortToBytes(column); + break; + case BOOLEAN: + bytes = boolToBytes(column); + break; + case STRING: + String stringValue; + if (column instanceof Timestamp){ + SimpleDateFormat fm = DateUtil.getDateTimeFormatter(); + stringValue = fm.format(column); + }else { + stringValue = String.valueOf(column); + } + bytes = this.getValueByte(columnType, stringValue); + break; + default: + throw new IllegalArgumentException("Unsupported column type: " + columnType); + } + } else { + switch (nullMode.toUpperCase()){ + case "SKIP": + bytes = null; + break; + case "EMPTY": + bytes = HConstants.EMPTY_BYTE_ARRAY; + break; + default: + throw new IllegalArgumentException("Unsupported null mode: " + nullMode); + } + } + return bytes; + } + + private byte[] intToBytes(Object column) { + Integer intValue = null; + if(column instanceof Integer) { + intValue = (Integer) column; + } else if(column instanceof Long) { + intValue = ((Long) column).intValue(); + } else if(column instanceof Double) { + intValue = ((Double) column).intValue(); + } else if(column instanceof Float) { + intValue = ((Float) column).intValue(); + } else if(column instanceof Short) { + intValue = ((Short) column).intValue(); + } else if(column instanceof Boolean) { + intValue = (Boolean) column ? 1 : 0; + } else if(column instanceof String) { + intValue = Integer.valueOf((String) column); + } else { + throw new RuntimeException("Can't convert from " + column.getClass() + " to INT"); + } + + return Bytes.toBytes(intValue); + } + + private byte[] longToBytes(Object column) { + Long longValue = null; + if(column instanceof Integer) { + longValue = ((Integer)column).longValue(); + } else if(column instanceof Long) { + longValue = (Long) column; + } else if(column instanceof Double) { + longValue = ((Double) column).longValue(); + } else if(column instanceof Float) { + longValue = ((Float) column).longValue(); + } else if(column instanceof Short) { + longValue = ((Short) column).longValue(); + } else if(column instanceof Boolean) { + longValue = (Boolean) column ? 
1L : 0L;
+        } else if(column instanceof String) {
+            longValue = Long.valueOf((String) column);
+        } else if (column instanceof Timestamp){
+            longValue = ((Timestamp) column).getTime();
+        } else {
+            throw new RuntimeException("Can't convert from " + column.getClass() + " to LONG");
+        }
+
+        return Bytes.toBytes(longValue);
+    }
+
+    private byte[] doubleToBytes(Object column) {
+        Double doubleValue = null;
+        if(column instanceof Integer) {
+            doubleValue = ((Integer)column).doubleValue();
+        } else if(column instanceof Long) {
+            doubleValue = ((Long) column).doubleValue();
+        } else if(column instanceof Double) {
+            doubleValue = (Double) column;
+        } else if(column instanceof Float) {
+            doubleValue = ((Float) column).doubleValue();
+        } else if(column instanceof Short) {
+            doubleValue = ((Short) column).doubleValue();
+        } else if(column instanceof Boolean) {
+            doubleValue = (Boolean) column ? 1.0 : 0.0;
+        } else if(column instanceof String) {
+            doubleValue = Double.valueOf((String) column);
+        } else {
+            throw new RuntimeException("Can't convert from " + column.getClass() + " to DOUBLE");
+        }
+
+        return Bytes.toBytes(doubleValue);
+    }
+
+    private byte[] floatToBytes(Object column) {
+        Float floatValue = null;
+        if(column instanceof Integer) {
+            floatValue = ((Integer)column).floatValue();
+        } else if(column instanceof Long) {
+            floatValue = ((Long) column).floatValue();
+        } else if(column instanceof Double) {
+            floatValue = ((Double) column).floatValue();
+        } else if(column instanceof Float) {
+            floatValue = (Float) column;
+        } else if(column instanceof Short) {
+            floatValue = ((Short) column).floatValue();
+        } else if(column instanceof Boolean) {
+            floatValue = (Boolean) column ? 1.0f : 0.0f;
+        } else if(column instanceof String) {
+            floatValue = Float.valueOf((String) column);
+        } else {
+            throw new RuntimeException("Can't convert from " + column.getClass() + " to FLOAT");
+        }
+
+        return Bytes.toBytes(floatValue);
+    }
+
+    private byte[] shortToBytes(Object column) {
+        Short shortValue = null;
+        if(column instanceof Integer) {
+            shortValue = ((Integer)column).shortValue();
+        } else if(column instanceof Long) {
+            shortValue = ((Long) column).shortValue();
+        } else if(column instanceof Double) {
+            shortValue = ((Double) column).shortValue();
+        } else if(column instanceof Float) {
+            shortValue = ((Float) column).shortValue();
+        } else if(column instanceof Short) {
+            shortValue = (Short) column;
+        } else if(column instanceof Boolean) {
+            shortValue = (Boolean) column ?
(short) 1 : (short) 0;
+        } else if(column instanceof String) {
+            shortValue = Short.valueOf((String) column);
+        } else {
+            throw new RuntimeException("Can't convert from " + column.getClass() + " to SHORT");
+        }
+        return Bytes.toBytes(shortValue);
+    }
+
+    private byte[] boolToBytes(Object column) {
+        Boolean booleanValue = null;
+        if(column instanceof Integer) {
+            booleanValue = (Integer) column != 0;
+        } else if(column instanceof Long) {
+            booleanValue = (Long) column != 0L;
+        } else if(column instanceof Double) {
+            booleanValue = new Double(0.0).compareTo((Double) column) != 0;
+        } else if(column instanceof Float) {
+            booleanValue = new Float(0.0f).compareTo((Float) column) != 0;
+        } else if(column instanceof Short) {
+            booleanValue = (Short) column != 0;
+        } else if(column instanceof Boolean) {
+            booleanValue = (Boolean) column;
+        } else if(column instanceof String) {
+            booleanValue = Boolean.valueOf((String)column);
+        } else {
+            throw new RuntimeException("Can't convert from " + column.getClass() + " to BOOLEAN");
+        }
+
+        return Bytes.toBytes(booleanValue);
+    }
+
+    @Override
+    public void closeInternal() throws IOException {
+        if (null != timeSecondFormatThreadLocal) {
+            timeSecondFormatThreadLocal.remove();
+        }
+
+        if (null != timeMillisecondFormatThreadLocal) {
+            timeMillisecondFormatThreadLocal.remove();
+        }
+
+        HbaseHelper.closeBufferedMutator(bufferedMutator);
+        HbaseHelper.closeConnection(connection);
+    }
+
+}
diff --git a/flinkx-hbase2/flinkx-hbase-writer2/src/main/java/com/dtstack/flinkx/hbase2/writer/HbaseOutputFormatBuilder.java b/flinkx-hbase2/flinkx-hbase-writer2/src/main/java/com/dtstack/flinkx/hbase2/writer/HbaseOutputFormatBuilder.java
new file mode 100644
index 0000000000..0b6b52cc6e
--- /dev/null
+++ b/flinkx-hbase2/flinkx-hbase-writer2/src/main/java/com/dtstack/flinkx/hbase2/writer/HbaseOutputFormatBuilder.java
@@ -0,0 +1,117 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
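
The nullMode branch in getColumnByte above is the user-visible knob for null fields: SKIP makes getColumnByte return null, so writeSingleRecordInternal never calls addColumn and the row simply has no cell for that column; EMPTY writes a zero-length value, so the cell exists but is empty. A sketch of the difference at the Put level (family and qualifier invented):

```java
import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class NullModeSketch {
    public static void main(String[] args) {
        byte[] cf = Bytes.toBytes("cf1");
        byte[] qualifier = Bytes.toBytes("name");

        // nullMode = SKIP: no addColumn call is made for the null field.
        Put skip = new Put(Bytes.toBytes("rk1"));
        System.out.println(skip.isEmpty());  // prints: true

        // nullMode = EMPTY: a zero-length cell is written.
        Put empty = new Put(Bytes.toBytes("rk1"));
        empty.addColumn(cf, qualifier, HConstants.EMPTY_BYTE_ARRAY);
        System.out.println(empty.isEmpty()); // prints: false
    }
}
```
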
+ */ + +package com.dtstack.flinkx.hbase2.writer; + +import com.dtstack.flinkx.hbase2.HbaseConfigConstants; +import com.dtstack.flinkx.outputformat.BaseRichOutputFormatBuilder; +import com.google.common.base.Preconditions; +import org.apache.commons.lang.StringUtils; + +import java.util.List; +import java.util.Map; + +/** + * The Builder class of HbaseOutputFormatBuilder + * + * Company: www.dtstack.com + * @author huyifan.zju@163.com + */ +public class HbaseOutputFormatBuilder extends BaseRichOutputFormatBuilder { + + private HbaseOutputFormat format; + + public HbaseOutputFormatBuilder() { + super.format = format = new HbaseOutputFormat(); + } + + public void setTableName(String tableName) { + format.tableName = tableName; + } + + public void setHbaseConfig(Map hbaseConfig) { + format.hbaseConfig = hbaseConfig; + } + + public void setColumnTypes(List columnTypes) { + format.columnTypes = columnTypes; + } + + public void setColumnNames(List columnNames) { + format.columnNames = columnNames; + } + + public void setRowkeyExpress(String rowkeyExpress) { + format.rowkeyExpress = rowkeyExpress; + } + + public void setVersionColumnIndex(Integer versionColumnIndex) { + format.versionColumnIndex = versionColumnIndex; + } + + public void setVersionColumnValues(String versionColumnValue) { + format.versionColumnValue = versionColumnValue; + } + + public void setEncoding(String encoding) { + if(StringUtils.isEmpty(encoding)) { + format.encoding = HbaseConfigConstants.DEFAULT_ENCODING; + } else { + format.encoding = encoding; + } + } + + public void setWriteBufferSize(Long writeBufferSize) { + if(writeBufferSize == null || writeBufferSize.longValue() == 0L) { + format.writeBufferSize = HbaseConfigConstants.DEFAULT_WRITE_BUFFER_SIZE; + } else { + format.writeBufferSize = writeBufferSize; + } + } + + public void setNullMode(String nullMode) { + if(StringUtils.isEmpty(nullMode)) { + format.nullMode = HbaseConfigConstants.DEFAULT_NULL_MODE; + } else { + format.nullMode = nullMode; + } + } + + public void setWalFlag(Boolean walFlag) { + if(walFlag == null) { + format.walFlag = false; + } else { + format.walFlag = walFlag; + } + } + + @Override + protected void checkFormat() { + Preconditions.checkArgument(StringUtils.isNotEmpty(format.tableName)); + Preconditions.checkNotNull(format.hbaseConfig); + Preconditions.checkNotNull(format.columnNames); + Preconditions.checkNotNull(format.columnTypes); + Preconditions.checkNotNull(format.rowkeyExpress); + + if (format.getRestoreConfig() != null && format.getRestoreConfig().isRestore()){ + throw new UnsupportedOperationException("This plugin not support restore from failed state"); + } + + notSupportBatchWrite("HbaseWriter"); + } +} diff --git a/flinkx-metadata-phoenix5/flinkx-metadata-phoenix5-reader/src/test/java/com/dtstack/flinkx/metadataphoenix5/inputformat/Metadataphoenix5InputFormatTest.java b/flinkx-hbase2/flinkx-hbase-writer2/src/main/java/com/dtstack/flinkx/hbase2/writer/function/ConstantFunction.java similarity index 62% rename from flinkx-metadata-phoenix5/flinkx-metadata-phoenix5-reader/src/test/java/com/dtstack/flinkx/metadataphoenix5/inputformat/Metadataphoenix5InputFormatTest.java rename to flinkx-hbase2/flinkx-hbase-writer2/src/main/java/com/dtstack/flinkx/hbase2/writer/function/ConstantFunction.java index 677e037e4c..983a38870f 100644 --- a/flinkx-metadata-phoenix5/flinkx-metadata-phoenix5-reader/src/test/java/com/dtstack/flinkx/metadataphoenix5/inputformat/Metadataphoenix5InputFormatTest.java +++ 
b/flinkx-hbase2/flinkx-hbase-writer2/src/main/java/com/dtstack/flinkx/hbase2/writer/function/ConstantFunction.java @@ -16,24 +16,30 @@ * limitations under the License. */ -package com.dtstack.flinkx.metadataphoenix5.inputformat; -import org.junit.Assert; -import org.junit.Test; +package com.dtstack.flinkx.hbase2.writer.function; -public class Metadataphoenix5InputFormatTest { +/** + * @author jiangbo + * @date 2019/7/25 + */ +public class ConstantFunction implements IFunction { + + private Object value; - protected Metadataphoenix5InputFormat inputFormat = new Metadataphoenix5InputFormat(); + public ConstantFunction() { + } - @Test - public void testSetPath(){ - inputFormat.setPath("/hbase"); - Assert.assertEquals(inputFormat.path, "/hbase/table"); + public ConstantFunction(Object value) { + this.value = value; } - @Test - public void testQuote(){ - Assert.assertEquals(inputFormat.quote("test"), "Test"); + @Override + public String evaluate(Object val) { + return String.valueOf(value); } + public void setValue(Object value) { + this.value = value; + } } diff --git a/flinkx-hbase2/flinkx-hbase-writer2/src/main/java/com/dtstack/flinkx/hbase2/writer/function/FunctionFactory.java b/flinkx-hbase2/flinkx-hbase-writer2/src/main/java/com/dtstack/flinkx/hbase2/writer/function/FunctionFactory.java new file mode 100644 index 0000000000..21390d7c97 --- /dev/null +++ b/flinkx-hbase2/flinkx-hbase-writer2/src/main/java/com/dtstack/flinkx/hbase2/writer/function/FunctionFactory.java @@ -0,0 +1,52 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
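
ConstantFunction ignores its input and always renders the configured value; this is how literal segments of a rowkey expression are represented in the function tree. For example:

```java
import com.dtstack.flinkx.hbase2.writer.function.ConstantFunction;

public class ConstantFunctionSketch {
    public static void main(String[] args) {
        ConstantFunction underscore = new ConstantFunction("_");
        // The argument passed to evaluate is ignored:
        System.out.println(underscore.evaluate("anything")); // prints: _
        System.out.println(underscore.evaluate(null));       // prints: _
    }
}
```
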
+ */ + +package com.dtstack.flinkx.hbase2.writer.function; + +import org.apache.commons.lang.StringUtils; + +/** + * @company: www.dtstack.com + * @author: toutian + * @create: 2019/7/23 + */ +public class FunctionFactory { + + public static IFunction createFuntion(String functionName) { + if (StringUtils.isBlank(functionName)) { + throw new UnsupportedOperationException("function name can't be null!"); + } + + IFunction function = null; + switch (functionName.toUpperCase()) { + case "MD5": + function = new Md5Function(); + break; + case "STRING": + function = new StringFunction(); + break; + case "CONSTANT": + function = new ConstantFunction(); + break; + default: + throw new UnsupportedOperationException(String.format("function name[%s] don't exist!", functionName)); + } + return function; + } + +} diff --git a/flinkx-hbase2/flinkx-hbase-writer2/src/main/java/com/dtstack/flinkx/hbase2/writer/function/FunctionParser.java b/flinkx-hbase2/flinkx-hbase-writer2/src/main/java/com/dtstack/flinkx/hbase2/writer/function/FunctionParser.java new file mode 100644 index 0000000000..e28b6e54b4 --- /dev/null +++ b/flinkx-hbase2/flinkx-hbase-writer2/src/main/java/com/dtstack/flinkx/hbase2/writer/function/FunctionParser.java @@ -0,0 +1,180 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
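
A quick use of the factory (note that the method name really is createFuntion in this codebase, and lookup is case-insensitive; Md5Function and StringFunction are referenced here but their sources are not part of this change set):

```java
import com.dtstack.flinkx.hbase2.writer.function.FunctionFactory;
import com.dtstack.flinkx.hbase2.writer.function.IFunction;

public class FunctionFactorySketch {
    public static void main(String[] args) {
        IFunction md5 = FunctionFactory.createFuntion("md5");
        IFunction constant = FunctionFactory.createFuntion("constant");
        System.out.println(md5.getClass().getSimpleName());      // prints: Md5Function
        System.out.println(constant.getClass().getSimpleName()); // prints: ConstantFunction
        // createFuntion("unknown") throws UnsupportedOperationException.
    }
}
```
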
+ */ + + +package com.dtstack.flinkx.hbase2.writer.function; + + +import org.apache.commons.lang.StringUtils; + +import java.util.ArrayList; +import java.util.List; +import java.util.regex.Matcher; +import java.util.regex.Pattern; + +/** + * @author jiangbo + * @date 2019/7/24 + */ +public class FunctionParser { + + private static final String COL_REGEX = "\\$\\([^\\(\\)]+?\\)"; + private static Pattern COL_PATTERN = Pattern.compile(COL_REGEX); + + private static String LEFT_KUO = "("; + private static String RIGHT_KUO = ")"; + private static String DELIM = "_"; + + public static List parseRowKeyCol(String express){ + List columnNames = new ArrayList<>(); + Matcher matcher = COL_PATTERN.matcher(express); + while (matcher.find()) { + String colExpre = matcher.group(); + String col = colExpre.substring(colExpre.indexOf(LEFT_KUO)+1, colExpre.indexOf(RIGHT_KUO)); + columnNames.add(col); + } + + return columnNames; + } + + public static FunctionTree parse(String express){ + if(StringUtils.isEmpty(express)){ + throw new RuntimeException("Row key column express can not be null"); + } + + if(StringUtils.isEmpty(express.trim())){ + throw new RuntimeException("Row key column express can not be empty"); + } + + express = replaceColToStringFunc(express); + + FunctionTree root = new FunctionTree(); + root.setFunction(new StringFunction()); + + if(express.startsWith(DELIM)){ + FunctionTree child = new FunctionTree(); + child.setFunction(new ConstantFunction("")); + root.addInputFunction(child); + express = express.substring(1); + } + + parseFunction(root, express); + + if(express.endsWith(DELIM)){ + FunctionTree child = new FunctionTree(); + child.setFunction(new ConstantFunction("")); + root.addInputFunction(child); + } + + return root; + } + + private static void parseFunction(FunctionTree root, String express){ + int leftBracketsIndex = express.indexOf("("); + if (leftBracketsIndex == -1){ + root.setColumnName(express); + } else { + int rightBracketsIndex = findRightBrackets(leftBracketsIndex, express); + if(rightBracketsIndex == -1){ + throw new IllegalArgumentException("Illegal express:" + express); + } + + String value = express.substring(0, leftBracketsIndex); + if(StringUtils.isEmpty(value)){ + throw new IllegalArgumentException("Parse function from express fail,function name can not be empty"); + } + + if(value.startsWith(DELIM)){ + value = value.substring(1); + } + + String[] splits = value.split(DELIM); + for (int i = 0; i < splits.length-1; i++) { + FunctionTree child = new FunctionTree(); + child.setFunction(new ConstantFunction(splits[i])); + root.addInputFunction(child); + } + + FunctionTree child = new FunctionTree(); + child.setFunction(FunctionFactory.createFuntion(splits[splits.length -1 ])); + root.addInputFunction(child); + + String subExpress = express.substring(leftBracketsIndex+1, rightBracketsIndex); + parseFunction(child, subExpress); + + String leftExpress = express.substring(rightBracketsIndex+1); + processLeftExpress(leftExpress, root); + } + } + + private static void processLeftExpress(String leftExpress, FunctionTree root){ + if(StringUtils.isEmpty(leftExpress)){ + return; + } + + if (leftExpress.contains(LEFT_KUO)) { + parseFunction(root, leftExpress); + } else { + if(leftExpress.startsWith(DELIM)){ + leftExpress = leftExpress.substring(1); + } + + if(StringUtils.isEmpty(leftExpress)){ + return; + } + + String[] splits = leftExpress.split(DELIM); + for (int i = 0; i < splits.length; i++) { + FunctionTree child = new FunctionTree(); + child.setFunction(new 
ConstantFunction(splits[i])); + root.addInputFunction(child); + } + } + } + + private static int findRightBrackets(int startIndex, String express){ + boolean hasMeddleBrackets = false; + for (int i = startIndex+1; i < express.length(); i++) { + char c = express.charAt(i); + if('(' == c){ + hasMeddleBrackets = true; + } + + if(')' == c){ + if(hasMeddleBrackets){ + hasMeddleBrackets = false; + } else { + return i; + } + } + } + + return -1; + } + + public static String replaceColToStringFunc(String express){ + Matcher matcher = COL_PATTERN.matcher(express); + while (matcher.find()) { + String columnExpress = matcher.group(); + String column = columnExpress.substring(2, columnExpress.length() - 1); + express = express.replace(columnExpress, String.format("string(%s)", column)); + } + + return express; + } +} diff --git a/flinkx-hbase2/flinkx-hbase-writer2/src/main/java/com/dtstack/flinkx/hbase2/writer/function/FunctionTree.java b/flinkx-hbase2/flinkx-hbase-writer2/src/main/java/com/dtstack/flinkx/hbase2/writer/function/FunctionTree.java new file mode 100644 index 0000000000..de4c1bc45e --- /dev/null +++ b/flinkx-hbase2/flinkx-hbase-writer2/src/main/java/com/dtstack/flinkx/hbase2/writer/function/FunctionTree.java @@ -0,0 +1,80 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */
+
+package com.dtstack.flinkx.hbase2.writer.function;
+
+import com.google.common.collect.Lists;
+import org.apache.commons.collections.CollectionUtils;
+import org.apache.commons.collections.MapUtils;
+import org.apache.commons.lang.StringUtils;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+
+/**
+ * @author jiangbo
+ * @date 2019/7/24
+ */
+public class FunctionTree {
+
+    private String columnName;
+
+    private IFunction function;
+
+    private List<FunctionTree> inputFunctions = Lists.newArrayList();
+
+    public String evaluate(Map<String, Object> nameValueMap) throws Exception {
+        if (StringUtils.isNotEmpty(columnName) && MapUtils.isNotEmpty(nameValueMap)) {
+            return function.evaluate(nameValueMap.get(columnName));
+        }
+
+        if (CollectionUtils.isNotEmpty(inputFunctions)) {
+            List<String> subTaskVal = new ArrayList<>();
+            for (FunctionTree inputFunction : inputFunctions) {
+                subTaskVal.add(inputFunction.evaluate(nameValueMap));
+            }
+
+            return function.evaluate(StringUtils.join(subTaskVal, "_"));
+        } else {
+            return function.evaluate(null);
+        }
+    }
+
+    public void addInputFunction(FunctionTree inputFunction) {
+        inputFunctions.add(inputFunction);
+    }
+
+    public String getColumnName() {
+        return columnName;
+    }
+
+    public void setColumnName(String columnName) {
+        this.columnName = columnName;
+    }
+
+    public IFunction getFunction() {
+        return function;
+    }
+
+    public void setFunction(IFunction function) {
+        this.function = function;
+    }
+
+}
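Taken together, FunctionParser and FunctionTree implement the small rowkey expression DSL used by the hbase2 writer: `$(col)` references a column value, a name before a bracket selects a function from FunctionFactory, and sibling results are joined with `_`. A minimal evaluation sketch (column names and values are hypothetical):

    // "md5($(col1))_$(col2)" becomes a root StringFunction with two
    // children: md5(string(col1)) and string(col2).
    FunctionTree tree = FunctionParser.parse("md5($(col1))_$(col2)");

    // Column name -> value map for one row.
    Map<String, Object> row = new HashMap<>();
    row.put("col1", "1001");
    row.put("col2", "order");

    // Children are evaluated and joined with "_", so the rowkey is
    // Md5Util.getMd5("1001") + "_order".
    String rowKey = tree.evaluate(row);
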
diff --git a/flinkx-socket/flinkx-socket-core/src/main/java/com/dtstack/flinkx/socket/constants/SocketCons.java b/flinkx-hbase2/flinkx-hbase-writer2/src/main/java/com/dtstack/flinkx/hbase2/writer/function/IFunction.java
similarity index 63%
rename from flinkx-socket/flinkx-socket-core/src/main/java/com/dtstack/flinkx/socket/constants/SocketCons.java
rename to flinkx-hbase2/flinkx-hbase-writer2/src/main/java/com/dtstack/flinkx/hbase2/writer/function/IFunction.java
index b762f935b4..5044086a76 100644
--- a/flinkx-socket/flinkx-socket-core/src/main/java/com/dtstack/flinkx/socket/constants/SocketCons.java
+++ b/flinkx-hbase2/flinkx-hbase-writer2/src/main/java/com/dtstack/flinkx/hbase2/writer/function/IFunction.java
@@ -16,26 +16,21 @@
  * limitations under the License.
  */
 
-package com.dtstack.flinkx.socket.constants;
+package com.dtstack.flinkx.hbase2.writer.function;
 
 /**
- * @author kunni@dtstack.com
+ * @company: www.dtstack.com
+ * @author: toutian
+ * @create: 2019/7/23
  */
-
-public class SocketCons {
-
-    /**
-     * Flag set when a socket client fails
-     */
-    public static final String KEY_EXIT0 = "exit0 ";
-
-    public static final String DEFAULT_ENCODING = "UTF-8";
+public interface IFunction {
 
     /**
-     * Constants used by the reader
+     * Apply this function to the given value.
+     *
+     * @param val input value
+     * @return computed result
+     * @throws Exception any exception thrown by the concrete implementation
      */
-    public static final String KEY_ADDRESS = "address";
-    public static final String KEY_PARSE = "parse";
-    public static final String KEY_ENCODING = "encoding";
-
+    String evaluate(Object val) throws Exception;
 }
diff --git a/flinkx-metadata-hive1/flinkx-metadata-hive1-reader/src/main/java/com/dtstack/flinkx/metadatahive1/reader/Metadatahive1Reader.java b/flinkx-hbase2/flinkx-hbase-writer2/src/main/java/com/dtstack/flinkx/hbase2/writer/function/Md5Function.java
similarity index 59%
rename from flinkx-metadata-hive1/flinkx-metadata-hive1-reader/src/main/java/com/dtstack/flinkx/metadatahive1/reader/Metadatahive1Reader.java
rename to flinkx-hbase2/flinkx-hbase-writer2/src/main/java/com/dtstack/flinkx/hbase2/writer/function/Md5Function.java
index 8c27f34c68..17c18a8ede 100644
--- a/flinkx-metadata-hive1/flinkx-metadata-hive1-reader/src/main/java/com/dtstack/flinkx/metadatahive1/reader/Metadatahive1Reader.java
+++ b/flinkx-hbase2/flinkx-hbase-writer2/src/main/java/com/dtstack/flinkx/hbase2/writer/function/Md5Function.java
@@ -16,19 +16,19 @@
  * limitations under the License.
  */
 
-package com.dtstack.flinkx.metadatahive1.reader;
+package com.dtstack.flinkx.hbase2.writer.function;
 
-import com.dtstack.flinkx.config.DataTransferConfig;
-import com.dtstack.flinkx.metadatahive2.reader.Metadatahive2Reader;
-import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
+import com.dtstack.flinkx.util.Md5Util;
 
-public class Metadatahive1Reader extends Metadatahive2Reader {
-
-    public static final String DRIVER_NAME = "org.apache.hive.jdbc.HiveDriver";
+/**
+ * @company: www.dtstack.com
+ * @author: toutian
+ * @create: 2019/7/23
+ */
+public class Md5Function implements IFunction {
 
-    public Metadatahive1Reader(DataTransferConfig config, StreamExecutionEnvironment env) {
-        super(config, env);
-        driverName = DRIVER_NAME;
+    @Override
+    public String evaluate(Object str) throws Exception {
+        return Md5Util.getMd5(str.toString());
     }
-
 }
diff --git a/flinkx-hbase2/flinkx-hbase-writer2/src/main/java/com/dtstack/flinkx/hbase2/writer/function/StringFunction.java b/flinkx-hbase2/flinkx-hbase-writer2/src/main/java/com/dtstack/flinkx/hbase2/writer/function/StringFunction.java
new file mode 100644
index 0000000000..24afe1b8e6
--- /dev/null
+++ b/flinkx-hbase2/flinkx-hbase-writer2/src/main/java/com/dtstack/flinkx/hbase2/writer/function/StringFunction.java
@@ -0,0 +1,32 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.dtstack.flinkx.hbase2.writer.function; + +/** + * @company: www.dtstack.com + * @author: toutian + * @create: 2019/7/23 + */ +public class StringFunction implements IFunction { + + @Override + public String evaluate(Object str) { + return str.toString(); + } +} diff --git a/flinkx-metadata-oracle/pom.xml b/flinkx-hbase2/pom.xml similarity index 81% rename from flinkx-metadata-oracle/pom.xml rename to flinkx-hbase2/pom.xml index 7521edbe8f..d55bcc0443 100644 --- a/flinkx-metadata-oracle/pom.xml +++ b/flinkx-hbase2/pom.xml @@ -9,10 +9,12 @@ 4.0.0 - flinkx-metadata-oracle + flinkx-hbase2 pom - flinkx-metadata-oracle-reader + flinkx-hbase-core2 + flinkx-hbase-reader2 + flinkx-hbase-writer2 diff --git a/flinkx-hdfs/flinkx-hdfs-core/src/main/java/com/dtstack/flinkx/hdfs/ECompressType.java b/flinkx-hdfs/flinkx-hdfs-core/src/main/java/com/dtstack/flinkx/hdfs/ECompressType.java index 2f0c7ec101..3ff1b66de2 100644 --- a/flinkx-hdfs/flinkx-hdfs-core/src/main/java/com/dtstack/flinkx/hdfs/ECompressType.java +++ b/flinkx-hdfs/flinkx-hdfs-core/src/main/java/com/dtstack/flinkx/hdfs/ECompressType.java @@ -32,6 +32,7 @@ public enum ECompressType { */ TEXT_GZIP("GZIP", "text", ".gz", 0.331F), TEXT_BZIP2("BZIP2", "text", ".bz2", 0.259F), + TEXT_LZO("LZO", "text", ".lzo", 1.0F), TEXT_NONE("NONE", "text", "", 0.637F), /** diff --git a/flinkx-hdfs/flinkx-hdfs-reader/pom.xml b/flinkx-hdfs/flinkx-hdfs-reader/pom.xml index 7d8d5b07cf..57a6ad9d67 100644 --- a/flinkx-hdfs/flinkx-hdfs-reader/pom.xml +++ b/flinkx-hdfs/flinkx-hdfs-reader/pom.xml @@ -130,7 +130,7 @@ under the License. 
+ tofile="${basedir}/../../syncplugins/hdfsreader/${project.name}-${package.name}.jar" /> diff --git a/flinkx-hdfs/flinkx-hdfs-reader/src/main/java/com/dtstack/flinkx/hdfs/reader/BaseHdfsInputFormat.java b/flinkx-hdfs/flinkx-hdfs-reader/src/main/java/com/dtstack/flinkx/hdfs/reader/BaseHdfsInputFormat.java index ecd714029c..c1289beab8 100644 --- a/flinkx-hdfs/flinkx-hdfs-reader/src/main/java/com/dtstack/flinkx/hdfs/reader/BaseHdfsInputFormat.java +++ b/flinkx-hdfs/flinkx-hdfs-reader/src/main/java/com/dtstack/flinkx/hdfs/reader/BaseHdfsInputFormat.java @@ -95,6 +95,7 @@ public void openInputFormat() throws IOException { protected JobConf buildConfig() { JobConf conf = FileSystemUtil.getJobConf(hadoopConfig, defaultFs); conf.set(HdfsPathFilter.KEY_REGEX, filterRegex); + conf.set(HdfsPathFilter.KEY_PATH, inputPath); FileSystemUtil.setHadoopUserName(conf); return conf; } diff --git a/flinkx-hdfs/flinkx-hdfs-reader/src/main/java/com/dtstack/flinkx/hdfs/reader/HdfsPathFilter.java b/flinkx-hdfs/flinkx-hdfs-reader/src/main/java/com/dtstack/flinkx/hdfs/reader/HdfsPathFilter.java index ae208b9a6e..e6434a3b26 100644 --- a/flinkx-hdfs/flinkx-hdfs-reader/src/main/java/com/dtstack/flinkx/hdfs/reader/HdfsPathFilter.java +++ b/flinkx-hdfs/flinkx-hdfs-reader/src/main/java/com/dtstack/flinkx/hdfs/reader/HdfsPathFilter.java @@ -37,9 +37,12 @@ public class HdfsPathFilter implements HdfsConfigurablePathFilter { private String regex; + private String parentPath; + private static final String DEFAULT_REGEX = ".*"; public static final String KEY_REGEX = "file.path.regexFilter"; + public static final String KEY_PATH = "file.path"; private static final PathFilter HIDDEN_FILE_FILTER = p -> { String name = p.getName(); @@ -65,12 +68,22 @@ public boolean accept(Path path) { return false; } + if (path.toUri().getPath().equals(parentPath)) { + return true; + } + return PATTERN.matcher(path.getName()).matches(); } @Override public void configure(JobConf jobConf) { this.regex = jobConf.get(KEY_REGEX); + + String path = jobConf.get(KEY_PATH); + if (StringUtils.isNotEmpty(path)) { + this.parentPath = new Path(path).toUri().getPath(); + } + compileRegex(); } } diff --git a/flinkx-hdfs/flinkx-hdfs-writer/pom.xml b/flinkx-hdfs/flinkx-hdfs-writer/pom.xml index fbc18cc948..5f705c5b62 100644 --- a/flinkx-hdfs/flinkx-hdfs-writer/pom.xml +++ b/flinkx-hdfs/flinkx-hdfs-writer/pom.xml @@ -34,6 +34,24 @@ under the License. + + + org.anarres.lzo + lzo-core + 1.0.2 + + + org.anarres.lzo + lzo-hadoop + 1.0.5 + + + hadoop-core + org.apache.hadoop + + + + com.dtstack.flinkx flinkx-hdfs-core @@ -133,7 +151,7 @@ under the License. 
+ tofile="${basedir}/../../syncplugins/hdfswriter/${project.name}-${package.name}.jar" /> diff --git a/flinkx-hdfs/flinkx-hdfs-writer/src/main/java/com/dtstack/flinkx/hdfs/writer/HdfsParquetOutputFormat.java b/flinkx-hdfs/flinkx-hdfs-writer/src/main/java/com/dtstack/flinkx/hdfs/writer/HdfsParquetOutputFormat.java index f635c61348..02841a3ea1 100644 --- a/flinkx-hdfs/flinkx-hdfs-writer/src/main/java/com/dtstack/flinkx/hdfs/writer/HdfsParquetOutputFormat.java +++ b/flinkx-hdfs/flinkx-hdfs-writer/src/main/java/com/dtstack/flinkx/hdfs/writer/HdfsParquetOutputFormat.java @@ -26,6 +26,7 @@ import com.dtstack.flinkx.util.DateUtil; import com.dtstack.flinkx.util.FileSystemUtil; import com.dtstack.flinkx.util.GsonUtil; +import com.dtstack.flinkx.util.StringUtil; import org.apache.commons.lang.StringUtils; import org.apache.flink.types.Row; import org.apache.hadoop.fs.FileSystem; @@ -246,7 +247,7 @@ private void addDataToGroup(Group group, Object valObj, int i) throws Exception{ group.add(colName,val); } break; - case "boolean" : group.add(colName,Boolean.parseBoolean(val));break; + case "boolean" : group.add(colName,StringUtil.parseBoolean(val));break; case "timestamp" : Timestamp ts = DateUtil.columnToTimestamp(valObj,null); byte[] dst = HdfsUtil.longToByteArray(ts.getTime()); diff --git a/flinkx-hdfs/flinkx-hdfs-writer/src/main/java/com/dtstack/flinkx/hdfs/writer/HdfsTextOutputFormat.java b/flinkx-hdfs/flinkx-hdfs-writer/src/main/java/com/dtstack/flinkx/hdfs/writer/HdfsTextOutputFormat.java index b5a782d51c..daa9b32478 100644 --- a/flinkx-hdfs/flinkx-hdfs-writer/src/main/java/com/dtstack/flinkx/hdfs/writer/HdfsTextOutputFormat.java +++ b/flinkx-hdfs/flinkx-hdfs-writer/src/main/java/com/dtstack/flinkx/hdfs/writer/HdfsTextOutputFormat.java @@ -23,14 +23,15 @@ import com.dtstack.flinkx.hdfs.ECompressType; import com.dtstack.flinkx.hdfs.HdfsUtil; import com.dtstack.flinkx.util.DateUtil; -import com.dtstack.flinkx.util.GsonUtil; -import com.google.gson.JsonElement; -import com.google.gson.JsonParser; +import com.dtstack.flinkx.util.StringUtil; import org.apache.commons.compress.compressors.bzip2.BZip2CompressorOutputStream; import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream; import org.apache.flink.types.Row; +import org.apache.hadoop.conf.Configuration; import org.apache.hadoop.fs.Path; import org.apache.hadoop.hive.common.type.HiveDecimal; +import org.apache.hadoop.io.compress.CompressionCodecFactory; + import java.io.IOException; import java.io.OutputStream; import java.math.BigDecimal; @@ -98,6 +99,9 @@ protected void nextBlock(){ stream = new GzipCompressorOutputStream(fs.create(p)); } else if(compressType == ECompressType.TEXT_BZIP2){ stream = new BZip2CompressorOutputStream(fs.create(p)); + } else if (compressType == ECompressType.TEXT_LZO) { + CompressionCodecFactory factory = new CompressionCodecFactory(new Configuration()); + stream = factory.getCodecByClassName("com.hadoop.compression.lzo.LzopCodec").createOutputStream(fs.create(p)); } } @@ -212,7 +216,7 @@ private void appendDataToString(StringBuilder sb, Object column, ColumnType colu } break; case BOOLEAN: - sb.append(Boolean.valueOf(rowData)); + sb.append(StringUtil.parseBoolean(rowData)); break; case DATE: column = DateUtil.columnToDate(column,null); diff --git a/flinkx-hdfs/flinkx-hdfs-writer/src/main/java/com/dtstack/flinkx/hdfs/writer/StringUtil.java b/flinkx-hdfs/flinkx-hdfs-writer/src/main/java/com/dtstack/flinkx/hdfs/writer/StringUtil.java new file mode 100644 index 0000000000..e69de29bb2 diff --git 
a/flinkx-hive/flinkx-hive-core/src/main/java/com/dtstack/flinkx/hive/HiveConfigKeys.java b/flinkx-hive/flinkx-hive-core/src/main/java/com/dtstack/flinkx/hive/HiveConfigKeys.java index a3bf8a5b7f..a07a0e468f 100644 --- a/flinkx-hive/flinkx-hive-core/src/main/java/com/dtstack/flinkx/hive/HiveConfigKeys.java +++ b/flinkx-hive/flinkx-hive-core/src/main/java/com/dtstack/flinkx/hive/HiveConfigKeys.java @@ -28,14 +28,13 @@ public class HiveConfigKeys { public static final String KEY_DEFAULT_FS = "defaultFS"; public static final String KEY_FS_DEFAULT_FS = "fs.defaultFS"; -// -// public static final String KEY_PATH = "path"; public static final String KEY_HADOOP_CONFIG = "hadoopConfig"; public static final String KEY_FILE_TYPE = "fileType"; public static final String KEY_PARTITION_TYPE = "partitionType"; + public static final String KEY_PARTITION = "partition"; public static final String KEY_WRITE_MODE = "writeMode"; @@ -52,26 +51,14 @@ public class HiveConfigKeys { public static final String KEY_PASSWORD = "password"; -// public static final String KEY_FULL_COLUMN_NAME_LIST = "fullColumnName"; -// -// public static final String KEY_FULL_COLUMN_TYPE_LIST = "fullColumnType"; -// -// public static final String KEY_COLUMN_NAME = "name"; -// -// public static final String KEY_COLUMN_TYPE = "type"; - public static final String KEY_COMPRESS = "compress"; public static final String KEY_INTERVAL = "interval"; public static final String KEY_BUFFER_SIZE = "bufferSize"; -// public static final String KEY_FILE_NAME = "fileName"; - public static final String KEY_CHARSET_NAME = "charsetName"; -// public static final String KEY_ROW_GROUP_SIZE = "rowGroupSize"; - public static final String KEY_MAX_FILE_SIZE = "maxFileSize"; public static final String KEY_SCHEMA = "schema"; diff --git a/flinkx-hive/flinkx-hive-core/src/main/java/com/dtstack/flinkx/hive/util/RetryUtil.java b/flinkx-hive/flinkx-hive-core/src/main/java/com/dtstack/flinkx/hive/util/RetryUtil.java new file mode 100755 index 0000000000..e69de29bb2 diff --git a/flinkx-hive/flinkx-hive-writer/pom.xml b/flinkx-hive/flinkx-hive-writer/pom.xml index 4e30810ea9..de65c7b5e3 100644 --- a/flinkx-hive/flinkx-hive-writer/pom.xml +++ b/flinkx-hive/flinkx-hive-writer/pom.xml @@ -129,7 +129,7 @@ under the License. + tofile="${basedir}/../../syncplugins/hivewriter/${project.name}-${package.name}.jar" /> diff --git a/flinkx-metadata/flinkx-metadata-core/pom.xml b/flinkx-hudi/flinkx-hudi-core/pom.xml similarity index 76% rename from flinkx-metadata/flinkx-metadata-core/pom.xml rename to flinkx-hudi/flinkx-hudi-core/pom.xml index 2597d66217..ce04a42e5c 100644 --- a/flinkx-metadata/flinkx-metadata-core/pom.xml +++ b/flinkx-hudi/flinkx-hudi-core/pom.xml @@ -3,12 +3,12 @@ xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"> - flinkx-metadata + flinkx-hudi com.dtstack.flinkx 1.6 - jar 4.0.0 - flinkx-metadata-core + flinkx-hudi-core + \ No newline at end of file diff --git a/flinkx-hudi/flinkx-hudi-core/src/main/java/com/dtstack/flinkx/hudi/HudiConfigKeys.java b/flinkx-hudi/flinkx-hudi-core/src/main/java/com/dtstack/flinkx/hudi/HudiConfigKeys.java new file mode 100644 index 0000000000..0b9277369c --- /dev/null +++ b/flinkx-hudi/flinkx-hudi-core/src/main/java/com/dtstack/flinkx/hudi/HudiConfigKeys.java @@ -0,0 +1,73 @@ +package com.dtstack.flinkx.hudi;/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. 
See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/** + * @author fengjiangtao_yewu@cmss.chinamobile.com + * @date 2021-08-05 + */ +public class HudiConfigKeys { + public static final String KEY_PATH = "path"; + + public static final String KEY_HADOOP_CONFIG = "hadoopConfig"; + + public static final String KEY_DEFAULT_FS = "defaultFS"; + + public static final String KEY_WRITE_MODE = "writeMode"; + + public static final String KEY_COLUMN_NAME = "name"; + + public static final String KEY_COLUMN_TYPE = "type"; + + public static final String KEY_COMPRESS = "compress"; + + public static final String KEY_TABLE_NAME = "tableName"; + + public static final String KEY_TABLE_TYPE = "tableType"; + + public static final String KEY_TABLE_RECORD_KEY = "recordKey"; + + public static final String KEY_TABLE_TYPE_RECORD = "record"; + + public static final String KEY_SCHEMA_FIELDS = "fields"; + + public static final String KEY_PARTITION_FIELDS = "partitionFields"; + + public static final String KEY_HIVE_JDBC = "hiveJdbcUrl"; + + public static final String KEY_HIVE_METASTORE = "hiveMetastore"; + + public static final String KEY_HIVE_USER = "hiveUser"; + + public static final String KEY_HIVE_PASS = "hivePass"; + + public static final String KEY_BATCH_INTERVAL = "batchInterval"; + + public static final String KEY_HA_DEFAULT_FS = "dfs.nameservices"; + + public static final String KEY_HIVE_METASTORE_URIS = "hive.metastore.uris"; + + public static final String KEY_LAKEHOUSE_METADATAURL = "lakehouse.metadataUrl"; + + public static final String KEY_LAKEHOUSE_JOBUUID = "lakehouse.jobUUID"; + + public static final String KEY_LAKEHOUSE_USERID = "lakehouse.userId"; + + public static final String KEY_HADOOP_USER_NAME = "HADOOP_USER_NAME"; + + public static final String KEY_DBTABLE_DELIMITER = "."; +} diff --git a/flinkx-metadata-hive2/flinkx-metadata-hive2-reader/pom.xml b/flinkx-hudi/flinkx-hudi-writer/pom.xml similarity index 50% rename from flinkx-metadata-hive2/flinkx-metadata-hive2-reader/pom.xml rename to flinkx-hudi/flinkx-hudi-writer/pom.xml index 9822595871..49ef214452 100644 --- a/flinkx-metadata-hive2/flinkx-metadata-hive2-reader/pom.xml +++ b/flinkx-hudi/flinkx-hudi-writer/pom.xml @@ -3,151 +3,211 @@ xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"> - flinkx-metadata-hive2 + flinkx-hudi com.dtstack.flinkx 1.6 4.0.0 - flinkx-metadata-hive2-reader + flinkx-hudi-writer + + - com.dtstack.flinkx - flinkx-metadata-reader - 1.6 + org.anarres.lzo + lzo-core + 1.0.2 - org.apache.hive - hive-jdbc - 2.1.0 + org.anarres.lzo + lzo-hadoop + 1.0.5 - slf4j-log4j12 - org.slf4j - - - log4j-slf4j-impl - org.apache.logging.log4j - - - log4j-web - org.apache.logging.log4j - - - log4j-core - org.apache.logging.log4j - - - log4j-api - org.apache.logging.log4j - - - log4j-1.2-api - 
org.apache.logging.log4j - - - netty-all - io.netty + hadoop-core + org.apache.hadoop + + + + + com.dtstack.flinkx + flinkx-hudi-core + 1.6 + - hive-common - org.apache.hive + httpcore + org.apache.httpcomponents - parquet-hadoop-bundle - org.apache.parquet + httpclient + org.apache.httpcomponents + + + + + com.dtstack.flinkx + flinkx-hudi-core + 1.6 + compile + + + org.apache.flink + flink-table-common + ${flink.version} + compile + + + org.apache.hudi + hudi-flink_2.12 + ${hudi.version} + - xerces - xercesImpl + org.eclipse.jetty + jetty-server - hbase-client - org.apache.hbase + org.eclipse.jetty + jetty-client - curator-framework - org.apache.curator + org.eclipse.jetty + jetty-http - zookeeper - org.apache.zookeeper + org.eclipse.jetty + jetty-io - slf4j-api - org.slf4j + org.eclipse.jetty + jetty-rewrite - commons-cli - commons-cli + org.eclipse.jetty + jetty-security - commons-compress - org.apache.commons + org.eclipse.jetty + jetty-servlet - commons-lang - commons-lang + org.eclipse.jetty + jetty-util - guava - com.google.guava + org.eclipse.jetty + jetty-webapp - gson - com.google.code.gson + org.eclipse.jetty + jetty-xml - avro - org.apache.avro + org.apache.hive + hive-exec - hbase-common - org.apache.hbase + org.apache.hudi + hudi-common - hbase-hadoop2-compat - org.apache.hbase + org.apache.hudi + hudi-client-common - hbase-server - org.apache.hbase + org.apache.hudi + hudi-java-client - tephra-hbase-compat-1.0 - co.cask.tephra + org.apache.hudi + hudi-hive-sync + + + + org.apache.hudi + hudi-common + ${hudi.version} + shade + + + org.apache.hudi + hudi-java-client + ${hudi.version} + shade + - hbase-hadoop-compat - org.apache.hbase + org.slf4j + slf4j-api + + org.apache.hudi + hudi-hive-sync + ${hudi.version} + shade + + + org.eclipse.jetty + jetty-server + ${jetty-server.version} + + + org.apache.avro + avro + ${avro.version} + + + org.apache.parquet + parquet-avro + ${parquet-avro.version} + org.apache.hive - hive-exec - 1.1.1 + hive-service + ${hive.version} - calcite-core - org.apache.calcite + org.slf4j + slf4j-api - calcite-avatica - org.apache.calcite + org.slf4j + slf4j-log4j12 + + + + org.apache.hive + hive-jdbc + ${hive.version} + - derby - org.apache.derby + org.eclipse.jetty.orbit + javax.servlet - org.xerial.snappy - snappy-java + org.apache.parquet + parquet-hadoop-bundle + + org.apache.calcite + calcite-core + 1.10.0 + + + com.alibaba + fastjson + 1.2.76 + + + @@ -165,11 +225,13 @@ org.slf4j:slf4j-api - log4j:log4j ch.qos.logback:* + com.google.code.gson:* + com.data-artisans:* + org.scala-lang:* + io.netty:* - *:* @@ -181,14 +243,6 @@ - - org.apache.hive.jdbc - shade.hive2.jdbc - - - org.apache.hive.service - shade.hive2.service - com.google.common shade.core.com.google.common @@ -197,23 +251,7 @@ com.google.thirdparty shade.core.com.google.thirdparty - - org.apache.http - shade.metadatahive1.org.apache.http - - - - - META-INF/services/java.sql.Driver - - - META-INF/services - java.sql.hive2.Driver - - @@ -232,14 +270,14 @@ - + - + @@ -247,5 +285,4 @@ - \ No newline at end of file diff --git a/flinkx-hudi/flinkx-hudi-writer/src/main/java/com.dtstack.flinkx.hudi.writer/HudiOutputFormat.java b/flinkx-hudi/flinkx-hudi-writer/src/main/java/com.dtstack.flinkx.hudi.writer/HudiOutputFormat.java new file mode 100644 index 0000000000..26c8aa0143 --- /dev/null +++ b/flinkx-hudi/flinkx-hudi-writer/src/main/java/com.dtstack.flinkx.hudi.writer/HudiOutputFormat.java @@ -0,0 +1,303 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor 
license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package com.dtstack.flinkx.hudi.writer;
+
+import com.dtstack.flinkx.hudi.HudiConfigKeys;
+import com.dtstack.flinkx.outputformat.BaseRichOutputFormat;
+import com.dtstack.flinkx.reader.MetaColumn;
+import com.dtstack.flinkx.util.FileSystemUtil;
+import com.dtstack.flinkx.util.GsonUtil;
+import com.dtstack.flinkx.util.StringUtil;
+import com.google.common.collect.Lists;
+import org.apache.flink.types.Row;
+import org.apache.flink.util.StringUtils;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hudi.client.HoodieJavaWriteClient;
+import org.apache.hudi.client.common.HoodieJavaEngineContext;
+import org.apache.hudi.common.engine.EngineType;
+import org.apache.hudi.common.fs.FSUtils;
+import org.apache.hudi.common.model.HoodieAvroPayload;
+import org.apache.hudi.common.model.HoodieKey;
+import org.apache.hudi.common.model.HoodieRecord;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.config.HoodieCompactionConfig;
+import org.apache.hudi.config.HoodieIndexConfig;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.hive.HiveSyncConfig;
+import org.apache.hudi.hive.HiveSyncTool;
+import org.apache.hudi.hive.ddl.HiveSyncMode;
+import org.apache.hudi.index.HoodieIndex;
+import org.apache.hudi.org.apache.avro.Schema;
+import org.apache.hudi.org.apache.avro.generic.GenericData;
+import org.apache.hudi.org.apache.avro.generic.GenericRecord;
+
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+import java.util.stream.Collectors;
+
+import static com.dtstack.flinkx.hudi.HudiConfigKeys.KEY_HADOOP_USER_NAME;
+
+/**
+ * Reference: org.apache.hudi.examples.java.HoodieJavaWriteClientExample
+ *
+ * @author fengjiangtao_yewu@cmss.chinamobile.com
+ * @date 2021-08-10
+ */
+
+public class HudiOutputFormat extends BaseRichOutputFormat {
+    protected String tableName;
+    protected String tableType;
+    protected String path;
+    protected String schema;
+    protected String defaultFS;
+    protected String recordKey;
+    protected String hiveJdbcUrl;
+    protected String hiveMetastore;
+    protected String hiveUser;
+    protected String hivePass;
+    protected Schema avroSchema;
+    protected String[] dbTableName;
+    protected Map<String, Object> hadoopConfig;
+    protected List<MetaColumn> metaColumns;
+    protected List<String> partitionFields;
+
+    protected transient HoodieJavaWriteClient<HoodieAvroPayload> client;
+    protected FileSystem fs;
+    protected Configuration hadoopConfiguration;
+    protected HiveConf hiveConf;
+    protected HiveSyncConfig hiveSyncConfig;
+
+    @Override
+    protected void openInternal(int taskNumber, int numTasks) {
+        dbTableName = org.apache.commons.lang3.StringUtils.split(tableName, ".");
+        // Create the write client to write some records in.
+        HoodieWriteConfig hudiWriteConfig = HoodieWriteConfig.newBuilder()
+                .withEngineType(EngineType.FLINK).withPath(path)
+                .withSchema(schema).withParallelism(numTasks, numTasks)
+                .withDeleteParallelism(numTasks).forTable(dbTableName[1])
+                .withIndexConfig(HoodieIndexConfig.newBuilder().withIndexType(HoodieIndex.IndexType.INMEMORY).build())
+                .withCompactionConfig(HoodieCompactionConfig.newBuilder().archiveCommitsWith(20, 30).build()).build();
+
+        hadoopConfiguration = FileSystemUtil.getConfiguration(hadoopConfig, defaultFS);
+        client = new HoodieJavaWriteClient<>(new HoodieJavaEngineContext(hadoopConfiguration), hudiWriteConfig);
+        avroSchema = new Schema.Parser().parse(schema);
+
+        if (hadoopConfig.containsKey(KEY_HADOOP_USER_NAME)) {
+            // Config the HADOOP_USER_NAME for permission.
+            LOG.info("Default System HADOOP_USER_NAME:" + System.getProperty(KEY_HADOOP_USER_NAME));
+            System.setProperty(KEY_HADOOP_USER_NAME, hadoopConfig.get(KEY_HADOOP_USER_NAME).toString());
+            LOG.info("Change System HADOOP_USER_NAME:" + System.getProperty(KEY_HADOOP_USER_NAME));
+        }
+
+        LOG.info("Init hudi table schema:[{}]", schema);
+        initTable(hadoopConfiguration, dbTableName[1]);
+        initHiveConf();
+    }
+
+    @Override
+    protected void writeSingleRecordInternal(Row row) {
+        String newCommitTime = client.startCommit();
+        HoodieRecord<HoodieAvroPayload> hoodieRecord = buildHudiRecords(row);
+        List<HoodieRecord<HoodieAvroPayload>> records = Lists.newArrayList(hoodieRecord).stream()
+                .filter(record -> record != null).collect(Collectors.toList());
+
+        if (records.size() > 0) {
+            client.upsert(records, newCommitTime);
+            syncHiveMeta();
+        }
+    }
+
+    /**
+     * Writing in batches costs less, because every write has to sync the Hive metastore afterwards.
+     */
+    @Override
+    protected void writeMultipleRecordsInternal() {
+        String newCommitTime = client.startCommit();
+
+        // Drop null rows before conversion and null records after it,
+        // since buildHudiRecords returns null for rows it can not convert.
+        List<HoodieRecord<HoodieAvroPayload>> records = rows.stream().filter(record -> record != null)
+                .map(this::buildHudiRecords).filter(record -> record != null).collect(Collectors.toList());
+        if (records.size() > 0) {
+            client.upsert(records, newCommitTime);
+            syncHiveMeta();
+        }
+    }
+
+    /**
+     * Build a hudi record from row data.
+     *
+     * @param row input row
+     * @return the hudi record, or null when the row can not be converted
+     */
+    private HoodieRecord<HoodieAvroPayload> buildHudiRecords(Row row) {
+        HoodieKey key = new HoodieKey();
+        // Set first key as recordKey
+        Option<GenericRecord> genericRecord;
+        HoodieAvroPayload hoodieAvroPayload;
+        HoodieRecord<HoodieAvroPayload> hoodieRecord;
+        try {
+            key.setRecordKey(recordKey);
+            // TODO Next version will support partition table.
+            key.setPartitionPath("");
+
+            // ofNullable: buildGenericRecord returns null on conversion failure.
+            genericRecord = Option.ofNullable(buildGenericRecord(row));
+            if (!genericRecord.isPresent()) {
+                return null;
+            }
+            hoodieAvroPayload = new HoodieAvroPayload(genericRecord);
+            hoodieRecord = new HoodieRecord<>(key, hoodieAvroPayload);
+        } catch (Exception e) {
+            LOG.error("Build hudi records err. Row:" + row.toString(), e);
+            throw new RuntimeException(e);
+        }
+
+        return hoodieRecord;
+    }
+
+    /**
+     * Init the table if it has not been initialized already.
+     *
+     * @param hadoopConf     hadoop configuration
+     * @param splitTableName table name without the database prefix
+     */
+    private void initTable(Configuration hadoopConf, String splitTableName) {
+        Path path = new Path(this.path);
+        try {
+            fs = FSUtils.getFs(this.path, hadoopConf);
+            if (!fs.exists(path)) {
+                HoodieTableMetaClient metaClient = HoodieTableMetaClient.withPropertyBuilder()
+                        .setTableType(tableType)
+                        .setTableName(splitTableName)
+                        .setPayloadClassName(HoodieAvroPayload.class.getName())
+                        .initTable(hadoopConf, this.path);
+
+                LOG.info("Hudi table init done. Meta path: {}", metaClient.getMetaPath());
+            }
+        } catch (Exception e) {
+            LOG.warn("Hudi table init err. " + tableName, e);
+            throw new RuntimeException("Create hudi table failed:" + splitTableName + ", " + e.getMessage(), e);
+        }
+    }
+
+    /**
+     * Init hive configuration.
+     */
+    private void initHiveConf() {
+        hiveConf = new HiveConf();
+        hiveConf.set(HudiConfigKeys.KEY_HIVE_METASTORE_URIS, hiveMetastore);
+        hiveConf.set(HudiConfigKeys.KEY_HA_DEFAULT_FS, defaultFS);
+
+        Iterator<Map.Entry<String, String>> iterator = hadoopConfiguration.iterator();
+        while (iterator.hasNext()) {
+            Map.Entry<String, String> entry = iterator.next();
+            hiveConf.set(entry.getKey(), entry.getValue());
+        }
+
+        hiveSyncConfig = new HiveSyncConfig();
+        hiveSyncConfig.jdbcUrl = this.hiveJdbcUrl;
+        hiveSyncConfig.autoCreateDatabase = true;
+        hiveSyncConfig.databaseName = this.dbTableName[0];
+        hiveSyncConfig.tableName = this.dbTableName[1];
+        hiveSyncConfig.basePath = this.path;
+        hiveSyncConfig.assumeDatePartitioning = true;
+        hiveSyncConfig.usePreApacheInputFormat = false;
+        hiveSyncConfig.createManagedTable = true;
+        if (org.apache.commons.lang3.StringUtils.isNotBlank(this.hiveUser)) {
+            hiveSyncConfig.hiveUser = this.hiveUser;
+        }
+        if (org.apache.commons.lang3.StringUtils.isNotBlank(this.hivePass)) {
+            hiveSyncConfig.hivePass = this.hivePass;
+        }
+        if (partitionFields.size() > 0) {
+            hiveSyncConfig.partitionFields = partitionFields;
+        }
+
+        // Default use HIVEQL sync mode
+        hiveSyncConfig.syncMode = HiveSyncMode.HIVEQL.name();
+    }
+
+    /**
+     * Sync the hive meta after every upsert batch.
+     *
+     * @return true when the sync succeeds
+     */
+    private boolean syncHiveMeta() {
+        try {
+            HiveSyncTool hiveSyncTool = new HiveSyncTool(hiveSyncConfig, hiveConf, fs);
+            hiveSyncTool.syncHoodieTable();
+        } catch (Exception e) {
+            LOG.error("Create hive meta failed " + tableName, e);
+            throw new RuntimeException("Sync hudi to hive metastore failed.", e);
+        }
+
+        return true;
+    }
+
+    /**
+     * Schema for example:
+     * {
+     *   "type": "record",
+     *   "name": "triprec",
+     *   "fields": [
+     *     {"name": "ts","type": "long"},
+     *     {"name": "uuid","type": "string"},
+     *     {"name": "begin_lat","type": "double"}
+     *   ]
+     * }
+     *
+     * @param row input row
+     * @return the avro record, or null when the conversion fails
+     */
+    private GenericRecord buildGenericRecord(Row row) {
+        GenericRecord rec = new GenericData.Record(avroSchema);
+
+        try {
+            int arity = row.getArity();
+            if (metaColumns != null && metaColumns.size() > 0) {
+                // Column mode: map each field by its configured name and type.
+                for (int i = 0; i < arity; i++) {
+                    String value = StringUtils.arrayAwareToString(row.getField(i));
+                    rec.put(metaColumns.get(i).getName(), StringUtil.string2col(value, metaColumns.get(i).getType(), null));
+                }
+            } else {
+                // JSON mode: the single field carries a JSON object keyed by column name.
+                Map<String, Object> map = GsonUtil.GSON.fromJson(row.getField(0).toString(), GsonUtil.gsonMapTypeToken);
+                map.keySet().stream().forEach(key -> rec.put(key, map.get(key)));
+            }
+        } catch (Exception e) {
+            LOG.warn("Build genericRecord err. " + row.toString(), e);
+            return null;
+        }
+
+        return rec;
+    }
+
+    /**
+     * Close client.
+     */
+    @Override
+    public void closeInternal() {
+        LOG.warn("Hudi output closeInternal.");
+        if (null != client) {
+            client.close();
+        }
+    }
+}
diff --git a/flinkx-hudi/flinkx-hudi-writer/src/main/java/com.dtstack.flinkx.hudi.writer/HudiOutputformatBuilder.java b/flinkx-hudi/flinkx-hudi-writer/src/main/java/com.dtstack.flinkx.hudi.writer/HudiOutputformatBuilder.java
new file mode 100644
index 0000000000..36d9825a39
--- /dev/null
+++ b/flinkx-hudi/flinkx-hudi-writer/src/main/java/com.dtstack.flinkx.hudi.writer/HudiOutputformatBuilder.java
@@ -0,0 +1,105 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package com.dtstack.flinkx.hudi.writer;
+
+import com.dtstack.flinkx.outputformat.BaseRichOutputFormatBuilder;
+import com.dtstack.flinkx.reader.MetaColumn;
+import org.apache.avro.Schema;
+
+import java.util.List;
+import java.util.Map;
+
+/**
+ * @author fengjiangtao_yewu@cmss.chinamobile.com
+ * @date 2021-08-10
+ */
+
+public class HudiOutputformatBuilder extends BaseRichOutputFormatBuilder {
+    private HudiOutputFormat format;
+
+    public HudiOutputformatBuilder() {
+        super.format = format = new HudiOutputFormat();
+    }
+
+    public void setTableName(String tableName) {
+        format.tableName = tableName;
+    }
+
+    public void setTableType(String tableType) {
+        format.tableType = tableType;
+    }
+
+    public void setRecordKey(String recordKey) {
+        format.recordKey = recordKey;
+    }
+
+    public void setPath(String path) {
+        format.path = path;
+    }
+
+    public void setHadoopConf(Map<String, Object> hadoopConf) {
+        format.hadoopConfig = hadoopConf;
+    }
+
+    public void setDefaultFS(String defaultFS) {
+        format.defaultFS = defaultFS;
+    }
+
+    public void setHiveJdbcUrl(String hiveJdbcUrl) {
+        format.hiveJdbcUrl = hiveJdbcUrl;
+    }
+
+    public void setHiveMetastore(String hiveMetastore) {
+        format.hiveMetastore = hiveMetastore;
+    }
+
+    public void setHiveUser(String hiveUser) {
+        format.hiveUser = hiveUser;
+    }
+
+    public void setHivePass(String hivePass) {
+        format.hivePass = hivePass;
+    }
+
+    public void setSchema(String schema) {
+        format.schema = schema;
+    }
+
+    public void setColumns(List<MetaColumn> metaColumns) {
+        format.metaColumns = metaColumns;
+    }
+
+    public void setPartitionFields(List<String> partitionFields) {
+        format.partitionFields = partitionFields;
+    }
+
+    /**
+     * Column type comes from org.apache.avro.Schema.Type
+     */
+    @Override
+    protected void checkFormat() {
+        MetaColumn mColumn = null;
+        try {
+            for (MetaColumn metaColumn : format.metaColumns) {
+                mColumn = metaColumn;
+                Schema.Type.valueOf(metaColumn.getType().toUpperCase());
+            }
+        } catch (Exception e) {
+            throw new UnsupportedOperationException("This plugin column's type is not supported: " + mColumn.getType());
+        }
+    }
+}
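The builder above is driven by HudiWriter below, which pulls its settings from the writer block of a FlinkX job file. A sketch of what such a parameter block could look like, derived from the keys HudiWriter reads; the plugin name and all values are hypothetical, and column types must be valid org.apache.avro.Schema.Type names per checkFormat:

    "writer": {
        "name": "hudiwriter",
        "parameter": {
            "writeMode": "UPSERT",
            "tableName": "demo_db.demo_table",
            "tableType": "COPY_ON_WRITE",
            "recordKey": "id",
            "path": "hdfs://ns1/hudi/demo_table",
            "defaultFS": "hdfs://ns1",
            "hiveJdbcUrl": "jdbc:hive2://hive-host:10000",
            "hiveMetastore": "thrift://hive-host:9083",
            "partitionFields": "",
            "batchInterval": 100,
            "column": [
                {"name": "id", "type": "long"},
                {"name": "name", "type": "string"}
            ],
            "hadoopConfig": {}
        }
    }
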
diff --git a/flinkx-hudi/flinkx-hudi-writer/src/main/java/com.dtstack.flinkx.hudi.writer/HudiWriter.java b/flinkx-hudi/flinkx-hudi-writer/src/main/java/com.dtstack.flinkx.hudi.writer/HudiWriter.java
new file mode 100644
index 0000000000..25d7307dc4
--- /dev/null
+++ b/flinkx-hudi/flinkx-hudi-writer/src/main/java/com.dtstack.flinkx.hudi.writer/HudiWriter.java
@@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package com.dtstack.flinkx.hudi.writer;
+
+import com.alibaba.fastjson.JSONArray;
+import com.alibaba.fastjson.JSONObject;
+import com.dtstack.flinkx.config.DataTransferConfig;
+import com.dtstack.flinkx.config.WriterConfig;
+import com.dtstack.flinkx.hudi.HudiConfigKeys;
+import com.dtstack.flinkx.reader.MetaColumn;
+import com.dtstack.flinkx.writer.BaseDataWriter;
+import org.apache.commons.lang3.StringUtils;
+import org.apache.flink.streaming.api.datastream.DataStream;
+import org.apache.flink.streaming.api.datastream.DataStreamSink;
+import org.apache.flink.types.Row;
+import org.apache.hudi.common.model.HoodieTableType;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Map;
+
+import static com.dtstack.flinkx.hudi.HudiConfigKeys.*;
+
+/**
+ * @author fengjiangtao_yewu@cmss.chinamobile.com
+ * @date 2021-08-10
+ */
+
+public class HudiWriter extends BaseDataWriter {
+    /**
+     * org.apache.hudi.common.model.WriteOperationType
+     * UPSERT | UPSERT_PREPPED
+     */
+    protected String writeOperation;
+
+    /**
+     * Table name to register to Hive metastore
+     */
+    protected String tableName;
+
+    /**
+     * Type of table to write. COPY_ON_WRITE (or) MERGE_ON_READ
+     */
+    protected String tableType;
+
+    /**
+     * Base path for the target hoodie table.
+     */
+    protected String path;
+
+    /**
+     * Async Compaction, enabled by default for MOR
+     */
+    protected boolean compress;
+
+    protected List<MetaColumn> metaColumns;
+    protected String defaultFS;
+    protected String recordKey;
+    protected String hiveJdbcUrl;
+    protected String hiveMetastore;
+    protected String hiveUser;
+    protected String hivePass;
+    protected int batchInterval;
+    protected List<String> partitionFields;
+    protected Map<String, Object> hadoopConfig;
+
+    public HudiWriter(DataTransferConfig config) {
+        super(config);
+
+        WriterConfig writerConfig = config.getJob().getContent().get(0).getWriter();
+        writeOperation = writerConfig.getParameter().getStringVal(HudiConfigKeys.KEY_WRITE_MODE);
+        tableName = writerConfig.getParameter().getStringVal(HudiConfigKeys.KEY_TABLE_NAME);
+        tableType = writerConfig.getParameter().getStringVal(HudiConfigKeys.KEY_TABLE_TYPE, HoodieTableType.COPY_ON_WRITE.name());
+        recordKey = writerConfig.getParameter().getStringVal(HudiConfigKeys.KEY_TABLE_RECORD_KEY, "id");
+        path = writerConfig.getParameter().getStringVal(HudiConfigKeys.KEY_PATH);
+        compress = writerConfig.getParameter().getBooleanVal(HudiConfigKeys.KEY_COMPRESS, Boolean.FALSE);
+        metaColumns = MetaColumn.getMetaColumns(writerConfig.getParameter().getColumn());
+        defaultFS = writerConfig.getParameter().getStringVal(HudiConfigKeys.KEY_DEFAULT_FS);
+        hiveJdbcUrl = writerConfig.getParameter().getStringVal(HudiConfigKeys.KEY_HIVE_JDBC);
+        hiveMetastore = writerConfig.getParameter().getStringVal(HudiConfigKeys.KEY_HIVE_METASTORE);
+        hiveUser = writerConfig.getParameter().getStringVal(HudiConfigKeys.KEY_HIVE_USER, "");
+        hivePass = writerConfig.getParameter().getStringVal(HudiConfigKeys.KEY_HIVE_PASS, "");
+        batchInterval = writerConfig.getParameter().getIntVal(HudiConfigKeys.KEY_BATCH_INTERVAL, 1);
+        String partitionField = 
writerConfig.getParameter().getStringVal(HudiConfigKeys.KEY_PARTITION_FIELDS); + partitionFields = StringUtils.isNotBlank(partitionField) ? Arrays.asList(StringUtils.split(partitionField, ",")) : new ArrayList<>(); + hadoopConfig = (Map) writerConfig.getParameter().getVal(HudiConfigKeys.KEY_HADOOP_CONFIG); + } + + @Override + public DataStreamSink writeData(DataStream dataSet) { + HudiOutputformatBuilder builder = new HudiOutputformatBuilder(); + builder.setTableName(tableName); + builder.setTableType(tableType); + builder.setRecordKey(recordKey); + builder.setPath(path); + builder.setHadoopConf(hadoopConfig); + builder.setDefaultFS(defaultFS); + builder.setHiveJdbcUrl(hiveJdbcUrl); + builder.setHiveMetastore(hiveMetastore); + builder.setHiveUser(hiveUser); + builder.setHivePass(hivePass); + builder.setSchema(buildSchema(metaColumns, tableName)); + builder.setColumns(metaColumns); + builder.setPartitionFields(partitionFields); + builder.setBatchInterval(batchInterval); + + return createOutput(dataSet, builder.finish()); + } + + /** + * Transform MetaColumn to org.apache.avro.Schema. + * + * @param metaColumns + * @param tableName + * @return + */ + private String buildSchema(List metaColumns, String tableName) { + String[] dbTableName = StringUtils.split(tableName, "."); + JSONArray jsonArray = new JSONArray(); + metaColumns.forEach(metaColumn -> { + JSONObject jsonField = new JSONObject(); + jsonField.put(KEY_COLUMN_NAME, metaColumn.getName()); + jsonField.put(KEY_COLUMN_TYPE, metaColumn.getType()); + jsonArray.add(jsonField); + }); + + JSONObject schemaJson = new JSONObject(); + schemaJson.put(KEY_COLUMN_NAME, dbTableName[1]); + schemaJson.put(KEY_COLUMN_TYPE, KEY_TABLE_TYPE_RECORD); + schemaJson.put(KEY_SCHEMA_FIELDS, jsonArray); + + return schemaJson.toString(); + } + +} diff --git a/flinkx-metadata/pom.xml b/flinkx-hudi/pom.xml similarity index 62% rename from flinkx-metadata/pom.xml rename to flinkx-hudi/pom.xml index 56745da6fe..c5b675bf5d 100644 --- a/flinkx-metadata/pom.xml +++ b/flinkx-hudi/pom.xml @@ -7,16 +7,24 @@ com.dtstack.flinkx 1.6 - pom 4.0.0 - - flinkx-metadata - - flinkx-metadata-core - flinkx-metadata-reader + flinkx-hudi-core + flinkx-hudi-writer + flinkx-hudi + pom + + + 0.9.0 + 2.3.1 + + 1.8.2 + 1.11.1 + 9.4.15.v20190215 + + com.dtstack.flinkx diff --git a/flinkx-kafka/flinkx-kafka-reader/pom.xml b/flinkx-kafka/flinkx-kafka-reader/pom.xml index 908f857952..b2b936be42 100644 --- a/flinkx-kafka/flinkx-kafka-reader/pom.xml +++ b/flinkx-kafka/flinkx-kafka-reader/pom.xml @@ -68,7 +68,7 @@ + tofile="${basedir}/../../syncplugins/kafkareader/${project.name}-${package.name}.jar"/> diff --git a/flinkx-kafka/flinkx-kafka-writer/pom.xml b/flinkx-kafka/flinkx-kafka-writer/pom.xml index 652593dd25..84efdd2b15 100644 --- a/flinkx-kafka/flinkx-kafka-writer/pom.xml +++ b/flinkx-kafka/flinkx-kafka-writer/pom.xml @@ -67,7 +67,7 @@ + tofile="${basedir}/../../syncplugins/kafkawriter/${project.name}-${package.name}.jar" /> diff --git a/flinkx-kafka09/flinkx-kafka09-reader/.gitignore b/flinkx-kafka09/flinkx-kafka09-reader/.gitignore deleted file mode 100644 index ca7ca55c4c..0000000000 --- a/flinkx-kafka09/flinkx-kafka09-reader/.gitignore +++ /dev/null @@ -1,13 +0,0 @@ -target -.idea/ -/.idea/* -*.pyc -*.swp -.DS_Store -/target -target -.class -.project -.classpath -*.eclipse.* -*.iml diff --git a/flinkx-kafka09/flinkx-kafka09-reader/src/main/java/com/dtstack/flinkx/kafka09/client/Kafka09Client.java 
b/flinkx-kafka09/flinkx-kafka09-reader/src/main/java/com/dtstack/flinkx/kafka09/client/Kafka09Client.java deleted file mode 100644 index faacd2710d..0000000000 --- a/flinkx-kafka09/flinkx-kafka09-reader/src/main/java/com/dtstack/flinkx/kafka09/client/Kafka09Client.java +++ /dev/null @@ -1,93 +0,0 @@ -/* - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - *

- * http://www.apache.org/licenses/LICENSE-2.0 - *

- * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ -package com.dtstack.flinkx.kafka09.client; - -import com.dtstack.flinkx.decoder.IDecode; -import com.dtstack.flinkx.kafkabase.client.IClient; -import com.dtstack.flinkx.kafkabase.entity.kafkaState; -import com.dtstack.flinkx.kafkabase.format.KafkaBaseInputFormat; -import com.dtstack.flinkx.util.ExceptionUtil; -import kafka.consumer.ConsumerIterator; -import kafka.consumer.KafkaStream; -import kafka.message.MessageAndMetadata; -import org.apache.commons.lang3.tuple.Pair; -import org.slf4j.Logger; -import org.slf4j.LoggerFactory; - -import java.util.Map; - -/** - * Date: 2019/12/25 - * Company: www.dtstack.com - * - * @author tudou - */ -public class Kafka09Client implements IClient { - - private static final Logger LOG = LoggerFactory.getLogger(Kafka09Client.class); - - private volatile boolean running = true; - private KafkaStream mStream; - private IDecode decode; - private KafkaBaseInputFormat format; - - public Kafka09Client(KafkaStream aStream, KafkaBaseInputFormat format) { - this.mStream = aStream; - this.decode = format.getDecode(); - this.format = format; - } - - @Override - public void run() { - Thread.currentThread().setUncaughtExceptionHandler((t, e) -> { - LOG.warn("KafkaClient run failed, Throwable = {}", ExceptionUtil.getErrorMessage(e)); - }); - try { - while (running) { - ConsumerIterator it = mStream.iterator(); - while (it.hasNext()) { - String m = null; - try { - MessageAndMetadata next = it.next(); - processMessage(new String(next.message(), format.getEncoding()), - next.topic(), - next.partition(), - next.offset(), - null); - } catch (Exception e) { - LOG.error("process event = {}, e = {}", m, ExceptionUtil.getErrorMessage(e)); - } - } - } - } catch (Exception t) { - LOG.error("kafka Consumer fetch error, e = {}", ExceptionUtil.getErrorMessage(t)); - } - } - - @Override - public void processMessage(String message, String topic, Integer partition, Long offset, Long timestamp) { - Map event = decode.decode(message); - if (event != null && event.size() > 0) { - format.processEvent(Pair.of(event, new kafkaState(topic, partition, offset, timestamp))); - } - } - - @Override - public void close() { - running = false; - } -} diff --git a/flinkx-kafka09/flinkx-kafka09-reader/src/main/java/com/dtstack/flinkx/kafka09/client/Kafka09Consumer.java b/flinkx-kafka09/flinkx-kafka09-reader/src/main/java/com/dtstack/flinkx/kafka09/client/Kafka09Consumer.java deleted file mode 100644 index 110ef449d5..0000000000 --- a/flinkx-kafka09/flinkx-kafka09-reader/src/main/java/com/dtstack/flinkx/kafka09/client/Kafka09Consumer.java +++ /dev/null @@ -1,45 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - *

- * http://www.apache.org/licenses/LICENSE-2.0 - *

- * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ -package com.dtstack.flinkx.kafka09.client; - -import com.dtstack.flinkx.kafkabase.KafkaInputSplit; -import com.dtstack.flinkx.kafkabase.client.KafkaBaseConsumer; -import com.dtstack.flinkx.kafkabase.format.KafkaBaseInputFormat; -import kafka.consumer.KafkaStream; - -import java.util.Properties; - -/** - * @company: www.dtstack.com - * @author: toutian - * @create: 2019/7/5 - */ -public class Kafka09Consumer extends KafkaBaseConsumer { - private KafkaStream mStream; - - public Kafka09Consumer(KafkaStream aStream) { - super(new Properties()); - this.mStream = aStream; - } - - @Override - public KafkaBaseConsumer createClient(String topic, String group, KafkaBaseInputFormat format, KafkaInputSplit kafkaInputSplit) { - client = new Kafka09Client(mStream, format); - return this; - } -} diff --git a/flinkx-kafka09/flinkx-kafka09-reader/src/main/java/com/dtstack/flinkx/kafka09/format/Kafka09InputFormat.java b/flinkx-kafka09/flinkx-kafka09-reader/src/main/java/com/dtstack/flinkx/kafka09/format/Kafka09InputFormat.java deleted file mode 100644 index bc4bcd9992..0000000000 --- a/flinkx-kafka09/flinkx-kafka09-reader/src/main/java/com/dtstack/flinkx/kafka09/format/Kafka09InputFormat.java +++ /dev/null @@ -1,80 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - *

- * http://www.apache.org/licenses/LICENSE-2.0 - *

- * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ -package com.dtstack.flinkx.kafka09.format; - -import com.dtstack.flinkx.kafka09.client.Kafka09Consumer; -import com.dtstack.flinkx.kafkabase.KafkaInputSplit; -import com.dtstack.flinkx.kafkabase.enums.KafkaVersion; -import com.dtstack.flinkx.kafkabase.format.KafkaBaseInputFormat; -import com.dtstack.flinkx.kafkabase.util.KafkaUtil; -import kafka.consumer.ConsumerConfig; -import kafka.consumer.KafkaStream; -import kafka.javaapi.consumer.ConsumerConnector; -import org.apache.flink.core.io.InputSplit; - -import java.io.IOException; -import java.util.Collections; -import java.util.List; -import java.util.Map; -import java.util.Properties; - -/** - * @company: www.dtstack.com - * @author: toutian - * @create: 2019/7/5 - */ -public class Kafka09InputFormat extends KafkaBaseInputFormat { - - private transient ConsumerConnector consumerConnector; - - @Override - public void openInputFormat() throws IOException { - super.openInputFormat(); - Properties props = KafkaUtil.geneConsumerProp(consumerSettings, mode); - consumerConnector = kafka.consumer.Consumer.createJavaConsumerConnector(new ConsumerConfig(props)); - } - - @Override - protected void openInternal(InputSplit inputSplit) { - Map topicCountMap = Collections.singletonMap(topic, 1); - Map>> consumerMap = consumerConnector.createMessageStreams(topicCountMap); - - List> streams = consumerMap.get(topic); - for (final KafkaStream stream : streams) { - consumer = new Kafka09Consumer(stream); - } - consumer.createClient(topic, groupId, this, (KafkaInputSplit)inputSplit).execute(); - running = true; - } - - @Override - protected void closeInternal() { - if (running) { - consumerConnector.commitOffsets(true); - consumerConnector.shutdown(); - consumer.close(); - running = false; - LOG.warn("input kafka release."); - } - } - - @Override - public KafkaVersion getKafkaVersion() { - return KafkaVersion.kafka09; - } -} diff --git a/flinkx-kafka09/flinkx-kafka09-reader/src/main/java/com/dtstack/flinkx/kafka09/reader/Kafka09Reader.java b/flinkx-kafka09/flinkx-kafka09-reader/src/main/java/com/dtstack/flinkx/kafka09/reader/Kafka09Reader.java deleted file mode 100644 index 31c6c6c5b4..0000000000 --- a/flinkx-kafka09/flinkx-kafka09-reader/src/main/java/com/dtstack/flinkx/kafka09/reader/Kafka09Reader.java +++ /dev/null @@ -1,48 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - *

- * http://www.apache.org/licenses/LICENSE-2.0 - *

- * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ -package com.dtstack.flinkx.kafka09.reader; - -import com.dtstack.flinkx.config.DataTransferConfig; -import com.dtstack.flinkx.kafka09.format.Kafka09InputFormat; -import com.dtstack.flinkx.kafkabase.KafkaConfigKeys; -import com.dtstack.flinkx.kafkabase.format.KafkaBaseInputFormatBuilder; -import com.dtstack.flinkx.kafkabase.reader.KafkaBaseReader; -import org.apache.commons.lang3.StringUtils; -import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; - -/** - * @company: www.dtstack.com - * @author: toutian - * @create: 2019/7/4 - */ -public class Kafka09Reader extends KafkaBaseReader { - - public Kafka09Reader(DataTransferConfig config, StreamExecutionEnvironment env) { - super(config, env); - //兼容历史脚本 - String id = consumerSettings.get(KafkaConfigKeys.GROUP_ID); - if(StringUtils.isNotBlank(id)){ - super.groupId = id; - } - } - - @Override - public KafkaBaseInputFormatBuilder getBuilder(){ - return new KafkaBaseInputFormatBuilder(new Kafka09InputFormat()); - } -} diff --git a/flinkx-kafka09/flinkx-kafka09-writer/.gitignore b/flinkx-kafka09/flinkx-kafka09-writer/.gitignore deleted file mode 100644 index ca7ca55c4c..0000000000 --- a/flinkx-kafka09/flinkx-kafka09-writer/.gitignore +++ /dev/null @@ -1,13 +0,0 @@ -target -.idea/ -/.idea/* -*.pyc -*.swp -.DS_Store -/target -target -.class -.project -.classpath -*.eclipse.* -*.iml diff --git a/flinkx-kafka09/flinkx-kafka09-writer/pom.xml b/flinkx-kafka09/flinkx-kafka09-writer/pom.xml deleted file mode 100644 index cf7820adba..0000000000 --- a/flinkx-kafka09/flinkx-kafka09-writer/pom.xml +++ /dev/null @@ -1,79 +0,0 @@ - - - - flinkx-kafka09 - com.dtstack.flinkx - 1.6 - - 4.0.0 - - flinkx-kafka09-writer - - - - com.dtstack.flinkx - flinkx-kb-writer - 1.6 - - - - - - - org.apache.maven.plugins - maven-shade-plugin - 3.1.0 - - - package - - shade - - - false - - - com.google.common - shade.core.com.google.common - - - com.google.thirdparty - shade.core.com.google.thirdparty - - - - - - - - maven-antrun-plugin - 1.2 - - - copy-resources - - package - - run - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/flinkx-kafka09/flinkx-kafka09-writer/src/main/java/com/dtstack/flinkx/kafka09/format/Kafka09OutputFormat.java b/flinkx-kafka09/flinkx-kafka09-writer/src/main/java/com/dtstack/flinkx/kafka09/format/Kafka09OutputFormat.java deleted file mode 100644 index add067e1c7..0000000000 --- a/flinkx-kafka09/flinkx-kafka09-writer/src/main/java/com/dtstack/flinkx/kafka09/format/Kafka09OutputFormat.java +++ /dev/null @@ -1,108 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - *

- * http://www.apache.org/licenses/LICENSE-2.0 - *

- * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ -package com.dtstack.flinkx.kafka09.format; - -import com.dtstack.flinkx.kafkabase.format.KafkaBaseOutputFormat; -import com.dtstack.flinkx.kafkabase.util.Formatter; -import com.dtstack.flinkx.kafkabase.writer.HeartBeatController; -import com.dtstack.flinkx.util.MapUtil; -import com.dtstack.flinkx.util.TelnetUtil; -import org.apache.flink.configuration.Configuration; -import org.apache.kafka.clients.producer.KafkaProducer; -import org.apache.kafka.clients.producer.ProducerRecord; -import org.apache.kafka.clients.producer.internals.DefaultPartitioner; - -import java.io.IOException; -import java.util.Map; -import java.util.Objects; -import java.util.concurrent.TimeUnit; - -/** - * @company: www.dtstack.com - * @author: toutian - * @create: 2019/7/5 - */ -public class Kafka09OutputFormat extends KafkaBaseOutputFormat { - - private String encoding; - private String brokerList; - private transient KafkaProducer producer; - private HeartBeatController heartBeatController; - - @Override - public void configure(Configuration parameters) { - props.put("key.serializer", org.apache.kafka.common.serialization.StringSerializer.class.getName()); - props.put("value.serializer", org.apache.kafka.common.serialization.StringSerializer.class.getName()); - props.put("producer.type", "sync"); - props.put("compression.codec", "none"); - props.put("request.required.acks", "1"); - props.put("batch.num.messages", "1024"); - props.put("partitioner.class", DefaultPartitioner.class.getName()); - - props.put("client.id", ""); - - if (producerSettings != null) { - props.putAll(producerSettings); - } - props.put("metadata.broker.list", brokerList); - producer = new KafkaProducer<>(props); - - LOG.info("brokerList {}", brokerList); - String broker = brokerList.split(",")[0]; - String[] split = broker.split(":"); - - try { - TelnetUtil.telnet(split[0], Integer.parseInt(split[1])); - }catch (Exception e){ - throw new RuntimeException("telnet error, brokerList = " + brokerList); - } - } - - @Override - protected void emit(Map event) throws IOException { - heartBeatController.acquire(); - String tp = Formatter.format(event, topic, timezone); - producer.send(new ProducerRecord<>(tp, event.toString(), MapUtil.writeValueAsString(event)), (metadata, exception) -> { - if (Objects.nonNull(exception)) { - LOG.warn("kafka writeSingleRecordInternal error:{}", exception.getMessage(), exception); - heartBeatController.onFailed(exception); - } else { - heartBeatController.onSuccess(); - } - }); - } - - @Override - public void closeInternal() { - LOG.info("kafka output closeInternal."); - //未设置具体超时时间 关闭时间默认是long.value 导致整个方法长时间等待关闭不了,因此明确指定20s时间 - producer.close(KafkaBaseOutputFormat.CLOSE_TIME, TimeUnit.MILLISECONDS); - } - - public void setEncoding(String encoding) { - this.encoding = encoding; - } - - public void setBrokerList(String brokerList) { - this.brokerList = brokerList; - } - - public void setHeartBeatController(HeartBeatController heartBeatController) { - this.heartBeatController = heartBeatController; - } -} diff --git a/flinkx-kafka09/flinkx-kafka09-writer/src/main/java/com/dtstack/flinkx/kafka09/writer/Kafka09Writer.java 
b/flinkx-kafka09/flinkx-kafka09-writer/src/main/java/com/dtstack/flinkx/kafka09/writer/Kafka09Writer.java deleted file mode 100644 index 4caa61ba91..0000000000 --- a/flinkx-kafka09/flinkx-kafka09-writer/src/main/java/com/dtstack/flinkx/kafka09/writer/Kafka09Writer.java +++ /dev/null @@ -1,70 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ -package com.dtstack.flinkx.kafka09.writer; - -import com.dtstack.flinkx.config.DataTransferConfig; -import com.dtstack.flinkx.config.WriterConfig; -import com.dtstack.flinkx.kafka09.format.Kafka09OutputFormat; -import com.dtstack.flinkx.kafkabase.KafkaConfigKeys; -import com.dtstack.flinkx.kafkabase.writer.HeartBeatController; -import com.dtstack.flinkx.kafkabase.writer.KafkaBaseWriter; -import org.apache.commons.lang.StringUtils; -import org.apache.flink.streaming.api.datastream.DataStream; -import org.apache.flink.streaming.api.datastream.DataStreamSink; -import org.apache.flink.types.Row; - -import java.nio.charset.StandardCharsets; - -/** - * @company: www.dtstack.com - * @author: toutian - * @create: 2019/7/4 - */ -public class Kafka09Writer extends KafkaBaseWriter { - - private String encoding; - private String brokerList; - - public Kafka09Writer(DataTransferConfig config) { - super(config); - WriterConfig writerConfig = config.getJob().getContent().get(0).getWriter(); - encoding = writerConfig.getParameter().getStringVal(KafkaConfigKeys.KEY_ENCODING, StandardCharsets.UTF_8.name()); - brokerList = writerConfig.getParameter().getStringVal(KafkaConfigKeys.KEY_BROKER_LIST); - if (StringUtils.isBlank(brokerList)) { - throw new RuntimeException("brokerList can not be empty!"); - } - } - - @Override - public DataStreamSink writeData(DataStream dataSet) { - Kafka09OutputFormat format = new Kafka09OutputFormat(); - format.setTimezone(timezone); - format.setEncoding(encoding); - format.setTopic(topic); - format.setTableFields(tableFields); - format.setBrokerList(brokerList); - format.setProducerSettings(producerSettings); - format.setRestoreConfig(restoreConfig); - format.setHeartBeatController(new HeartBeatController()); - - format.setDirtyPath(dirtyPath); - format.setDirtyHadoopConfig(dirtyHadoopConfig); - format.setSrcFieldNames(srcCols); - return createOutput(dataSet, format); - } -} diff --git a/flinkx-kafka09/pom.xml b/flinkx-kafka09/pom.xml deleted file mode 100644 index 3a8a141fcc..0000000000 --- a/flinkx-kafka09/pom.xml +++ /dev/null @@ -1,74 +0,0 @@ - - - - flinkx-all - com.dtstack.flinkx - 1.6 - - 4.0.0 - pom - - flinkx-kafka09 - - - flinkx-kafka09-reader - flinkx-kafka09-writer - - - - - com.dtstack.flinkx - flinkx-core - 1.6 - provided - - - ch.qos.logback - logback-classic - - - ch.qos.logback - logback-core - - - - - org.apache.kafka - kafka_2.11 - 0.9.0.1 - - - slf4j-api - 
org.slf4j - - - slf4j-log4j12 - org.slf4j - - - log4j - log4j - - - scala-library - org.scala-lang - - - netty - io.netty - - - snappy-java - org.xerial.snappy - - - junit - junit - - - - - - \ No newline at end of file diff --git a/flinkx-kafka10/flinkx-kafka10-reader/pom.xml b/flinkx-kafka10/flinkx-kafka10-reader/pom.xml index 6d1d444762..9170712ee0 100644 --- a/flinkx-kafka10/flinkx-kafka10-reader/pom.xml +++ b/flinkx-kafka10/flinkx-kafka10-reader/pom.xml @@ -67,7 +67,7 @@ + tofile="${basedir}/../../syncplugins/kafka10reader/${project.name}-${package.name}.jar" /> diff --git a/flinkx-kafka10/flinkx-kafka10-writer/pom.xml b/flinkx-kafka10/flinkx-kafka10-writer/pom.xml index 44a97b4ce4..40234ddd1c 100644 --- a/flinkx-kafka10/flinkx-kafka10-writer/pom.xml +++ b/flinkx-kafka10/flinkx-kafka10-writer/pom.xml @@ -67,7 +67,7 @@ + tofile="${basedir}/../../syncplugins/kafka10writer/${project.name}-${package.name}.jar" /> diff --git a/flinkx-kafka11/flinkx-kafka11-reader/pom.xml b/flinkx-kafka11/flinkx-kafka11-reader/pom.xml index 280ae7bf26..4ce9b9fd74 100644 --- a/flinkx-kafka11/flinkx-kafka11-reader/pom.xml +++ b/flinkx-kafka11/flinkx-kafka11-reader/pom.xml @@ -67,7 +67,7 @@ + tofile="${basedir}/../../syncplugins/kafka11reader/${project.name}-${package.name}.jar"/> diff --git a/flinkx-kafka11/flinkx-kafka11-reader/src/main/java/com/dtstack/flinkx/kafka11/client/Kafka11Client.java b/flinkx-kafka11/flinkx-kafka11-reader/src/main/java/com/dtstack/flinkx/kafka11/client/Kafka11Client.java index 2b6ff0bf0d..007afbc7c5 100644 --- a/flinkx-kafka11/flinkx-kafka11-reader/src/main/java/com/dtstack/flinkx/kafka11/client/Kafka11Client.java +++ b/flinkx-kafka11/flinkx-kafka11-reader/src/main/java/com/dtstack/flinkx/kafka11/client/Kafka11Client.java @@ -25,7 +25,7 @@ * @author tudou */ public class Kafka11Client implements IClient { - private static Logger LOG = LoggerFactory.getLogger(Kafka11Consumer.class); + private static Logger LOG = LoggerFactory.getLogger(Kafka11Client.class); private volatile boolean running = true; private long pollTimeout; private boolean blankIgnore; diff --git a/flinkx-kafka11/flinkx-kafka11-writer/pom.xml b/flinkx-kafka11/flinkx-kafka11-writer/pom.xml index 7ca967cbd6..907e45992d 100644 --- a/flinkx-kafka11/flinkx-kafka11-writer/pom.xml +++ b/flinkx-kafka11/flinkx-kafka11-writer/pom.xml @@ -68,7 +68,7 @@ + tofile="${basedir}/../../syncplugins/kafka11writer/${project.name}-${package.name}.jar" /> diff --git a/flinkx-kb/flinkx-kb-core/src/main/java/com/dtstack/flinkx/kafkabase/KafkaConfigKeys.java b/flinkx-kb/flinkx-kb-core/src/main/java/com/dtstack/flinkx/kafkabase/KafkaConfigKeys.java index d24df8c172..7d44c2bd0f 100755 --- a/flinkx-kb/flinkx-kb-core/src/main/java/com/dtstack/flinkx/kafkabase/KafkaConfigKeys.java +++ b/flinkx-kb/flinkx-kb-core/src/main/java/com/dtstack/flinkx/kafkabase/KafkaConfigKeys.java @@ -44,11 +44,6 @@ public class KafkaConfigKeys { public static final String KEY_OFFSET = "offset"; public static final String KEY_TIMESTAMP = "timestamp"; public static List KEY_ASSIGNER_DEFAULT_RULE = Arrays.asList("database", "schema", "table"); - /** - * kafka 09 - */ - public static final String KEY_ENCODING = "encoding"; - public static final String KEY_BROKER_LIST = "brokerList"; public static final String GROUP_ID = "group.id"; diff --git a/flinkx-kb/flinkx-kb-core/src/main/java/com/dtstack/flinkx/kafkabase/entity/kafkaState.java b/flinkx-kb/flinkx-kb-core/src/main/java/com/dtstack/flinkx/kafkabase/entity/kafkaState.java index 7ee7669f54..ca8e7babb9 100644 --- 
a/flinkx-kb/flinkx-kb-core/src/main/java/com/dtstack/flinkx/kafkabase/entity/kafkaState.java +++ b/flinkx-kb/flinkx-kb-core/src/main/java/com/dtstack/flinkx/kafkabase/entity/kafkaState.java @@ -78,9 +78,9 @@ public boolean equals(Object o) { if (this == o) return true; if (o == null || getClass() != o.getClass()) return false; kafkaState that = (kafkaState) o; - return partition == that.partition && - offset == that.offset && - timestamp == that.timestamp && + return offset.equals(that.offset) && + timestamp.equals(that.timestamp) && + partition.equals(that.partition) && topic.equals(that.topic); } diff --git a/flinkx-kb/flinkx-kb-reader/src/main/java/com/dtstack/flinkx/kafkabase/format/KafkaBaseInputFormat.java b/flinkx-kb/flinkx-kb-reader/src/main/java/com/dtstack/flinkx/kafkabase/format/KafkaBaseInputFormat.java index 5f09e75d31..1daeee612c 100644 --- a/flinkx-kb/flinkx-kb-reader/src/main/java/com/dtstack/flinkx/kafkabase/format/KafkaBaseInputFormat.java +++ b/flinkx-kb/flinkx-kb-reader/src/main/java/com/dtstack/flinkx/kafkabase/format/KafkaBaseInputFormat.java @@ -59,7 +59,6 @@ public class KafkaBaseInputFormat extends BaseRichInputFormat { protected String groupId; protected String codec; protected boolean blankIgnore; - protected String encoding; protected StartupMode mode; protected String offset; protected Long timestamp; @@ -174,10 +173,6 @@ public Object getState(){ return formatState == null ? null : formatState.getState(); } - public String getEncoding() { - return encoding; - } - public IDecode getDecode() { return decode; } diff --git a/flinkx-kb/flinkx-kb-reader/src/main/java/com/dtstack/flinkx/kafkabase/format/KafkaBaseInputFormatBuilder.java b/flinkx-kb/flinkx-kb-reader/src/main/java/com/dtstack/flinkx/kafkabase/format/KafkaBaseInputFormatBuilder.java index b140a6747d..f46655dbf6 100644 --- a/flinkx-kb/flinkx-kb-reader/src/main/java/com/dtstack/flinkx/kafkabase/format/KafkaBaseInputFormatBuilder.java +++ b/flinkx-kb/flinkx-kb-reader/src/main/java/com/dtstack/flinkx/kafkabase/format/KafkaBaseInputFormatBuilder.java @@ -64,10 +64,6 @@ public void setConsumerSettings(Map consumerSettings) { format.consumerSettings = consumerSettings; } - public void setEncoding(String encoding) { - format.encoding = encoding; - } - public void setMode(StartupMode mode) { format.mode = mode; } diff --git a/flinkx-kb/flinkx-kb-reader/src/main/java/com/dtstack/flinkx/kafkabase/reader/KafkaBaseReader.java b/flinkx-kb/flinkx-kb-reader/src/main/java/com/dtstack/flinkx/kafkabase/reader/KafkaBaseReader.java index dacfc4ba0e..614dbbd7ae 100644 --- a/flinkx-kb/flinkx-kb-reader/src/main/java/com/dtstack/flinkx/kafkabase/reader/KafkaBaseReader.java +++ b/flinkx-kb/flinkx-kb-reader/src/main/java/com/dtstack/flinkx/kafkabase/reader/KafkaBaseReader.java @@ -29,7 +29,6 @@ import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; import org.apache.flink.types.Row; -import java.nio.charset.StandardCharsets; import java.util.List; import java.util.Map; @@ -59,7 +58,6 @@ public KafkaBaseReader(DataTransferConfig config, StreamExecutionEnvironment env groupId = readerConfig.getParameter().getStringVal(KafkaConfigKeys.KEY_GROUP_ID, "default"); codec = readerConfig.getParameter().getStringVal(KafkaConfigKeys.KEY_CODEC, "text"); blankIgnore = readerConfig.getParameter().getBooleanVal(KafkaConfigKeys.KEY_BLANK_IGNORE, false); - encoding = readerConfig.getParameter().getStringVal(KafkaConfigKeys.KEY_ENCODING, StandardCharsets.UTF_8.name()); mode = 
readerConfig.getParameter().getStringVal(KafkaConfigKeys.KEY_MODE, StartupMode.GROUP_OFFSETS.name); offset = readerConfig.getParameter().getStringVal(KafkaConfigKeys.KEY_OFFSET, ""); timestamp = readerConfig.getParameter().getLongVal(KafkaConfigKeys.KEY_TIMESTAMP, -1L); @@ -76,7 +74,6 @@ public DataStream readData() { builder.setGroupId(groupId); builder.setCodec(codec); builder.setBlankIgnore(blankIgnore); - builder.setEncoding(encoding); builder.setConsumerSettings(consumerSettings); builder.setMode(StartupMode.getFromName(mode)); builder.setOffset(offset); diff --git a/flinkx-kingbase/flinkx-kingbase-core/pom.xml b/flinkx-kingbase/flinkx-kingbase-core/pom.xml index 7ce87ab996..899f3ecdfd 100644 --- a/flinkx-kingbase/flinkx-kingbase-core/pom.xml +++ b/flinkx-kingbase/flinkx-kingbase-core/pom.xml @@ -1,6 +1,6 @@ - flinkx-kingbase diff --git a/flinkx-kingbase/flinkx-kingbase-reader/pom.xml b/flinkx-kingbase/flinkx-kingbase-reader/pom.xml index 613433025e..38c18a7e05 100644 --- a/flinkx-kingbase/flinkx-kingbase-reader/pom.xml +++ b/flinkx-kingbase/flinkx-kingbase-reader/pom.xml @@ -1,6 +1,6 @@ - flinkx-kingbase @@ -94,7 +94,7 @@ + tofile="${basedir}/../../syncplugins/kingbasereader/${project.name}-${package.name}.jar" /> diff --git a/flinkx-kingbase/flinkx-kingbase-writer/pom.xml b/flinkx-kingbase/flinkx-kingbase-writer/pom.xml index a4a8831300..5a1bd1a2bc 100644 --- a/flinkx-kingbase/flinkx-kingbase-writer/pom.xml +++ b/flinkx-kingbase/flinkx-kingbase-writer/pom.xml @@ -1,6 +1,6 @@ - flinkx-kingbase @@ -94,7 +94,7 @@ + tofile="${basedir}/../../syncplugins/kingbasewriter/${project.name}-${package.name}.jar" /> diff --git a/flinkx-kingbase/pom.xml b/flinkx-kingbase/pom.xml index 9915d66707..14ceabbcf7 100644 --- a/flinkx-kingbase/pom.xml +++ b/flinkx-kingbase/pom.xml @@ -1,6 +1,6 @@ - flinkx-all diff --git a/flinkx-kudu/flinkx-kudu-reader/pom.xml b/flinkx-kudu/flinkx-kudu-reader/pom.xml index f3ba6a805a..53568ff52a 100644 --- a/flinkx-kudu/flinkx-kudu-reader/pom.xml +++ b/flinkx-kudu/flinkx-kudu-reader/pom.xml @@ -82,7 +82,7 @@ + tofile="${basedir}/../../syncplugins/kudureader/${project.name}-${package.name}.jar" /> diff --git a/flinkx-kudu/flinkx-kudu-writer/pom.xml b/flinkx-kudu/flinkx-kudu-writer/pom.xml index c1a0afdb78..c0030ab1bd 100644 --- a/flinkx-kudu/flinkx-kudu-writer/pom.xml +++ b/flinkx-kudu/flinkx-kudu-writer/pom.xml @@ -82,7 +82,7 @@ + tofile="${basedir}/../../syncplugins/kuduwriter/${project.name}-${package.name}.jar" /> diff --git a/flinkx-launcher/pom.xml b/flinkx-launcher/pom.xml index 9c52b9d958..8d26b84b5d 100644 --- a/flinkx-launcher/pom.xml +++ b/flinkx-launcher/pom.xml @@ -13,9 +13,9 @@ - ch.qos.logback - logback-classic - 1.1.7 + ch.qos.logback + logback-classic + 1.1.7 @@ -30,8 +30,6 @@ - - com.google.code.gson gson diff --git a/flinkx-launcher/src/main/java/com/dtstack/flinkx/launcher/Launcher.java b/flinkx-launcher/src/main/java/com/dtstack/flinkx/launcher/Launcher.java index 2a464515f5..4d6c46c039 100644 --- a/flinkx-launcher/src/main/java/com/dtstack/flinkx/launcher/Launcher.java +++ b/flinkx-launcher/src/main/java/com/dtstack/flinkx/launcher/Launcher.java @@ -124,13 +124,14 @@ public static JobGraph buildJobGraph(Options launcherOptions, String[] remoteArg } PackagedProgram program = PackagedProgram.newBuilder() .setJarFile(jarFile) - .setUserClassPaths(urlList) .setEntryPointClassName(MAIN_CLASS) .setConfiguration(launcherOptions.loadFlinkConfiguration()) .setSavepointRestoreSettings(savepointRestoreSettings) .setArguments(remoteArgs) .build(); - return 
PackagedProgramUtils.createJobGraph(program, launcherOptions.loadFlinkConfiguration(), Integer.parseInt(launcherOptions.getParallelism()), false); + JobGraph jobGraph = PackagedProgramUtils.createJobGraph(program, launcherOptions.loadFlinkConfiguration(), Integer.parseInt(launcherOptions.getParallelism()), false); + jobGraph.addJars(urlList); + return jobGraph; } public static List analyzeUserClasspath(String content, String pluginRoot) { diff --git a/flinkx-launcher/src/main/java/com/dtstack/flinkx/launcher/perJob/FlinkPerJobUtil.java b/flinkx-launcher/src/main/java/com/dtstack/flinkx/launcher/perJob/FlinkPerJobUtil.java index 2ee9028014..5718ba229a 100644 --- a/flinkx-launcher/src/main/java/com/dtstack/flinkx/launcher/perJob/FlinkPerJobUtil.java +++ b/flinkx-launcher/src/main/java/com/dtstack/flinkx/launcher/perJob/FlinkPerJobUtil.java @@ -43,9 +43,9 @@ public class FlinkPerJobUtil { * the minimum memory should be higher than the min heap cutoff */ public final static int MIN_JM_MEMORY = 768; - public final static int MIN_TM_MEMORY = 768; - public final static String JOBMANAGER_MEMORY_MB = "jobmanager.memory.mb"; - public final static String TASKMANAGER_MEMORY_MB = "taskmanager.memory.mb"; + public final static int MIN_TM_MEMORY = 1024; + public final static String JOBMANAGER_MEMORY_MB = "jobmanager.memory.process.size"; + public final static String TASKMANAGER_MEMORY_MB = "taskmanager.memory.process.size"; public final static String SLOTS_PER_TASKMANAGER = "taskmanager.slots"; private static final Logger LOG = LoggerFactory.getLogger(FlinkPerJobUtil.class); @@ -56,16 +56,16 @@ public class FlinkPerJobUtil { * @return */ public static ClusterSpecification createClusterSpecification(Properties conProp) { - int jobmanagerMemoryMb = 768; - int taskmanagerMemoryMb = 768; + int jobManagerMemoryMb = 768; + int taskManagerMemoryMb = 1024; int slotsPerTaskManager = 1; if (conProp != null) { - if (conProp.containsKey(JOBMANAGER_MEMORY_MB)) { - jobmanagerMemoryMb = Math.max(MIN_JM_MEMORY, ValueUtil.getInt(conProp.getProperty(JOBMANAGER_MEMORY_MB))); + if (conProp.containsKey(JOBMANAGER_MEMORY_MB)) { + jobManagerMemoryMb = Math.max(MIN_JM_MEMORY, ValueUtil.getInt(conProp.getProperty(JOBMANAGER_MEMORY_MB))); } - if (conProp.containsKey(TASKMANAGER_MEMORY_MB)) { - taskmanagerMemoryMb = Math.max(MIN_JM_MEMORY, ValueUtil.getInt(conProp.getProperty(TASKMANAGER_MEMORY_MB))); + if (conProp.containsKey(TASKMANAGER_MEMORY_MB)) { + taskManagerMemoryMb = Math.max(MIN_TM_MEMORY, ValueUtil.getInt(conProp.getProperty(TASKMANAGER_MEMORY_MB))); } if (conProp.containsKey(SLOTS_PER_TASKMANAGER)) { slotsPerTaskManager = ValueUtil.getInt(conProp.get(SLOTS_PER_TASKMANAGER)); @@ -73,8 +73,8 @@ public static ClusterSpecification createClusterSpecification(Properties conProp } return new ClusterSpecification.ClusterSpecificationBuilder() - .setMasterMemoryMB(jobmanagerMemoryMb) - .setTaskManagerMemoryMB(taskmanagerMemoryMb) + .setMasterMemoryMB(jobManagerMemoryMb) + .setTaskManagerMemoryMB(taskManagerMemoryMb) + .setSlotsPerTaskManager(slotsPerTaskManager) .createClusterSpecification(); } diff --git a/flinkx-launcher/src/main/java/com/dtstack/flinkx/launcher/perJob/PerJobClusterClientBuilder.java b/flinkx-launcher/src/main/java/com/dtstack/flinkx/launcher/perJob/PerJobClusterClientBuilder.java index cf8c8caef5..f13251a5ed 100644 --- a/flinkx-launcher/src/main/java/com/dtstack/flinkx/launcher/perJob/PerJobClusterClientBuilder.java +++
b/flinkx-launcher/src/main/java/com/dtstack/flinkx/launcher/perJob/PerJobClusterClientBuilder.java @@ -26,8 +26,8 @@ import org.apache.flink.runtime.security.SecurityUtils; import org.apache.flink.yarn.YarnClientYarnClusterInformationRetriever; import org.apache.flink.yarn.YarnClusterDescriptor; -import org.apache.flink.yarn.cli.FlinkYarnSessionCli; import org.apache.flink.yarn.configuration.YarnConfigOptionsInternal; +import org.apache.flink.yarn.configuration.YarnLogConfigUtil; import org.apache.hadoop.fs.Path; import org.apache.hadoop.yarn.client.api.YarnClient; import org.apache.hadoop.yarn.conf.YarnConfiguration; @@ -99,13 +99,13 @@ public YarnClusterDescriptor createPerJobClusterDescriptor(Options launcherOptio throw new IllegalArgumentException("The Flink jar path is null"); } - File log4j = new File(launcherOptions.getFlinkconf()+ File.separator + FlinkYarnSessionCli.CONFIG_FILE_LOG4J_NAME); + File log4j = new File(launcherOptions.getFlinkconf()+ File.separator + YarnLogConfigUtil.CONFIG_FILE_LOG4J_NAME); if(log4j.exists()){ - flinkConfig.setString(YarnConfigOptionsInternal.APPLICATION_LOG_CONFIG_FILE, launcherOptions.getFlinkconf()+ File.separator + FlinkYarnSessionCli.CONFIG_FILE_LOG4J_NAME); + flinkConfig.setString(YarnConfigOptionsInternal.APPLICATION_LOG_CONFIG_FILE, launcherOptions.getFlinkconf()+ File.separator + YarnLogConfigUtil.CONFIG_FILE_LOG4J_NAME); } else{ - File logback = new File(launcherOptions.getFlinkconf()+ File.separator + FlinkYarnSessionCli.CONFIG_FILE_LOGBACK_NAME); + File logback = new File(launcherOptions.getFlinkconf()+ File.separator + YarnLogConfigUtil.CONFIG_FILE_LOGBACK_NAME); if(logback.exists()){ - flinkConfig.setString(YarnConfigOptionsInternal.APPLICATION_LOG_CONFIG_FILE, launcherOptions.getFlinkconf()+ File.separator + FlinkYarnSessionCli.CONFIG_FILE_LOGBACK_NAME); + flinkConfig.setString(YarnConfigOptionsInternal.APPLICATION_LOG_CONFIG_FILE, launcherOptions.getFlinkconf()+ File.separator + YarnLogConfigUtil.CONFIG_FILE_LOGBACK_NAME); } } diff --git a/flinkx-launcher/src/main/java/org/apache/flink/client/deployment/ClusterSpecification.java b/flinkx-launcher/src/main/java/org/apache/flink/client/deployment/ClusterSpecification.java index 3f37518c41..f95543a97f 100644 --- a/flinkx-launcher/src/main/java/org/apache/flink/client/deployment/ClusterSpecification.java +++ b/flinkx-launcher/src/main/java/org/apache/flink/client/deployment/ClusterSpecification.java @@ -19,10 +19,7 @@ package org.apache.flink.client.deployment; import org.apache.flink.client.program.PackagedProgram; -import org.apache.flink.configuration.ConfigConstants; import org.apache.flink.configuration.Configuration; -import org.apache.flink.configuration.JobManagerOptions; -import org.apache.flink.configuration.TaskManagerOptions; import org.apache.flink.runtime.jobgraph.JobGraph; import org.apache.flink.runtime.jobgraph.SavepointRestoreSettings; import org.apache.hadoop.yarn.conf.YarnConfiguration; @@ -62,20 +59,6 @@ private ClusterSpecification(int masterMemoryMB, int taskManagerMemoryMB, int nu this.priority = priority; } - public static ClusterSpecification fromConfiguration(Configuration configuration) { - int slots = configuration.getInteger(ConfigConstants.TASK_MANAGER_NUM_TASK_SLOTS, 1); - - int jobManagerMemoryMb = configuration.getInteger(JobManagerOptions.JOB_MANAGER_HEAP_MEMORY_MB); - int taskManagerMemoryMb = configuration.getInteger(TaskManagerOptions.TASK_MANAGER_HEAP_MEMORY_MB); - - return new ClusterSpecificationBuilder() - .setMasterMemoryMB(jobManagerMemoryMb) - 
.setTaskManagerMemoryMB(taskManagerMemoryMb) - .setNumberTaskManagers(1) - .setSlotsPerTaskManager(slots) - .createClusterSpecification(); - } - public PackagedProgram getProgram() { return program; } @@ -200,7 +183,7 @@ public String toString() { */ public static class ClusterSpecificationBuilder { private int masterMemoryMB = 768; - private int taskManagerMemoryMB = 768; + private int taskManagerMemoryMB = 1024; private int numberTaskManagers = 1; private int slotsPerTaskManager = 1; private int parallelism = 1; diff --git a/flinkx-launcher/src/main/java/org/apache/flink/yarn/YarnClusterDescriptor.java b/flinkx-launcher/src/main/java/org/apache/flink/yarn/YarnClusterDescriptor.java index 3827005790..3d13479ef5 100644 --- a/flinkx-launcher/src/main/java/org/apache/flink/yarn/YarnClusterDescriptor.java +++ b/flinkx-launcher/src/main/java/org/apache/flink/yarn/YarnClusterDescriptor.java @@ -18,6 +18,9 @@ package org.apache.flink.yarn; +import com.dtstack.flinkx.constants.ConfigConstant; +import com.dtstack.flinkx.constants.ConstantValue; +import com.dtstack.flinkx.launcher.perJob.FlinkPerJobUtil; import org.apache.commons.lang3.StringUtils; import org.apache.flink.annotation.VisibleForTesting; import org.apache.flink.api.common.cache.DistributedCache; @@ -26,32 +29,60 @@ import org.apache.flink.client.deployment.ClusterDescriptor; import org.apache.flink.client.deployment.ClusterRetrieveException; import org.apache.flink.client.deployment.ClusterSpecification; +import org.apache.flink.client.deployment.application.ApplicationConfiguration; import org.apache.flink.client.program.ClusterClientProvider; import org.apache.flink.client.program.PackagedProgram; import org.apache.flink.client.program.PackagedProgramUtils; import org.apache.flink.client.program.rest.RestClusterClient; -import org.apache.flink.configuration.*; +import org.apache.flink.configuration.ConfigConstants; +import org.apache.flink.configuration.ConfigUtils; +import org.apache.flink.configuration.Configuration; +import org.apache.flink.configuration.ConfigurationUtils; +import org.apache.flink.configuration.CoreOptions; +import org.apache.flink.configuration.HighAvailabilityOptions; +import org.apache.flink.configuration.IllegalConfigurationException; +import org.apache.flink.configuration.JobManagerOptions; +import org.apache.flink.configuration.PipelineOptions; +import org.apache.flink.configuration.ResourceManagerOptions; +import org.apache.flink.configuration.RestOptions; +import org.apache.flink.configuration.SecurityOptions; import org.apache.flink.core.plugin.PluginConfig; import org.apache.flink.core.plugin.PluginUtils; import org.apache.flink.runtime.clusterframework.BootstrapTools; import org.apache.flink.runtime.entrypoint.ClusterEntrypoint; import org.apache.flink.runtime.jobgraph.JobGraph; import org.apache.flink.runtime.jobmanager.HighAvailabilityMode; +import org.apache.flink.runtime.jobmanager.JobManagerProcessSpec; +import org.apache.flink.runtime.jobmanager.JobManagerProcessUtils; +import org.apache.flink.runtime.util.HadoopUtils; import org.apache.flink.util.FlinkException; import org.apache.flink.util.Preconditions; import org.apache.flink.util.ShutdownHookUtil; import org.apache.flink.yarn.configuration.YarnConfigOptions; import org.apache.flink.yarn.configuration.YarnConfigOptionsInternal; +import org.apache.flink.yarn.configuration.YarnDeploymentTarget; +import org.apache.flink.yarn.configuration.YarnLogConfigUtil; +import org.apache.flink.yarn.entrypoint.YarnApplicationClusterEntryPoint; import 
org.apache.flink.yarn.entrypoint.YarnJobClusterEntrypoint; import org.apache.flink.yarn.entrypoint.YarnSessionClusterEntrypoint; import org.apache.hadoop.fs.FileSystem; import org.apache.hadoop.fs.Path; -import org.apache.hadoop.fs.permission.FsAction; -import org.apache.hadoop.fs.permission.FsPermission; +import org.apache.hadoop.hdfs.DFSConfigKeys; import org.apache.hadoop.security.UserGroupInformation; import org.apache.hadoop.yarn.api.ApplicationConstants; import org.apache.hadoop.yarn.api.protocolrecords.GetNewApplicationResponse; -import org.apache.hadoop.yarn.api.records.*; +import org.apache.hadoop.yarn.api.records.ApplicationId; +import org.apache.hadoop.yarn.api.records.ApplicationReport; +import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext; +import org.apache.hadoop.yarn.api.records.ContainerLaunchContext; +import org.apache.hadoop.yarn.api.records.FinalApplicationStatus; +import org.apache.hadoop.yarn.api.records.NodeReport; +import org.apache.hadoop.yarn.api.records.NodeState; +import org.apache.hadoop.yarn.api.records.Priority; +import org.apache.hadoop.yarn.api.records.QueueInfo; +import org.apache.hadoop.yarn.api.records.Resource; +import org.apache.hadoop.yarn.api.records.YarnApplicationState; +import org.apache.hadoop.yarn.api.records.YarnClusterMetrics; import org.apache.hadoop.yarn.client.api.YarnClient; import org.apache.hadoop.yarn.client.api.YarnClientApplication; import org.apache.hadoop.yarn.conf.YarnConfiguration; @@ -62,30 +93,36 @@ import org.slf4j.LoggerFactory; import javax.annotation.Nullable; -import java.io.*; +import java.io.ByteArrayOutputStream; +import java.io.File; +import java.io.FileOutputStream; +import java.io.IOException; +import java.io.ObjectOutputStream; +import java.io.PrintStream; +import java.io.UnsupportedEncodingException; import java.lang.reflect.InvocationTargetException; import java.lang.reflect.Method; +import java.net.URI; import java.net.URISyntaxException; import java.net.URLDecoder; import java.nio.charset.Charset; -import java.nio.file.FileVisitResult; -import java.nio.file.Files; -import java.nio.file.SimpleFileVisitor; -import java.nio.file.attribute.BasicFileAttributes; -import java.util.*; +import java.util.Collection; +import java.util.Collections; +import java.util.HashMap; +import java.util.HashSet; +import java.util.LinkedList; +import java.util.List; +import java.util.Map; +import java.util.Optional; +import java.util.Set; import java.util.stream.Collectors; -import static com.dtstack.flinkx.constants.ConfigConstant.FLINK_PLUGIN_LOAD_MODE_KEY; -import static com.dtstack.flinkx.constants.ConstantValue.SHIP_FILE_PLUGIN_LOAD_MODE; -import static com.dtstack.flinkx.launcher.perJob.FlinkPerJobUtil.buildProgram; -import static com.dtstack.flinkx.launcher.perJob.FlinkPerJobUtil.getUrlFormat; import static org.apache.flink.configuration.ConfigConstants.DEFAULT_FLINK_USR_LIB_DIR; import static org.apache.flink.configuration.ConfigConstants.ENV_FLINK_LIB_DIR; import static org.apache.flink.runtime.entrypoint.component.FileJobGraphRetriever.JOB_GRAPH_FILE_PATH; import static org.apache.flink.util.Preconditions.checkArgument; import static org.apache.flink.util.Preconditions.checkNotNull; -import static org.apache.flink.yarn.cli.FlinkYarnSessionCli.CONFIG_FILE_LOG4J_NAME; -import static org.apache.flink.yarn.cli.FlinkYarnSessionCli.CONFIG_FILE_LOGBACK_NAME; +import static org.apache.flink.yarn.YarnConfigKeys.LOCAL_RESOURCE_DESCRIPTOR_SEPARATOR; /** * The descriptor with deployment information for deploying a Flink 
cluster on Yarn. @@ -106,11 +143,17 @@ public class YarnClusterDescriptor implements ClusterDescriptor { private final List shipFiles = new LinkedList<>(); private final String yarnQueue; + + private Path flinkJarPath; + private final Configuration flinkConfiguration; + private final String customName; + private final String nodeLabel; + private final String applicationType; - private Path flinkJarPath; + private String zookeeperNamespace; private YarnConfigOptions.UserJarInclusion userJarInclusion; @@ -142,161 +185,6 @@ public YarnClusterDescriptor( this.zookeeperNamespace = flinkConfiguration.getString(HighAvailabilityOptions.HA_CLUSTER_ID, null); } - /** - * Uploads and registers a single resource and adds it to localResources. - * - * @param key - * the key to add the resource under - * @param fs - * the remote file system to upload to - * @param appId - * application ID - * @param localSrcPath - * local path to the file - * @param localResources - * map of resources - * - * @return the remote path to the uploaded resource - */ - private static Path setupSingleLocalResource( - String key, - FileSystem fs, - ApplicationId appId, - Path localSrcPath, - Map localResources, - Path targetHomeDir, - String relativeTargetPath) throws IOException { - Tuple2 resource = Utils.setupLocalResource( - fs, - appId.toString(), - localSrcPath, - targetHomeDir, - relativeTargetPath); - - localResources.put(key, resource.f1); - - return resource.f0; - } - - /** - * Match file name for "flink-dist*.jar" pattern. - * - * @param fileName file name to check - * @return true if file is a dist jar - */ - private static boolean isDistJar(String fileName) { - return fileName.startsWith("flink-dist") && fileName.endsWith("jar"); - } - - /** - * Recursively uploads (and registers) any (user and system) files in shipFiles except - * for files matching "flink-dist*.jar" which should be uploaded separately. 
- * - * @param shipFiles - * files to upload - * @param fs - * file system to upload to - * @param targetHomeDir - * remote home directory to upload to - * @param appId - * application ID - * @param remotePaths - * paths of the remote resources (uploaded resources will be added) - * @param localResources - * map of resources (uploaded resources will be added) - * @param localResourcesDirectory - * the directory the localResources are uploaded to - * @param envShipFileList - * list of shipped files in a format understood by {@link Utils#createTaskExecutorContext} - * - * @return list of class paths with the the proper resource keys from the registration - */ - static List uploadAndRegisterFiles( - Collection shipFiles, - FileSystem fs, - Path targetHomeDir, - ApplicationId appId, - List remotePaths, - Map localResources, - String localResourcesDirectory, - StringBuilder envShipFileList) throws IOException { - final List localPaths = new ArrayList<>(); - final List relativePaths = new ArrayList<>(); - for (File shipFile : shipFiles) { - if (shipFile.isDirectory()) { - // add directories to the classpath - final java.nio.file.Path shipPath = shipFile.toPath(); - final java.nio.file.Path parentPath = shipPath.getParent(); - Files.walkFileTree(shipPath, new SimpleFileVisitor() { - @Override - public FileVisitResult visitFile(java.nio.file.Path file, BasicFileAttributes attrs) { - localPaths.add(new Path(file.toUri())); - relativePaths.add(new Path(localResourcesDirectory, parentPath.relativize(file).toString())); - return FileVisitResult.CONTINUE; - } - }); - } else { - localPaths.add(new Path(shipFile.toURI())); - relativePaths.add(new Path(localResourcesDirectory, shipFile.getName())); - } - } - - final Set archives = new HashSet<>(); - final Set resources = new HashSet<>(); - for (int i = 0; i < localPaths.size(); i++) { - final Path localPath = localPaths.get(i); - final Path relativePath = relativePaths.get(i); - if (!isDistJar(relativePath.getName())) { - final String key = relativePath.toString(); - final Path remotePath = setupSingleLocalResource( - key, - fs, - appId, - localPath, - localResources, - targetHomeDir, - relativePath.getParent().toString()); - remotePaths.add(remotePath); - envShipFileList.append(key).append("=").append(remotePath).append(","); - // add files to the classpath - if (key.endsWith("jar")) { - archives.add(relativePath.toString()); - } else { - resources.add(relativePath.getParent().toString()); - } - } - } - - // construct classpath, we always want resource directories to go first, we also sort - // both resources and archives in order to make classpath deterministic - final ArrayList classPaths = new ArrayList<>(); - resources.stream().sorted().forEach(classPaths::add); - archives.stream().sorted().forEach(classPaths::add); - return classPaths; - } - - private static YarnConfigOptions.UserJarInclusion getUserJarInclusionMode(org.apache.flink.configuration.Configuration config) { - return config.getEnum(YarnConfigOptions.UserJarInclusion.class, YarnConfigOptions.CLASSPATH_INCLUDE_USER_JAR); - } - - private static boolean isUsrLibDirIncludedInShipFiles(List shipFiles) { - return shipFiles.stream() - .filter(File::isDirectory) - .map(File::getName) - .noneMatch(name -> name.equals(DEFAULT_FLINK_USR_LIB_DIR)); - } - - public static void logDetachedClusterInformation(ApplicationId yarnApplicationId, Logger logger) { - logger.info( - "The Flink YARN session cluster has been started in detached mode. 
In order to " + - "stop Flink gracefully, use the following command:\n" + - "$ echo \"stop\" | ./bin/yarn-session.sh -id {}\n" + - "If this should not be possible, then you can also kill Flink via YARN's web interface or via:\n" + - "$ yarn application -kill {}\n" + - "Note that killing Flink might not clean up all job artifacts and temporary files.", - yarnApplicationId, yarnApplicationId); - } - private Optional> decodeDirsToShipToCluster(final Configuration configuration) { checkNotNull(configuration); @@ -358,10 +246,6 @@ public Configuration getFlinkConfiguration() { return flinkConfiguration; } - // ------------------------------------------------------------- - // Lifecycle management - // ------------------------------------------------------------- - public void setLocalJarPath(Path localJarPath) { if (!localJarPath.toString().endsWith("jar")) { throw new IllegalArgumentException("The passed jar path ('" + localJarPath + "') does not end with the 'jar' extension"); @@ -369,27 +253,12 @@ public void setLocalJarPath(Path localJarPath) { this.flinkJarPath = localJarPath; } - // ------------------------------------------------------------- - // ClusterClient overrides - // ------------------------------------------------------------- - - /** - * Adds the given files to the list of files to ship. - * - *

Note that any file matching "flink-dist*.jar" will be excluded from the upload by - * {@link #uploadAndRegisterFiles(Collection, FileSystem, Path, ApplicationId, List, Map, String, StringBuilder)} - * since we upload the Flink uber jar ourselves and do not need to deploy it multiple times. - * - * @param shipFiles files to ship - */ - public void addShipFiles(List shipFiles) { - checkArgument(userJarInclusion != YarnConfigOptions.UserJarInclusion.DISABLED || isUsrLibDirIncludedInShipFiles(shipFiles), - "This is an illegal ship directory : %s. When setting the %s to %s the name of ship directory can not be %s.", - ConfigConstants.DEFAULT_FLINK_USR_LIB_DIR, - YarnConfigOptions.CLASSPATH_INCLUDE_USER_JAR.key(), - YarnConfigOptions.UserJarInclusion.DISABLED, - ConfigConstants.DEFAULT_FLINK_USR_LIB_DIR); - this.shipFiles.addAll(shipFiles); + private static String encodeYarnLocalResourceDescriptorListToString(List resources) { + return String.join( + LOCAL_RESOURCE_DESCRIPTOR_SEPARATOR, + resources.stream() + .map(YarnLocalResourceDescriptor::toString) + .collect(Collectors.toList())); } private void isReadyForDeployment(ClusterSpecification clusterSpecification) throws Exception { @@ -444,6 +313,10 @@ public String getNodeLabel() { return nodeLabel; } + // ------------------------------------------------------------- + // Lifecycle management + // ------------------------------------------------------------- + @Override public void close() { if (!sharedYarnClient) { @@ -451,6 +324,10 @@ public void close() { } } + // ------------------------------------------------------------- + // ClusterClient overrides + // ------------------------------------------------------------- + @Override public ClusterClientProvider retrieve(ApplicationId applicationId) throws ClusterRetrieveException { @@ -500,6 +377,10 @@ public ClusterClientProvider deploySessionCluster(ClusterSpecific } } + private static YarnConfigOptions.UserJarInclusion getUserJarInclusionMode(org.apache.flink.configuration.Configuration config) { + return config.getEnum(YarnConfigOptions.UserJarInclusion.class, YarnConfigOptions.CLASSPATH_INCLUDE_USER_JAR); + } + @Override public ClusterClientProvider deployJobCluster( ClusterSpecification clusterSpecification, @@ -517,13 +398,119 @@ public ClusterClientProvider deployJobCluster( } } + private static boolean isUsrLibDirIncludedInShipFiles(List shipFiles) { + return shipFiles.stream() + .filter(File::isDirectory) + .map(File::getName) + .noneMatch(name -> name.equals(DEFAULT_FLINK_USR_LIB_DIR)); + } + + public static void logDetachedClusterInformation(ApplicationId yarnApplicationId, Logger logger) { + logger.info( + "The Flink YARN session cluster has been started in detached mode. In order to " + + "stop Flink gracefully, use the following command:\n" + + "$ echo \"stop\" | ./bin/yarn-session.sh -id {}\n" + + "If this should not be possible, then you can also kill Flink via YARN's web interface or via:\n" + + "$ yarn application -kill {}\n" + + "Note that killing Flink might not clean up all job artifacts and temporary files.", + yarnApplicationId, yarnApplicationId); + } + + /** + * Adds the given files to the list of files to ship. + * + *

Note that any file matching "flink-dist*.jar" will be excluded from the upload by + * {@link YarnApplicationFileUploader#registerMultipleLocalResources(Collection, String)} + * since we upload the Flink uber jar ourselves and do not need to deploy it multiple times. + * + * @param shipFiles files to ship + */ + public void addShipFiles(List shipFiles) { + checkArgument(userJarInclusion != YarnConfigOptions.UserJarInclusion.DISABLED || isUsrLibDirIncludedInShipFiles(shipFiles), + "This is an illegal ship directory : %s. When setting the %s to %s the name of ship directory can not be %s.", + ConfigConstants.DEFAULT_FLINK_USR_LIB_DIR, + YarnConfigOptions.CLASSPATH_INCLUDE_USER_JAR.key(), + YarnConfigOptions.UserJarInclusion.DISABLED, + ConfigConstants.DEFAULT_FLINK_USR_LIB_DIR); + this.shipFiles.addAll(shipFiles); + } + + @Override + public ClusterClientProvider deployApplicationCluster( + final ClusterSpecification clusterSpecification, + final ApplicationConfiguration applicationConfiguration) throws ClusterDeploymentException { + checkNotNull(clusterSpecification); + checkNotNull(applicationConfiguration); + + final YarnDeploymentTarget deploymentTarget = YarnDeploymentTarget.fromConfig(flinkConfiguration); + if (YarnDeploymentTarget.APPLICATION != deploymentTarget) { + throw new ClusterDeploymentException( + "Couldn't deploy Yarn Application Cluster." + + " Expected deployment.target=" + YarnDeploymentTarget.APPLICATION.getName() + + " but actual one was \"" + deploymentTarget.getName() + "\""); + } + + applicationConfiguration.applyToConfiguration(flinkConfiguration); + + final List pipelineJars = flinkConfiguration.getOptional(PipelineOptions.JARS).orElse(Collections.emptyList()); + Preconditions.checkArgument(pipelineJars.size() == 1, "Should only have one jar"); + + try { + return deployInternal( + clusterSpecification, + "Flink Application Cluster", + YarnApplicationClusterEntryPoint.class.getName(), + null, + false); + } catch (Exception e) { + throw new ClusterDeploymentException("Couldn't deploy Yarn Application Cluster", e); + } + } + + private void checkYarnQueues(YarnClient yarnClient) { + try { + List queues = yarnClient.getAllQueues(); + if (queues.size() > 0 && this.yarnQueue != null) { // check only if there are queues configured in yarn and for this session. + boolean queueFound = false; + for (QueueInfo queue : queues) { + if (queue.getQueueName().equals(this.yarnQueue)) { + queueFound = true; + break; + } + } + if (!queueFound) { + String queueNames = ""; + for (QueueInfo queue : queues) { + queueNames += queue.getQueueName() + ", "; + } + LOG.warn("The specified queue '" + this.yarnQueue + "' does not exist. 
" + + "Available queues: " + queueNames); + } + } else { + LOG.debug("The YARN cluster does not have any queues configured"); + } + } catch (Throwable e) { + LOG.warn("Error while getting queue information from YARN: " + e.getMessage()); + if (LOG.isDebugEnabled()) { + LOG.debug("Error details", e); + } + } + } + @Override public void killCluster(ApplicationId applicationId) throws FlinkException { try { yarnClient.killApplication(applicationId); - Utils.deleteApplicationFiles(Collections.singletonMap( - YarnConfigKeys.FLINK_YARN_FILES, - getYarnFilesDir(applicationId).toUri().toString())); + + try (final FileSystem fs = FileSystem.get(yarnConfiguration)) { + final Path applicationDir = YarnApplicationFileUploader + .getApplicationDirPath(fs.getHomeDirectory(), applicationId); + + Utils.deleteApplicationFiles(Collections.singletonMap( + YarnConfigKeys.FLINK_YARN_FILES, + applicationDir.toUri().toString())); + } + } catch (YarnException | IOException e) { throw new FlinkException("Could not kill the Yarn Flink cluster with id " + applicationId + '.', e); } @@ -545,18 +532,13 @@ private ClusterClientProvider deployInternal( @Nullable JobGraph jobGraph, boolean detached) throws Exception { - if (UserGroupInformation.isSecurityEnabled()) { - // note: UGI::hasKerberosCredentials inaccurately reports false - // for logins based on a keytab (fixed in Hadoop 2.6.1, see HADOOP-10786), - // so we check only in ticket cache scenario. + final UserGroupInformation currentUser = UserGroupInformation.getCurrentUser(); + if (HadoopUtils.isKerberosSecurityEnabled(currentUser)) { boolean useTicketCache = flinkConfiguration.getBoolean(SecurityOptions.KERBEROS_LOGIN_USETICKETCACHE); - UserGroupInformation loginUser = UserGroupInformation.getCurrentUser(); - if (loginUser.getAuthenticationMethod() == UserGroupInformation.AuthenticationMethod.KERBEROS - && useTicketCache && !loginUser.hasKerberosCredentials()) { - LOG.error("Hadoop security with Kerberos is enabled but the login user does not have Kerberos credentials"); + if (!HadoopUtils.areKerberosCredentialsValid(currentUser, useTicketCache)) { throw new RuntimeException("Hadoop security with Kerberos is enabled but the login user " + - "does not have Kerberos credentials"); + "does not have Kerberos credentials or delegation tokens!"); } } @@ -573,12 +555,12 @@ private ClusterClientProvider deployInternal( final GetNewApplicationResponse appResponse = yarnApplication.getNewApplicationResponse(); if(clusterSpecification.isCreateProgramDelay()){ - String url = getUrlFormat(clusterSpecification.getYarnConfiguration(), yarnClient) + "/" + appResponse.getApplicationId().toString(); - PackagedProgram program = buildProgram(url,clusterSpecification); + String url = FlinkPerJobUtil.getUrlFormat(clusterSpecification.getYarnConfiguration(), yarnClient) + "/" + appResponse.getApplicationId().toString(); + PackagedProgram program = FlinkPerJobUtil.buildProgram(url,clusterSpecification); clusterSpecification.setProgram(program); jobGraph = PackagedProgramUtils.createJobGraph(program, clusterSpecification.getConfiguration(), clusterSpecification.getParallelism(), false); - String pluginLoadMode = clusterSpecification.getConfiguration().getString(FLINK_PLUGIN_LOAD_MODE_KEY); - if(StringUtils.equalsIgnoreCase(pluginLoadMode, SHIP_FILE_PLUGIN_LOAD_MODE)){ + String pluginLoadMode = clusterSpecification.getConfiguration().getString(ConfigConstant.FLINK_PLUGIN_LOAD_MODE_KEY); + if(StringUtils.equalsIgnoreCase(pluginLoadMode, ConstantValue.SHIP_FILE_PLUGIN_LOAD_MODE)){ 
jobGraph.getClasspaths().forEach(jarFile -> { try { shipFiles.add(new File(jarFile.toURI())); @@ -660,12 +642,8 @@ private ClusterSpecification validateClusterResources( int jobManagerMemoryMb = clusterSpecification.getMasterMemoryMB(); final int taskManagerMemoryMb = clusterSpecification.getTaskManagerMemoryMB(); - if (jobManagerMemoryMb < yarnMinAllocationMB || taskManagerMemoryMb < yarnMinAllocationMB) { - LOG.warn("The JobManager or TaskManager memory is below the smallest possible YARN Container size. " - + "The value of 'yarn.scheduler.minimum-allocation-mb' is '" + yarnMinAllocationMB + "'. Please increase the memory size." + - "YARN will allocate the smaller containers but the scheduler will account for the minimum-allocation-mb, maybe not all instances " + - "you requested will start."); - } + logIfComponentMemNotIntegerMultipleOfYarnMinAllocation("JobManager", jobManagerMemoryMb, yarnMinAllocationMB); + logIfComponentMemNotIntegerMultipleOfYarnMinAllocation("TaskManager", taskManagerMemoryMb, yarnMinAllocationMB); // set the memory to minAllocationMB to do the next checks correctly if (jobManagerMemoryMb < yarnMinAllocationMB) { @@ -705,33 +683,36 @@ private ClusterSpecification validateClusterResources( } - private void checkYarnQueues(YarnClient yarnClient) { + private void logIfComponentMemNotIntegerMultipleOfYarnMinAllocation( + String componentName, + int componentMemoryMB, + int yarnMinAllocationMB) { + int normalizedMemMB = (componentMemoryMB + (yarnMinAllocationMB - 1)) / yarnMinAllocationMB * yarnMinAllocationMB; + if (normalizedMemMB <= 0) { + normalizedMemMB = yarnMinAllocationMB; + } + if (componentMemoryMB != normalizedMemMB) { + LOG.info("The configured {} memory is {} MB. YARN will allocate {} MB to make up an integer multiple of its " + + "minimum allocation memory ({} MB, configured via 'yarn.scheduler.minimum-allocation-mb'). The extra {} MB " + + "may not be used by Flink.", componentName, componentMemoryMB, normalizedMemMB, yarnMinAllocationMB, + normalizedMemMB - componentMemoryMB); + } + } + + /** + * Kills YARN application and stops YARN client. + * + *

Use this method to kill the App before it has been properly deployed + */ + private void failSessionDuringDeployment(YarnClient yarnClient, YarnClientApplication yarnApplication) { + LOG.info("Killing YARN application"); + try { - List queues = yarnClient.getAllQueues(); - if (queues.size() > 0 && this.yarnQueue != null) { // check only if there are queues configured in yarn and for this session. - boolean queueFound = false; - for (QueueInfo queue : queues) { - if (queue.getQueueName().equals(this.yarnQueue)) { - queueFound = true; - break; - } - } - if (!queueFound) { - String queueNames = ""; - for (QueueInfo queue : queues) { - queueNames += queue.getQueueName() + ", "; - } - LOG.warn("The specified queue '" + this.yarnQueue + "' does not exist. " + - "Available queues: " + queueNames); - } - } else { - LOG.debug("The YARN cluster does not have any queues configured"); - } - } catch (Throwable e) { - LOG.warn("Error while getting queue information from YARN: " + e.getMessage()); - if (LOG.isDebugEnabled()) { - LOG.debug("Error details", e); - } + yarnClient.killApplication(yarnApplication.getNewApplicationResponse().getApplicationId()); + } catch (Exception e) { + // we only log a debug message here because the "killApplication" call is a best-effort + // call (we don't know if the application has been deployed when the error occured). + LOG.debug("Error while killing YARN application", e); } } @@ -750,11 +731,7 @@ private ApplicationReport startAppMaster( configuration, PluginUtils.createPluginManagerFromRootFolder(configuration)); - // initialize file system - // Copy the application master jar to the filesystem - // Create a local resource to point to the destination jar path final FileSystem fs = FileSystem.get(yarnConfiguration); - final Path homeDir = fs.getHomeDirectory(); // hard coded check for the GoogleHDFS client because its not overriding the getScheme() method. if (!fs.getClass().getSimpleName().equals("GoogleHadoopFileSystem") && @@ -765,10 +742,18 @@ private ApplicationReport startAppMaster( } ApplicationSubmissionContext appContext = yarnApplication.getApplicationSubmissionContext(); + + final List providedLibDirs = getRemoteSharedPaths(configuration); + + final YarnApplicationFileUploader fileUploader = YarnApplicationFileUploader.from( + fs, + fs.getHomeDirectory(), + providedLibDirs, + appContext.getApplicationId(), + getFileReplication()); + // The files need to be shipped and added to classpath. Set systemShipFiles = new HashSet<>(shipFiles.size()); - // The files only need to be shipped. - Set shipOnlyFiles = new HashSet<>(); for (File file : shipFiles) { systemShipFiles.add(file.getAbsoluteFile()); } @@ -778,11 +763,6 @@ private ApplicationReport startAppMaster( systemShipFiles.add(new File(logConfigFilePath)); } - addLibFoldersToShipFiles(systemShipFiles); - - // Plugin files only need to be shipped and should not be added to classpath. - addPluginsFoldersToShipFiles(shipOnlyFiles); - // Set-up ApplicationSubmissionContext for the application final ApplicationId appId = appContext.getApplicationId(); @@ -814,21 +794,24 @@ private ApplicationReport startAppMaster( 1)); } - final Set userJarFiles = (jobGraph == null) - // not per-job submission - ? 
Collections.emptySet() - // add user code jars from the provided JobGraph - : jobGraph.getUserJars().stream().map(f -> f.toUri()).map(File::new).collect(Collectors.toSet()); + final Set userJarFiles = new HashSet<>(); + if (jobGraph != null) { + userJarFiles.addAll(jobGraph.getUserJars().stream().map(f -> f.toUri()).map(Path::new).collect(Collectors.toSet())); + } + + final List jarUrls = ConfigUtils.decodeListFromConfig(configuration, PipelineOptions.JARS, URI::create); + if (jarUrls != null && YarnApplicationClusterEntryPoint.class.getName().equals(yarnClusterEntrypoint)) { + userJarFiles.addAll(jarUrls.stream().map(Path::new).collect(Collectors.toSet())); + } // only for per job mode if (jobGraph != null) { for (Map.Entry entry : jobGraph.getUserArtifacts().entrySet()) { - org.apache.flink.core.fs.Path path = new org.apache.flink.core.fs.Path(entry.getValue().filePath); // only upload local files - if (!path.getFileSystem().isDistributedFS()) { - Path localPath = new Path(path.getPath()); + if (!Utils.isRemotePath(entry.getValue().filePath)) { + Path localPath = new Path(entry.getValue().filePath); Tuple2 remoteFileInfo = - Utils.uploadLocalFileToRemote(fs, appId.toString(), localPath, homeDir, entry.getKey()); + fileUploader.uploadLocalFileToRemote(localPath, entry.getKey()); jobGraph.setUserArtifactRemotePath(entry.getKey(), remoteFileInfo.f0.toString()); } } @@ -836,45 +819,33 @@ private ApplicationReport startAppMaster( jobGraph.writeUserArtifactEntriesToConfiguration(); } - // local resource map for Yarn - final Map localResources = new HashMap<>(2 + systemShipFiles.size() + userJarFiles.size()); - // list of remote paths (after upload) - final List paths = new ArrayList<>(2 + systemShipFiles.size() + userJarFiles.size()); - // ship list that enables reuse of resources for task manager containers - StringBuilder envShipFileList = new StringBuilder(); + if (providedLibDirs == null || providedLibDirs.isEmpty()) { + addLibFoldersToShipFiles(systemShipFiles); + } - // upload and register ship files, these files will be added to classpath. - List systemClassPaths = uploadAndRegisterFiles( - systemShipFiles, - fs, - homeDir, - appId, - paths, - localResources, - Path.CUR_DIR, - envShipFileList); + // Register all files in provided lib dirs as local resources with public visibility + // and upload the remaining dependencies as local resources with APPLICATION visibility. + final List systemClassPaths = fileUploader.registerProvidedLocalResources(); + final List uploadedDependencies = fileUploader.registerMultipleLocalResources( + systemShipFiles.stream().map(e -> new Path(e.toURI())).collect(Collectors.toSet()), + Path.CUR_DIR); + systemClassPaths.addAll(uploadedDependencies); // upload and register ship-only files - uploadAndRegisterFiles( - shipOnlyFiles, - fs, - homeDir, - appId, - paths, - localResources, - Path.CUR_DIR, - envShipFileList); - - final List userClassPaths = uploadAndRegisterFiles( + // Plugin files only need to be shipped and should not be added to classpath. 
+ if (providedLibDirs == null || providedLibDirs.isEmpty()) {
+ Set<File> shipOnlyFiles = new HashSet<>();
+ addPluginsFoldersToShipFiles(shipOnlyFiles);
+ fileUploader.registerMultipleLocalResources(
+ shipOnlyFiles.stream().map(e -> new Path(e.toURI())).collect(Collectors.toSet()),
+ Path.CUR_DIR);
+ }
+
+ // Upload and register user jars
+ final List<String> userClassPaths = fileUploader.registerMultipleLocalResources(
userJarFiles,
- fs,
- homeDir,
- appId,
- paths,
- localResources,
userJarInclusion == YarnConfigOptions.UserJarInclusion.DISABLED ?
- ConfigConstants.DEFAULT_FLINK_USR_LIB_DIR : Path.CUR_DIR,
- envShipFileList);
+ ConfigConstants.DEFAULT_FLINK_USR_LIB_DIR : Path.CUR_DIR);
if (userJarInclusion == YarnConfigOptions.UserJarInclusion.ORDER) {
systemClassPaths.addAll(userClassPaths);
@@ -896,17 +867,39 @@ private ApplicationReport startAppMaster(
}
// Setup jar for ApplicationMaster
- Path remotePathJar = setupSingleLocalResource(
- flinkJarPath.getName(),
- fs,
- appId,
- flinkJarPath,
- localResources,
- homeDir,
- "");
+ final YarnLocalResourceDescriptor localResourceDescFlinkJar = fileUploader.uploadFlinkDist(flinkJarPath);
+ classPathBuilder.append(localResourceDescFlinkJar.getResourceKey()).append(File.pathSeparator);
+
+ // write job graph to tmp file and add it to local resource
+ // TODO: on the server side, use the user's main method to generate the job graph
+ if (jobGraph != null) {
+ File tmpJobGraphFile = null;
+ try {
+ tmpJobGraphFile = File.createTempFile(appId.toString(), null);
+ try (FileOutputStream output = new FileOutputStream(tmpJobGraphFile);
+ ObjectOutputStream obOutput = new ObjectOutputStream(output)) {
+ obOutput.writeObject(jobGraph);
+ }
- paths.add(remotePathJar);
- classPathBuilder.append(flinkJarPath.getName()).append(File.pathSeparator);
+ final String jobGraphFilename = "job.graph";
+ configuration.setString(JOB_GRAPH_FILE_PATH, jobGraphFilename);
+
+ fileUploader.registerSingleLocalResource(
+ jobGraphFilename,
+ new Path(tmpJobGraphFile.toURI()),
+ "",
+ true,
+ false);
+ classPathBuilder.append(jobGraphFilename).append(File.pathSeparator);
+ } catch (Exception e) {
+ LOG.warn("Failed to add the job graph to the local resources.");
+ throw e;
+ } finally {
+ if (tmpJobGraphFile != null && !tmpJobGraphFile.delete()) {
+ LOG.warn("Failed to delete temporary file {}.", tmpJobGraphFile.toPath());
+ }
+ }
+ }
// Upload the flink configuration
// write out configuration file
File tmpConfigurationFile = null;
try {
tmpConfigurationFile = File.createTempFile(appId + "-flink-conf.yaml", null);
BootstrapTools.writeConfiguration(configuration, tmpConfigurationFile);
String flinkConfigKey = "flink-conf.yaml";
- Path remotePathConf = setupSingleLocalResource(
+ fileUploader.registerSingleLocalResource(
flinkConfigKey,
- fs,
- appId,
new Path(tmpConfigurationFile.getAbsolutePath()),
- localResources,
- homeDir,
- "");
- envShipFileList.append(flinkConfigKey).append("=").append(remotePathConf).append(",");
- paths.add(remotePathConf);
+ "",
+ true,
+ true);
classPathBuilder.append("flink-conf.yaml").append(File.pathSeparator);
} finally {
if (tmpConfigurationFile != null && !tmpConfigurationFile.delete()) {
@@ -939,43 +928,6 @@ private ApplicationReport startAppMaster(
}
}
- // write job graph to tmp file and add it to local resource
- if (jobGraph != null) {
- File tmpJobGraphFile = null;
- try {
- tmpJobGraphFile = File.createTempFile(appId.toString(), null);
- try (FileOutputStream output = new FileOutputStream(tmpJobGraphFile);
- ObjectOutputStream obOutput = new ObjectOutputStream(output);){
- obOutput.writeObject(jobGraph);
- }
-
- final String jobGraphFilename = "job.graph";
- flinkConfiguration.setString(JOB_GRAPH_FILE_PATH, jobGraphFilename);
-
- Path pathFromYarnURL = setupSingleLocalResource(
- jobGraphFilename,
- fs,
- appId,
- new Path(tmpJobGraphFile.toURI()),
- localResources,
- homeDir,
- "");
- paths.add(pathFromYarnURL);
- classPathBuilder.append(jobGraphFilename).append(File.pathSeparator);
- } catch (Exception e) {
- LOG.warn("Add job graph to local resource fail");
- throw e;
- } finally {
- if (tmpJobGraphFile != null && !tmpJobGraphFile.delete()) {
- LOG.warn("Fail to delete temporary file {}.", tmpConfigurationFile.toPath());
- }
- }
- }
-
- final Path yarnFilesDir = getYarnFilesDir(appId);
- FsPermission permission = new FsPermission(FsAction.ALL, FsAction.NONE, FsAction.NONE);
- fs.setPermission(yarnFilesDir, permission); // set permission for path.
//To support Yarn Secure Integration Test Scenario
//In Integration test setup, the Yarn containers created by YarnMiniCluster do not have the Yarn site XML
//and KRB5 configuration files. We are adding these files as container local resources for the container
//applications (JM/TMs) to have proper secure cluster setup
@@ -987,89 +939,93 @@ private ApplicationReport startAppMaster(
File f = new File(System.getenv("YARN_CONF_DIR"), Utils.YARN_SITE_FILE_NAME);
LOG.info("Adding Yarn configuration {} to the AM container local resource bucket", f.getAbsolutePath());
Path yarnSitePath = new Path(f.getAbsolutePath());
- remoteYarnSiteXmlPath = setupSingleLocalResource(
+ remoteYarnSiteXmlPath = fileUploader.registerSingleLocalResource(
Utils.YARN_SITE_FILE_NAME,
- fs,
- appId,
yarnSitePath,
- localResources,
- homeDir,
- "");
+ "",
+ false,
+ false).getPath();
String krb5Config = System.getProperty("java.security.krb5.conf");
if (krb5Config != null && krb5Config.length() != 0) {
File krb5 = new File(krb5Config);
LOG.info("Adding KRB5 configuration {} to the AM container local resource bucket", krb5.getAbsolutePath());
Path krb5ConfPath = new Path(krb5.getAbsolutePath());
- remoteKrb5Path = setupSingleLocalResource(
+ remoteKrb5Path = fileUploader.registerSingleLocalResource(
Utils.KRB5_FILE_NAME,
- fs,
- appId,
krb5ConfPath,
- localResources,
- homeDir,
- "");
+ "",
+ false,
+ false).getPath();
hasKrb5 = true;
}
}
- // setup security tokens
Path remotePathKeytab = null;
+ String localizedKeytabPath = null;
String keytab = configuration.getString(SecurityOptions.KERBEROS_LOGIN_KEYTAB);
if (keytab != null) {
- LOG.info("Adding keytab {} to the AM container local resource bucket", keytab);
- remotePathKeytab = setupSingleLocalResource(
- Utils.KEYTAB_FILE_NAME,
- fs,
- appId,
- new Path(keytab),
- localResources,
- homeDir,
- "");
+ boolean localizeKeytab = flinkConfiguration.getBoolean(YarnConfigOptions.SHIP_LOCAL_KEYTAB);
+ localizedKeytabPath = flinkConfiguration.getString(YarnConfigOptions.LOCALIZED_KEYTAB_PATH);
+ if (localizeKeytab) {
+ // Localize the keytab to YARN containers via local resource.
+ LOG.info("Adding keytab {} to the AM container local resource bucket", keytab);
+ remotePathKeytab = fileUploader.registerSingleLocalResource(
+ localizedKeytabPath,
+ new Path(keytab),
+ "",
+ false,
+ false).getPath();
+ } else {
+ // Assume the keytab is pre-installed in the container.
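+ // Sketch of this pre-installed mode (option keys from SecurityOptions and
+ // YarnConfigOptions; the paths are assumptions): flink-conf.yaml would carry
+ //   security.kerberos.login.keytab: /etc/security/keytabs/flink.keytab
+ //   security.kerberos.login.principal: flink/_HOST@EXAMPLE.COM
+ //   yarn.security.kerberos.ship-local-keytab: false
+ //   yarn.security.kerberos.localized-keytab-path: /etc/security/keytabs/flink.keytab
+ // so no keytab is uploaded and every container resolves it from the local path.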
+ localizedKeytabPath = flinkConfiguration.getString(YarnConfigOptions.LOCALIZED_KEYTAB_PATH); + } } - final boolean hasLogback = logConfigFilePath != null && logConfigFilePath.endsWith(CONFIG_FILE_LOGBACK_NAME); - final boolean hasLog4j = logConfigFilePath != null && logConfigFilePath.endsWith(CONFIG_FILE_LOG4J_NAME); - + final JobManagerProcessSpec processSpec = JobManagerProcessUtils.processSpecFromConfigWithNewOptionToInterpretLegacyHeap( + flinkConfiguration, + JobManagerOptions.TOTAL_PROCESS_MEMORY); final ContainerLaunchContext amContainer = setupApplicationMasterContainer( yarnClusterEntrypoint, - hasLogback, - hasLog4j, hasKrb5, - clusterSpecification.getMasterMemoryMB()); + processSpec); + // setup security tokens if (UserGroupInformation.isSecurityEnabled()) { // set HDFS delegation tokens when security is enabled LOG.info("Adding delegation token to the AM container."); - Utils.setTokensFor(amContainer, paths, yarnConfiguration); + Utils.setTokensFor(amContainer, fileUploader.getRemotePaths(), yarnConfiguration); } - amContainer.setLocalResources(localResources); - fs.close(); + amContainer.setLocalResources(fileUploader.getRegisteredLocalResources()); + fileUploader.close(); // Setup CLASSPATH and environment variables for ApplicationMaster final Map appMasterEnv = new HashMap<>(); // set user specified app master environment variables appMasterEnv.putAll( - BootstrapTools.getEnvironmentVariables(ResourceManagerOptions.CONTAINERIZED_MASTER_ENV_PREFIX, configuration)); + ConfigurationUtils.getPrefixedKeyValuePairs(ResourceManagerOptions.CONTAINERIZED_MASTER_ENV_PREFIX, configuration)); // set Flink app class path appMasterEnv.put(YarnConfigKeys.ENV_FLINK_CLASSPATH, classPathBuilder.toString()); // set Flink on YARN internal configuration values - appMasterEnv.put(YarnConfigKeys.FLINK_JAR_PATH, remotePathJar.toString()); + appMasterEnv.put(YarnConfigKeys.FLINK_DIST_JAR, localResourceDescFlinkJar.toString()); appMasterEnv.put(YarnConfigKeys.ENV_APP_ID, appId.toString()); - appMasterEnv.put(YarnConfigKeys.ENV_CLIENT_HOME_DIR, homeDir.toString()); - appMasterEnv.put(YarnConfigKeys.ENV_CLIENT_SHIP_FILES, envShipFileList.toString()); + appMasterEnv.put(YarnConfigKeys.ENV_CLIENT_HOME_DIR, fileUploader.getHomeDir().toString()); + appMasterEnv.put(YarnConfigKeys.ENV_CLIENT_SHIP_FILES, encodeYarnLocalResourceDescriptorListToString(fileUploader.getEnvShipResourceList())); appMasterEnv.put(YarnConfigKeys.ENV_ZOOKEEPER_NAMESPACE, getZookeeperNamespace()); - appMasterEnv.put(YarnConfigKeys.FLINK_YARN_FILES, yarnFilesDir.toUri().toString()); + appMasterEnv.put(YarnConfigKeys.FLINK_YARN_FILES, fileUploader.getApplicationDir().toUri().toString()); // https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/YarnApplicationSecurity.md#identity-on-an-insecure-cluster-hadoop_user_name appMasterEnv.put(YarnConfigKeys.ENV_HADOOP_USER_NAME, UserGroupInformation.getCurrentUser().getUserName()); - if (remotePathKeytab != null) { - appMasterEnv.put(YarnConfigKeys.KEYTAB_PATH, remotePathKeytab.toString()); + if (localizedKeytabPath != null) { + appMasterEnv.put(YarnConfigKeys.LOCAL_KEYTAB_PATH, localizedKeytabPath); String principal = configuration.getString(SecurityOptions.KERBEROS_LOGIN_PRINCIPAL); appMasterEnv.put(YarnConfigKeys.KEYTAB_PRINCIPAL, principal); + if (remotePathKeytab != null) { + appMasterEnv.put(YarnConfigKeys.REMOTE_KEYTAB_PATH, remotePathKeytab.toString()); + } } //To support Yarn Secure Integration Test Scenario @@ -1113,7 +1069,7 
@@ private ApplicationReport startAppMaster( setApplicationTags(appContext); // add a hook to clean up in case deployment fails - Thread deploymentFailureHook = new DeploymentFailureHook(yarnApplication, yarnFilesDir); + Thread deploymentFailureHook = new DeploymentFailureHook(yarnApplication, fileUploader.getApplicationDir()); Runtime.getRuntime().addShutdownHook(deploymentFailureHook); LOG.info("Submitting application master " + appId); yarnClient.submitApplication(appContext); @@ -1163,34 +1119,6 @@ private ApplicationReport startAppMaster( return report; } - /** - * Returns the Path where the YARN application files should be uploaded to. - * - * @param appId YARN application id - */ - private Path getYarnFilesDir(final ApplicationId appId) throws IOException { - final FileSystem fileSystem = FileSystem.get(yarnConfiguration); - final Path homeDir = fileSystem.getHomeDirectory(); - return new Path(homeDir, ".flink/" + appId + '/'); - } - - /** - * Kills YARN application and stops YARN client. - * - *

Use this method to kill the App before it has been properly deployed - */ - private void failSessionDuringDeployment(YarnClient yarnClient, YarnClientApplication yarnApplication) { - LOG.info("Killing YARN application"); - - try { - yarnClient.killApplication(yarnApplication.getNewApplicationResponse().getApplicationId()); - } catch (Exception e) { - // we only log a debug message here because the "killApplication" call is a best-effort - // call (we don't know if the application has been deployed when the error occured). - LOG.debug("Error while killing YARN application", e); - } - } - private ClusterResourceDescription getCurrentFreeClusterResources(YarnClient yarnClient) throws YarnException, IOException { List nodes = yarnClient.getNodeReports(NodeState.RUNNING); @@ -1289,38 +1217,30 @@ private void setApplicationNodeLabel(final ApplicationSubmissionContext appConte } } - @VisibleForTesting - void addLibFoldersToShipFiles(Collection effectiveShipFiles) { - // Add lib folder to the ship files if the environment variable is set. - // This is for convenience when running from the command-line. - // (for other files users explicitly set the ship files) - String libDir = System.getenv().get(ENV_FLINK_LIB_DIR); - if (libDir != null) { - File directoryFile = new File(libDir); - if (directoryFile.isDirectory()) { - effectiveShipFiles.add(directoryFile); - } else { - throw new YarnDeploymentException("The environment variable '" + ENV_FLINK_LIB_DIR + - "' is set to '" + libDir + "' but the directory doesn't exist."); - } - } else if (shipFiles.isEmpty()) { - LOG.warn("Environment variable '{}' not set and ship files have not been provided manually. " + - "Not shipping any library files.", ENV_FLINK_LIB_DIR); - } + private int getFileReplication() { + final int yarnFileReplication = yarnConfiguration.getInt(DFSConfigKeys.DFS_REPLICATION_KEY, DFSConfigKeys.DFS_REPLICATION_DEFAULT); + final int fileReplication = flinkConfiguration.getInteger(YarnConfigOptions.FILE_REPLICATION); + return fileReplication > 0 ? 
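+ // A positive Flink-side value wins over the YARN cluster default; e.g. (assumed
+ // option key for this Flink version) flink-conf.yaml with
+ //   yarn.file-replication: 3
+ // would replicate each uploaded file three times, while zero or a negative value
+ // falls back to dfs.replication from the Hadoop configuration.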
fileReplication : yarnFileReplication; } - @VisibleForTesting - void addPluginsFoldersToShipFiles(Collection effectiveShipFiles) { - final Optional pluginsDir = PluginConfig.getPluginsDir(); - pluginsDir.ifPresent(effectiveShipFiles::add); + private List getRemoteSharedPaths(Configuration configuration) throws IOException, FlinkException { + final List providedLibDirs = ConfigUtils.decodeListFromConfig( + configuration, YarnConfigOptions.PROVIDED_LIB_DIRS, Path::new); + + for (Path path : providedLibDirs) { + if (!Utils.isRemotePath(path.toString())) { + throw new FlinkException( + "The \"" + YarnConfigOptions.PROVIDED_LIB_DIRS.key() + "\" should only contain" + + " dirs accessible from all worker nodes, while the \"" + path + "\" is local."); + } + } + return providedLibDirs; } ContainerLaunchContext setupApplicationMasterContainer( String yarnClusterEntrypoint, - boolean hasLogback, - boolean hasLog4j, boolean hasKrb5, - int jobManagerMemoryMb) { + JobManagerProcessSpec processSpec) { // ------------------ Prepare Application Master Container ------------------------------ // respect custom JVM options in the YAML file @@ -1340,26 +1260,12 @@ ContainerLaunchContext setupApplicationMasterContainer( final Map startCommandValues = new HashMap<>(); startCommandValues.put("java", "$JAVA_HOME/bin/java"); - int heapSize = BootstrapTools.calculateHeapSize(jobManagerMemoryMb, flinkConfiguration); - String jvmHeapMem = String.format("-Xms%sm -Xmx%sm", heapSize, heapSize); + String jvmHeapMem = JobManagerProcessUtils.generateJvmParametersStr(processSpec, flinkConfiguration); startCommandValues.put("jvmmem", jvmHeapMem); startCommandValues.put("jvmopts", javaOpts); - String logging = ""; - - if (hasLogback || hasLog4j) { - logging = "-Dlog.file=\"" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/jobmanager.log\""; + startCommandValues.put("logging", YarnLogConfigUtil.getLoggingYarnCommand(flinkConfiguration)); - if (hasLogback) { - logging += " -Dlogback.configurationFile=file:" + CONFIG_FILE_LOGBACK_NAME; - } - - if (hasLog4j) { - logging += " -Dlog4j.configuration=file:" + CONFIG_FILE_LOG4J_NAME; - } - } - - startCommandValues.put("logging", logging); startCommandValues.put("class", yarnClusterEntrypoint); startCommandValues.put("redirects", "1> " + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/jobmanager.out " + @@ -1379,6 +1285,32 @@ ContainerLaunchContext setupApplicationMasterContainer( return amContainer; } + @VisibleForTesting + void addLibFoldersToShipFiles(Collection effectiveShipFiles) { + // Add lib folder to the ship files if the environment variable is set. + // This is for convenience when running from the command-line. + // (for other files users explicitly set the ship files) + String libDir = System.getenv().get(ENV_FLINK_LIB_DIR); + if (libDir != null) { + File directoryFile = new File(libDir); + if (directoryFile.isDirectory()) { + effectiveShipFiles.add(directoryFile); + } else { + throw new YarnDeploymentException("The environment variable '" + ENV_FLINK_LIB_DIR + + "' is set to '" + libDir + "' but the directory doesn't exist."); + } + } else if (shipFiles.isEmpty()) { + LOG.warn("Environment variable '{}' not set and ship files have not been provided manually. 
" + + "Not shipping any library files.", ENV_FLINK_LIB_DIR); + } + } + + @VisibleForTesting + void addPluginsFoldersToShipFiles(Collection effectiveShipFiles) { + final Optional pluginsDir = PluginConfig.getPluginsDir(); + pluginsDir.ifPresent(effectiveShipFiles::add); + } + private void setClusterEntrypointInfoToConfig(final ApplicationReport report) { checkNotNull(report); @@ -1397,15 +1329,15 @@ private void setClusterEntrypointInfoToConfig(final ApplicationReport report) { flinkConfiguration.set(YarnConfigOptions.APPLICATION_ID, ConverterUtils.toString(clusterId)); } - private static class ClusterResourceDescription { - public final int totalFreeMemory; - public final int containerLimit; - public final int[] nodeManagersFree; + private static class YarnDeploymentException extends RuntimeException { + private static final long serialVersionUID = -812040641215388943L; - public ClusterResourceDescription(int totalFreeMemory, int containerLimit, int[] nodeManagersFree) { - this.totalFreeMemory = totalFreeMemory; - this.containerLimit = containerLimit; - this.nodeManagersFree = nodeManagersFree; + public YarnDeploymentException(String message) { + super(message); + } + + public YarnDeploymentException(String message, Throwable cause) { + super(message, cause); } } @@ -1426,15 +1358,22 @@ private static class ApplicationSubmissionContextReflector { private static final ApplicationSubmissionContextReflector instance = new ApplicationSubmissionContextReflector(ApplicationSubmissionContext.class); + + public static ApplicationSubmissionContextReflector getInstance() { + return instance; + } + private static final String APPLICATION_TAGS_METHOD_NAME = "setApplicationTags"; private static final String ATTEMPT_FAILURES_METHOD_NAME = "setAttemptFailuresValidityInterval"; private static final String KEEP_CONTAINERS_METHOD_NAME = "setKeepContainersAcrossApplicationAttempts"; private static final String NODE_LABEL_EXPRESSION_NAME = "setNodeLabelExpression"; + private final Method applicationTagsMethod; private final Method attemptFailuresValidityIntervalMethod; private final Method keepContainersMethod; @Nullable private final Method nodeLabelExpressionMethod; + private ApplicationSubmissionContextReflector(Class clazz) { Method applicationTagsMethod; Method attemptFailuresValidityIntervalMethod; @@ -1488,10 +1427,6 @@ private ApplicationSubmissionContextReflector(Class applicationTags) throws InvocationTargetException, IllegalAccessException { @@ -1552,15 +1487,15 @@ public void setKeepContainersAcrossApplicationAttempts( } } - private static class YarnDeploymentException extends RuntimeException { - private static final long serialVersionUID = -812040641215388943L; - - public YarnDeploymentException(String message) { - super(message); - } + private static class ClusterResourceDescription { + public final int totalFreeMemory; + public final int containerLimit; + public final int[] nodeManagersFree; - public YarnDeploymentException(String message, Throwable cause) { - super(message, cause); + public ClusterResourceDescription(int totalFreeMemory, int containerLimit, int[] nodeManagersFree) { + this.totalFreeMemory = totalFreeMemory; + this.containerLimit = containerLimit; + this.nodeManagersFree = nodeManagersFree; } } diff --git a/flinkx-launcher/src/main/resources/log4j2.xml b/flinkx-launcher/src/main/resources/log4j2.xml new file mode 100644 index 0000000000..e9fc82c633 --- /dev/null +++ b/flinkx-launcher/src/main/resources/log4j2.xml @@ -0,0 +1,25 @@ + + + + + + %d{yyyy-MM-dd HH:mm:ss,SSS} %-5p 
%-60c %x - %m%n + + + + + + + + + + + + + + + ${pattern} + + + + \ No newline at end of file diff --git a/flinkx-launcher/src/main/resources/logback.xml b/flinkx-launcher/src/main/resources/logback.xml new file mode 100644 index 0000000000..0125d733de --- /dev/null +++ b/flinkx-launcher/src/main/resources/logback.xml @@ -0,0 +1,22 @@ + + + + + + + + + + ${CONSOLE_LOG_PATTERN} + + + + + + + + + + + \ No newline at end of file diff --git a/flinkx-metadata-es6/flinkx-metadata-es6-reader/src/main/java/com/dtstack/flinkx/metadataes6/constants/MetaDataEs6Cons.java b/flinkx-metadata-es6/flinkx-metadata-es6-reader/src/main/java/com/dtstack/flinkx/metadataes6/constants/MetaDataEs6Cons.java deleted file mode 100644 index 85b3193e2a..0000000000 --- a/flinkx-metadata-es6/flinkx-metadata-es6-reader/src/main/java/com/dtstack/flinkx/metadataes6/constants/MetaDataEs6Cons.java +++ /dev/null @@ -1,60 +0,0 @@ -package com.dtstack.flinkx.metadataes6.constants; - -public class MetaDataEs6Cons { - - public static final String KEY_INDICES = "indices"; - - public static final String KEY_ADDRESS = "address"; - - public static final String KEY_USERNAME = "username"; - - public static final String KEY_PASSWORD = "password"; - - public static final String KEY_TIMEOUT = "timeout"; - - public static final String KEY_PATH_PREFIX = "pathPrefix"; - - public static final String KEY_INDEX_HEALTH = "health"; //green为正常,yellow表示索引不可靠(单节点),red索引不可用 - - public static final String KEY_INDEX_STATUS = "status"; //表明索引是否打开 - - public static final String KEY_INDEX = "index"; - - public static final String KEY_INDEX_PROP = "indexProperties"; - - public static final String KEY_INDEX_UUID = "uuid"; //索引的唯一标识 - - public static final String KEY_INDEX_PRI = "indexPri"; //集群的主分片数 - - public static final String KEY_INDEX_REP = "replicas"; - - public static final String KEY_INDEX_DOCS_COUNT = "docs_count"; //文档数 - - public static final String KEY_INDEX_DOCS_DELETED = "docs_deleted"; //已删除文档数 - - public static final String KEY_INDEX_SIZE = "totalsize"; //索引存储的总容量 - - public static final String KEY_INDEX_PRI_SIZE = "pri_size"; //主分片的总容量 - - public static final String KEY_INDEX_CREATE_TIME = "createtime"; //索引创建时间 - - public static final String KEY_TYPE_NAME = "type"; //索引下类型名 - - public static final String KEY_INDEX_SHARDS = "shards"; //分片数 - - public static final String KEY_ALIAS = "alias"; //索引别名 - - public static final String KEY_COLUMN = "column"; - - public static final String KEY_COLUMN_NAME = "column_name"; //文档名 - - public static final String KEY_DATA_TYPE = "data_type"; //数据类型 - - public static final String KEY_FIELDS = "fields"; //字段映射 - - public static final String KEY_FIELD_NAME = "field_name"; //字段映射名 - - public static final String KEY_FIELD_PROP = "field_prop"; //字段映射参数 - - public static final String API_METHOD_GET = "GET"; //restAPI请求方式,GET -} diff --git a/flinkx-metadata-es6/flinkx-metadata-es6-reader/src/main/java/com/dtstack/flinkx/metadataes6/format/Metadataes6InputFormat.java b/flinkx-metadata-es6/flinkx-metadata-es6-reader/src/main/java/com/dtstack/flinkx/metadataes6/format/Metadataes6InputFormat.java deleted file mode 100644 index 1cd7b1f4c4..0000000000 --- a/flinkx-metadata-es6/flinkx-metadata-es6-reader/src/main/java/com/dtstack/flinkx/metadataes6/format/Metadataes6InputFormat.java +++ /dev/null @@ -1,131 +0,0 @@ -package com.dtstack.flinkx.metadataes6.format; - -import com.dtstack.flinkx.inputformat.BaseRichInputFormat; -import com.dtstack.flinkx.metadataes6.constants.MetaDataEs6Cons; -import 
com.dtstack.flinkx.metadataes6.utils.Es6Util; -import org.apache.commons.collections.CollectionUtils; -import org.apache.flink.core.io.GenericInputSplit; -import org.apache.flink.core.io.InputSplit; -import org.apache.flink.types.Row; -import org.elasticsearch.client.RestClient; - -import java.io.IOException; -import java.util.*; - -public class Metadataes6InputFormat extends BaseRichInputFormat { - - protected String address; - - protected String username; - - protected String password; - - /** - * 存放所有需要查询的index的名字 - */ - protected List indices; - - /** - * 记录当前查询的表所在list中的位置 - */ - protected int start; - - protected Map clientConfig; - - private transient RestClient restClient; - - protected static transient ThreadLocal> indexIterator = new ThreadLocal<>(); - - @Override - public void openInternal(InputSplit inputSplit) throws IOException { - - restClient = Es6Util.getClient(address, username, password, clientConfig); - if (CollectionUtils.isEmpty(indices)) { - indices = showIndices(); - } - - LOG.info("indicesSize = {}, indices = {}",indices.size(), indices); - indexIterator.set(indices.iterator()); - - } - - @Override - public InputSplit[] createInputSplitsInternal(int splitNum) { - - InputSplit[] splits = new InputSplit[splitNum]; - for (int i = 0; i < splitNum; i++) { - splits[i] = new GenericInputSplit(i,splitNum); - } - - return splits; - - } - - @Override - protected Row nextRecordInternal(Row row) throws IOException { - - Map metaData = new HashMap<>(16); - String indexName = (String) indexIterator.get().next(); - metaData.putAll(queryMetaData(indexName)); - LOG.info("query metadata: {}", metaData); - - return Row.of(metaData); - - } - - @Override - protected void closeInternal() throws IOException { - - if(restClient != null) { - restClient.close(); - restClient = null; - } - - } - - /** - * 返回es集群下的所有索引 - * @return 索引列表 - * @throws IOException - */ - protected List showIndices() throws IOException { - - List indiceName = new ArrayList<>(); - String[] indices = Es6Util.queryIndicesByCat(restClient); - int n = 2; - while (n < indices.length) - { - indiceName.add(indices[n]); - n += 10; - } - - return indiceName; - - } - - /** - * 查询元数据 - * @param indexName 索引名称 - * @return 元数据 - * @throws IOException - */ - protected Map queryMetaData(String indexName) throws IOException { - - Map result = new HashMap<>(16); - Map indexProp = Es6Util.queryIndexProp(indexName,restClient); - List> alias = Es6Util.queryAliases(indexName,restClient); - List> column = Es6Util.queryColumns(indexName,restClient); - result.put(MetaDataEs6Cons.KEY_INDEX,indexName); - result.put(MetaDataEs6Cons.KEY_INDEX_PROP, indexProp); - result.put(MetaDataEs6Cons.KEY_COLUMN,column); - result.put(MetaDataEs6Cons.KEY_ALIAS,alias); - - return result; - - } - - @Override - public boolean reachedEnd(){ - return !indexIterator.get().hasNext(); - } -} diff --git a/flinkx-metadata-es6/flinkx-metadata-es6-reader/src/main/java/com/dtstack/flinkx/metadataes6/format/Metadataes6InputFormatBuilder.java b/flinkx-metadata-es6/flinkx-metadata-es6-reader/src/main/java/com/dtstack/flinkx/metadataes6/format/Metadataes6InputFormatBuilder.java deleted file mode 100644 index 1d7f168482..0000000000 --- a/flinkx-metadata-es6/flinkx-metadata-es6-reader/src/main/java/com/dtstack/flinkx/metadataes6/format/Metadataes6InputFormatBuilder.java +++ /dev/null @@ -1,47 +0,0 @@ -package com.dtstack.flinkx.metadataes6.format; - - -import com.dtstack.flinkx.inputformat.BaseRichInputFormatBuilder; - -import java.util.List; -import java.util.Map; - -public 
class Metadataes6InputFormatBuilder extends BaseRichInputFormatBuilder { - - private Metadataes6InputFormat format; - - public Metadataes6InputFormatBuilder() { - super.format = this.format = new Metadataes6InputFormat(); - } - - public Metadataes6InputFormatBuilder setAddress(String address) { - format.address = address; - return this; - } - - public Metadataes6InputFormatBuilder setUsername(String username) { - format.username = username; - return this; - } - - public Metadataes6InputFormatBuilder setPassword(String password) { - format.password = password; - return this; - } - - public Metadataes6InputFormatBuilder setIndices(List indices){ - format.indices = indices; - return this; - } - - public Metadataes6InputFormatBuilder setClientConfig(Map clientConfig){ - format.clientConfig = clientConfig; - return this; - } - @Override - protected void checkFormat() { - if (format.getRestoreConfig() != null && format.getRestoreConfig().isRestore()){ - throw new UnsupportedOperationException("This plugin not support restore from failed state"); - } - } -} diff --git a/flinkx-metadata-es6/flinkx-metadata-es6-reader/src/main/java/com/dtstack/flinkx/metadataes6/reader/Metadataes6Reader.java b/flinkx-metadata-es6/flinkx-metadata-es6-reader/src/main/java/com/dtstack/flinkx/metadataes6/reader/Metadataes6Reader.java deleted file mode 100644 index 75ae19730e..0000000000 --- a/flinkx-metadata-es6/flinkx-metadata-es6-reader/src/main/java/com/dtstack/flinkx/metadataes6/reader/Metadataes6Reader.java +++ /dev/null @@ -1,59 +0,0 @@ -package com.dtstack.flinkx.metadataes6.reader; - - -import com.dtstack.flinkx.config.DataTransferConfig; -import com.dtstack.flinkx.config.ReaderConfig; -import com.dtstack.flinkx.inputformat.BaseRichInputFormat; -import com.dtstack.flinkx.metadataes6.constants.MetaDataEs6Cons; -import com.dtstack.flinkx.metadataes6.format.Metadataes6InputFormatBuilder; -import com.dtstack.flinkx.reader.BaseDataReader; -import org.apache.flink.streaming.api.datastream.DataStream; -import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; -import org.apache.flink.types.Row; - -import java.util.HashMap; -import java.util.List; -import java.util.Map; - -public class Metadataes6Reader extends BaseDataReader { - - private String address; //数据库地址 - - private String username; - - private String password; - - private List indices; //索引列表 - - private Map clientConfig; - - public Metadataes6Reader(DataTransferConfig config, StreamExecutionEnvironment env) { - super(config, env); - ReaderConfig readerConfig = config.getJob().getContent().get(0).getReader(); - address = readerConfig.getParameter().getStringVal(MetaDataEs6Cons.KEY_ADDRESS); - username = readerConfig.getParameter().getStringVal(MetaDataEs6Cons.KEY_USERNAME); - password = readerConfig.getParameter().getStringVal(MetaDataEs6Cons.KEY_PASSWORD); - indices = (List) readerConfig.getParameter().getVal(MetaDataEs6Cons.KEY_INDICES); - - - clientConfig = new HashMap<>(); - clientConfig.put(MetaDataEs6Cons.KEY_TIMEOUT, readerConfig.getParameter().getVal(MetaDataEs6Cons.KEY_TIMEOUT)); - clientConfig.put(MetaDataEs6Cons.KEY_PATH_PREFIX, readerConfig.getParameter().getVal(MetaDataEs6Cons.KEY_PATH_PREFIX)); - } - - @Override - public DataStream readData() { - Metadataes6InputFormatBuilder builder = new Metadataes6InputFormatBuilder(); - builder.setDataTransferConfig(dataTransferConfig); - builder.setAddress(address); - builder.setPassword(password); - builder.setUsername(username); - builder.setIndices(indices); - 
builder.setClientConfig(clientConfig); - - BaseRichInputFormat format = builder.finish(); - - return createInput(format); - } - -} diff --git a/flinkx-metadata-es6/flinkx-metadata-es6-reader/src/main/java/com/dtstack/flinkx/metadataes6/utils/Es6Util.java b/flinkx-metadata-es6/flinkx-metadata-es6-reader/src/main/java/com/dtstack/flinkx/metadataes6/utils/Es6Util.java deleted file mode 100644 index b3189b40b5..0000000000 --- a/flinkx-metadata-es6/flinkx-metadata-es6-reader/src/main/java/com/dtstack/flinkx/metadataes6/utils/Es6Util.java +++ /dev/null @@ -1,257 +0,0 @@ -package com.dtstack.flinkx.metadataes6.utils; - -import com.dtstack.flinkx.metadataes6.constants.MetaDataEs6Cons; -import com.dtstack.flinkx.util.DateUtil; -import com.dtstack.flinkx.util.GsonUtil; -import com.dtstack.flinkx.util.TelnetUtil; -import org.apache.commons.collections.MapUtils; -import org.apache.commons.lang3.StringUtils; -import org.apache.http.util.EntityUtils; -import org.elasticsearch.client.Response; -import org.elasticsearch.client.RestClient; -import org.elasticsearch.client.RestClientBuilder; -import org.apache.http.HttpHost; -import org.apache.http.auth.AuthScope; -import org.apache.http.auth.UsernamePasswordCredentials; -import org.apache.http.client.CredentialsProvider; -import org.apache.http.impl.client.BasicCredentialsProvider; - -import java.io.IOException; -import java.text.SimpleDateFormat; -import java.util.*; - -public class Es6Util { - - /** - * 建立LowLevelRestClient连接 - * @param address es服务端地址,"ip:port" - * @param username 用户名 - * @param password 密码 - * @param config 配置 - * @return LowLevelRestClient - */ - public static RestClient getClient(String address, String username, String password, Map config) { - List httpHostList = new ArrayList<>(); - String[] addr = address.split(","); - for(String add : addr) { - String[] pair = add.split(":"); - TelnetUtil.telnet(pair[0], Integer.parseInt(pair[1])); - httpHostList.add(new HttpHost(pair[0], Integer.parseInt(pair[1]), "http")); - } - - RestClientBuilder builder = RestClient.builder(httpHostList.toArray(new HttpHost[0])); - - Integer timeout = MapUtils.getInteger(config, MetaDataEs6Cons.KEY_TIMEOUT); - if (timeout != null){ - builder.setMaxRetryTimeoutMillis(timeout * 1000); - } - - String pathPrefix = MapUtils.getString(config, MetaDataEs6Cons.KEY_PATH_PREFIX); - if (StringUtils.isNotEmpty(pathPrefix)){ - builder.setPathPrefix(pathPrefix); - } - if(StringUtils.isNotBlank(username)){ - CredentialsProvider credentialsProvider = new BasicCredentialsProvider(); - credentialsProvider.setCredentials(AuthScope.ANY, new UsernamePasswordCredentials(username, password)); - builder.setHttpClientConfigCallback(httpClientBuilder -> { - httpClientBuilder.disableAuthCaching(); - return httpClientBuilder.setDefaultCredentialsProvider(credentialsProvider); - }); - } - - return builder.build(); - } - - /** - * 返回指定索引的配置信息 - * @param indexName 索引名称 - * @param restClient ES6 LowLevelRestClient - * @return 索引的配置信息 - * @throws IOException - */ - public static Map queryIndexProp(String indexName,RestClient restClient) throws IOException { - Map indexProp = new HashMap<>(16); - - String [] prop_1 = queryIndexByCat(indexName,restClient); - indexProp.put(MetaDataEs6Cons.KEY_INDEX_UUID,prop_1[3]); - indexProp.put(MetaDataEs6Cons.KEY_INDEX_SIZE,prop_1[8]); - indexProp.put(MetaDataEs6Cons.KEY_INDEX_DOCS_COUNT,prop_1[6]); - indexProp.put(MetaDataEs6Cons.KEY_INDEX_DOCS_DELETED,prop_1[7]); - indexProp.put(MetaDataEs6Cons.KEY_INDEX_PRI_SIZE,prop_1[9]); - 
indexProp.put(MetaDataEs6Cons.KEY_INDEX_STATUS,prop_1[1]); - - Map index = queryIndex(indexName,restClient); - Map settings = ( Map) (( Map) index.get(indexName)).get("settings"); - settings = ( Map) settings.get(MetaDataEs6Cons.KEY_INDEX); - Object creation_date = formatDate(settings.get("creation_date")); - Object shards = settings.get("number_of_shards"); - Object replicas = settings.get("number_of_replicas"); - indexProp.put(MetaDataEs6Cons.KEY_INDEX_CREATE_TIME,creation_date); - indexProp.put(MetaDataEs6Cons.KEY_INDEX_SHARDS,shards); - indexProp.put(MetaDataEs6Cons.KEY_INDEX_REP,replicas); - - return indexProp; - } - - /** - * 查询指定索引下的所有字段信息 - * @param indexName 索引名称 - * @param restClient ES6 LowLevelRestClient - * @return 字段信息 - * @throws IOException - */ - public static List> queryColumns(String indexName,RestClient restClient) throws IOException { - - List> columnList = new ArrayList<>(); - Map index = queryIndex(indexName,restClient); - Map mappings = (Map) ((Map) index.get(indexName)).get("mappings"); - - if (mappings.isEmpty()){ - return columnList; - } - - for (int i = 0; i < 2; i++) { - List keys = new ArrayList(mappings.keySet()); - mappings = (Map) mappings.get(keys.get(0)); - } - - return getColumn(mappings,new StringBuilder(),new ArrayList<>()); - } - - /** - * 返回字段列表 - * @param docs 未经处理包含所有字段信息的map - * @param columnName 字段名 - * @param columnList 处理后的字段列表 - * @return 字段列表 - */ - public static List> getColumn(Map docs,StringBuilder columnName,List> columnList){ - for(String key : docs.keySet()){ - if (key.equals("properties")){ - getColumn((Map) docs.get(key),columnName,columnList); - break; - }else if(key.equals("type")){ - Map column = new HashMap<>(); - StringBuilder column_name = new StringBuilder(columnName); - column.put(MetaDataEs6Cons.KEY_COLUMN_NAME,column_name); - column.put(MetaDataEs6Cons.KEY_DATA_TYPE,docs.get(key)); - if (docs.get(MetaDataEs6Cons.KEY_FIELDS) != null){ - column.put(MetaDataEs6Cons.KEY_FIELDS,getFieldList(docs)); - } - int cursor = columnList.size() + 1; - column.put("cursor",cursor); - columnList.add(column); - break; - } else { - StringBuilder temp = new StringBuilder(columnName); - if (columnName.toString().equals("")){ - columnName.append(key); - }else { - columnName.append(".").append(key); - } - getColumn((Map) docs.get(key),columnName,columnList); - columnName.delete(0,columnName.length()); - columnName.append(temp); - } - } - - return columnList; - } - - /** - * 返回字段映射参数 - * @param docs 该字段属性map - * @return - */ - public static List> getFieldList(Map docs){ - Map fields = (Map) docs.get("fields"); - Iterator> it = fields.entrySet().iterator(); - List> fieldsList = new ArrayList<>(); - Map field = new HashMap(); - while (it.hasNext()){ - Map.Entry entry = it.next(); - field.put(MetaDataEs6Cons.KEY_FIELD_NAME,entry.getKey()); - field.put(MetaDataEs6Cons.KEY_FIELD_PROP,entry.getValue()); - fieldsList.add(field); - } - - return fieldsList; - } - - /** - * 查询索引别名 - * @param indexName 索引名称 - * @param restClient ES6 LowLevelRestClient - * @return - * @throws IOException - */ - public static List> queryAliases(String indexName,RestClient restClient) throws IOException { - List> aliasList = new ArrayList<>(); - Map alias = new HashMap(); - Map index = queryIndex(indexName,restClient); - Map aliases = (Map) ((Map) index.get(indexName)).get("aliases"); - Iterator> it = aliases.entrySet().iterator(); - - while (it.hasNext()){ - Map.Entry entry = it.next(); - alias.put("aliase_name",entry.getKey()); - alias.put("aliase_prop",entry.getValue()); - 
aliasList.add(alias); - } - return aliasList; - } - - /** - * 使用/_cat/indices{index}的方式查询指定index - * @param restClient ES6 LowLevelRestClient - * @param indexName 索引名称 - * @return - * @throws IOException - */ - public static String[] queryIndexByCat(String indexName,RestClient restClient) throws IOException { - String endpoint = "/_cat/indices"; - Map params = Collections.singletonMap(MetaDataEs6Cons.KEY_INDEX, indexName); - Response response = restClient.performRequest(MetaDataEs6Cons.API_METHOD_GET,endpoint,params); - String resBody = EntityUtils.toString(response.getEntity()); - String [] indices = resBody.split("\\s+"); - return indices; - } - - /** - * indexName为*表示查询所有的索引信息 - * @param restClient ES6 LowLevelRestClient - * @return - * @throws IOException - */ - public static String[] queryIndicesByCat(RestClient restClient) throws IOException { - return queryIndexByCat("*",restClient); - } - - /** - * 使用/index的方式查询指定索引的详细信息 - * @param indexName 索引名称 - * @param restClient ES6 LowLevelRestClient - * @return - * @throws IOException - */ - public static Map queryIndex(String indexName,RestClient restClient) throws IOException { - String endpoint = "/"+indexName; - Response response = restClient.performRequest(MetaDataEs6Cons.API_METHOD_GET,endpoint); - String resBody = EntityUtils.toString(response.getEntity()); - Map index = GsonUtil.GSON.fromJson(resBody, GsonUtil.gsonMapTypeToken); - return index; - } - - /** - * 格式化日期 - * @param date - * @return - */ - public static Object formatDate (Object date){ - long long_time =Long.parseLong(date.toString()); - Date date_time = new Date(long_time); - SimpleDateFormat format = DateUtil.getDateTimeFormatter(); - date = format.format(date_time); - return date; - } -} diff --git a/flinkx-metadata-es6/pom.xml b/flinkx-metadata-es6/pom.xml deleted file mode 100644 index 5aeaf82134..0000000000 --- a/flinkx-metadata-es6/pom.xml +++ /dev/null @@ -1,27 +0,0 @@ - - - - flinkx-all - com.dtstack.flinkx - 1.6 - - 4.0.0 - - flinkx-metadata-es6 - pom - - - flinkx-metadata-es6-reader - - - - - com.dtstack.flinkx - flinkx-core - 1.6 - provided - - - \ No newline at end of file diff --git a/flinkx-metadata-hbase/flinkx-metadata-hbase-reader/pom.xml b/flinkx-metadata-hbase/flinkx-metadata-hbase-reader/pom.xml deleted file mode 100644 index 56a4b7eab6..0000000000 --- a/flinkx-metadata-hbase/flinkx-metadata-hbase-reader/pom.xml +++ /dev/null @@ -1,265 +0,0 @@ - - - - flinkx-metadata-hbase - com.dtstack.flinkx - 1.6 - - 4.0.0 - - flinkx-metadata-hbase-reader - - - - com.google.guava - guava - 12.0.1 - - - com.dtstack.flinkx - flinkx-metadata-reader - 1.6 - - - - - org.apache.hbase - hbase-client - 1.3.1 - - - org.apache.hadoop - hadoop-common - - - org.apache.hadoop - hadoop-auth - - - org.apache.hadoop - hadoop-mapreduce-client-core - - - log4j - log4j - - - guava - com.google.guava - - - commons-codec - commons-codec - - - commons-collections - commons-collections - - - commons-lang - commons-lang - - - commons-logging - commons-logging - - - jackson-core-asl - org.codehaus.jackson - - - zookeeper - org.apache.zookeeper - - - jackson-mapper-asl - org.codehaus.jackson - - - slf4j-api - org.slf4j - - - slf4j-log4j12 - org.slf4j - - - - - - org.apache.curator - curator-test - 2.6.0 - test - - - zookeeper - org.apache.zookeeper - - - guava - com.google.guava - - - - - org.apache.hbase - hbase-mapreduce - 2.2.5 - compile - - - commons-codec - commons-codec - - - commons-io - commons-io - - - commons-lang3 - org.apache.commons - - - commons-math3 - org.apache.commons - - 
- hadoop-annotations - org.apache.hadoop - - - hadoop-auth - org.apache.hadoop - - - hadoop-common - org.apache.hadoop - - - hadoop-hdfs - org.apache.hadoop - - - hadoop-mapreduce-client-core - org.apache.hadoop - - - hbase-client - org.apache.hbase - - - hbase-common - org.apache.hbase - - - hbase-protocol - org.apache.hbase - - - zookeeper - org.apache.zookeeper - - - javassist - org.javassist - - - slf4j-log4j12 - org.slf4j - - - slf4j-api - org.slf4j - - - log4j - log4j - - - - - - - - - - org.apache.maven.plugins - maven-shade-plugin - 3.1.0 - - - package - - shade - - - false - - - org.slf4j:slf4j-api - log4j:log4j - ch.qos.logback:* - - - - - *:* - - META-INF/*.SF - META-INF/*.DSA - META-INF/*.RSA - - - - - - io.netty - shade.metadatahbase.io.netty - - - com.google.common - shade.metadatahbase.com.google.common - - - com.google.thirdparty - shade.metadatahbase.com.google.thirdparty - - - - - - - - - maven-antrun-plugin - 1.2 - - - copy-resources - - package - - run - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/flinkx-metadata-hbase/flinkx-metadata-hbase-reader/src/main/java/com/dtstack/flinkx/metadatahbase/inputformat/MetadatahbaseInputFormat.java b/flinkx-metadata-hbase/flinkx-metadata-hbase-reader/src/main/java/com/dtstack/flinkx/metadatahbase/inputformat/MetadatahbaseInputFormat.java deleted file mode 100644 index 71e4970aa9..0000000000 --- a/flinkx-metadata-hbase/flinkx-metadata-hbase-reader/src/main/java/com/dtstack/flinkx/metadatahbase/inputformat/MetadatahbaseInputFormat.java +++ /dev/null @@ -1,269 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. 
- */ - -package com.dtstack.flinkx.metadatahbase.inputformat; - -import com.dtstack.flinkx.constants.ConstantValue; -import com.dtstack.flinkx.enums.SizeUnitType; -import com.dtstack.flinkx.metadata.inputformat.BaseMetadataInputFormat; -import com.dtstack.flinkx.metadata.inputformat.MetadataInputSplit; -import com.dtstack.flinkx.metadatahbase.util.HbaseHelper; -import com.dtstack.flinkx.util.ExceptionUtil; -import com.dtstack.flinkx.util.ZkHelper; -import org.apache.commons.collections.CollectionUtils; -import org.apache.flink.core.io.InputSplit; -import org.apache.hadoop.hbase.ClusterStatus; -import org.apache.hadoop.hbase.HColumnDescriptor; -import org.apache.hadoop.hbase.HConstants; -import org.apache.hadoop.hbase.HRegionInfo; -import org.apache.hadoop.hbase.HTableDescriptor; -import org.apache.hadoop.hbase.RegionLoad; -import org.apache.hadoop.hbase.ServerLoad; -import org.apache.hadoop.hbase.ServerName; -import org.apache.hadoop.hbase.TableName; -import org.apache.hadoop.hbase.client.Admin; -import org.apache.hadoop.hbase.client.Connection; -import org.apache.zookeeper.ZooKeeper; - -import java.io.IOException; -import java.sql.SQLException; -import java.util.ArrayList; -import java.util.HashMap; -import java.util.LinkedList; -import java.util.List; -import java.util.Map; - -import static com.dtstack.flinkx.constants.ConstantValue.COMMA_SYMBOL; -import static com.dtstack.flinkx.metadata.MetaDataCons.KEY_COLUMN; -import static com.dtstack.flinkx.metadata.MetaDataCons.KEY_TABLE_PROPERTIES; -import static com.dtstack.flinkx.metadatahbase.util.HbaseCons.KEY_COLUMN_FAMILY; -import static com.dtstack.flinkx.metadatahbase.util.HbaseCons.KEY_CREATE_TIME; -import static com.dtstack.flinkx.metadatahbase.util.HbaseCons.KEY_NAMESPACE; -import static com.dtstack.flinkx.metadatahbase.util.HbaseCons.KEY_REGION_COUNT; -import static com.dtstack.flinkx.metadatahbase.util.HbaseCons.KEY_STORAGE_SIZE; -import static com.dtstack.flinkx.metadatahbase.util.HbaseCons.KEY_TABLE_NAME; -import static com.dtstack.flinkx.util.ZkHelper.APPEND_PATH; - -/** 获取元数据 - * @author kunni@dtstack.com - */ -public class MetadatahbaseInputFormat extends BaseMetadataInputFormat { - - private static final long serialVersionUID = 1L; - - /** - * 用于连接hbase的配置 - */ - protected Map hadoopConfig; - - /** - * hbase 连接 - */ - protected Connection hbaseConnection; - - protected Admin admin; - - protected Map createTimeMap; - - protected Map tableSizeMap; - - protected ZooKeeper zooKeeper; - - protected String path; - - /** - * 因为connection的类型不同,重写该方法 - * @param inputSplit 某个命名空间及需要查询的表 - */ - @Override - protected void openInternal(InputSplit inputSplit) throws IOException{ - LOG.info("inputSplit = {}", inputSplit); - currentDb.set(((MetadataInputSplit) inputSplit).getDbName()); - tableList = ((MetadataInputSplit) inputSplit).getTableList(); - try { - createTimeMap = queryCreateTimeMap(hadoopConfig); - hbaseConnection = HbaseHelper.getHbaseConnection(hadoopConfig); - hadoopConfig.forEach((key,value)->{ - LOG.info("{}:{} ",key,value); - }); - admin = hbaseConnection.getAdmin(); - tableSizeMap = generateTableSizeMap(); - if(CollectionUtils.isEmpty(tableList)){ - tableList = showTables(); - } - LOG.info("current database = {}, tableSize = {}, tableList = {}",currentDb.get(), tableList.size(), tableList); - tableIterator.set(tableList.iterator()); - }catch (Exception e){ - throw new IOException(e); - } - } - - /** - * 获取表的region大小的总和即为表饿的存储大小,误差最大为1M * regionSize - * @return - * @throws Exception - */ - private Map generateTableSizeMap() 
throws Exception{ - Map sizeMap = new HashMap<>(16); - ClusterStatus clusterStatus = admin.getClusterStatus(); - for (ServerName serverName : clusterStatus.getServers()) { - ServerLoad serverLoad = clusterStatus.getLoad(serverName); - for (Map.Entry entry : serverLoad.getRegionsLoad().entrySet()) { - RegionLoad regionLoad = entry.getValue(); - String regionName = new String(entry.getKey(), "UTF-8"); - String[] regionSplits = regionName.split(COMMA_SYMBOL); - //regionSplits[0] 为table name - int sumSize=sizeMap.getOrDefault(regionSplits[0],0)+regionLoad.getStorefileSizeMB();; - sizeMap.put(regionSplits[0],sumSize); - } - } - return sizeMap; - } - - @Override - protected void closeInternal() { - HbaseHelper.closeAdmin(admin); - HbaseHelper.closeConnection(hbaseConnection); - } - - @Override - protected List showTables() throws SQLException { - List tableNameList = new LinkedList<>(); - try { - HTableDescriptor[] tableNames = admin.listTableDescriptorsByNamespace(currentDb.get()); - for (HTableDescriptor table : tableNames){ - TableName tableName = table.getTableName(); - // 排除系统表 - if(!tableName.isSystemTable()){ - //此时的表名带有namespace,需要去除 - String tableWithNameSpace = tableName.getNameAsString(); - if(tableWithNameSpace.contains(ConstantValue.COLON_SYMBOL)){ - tableWithNameSpace = tableWithNameSpace.split(ConstantValue.COLON_SYMBOL)[1]; - } - tableNameList.add(tableWithNameSpace); - } - } - }catch (IOException e){ - LOG.error("query table list failed. currentDb = {}, Exception = {}", currentDb.get(), ExceptionUtil.getErrorMessage(e)); - throw new SQLException(e); - } - return tableNameList; - } - - @Override - protected void switchDatabase(String databaseName) { - currentDb.set(databaseName); - } - - @Override - protected Map queryMetaData(String tableName) throws SQLException { - Map result = new HashMap<>(16); - tableName = String.format("%s:%s", currentDb.get(), tableName); - result.put(KEY_TABLE_PROPERTIES, queryTableProperties(tableName)); - result.put(KEY_COLUMN, queryColumnList(tableName)); - return result; - } - - /** - * 获取hbase表级别的元数据信息 - * @param tableName 表名 - * @return 表的元数据 - * @throws SQLException sql异常 - */ - protected Map queryTableProperties(String tableName) throws SQLException { - Map tableProperties = new HashMap<>(16); - try{ - HTableDescriptor table = admin.getTableDescriptor(TableName.valueOf(tableName)); - List regionInfos = admin.getTableRegions(table.getTableName()); - tableProperties.put(KEY_REGION_COUNT, regionInfos.size()); - //统一表大小单位为字节 - String tableSize = SizeUnitType.covertUnit(SizeUnitType.MB,SizeUnitType.B,Long.valueOf(tableSizeMap.get(table.getNameAsString()))); - tableProperties.put(KEY_STORAGE_SIZE, Long.valueOf(tableSize)); - tableProperties.put(KEY_CREATE_TIME, createTimeMap.get(table.getNameAsString())); - //这里的table带了schema - if(tableName.contains(ConstantValue.COLON_SYMBOL)){ - tableName = tableName.split(ConstantValue.COLON_SYMBOL)[1]; - } - tableProperties.put(KEY_TABLE_NAME, tableName); - tableProperties.put(KEY_NAMESPACE, currentDb.get()); - }catch (IOException e){ - LOG.error("query tableProperties failed. 
{}", ExceptionUtil.getErrorMessage(e)); - throw new SQLException(e); - } - return tableProperties; - } - - /** - * 获取列族信息 - * @return 列族 - */ - protected List> queryColumnList(String tableName) throws SQLException { - List> columnList = new ArrayList<>(); - try{ - HTableDescriptor table = admin.getTableDescriptor(TableName.valueOf(tableName)); - HColumnDescriptor[] columnDescriptors = table.getColumnFamilies(); - for (HColumnDescriptor column : columnDescriptors){ - Map map = new HashMap<>(16); - map.put(KEY_COLUMN_FAMILY, column.getNameAsString()); - columnList.add(map); - } - }catch (IOException e){ - LOG.error("query columnList failed. {}", ExceptionUtil.getErrorMessage(e)); - throw new SQLException(e); - } - return columnList; - } - - /** - * 查询hbase表的创建时间 - * 如果zookeeper没有权限访问,返回空map - * @param hadoopConfig hadoop配置 - * @return 表名与创建时间的映射 - */ - protected Map queryCreateTimeMap(Map hadoopConfig) { - Map createTimeMap = new HashMap<>(16); - try{ - zooKeeper = ZkHelper.createZkClient((String) hadoopConfig.get(HConstants.ZOOKEEPER_QUORUM), ZkHelper.DEFAULT_TIMEOUT); - List tables = ZkHelper.getChildren(zooKeeper, path); - if(tables != null){ - for(String table : tables){ - LOG.info(table); - createTimeMap.put(table, ZkHelper.getCreateTime(zooKeeper,path + ConstantValue.SINGLE_SLASH_SYMBOL + table)); - } - } - }catch (Exception e){ - LOG.error("query createTime map failed, error {} ", ExceptionUtil.getErrorMessage(e)); - }finally { - ZkHelper.closeZooKeeper(zooKeeper); - } - return createTimeMap; - } - - public void setPath(String path){ - this.path = path + APPEND_PATH; - } - - @Override - protected String quote(String name) { - return name; - } - - public void setHadoopConfig(Map hadoopConfig){ - this.hadoopConfig = hadoopConfig; - } -} diff --git a/flinkx-metadata-hbase/flinkx-metadata-hbase-reader/src/main/java/com/dtstack/flinkx/metadatahbase/reader/MetadatahbaseReader.java b/flinkx-metadata-hbase/flinkx-metadata-hbase-reader/src/main/java/com/dtstack/flinkx/metadatahbase/reader/MetadatahbaseReader.java deleted file mode 100644 index db539a1253..0000000000 --- a/flinkx-metadata-hbase/flinkx-metadata-hbase-reader/src/main/java/com/dtstack/flinkx/metadatahbase/reader/MetadatahbaseReader.java +++ /dev/null @@ -1,69 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. 
- */ - -package com.dtstack.flinkx.metadatahbase.reader; - -import com.dtstack.flinkx.config.DataTransferConfig; -import com.dtstack.flinkx.metadata.inputformat.MetadataInputFormatBuilder; -import com.dtstack.flinkx.metadata.reader.MetadataReader; -import com.dtstack.flinkx.metadatahbase.inputformat.MetadatahbaseInputFormat; -import com.dtstack.flinkx.metadatahbase.inputformat.MetadatahbaseInputFormatBuilder; -import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; -import org.apache.hadoop.hbase.HConstants; - -import java.util.Map; - -import static com.dtstack.flinkx.metadatahbase.util.HbaseCons.KEY_HADOOP_CONFIG; -import static com.dtstack.flinkx.metadatahbase.util.HbaseCons.KEY_PATH; -import static com.dtstack.flinkx.util.ZkHelper.DEFAULT_PATH; - -/** - * 读取hbase config并进行配置 - * @author kunni@dtstack.com - */ -public class MetadatahbaseReader extends MetadataReader { - - private Map hadoopConfig; - - private String path; - - @SuppressWarnings("unchecked") - public MetadatahbaseReader(DataTransferConfig config, StreamExecutionEnvironment env) { - super(config, env); - hadoopConfig = (Map) config.getJob().getContent() - .get(0).getReader().getParameter().getVal(KEY_HADOOP_CONFIG); - if(!hadoopConfig.containsKey(HConstants.ZOOKEEPER_QUORUM)){ - hadoopConfig.put(HConstants.ZOOKEEPER_QUORUM, jdbcUrl); - } - path = config.getJob().getContent().get(0).getReader() - .getParameter().getStringVal(KEY_PATH, DEFAULT_PATH); - if(!hadoopConfig.containsKey(HConstants.ZOOKEEPER_ZNODE_PARENT)){ - hadoopConfig.put(HConstants.ZOOKEEPER_ZNODE_PARENT, path); - } - } - - @Override - protected MetadataInputFormatBuilder getBuilder(){ - MetadatahbaseInputFormatBuilder builder = new MetadatahbaseInputFormatBuilder(new MetadatahbaseInputFormat()); - builder.setHadoopConfig(hadoopConfig); - builder.setDataTransferConfig(dataTransferConfig); - builder.setPath(path); - return builder; - } - -} diff --git a/flinkx-metadata-hbase/flinkx-metadata-hbase-reader/src/test/java/com/dtstack/flinkx/metadatahbase/inputformat/MetadatahbaseInputFormatTest.java b/flinkx-metadata-hbase/flinkx-metadata-hbase-reader/src/test/java/com/dtstack/flinkx/metadatahbase/inputformat/MetadatahbaseInputFormatTest.java deleted file mode 100644 index a120eb501e..0000000000 --- a/flinkx-metadata-hbase/flinkx-metadata-hbase-reader/src/test/java/com/dtstack/flinkx/metadatahbase/inputformat/MetadatahbaseInputFormatTest.java +++ /dev/null @@ -1,80 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. 
- */ - -package com.dtstack.flinkx.metadatahbase.inputformat; - -import org.apache.curator.framework.CuratorFramework; -import org.apache.curator.framework.CuratorFrameworkFactory; -import org.apache.curator.retry.ExponentialBackoffRetry; -import org.apache.curator.test.TestingServer; -import org.apache.hadoop.hbase.HConstants; -import org.apache.zookeeper.CreateMode; -import org.junit.After; -import org.junit.Assert; -import org.junit.Before; -import org.junit.Test; - -import java.io.IOException; -import java.util.HashMap; -import java.util.Map; - -public class MetadatahbaseInputFormatTest { - - protected MetadatahbaseInputFormat inputFormat = new MetadatahbaseInputFormat(); - - private static TestingServer server; - - @Before - public void createZkServer() throws Exception { - server = new TestingServer(2191, true); - server.start(); - CuratorFramework client = CuratorFrameworkFactory.builder() - .connectString("localhost:2191") - .connectionTimeoutMs(5000) - .retryPolicy(new ExponentialBackoffRetry(1000, 3)) - .build(); - client.start(); - client.create().creatingParentsIfNeeded().withMode(CreateMode.EPHEMERAL).forPath("/hbase/table/test1", "init".getBytes()); - client.create().creatingParentsIfNeeded().withMode(CreateMode.EPHEMERAL).forPath("/hbase/table/test2", "init".getBytes()); - client.close(); - } - - @Test - public void testSetPath(){ - inputFormat.setPath("/hbase"); - Assert.assertEquals(inputFormat.path, "/hbase/table"); - } - - @Test - public void testQuote(){ - Assert.assertEquals(inputFormat.quote("table"), "table"); - } - - @Test - public void testQueryCreateTimeMap(){ - Map hadoopConfig = new HashMap<>(); - hadoopConfig.put(HConstants.ZOOKEEPER_QUORUM, "localhost:2191"); - inputFormat.setPath("/hbase"); - Assert.assertEquals(inputFormat.queryCreateTimeMap(hadoopConfig).size(),0); - } - - @After - public void closedZkServer() throws IOException { - server.close(); - } -} diff --git a/flinkx-metadata-hive1/flinkx-metadata-hive1-reader/pom.xml b/flinkx-metadata-hive1/flinkx-metadata-hive1-reader/pom.xml deleted file mode 100644 index 5fd45d15ba..0000000000 --- a/flinkx-metadata-hive1/flinkx-metadata-hive1-reader/pom.xml +++ /dev/null @@ -1,246 +0,0 @@ - - - - flinkx-metadata-hive1 - com.dtstack.flinkx - 1.6 - - 4.0.0 - - flinkx-metadata-hive1-reader - - - com.dtstack.flinkx - flinkx-metadata-reader - 1.6 - - - org.apache.hive - hive-jdbc - 1.1.1 - - - slf4j-log4j12 - org.slf4j - - - log4j-slf4j-impl - org.apache.logging.log4j - - - log4j-web - org.apache.logging.log4j - - - log4j-core - org.apache.logging.log4j - - - log4j-api - org.apache.logging.log4j - - - log4j-1.2-api - org.apache.logging.log4j - - - netty-all - io.netty - - - hive-common - org.apache.hive - - - parquet-hadoop-bundle - org.apache.parquet - - - xerces - xercesImpl - - - hbase-client - org.apache.hbase - - - curator-framework - org.apache.curator - - - zookeeper - org.apache.zookeeper - - - slf4j-api - org.slf4j - - - commons-cli - commons-cli - - - commons-compress - org.apache.commons - - - commons-lang - commons-lang - - - guava - com.google.guava - - - gson - com.google.code.gson - - - avro - org.apache.avro - - - hbase-common - org.apache.hbase - - - hbase-hadoop2-compat - org.apache.hbase - - - hbase-server - org.apache.hbase - - - tephra-hbase-compat-1.0 - co.cask.tephra - - - hbase-hadoop-compat - org.apache.hbase - - - - - com.dtstack.flinkx - flinkx-metadata-hive2-reader - 1.6 - - - org.apache.hive - hive-jdbc - - - org.apache.hive - hive-serde - - - - - - - - - org.apache.maven.plugins - 
maven-shade-plugin - 3.1.0 - - - package - - shade - - - false - - - org.slf4j:slf4j-api - log4j:log4j - ch.qos.logback:* - - - - - - *:* - - META-INF/*.SF - META-INF/*.DSA - META-INF/*.RSA - - - - - - org.apache.hive.jdbc - shade.hive1.jdbc - - - org.apache.hadoop.hive.serde2 - shade.hive1.serde2 - - - org.apache.hive.service - shade.hive1.service - - - com.google.common - shade.core.com.google.common - - - com.google.thirdparty - shade.core.com.google.thirdparty - - - org.apache.http - shade.metadatahive1.org.apache.http - - - - - - META-INF/services/java.sql.Driver - - - META-INF/services - java.sql.hive1.Driver - - - - - - - - - maven-antrun-plugin - 1.2 - - - copy-resources - - package - - run - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/flinkx-metadata-hive2/flinkx-metadata-hive2-reader/src/main/java/com/dtstack/flinkx/metadatahive2/constants/Hive2MetaDataCons.java b/flinkx-metadata-hive2/flinkx-metadata-hive2-reader/src/main/java/com/dtstack/flinkx/metadatahive2/constants/Hive2MetaDataCons.java deleted file mode 100644 index 004e69230c..0000000000 --- a/flinkx-metadata-hive2/flinkx-metadata-hive2-reader/src/main/java/com/dtstack/flinkx/metadatahive2/constants/Hive2MetaDataCons.java +++ /dev/null @@ -1,71 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - *
- * http://www.apache.org/licenses/LICENSE-2.0 - *
- * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ -package com.dtstack.flinkx.metadatahive2.constants; - -import com.dtstack.flinkx.metadata.MetaDataCons; - -/** - * @author : tiezhu - * @date : 2020/3/9 - * @description : - */ -@SuppressWarnings("all") -public class Hive2MetaDataCons extends MetaDataCons { - public static final String DRIVER_NAME = "org.apache.hive.jdbc.HiveDriver"; - public static final String KEY_HADOOP_CONFIG = "hadoopConfig"; - - public static final String KEY_SOURCE = "source"; - public static final String KEY_VERSION = "version"; - - public static final String TEXT_FORMAT = "TextOutputFormat"; - public static final String ORC_FORMAT = "OrcOutputFormat"; - public static final String PARQUET_FORMAT = "MapredParquetOutputFormat"; - - public static final String TYPE_TEXT = "text"; - public static final String TYPE_ORC = "orc"; - public static final String TYPE_PARQUET = "parquet"; - - public static final String PARTITION_INFORMATION = "# Partition Information"; - public static final String TABLE_INFORMATION = "# Detailed Table Information"; - - // desc formatted后的列名 - public static final String KEY_RESULTSET_COL_NAME = "# col_name"; - public static final String KEY_RESULTSET_DATA_TYPE = "data_type"; - public static final String KEY_RESULTSET_COMMENT = "comment"; - - public static final String KEY_COL_LOCATION = "Location:"; - public static final String KEY_COL_CREATETIME = "CreateTime:"; - public static final String KEY_COL_CREATE_TIME = "Create Time:"; - public static final String KEY_COL_LASTACCESSTIME = "LastAccessTime:"; - public static final String KEY_COL_LAST_ACCESS_TIME = "Last Access Time:"; - public static final String KEY_COL_OUTPUTFORMAT = "OutputFormat:"; - public static final String KEY_COL_TABLE_PARAMETERS = "Table Parameters:"; - - public static final String KEY_LOCATION = "location"; - public static final String KEY_CREATETIME = "createTime"; - public static final String KEY_LASTACCESSTIME = "lastAccessTime"; - public static final String KEY_TOTALSIZE = "totalSize"; - public static final String KEY_TRANSIENT_LASTDDLTIME = "transient_lastDdlTime"; - - public static final String KEY_NAME = "name"; - public static final String KEY_VALUE = "value"; - - - public static final String SQL_QUERY_DATA = "desc formatted %s"; - public static final String SQL_SHOW_PARTITIONS = "show partitions %s"; -} diff --git a/flinkx-metadata-hive2/flinkx-metadata-hive2-reader/src/main/java/com/dtstack/flinkx/metadatahive2/constants/HiveDbUtil.java b/flinkx-metadata-hive2/flinkx-metadata-hive2-reader/src/main/java/com/dtstack/flinkx/metadatahive2/constants/HiveDbUtil.java deleted file mode 100644 index f98b7860f8..0000000000 --- a/flinkx-metadata-hive2/flinkx-metadata-hive2-reader/src/main/java/com/dtstack/flinkx/metadatahive2/constants/HiveDbUtil.java +++ /dev/null @@ -1,210 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. 
You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -package com.dtstack.flinkx.metadatahive2.constants; - -import com.dtstack.flinkx.authenticate.KerberosUtil; -import com.dtstack.flinkx.util.ExceptionUtil; -import com.dtstack.flinkx.util.FileSystemUtil; -import com.dtstack.flinkx.util.RetryUtil; -import org.apache.commons.collections.MapUtils; -import org.apache.commons.lang.StringUtils; -import org.apache.hadoop.conf.Configuration; -import org.apache.hadoop.security.UserGroupInformation; -import org.slf4j.Logger; -import org.slf4j.LoggerFactory; - -import java.security.PrivilegedAction; -import java.sql.Connection; -import java.sql.DriverManager; -import java.sql.SQLException; -import java.util.Map; -import java.util.concurrent.locks.ReentrantLock; - -/** - * @author toutian - */ - -public final class HiveDbUtil { - - public static final String SQLSTATE_USERNAME_PWD_ERROR = "28000"; - public static final String SQLSTATE_CANNOT_ACQUIRE_CONNECT = "08004"; - public static final int JDBC_PART_SIZE = 2; - public static final String JDBC_REGEX = "[?|;|#]"; - public static final String KEY_VAL_DELIMITER = "="; - public static final String PARAM_DELIMITER = "&"; - public static final String KEY_PRINCIPAL = "principal"; - private static Logger LOG = LoggerFactory.getLogger(HiveDbUtil.class); - private static ReentrantLock lock = new ReentrantLock(); - - private HiveDbUtil() { - } - - public static Connection getConnection(ConnectionInfo connectionInfo) { - if(openKerberos(connectionInfo.getJdbcUrl())){ - return getConnectionWithKerberos(connectionInfo); - } else { - return getConnectionWithRetry(connectionInfo); - } - } - - private static Connection getConnectionWithRetry(ConnectionInfo connectionInfo){ - try { - return RetryUtil.executeWithRetry(() -> connect(connectionInfo), 1, 1000L, false); - } catch (Exception e1) { - throw new RuntimeException(String.format("连接:%s 时发生错误:%s.", connectionInfo.getJdbcUrl(), ExceptionUtil.getErrorMessage(e1))); - } - } - - private static Connection getConnectionWithKerberos(ConnectionInfo connectionInfo){ - if(connectionInfo.getHiveConf() == null || connectionInfo.getHiveConf().isEmpty()){ - throw new IllegalArgumentException("hiveConf can not be null or empty"); - } - - String keytabFileName = KerberosUtil.getPrincipalFileName(connectionInfo.getHiveConf()); - - keytabFileName = KerberosUtil.loadFile(connectionInfo.getHiveConf(), keytabFileName); - String principal = KerberosUtil.getPrincipal(connectionInfo.getHiveConf(), keytabFileName); - KerberosUtil.loadKrb5Conf(connectionInfo.getHiveConf()); - - Configuration conf = FileSystemUtil.getConfiguration(connectionInfo.getHiveConf(), null); - - UserGroupInformation ugi; - try { - ugi = KerberosUtil.loginAndReturnUgi(conf, principal, keytabFileName); - } catch (Exception e){ - throw new RuntimeException("Login kerberos error:", e); - } - - LOG.info("current ugi:{}", ugi); - return ugi.doAs((PrivilegedAction) () -> getConnectionWithRetry(connectionInfo)); - } - - private static boolean openKerberos(final String jdbcUrl){ - String[] splits = jdbcUrl.split(JDBC_REGEX); - if (splits.length != JDBC_PART_SIZE) { - return false; - } - - String 
paramsStr = splits[1]; - String[] paramArray = paramsStr.split(PARAM_DELIMITER); - for (String param : paramArray) { - String[] keyVal = param.split(KEY_VAL_DELIMITER); - if(KEY_PRINCIPAL.equalsIgnoreCase(keyVal[0])){ - return true; - } - } - - return false; - } - - public static Connection connect(ConnectionInfo connectionInfo) { - lock.lock(); - try { - Class.forName(connectionInfo.getDriver()); - DriverManager.setLoginTimeout(connectionInfo.getTimeout()); - if(StringUtils.isNotBlank(connectionInfo.getUsername())){ - return DriverManager.getConnection(connectionInfo.getJdbcUrl(), connectionInfo.getUsername(), connectionInfo.getPassword()); - }else{ - return DriverManager.getConnection(connectionInfo.getJdbcUrl()); - } - } catch (SQLException e) { - if (SQLSTATE_USERNAME_PWD_ERROR.equals(e.getSQLState())) { - throw new RuntimeException("用户名或密码错误."); - } else if (SQLSTATE_CANNOT_ACQUIRE_CONNECT.equals(e.getSQLState())) { - throw new RuntimeException("应用程序服务器拒绝建立连接."); - } else { - throw new RuntimeException("连接信息:" + connectionInfo.getJdbcUrl() + " 错误信息:" + ExceptionUtil.getErrorMessage(e)); - } - } catch (Exception e1) { - throw new RuntimeException("连接信息:" + connectionInfo.getJdbcUrl() + " 错误信息:" + ExceptionUtil.getErrorMessage(e1)); - } finally { - lock.unlock(); - } - } - - public static class ConnectionInfo{ - private String jdbcUrl; - private String username; - private String password; - private String driver; - private int timeout = 30000; - private Map hiveConf; - - public String getJdbcUrl() { - return jdbcUrl; - } - - public void setJdbcUrl(String jdbcUrl) { - this.jdbcUrl = jdbcUrl; - } - - public String getUsername() { - return username; - } - - public void setUsername(String username) { - this.username = username; - } - - public String getPassword() { - return password; - } - - public void setPassword(String password) { - this.password = password; - } - - public Map getHiveConf() { - return hiveConf; - } - - public void setHiveConf(Map hiveConf) { - this.hiveConf = hiveConf; - } - - public int getTimeout() { - return timeout; - } - - public void setTimeout(int timeout) { - this.timeout = timeout; - } - - public String getDriver(){ - return driver; - } - - public void setDriver(String driver){ - this.driver = driver; - } - - @Override - public String toString() { - return "ConnectionInfo{" + - "jdbcUrl='" + jdbcUrl + '\'' + - ", username='" + username + '\'' + - ", password='" + password + '\'' + - ", timeout='" + timeout + '\'' + - ", driver='" + driver + '\'' + - ", hiveConf=" + hiveConf + - '}'; - } - } - -} diff --git a/flinkx-metadata-hive2/flinkx-metadata-hive2-reader/src/main/java/com/dtstack/flinkx/metadatahive2/inputformat/Metadatahive2InputFormat.java b/flinkx-metadata-hive2/flinkx-metadata-hive2-reader/src/main/java/com/dtstack/flinkx/metadatahive2/inputformat/Metadatahive2InputFormat.java deleted file mode 100644 index 4bbce68a59..0000000000 --- a/flinkx-metadata-hive2/flinkx-metadata-hive2-reader/src/main/java/com/dtstack/flinkx/metadatahive2/inputformat/Metadatahive2InputFormat.java +++ /dev/null @@ -1,368 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. 
You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ -package com.dtstack.flinkx.metadatahive2.inputformat; - -import com.dtstack.flinkx.constants.ConstantValue; -import com.dtstack.flinkx.metadata.inputformat.BaseMetadataInputFormat; -import com.dtstack.flinkx.metadatahive2.constants.HiveDbUtil; -import org.apache.commons.lang3.StringUtils; - -import java.sql.Connection; -import java.sql.ResultSet; -import java.sql.ResultSetMetaData; -import java.sql.SQLException; -import java.util.ArrayList; -import java.util.HashMap; -import java.util.Iterator; -import java.util.LinkedHashMap; -import java.util.List; -import java.util.Map; - -import static com.dtstack.flinkx.metadata.MetaDataCons.KEY_INDEX_COMMENT; -import static com.dtstack.flinkx.metadata.MetaDataCons.KEY_TABLE_COMMENT; -import static com.dtstack.flinkx.metadatahive2.constants.Hive2MetaDataCons.KEY_COLUMN; -import static com.dtstack.flinkx.metadatahive2.constants.Hive2MetaDataCons.KEY_COLUMN_COMMENT; -import static com.dtstack.flinkx.metadatahive2.constants.Hive2MetaDataCons.KEY_COLUMN_DATA_TYPE; -import static com.dtstack.flinkx.metadatahive2.constants.Hive2MetaDataCons.KEY_COLUMN_INDEX; -import static com.dtstack.flinkx.metadatahive2.constants.Hive2MetaDataCons.KEY_COLUMN_NAME; -import static com.dtstack.flinkx.metadatahive2.constants.Hive2MetaDataCons.KEY_COLUMN_TYPE; -import static com.dtstack.flinkx.metadatahive2.constants.Hive2MetaDataCons.KEY_COL_CREATETIME; -import static com.dtstack.flinkx.metadatahive2.constants.Hive2MetaDataCons.KEY_COL_CREATE_TIME; -import static com.dtstack.flinkx.metadatahive2.constants.Hive2MetaDataCons.KEY_COL_LASTACCESSTIME; -import static com.dtstack.flinkx.metadatahive2.constants.Hive2MetaDataCons.KEY_COL_LAST_ACCESS_TIME; -import static com.dtstack.flinkx.metadatahive2.constants.Hive2MetaDataCons.KEY_COL_LOCATION; -import static com.dtstack.flinkx.metadatahive2.constants.Hive2MetaDataCons.KEY_COL_NAME; -import static com.dtstack.flinkx.metadatahive2.constants.Hive2MetaDataCons.KEY_COL_OUTPUTFORMAT; -import static com.dtstack.flinkx.metadatahive2.constants.Hive2MetaDataCons.KEY_COL_TABLE_PARAMETERS; -import static com.dtstack.flinkx.metadatahive2.constants.Hive2MetaDataCons.KEY_CREATETIME; -import static com.dtstack.flinkx.metadatahive2.constants.Hive2MetaDataCons.KEY_LASTACCESSTIME; -import static com.dtstack.flinkx.metadatahive2.constants.Hive2MetaDataCons.KEY_LOCATION; -import static com.dtstack.flinkx.metadatahive2.constants.Hive2MetaDataCons.KEY_NAME; -import static com.dtstack.flinkx.metadatahive2.constants.Hive2MetaDataCons.KEY_PARTITIONS; -import static com.dtstack.flinkx.metadatahive2.constants.Hive2MetaDataCons.KEY_PARTITION_COLUMNS; -import static com.dtstack.flinkx.metadatahive2.constants.Hive2MetaDataCons.KEY_RESULTSET_COL_NAME; -import static com.dtstack.flinkx.metadatahive2.constants.Hive2MetaDataCons.KEY_RESULTSET_COMMENT; -import static com.dtstack.flinkx.metadatahive2.constants.Hive2MetaDataCons.KEY_RESULTSET_DATA_TYPE; -import static com.dtstack.flinkx.metadatahive2.constants.Hive2MetaDataCons.KEY_STORED_TYPE; -import static com.dtstack.flinkx.metadatahive2.constants.Hive2MetaDataCons.KEY_TABLE_PROPERTIES; -import 
static com.dtstack.flinkx.metadatahive2.constants.Hive2MetaDataCons.KEY_TOTALSIZE; -import static com.dtstack.flinkx.metadatahive2.constants.Hive2MetaDataCons.KEY_TRANSIENT_LASTDDLTIME; -import static com.dtstack.flinkx.metadatahive2.constants.Hive2MetaDataCons.KEY_VALUE; -import static com.dtstack.flinkx.metadatahive2.constants.Hive2MetaDataCons.ORC_FORMAT; -import static com.dtstack.flinkx.metadatahive2.constants.Hive2MetaDataCons.PARQUET_FORMAT; -import static com.dtstack.flinkx.metadatahive2.constants.Hive2MetaDataCons.PARTITION_INFORMATION; -import static com.dtstack.flinkx.metadatahive2.constants.Hive2MetaDataCons.SQL_QUERY_DATA; -import static com.dtstack.flinkx.metadatahive2.constants.Hive2MetaDataCons.SQL_SHOW_PARTITIONS; -import static com.dtstack.flinkx.metadatahive2.constants.Hive2MetaDataCons.SQL_SHOW_TABLES; -import static com.dtstack.flinkx.metadatahive2.constants.Hive2MetaDataCons.SQL_SWITCH_DATABASE; -import static com.dtstack.flinkx.metadatahive2.constants.Hive2MetaDataCons.TABLE_INFORMATION; -import static com.dtstack.flinkx.metadatahive2.constants.Hive2MetaDataCons.TEXT_FORMAT; -import static com.dtstack.flinkx.metadatahive2.constants.Hive2MetaDataCons.TYPE_ORC; -import static com.dtstack.flinkx.metadatahive2.constants.Hive2MetaDataCons.TYPE_PARQUET; -import static com.dtstack.flinkx.metadatahive2.constants.Hive2MetaDataCons.TYPE_TEXT; - -/** - * @author : tiezhu - * @date : 2020/3/9 - */ -public class Metadatahive2InputFormat extends BaseMetadataInputFormat { - - private static final long serialVersionUID = 1L; - - protected Map hadoopConfig; - - String paraFirst = KEY_COL_NAME; - String paraSecond = KEY_COLUMN_DATA_TYPE; - - @Override - protected void switchDatabase(String databaseName) throws SQLException { - statement.get().execute(String.format(SQL_SWITCH_DATABASE, quote(databaseName))); - } - - /** - * Unicode 编码转字符串 - * - * @param string 支持 Unicode 编码和普通字符混合的字符串 - * @return 解码后的字符串 - */ - public static String unicodeToStr(String string) { - String prefix = "\\u"; - if (string == null || !string.contains(prefix)) { - // 传入字符串为空或不包含 Unicode 编码返回原内容 - return string; - } - - StringBuilder value = new StringBuilder(string.length() >> 2); - String[] strings = string.split("\\\\u"); - String hex, mix; - char hexChar; - int ascii, n; - - if (strings[0].length() > 0) { - // 处理开头的普通字符串 - value.append(strings[0]); - } - - try { - for (int i = 1; i < strings.length; i++) { - hex = strings[i]; - if (hex.length() > 3) { - mix = ""; - if (hex.length() > 4) { - // 处理 Unicode 编码符号后面的普通字符串 - mix = hex.substring(4); - } - hex = hex.substring(0, 4); - - try { - Integer.parseInt(hex, 16); - } catch (Exception e) { - // 不能将当前 16 进制字符串正常转换为 10 进制数字,拼接原内容后跳出 - value.append(prefix).append(strings[i]); - continue; - } - - ascii = 0; - for (int j = 0; j < hex.length(); j++) { - hexChar = hex.charAt(j); - // 将 Unicode 编码中的 16 进制数字逐个转为 10 进制 - n = Integer.parseInt(String.valueOf(hexChar), 16); - // 转换为 ASCII 码 - ascii += n * ((int) Math.pow(16, (hex.length() - j - 1))); - } - - // 拼接解码内容 - value.append((char) ascii).append(mix); - } else { - // 不转换特殊长度的 Unicode 编码 - value.append(prefix).append(hex); - } - } - } catch (Exception e) { - // Unicode 编码格式有误,解码失败 - return null; - } - - return value.toString(); - } - - @Override - protected String quote(String name) { - return String.format("`%s`", name); - } - - @Override - protected List showTables() throws SQLException { - List tables = new ArrayList<>(); - try (ResultSet rs = statement.get().executeQuery(SQL_SHOW_TABLES)) { - int pos = 
rs.getMetaData().getColumnCount()==1?1:2; - while (rs.next()) { - tables.add(rs.getString(pos)); - } - } - - return tables; - } - - @Override - protected Map queryMetaData(String tableName) throws SQLException { - Map result = new HashMap<>(16); - List> columnList = new ArrayList<>(); - List> partitionColumnList = new ArrayList<>(); - Map tableProperties = new HashMap<>(16); - - List> metaData = queryData(tableName); - Iterator> it = metaData.iterator(); - int metaDataFlag = 0; - while(it.hasNext()){ - Map lineDataInternal = it.next(); - String colNameInternal = lineDataInternal.get(KEY_COL_NAME); - if (StringUtils.isBlank(colNameInternal)) { - continue; - } - if(colNameInternal.startsWith("#")){ - colNameInternal = StringUtils.trim(colNameInternal); - switch (colNameInternal){ - case PARTITION_INFORMATION: - metaDataFlag = 1; - break; - case TABLE_INFORMATION: - metaDataFlag = 2; - break; - case KEY_RESULTSET_COL_NAME: - paraFirst = KEY_RESULTSET_DATA_TYPE; - paraSecond = KEY_RESULTSET_COMMENT; - break; - default: - break; - } - continue; - } - switch (metaDataFlag){ - case 0: - columnList.add(parseColumn(lineDataInternal, columnList.size()+1)); - break; - case 1: - partitionColumnList.add(parseColumn(lineDataInternal, partitionColumnList.size()+1)); - break; - case 2: - parseTableProperties(lineDataInternal, tableProperties, it); - break; - default: - break; - } - } - - if (partitionColumnList.size() > 0) { - List partitionColumnNames = new ArrayList<>(); - for (Map partitionColumn : partitionColumnList) { - partitionColumnNames.add(partitionColumn.get(KEY_COLUMN_NAME).toString()); - } - - columnList.removeIf(column -> partitionColumnNames.contains(column.get(KEY_COLUMN_NAME).toString())); - result.put(KEY_PARTITIONS, showPartitions(tableName)); - } - result.put(KEY_TABLE_PROPERTIES, tableProperties); - result.put(KEY_PARTITION_COLUMNS, partitionColumnList); - result.put(KEY_COLUMN, columnList); - - return result; - } - - private Map parseColumn(Map lineDataInternal, int index){ - String dataTypeInternal = lineDataInternal.get(KEY_COLUMN_DATA_TYPE); - String commentInternal = lineDataInternal.get(KEY_INDEX_COMMENT); - String colNameInternal = lineDataInternal.get(KEY_COL_NAME); - - Map lineResult = new HashMap<>(16); - lineResult.put(KEY_COLUMN_NAME, colNameInternal); - lineResult.put(KEY_COLUMN_TYPE, dataTypeInternal); - lineResult.put(KEY_COLUMN_COMMENT, unicodeToStr(commentInternal)); - lineResult.put(KEY_COLUMN_INDEX, index); - return lineResult; - } - - private String getStoredType(String storedClass) { - if (storedClass.endsWith(TEXT_FORMAT)){ - return TYPE_TEXT; - } else if (storedClass.endsWith(ORC_FORMAT)){ - return TYPE_ORC; - } else if (storedClass.endsWith(PARQUET_FORMAT)){ - return TYPE_PARQUET; - } else { - return storedClass; - } - } - - - private List> queryData(String table) throws SQLException{ - try (ResultSet rs = statement.get().executeQuery(String.format(SQL_QUERY_DATA, quote(table)))) { - ResultSetMetaData metaData = rs.getMetaData(); - int columnCount = metaData.getColumnCount(); - List columnNames = new ArrayList<>(columnCount); - for (int i = 0; i < columnCount; i++) { - columnNames.add(metaData.getColumnName(i+1)); - } - - List> data = new ArrayList<>(); - while (rs.next()) { - Map lineData = new HashMap<>(Math.max((int) (columnCount/.75f) + 1, 16)); - for (String columnName : columnNames) { - lineData.put(columnName, rs.getString(columnName)); - } - - data.add(lineData); - } - - return data; - } - } - - private List> showPartitions (String table) throws 
SQLException{ - List> partitions = new ArrayList<>(); - try (ResultSet rs = statement.get().executeQuery(String.format(SQL_SHOW_PARTITIONS, quote(table)))) { - while (rs.next()) { - String str = rs.getString(1); - String[] split = str.split(ConstantValue.EQUAL_SYMBOL); - if(split.length == 2){ - Map map = new LinkedHashMap<>(); - map.put(KEY_NAME, split[0]); - map.put(KEY_VALUE, split[1]); - partitions.add(map); - } - } - } - - return partitions; - } - - void parseTableProperties(Map lineDataInternal, Map tableProperties, Iterator> it){ - String name = lineDataInternal.get(KEY_COL_NAME); - - if (name.contains(KEY_COL_LOCATION)) { - tableProperties.put(KEY_LOCATION, StringUtils.trim(lineDataInternal.get(KEY_COLUMN_DATA_TYPE))); - } - - if (name.contains(KEY_COL_CREATETIME) || name.contains(KEY_COL_CREATE_TIME)) { - tableProperties.put(KEY_CREATETIME, StringUtils.trim(lineDataInternal.get(KEY_COLUMN_DATA_TYPE))); - } - - if (name.contains(KEY_COL_LASTACCESSTIME) || name.contains(KEY_COL_LAST_ACCESS_TIME)) { - tableProperties.put(KEY_LASTACCESSTIME, StringUtils.trim(lineDataInternal.get(KEY_COLUMN_DATA_TYPE))); - } - - if (name.contains(KEY_COL_OUTPUTFORMAT)) { - String storedClass = lineDataInternal.get(KEY_COLUMN_DATA_TYPE); - tableProperties.put(KEY_STORED_TYPE, getStoredType(storedClass)); - } - - if (name.contains(KEY_COL_TABLE_PARAMETERS)) { - while (it.hasNext()) { - lineDataInternal = it.next(); - String nameInternal = lineDataInternal.get(paraFirst); - if (null == nameInternal) { - continue; - } - - nameInternal = nameInternal.trim(); - if (nameInternal.contains(KEY_INDEX_COMMENT)) { - tableProperties.put(KEY_TABLE_COMMENT, StringUtils.trim(unicodeToStr(lineDataInternal.get(paraSecond)))); - } - - if (nameInternal.contains(KEY_TOTALSIZE)) { - tableProperties.put(KEY_TOTALSIZE, StringUtils.trim(lineDataInternal.get(paraSecond))); - } - - if (nameInternal.contains(KEY_TRANSIENT_LASTDDLTIME)) { - tableProperties.put(KEY_TRANSIENT_LASTDDLTIME, StringUtils.trim(lineDataInternal.get(paraSecond))); - } - } - } - } - - @Override - public Connection getConnection() { - HiveDbUtil.ConnectionInfo connectionInfo = new HiveDbUtil.ConnectionInfo(); - connectionInfo.setJdbcUrl(dbUrl); - connectionInfo.setUsername(username); - connectionInfo.setPassword(password); - connectionInfo.setHiveConf(hadoopConfig); - connectionInfo.setDriver(driverName); - return HiveDbUtil.getConnection(connectionInfo); - } -} diff --git a/flinkx-metadata-hive2/flinkx-metadata-hive2-reader/src/main/java/com/dtstack/flinkx/metadatahive2/inputformat/Metadatehive2InputFormatBuilder.java b/flinkx-metadata-hive2/flinkx-metadata-hive2-reader/src/main/java/com/dtstack/flinkx/metadatahive2/inputformat/Metadatehive2InputFormatBuilder.java deleted file mode 100644 index 6a57e171d1..0000000000 --- a/flinkx-metadata-hive2/flinkx-metadata-hive2-reader/src/main/java/com/dtstack/flinkx/metadatahive2/inputformat/Metadatehive2InputFormatBuilder.java +++ /dev/null @@ -1,26 +0,0 @@ -package com.dtstack.flinkx.metadatahive2.inputformat; - -import com.dtstack.flinkx.metadata.inputformat.MetadataInputFormatBuilder; - -import java.util.Map; - -/** - * Date: 2020/05/26 - * Company: www.dtstack.com - * - * @author tudou - */ -public class Metadatehive2InputFormatBuilder extends MetadataInputFormatBuilder { - private Metadatahive2InputFormat format; - - - public Metadatehive2InputFormatBuilder(Metadatahive2InputFormat format) { - super(format); - this.format = format; - } - - public void setHadoopConfig(Map hadoopConfig) { - format.hadoopConfig = 
hadoopConfig; - } - -} diff --git a/flinkx-metadata-hive2/flinkx-metadata-hive2-reader/src/main/java/com/dtstack/flinkx/metadatahive2/reader/Metadatahive2Reader.java b/flinkx-metadata-hive2/flinkx-metadata-hive2-reader/src/main/java/com/dtstack/flinkx/metadatahive2/reader/Metadatahive2Reader.java deleted file mode 100644 index e70c3cb562..0000000000 --- a/flinkx-metadata-hive2/flinkx-metadata-hive2-reader/src/main/java/com/dtstack/flinkx/metadatahive2/reader/Metadatahive2Reader.java +++ /dev/null @@ -1,52 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - *
- * http://www.apache.org/licenses/LICENSE-2.0 - *
- * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ -package com.dtstack.flinkx.metadatahive2.reader; - -import com.dtstack.flinkx.config.DataTransferConfig; -import com.dtstack.flinkx.config.ReaderConfig; -import com.dtstack.flinkx.metadata.inputformat.MetadataInputFormatBuilder; -import com.dtstack.flinkx.metadata.reader.MetadataReader; -import com.dtstack.flinkx.metadatahive2.inputformat.Metadatahive2InputFormat; -import com.dtstack.flinkx.metadatahive2.inputformat.Metadatehive2InputFormatBuilder; -import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; - -import java.util.Map; - -import static com.dtstack.flinkx.metadatahive2.constants.Hive2MetaDataCons.DRIVER_NAME; -import static com.dtstack.flinkx.metadatahive2.constants.Hive2MetaDataCons.KEY_HADOOP_CONFIG; - -/** - * @author : tiezhu - * @date : 2020/3/9 - */ -public class Metadatahive2Reader extends MetadataReader { - - public Metadatahive2Reader(DataTransferConfig config, StreamExecutionEnvironment env) { - super(config, env); - ReaderConfig readerConfig = config.getJob().getContent().get(0).getReader(); - hadoopConfig = (Map) readerConfig.getParameter().getVal(KEY_HADOOP_CONFIG); - driverName = DRIVER_NAME; - } - - @Override - protected MetadataInputFormatBuilder getBuilder(){ - Metadatehive2InputFormatBuilder builder = new Metadatehive2InputFormatBuilder(new Metadatahive2InputFormat()); - builder.setHadoopConfig(hadoopConfig); - return builder; - } -} diff --git a/flinkx-metadata-hive2/pom.xml b/flinkx-metadata-hive2/pom.xml deleted file mode 100644 index c17cb7b38d..0000000000 --- a/flinkx-metadata-hive2/pom.xml +++ /dev/null @@ -1,25 +0,0 @@ - - - - flinkx-all - com.dtstack.flinkx - 1.6 - - 4.0.0 - - flinkx-metadata-hive2 - pom - - flinkx-metadata-hive2-reader - - - - com.dtstack.flinkx - flinkx-core - 1.6 - provided - - - \ No newline at end of file diff --git a/flinkx-metadata-mysql/flinkx-metadata-mysql-reader/pom.xml b/flinkx-metadata-mysql/flinkx-metadata-mysql-reader/pom.xml deleted file mode 100644 index 2d88dc095c..0000000000 --- a/flinkx-metadata-mysql/flinkx-metadata-mysql-reader/pom.xml +++ /dev/null @@ -1,105 +0,0 @@ - - - - flinkx-metadata-mysql - com.dtstack.flinkx - 1.6 - - 4.0.0 - - flinkx-metadata-mysql-reader - - - com.dtstack.flinkx - flinkx-metadata-tidb-reader - 1.6 - - - mysql - mysql-connector-java - 5.1.46 - - - - - - - org.apache.maven.plugins - maven-shade-plugin - 3.1.0 - - - package - - shade - - - false - - - org.slf4j:slf4j-api - log4j:log4j - ch.qos.logback:* - - - - - *:* - - META-INF/*.SF - META-INF/*.DSA - META-INF/*.RSA - - - - - - io.netty - shade.metadatamysqlreader.io.netty - - - com.google.common - shade.core.com.google.common - - - com.google.thirdparty - shade.core.com.google.thirdparty - - - - - - - - - maven-antrun-plugin - 1.2 - - - copy-resources - - package - - run - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/flinkx-metadata-mysql/flinkx-metadata-mysql-reader/src/main/java/com/dtstack/flinkx/metadatamysql/constants/MysqlMetadataCons.java b/flinkx-metadata-mysql/flinkx-metadata-mysql-reader/src/main/java/com/dtstack/flinkx/metadatamysql/constants/MysqlMetadataCons.java deleted file mode 100644 index 3ab3282541..0000000000 --- 
a/flinkx-metadata-mysql/flinkx-metadata-mysql-reader/src/main/java/com/dtstack/flinkx/metadatamysql/constants/MysqlMetadataCons.java +++ /dev/null @@ -1,49 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - *
- * http://www.apache.org/licenses/LICENSE-2.0 - *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package com.dtstack.flinkx.metadatamysql.constants;
-
-import com.dtstack.flinkx.metadatatidb.constants.TidbMetadataCons;
-
-/**
- * Because the query statements differ, MysqlMetadataCons overrides some constant definitions
- * @author : kunni@dtstack.com
- * @date : 2020/6/8
- */
-
-public class MysqlMetadataCons extends TidbMetadataCons {
-
-    public static final String KEY_ENGINE = "engine";
-    public static final String KEY_ROW_FORMAT = "rowFormat";
-    public static final String KEY_TABLE_TYPE = "tableType";
-
-    public static final String RESULT_TABLE_TYPE = "TABLE_TYPE";
-    public static final String RESULT_ENGINE = "ENGINE";
-    public static final String RESULT_ROW_FORMAT = "ROW_FORMAT";
-    public static final String RESULT_ROWS = "TABLE_ROWS";
-    public static final String RESULT_DATA_LENGTH = "DATA_LENGTH";
-    public static final String RESULT_CREATE_TIME = "CREATE_TIME";
-    public static final String RESULT_TABLE_COMMENT = "TABLE_COMMENT";
-    public static final String RESULT_KEY_NAME = "Key_name";
-    public static final String RESULT_COLUMN_NAME = "Column_name";
-    public static final String RESULT_INDEX_TYPE = "Index_type";
-    public static final String RESULT_INDEX_COMMENT = "Index_comment";
-
-    public static final String SQL_QUERY_TABLE_INFO = "SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = '%s' AND TABLE_NAME = '%s'";
-    public static final String SQL_QUERY_INDEX = "SHOW INDEX FROM `%s`";
-}
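The two SQL templates above are what the MySQL metadata reader runs per table: one INFORMATION_SCHEMA.TABLES lookup for table properties (ENGINE, TABLE_ROWS, DATA_LENGTH, and so on) and one SHOW INDEX scan for index details. As a minimal sketch of how such templates are consumed over plain JDBC (a sketch only, not FlinkX code: the URL, credentials, schema and table names are placeholders, and a MySQL driver such as mysql-connector-java must be on the classpath at run time):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class MysqlTableInfoDemo {

    // Same template as SQL_QUERY_TABLE_INFO above
    private static final String SQL_QUERY_TABLE_INFO =
            "SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = '%s' AND TABLE_NAME = '%s'";

    public static void main(String[] args) throws Exception {
        // Placeholder connection settings
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/test", "user", "password");
             Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(String.format(SQL_QUERY_TABLE_INFO, "test", "demo"))) {
            while (rs.next()) {
                // The columns the reader below copies into its table-properties map
                System.out.println(rs.getString("ENGINE") + " rows=" + rs.getString("TABLE_ROWS")
                        + " bytes=" + rs.getString("DATA_LENGTH"));
            }
        }
    }
}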
diff --git a/flinkx-metadata-mysql/flinkx-metadata-mysql-reader/src/main/java/com/dtstack/flinkx/metadatamysql/inputformat/MetadatamysqlInputFormat.java b/flinkx-metadata-mysql/flinkx-metadata-mysql-reader/src/main/java/com/dtstack/flinkx/metadatamysql/inputformat/MetadatamysqlInputFormat.java
deleted file mode 100644
index d9c4d83225..0000000000
--- a/flinkx-metadata-mysql/flinkx-metadata-mysql-reader/src/main/java/com/dtstack/flinkx/metadatamysql/inputformat/MetadatamysqlInputFormat.java
+++ /dev/null
@@ -1,97 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0 - *
- * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -package com.dtstack.flinkx.metadatamysql.inputformat; - -import com.dtstack.flinkx.metadatatidb.inputformat.MetadatatidbInputFormat; -import org.apache.commons.lang.exception.ExceptionUtils; - -import java.sql.ResultSet; -import java.sql.SQLException; -import java.sql.Statement; -import java.util.HashMap; -import java.util.LinkedList; -import java.util.List; -import java.util.Map; - -import static com.dtstack.flinkx.metadatamysql.constants.MysqlMetadataCons.*; - -/** - * @author : kunni@dtstack.com - * @date : 2020/6/8 - */ - -public class MetadatamysqlInputFormat extends MetadatatidbInputFormat { - - private static final long serialVersionUID = 1L; - - @Override - protected Map queryMetaData(String tableName) throws SQLException { - Map result = new HashMap<>(16); - Map tableProp = queryTableProp(tableName); - List> column = queryColumn(tableName); - List> index = queryIndex(tableName); - result.put(KEY_TABLE_PROPERTIES, tableProp); - result.put(KEY_COLUMN, column); - result.put(KEY_COLUMN_INDEX, index); - return result; - } - - /** - * @description add engine、table_type、row_format - */ - @Override - public Map queryTableProp(String tableName) throws SQLException { - Map tableProp = new HashMap<>(16); - String sql = String.format(SQL_QUERY_TABLE_INFO, quote(currentDb.get()), quote(tableName)); - try(Statement st = connection.get().createStatement(); - ResultSet rs = st.executeQuery(sql)) { - while (rs.next()) { - tableProp.put(KEY_TABLE_TYPE, RESULT_TABLE_TYPE); - tableProp.put(KEY_ENGINE, rs.getString(RESULT_ENGINE)); - tableProp.put(KEY_ROW_FORMAT, rs.getString(RESULT_ROW_FORMAT)); - tableProp.put(KEY_TABLE_ROWS, rs.getString(RESULT_ROWS)); - tableProp.put(KEY_TABLE_TOTAL_SIZE, rs.getString(RESULT_DATA_LENGTH)); - tableProp.put(KEY_TABLE_CREATE_TIME, rs.getString(RESULT_CREATE_TIME)); - tableProp.put(KEY_TABLE_COMMENT, rs.getString(RESULT_TABLE_COMMENT)); - } - } catch (SQLException e) { - throw new SQLException(ExceptionUtils.getMessage(e)); - } - return tableProp; - } - - protected List> queryIndex(String tableName) throws SQLException { - List> index = new LinkedList<>(); - String sql = String.format(SQL_QUERY_INDEX, tableName); - try(Statement st = connection.get().createStatement(); - ResultSet rs = st.executeQuery(sql)) { - while (rs.next()) { - Map perIndex = new HashMap<>(16); - perIndex.put(KEY_INDEX_NAME, rs.getString(RESULT_KEY_NAME)); - perIndex.put(KEY_COLUMN_NAME, rs.getString(RESULT_COLUMN_NAME)); - perIndex.put(KEY_COLUMN_TYPE, rs.getString(RESULT_INDEX_TYPE)); - perIndex.put(KEY_INDEX_COMMENT, rs.getString(RESULT_INDEX_COMMENT)); - index.add(perIndex); - } - } catch (SQLException e) { - throw new SQLException(ExceptionUtils.getMessage(e)); - } - return index; - } -} diff --git a/flinkx-metadata-oracle/flinkx-metadata-oracle-reader/src/main/java/com/dtstack/flinkx/metadataoracle/constants/OracleMetaDataCons.java b/flinkx-metadata-oracle/flinkx-metadata-oracle-reader/src/main/java/com/dtstack/flinkx/metadataoracle/constants/OracleMetaDataCons.java deleted file mode 100644 index 68496ab907..0000000000 --- a/flinkx-metadata-oracle/flinkx-metadata-oracle-reader/src/main/java/com/dtstack/flinkx/metadataoracle/constants/OracleMetaDataCons.java +++ 
/dev/null @@ -1,127 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - *
- * http://www.apache.org/licenses/LICENSE-2.0 - *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package com.dtstack.flinkx.metadataoracle.constants;
-
-import com.dtstack.flinkx.metadata.MetaDataCons;
-
-/**
- * @author : kunni@dtstack.com
- * @date : 2020/6/9
- * @description : the SQL statements replace "select *" with explicit column lists
- */
-
-public class OracleMetaDataCons extends MetaDataCons {
-
-    public static final String DRIVER_NAME = "oracle.jdbc.driver.OracleDriver";
-
-    public static final String KEY_TABLE_TYPE = "tableType";
-
-    public static final String KEY_NUMBER = "NUMBER";
-
-    public static final String NUMBER_PRECISION = "NUMBER(%s,%s)";
-
-    public static final String KEY_MAX_NUMBER = "127";
-
-    public static final String KEY_PRIMARY_KEY = "primaryKey";
-
-    public static final String KEY_CREATE_TIME = "createTime";
-
-    public static final String KEY_PARTITION_KEY = "partitionKey";
-
-    /**
-     * Use the IN clause to reduce memory usage
-     */
-    public static final String SQL_QUERY_INDEX = "AND COLUMNS.TABLE_NAME IN (%s) ";
-    public static final String SQL_QUERY_COLUMN = "AND COLUMNS.TABLE_NAME IN (%s) ";
-    public static final String SQL_QUERY_PRIMARY_KEY = "AND COLUMNS.TABLE_NAME IN (%s) ";
-    public static final String SQL_QUERY_TABLE_PROPERTIES = "AND TABLES.TABLE_NAME IN (%s) ";
-    public static final String SQL_QUERY_TABLE_CREATE_TIME = "AND TABLES.TABLE_NAME IN (%s) ";
-    public static final String SQL_QUERY_TABLE_PARTITION_KEY = "AND NAME IN (%s) ";
-
-    /**
-     * Query index information
-     */
-    public static final String SQL_QUERY_INDEX_TOTAL =
-            "SELECT COLUMNS.INDEX_NAME, COLUMNS.COLUMN_NAME, INDEXES.INDEX_TYPE, COLUMNS.TABLE_NAME " +
-                    "FROM ALL_IND_COLUMNS COLUMNS " +
-                    "LEFT JOIN ALL_INDEXES INDEXES " +
-                    "ON COLUMNS.INDEX_NAME = INDEXES.INDEX_NAME " +
-                    "AND COLUMNS.TABLE_NAME = INDEXES.TABLE_NAME " +
-                    "AND COLUMNS.TABLE_OWNER = INDEXES.TABLE_OWNER " +
-                    "WHERE COLUMNS.TABLE_OWNER = %s ";
-    /**
-     * Query basic column information:
-     * DATA_LENGTH is the column length, DATA_PRECISION the decimal precision,
-     * DATA_SCALE the number of digits to the right of the decimal point
-     */
-    public static final String SQL_QUERY_COLUMN_TOTAL =
-            "SELECT COLUMNS.COLUMN_NAME, COLUMNS.DATA_TYPE, COMMENTS.COMMENTS, COLUMNS.TABLE_NAME, " +
-                    "DATA_DEFAULT, NULLABLE, DATA_LENGTH, " +
-                    "COLUMNS.COLUMN_ID, COLUMNS.DATA_PRECISION, COLUMNS.DATA_SCALE "+
-                    "FROM ALL_TAB_COLUMNS COLUMNS " +
-                    "LEFT JOIN ALL_COL_COMMENTS COMMENTS " +
-                    "ON COLUMNS.OWNER = COMMENTS.OWNER " +
-                    "AND COLUMNS.TABLE_NAME = COMMENTS.TABLE_NAME " +
-                    "AND COLUMNS.COLUMN_NAME = COMMENTS.COLUMN_NAME " +
-                    "WHERE COLUMNS.OWNER = %s ";
-    /**
-     * Query basic table information
-     */
-    public static final String SQL_QUERY_TABLE_PROPERTIES_TOTAL =
-            "SELECT TABLES.NUM_ROWS * TABLES.AVG_ROW_LEN AS TOTALSIZE, " +
-                    "COMMENTS.COMMENTS, COMMENTS.TABLE_TYPE, TABLES.NUM_ROWS, TABLES.TABLE_NAME " +
-                    "FROM ALL_TABLES TABLES " +
-                    "LEFT JOIN ALL_TAB_COMMENTS COMMENTS " +
-                    "ON TABLES.OWNER = COMMENTS.OWNER " +
-                    "AND TABLES.TABLE_NAME = COMMENTS.TABLE_NAME " +
-                    "WHERE TABLES.OWNER = %s ";
-    /**
-     * Query primary key information
-     */
-    public static final String SQL_QUERY_PRIMARY_KEY_TOTAL =
-            "SELECT COLUMNS.TABLE_NAME, COLUMNS.COLUMN_NAME " +
-                    "FROM ALL_CONS_COLUMNS COLUMNS, ALL_CONSTRAINTS CONS " +
-                    "WHERE COLUMNS.CONSTRAINT_NAME = CONS.CONSTRAINT_NAME " +
-                    "AND CONS.CONSTRAINT_TYPE = 'P' " +
-                    "AND COLUMNS.OWNER = %s ";
-    /**
-     * Query table creation time
-     */
-    public static final String SQL_QUERY_TABLE_CREATE_TIME_TOTAL =
-            "SELECT TABLES.TABLE_NAME, OBJS.CREATED " +
-                    "FROM ALL_TABLES TABLES " +
-                    "LEFT JOIN ALL_OBJECTS OBJS " +
-                    "ON TABLES.OWNER = OBJS.OWNER " +
-                    "AND TABLES.TABLE_NAME = OBJS.OBJECT_NAME " +
-                    "WHERE TABLES.OWNER = %s ";
-
-    /**
-     * Query partition key columns
-     */
-    public static final String SQL_PARTITION_KEY =
-            "SELECT NAME, COLUMN_NAME FROM ALL_PART_KEY_COLUMNS " +
-                    "WHERE OBJECT_TYPE = 'TABLE' AND OWNER=%s ";
-
-    /**
-     * Query all tables under a given schema,
-     * excluding nested tables and other special tables
-     */
-    public static final String SQL_SHOW_TABLES =
-            "SELECT TABLE_NAME FROM ALL_TABLES " +
-                    "WHERE OWNER = %s AND NESTED = 'NO' AND IOT_NAME IS NULL ";
-}
\ No newline at end of file
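Each *_TOTAL statement above scans the owner's dictionary views once, and the reader appends the matching "AND ... IN (%s)" fragment with a comma-separated list of quoted table names (built in its init() method), so a single round trip serves a whole batch of tables instead of one query per table. A self-contained sketch of how the two pieces combine (a sketch only, not FlinkX code; the owner and table names are placeholders):

import java.util.List;
import java.util.stream.Collectors;

public class OracleBatchSqlDemo {

    // Mirrors quote() in the input format below: Oracle owner and table
    // names are passed as quoted string literals
    static String quote(String name) {
        return String.format("'%s'", name);
    }

    public static void main(String[] args) {
        // Base statement and IN fragment, shortened from the constants above
        String base = "SELECT TABLES.TABLE_NAME, OBJS.CREATED FROM ALL_TABLES TABLES "
                + "LEFT JOIN ALL_OBJECTS OBJS ON TABLES.OWNER = OBJS.OWNER "
                + "AND TABLES.TABLE_NAME = OBJS.OBJECT_NAME WHERE TABLES.OWNER = %s ";
        String inFragment = "AND TABLES.TABLE_NAME IN (%s) ";

        // Placeholder batch; the reader slices its table list into such batches
        List<String> batch = List.of("EMP", "DEPT", "SALGRADE");
        String allTable = batch.stream()
                .map(OracleBatchSqlDemo::quote)
                .collect(Collectors.joining(","));

        // One statement fetches the creation time for every table in the batch
        String sql = String.format(base, quote("SCOTT")) + String.format(inFragment, allTable);
        System.out.println(sql);
    }
}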
"SELECT TABLES.TABLE_NAME, OBJS.CREATED " + - "FROM ALL_TABLES TABLES " + - "LEFT JOIN ALL_OBJECTS OBJS " + - "ON TABLES.OWNER = OBJS.OWNER " + - "AND TABLES.TABLE_NAME = OBJS.OBJECT_NAME " + - "WHERE TABLES.OWNER = %s "; - - /** - * 查询分区列信息 - */ - public static final String SQL_PARTITION_KEY = - "SELECT NAME, COLUMN_NAME FROM ALL_PART_KEY_COLUMNS " + - "WHERE OBJECT_TYPE = 'TABLE' AND OWNER=%s "; - - /** - * 查询特定schema下的所有表 - * 排除嵌套表等特殊表 - */ - public static final String SQL_SHOW_TABLES = - "SELECT TABLE_NAME FROM ALL_TABLES " + - "WHERE OWNER = %s AND NESTED = 'NO' AND IOT_NAME IS NULL "; -} \ No newline at end of file diff --git a/flinkx-metadata-oracle/flinkx-metadata-oracle-reader/src/main/java/com/dtstack/flinkx/metadataoracle/inputformat/MetadataoracleInputFormat.java b/flinkx-metadata-oracle/flinkx-metadata-oracle-reader/src/main/java/com/dtstack/flinkx/metadataoracle/inputformat/MetadataoracleInputFormat.java deleted file mode 100644 index 5131359be9..0000000000 --- a/flinkx-metadata-oracle/flinkx-metadata-oracle-reader/src/main/java/com/dtstack/flinkx/metadataoracle/inputformat/MetadataoracleInputFormat.java +++ /dev/null @@ -1,332 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - *
- * http://www.apache.org/licenses/LICENSE-2.0 - *
- * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -package com.dtstack.flinkx.metadataoracle.inputformat; - -import com.dtstack.flinkx.constants.ConstantValue; -import com.dtstack.flinkx.metadata.inputformat.BaseMetadataInputFormat; -import org.apache.commons.lang3.StringUtils; - -import java.sql.ResultSet; -import java.sql.SQLException; -import java.util.ArrayList; -import java.util.HashMap; -import java.util.LinkedList; -import java.util.List; -import java.util.Map; - -import static com.dtstack.flinkx.metadata.MetaDataCons.KEY_COLUMN_DEFAULT; -import static com.dtstack.flinkx.metadata.MetaDataCons.KEY_COLUMN_NULL; -import static com.dtstack.flinkx.metadata.MetaDataCons.KEY_COLUMN_PRIMARY; -import static com.dtstack.flinkx.metadata.MetaDataCons.KEY_COLUMN_SCALE; -import static com.dtstack.flinkx.metadata.MetaDataCons.KEY_FALSE; -import static com.dtstack.flinkx.metadata.MetaDataCons.KEY_INDEX_NAME; -import static com.dtstack.flinkx.metadata.MetaDataCons.KEY_PARTITION_COLUMNS; -import static com.dtstack.flinkx.metadata.MetaDataCons.KEY_TABLE_COMMENT; -import static com.dtstack.flinkx.metadata.MetaDataCons.KEY_TABLE_CREATE_TIME; -import static com.dtstack.flinkx.metadata.MetaDataCons.KEY_TABLE_ROWS; -import static com.dtstack.flinkx.metadata.MetaDataCons.KEY_TABLE_TOTAL_SIZE; -import static com.dtstack.flinkx.metadata.MetaDataCons.KEY_TRUE; -import static com.dtstack.flinkx.metadata.MetaDataCons.MAX_TABLE_SIZE; -import static com.dtstack.flinkx.metadataoracle.constants.OracleMetaDataCons.KEY_COLUMN; -import static com.dtstack.flinkx.metadataoracle.constants.OracleMetaDataCons.KEY_COLUMN_COMMENT; -import static com.dtstack.flinkx.metadataoracle.constants.OracleMetaDataCons.KEY_COLUMN_INDEX; -import static com.dtstack.flinkx.metadataoracle.constants.OracleMetaDataCons.KEY_COLUMN_NAME; -import static com.dtstack.flinkx.metadataoracle.constants.OracleMetaDataCons.KEY_COLUMN_TYPE; -import static com.dtstack.flinkx.metadataoracle.constants.OracleMetaDataCons.KEY_CREATE_TIME; -import static com.dtstack.flinkx.metadataoracle.constants.OracleMetaDataCons.KEY_MAX_NUMBER; -import static com.dtstack.flinkx.metadataoracle.constants.OracleMetaDataCons.KEY_NUMBER; -import static com.dtstack.flinkx.metadataoracle.constants.OracleMetaDataCons.KEY_PARTITION_KEY; -import static com.dtstack.flinkx.metadataoracle.constants.OracleMetaDataCons.KEY_PRIMARY_KEY; -import static com.dtstack.flinkx.metadataoracle.constants.OracleMetaDataCons.KEY_TABLE_PROPERTIES; -import static com.dtstack.flinkx.metadataoracle.constants.OracleMetaDataCons.KEY_TABLE_TYPE; -import static com.dtstack.flinkx.metadataoracle.constants.OracleMetaDataCons.NUMBER_PRECISION; -import static com.dtstack.flinkx.metadataoracle.constants.OracleMetaDataCons.SQL_PARTITION_KEY; -import static com.dtstack.flinkx.metadataoracle.constants.OracleMetaDataCons.SQL_QUERY_COLUMN; -import static com.dtstack.flinkx.metadataoracle.constants.OracleMetaDataCons.SQL_QUERY_COLUMN_TOTAL; -import static com.dtstack.flinkx.metadataoracle.constants.OracleMetaDataCons.SQL_QUERY_INDEX; -import static com.dtstack.flinkx.metadataoracle.constants.OracleMetaDataCons.SQL_QUERY_INDEX_TOTAL; -import static com.dtstack.flinkx.metadataoracle.constants.OracleMetaDataCons.SQL_QUERY_PRIMARY_KEY; -import 
static com.dtstack.flinkx.metadataoracle.constants.OracleMetaDataCons.SQL_QUERY_PRIMARY_KEY_TOTAL; -import static com.dtstack.flinkx.metadataoracle.constants.OracleMetaDataCons.SQL_QUERY_TABLE_CREATE_TIME; -import static com.dtstack.flinkx.metadataoracle.constants.OracleMetaDataCons.SQL_QUERY_TABLE_CREATE_TIME_TOTAL; -import static com.dtstack.flinkx.metadataoracle.constants.OracleMetaDataCons.SQL_QUERY_TABLE_PARTITION_KEY; -import static com.dtstack.flinkx.metadataoracle.constants.OracleMetaDataCons.SQL_QUERY_TABLE_PROPERTIES; -import static com.dtstack.flinkx.metadataoracle.constants.OracleMetaDataCons.SQL_QUERY_TABLE_PROPERTIES_TOTAL; -import static com.dtstack.flinkx.metadataoracle.constants.OracleMetaDataCons.SQL_SHOW_TABLES; - -/** - * @author : kunni@dtstack.com - * @date : 2020/6/9 - * @description :Oracle元数据可在系统表中查询 - */ - -public class MetadataoracleInputFormat extends BaseMetadataInputFormat { - - private static final long serialVersionUID = 1L; - /** - * 表基本属性 - */ - private Map> tablePropertiesMap; - - /** - * 列基本属性 - */ - private Map>> columnListMap; - - /** - * 索引基本属性 - */ - private Map>> indexListMap; - - /** - * 主键信息 - */ - private Map primaryKeyMap; - - /** - * 表创建时间 - */ - private Map createdTimeMap; - - /** - * 分区列 - */ - private Map partitionMap; - - private String allTable; - - private String sql; - - @Override - protected List showTables() throws SQLException { - List tableNameList = new LinkedList<>(); - sql = String.format(SQL_SHOW_TABLES, quote(currentDb.get())); - try (ResultSet rs = statement.get().executeQuery(sql)) { - while (rs.next()) { - tableNameList.add(rs.getString(1)); - } - } - return tableNameList; - } - - @Override - protected void switchDatabase(String databaseName) { - currentDb.set(databaseName); - } - - @Override - protected String quote(String name) { - return String.format("'%s'",name); - } - - /** - * 从预先查询好的map中取出信息 - * @param tableName 表名 - * @return 表的元数据信息 - * @throws SQLException 异常 - */ - @Override - protected Map queryMetaData(String tableName) throws SQLException { - Map result = new HashMap<>(16); - // 如果当前map中没有,说明要重新取值 - if(!tablePropertiesMap.containsKey(tableName)){ - init(); - } - Map tableProperties = tablePropertiesMap.get(tableName); - tableProperties.put(KEY_TABLE_CREATE_TIME, createdTimeMap.get(tableName)); - List> columnList = columnListMap.get(tableName); - List> indexList = indexListMap.get(tableName); - String primaryColumn = primaryKeyMap.get(tableName); - String partitionKey = partitionMap.get(tableName); - List> partitionColumnList = new ArrayList<>(); - for(Map map : columnList){ - if(StringUtils.equals(map.get(KEY_COLUMN_NAME), primaryColumn)){ - map.put(KEY_COLUMN_PRIMARY, KEY_TRUE); - }else{ - map.put(KEY_COLUMN_PRIMARY, KEY_FALSE); - } - if(StringUtils.equals(map.get(KEY_COLUMN_NAME), partitionKey)){ - partitionColumnList.add(map); - } - } - result.put(KEY_PARTITION_COLUMNS, partitionColumnList); - result.put(KEY_TABLE_PROPERTIES, tableProperties); - result.put(KEY_COLUMN, columnList); - result.put(KEY_COLUMN_INDEX, indexList); - return result; - } - - protected Map > queryTableProperties() throws SQLException { - Map> tablePropertiesMap = new HashMap<>(16); - sql = String.format(SQL_QUERY_TABLE_PROPERTIES_TOTAL, quote(currentDb.get())); - if(StringUtils.isNotBlank(allTable)){ - sql += String.format(SQL_QUERY_TABLE_PROPERTIES, allTable); - } - LOG.info("querySQL: {}", sql); - try (ResultSet rs = statement.get().executeQuery(sql)) { - while (rs.next()) { - Map map = new HashMap<>(16); - 
map.put(KEY_TABLE_TOTAL_SIZE, rs.getString(1)); - map.put(KEY_TABLE_COMMENT, rs.getString(2)); - map.put(KEY_TABLE_TYPE, rs.getString(3)); - map.put(KEY_TABLE_ROWS, rs.getString(4)); - tablePropertiesMap.put(rs.getString(5), map); - } - } - return tablePropertiesMap; - } - - protected Map>> queryIndexList() throws SQLException { - Map>> indexListMap = new HashMap<>(16); - sql = String.format(SQL_QUERY_INDEX_TOTAL, quote(currentDb.get())); - if(StringUtils.isNotBlank(allTable)){ - sql += String.format(SQL_QUERY_INDEX, allTable); - } - LOG.info("querySQL: {}", sql); - try (ResultSet rs = statement.get().executeQuery(sql)) { - while (rs.next()) { - Map column = new HashMap<>(16); - column.put(KEY_INDEX_NAME, rs.getString(1)); - column.put(KEY_COLUMN_NAME, rs.getString(2)); - column.put(KEY_COLUMN_TYPE, rs.getString(3)); - String tableName = rs.getString(4); - if(indexListMap.containsKey(tableName)){ - indexListMap.get(tableName).add(column); - }else { - List> indexList = new LinkedList<>(); - indexList.add(column); - indexListMap.put(tableName, indexList); - } - } - } - return indexListMap; - } - - protected Map>> queryColumnList() throws SQLException { - Map>> columnListMap = new HashMap<>(16); - sql = String.format(SQL_QUERY_COLUMN_TOTAL, quote(currentDb.get())); - if(StringUtils.isNotBlank(allTable)){ - sql += String.format(SQL_QUERY_COLUMN, allTable); - } - LOG.info("querySQL: {}", sql); - try (ResultSet rs = statement.get().executeQuery(sql)) { - while (rs.next()) { - Map column = new HashMap<>(16); - // oracle中,resultSet的LONG、LONG ROW类型要放第一个取出来 - column.put(KEY_COLUMN_DEFAULT, rs.getString(5)); - column.put(KEY_COLUMN_NAME, rs.getString(1)); - String type = rs.getString(2); - String length = rs.getString(7); - if(StringUtils.equals(type, KEY_NUMBER)){ - String precision = rs.getString(9); - String scale = rs.getString(10); - column.put(KEY_COLUMN_TYPE, String.format(NUMBER_PRECISION, precision==null?length:precision, scale==null?KEY_MAX_NUMBER:scale)); - }else { - column.put(KEY_COLUMN_TYPE, type); - } - column.put(KEY_COLUMN_COMMENT, rs.getString(3)); - String tableName = rs.getString(4); - column.put(KEY_COLUMN_NULL, rs.getString(6)); - column.put(KEY_COLUMN_SCALE, length); - String index = rs.getString(8); - if(columnListMap.containsKey(tableName)){ - column.put(KEY_COLUMN_INDEX, index); - columnListMap.get(tableName).add(column); - }else { - List>columnList = new LinkedList<>(); - column.put(KEY_COLUMN_INDEX, index); - columnList.add(column); - columnListMap.put(tableName, columnList); - } - } - } - return columnListMap; - } - - /** - * 表名和某个特定属性映射的map - * @return 映射map - * @throws SQLException sql异常 - */ - protected Map queryTableKeyMap(String type) throws SQLException { - Map primaryKeyMap = new HashMap<>(16); - switch (type){ - case KEY_PRIMARY_KEY:{ - sql = String.format(SQL_QUERY_PRIMARY_KEY_TOTAL, quote(currentDb.get())); - if (StringUtils.isNotBlank(allTable)){ - sql += String.format(SQL_QUERY_PRIMARY_KEY, allTable); - } - break; - } - case KEY_CREATE_TIME:{ - sql = String.format(SQL_QUERY_TABLE_CREATE_TIME_TOTAL, quote(currentDb.get())); - if (StringUtils.isNotBlank(allTable)){ - sql += String.format(SQL_QUERY_TABLE_CREATE_TIME, allTable); - } - break; - } - case KEY_PARTITION_KEY:{ - sql = String.format(SQL_PARTITION_KEY, quote(currentDb.get())); - if (StringUtils.isNotBlank(allTable)){ - sql += String.format(SQL_QUERY_TABLE_PARTITION_KEY, allTable); - } - break; - } - default: break; - } - LOG.info("querySQL: {}", sql); - try (ResultSet rs = 
statement.get().executeQuery(sql)){
-            while (rs.next()){
-                primaryKeyMap.put(rs.getString(1), rs.getString(2));
-            }
-        }
-        return primaryKeyMap;
-    }
-
-    /**
-     * Runs one query for every 20 tables
-     * @throws SQLException exception thrown while executing the SQL
-     */
-    @Override
-    protected void init() throws SQLException {
-        // Exit if there are no tables
-        if(tableList.size()==0){
-            return;
-        }
-        allTable = null;
-        if (start < tableList.size()){
-            // Take a sub-list, taking care not to go out of bounds
-            List<String> splitTableList = tableList.subList(start, Math.min(start+MAX_TABLE_SIZE, tableList.size()));
-            StringBuilder stringBuilder = new StringBuilder(2 * splitTableList.size());
-            for(int index=0;index
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package com.dtstack.flinkx.metadataoracle.reader;
-
-import com.dtstack.flinkx.config.DataTransferConfig;
-import com.dtstack.flinkx.metadata.inputformat.MetadataInputFormatBuilder;
-import com.dtstack.flinkx.metadata.reader.MetadataReader;
-import com.dtstack.flinkx.metadataoracle.inputformat.MetadataoracleInputFormat;
-import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
-
-import static com.dtstack.flinkx.metadataoracle.constants.OracleMetaDataCons.DRIVER_NAME;
-
-/**
- * @author : kunni@dtstack.com
- * @date : 2020/6/9
- * @description : MetadataOracleReader
- */
-
-public class MetadataoracleReader extends MetadataReader {
-    public MetadataoracleReader(DataTransferConfig config, StreamExecutionEnvironment env) {
-        super(config, env);
-        driverName = DRIVER_NAME;
-    }
-
-    @Override
-    protected MetadataInputFormatBuilder getBuilder(){
-        return new MetadataInputFormatBuilder(new MetadataoracleInputFormat());
-    }
-}
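One detail worth calling out from queryColumnList in the input format above: Oracle NUMBER columns are rendered as NUMBER(precision,scale), where a null DATA_PRECISION falls back to the column length (DATA_LENGTH) and a null DATA_SCALE falls back to 127 (KEY_MAX_NUMBER). A self-contained sketch of that mapping (a sketch only, not FlinkX code):

public class OracleNumberTypeDemo {

    // Same format template and scale fallback as OracleMetaDataCons above
    static final String NUMBER_PRECISION = "NUMBER(%s,%s)";
    static final String KEY_MAX_NUMBER = "127";

    // Precision falls back to the column length, scale falls back to 127
    static String mapNumberType(String length, String precision, String scale) {
        return String.format(NUMBER_PRECISION,
                precision == null ? length : precision,
                scale == null ? KEY_MAX_NUMBER : scale);
    }

    public static void main(String[] args) {
        System.out.println(mapNumberType("22", "10", "2"));  // NUMBER(10,2)
        System.out.println(mapNumberType("22", null, null)); // NUMBER(22,127)
    }
}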
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -package com.dtstack.flinkx.metadataphoenix5.inputformat; - -import com.dtstack.flinkx.metadata.inputformat.MetadataInputFormatBuilder; -import org.apache.commons.lang.StringUtils; - -import java.util.Map; - -/** - * @author kunni@dtstack.com - */ -public class MetadataPhoenixBuilder extends MetadataInputFormatBuilder { - - protected Metadataphoenix5InputFormat format; - - public MetadataPhoenixBuilder(Metadataphoenix5InputFormat format) { - super(format); - this.format = format; - } - - @Override - protected void checkFormat() { - super.checkFormat(); - StringBuilder sb = new StringBuilder(256); - if (StringUtils.isEmpty(format.path)) { - sb.append("phoenix zookeeper can not be empty ;\n"); - } - if (sb.length() > 0) { - throw new IllegalArgumentException(sb.toString()); - } - } - - public void setPath(String path){ - format.setPath(path); - } - - public void setHadoopConfig(Map hadoopConfig){ - format.setHadoopConfig(hadoopConfig); - } -} diff --git a/flinkx-metadata-phoenix5/flinkx-metadata-phoenix5-reader/src/main/java/com/dtstack/flinkx/metadataphoenix5/inputformat/Metadataphoenix5InputFormat.java b/flinkx-metadata-phoenix5/flinkx-metadata-phoenix5-reader/src/main/java/com/dtstack/flinkx/metadataphoenix5/inputformat/Metadataphoenix5InputFormat.java deleted file mode 100644 index a746bef5f5..0000000000 --- a/flinkx-metadata-phoenix5/flinkx-metadata-phoenix5-reader/src/main/java/com/dtstack/flinkx/metadataphoenix5/inputformat/Metadataphoenix5InputFormat.java +++ /dev/null @@ -1,346 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -package com.dtstack.flinkx.metadataphoenix5.inputformat; - -import com.dtstack.flinkx.constants.ConstantValue; -import com.dtstack.flinkx.metadata.inputformat.BaseMetadataInputFormat; -import com.dtstack.flinkx.metadata.inputformat.MetadataInputSplit; -import com.dtstack.flinkx.metadataphoenix5.util.IPhoenix5Helper; -import com.dtstack.flinkx.metadataphoenix5.util.Phoenix5Util; -import com.dtstack.flinkx.util.ZkHelper; -import com.dtstack.flinkx.util.ClassUtil; -import com.dtstack.flinkx.util.ExceptionUtil; -import com.dtstack.flinkx.util.GsonUtil; -import com.dtstack.flinkx.util.ReflectionUtils; -import com.google.common.collect.Lists; -import org.apache.commons.collections.CollectionUtils; -import org.apache.commons.collections.MapUtils; -import org.apache.commons.io.FilenameUtils; -import org.apache.commons.lang.StringUtils; -import org.apache.flink.core.io.InputSplit; -import org.apache.flink.runtime.execution.librarycache.FlinkUserCodeClassLoaders; -import org.apache.zookeeper.ZooKeeper; -import org.codehaus.commons.compiler.CompileException; -import sun.misc.URLClassPath; - -import java.io.IOException; -import java.lang.reflect.Field; -import java.net.URL; -import java.net.URLClassLoader; -import java.sql.Connection; -import java.sql.ResultSet; -import java.sql.SQLException; -import java.sql.Statement; -import java.util.HashMap; -import java.util.LinkedList; -import java.util.List; -import java.util.Map; -import java.util.Properties; - -import static com.dtstack.flinkx.metadata.MetaDataCons.KEY_COLUMN; -import static com.dtstack.flinkx.metadata.MetaDataCons.KEY_COLUMN_DATA_TYPE; -import static com.dtstack.flinkx.metadata.MetaDataCons.KEY_COLUMN_INDEX; -import static com.dtstack.flinkx.metadata.MetaDataCons.KEY_COLUMN_NAME; -import static com.dtstack.flinkx.metadata.MetaDataCons.KEY_CONN_PASSWORD; -import static com.dtstack.flinkx.metadata.MetaDataCons.KEY_FALSE; -import static com.dtstack.flinkx.metadata.MetaDataCons.KEY_TABLE_PROPERTIES; -import static com.dtstack.flinkx.metadata.MetaDataCons.KEY_TRUE; -import static com.dtstack.flinkx.metadata.MetaDataCons.KEY_USER; -import static com.dtstack.flinkx.metadata.MetaDataCons.RESULT_SET_COLUMN_NAME; -import static com.dtstack.flinkx.metadata.MetaDataCons.RESULT_SET_ORDINAL_POSITION; -import static com.dtstack.flinkx.metadata.MetaDataCons.RESULT_SET_TABLE_NAME; -import static com.dtstack.flinkx.metadata.MetaDataCons.RESULT_SET_TYPE_NAME; -import static com.dtstack.flinkx.metadataphoenix5.util.PhoenixMetadataCons.HBASE_MASTER_KERBEROS_PRINCIPAL; -import static com.dtstack.flinkx.metadataphoenix5.util.PhoenixMetadataCons.KEY_CREATE_TIME; -import static com.dtstack.flinkx.metadataphoenix5.util.PhoenixMetadataCons.KEY_DEFAULT; -import static com.dtstack.flinkx.metadataphoenix5.util.PhoenixMetadataCons.KEY_NAMESPACE; -import static com.dtstack.flinkx.metadataphoenix5.util.PhoenixMetadataCons.KEY_PRIMARY_KEY; -import static com.dtstack.flinkx.metadataphoenix5.util.PhoenixMetadataCons.KEY_TABLE_NAME; -import static com.dtstack.flinkx.metadataphoenix5.util.PhoenixMetadataCons.SQL_COLUMN; -import static com.dtstack.flinkx.metadataphoenix5.util.PhoenixMetadataCons.SQL_DEFAULT_COLUMN; -import static 
com.dtstack.flinkx.metadataphoenix5.util.PhoenixMetadataCons.SQL_DEFAULT_TABLE_NAME; -import static com.dtstack.flinkx.metadataphoenix5.util.PhoenixMetadataCons.SQL_TABLE_NAME; -import static com.dtstack.flinkx.util.ZkHelper.APPEND_PATH; - -/** - * @author kunni@Dtstack.com - */ - -public class Metadataphoenix5InputFormat extends BaseMetadataInputFormat { - - private Map createTimeMap; - - public static final String JDBC_PHOENIX_PREFIX = "jdbc:phoenix:"; - - public static final String DEFAULT_SCHEMA = "default"; - - protected ZooKeeper zooKeeper; - - protected Map hadoopConfig; - - protected String path; - - protected String zooKeeperPath; - - @Override - protected void closeInternal() throws IOException { - tableIterator.remove(); - Statement st = statement.get(); - if (null != st) { - try { - st.close(); - statement.remove(); - } catch (SQLException e) { - LOG.error("close statement failed, e = {}", ExceptionUtil.getErrorMessage(e)); - throw new IOException("close statement failed", e); - } - } - currentDb.remove(); - } - - @Override - public void closeInputFormat() throws IOException { - if(connection.get() != null){ - try{ - connection.get().close(); - connection.remove(); - }catch (SQLException e){ - LOG.error("failed to close connection, e = {}", ExceptionUtil.getErrorMessage(e)); - } - } - super.closeInputFormat(); - } - - - @Override - protected List showTables() { - String sql; - if (StringUtils.endsWithIgnoreCase(currentDb.get(), KEY_DEFAULT) || - StringUtils.isBlank(currentDb.get())) { - sql = SQL_DEFAULT_TABLE_NAME; - } else { - sql = String.format(SQL_TABLE_NAME, currentDb.get()); - } - List table = new LinkedList<>(); - try (ResultSet resultSet = executeQuery0(sql, statement.get())) { - while (resultSet.next()) { - table.add(resultSet.getString(RESULT_SET_TABLE_NAME)); - } - } catch (SQLException e) { - LOG.error("query table lists failed, {}", ExceptionUtil.getErrorMessage(e)); - } - return table; - } - - @Override - protected void switchDatabase(String databaseName) { - currentDb.set(databaseName); - } - - @Override - protected Map queryMetaData(String tableName) { - Map result = new HashMap<>(16); - Map tableProp = queryTableProp(tableName); - List> column = queryColumn(tableName); - result.put(KEY_TABLE_PROPERTIES, tableProp); - result.put(KEY_COLUMN, column); - return result; - } - - @Override - protected String quote(String name) { - return name; - } - - /** - * 获取表级别的元数据信息 - * - * @param tableName 表名 - * @return 表的元数据 - */ - public Map queryTableProp(String tableName) { - Map tableProp = new HashMap<>(16); - if (StringUtils.endsWithIgnoreCase(currentDb.get(), KEY_DEFAULT) || currentDb.get() == null) { - tableProp.put(KEY_CREATE_TIME, createTimeMap.get(tableName)); - } else { - tableProp.put(KEY_CREATE_TIME, createTimeMap.get(currentDb.get() + ConstantValue.POINT_SYMBOL + tableName)); - } - tableProp.put(KEY_NAMESPACE, currentDb.get()); - tableProp.put(KEY_TABLE_NAME, tableName); - return tableProp; - } - - /** - * 获取列级别的元数据信息 - * - * @param tableName 表名 - * @return 列的元数据信息 - */ - public List> queryColumn(String tableName) { - List> column = new LinkedList<>(); - String sql; - if (isDefaultSchema()) { - sql = String.format(SQL_DEFAULT_COLUMN, tableName); - } else { - sql = String.format(SQL_COLUMN, currentDb.get(), tableName); - } - Map familyMap = new HashMap<>(16); - try (ResultSet resultSet = executeQuery0(sql, statement.get())) { - while (resultSet.next()) { - familyMap.put(resultSet.getString(1), resultSet.getString(2)); - } - } catch (SQLException e) { - 
LOG.error("query column information failed, {}", ExceptionUtil.getErrorMessage(e)); - } - //default schema需要特殊处理 - String originSchema = currentDb.get(); - if (DEFAULT_SCHEMA.equalsIgnoreCase(originSchema)) { - originSchema = null; - } - try (ResultSet resultSet = connection.get().getMetaData().getColumns(null, originSchema, tableName, null)) { - while (resultSet.next()) { - Map map = new HashMap<>(16); - String index = resultSet.getString(RESULT_SET_ORDINAL_POSITION); - String family = familyMap.get(index); - if (StringUtils.isBlank(family)) { - map.put(KEY_PRIMARY_KEY, KEY_TRUE); - map.put(KEY_COLUMN_NAME, resultSet.getString(RESULT_SET_COLUMN_NAME)); - } else { - map.put(KEY_PRIMARY_KEY, KEY_FALSE); - map.put(KEY_COLUMN_NAME, family + ConstantValue.COLON_SYMBOL + resultSet.getString(RESULT_SET_COLUMN_NAME)); - } - map.put(KEY_COLUMN_DATA_TYPE, resultSet.getString(RESULT_SET_TYPE_NAME)); - map.put(KEY_COLUMN_INDEX, index); - column.add(map); - } - } catch (SQLException e) { - LOG.error("failed to get column information, {} ", ExceptionUtil.getErrorMessage(e)); - } - return column; - } - - /** - * 查询表的创建时间 - * 如果zookeeper没有权限访问,返回空map - * - * @param hosts zookeeper地址 - * @return 表名与创建时间的映射 - */ - protected Map queryCreateTimeMap(String hosts) { - Map createTimeMap = new HashMap<>(16); - try { - zooKeeper = ZkHelper.createZkClient(hosts, ZkHelper.DEFAULT_TIMEOUT); - List tables = ZkHelper.getChildren(zooKeeper, path); - if (tables != null) { - for (String table : tables) { - createTimeMap.put(table, ZkHelper.getCreateTime(zooKeeper, path + ConstantValue.SINGLE_SLASH_SYMBOL + table)); - } - } - } catch (Exception e) { - LOG.error("query createTime map failed, error {}", ExceptionUtil.getErrorMessage(e)); - } finally { - ZkHelper.closeZooKeeper(zooKeeper); - } - return createTimeMap; - } - - - @Override - protected void init() { - //兼容url携带zookeeper路径 - String hosts = dbUrl.substring(JDBC_PHOENIX_PREFIX.length()); - createTimeMap = queryCreateTimeMap(hosts); - } - - @Override - public Connection getConnection() throws SQLException { - Field declaredField = ReflectionUtils.getDeclaredField(getClass().getClassLoader(), "ucp"); - assert declaredField != null; - declaredField.setAccessible(true); - URLClassPath urlClassPath; - try { - urlClassPath = (URLClassPath) declaredField.get(getClass().getClassLoader()); - } catch (IllegalAccessException e) { - String message = String.format("cannot get urlClassPath from current classLoader, classLoader = %s, e = %s", getClass().getClassLoader(), ExceptionUtil.getErrorMessage(e)); - throw new RuntimeException(message, e); - } - declaredField.setAccessible(false); - - List needJar = Lists.newArrayList(); - for (URL url : urlClassPath.getURLs()) { - String urlFileName = FilenameUtils.getName(url.getPath()); - if (urlFileName.startsWith("flinkx-metadata-phoenix5-reader")) { - needJar.add(url); - break; - } - } - - ClassLoader parentClassLoader = getClass().getClassLoader(); - List list = new LinkedList<>(); - list.add("org.apache.flink"); - list.add("com.dtstack.flinkx"); - - URLClassLoader childFirstClassLoader = FlinkUserCodeClassLoaders.childFirst(needJar.toArray(new URL[0]), parentClassLoader, list.toArray(new String[0])); - Properties properties = new Properties(); - ClassUtil.forName(driverName, childFirstClassLoader); - if (StringUtils.isNotEmpty(username)) { - properties.setProperty(KEY_USER, username); - } - if (StringUtils.isNotEmpty(password)) { - properties.setProperty(KEY_CONN_PASSWORD, password); - } - String jdbcUrl = dbUrl + 
ConstantValue.COLON_SYMBOL + zooKeeperPath; - try { - IPhoenix5Helper helper = Phoenix5Util.getHelper(childFirstClassLoader); - if (StringUtils.isNotEmpty(MapUtils.getString(hadoopConfig, HBASE_MASTER_KERBEROS_PRINCIPAL))) { - Phoenix5Util.setKerberosParams(properties, hadoopConfig); - return Phoenix5Util.getConnectionWithKerberos(hadoopConfig, properties, jdbcUrl, helper); - } - return helper.getConn(jdbcUrl, properties); - } catch (IOException | CompileException e) { - String message = String.format("cannot get phoenix connection, dbUrl = %s, properties = %s, e = %s", dbUrl, GsonUtil.GSON.toJson(properties), ExceptionUtil.getErrorMessage(e)); - throw new RuntimeException(message, e); - } - } - - /** - * phoenix 默认schema为空,在平台层设置值为default - * - * @return 是否为默认schema - */ - public boolean isDefaultSchema() { - return StringUtils.endsWithIgnoreCase(currentDb.get(), KEY_DEFAULT) || - StringUtils.isBlank(currentDb.get()); - } - - /** - * 传入为hbase在zookeeper中的路径,增加/table表示table所在路径 - * - * @param path hbase路径 - */ - public void setPath(String path) { - this.zooKeeperPath = path; - this.path = path + APPEND_PATH; - } - - - public void setHadoopConfig(Map hadoopConfig) { - this.hadoopConfig = hadoopConfig; - } -} diff --git a/flinkx-metadata-phoenix5/flinkx-metadata-phoenix5-reader/src/main/java/com/dtstack/flinkx/metadataphoenix5/reader/Metadataphoenix5Reader.java b/flinkx-metadata-phoenix5/flinkx-metadata-phoenix5-reader/src/main/java/com/dtstack/flinkx/metadataphoenix5/reader/Metadataphoenix5Reader.java deleted file mode 100644 index 1378ec0442..0000000000 --- a/flinkx-metadata-phoenix5/flinkx-metadata-phoenix5-reader/src/main/java/com/dtstack/flinkx/metadataphoenix5/reader/Metadataphoenix5Reader.java +++ /dev/null @@ -1,61 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -package com.dtstack.flinkx.metadataphoenix5.reader; - - -import com.dtstack.flinkx.config.DataTransferConfig; -import com.dtstack.flinkx.config.ReaderConfig; -import com.dtstack.flinkx.metadata.inputformat.MetadataInputFormatBuilder; -import com.dtstack.flinkx.metadata.reader.MetadataReader; -import com.dtstack.flinkx.metadataphoenix5.inputformat.MetadataPhoenixBuilder; -import com.dtstack.flinkx.metadataphoenix5.inputformat.Metadataphoenix5InputFormat; -import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; - -import java.util.Map; - -import static com.dtstack.flinkx.metadataphoenix5.util.PhoenixMetadataCons.DRIVER_NAME; -import static com.dtstack.flinkx.metadataphoenix5.util.PhoenixMetadataCons.KEY_HADOOP_CONFIG; -import static com.dtstack.flinkx.metadataphoenix5.util.PhoenixMetadataCons.KEY_PATH; -import static com.dtstack.flinkx.util.ZkHelper.DEFAULT_PATH; - -/** - * @author kunni@dtstack.com - */ -public class Metadataphoenix5Reader extends MetadataReader { - - protected String path; - - public Metadataphoenix5Reader(DataTransferConfig config, StreamExecutionEnvironment env) { - super(config, env); - hadoopConfig = (Map) config.getJob().getContent() - .get(0).getReader().getParameter().getVal(KEY_HADOOP_CONFIG); - ReaderConfig readerConfig = config.getJob().getContent().get(0).getReader(); - path = readerConfig.getParameter().getStringVal(KEY_PATH, DEFAULT_PATH); - driverName = DRIVER_NAME; - } - - @Override - protected MetadataInputFormatBuilder getBuilder(){ - MetadataPhoenixBuilder builder = new MetadataPhoenixBuilder(new Metadataphoenix5InputFormat()); - builder.setDataTransferConfig(dataTransferConfig); - builder.setPath(path); - builder.setHadoopConfig(hadoopConfig); - return builder; - } -} diff --git a/flinkx-metadata-phoenix5/flinkx-metadata-phoenix5-reader/src/main/java/com/dtstack/flinkx/metadataphoenix5/util/IPhoenix5Helper.java b/flinkx-metadata-phoenix5/flinkx-metadata-phoenix5-reader/src/main/java/com/dtstack/flinkx/metadataphoenix5/util/IPhoenix5Helper.java deleted file mode 100644 index 9b097a451a..0000000000 --- a/flinkx-metadata-phoenix5/flinkx-metadata-phoenix5-reader/src/main/java/com/dtstack/flinkx/metadataphoenix5/util/IPhoenix5Helper.java +++ /dev/null @@ -1,59 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -package com.dtstack.flinkx.metadataphoenix5.util; - -import java.sql.Connection; -import java.sql.SQLException; -import java.util.Properties; - -/** - * Date: 2020/10/10 - * Company: www.dtstack.com - * - * @author tudou - */ -public interface IPhoenix5Helper { - - String CLASS_STR = "public transient RowProjector rowProjector;\n" + - " public List instanceList;\n" + - "\n" + - " @Override\n" + - " public Connection getConn(String url, Properties properties) throws SQLException {\n" + - " Connection dbConn;\n" + - " synchronized (ClassUtil.LOCK_STR) {\n" + - " DriverManager.setLoginTimeout(10);\n" + - " // telnet\n" + - " TelnetUtil.telnet(url);\n" + - " dbConn = DriverManager.getConnection(url, properties);\n" + - " }\n" + - "\n" + - " return dbConn;\n" + - " }\n" + - "\n"; - - /** - * 获取phoenix jdbc连接 - * @param url url地址 - * @param properties 连接配置 - * @return 连接 - * @throws SQLException sql异常 - */ - Connection getConn(String url, Properties properties) throws SQLException; - -} diff --git a/flinkx-metadata-phoenix5/flinkx-metadata-phoenix5-reader/src/main/java/com/dtstack/flinkx/metadataphoenix5/util/Phoenix5Util.java b/flinkx-metadata-phoenix5/flinkx-metadata-phoenix5-reader/src/main/java/com/dtstack/flinkx/metadataphoenix5/util/Phoenix5Util.java deleted file mode 100644 index 898f2d795f..0000000000 --- a/flinkx-metadata-phoenix5/flinkx-metadata-phoenix5-reader/src/main/java/com/dtstack/flinkx/metadataphoenix5/util/Phoenix5Util.java +++ /dev/null @@ -1,179 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -package com.dtstack.flinkx.metadataphoenix5.util; - -import com.dtstack.flinkx.authenticate.KerberosUtil; -import com.dtstack.flinkx.util.FileSystemUtil; -import org.apache.commons.collections.MapUtils; -import org.apache.hadoop.conf.Configuration; -import org.apache.hadoop.security.UserGroupInformation; -import org.codehaus.commons.compiler.CompileException; -import org.codehaus.janino.ClassBodyEvaluator; - -import java.io.IOException; -import java.io.StringReader; -import java.security.PrivilegedAction; -import java.sql.Connection; -import java.util.Map; -import java.util.Properties; - -import static com.dtstack.flinkx.metadataphoenix5.util.PhoenixMetadataCons.AUTHENTICATION_TYPE; -import static com.dtstack.flinkx.metadataphoenix5.util.PhoenixMetadataCons.HADOOP_SECURITY_AUTHENTICATION; -import static com.dtstack.flinkx.metadataphoenix5.util.PhoenixMetadataCons.HBASE_MASTER_KERBEROS_PRINCIPAL; -import static com.dtstack.flinkx.metadataphoenix5.util.PhoenixMetadataCons.HBASE_REGIONSERVER_KERBEROS_PRINCIPAL; -import static com.dtstack.flinkx.metadataphoenix5.util.PhoenixMetadataCons.HBASE_SECURITY_AUTHENTICATION; -import static com.dtstack.flinkx.metadataphoenix5.util.PhoenixMetadataCons.HBASE_SECURITY_AUTHORIZATION; -import static com.dtstack.flinkx.metadataphoenix5.util.PhoenixMetadataCons.KEYTAB_FILE; -import static com.dtstack.flinkx.metadataphoenix5.util.PhoenixMetadataCons.KEY_PRINCIPAL; -import static com.dtstack.flinkx.metadataphoenix5.util.PhoenixMetadataCons.PHOENIX_QUERYSERVER_KERBEROS_PRINCIPAL; - -public class Phoenix5Util { - - /** - * 通过指定类加载器获取helper - * @param parentClassLoader 类加载器 - * @return helper实现类 - * @throws IOException io异常 - * @throws CompileException 编译异常 - */ - public static IPhoenix5Helper getHelper(ClassLoader parentClassLoader) throws IOException, CompileException { - ClassBodyEvaluator cbe = new ClassBodyEvaluator(); - cbe.setParentClassLoader(parentClassLoader); - cbe.setDefaultImports("com.dtstack.flinkx.util.ClassUtil", - "com.dtstack.flinkx.util.TelnetUtil", - "org.apache.commons.lang3.StringUtils", - "org.apache.commons.lang3.tuple.Pair", - "org.apache.flink.types.Row", - "org.apache.hadoop.hbase.NoTagsKeyValue", - "org.apache.hadoop.hbase.client.Result", - "org.apache.hadoop.hbase.io.ImmutableBytesWritable", - "org.apache.phoenix.compile.RowProjector", - "org.apache.phoenix.compile.StatementContext", - "org.apache.phoenix.jdbc.PhoenixEmbeddedDriver", - "org.apache.phoenix.jdbc.PhoenixPreparedStatement", - "org.apache.phoenix.jdbc.PhoenixResultSet", - "org.apache.phoenix.query.KeyRange", - "org.apache.phoenix.schema.tuple.ResultTuple", - "org.apache.phoenix.schema.types.PBoolean", - "org.apache.phoenix.schema.types.PChar", - "org.apache.phoenix.schema.types.PDataType", - "org.apache.phoenix.schema.types.PDate", - "org.apache.phoenix.schema.types.PDecimal", - "org.apache.phoenix.schema.types.PDouble", - "org.apache.phoenix.schema.types.PFloat", - "org.apache.phoenix.schema.types.PInteger", - "org.apache.phoenix.schema.types.PLong", - "org.apache.phoenix.schema.types.PSmallint", - "org.apache.phoenix.schema.types.PTime", - "org.apache.phoenix.schema.types.PTimestamp", - "org.apache.phoenix.schema.types.PTinyint", - 
"org.apache.phoenix.schema.types.PUnsignedDate", - "org.apache.phoenix.schema.types.PUnsignedDouble", - "org.apache.phoenix.schema.types.PUnsignedFloat", - "org.apache.phoenix.schema.types.PUnsignedInt", - "org.apache.phoenix.schema.types.PUnsignedLong", - "org.apache.phoenix.schema.types.PUnsignedSmallint", - "org.apache.phoenix.schema.types.PUnsignedTime", - "org.apache.phoenix.schema.types.PUnsignedTimestamp", - "org.apache.phoenix.schema.types.PUnsignedTinyint", - "org.apache.phoenix.schema.types.PVarchar", - "java.lang.reflect.Field", - "java.sql.Connection", - "java.sql.DriverManager", - "java.sql.PreparedStatement", - "java.sql.ResultSet", - "java.sql.SQLException", - "java.util.ArrayList", - "java.util.Collections", - "java.util.HashMap", - "java.util.List", - "java.util.Map", - "java.util.NavigableSet", - "java.util.Properties"); - cbe.setImplementedInterfaces(new Class[]{IPhoenix5Helper.class}); - StringReader sr = new StringReader(IPhoenix5Helper.CLASS_STR); - return (IPhoenix5Helper) cbe.createInstance(sr); - } - - /** - * ugi认证下获取连接 - * @param hbaseConfigMap - * @param properties - * @param url - * @param helper - * @return - */ - public static Connection getConnectionWithKerberos(Map hbaseConfigMap, Properties properties, String url, IPhoenix5Helper helper) { - try { - UserGroupInformation ugi = getUgi(hbaseConfigMap); - return ugi.doAs((PrivilegedAction) () -> { - try { - return helper.getConn(url, properties); - } catch (Exception e) { - throw new RuntimeException(e); - } - }); - } catch (Exception e) { - throw new RuntimeException("Login kerberos error", e); - } - } - - - - /** - * 设定phoenix认证所需要的kerberos参数 - * @param p - * @param hbaseConfigMap - * @return - */ - public static void setKerberosParams(Properties p, Map hbaseConfigMap) { - String keytabFileName = KerberosUtil.getPrincipalFileName(hbaseConfigMap); - keytabFileName = KerberosUtil.loadFile(hbaseConfigMap, keytabFileName); - String principal = KerberosUtil.getPrincipal(hbaseConfigMap, keytabFileName); - KerberosUtil.loadKrb5Conf(hbaseConfigMap); - KerberosUtil.refreshConfig(); - - hbaseConfigMap.putIfAbsent(HBASE_SECURITY_AUTHENTICATION, AUTHENTICATION_TYPE); - hbaseConfigMap.putIfAbsent(HBASE_SECURITY_AUTHORIZATION, AUTHENTICATION_TYPE); - hbaseConfigMap.putIfAbsent(HADOOP_SECURITY_AUTHENTICATION, AUTHENTICATION_TYPE); - hbaseConfigMap.putIfAbsent(PHOENIX_QUERYSERVER_KERBEROS_PRINCIPAL, principal); - hbaseConfigMap.putIfAbsent(KEY_PRINCIPAL,principal); - hbaseConfigMap.putIfAbsent(KEYTAB_FILE,keytabFileName); - - p.setProperty(HBASE_SECURITY_AUTHENTICATION, AUTHENTICATION_TYPE); - p.setProperty(HBASE_SECURITY_AUTHORIZATION, AUTHENTICATION_TYPE); - p.setProperty(HADOOP_SECURITY_AUTHENTICATION, AUTHENTICATION_TYPE); - p.setProperty(HBASE_MASTER_KERBEROS_PRINCIPAL, MapUtils.getString(hbaseConfigMap, HBASE_MASTER_KERBEROS_PRINCIPAL)); - p.setProperty(HBASE_REGIONSERVER_KERBEROS_PRINCIPAL, MapUtils.getString(hbaseConfigMap, HBASE_REGIONSERVER_KERBEROS_PRINCIPAL)); - } - - - /** - * 获取ugi信息 - * @param hbaseConfigMap - * @return - * @throws IOException - */ - public static UserGroupInformation getUgi(Map hbaseConfigMap) throws IOException { - Configuration conf = FileSystemUtil.getConfiguration(hbaseConfigMap, null); - String principal = MapUtils.getString(hbaseConfigMap, KEY_PRINCIPAL); - String keytabFileName = MapUtils.getString(hbaseConfigMap, KEYTAB_FILE); - return KerberosUtil.loginAndReturnUgi(conf, principal, keytabFileName); - } -} diff --git 
a/flinkx-metadata-phoenix5/flinkx-metadata-phoenix5-reader/src/main/java/com/dtstack/flinkx/metadataphoenix5/util/PhoenixMetadataCons.java b/flinkx-metadata-phoenix5/flinkx-metadata-phoenix5-reader/src/main/java/com/dtstack/flinkx/metadataphoenix5/util/PhoenixMetadataCons.java deleted file mode 100644 index 1c3ead2aa6..0000000000 --- a/flinkx-metadata-phoenix5/flinkx-metadata-phoenix5-reader/src/main/java/com/dtstack/flinkx/metadataphoenix5/util/PhoenixMetadataCons.java +++ /dev/null @@ -1,73 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -package com.dtstack.flinkx.metadataphoenix5.util; - -import com.dtstack.flinkx.metadata.MetaDataCons; - -/** - * @author kunni@dtstack.com - */ - -public class PhoenixMetadataCons extends MetaDataCons { - - public static final String KEY_PRIMARY_KEY = "is_primary_key"; - - public static final String KEY_PATH = "path"; - - public static final String KEY_DEFAULT = "default"; - - public static final String KEY_TABLE_NAME = "table_name"; - - public static final String KEY_NAMESPACE = "namespace"; - - public static final String KEY_CREATE_TIME = "createTime"; - - public static final String DRIVER_NAME = "org.apache.phoenix.jdbc.PhoenixDriver"; - - public static final String SQL_DEFAULT_TABLE_NAME = " SELECT DISTINCT TABLE_NAME FROM SYSTEM.CATALOG WHERE TABLE_SCHEM is null "; - - public static final String SQL_TABLE_NAME = " SELECT DISTINCT TABLE_NAME FROM SYSTEM.CATALOG WHERE TABLE_SCHEM = '%s' "; - - public static final String SQL_COLUMN = "SELECT ORDINAL_POSITION, COLUMN_FAMILY FROM SYSTEM.CATALOG WHERE TABLE_SCHEM = '%s' AND TABLE_NAME = '%s' "; - - public static final String SQL_DEFAULT_COLUMN = "SELECT ORDINAL_POSITION, COLUMN_FAMILY FROM SYSTEM.CATALOG WHERE TABLE_SCHEM is null AND TABLE_NAME = '%s' "; - - public static final String HBASE_MASTER_KERBEROS_PRINCIPAL = "hbase.master.kerberos.principal"; - - public static final String HBASE_REGIONSERVER_KERBEROS_PRINCIPAL = "hbase.regionserver.kerberos.principal"; - - public static final String PHOENIX_QUERYSERVER_KERBEROS_PRINCIPAL = "phoenix.queryserver.kerberos.principal"; - - public final static String HBASE_SECURITY_AUTHENTICATION = "hbase.security.authentication"; - - public final static String HBASE_SECURITY_AUTHORIZATION = "hbase.security.authorization"; - - - public final static String HADOOP_SECURITY_AUTHENTICATION = "hadoop.security.authentication"; - - public final static String KEY_PRINCIPAL = "principal"; - - public static final String KEY_HADOOP_CONFIG = "hadoopConfig"; - - public static final String AUTHENTICATION_TYPE = "kerberos"; - - public static final String KEYTAB_FILE = "keytabFileName"; - - -} diff --git a/flinkx-metadata-sqlserver/flinkx-metadata-sqlserver-reader/pom.xml b/flinkx-metadata-sqlserver/flinkx-metadata-sqlserver-reader/pom.xml deleted file mode 100644 index 0c3484a6bf..0000000000 --- a/flinkx-metadata-sqlserver/flinkx-metadata-sqlserver-reader/pom.xml +++ /dev/null @@ -1,113 +0,0 @@ - - - - flinkx-metadata-sqlserver - com.dtstack.flinkx - 1.6 - - 4.0.0 - - flinkx-metadata-sqlserver-reader - - - - com.dtstack.flinkx - flinkx-metadata-reader - 1.6 - - - net.sourceforge.jtds - jtds - 1.3.1 - - - - - - - org.apache.maven.plugins - maven-shade-plugin - 3.1.0 - - - package - - shade - - - false - - - org.slf4j:slf4j-api - log4j:log4j - ch.qos.logback:* - - - - - - *:* - - META-INF/*.SF - META-INF/*.DSA - META-INF/*.RSA - - - - - - com.google.common - shade.sqlserver.com.google.common - - - com.google.thirdparty - shade.sqlserver.com.google.thirdparty - - - - - - META-INF/services/java.sql.Driver - - - META-INF/services - java.sql.sqlserver.Driver - - - - - - - - - maven-antrun-plugin - 1.2 - - - copy-resources - - package - - run - - - - - - - - - - - - - - - - - \ No newline at end 
of file diff --git a/flinkx-metadata-sqlserver/flinkx-metadata-sqlserver-reader/src/main/java/com/dtstack/flinkx/metadatasqlserver/constants/SqlServerMetadataCons.java b/flinkx-metadata-sqlserver/flinkx-metadata-sqlserver-reader/src/main/java/com/dtstack/flinkx/metadatasqlserver/constants/SqlServerMetadataCons.java deleted file mode 100644 index b5505e6a9d..0000000000 --- a/flinkx-metadata-sqlserver/flinkx-metadata-sqlserver-reader/src/main/java/com/dtstack/flinkx/metadatasqlserver/constants/SqlServerMetadataCons.java +++ /dev/null @@ -1,87 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -package com.dtstack.flinkx.metadatasqlserver.constants; - -import com.dtstack.flinkx.metadata.MetaDataCons; - -/** - * @author : kunni@dtstack.com - * @date : 2020/08/06 - */ - -public class SqlServerMetadataCons extends MetaDataCons { - - public static final String DRIVER_NAME = "net.sourceforge.jtds.jdbc.Driver"; - - public static final String KEY_PARTITION_COLUMN = "partitionColumn"; - public static final String KEY_FILE_GROUP_NAME = "fileGroupName"; - public static final String KEY_SCHEMA_NAME = "schemaName"; - public static final String KEY_TABLE_NAME = "tableName"; - public static final String KEY_TABLE_SCHEMA = "tableSchema"; - - public static final String KEY_ZERO = "0"; - - public static final String SQL_SWITCH_DATABASE = "USE \"%s\""; - /** - * 拼接成schema.table - */ - public static final String SQL_SHOW_TABLES = "SELECT OBJECT_SCHEMA_NAME(object_id, DB_ID()) as SCHEMA_NAME, name FROM sys.tables"; - - public static final String SQL_SHOW_TABLE_PROPERTIES = "SELECT a.crdate, b.rows, rtrim(8*dpages) used, ep.value \n" + - "FROM sysobjects AS a INNER JOIN sysindexes AS b ON a.id = b.id \n" + - "LEFT JOIN sys.extended_properties AS ep ON a.id = ep.major_id AND ep.minor_id = 0 \n" + - "WHERE (a.type = 'u') AND (b.indid IN (0, 1)) and a.name = %s AND OBJECT_SCHEMA_NAME(a.id, DB_ID()) = %s "; - - public static final String SQL_SHOW_TABLE_COLUMN = "SELECT B.name AS name, TY.name as type, C.value AS comment, B.is_nullable as nullable, COLUMNPROPERTY(B.object_id ,B.name,'PRECISION') as presice, D.text, B.column_id \n" + - "FROM sys.tables A INNER JOIN sys.columns B ON B.object_id = A.object_id \n" + - "INNER JOIN sys.types TY ON B.system_type_id = TY.system_type_id \n" + - "LEFT JOIN sys.extended_properties C ON C.major_id = B.object_id AND C.minor_id = B.column_id \n" + - "left join syscomments D on A.object_id=D.id " + - "WHERE A.name = %s and OBJECT_SCHEMA_NAME(A.object_id, DB_ID())=%s"; - - public static final String SQL_SHOW_TABLE_INDEX = "SELECT a.name, d.name as columnName, type_desc as type \n" + - "FROM sys.indexes a JOIN sysindexkeys b ON a.object_id=b.id AND a.index_id=b.indid \n" + - "JOIN sysobjects c ON b.id=c.id JOIN syscolumns d ON b.id=d.id AND b.colid=d.colid \n" + - "WHERE c.name=%s and OBJECT_SCHEMA_NAME(A.object_id, DB_ID())=%s \n" + - "AND a.index_id NOT IN(0,255)"; - - public static final String SQL_SHOW_PARTITION_COLUMN = "SELECT b.name\n" + - "FROM sys.index_columns a JOIN sys.columns b ON a.object_id = b.object_id and a.column_id = b.column_id \n" + - "JOIN sys.objects c ON c.object_id = a.object_id \n" + - "WHERE a.partition_ordinal <> 0 AND c.name =%s \n" + - "AND OBJECT_SCHEMA_NAME(a.object_id , DB_ID())=%s "; - - public static final String SQL_SHOW_PARTITION = "select ps.name, p.rows, pf.create_date, ds2.name as filegroup \n" + - "from sys.indexes i join sys.partition_schemes ps on i.data_space_id = ps.data_space_id \n" + - "join sys.destination_data_spaces dds on ps.data_space_id = dds.partition_scheme_id \n" + - "join sys.data_spaces ds2 on dds.data_space_id = ds2.data_space_id \n" + - "join sys.partitions p on dds.destination_id = p.partition_number \n" + - "and p.object_id = i.object_id and p.index_id = i.index_id \n" + - "join 
sys.partition_functions pf on ps.function_id = pf.function_id \n" + - "WHERE i.object_id = object_id(%s) and OBJECT_SCHEMA_NAME(i.object_id, DB_ID())=%s \n" + - "and i.index_id in (0, 1)"; - - public static final String SQL_QUERY_PRIMARY_KEY = "SELECT ku.COLUMN_NAME\n" + - "FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS AS tc\n" + - "INNER JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE AS ku\n" + - "ON tc.CONSTRAINT_TYPE = 'PRIMARY KEY' \n" + - "AND tc.CONSTRAINT_NAME = ku.CONSTRAINT_NAME\n" + - "WHERE ku.TABLE_NAME = %s AND ku.TABLE_SCHEMA = %s"; - -} diff --git a/flinkx-metadata-sqlserver/flinkx-metadata-sqlserver-reader/src/main/java/com/dtstack/flinkx/metadatasqlserver/inputformat/MetadatasqlserverInputFormat.java b/flinkx-metadata-sqlserver/flinkx-metadata-sqlserver-reader/src/main/java/com/dtstack/flinkx/metadatasqlserver/inputformat/MetadatasqlserverInputFormat.java deleted file mode 100644 index 56cecffb15..0000000000 --- a/flinkx-metadata-sqlserver/flinkx-metadata-sqlserver-reader/src/main/java/com/dtstack/flinkx/metadatasqlserver/inputformat/MetadatasqlserverInputFormat.java +++ /dev/null @@ -1,264 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -package com.dtstack.flinkx.metadatasqlserver.inputformat; - -import com.dtstack.flinkx.constants.ConstantValue; -import com.dtstack.flinkx.metadata.MetaDataCons; -import com.dtstack.flinkx.metadata.inputformat.BaseMetadataInputFormat; -import com.dtstack.flinkx.metadata.inputformat.MetadataInputSplit; -import com.dtstack.flinkx.metadatasqlserver.constants.SqlServerMetadataCons; -import com.dtstack.flinkx.util.ExceptionUtil; -import org.apache.commons.collections.CollectionUtils; -import org.apache.commons.lang.StringUtils; -import org.apache.commons.lang3.tuple.Pair; -import org.apache.flink.core.io.InputSplit; -import org.apache.flink.types.Row; - -import java.io.IOException; -import java.sql.ResultSet; -import java.sql.SQLException; -import java.util.ArrayList; -import java.util.HashMap; -import java.util.LinkedList; -import java.util.List; -import java.util.Map; - -import static com.dtstack.flinkx.metadata.MetaDataCons.KEY_COLUMN_NAME; -import static com.dtstack.flinkx.metadata.MetaDataCons.KEY_COLUMN_PRIMARY; -import static com.dtstack.flinkx.metadata.MetaDataCons.KEY_TABLE_CREATE_TIME; -import static com.dtstack.flinkx.metadata.MetaDataCons.KEY_FALSE; -import static com.dtstack.flinkx.metadata.MetaDataCons.KEY_INDEX_NAME; -import static com.dtstack.flinkx.metadata.MetaDataCons.KEY_TABLE_ROWS; -import static com.dtstack.flinkx.metadata.MetaDataCons.KEY_TABLE_COMMENT; -import static com.dtstack.flinkx.metadata.MetaDataCons.KEY_TABLE_TOTAL_SIZE; -import static com.dtstack.flinkx.metadata.MetaDataCons.KEY_TRUE; -import static com.dtstack.flinkx.metadatasqlserver.constants.SqlServerMetadataCons.KEY_SCHEMA_NAME; -import static com.dtstack.flinkx.metadatasqlserver.constants.SqlServerMetadataCons.KEY_TABLE_NAME; -import static com.dtstack.flinkx.metadatasqlserver.constants.SqlServerMetadataCons.KEY_ZERO; - -/** - * @author : kunni@dtstack.com - * @date : 2020/08/06 - */ - -public class MetadatasqlserverInputFormat extends BaseMetadataInputFormat { - - private static final long serialVersionUID = 1L; - - protected String schema; - - protected String table; - - /** - * 在use database失败时,不影响下一个任务 - * @param inputSplit 分片 - * @throws IOException 异常 - */ - @Override - protected void openInternal(InputSplit inputSplit) throws IOException { - LOG.info("inputSplit = {}", inputSplit); - try { - connection.set(getConnection()); - statement.set(connection.get().createStatement()); - currentDb.set(((MetadataInputSplit) inputSplit).getDbName()); - tableList = ((MetadataInputSplit) inputSplit).getTableList(); - switchDatabase(currentDb.get()); - if (CollectionUtils.isEmpty(tableList)) { - tableList = showTables(); - queryTable = true; - } - } catch (ClassNotFoundException e) { - LOG.error("could not find suitable driver, e={}", ExceptionUtil.getErrorMessage(e)); - throw new IOException(e); - } catch (SQLException e){ - LOG.error("获取table列表异常, dbUrl = {}, username = {}, inputSplit = {}, e = {}", dbUrl, username, inputSplit, ExceptionUtil.getErrorMessage(e)); - tableList = new LinkedList<>(); - } - tableIterator.set(tableList.iterator()); - } - - @Override - protected List showTables() throws SQLException { - List tableNameList = new LinkedList<>(); - try (ResultSet rs = 
statement.get().executeQuery(SqlServerMetadataCons.SQL_SHOW_TABLES)) { - while (rs.next()) { - tableNameList.add(Pair.of(rs.getString(1), rs.getString(2))); - } - } - return tableNameList; - } - - @Override - protected void switchDatabase(String databaseName) throws SQLException { - // database 以数字开头时,需要双引号 - statement.get().execute(String.format(SqlServerMetadataCons.SQL_SWITCH_DATABASE, databaseName)); - } - - @Override - protected Row nextRecordInternal(Row row) throws IOException{ - Map metaData = new HashMap<>(16); - metaData.put(MetaDataCons.KEY_OPERA_TYPE, MetaDataCons.DEFAULT_OPERA_TYPE); - - if(queryTable){ - Pair pair = (Pair) tableIterator.get().next(); - schema = pair.getKey(); - table = pair.getValue(); - }else{ - Map map = (Map)tableIterator.get().next(); - schema = map.get(KEY_SCHEMA_NAME); - table = map.get(KEY_TABLE_NAME); - } - String tableName = schema + ConstantValue.POINT_SYMBOL + table; - metaData.put(MetaDataCons.KEY_SCHEMA, currentDb.get()); - metaData.put(MetaDataCons.KEY_TABLE, table); - metaData.put(SqlServerMetadataCons.KEY_TABLE_SCHEMA, schema); - try { - metaData.putAll(queryMetaData(tableName)); - metaData.put(MetaDataCons.KEY_QUERY_SUCCESS, true); - } catch (Exception e) { - metaData.put(MetaDataCons.KEY_QUERY_SUCCESS, false); - metaData.put(MetaDataCons.KEY_ERROR_MSG, ExceptionUtil.getErrorMessage(e)); - LOG.error(ExceptionUtil.getErrorMessage(e)); - } - return Row.of(metaData); - } - - @Override - protected Map queryMetaData(String tableName) throws SQLException { - Map result = new HashMap<>(16); - Map tableProperties = queryTableProp(); - result.put(MetaDataCons.KEY_TABLE_PROPERTIES, tableProperties); - List> column = queryColumn(); - String partitionKey = queryPartitionColumn(); - List> partitionColumn = new ArrayList<>(); - if(StringUtils.isNotEmpty(partitionKey)){ - column.removeIf((Map perColumn)-> - { - if(StringUtils.equals(partitionKey, perColumn.get(KEY_COLUMN_NAME))){ - partitionColumn.add(perColumn); - return true; - }else { - return false; - } - }); - } - result.put(SqlServerMetadataCons.KEY_PARTITION_COLUMN, partitionColumn); - result.put(MetaDataCons.KEY_COLUMN, column); - List> index = queryIndex(); - result.put(MetaDataCons.KEY_COLUMN_INDEX, index); - List> partition = queryPartition(); - result.put(MetaDataCons.KEY_PARTITIONS, partition); - return result; - } - - protected List> queryPartition() throws SQLException{ - List> index = new ArrayList<>(); - String sql = String.format(SqlServerMetadataCons.SQL_SHOW_PARTITION, quote(table), quote(schema)); - try(ResultSet resultSet = statement.get().executeQuery(sql)){ - while (resultSet.next()){ - Map perIndex = new HashMap<>(16); - perIndex.put(KEY_COLUMN_NAME, resultSet.getString(1)); - perIndex.put(KEY_TABLE_ROWS, resultSet.getString(2)); - perIndex.put(KEY_TABLE_CREATE_TIME, resultSet.getString(3)); - perIndex.put(SqlServerMetadataCons.KEY_FILE_GROUP_NAME, resultSet.getString(4)); - index.add(perIndex); - } - } - return index; - } - - protected List> queryIndex() throws SQLException{ - List> index = new ArrayList<>(); - String sql = String.format(SqlServerMetadataCons.SQL_SHOW_TABLE_INDEX, quote(table), quote(schema)); - try(ResultSet resultSet = statement.get().executeQuery(sql)){ - while (resultSet.next()){ - Map perIndex = new HashMap<>(16); - perIndex.put(KEY_INDEX_NAME, resultSet.getString(1)); - perIndex.put(MetaDataCons.KEY_COLUMN_NAME, resultSet.getString(2)); - perIndex.put(MetaDataCons.KEY_COLUMN_TYPE, resultSet.getString(3)); - index.add(perIndex); - } - } - return index; - } - - 
protected String queryPartitionColumn() throws SQLException{ - String partitionKey = null; - String sql = String.format(SqlServerMetadataCons.SQL_SHOW_PARTITION_COLUMN, quote(table), quote(schema)); - try(ResultSet resultSet = statement.get().executeQuery(sql)){ - while (resultSet.next()){ - partitionKey = resultSet.getString(1); - } - } - return partitionKey; - } - - protected List> queryColumn() throws SQLException { - List> column = new ArrayList<>(); - String sql = String.format(SqlServerMetadataCons.SQL_SHOW_TABLE_COLUMN, quote(table), quote(schema)); - try(ResultSet resultSet = statement.get().executeQuery(sql)){ - while(resultSet.next()){ - Map perColumn = new HashMap<>(16); - perColumn.put(KEY_COLUMN_NAME, resultSet.getString(1)); - perColumn.put(MetaDataCons.KEY_COLUMN_TYPE, resultSet.getString(2)); - perColumn.put(MetaDataCons.KEY_COLUMN_COMMENT, resultSet.getString(3)); - perColumn.put(MetaDataCons.KEY_COLUMN_NULL, StringUtils.equals(resultSet.getString(4), KEY_ZERO) ? KEY_FALSE : KEY_TRUE); - perColumn.put(MetaDataCons.KEY_COLUMN_SCALE, resultSet.getString(5)); - perColumn.put(MetaDataCons.KEY_COLUMN_DEFAULT, resultSet.getString(6)); - perColumn.put(MetaDataCons.KEY_COLUMN_INDEX, resultSet.getString(7)); - column.add(perColumn); - } - } - sql = String.format(SqlServerMetadataCons.SQL_QUERY_PRIMARY_KEY, quote(table), quote(schema)); - String primaryKey = null; - try(ResultSet resultSet = statement.get().executeQuery(sql)){ - while(resultSet.next()){ - primaryKey = resultSet.getString(1); - } - } - for(Map perColumn : column){ - perColumn.put(KEY_COLUMN_PRIMARY, StringUtils.equals(perColumn.get(KEY_COLUMN_NAME), primaryKey) ? KEY_TRUE : KEY_FALSE); - } - return column; - } - - protected Map queryTableProp() throws SQLException{ - Map tableProperties = new HashMap<>(16); - String sql = String.format(SqlServerMetadataCons.SQL_SHOW_TABLE_PROPERTIES, quote(table), quote(schema)); - try(ResultSet resultSet = statement.get().executeQuery(sql)){ - while(resultSet.next()){ - tableProperties.put(KEY_TABLE_CREATE_TIME, resultSet.getString(1)); - tableProperties.put(KEY_TABLE_ROWS, resultSet.getString(2)); - tableProperties.put(KEY_TABLE_TOTAL_SIZE, resultSet.getString(3)); - tableProperties.put(KEY_TABLE_COMMENT, resultSet.getString(4)); - } - } - if(tableProperties.size()==0){ - throw new SQLException(String.format("no such table(schema=%s,table=%s) in database", schema, table)); - } - tableProperties.put(SqlServerMetadataCons.KEY_TABLE_SCHEMA, schema); - return tableProperties; - } - - @Override - protected String quote(String name) { - return "'" + name + "'"; - } - -} diff --git a/flinkx-metadata-sqlserver/flinkx-metadata-sqlserver-reader/src/main/java/com/dtstack/flinkx/metadatasqlserver/reader/MetadatasqlserverReader.java b/flinkx-metadata-sqlserver/flinkx-metadata-sqlserver-reader/src/main/java/com/dtstack/flinkx/metadatasqlserver/reader/MetadatasqlserverReader.java deleted file mode 100644 index 1847acf8a7..0000000000 --- a/flinkx-metadata-sqlserver/flinkx-metadata-sqlserver-reader/src/main/java/com/dtstack/flinkx/metadatasqlserver/reader/MetadatasqlserverReader.java +++ /dev/null @@ -1,44 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. 
You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -package com.dtstack.flinkx.metadatasqlserver.reader; - - -import com.dtstack.flinkx.config.DataTransferConfig; -import com.dtstack.flinkx.metadata.inputformat.MetadataInputFormatBuilder; -import com.dtstack.flinkx.metadata.reader.MetadataReader; -import com.dtstack.flinkx.metadatasqlserver.constants.SqlServerMetadataCons; -import com.dtstack.flinkx.metadatasqlserver.inputformat.MetadatasqlserverInputFormat; -import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; - -/** - * @author : kunni@dtstack.com - * @date : 2020/08/06 - */ - -public class MetadatasqlserverReader extends MetadataReader { - public MetadatasqlserverReader(DataTransferConfig config, StreamExecutionEnvironment env) { - super(config, env); - driverName = SqlServerMetadataCons.DRIVER_NAME; - } - - @Override - protected MetadataInputFormatBuilder getBuilder(){ - return new MetadataInputFormatBuilder(new MetadatasqlserverInputFormat()); - } -} diff --git a/flinkx-metadata-sqlserver/pom.xml b/flinkx-metadata-sqlserver/pom.xml deleted file mode 100644 index 405fd4fd0c..0000000000 --- a/flinkx-metadata-sqlserver/pom.xml +++ /dev/null @@ -1,27 +0,0 @@ - - - - flinkx-all - com.dtstack.flinkx - 1.6 - - 4.0.0 - - flinkx-metadata-sqlserver - pom - - flinkx-metadata-sqlserver-reader - - - - - com.dtstack.flinkx - flinkx-core - 1.6 - provided - - - - \ No newline at end of file diff --git a/flinkx-metadata-tidb/flinkx-metadata-tidb-reader/src/main/java/com/dtstack/flinkx/metadatatidb/constants/TidbMetadataCons.java b/flinkx-metadata-tidb/flinkx-metadata-tidb-reader/src/main/java/com/dtstack/flinkx/metadatatidb/constants/TidbMetadataCons.java deleted file mode 100644 index 6e43bb0a1e..0000000000 --- a/flinkx-metadata-tidb/flinkx-metadata-tidb-reader/src/main/java/com/dtstack/flinkx/metadatatidb/constants/TidbMetadataCons.java +++ /dev/null @@ -1,60 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ -package com.dtstack.flinkx.metadatatidb.constants; - -import com.dtstack.flinkx.metadata.MetaDataCons; - -/** - * @author : kunni@dtstack.com - * @date : 2020/5/26 - */ -public class TidbMetadataCons extends MetaDataCons { - - public static final String DRIVER_NAME = "com.mysql.jdbc.Driver"; - public static final String KEY_PARTITION_COLUMN = "partitionColumn"; - public static final String KEY_UPDATE_TIME = "updateTime"; - public static final String KEY_TES = "YES"; - public static final String KEY_PRI = "PRI"; - - public static final String RESULT_ROWS = "Rows"; - public static final String RESULT_DATA_LENGTH = "Data_length"; - public static final String RESULT_FIELD = "Field"; - public static final String RESULT_TYPE = "Type"; - public static final String RESULT_COLUMN_NULL = "Null"; - public static final String RESULT_KEY = "Key"; - public static final String RESULT_COLUMN_DEFAULT = "Default"; - public static final String RESULT_PARTITION_NAME = "PARTITION_NAME"; - public static final String RESULT_PARTITION_CREATE_TIME = "CREATE_TIME"; - public static final String RESULT_PARTITION_TABLE_ROWS = "TABLE_ROWS"; - public static final String RESULT_PARTITION_DATA_LENGTH = "DATA_LENGTH"; - public static final String RESULT_PARTITIONNAME = "Partition_name"; - public static final String RESULT_PARTITION_EXPRESSION = "PARTITION_EXPRESSION"; - public static final String RESULT_CREATE_TIME = "Create_time"; - public static final String RESULT_UPDATE_TIME = "Update_time"; - public static final String RESULT_COMMENT = "Comment"; - - /** sql语句 */ - public static final String SQL_SWITCH_DATABASE = "USE `%s`"; - public static final String SQL_SHOW_TABLES = "SHOW FULL TABLES WHERE Table_type = 'BASE TABLE'"; - public static final String SQL_QUERY_TABLE_INFO = "SHOW TABLE STATUS LIKE '%s'"; - public static final String SQL_QUERY_COLUMN = "SHOW FULL COLUMNS FROM `%s`"; - public static final String SQL_QUERY_UPDATE_TIME = "SHOW STATS_META WHERE Table_name = '%s'"; - public static final String SQL_QUERY_PARTITION = "SELECT * FROM information_schema.partitions WHERE table_schema = schema() AND table_name='%s'"; - public static final String SQL_QUERY_PARTITION_COLUMN = "SELECT DISTINCT PARTITION_EXPRESSION FROM information_schema.partitions WHERE table_schema = schema() AND table_name='%s'"; -} - diff --git a/flinkx-metadata-tidb/flinkx-metadata-tidb-reader/src/main/java/com/dtstack/flinkx/metadatatidb/inputformat/MetadatatidbInputFormat.java b/flinkx-metadata-tidb/flinkx-metadata-tidb-reader/src/main/java/com/dtstack/flinkx/metadatatidb/inputformat/MetadatatidbInputFormat.java deleted file mode 100644 index 5b9c78a5e6..0000000000 --- a/flinkx-metadata-tidb/flinkx-metadata-tidb-reader/src/main/java/com/dtstack/flinkx/metadatatidb/inputformat/MetadatatidbInputFormat.java +++ /dev/null @@ -1,246 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. 
You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ -package com.dtstack.flinkx.metadatatidb.inputformat; - -import com.dtstack.flinkx.metadata.inputformat.BaseMetadataInputFormat; -import org.apache.commons.collections.CollectionUtils; -import org.apache.commons.lang.StringUtils; - -import java.sql.ResultSet; -import java.sql.SQLException; -import java.sql.Statement; -import java.util.ArrayList; -import java.util.HashMap; -import java.util.LinkedList; -import java.util.List; -import java.util.Map; - -import static com.dtstack.flinkx.metadata.MetaDataCons.KEY_FALSE; -import static com.dtstack.flinkx.metadata.MetaDataCons.KEY_COLUMN_PRIMARY; -import static com.dtstack.flinkx.metadata.MetaDataCons.KEY_TABLE_COMMENT; -import static com.dtstack.flinkx.metadata.MetaDataCons.KEY_TABLE_ROWS; -import static com.dtstack.flinkx.metadata.MetaDataCons.KEY_TRUE; -import static com.dtstack.flinkx.metadatatidb.constants.TidbMetadataCons.KEY_COLUMN; -import static com.dtstack.flinkx.metadatatidb.constants.TidbMetadataCons.KEY_COLUMN_COMMENT; -import static com.dtstack.flinkx.metadatatidb.constants.TidbMetadataCons.KEY_COLUMN_INDEX; -import static com.dtstack.flinkx.metadatatidb.constants.TidbMetadataCons.KEY_COLUMN_NAME; -import static com.dtstack.flinkx.metadatatidb.constants.TidbMetadataCons.KEY_COLUMN_TYPE; -import static com.dtstack.flinkx.metadatatidb.constants.TidbMetadataCons.KEY_TABLE_CREATE_TIME; -import static com.dtstack.flinkx.metadatatidb.constants.TidbMetadataCons.KEY_COLUMN_DEFAULT; -import static com.dtstack.flinkx.metadatatidb.constants.TidbMetadataCons.KEY_COLUMN_NULL; -import static com.dtstack.flinkx.metadatatidb.constants.TidbMetadataCons.KEY_PARTITIONS; -import static com.dtstack.flinkx.metadatatidb.constants.TidbMetadataCons.KEY_PARTITION_COLUMN; -import static com.dtstack.flinkx.metadatatidb.constants.TidbMetadataCons.KEY_PRI; -import static com.dtstack.flinkx.metadatatidb.constants.TidbMetadataCons.KEY_COLUMN_SCALE; -import static com.dtstack.flinkx.metadatatidb.constants.TidbMetadataCons.KEY_TABLE_PROPERTIES; -import static com.dtstack.flinkx.metadatatidb.constants.TidbMetadataCons.KEY_TES; -import static com.dtstack.flinkx.metadatatidb.constants.TidbMetadataCons.KEY_TABLE_TOTAL_SIZE; -import static com.dtstack.flinkx.metadatatidb.constants.TidbMetadataCons.KEY_UPDATE_TIME; -import static com.dtstack.flinkx.metadatatidb.constants.TidbMetadataCons.RESULT_COLUMN_DEFAULT; -import static com.dtstack.flinkx.metadatatidb.constants.TidbMetadataCons.RESULT_COLUMN_NULL; -import static com.dtstack.flinkx.metadatatidb.constants.TidbMetadataCons.RESULT_COMMENT; -import static com.dtstack.flinkx.metadatatidb.constants.TidbMetadataCons.RESULT_CREATE_TIME; -import static com.dtstack.flinkx.metadatatidb.constants.TidbMetadataCons.RESULT_DATA_LENGTH; -import static com.dtstack.flinkx.metadatatidb.constants.TidbMetadataCons.RESULT_FIELD; -import static com.dtstack.flinkx.metadatatidb.constants.TidbMetadataCons.RESULT_KEY; -import static com.dtstack.flinkx.metadatatidb.constants.TidbMetadataCons.RESULT_PARTITIONNAME; -import static com.dtstack.flinkx.metadatatidb.constants.TidbMetadataCons.RESULT_PARTITION_CREATE_TIME; -import static com.dtstack.flinkx.metadatatidb.constants.TidbMetadataCons.RESULT_PARTITION_DATA_LENGTH; 
-import static com.dtstack.flinkx.metadatatidb.constants.TidbMetadataCons.RESULT_PARTITION_EXPRESSION;
-import static com.dtstack.flinkx.metadatatidb.constants.TidbMetadataCons.RESULT_PARTITION_NAME;
-import static com.dtstack.flinkx.metadatatidb.constants.TidbMetadataCons.RESULT_PARTITION_TABLE_ROWS;
-import static com.dtstack.flinkx.metadatatidb.constants.TidbMetadataCons.RESULT_ROWS;
-import static com.dtstack.flinkx.metadatatidb.constants.TidbMetadataCons.RESULT_TYPE;
-import static com.dtstack.flinkx.metadatatidb.constants.TidbMetadataCons.RESULT_UPDATE_TIME;
-import static com.dtstack.flinkx.metadatatidb.constants.TidbMetadataCons.SQL_QUERY_COLUMN;
-import static com.dtstack.flinkx.metadatatidb.constants.TidbMetadataCons.SQL_QUERY_PARTITION;
-import static com.dtstack.flinkx.metadatatidb.constants.TidbMetadataCons.SQL_QUERY_PARTITION_COLUMN;
-import static com.dtstack.flinkx.metadatatidb.constants.TidbMetadataCons.SQL_QUERY_TABLE_INFO;
-import static com.dtstack.flinkx.metadatatidb.constants.TidbMetadataCons.SQL_QUERY_UPDATE_TIME;
-import static com.dtstack.flinkx.metadatatidb.constants.TidbMetadataCons.SQL_SHOW_TABLES;
-import static com.dtstack.flinkx.metadatatidb.constants.TidbMetadataCons.SQL_SWITCH_DATABASE;
-
-
-/**
- * @author : kunni@dtstack.com
- * @date : 2020/5/26
- */
-public class MetadatatidbInputFormat extends BaseMetadataInputFormat {
-
-    private static final long serialVersionUID = 1L;
-
-    @Override
-    protected List<String> showTables() throws SQLException {
-        List<String> tables = new ArrayList<>();
-        try (ResultSet rs = statement.get().executeQuery(SQL_SHOW_TABLES)) {
-            while (rs.next()) {
-                tables.add(rs.getString(1));
-            }
-        }
-
-        return tables;
-    }
-
-    @Override
-    protected void switchDatabase(String databaseName) throws SQLException {
-        statement.get().execute(String.format(SQL_SWITCH_DATABASE, quote(databaseName)));
-    }
-
-    @Override
-    protected String quote(String name) {
-        return name;
-    }
-
-    @Override
-    protected Map<String, Object> queryMetaData(String tableName) throws SQLException {
-        Map<String, Object> result = new HashMap<>(16);
-        Map<String, String> tableProp = queryTableProp(tableName);
-        List<Map<String, Object>> column = queryColumn(tableName);
-        List<Map<String, Object>> partition = queryPartition(tableName);
-        Map<String, String> updateTime = queryAddPartition(tableName, KEY_UPDATE_TIME);
-        List<Map<String, Object>> partitionColumn = queryPartitionColumn(tableName);
-        // move partition columns out of the ordinary column list, copying their details over
-        column.removeIf((Map<String, Object> perColumn) -> {
-            for (Map<String, Object> perPartitionColumn : partitionColumn) {
-                if (StringUtils.equals((String) perPartitionColumn.get(KEY_COLUMN_NAME), (String) perColumn.get(KEY_COLUMN_NAME))) {
-                    perPartitionColumn.put(KEY_COLUMN_TYPE, perColumn.get(KEY_COLUMN_TYPE));
-                    perPartitionColumn.put(KEY_COLUMN_NULL, perColumn.get(KEY_COLUMN_NULL));
-                    perPartitionColumn.put(KEY_COLUMN_DEFAULT, perColumn.get(KEY_COLUMN_DEFAULT));
-                    perPartitionColumn.put(KEY_COLUMN_COMMENT, perColumn.get(KEY_COLUMN_COMMENT));
-                    perPartitionColumn.put(KEY_COLUMN_INDEX, perColumn.get(KEY_COLUMN_INDEX));
-                    return true;
-                }
-            }
-            return false;
-        });
-        result.put(KEY_TABLE_PROPERTIES, tableProp);
-        result.put(KEY_COLUMN, column);
-        // a table without partitions is assumed to yield a single placeholder row, hence the > 1 check
-        if (CollectionUtils.size(partition) > 1) {
-            for (Map<String, Object> perPartition : partition) {
-                // look up the update time under the partition's own name, not under the constant key
-                String partitionName = (String) perPartition.get(KEY_COLUMN_NAME);
-                perPartition.put(KEY_UPDATE_TIME, updateTime.get(partitionName));
-            }
-            result.put(KEY_PARTITIONS, partition);
-        }
-        result.put(KEY_PARTITION_COLUMN, partitionColumn);
-        return result;
-    }
-
-    public Map<String, String> queryTableProp(String tableName) throws SQLException {
-        Map<String, String> tableProp = new HashMap<>(16);
-        String sql = String.format(SQL_QUERY_TABLE_INFO, tableName);
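-        // SQL_QUERY_TABLE_INFO is expected to expose rows, data_length, create_time and comment for the table (see the getString calls below)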
-        try (Statement st = connection.get().createStatement();
-             ResultSet rs = st.executeQuery(sql)) {
-            while (rs.next()) {
-                tableProp.put(KEY_TABLE_ROWS, rs.getString(RESULT_ROWS));
-                tableProp.put(KEY_TABLE_TOTAL_SIZE, rs.getString(RESULT_DATA_LENGTH));
-                tableProp.put(KEY_TABLE_CREATE_TIME, rs.getString(RESULT_CREATE_TIME));
-                tableProp.put(KEY_TABLE_COMMENT, rs.getString(RESULT_COMMENT));
-            }
-        } catch (SQLException e) {
-            // keep the original exception as the cause instead of discarding the stack trace
-            throw new SQLException(e.getMessage(), e);
-        }
-        return tableProp;
-    }
-
-    protected List<Map<String, Object>> queryColumn(String tableName) throws SQLException {
-        List<Map<String, Object>> column = new LinkedList<>();
-        String sql = String.format(SQL_QUERY_COLUMN, tableName);
-        try (Statement st = connection.get().createStatement();
-             ResultSet rs = st.executeQuery(sql)) {
-            int pos = 1;
-            while (rs.next()) {
-                Map<String, Object> perColumn = new HashMap<>(16);
-                perColumn.put(KEY_COLUMN_NAME, rs.getString(RESULT_FIELD));
-                String type = rs.getString(RESULT_TYPE);
-                perColumn.put(KEY_COLUMN_TYPE, type);
-                perColumn.put(KEY_COLUMN_NULL, StringUtils.equals(rs.getString(RESULT_COLUMN_NULL), KEY_TES) ? KEY_TRUE : KEY_FALSE);
-                perColumn.put(KEY_COLUMN_PRIMARY, StringUtils.equals(rs.getString(RESULT_KEY), KEY_PRI) ? KEY_TRUE : KEY_FALSE);
-                perColumn.put(KEY_COLUMN_DEFAULT, rs.getString(RESULT_COLUMN_DEFAULT));
-                // take the length/precision between the parentheses of types such as varchar(10)
-                perColumn.put(KEY_COLUMN_SCALE, StringUtils.contains(type, '(') ? StringUtils.substring(type, type.indexOf("(") + 1, type.indexOf(")")) : 0);
-                perColumn.put(KEY_COLUMN_COMMENT, rs.getString(RESULT_COMMENT));
-                perColumn.put(KEY_COLUMN_INDEX, pos++);
-                column.add(perColumn);
-            }
-        } catch (SQLException e) {
-            throw new SQLException(e.getMessage(), e);
-        }
-        return column;
-    }
-
-    protected List<Map<String, Object>> queryPartition(String tableName) throws SQLException {
-        List<Map<String, Object>> partition = new LinkedList<>();
-        String sql = String.format(SQL_QUERY_PARTITION, tableName);
-        try (Statement st = connection.get().createStatement();
-             ResultSet rs = st.executeQuery(sql)) {
-            while (rs.next()) {
-                Map<String, Object> perPartition = new HashMap<>(16);
-                perPartition.put(KEY_COLUMN_NAME, rs.getString(RESULT_PARTITION_NAME));
-                perPartition.put(KEY_TABLE_CREATE_TIME, rs.getString(RESULT_PARTITION_CREATE_TIME));
-                perPartition.put(KEY_TABLE_ROWS, rs.getInt(RESULT_PARTITION_TABLE_ROWS));
-                perPartition.put(KEY_TABLE_TOTAL_SIZE, rs.getLong(RESULT_PARTITION_DATA_LENGTH));
-                partition.add(perPartition);
-            }
-        } catch (SQLException e) {
-            throw new SQLException(e.getMessage(), e);
-        }
-        return partition;
-    }
-
-    protected Map<String, String> queryAddPartition(String tableName, String msg) throws SQLException {
-        Map<String, String> result = new HashMap<>(16);
-        String sql = String.format(SQL_QUERY_UPDATE_TIME, tableName);
-        try (Statement st = connection.get().createStatement();
-             ResultSet rs = st.executeQuery(sql)) {
-            while (rs.next()) {
-                /* handle rows where partitionName is blank */
-                String name = rs.getString(RESULT_PARTITIONNAME);
-                if (StringUtils.isNotBlank(name)) {
-                    result.put(name, rs.getString(RESULT_UPDATE_TIME));
-                } else {
-                    result.put(KEY_UPDATE_TIME, rs.getString(RESULT_UPDATE_TIME));
-                }
-            }
-        } catch (SQLException e) {
-            throw new SQLException(e.getMessage(), e);
-        }
-        return result;
-    }
-
-    protected List<Map<String, Object>> queryPartitionColumn(String tableName) throws SQLException {
-        List<Map<String, Object>> partitionColumn = new LinkedList<>();
-        String sql = String.format(SQL_QUERY_PARTITION_COLUMN, tableName);
-        try (Statement st = connection.get().createStatement();
-             ResultSet rs = st.executeQuery(sql)) {
-            while (rs.next()) {
-                Map<String, Object> perPartitionColumn = new HashMap<>(16);
-                String partitionExp = rs.getString(RESULT_PARTITION_EXPRESSION);
-                if (StringUtils.isNotBlank(partitionExp)) {
-                    // the partition expression wraps the column in backquotes, e.g. `id`
-                    String columnName = partitionExp.substring(partitionExp.indexOf("`") + 1, partitionExp.lastIndexOf("`"));
-                    perPartitionColumn.put(KEY_COLUMN_NAME, columnName);
-                }
-                partitionColumn.add(perPartitionColumn);
-            }
-        } catch (SQLException e) {
-            throw new SQLException(e.getMessage(), e);
-        }
-        return partitionColumn;
-    }
-
-}
diff --git a/flinkx-metadata-tidb/flinkx-metadata-tidb-reader/src/main/java/com/dtstack/flinkx/metadatatidb/reader/MetadatatidbReader.java b/flinkx-metadata-tidb/flinkx-metadata-tidb-reader/src/main/java/com/dtstack/flinkx/metadatatidb/reader/MetadatatidbReader.java
deleted file mode 100644
index d601bfc820..0000000000
--- a/flinkx-metadata-tidb/flinkx-metadata-tidb-reader/src/main/java/com/dtstack/flinkx/metadatatidb/reader/MetadatatidbReader.java
+++ /dev/null
@@ -1,42 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *

- * http://www.apache.org/licenses/LICENSE-2.0 - *

- * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ -package com.dtstack.flinkx.metadatatidb.reader; - -import com.dtstack.flinkx.config.DataTransferConfig; -import com.dtstack.flinkx.metadata.inputformat.MetadataInputFormatBuilder; -import com.dtstack.flinkx.metadata.reader.MetadataReader; -import com.dtstack.flinkx.metadatatidb.inputformat.MetadatatidbInputFormat; - -import static com.dtstack.flinkx.metadatatidb.constants.TidbMetadataCons.DRIVER_NAME; - -/** - * @author : kunni@dtstack.com - * @date : 2020/5/26 - */ -public class MetadatatidbReader extends MetadataReader { - - public MetadatatidbReader(DataTransferConfig config, org.apache.flink.streaming.api.environment.StreamExecutionEnvironment env) { - super(config, env); - driverName = DRIVER_NAME; - } - - @Override - protected MetadataInputFormatBuilder getBuilder(){ - return new MetadataInputFormatBuilder(new MetadatatidbInputFormat()); - } -} diff --git a/flinkx-metadata-tidb/pom.xml b/flinkx-metadata-tidb/pom.xml deleted file mode 100644 index 764f388bd5..0000000000 --- a/flinkx-metadata-tidb/pom.xml +++ /dev/null @@ -1,27 +0,0 @@ - - - - flinkx-all - com.dtstack.flinkx - 1.6 - - 4.0.0 - - flinkx-metadata-tidb - pom - - flinkx-metadata-tidb-reader - - - - - com.dtstack.flinkx - flinkx-core - 1.6 - provided - - - - \ No newline at end of file diff --git a/flinkx-metadata-vertica/flinkx-metadata-vertica-reader/src/main/java/com/dtstack/flinkx/metadatavertica/constants/VerticaMetaDataCons.java b/flinkx-metadata-vertica/flinkx-metadata-vertica-reader/src/main/java/com/dtstack/flinkx/metadatavertica/constants/VerticaMetaDataCons.java deleted file mode 100644 index cb24c893f8..0000000000 --- a/flinkx-metadata-vertica/flinkx-metadata-vertica-reader/src/main/java/com/dtstack/flinkx/metadatavertica/constants/VerticaMetaDataCons.java +++ /dev/null @@ -1,39 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - *

- * http://www.apache.org/licenses/LICENSE-2.0 - *

- * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -package com.dtstack.flinkx.metadatavertica.constants; - -import com.dtstack.flinkx.metadata.MetaDataCons; - -/** - * 定义了一些常量属性 - * @author kunni@dtstack.com - */ -public class VerticaMetaDataCons extends MetaDataCons { - - public static final String DRIVER_NAME = "com.vertica.jdbc.Driver"; - - public static final String SQL_CREATE_TIME = " SELECT table_name, create_time FROM tables WHERE table_schema = '%s' "; - - public static final String SQL_COMMENT = " SELECT object_name, comment FROM comments WHERE object_schema = '%s' "; - - public static final String SQL_TOTAL_SIZE = " SELECT anchor_table_name, used_bytes FROM projection_storage WHERE anchor_table_schema = '%s' "; - - public static final String SQL_PT_COLUMN = " SELECT table_name, partition_expression FROM tables \n" + - "WHERE partition_expression <> '' AND table_schema = '%s' "; -} diff --git a/flinkx-metadata-vertica/flinkx-metadata-vertica-reader/src/main/java/com/dtstack/flinkx/metadatavertica/inputformat/MetadataverticaInputFormat.java b/flinkx-metadata-vertica/flinkx-metadata-vertica-reader/src/main/java/com/dtstack/flinkx/metadatavertica/inputformat/MetadataverticaInputFormat.java deleted file mode 100644 index 2064888dbc..0000000000 --- a/flinkx-metadata-vertica/flinkx-metadata-vertica-reader/src/main/java/com/dtstack/flinkx/metadatavertica/inputformat/MetadataverticaInputFormat.java +++ /dev/null @@ -1,239 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - *

- * http://www.apache.org/licenses/LICENSE-2.0 - *

- * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -package com.dtstack.flinkx.metadatavertica.inputformat; - -import com.dtstack.flinkx.constants.ConstantValue; -import com.dtstack.flinkx.metadata.inputformat.BaseMetadataInputFormat; -import com.dtstack.flinkx.util.ExceptionUtil; -import org.apache.commons.lang.StringUtils; - -import java.sql.ResultSet; -import java.sql.SQLException; -import java.util.Arrays; -import java.util.HashMap; -import java.util.LinkedList; -import java.util.List; -import java.util.Map; - -import static com.dtstack.flinkx.metadata.MetaDataCons.KEY_COLUMN; -import static com.dtstack.flinkx.metadata.MetaDataCons.KEY_COLUMN_COMMENT; -import static com.dtstack.flinkx.metadata.MetaDataCons.KEY_COLUMN_DATA_TYPE; -import static com.dtstack.flinkx.metadata.MetaDataCons.KEY_COLUMN_DEFAULT; -import static com.dtstack.flinkx.metadata.MetaDataCons.KEY_COLUMN_INDEX; -import static com.dtstack.flinkx.metadata.MetaDataCons.KEY_COLUMN_NAME; -import static com.dtstack.flinkx.metadata.MetaDataCons.KEY_COLUMN_NULL; -import static com.dtstack.flinkx.metadata.MetaDataCons.KEY_PARTITION_COLUMNS; -import static com.dtstack.flinkx.metadata.MetaDataCons.KEY_TABLE_COMMENT; -import static com.dtstack.flinkx.metadata.MetaDataCons.KEY_TABLE_CREATE_TIME; -import static com.dtstack.flinkx.metadata.MetaDataCons.KEY_TABLE_PROPERTIES; -import static com.dtstack.flinkx.metadata.MetaDataCons.KEY_TABLE_TOTAL_SIZE; -import static com.dtstack.flinkx.metadata.MetaDataCons.RESULT_SET_COLUMN_DEF; -import static com.dtstack.flinkx.metadata.MetaDataCons.RESULT_SET_COLUMN_NAME; -import static com.dtstack.flinkx.metadata.MetaDataCons.RESULT_SET_COLUMN_SIZE; -import static com.dtstack.flinkx.metadata.MetaDataCons.RESULT_SET_DECIMAL_DIGITS; -import static com.dtstack.flinkx.metadata.MetaDataCons.RESULT_SET_IS_NULLABLE; -import static com.dtstack.flinkx.metadata.MetaDataCons.RESULT_SET_ORDINAL_POSITION; -import static com.dtstack.flinkx.metadata.MetaDataCons.RESULT_SET_REMARKS; -import static com.dtstack.flinkx.metadata.MetaDataCons.RESULT_SET_TABLE_NAME; -import static com.dtstack.flinkx.metadata.MetaDataCons.RESULT_SET_TYPE_NAME; -import static com.dtstack.flinkx.metadatavertica.constants.VerticaMetaDataCons.SQL_COMMENT; -import static com.dtstack.flinkx.metadatavertica.constants.VerticaMetaDataCons.SQL_CREATE_TIME; -import static com.dtstack.flinkx.metadatavertica.constants.VerticaMetaDataCons.SQL_PT_COLUMN; -import static com.dtstack.flinkx.metadatavertica.constants.VerticaMetaDataCons.SQL_TOTAL_SIZE; - -/** 读取vertica的元数据 - * @author kunni@dtstack.com - */ -public class MetadataverticaInputFormat extends BaseMetadataInputFormat { - - protected Map createTimeMap; - - protected Map commentMap; - - protected Map totalSizeMap; - - protected Map ptColumnMap; - - List> ptColumns = new LinkedList<>(); - - /** - * 采用数组是为了构建Varchar(10)、Decimal(10,2)这种格式 - */ - private static final List SINGLE_DIGITAL_TYPE = Arrays.asList("Integer", "Varchar", "Char", "Numeric"); - - private static final List DOUBLE_DIGITAL_TYPE = Arrays.asList("Timestamp", "Decimal"); - - @Override - protected List showTables() { - List tables = new LinkedList<>(); - try(ResultSet resultSet = connection.get().getMetaData().getTables(null, currentDb.get(), null, null)){ - 
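// DatabaseMetaData.getTables lists the tables of the schema set by switchDatabase; TABLE_NAME carries each name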
while (resultSet.next()){ - tables.add(resultSet.getString(RESULT_SET_TABLE_NAME)); - } - }catch (SQLException e){ - LOG.error("query table lists failed, {}", ExceptionUtil.getErrorMessage(e)); - } - return tables; - } - - @Override - protected void switchDatabase(String databaseName) { - currentDb.set(databaseName); - } - - @Override - protected Map queryMetaData(String tableName) { - Map result = new HashMap<>(16); - result.put(KEY_TABLE_PROPERTIES, queryTableProp(tableName)); - result.put(KEY_COLUMN, queryColumn(tableName)); - result.put(KEY_PARTITION_COLUMNS, ptColumns); - return result; - } - - @Override - protected String quote(String name) { - return name; - } - - @Override - protected void init() { - queryCreateTime(); - queryComment(); - queryTotalSizeMap(); - queryPtColumnMap(); - } - - /** - * 获取列级别的元数据信息 - * @param tableName 表名 - * @return 列的元数据 - */ - public List> queryColumn(String tableName) { - List> columns = new LinkedList<>(); - try(ResultSet resultSet = connection.get().getMetaData().getColumns(null, currentDb.get(), tableName, null)){ - while (resultSet.next()){ - Map map = new HashMap<>(16); - String columnName = resultSet.getString(RESULT_SET_COLUMN_NAME); - map.put(KEY_COLUMN_NAME, columnName); - String dataSize = resultSet.getString(RESULT_SET_COLUMN_SIZE); - String digits = resultSet.getString(RESULT_SET_DECIMAL_DIGITS); - String type = resultSet.getString(RESULT_SET_TYPE_NAME); - if(SINGLE_DIGITAL_TYPE.contains(type)){ - type += ConstantValue.LEFT_PARENTHESIS_SYMBOL + dataSize + ConstantValue.RIGHT_PARENTHESIS_SYMBOL; - }else if(DOUBLE_DIGITAL_TYPE.contains(type)){ - type += ConstantValue.LEFT_PARENTHESIS_SYMBOL + dataSize + ConstantValue.COMMA_SYMBOL + digits + ConstantValue.RIGHT_PARENTHESIS_SYMBOL; - } - map.put(KEY_COLUMN_DATA_TYPE, type); - map.put(KEY_COLUMN_COMMENT, resultSet.getString(RESULT_SET_REMARKS)); - map.put(KEY_COLUMN_INDEX, resultSet.getString(RESULT_SET_ORDINAL_POSITION)); - map.put(KEY_COLUMN_NULL, resultSet.getString(RESULT_SET_IS_NULLABLE)); - map.put(KEY_COLUMN_DEFAULT, resultSet.getString(RESULT_SET_COLUMN_DEF)); - // 分区列信息,vertical partition express 中字段自动增加表名 - String expressColumn = tableName + ConstantValue.POINT_SYMBOL + columnName; - String partitionExpression = ptColumnMap.get(tableName); - if (StringUtils.isNotBlank(partitionExpression) && partitionExpression.contains(expressColumn)) { - ptColumns.add(map); - }else{ - columns.add(map); - } - } - } catch (SQLException e){ - LOG.error("query columns failed, {}", ExceptionUtil.getErrorMessage(e) ); - } - return columns; - } - - /** - * 获取表级别的元数据信息 - * @param tableName 表名 - * @return 表的元数据 - */ - public Map queryTableProp(String tableName) { - Map tableProperties = new HashMap<>(16); - tableProperties.put(KEY_TABLE_CREATE_TIME, createTimeMap.get(tableName)); - tableProperties.put(KEY_TABLE_COMMENT, commentMap.get(tableName)); - // 单位 byte - String totalSize = totalSizeMap.get(tableName); - totalSize = totalSize == null ? 
"0" : totalSize; - tableProperties.put(KEY_TABLE_TOTAL_SIZE, totalSize); - return tableProperties; - } - - /** - * 获取创建时间 - */ - public void queryCreateTime() { - createTimeMap = new HashMap<>(16); - String sql = String.format(SQL_CREATE_TIME, currentDb.get()); - try(ResultSet resultSet = executeQuery0(sql, statement.get())){ - while (resultSet.next()){ - createTimeMap.put(resultSet.getString(1), resultSet.getString(2)); - } - }catch (SQLException e){ - LOG.error("query create time failed, {}", ExceptionUtil.getErrorMessage(e)); - } - } - - /** - * 获取表注释 - */ - public void queryComment() { - commentMap = new HashMap<>(16); - String sql = String.format(SQL_COMMENT, currentDb.get()); - try(ResultSet resultSet = executeQuery0(sql, statement.get())){ - while (resultSet.next()){ - commentMap.put(resultSet.getString(1), resultSet.getString(2)); - } - }catch (SQLException e){ - LOG.error("query comment failed, {}", ExceptionUtil.getErrorMessage(e)); - } - } - - /** - * 获取表的总大小 - */ - public void queryTotalSizeMap() { - totalSizeMap = new HashMap<>(16); - String sql = String.format(SQL_TOTAL_SIZE, currentDb.get()); - try(ResultSet resultSet = executeQuery0(sql, statement.get())){ - while (resultSet.next()){ - totalSizeMap.put(resultSet.getString(1), resultSet.getString(2)); - } - }catch (SQLException e){ - LOG.error("query totalSize failed, {}", ExceptionUtil.getErrorMessage(e)); - } - } - - /** - * 获取分区列信息 - */ - public void queryPtColumnMap() { - ptColumnMap = new HashMap<>(16); - String sql = String.format(SQL_PT_COLUMN, currentDb.get()); - try(ResultSet resultSet = executeQuery0(sql, statement.get())){ - while (resultSet.next()){ - String expression = resultSet.getString(2); - ptColumnMap.put(resultSet.getString(1), expression); - } - }catch (SQLException e){ - LOG.error("query partition columns failed, {}", ExceptionUtil.getErrorMessage(e)); - } - } - -} diff --git a/flinkx-metadata-vertica/flinkx-metadata-vertica-reader/src/main/java/com/dtstack/flinkx/metadatavertica/reader/MetadataverticaReader.java b/flinkx-metadata-vertica/flinkx-metadata-vertica-reader/src/main/java/com/dtstack/flinkx/metadatavertica/reader/MetadataverticaReader.java deleted file mode 100644 index d70f8d94a0..0000000000 --- a/flinkx-metadata-vertica/flinkx-metadata-vertica-reader/src/main/java/com/dtstack/flinkx/metadatavertica/reader/MetadataverticaReader.java +++ /dev/null @@ -1,43 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - *

- * http://www.apache.org/licenses/LICENSE-2.0 - *

- * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -package com.dtstack.flinkx.metadatavertica.reader; - -import com.dtstack.flinkx.config.DataTransferConfig; -import com.dtstack.flinkx.metadata.inputformat.MetadataInputFormatBuilder; -import com.dtstack.flinkx.metadata.reader.MetadataReader; -import com.dtstack.flinkx.metadatavertica.inputformat.MetadataverticaInputFormat; -import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; - -import static com.dtstack.flinkx.metadatavertica.constants.VerticaMetaDataCons.DRIVER_NAME; - -/** - * 读取配置参数 - * @author kunni@dtstack.com - */ -public class MetadataverticaReader extends MetadataReader { - public MetadataverticaReader(DataTransferConfig config, StreamExecutionEnvironment env) { - super(config, env); - driverName = DRIVER_NAME; - } - - @Override - protected MetadataInputFormatBuilder getBuilder(){ - return new MetadataInputFormatBuilder(new MetadataverticaInputFormat()); - } -} diff --git a/flinkx-metadata-vertica/flinkx-metadata-vertica-reader/src/test/java/com/dtstack/flinkx/metadatavertica/inputformat/MetadataverticaInputFormatTest.java b/flinkx-metadata-vertica/flinkx-metadata-vertica-reader/src/test/java/com/dtstack/flinkx/metadatavertica/inputformat/MetadataverticaInputFormatTest.java deleted file mode 100644 index e19bf92ab8..0000000000 --- a/flinkx-metadata-vertica/flinkx-metadata-vertica-reader/src/test/java/com/dtstack/flinkx/metadatavertica/inputformat/MetadataverticaInputFormatTest.java +++ /dev/null @@ -1,66 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. 
- */ - -package com.dtstack.flinkx.metadatavertica.inputformat; - -import com.dtstack.flinkx.metadata.inputformat.BaseMetadataInputFormat; -import org.junit.Assert; -import org.junit.Before; -import org.junit.Test; -import org.mockito.Mockito; - -import java.lang.reflect.Field; -import java.sql.Connection; -import java.sql.DatabaseMetaData; -import java.sql.ResultSet; -import java.sql.SQLException; - -public class MetadataverticaInputFormatTest { - - - private MetadataverticaInputFormat inputFormat; - - @Before - public void beforeMethod() throws NoSuchFieldException, IllegalAccessException, SQLException { - inputFormat = new MetadataverticaInputFormat(); - inputFormat.switchDatabase("testDb"); - ThreadLocal connectionTL = Mockito.mock(ThreadLocal.class); - Connection connection = Mockito.mock(Connection.class); - DatabaseMetaData metaData = Mockito.mock(DatabaseMetaData.class); - Mockito.when(connection.getMetaData()).thenReturn(metaData); - ResultSet resultSet = Mockito.mock(ResultSet.class); - Mockito.when(resultSet.next()).thenReturn(true).thenReturn(false); - Mockito.when(resultSet.getString(Mockito.anyString())).thenReturn("test"); - Mockito.when(metaData.getTables(null, "testDb", null, null)).thenReturn(resultSet); - Mockito.when(connectionTL.get()).thenReturn(connection); - Field connectionField = BaseMetadataInputFormat.class.getDeclaredField("connection"); - connectionField.setAccessible(true); - connectionField.set(inputFormat, connectionTL); - } - - - @Test - public void testShowTable() { - Assert.assertEquals(inputFormat.showTables().size(), 1); - } - - @Test - public void testQuote() { - Assert.assertEquals(inputFormat.quote("test"), "test"); - } -} diff --git a/flinkx-metadata/flinkx-metadata-core/src/main/java/com/dtstack/flinkx/metadata/MetaDataCons.java b/flinkx-metadata/flinkx-metadata-core/src/main/java/com/dtstack/flinkx/metadata/MetaDataCons.java deleted file mode 100644 index 59391a78f5..0000000000 --- a/flinkx-metadata/flinkx-metadata-core/src/main/java/com/dtstack/flinkx/metadata/MetaDataCons.java +++ /dev/null @@ -1,113 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - *

- * http://www.apache.org/licenses/LICENSE-2.0 - *

- * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ -package com.dtstack.flinkx.metadata; - -/** - * @author : tiezhu@dtstack.com - * @date : 2020/3/8 - * @description : 元数据同步涉及的相关参数 - * todo 需要定义的常量较多,找到好的命名方法 - */ -public class MetaDataCons { - - - public static final String KEY_USER = "user"; - - /** - * reader 需要的参数 - */ - public static final String KEY_CONN_USERNAME = "username"; - public static final String KEY_CONN_PASSWORD = "password"; - public static final String KEY_JDBC_URL = "jdbcUrl"; - - public static final String KEY_DB_LIST = "dbList"; - public static final String KEY_DB_NAME = "dbName"; - public static final String KEY_TABLE_LIST = "tableList"; - - /** - * 表技术属性 - */ - public static final String KEY_TABLE_COMMENT = "comment"; - public static final String KEY_TABLE_TOTAL_SIZE = "totalSize"; - public static final String KEY_TABLE_CREATE_TIME = "createTime"; - public static final String KEY_TABLE_ROWS = "rows"; - public static final String KEY_QUERY_SUCCESS = "querySuccess"; - public static final String KEY_ERROR_MSG = "errorMsg"; - public static final String KEY_SCHEMA = "schema"; - public static final String KEY_OPERA_TYPE = "operaType"; - public static final String KEY_TABLE = "table"; - public static final String KEY_TABLE_PROPERTIES = "tableProperties"; - - /** - * 结果集中获取表信息 - */ - public static final String RESULT_SET_TABLE_NAME = "TABLE_NAME"; - - /** - * 列的技术属性 - */ - public static final String KEY_COLUMN_NAME = "column_name"; - public static final String KEY_COLUMN_INDEX = "index"; - public static final String KEY_COLUMN_COMMENT = "column_comment"; - public static final String KEY_COLUMN_TYPE = "data_type"; - public static final String KEY_COLUMN_DEFAULT = "column_default"; - public static final String KEY_COLUMN_NULL = "is_nullable"; - public static final String KEY_COLUMN_PRIMARY = "column_key"; - public static final String KEY_COLUMN_SCALE = "data_length"; - public static final String KEY_COLUMN_DATA_TYPE = "data_type"; - public static final String KEY_TRUE = "Y"; - public static final String KEY_FALSE = "N"; - - /** - * 结果集中获取列信息 - */ - public static final String RESULT_SET_COLUMN_NAME = "COLUMN_NAME"; - public static final String RESULT_SET_TYPE_NAME = "TYPE_NAME"; - public static final String RESULT_SET_COLUMN_SIZE = "COLUMN_SIZE"; - public static final String RESULT_SET_DECIMAL_DIGITS = "DECIMAL_DIGITS"; - public static final String RESULT_SET_ORDINAL_POSITION = "ORDINAL_POSITION"; - public static final String RESULT_SET_IS_NULLABLE = "IS_NULLABLE"; - public static final String RESULT_SET_REMARKS = "REMARKS"; - public static final String RESULT_SET_COLUMN_DEF = "COLUMN_DEF"; - - - - /** - * - */ - public static final String KEY_COLUMN = "column"; - public static final String KEY_STORED_TYPE = "storedType"; - public static final String KEY_PARTITION_COLUMNS = "partitionColumn"; - public static final String KEY_PARTITIONS = "partitions"; - - /** - * 索引属性 - */ - public static final String KEY_INDEX_NAME = "name"; - public static final String KEY_INDEX_COMMENT = "comment"; - - - public static final String KEY_COL_NAME = "col_name"; - - public static final String DEFAULT_OPERA_TYPE = "createTable"; - - public static final String SQL_SHOW_TABLES = "SHOW TABLES"; - public static final String 
SQL_SWITCH_DATABASE = "USE %s"; - - public static final int MAX_TABLE_SIZE = 20; -} \ No newline at end of file diff --git a/flinkx-metadata/flinkx-metadata-core/src/main/java/com/dtstack/flinkx/metadata/util/ConnUtil.java b/flinkx-metadata/flinkx-metadata-core/src/main/java/com/dtstack/flinkx/metadata/util/ConnUtil.java deleted file mode 100644 index b7a03c1c3f..0000000000 --- a/flinkx-metadata/flinkx-metadata-core/src/main/java/com/dtstack/flinkx/metadata/util/ConnUtil.java +++ /dev/null @@ -1,91 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - *

- * http://www.apache.org/licenses/LICENSE-2.0 - *

- * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ -package com.dtstack.flinkx.metadata.util; - -import com.dtstack.flinkx.util.ClassUtil; -import com.dtstack.flinkx.util.SysUtil; -import com.dtstack.flinkx.util.TelnetUtil; - -import java.sql.Connection; -import java.sql.DriverManager; -import java.sql.SQLException; -import java.sql.Statement; - -/** - * @author : tiezhu - * @date : 2020/3/8 - */ -public class ConnUtil { - - /** - * 数据库连接的最大重试次数 - */ - private static int MAX_RETRY_TIMES = 3; - - /** - * 获取jdbc连接(超时10S) - * @param url url - * @param username 账号 - * @param password 密码 - * @return - * @throws SQLException - */ - private static Connection getConnectionInternal(String url, String username, String password) throws SQLException { - Connection dbConn; - synchronized (ClassUtil.LOCK_STR){ - DriverManager.setLoginTimeout(10); - // telnet - TelnetUtil.telnet(url); - - if (username == null) { - dbConn = DriverManager.getConnection(url); - } else { - dbConn = DriverManager.getConnection(url, username, password); - } - } - - return dbConn; - } - - /** - * 获取jdbc连接(重试3次) - * @param url url - * @param username 账号 - * @param password 密码 - * @return - * @throws SQLException - */ - public static Connection getConnection(String url, String username, String password) throws SQLException { - boolean failed = true; - Connection dbConn = null; - for (int i = 0; i < MAX_RETRY_TIMES && failed; ++i) { - try { - dbConn = getConnectionInternal(url, username, password); - failed = false; - } catch (Exception e) { - if (i == MAX_RETRY_TIMES - 1) { - throw e; - } else { - SysUtil.sleep(3000); - } - } - } - - return dbConn; - } -} diff --git a/flinkx-metadata/flinkx-metadata-reader/pom.xml b/flinkx-metadata/flinkx-metadata-reader/pom.xml deleted file mode 100644 index f06baff291..0000000000 --- a/flinkx-metadata/flinkx-metadata-reader/pom.xml +++ /dev/null @@ -1,22 +0,0 @@ - - - - flinkx-metadata - com.dtstack.flinkx - 1.6 - - jar - 4.0.0 - - flinkx-metadata-reader - - - - com.dtstack.flinkx - flinkx-metadata-core - 1.6 - - - \ No newline at end of file diff --git a/flinkx-metadata/flinkx-metadata-reader/src/main/java/com/dtstack/flinkx/metadata/inputformat/BaseMetadataInputFormat.java b/flinkx-metadata/flinkx-metadata-reader/src/main/java/com/dtstack/flinkx/metadata/inputformat/BaseMetadataInputFormat.java deleted file mode 100644 index f134c8726c..0000000000 --- a/flinkx-metadata/flinkx-metadata-reader/src/main/java/com/dtstack/flinkx/metadata/inputformat/BaseMetadataInputFormat.java +++ /dev/null @@ -1,253 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
- * See the License for the specific language governing permissions and - * limitations under the License. - */ -package com.dtstack.flinkx.metadata.inputformat; - -import com.dtstack.flinkx.inputformat.BaseRichInputFormat; -import com.dtstack.flinkx.metadata.MetaDataCons; -import com.dtstack.flinkx.metadata.util.ConnUtil; -import com.dtstack.flinkx.util.ExceptionUtil; -import org.apache.commons.collections.CollectionUtils; -import org.apache.commons.collections.MapUtils; -import org.apache.commons.lang.StringUtils; -import org.apache.flink.core.io.InputSplit; -import org.apache.flink.types.Row; - -import java.io.IOException; -import java.sql.Connection; -import java.sql.ResultSet; -import java.sql.SQLException; -import java.sql.Statement; -import java.util.HashMap; -import java.util.Iterator; -import java.util.LinkedList; -import java.util.List; -import java.util.Map; - -/** - * @author : tiezhu - * @date : 2020/3/20 - */ -public abstract class BaseMetadataInputFormat extends BaseRichInputFormat { - - private static final long serialVersionUID = 1L; - - protected String dbUrl; - - protected String username; - - protected String password; - - protected String driverName; - - protected boolean queryTable; - - /** - * 记录任务参数传入的database和表名 - */ - protected List> dbTableList; - - /** - * 存放所有需要查询的表的名字 - */ - protected List tableList; - - /** - * 记录当前查询的表所在list中的位置 - */ - protected int start; - - protected static transient ThreadLocal connection = new ThreadLocal<>(); - - protected static transient ThreadLocal statement = new ThreadLocal<>(); - - protected static transient ThreadLocal currentDb = new ThreadLocal<>(); - - protected static transient ThreadLocal> tableIterator = new ThreadLocal<>(); - - @Override - protected void openInternal(InputSplit inputSplit) throws IOException { - try { - if(connection.get() == null){ - connection.set(getConnection()); - } - statement.set(connection.get().createStatement()); - currentDb.set(((MetadataInputSplit) inputSplit).getDbName()); - switchDatabase(currentDb.get()); - tableList = ((MetadataInputSplit) inputSplit).getTableList(); - if (CollectionUtils.isEmpty(tableList)) { - tableList = showTables(); - queryTable = true; - } - LOG.info("current database = {}, tableSize = {}, tableList = {}",currentDb.get(), tableList.size(), tableList); - tableIterator.set(tableList.iterator()); - start = 0; - init(); - } catch (ClassNotFoundException e) { - LOG.error("could not find suitable driver, e={}", ExceptionUtil.getErrorMessage(e)); - throw new IOException(e); - } catch (SQLException e){ - LOG.error("获取table列表异常, dbUrl = {}, username = {}, inputSplit = {}, e = {}", dbUrl, username, inputSplit, ExceptionUtil.getErrorMessage(e)); - tableList = new LinkedList<>(); - } - LOG.info("curentDb = {}, tableList = {}", currentDb.get(), tableList); - tableIterator.set(tableList.iterator()); - } - - /** - * 按照database进行划分,可能与channel数不同 - * @param splitNumber 最小分片数 - * @return 分片 - */ - @Override - @SuppressWarnings("unchecked") - protected InputSplit[] createInputSplitsInternal(int splitNumber) { - InputSplit[] inputSplits = new MetadataInputSplit[dbTableList.size()]; - for (int index = 0; index < dbTableList.size(); index++) { - Map dbTables = dbTableList.get(index); - String dbName = MapUtils.getString(dbTables, MetaDataCons.KEY_DB_NAME); - if(StringUtils.isNotEmpty(dbName)){ - List tables = (List)dbTables.get(MetaDataCons.KEY_TABLE_LIST); - inputSplits[index] = new MetadataInputSplit(splitNumber, dbName, tables); - } - } - return inputSplits; - } - - @Override - 
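// builds one Row per table; query failures are recorded in the Row (querySuccess/errorMsg) instead of failing the task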
-    protected Row nextRecordInternal(Row row) throws IOException {
-        Map<String, Object> metaData = new HashMap<>(16);
-        metaData.put(MetaDataCons.KEY_OPERA_TYPE, MetaDataCons.DEFAULT_OPERA_TYPE);
-
-        String tableName = (String) tableIterator.get().next();
-        metaData.put(MetaDataCons.KEY_SCHEMA, currentDb.get());
-        metaData.put(MetaDataCons.KEY_TABLE, tableName);
-        try {
-            metaData.putAll(queryMetaData(tableName));
-            metaData.put(MetaDataCons.KEY_QUERY_SUCCESS, true);
-        } catch (Exception e) {
-            metaData.put(MetaDataCons.KEY_QUERY_SUCCESS, false);
-            metaData.put(MetaDataCons.KEY_ERROR_MSG, ExceptionUtil.getErrorMessage(e));
-            LOG.error(ExceptionUtil.getErrorMessage(e));
-        }
-        start++;
-        return Row.of(metaData);
-    }
-
-    @Override
-    public boolean reachedEnd() {
-        return !tableIterator.get().hasNext();
-    }
-
-    @Override
-    protected void closeInternal() throws IOException {
-        tableIterator.remove();
-        Statement st = statement.get();
-        if (null != st) {
-            try {
-                st.close();
-                statement.remove();
-            } catch (SQLException e) {
-                LOG.error("close statement failed, e = {}", ExceptionUtil.getErrorMessage(e));
-                throw new IOException("close statement failed", e);
-            }
-        }
-
-        currentDb.remove();
-        Connection conn = connection.get();
-        if (null != conn) {
-            try {
-                conn.close();
-                connection.remove();
-            } catch (SQLException e) {
-                LOG.error("close database connection failed, e = {}", ExceptionUtil.getErrorMessage(e));
-                throw new IOException("close database connection failed", e);
-            }
-        }
-    }
-
-    @Override
-    public void closeInputFormat() throws IOException {
-        super.closeInputFormat();
-    }
-
-    /**
-     * create the database connection
-     */
-    public Connection getConnection() throws SQLException, ClassNotFoundException {
-        Class.forName(driverName);
-        return ConnUtil.getConnection(dbUrl, username, password);
-    }
-
-    /**
-     * query all tables of the current database
-     *
-     * @return list of table names
-     * @throws SQLException on query failure
-     */
-    protected abstract List<String> showTables() throws SQLException;
-
-    /**
-     * switch the current database
-     *
-     * @param databaseName database name
-     * @throws SQLException on failure
-     */
-    protected abstract void switchDatabase(String databaseName) throws SQLException;
-
-    /**
-     * query the metadata of the given table
-     * @param tableName table name
-     * @return metadata map
-     * @throws SQLException on query failure
-     */
-    protected abstract Map<String, Object> queryMetaData(String tableName) throws SQLException;
-
-    /**
-     * quote a database, table or column name, e.g. testTable -> `testTable`
-     * @param name raw name
-     * @return the quoted name
-     */
-    protected abstract String quote(String name);
-
-    /**
-     * hook that lets subclasses initialize fields of their own
-     */
-    protected void init() throws SQLException {}
-
-    /**
-     * wrapper around statement.executeQuery that logs the SQL and handles exceptions; returns null when execution fails
-     * @param sql the SQL to run
-     * @param statement the statement to run it on
-     */
-    protected ResultSet executeQuery0(String sql, Statement statement) {
-        ResultSet resultSet = null;
-        if (StringUtils.isNotBlank(sql)) {
-            LOG.info("execute SQL : {}", sql);
-            try {
-                if (statement != null) {
-                    resultSet = statement.executeQuery(sql);
-                }
-            } catch (SQLException e) {
-                LOG.error("execute SQL failed : {}", ExceptionUtil.getErrorMessage(e));
-            }
-        }
-        return resultSet;
-    }
-
-}
diff --git a/flinkx-metadata/flinkx-metadata-reader/src/main/java/com/dtstack/flinkx/metadata/inputformat/MetadataInputFormatBuilder.java b/flinkx-metadata/flinkx-metadata-reader/src/main/java/com/dtstack/flinkx/metadata/inputformat/MetadataInputFormatBuilder.java
deleted file mode 100644
index 3b9077262c..0000000000
--- a/flinkx-metadata/flinkx-metadata-reader/src/main/java/com/dtstack/flinkx/metadata/inputformat/MetadataInputFormatBuilder.java
+++ /dev/null
@@ -1,65 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - *

- * http://www.apache.org/licenses/LICENSE-2.0 - *

- * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ -package com.dtstack.flinkx.metadata.inputformat; - -import com.dtstack.flinkx.inputformat.BaseRichInputFormatBuilder; - -import java.util.List; -import java.util.Map; - -/** - * @author : tiezhu - * @date : 2020/3/8 - */ -public class MetadataInputFormatBuilder extends BaseRichInputFormatBuilder { - private BaseMetadataInputFormat format; - - public MetadataInputFormatBuilder(BaseMetadataInputFormat format) { - super.format = this.format = format; - } - - public void setDbUrl(String dbUrl) { - format.dbUrl = dbUrl; - } - - public void setUsername(String username) { - format.username = username; - } - - public void setPassword(String password) { - format.password = password; - } - - public void setDriverName(String driverName) { - format.driverName = driverName; - } - - public void setDbList(List> dbTableList){ - format.dbTableList = dbTableList; - } - - @Override - protected void checkFormat() { - if (format.password == null || format.username == null) { - throw new IllegalArgumentException("请检查用户密码是否填写"); - } - if (format.dbUrl == null) { - throw new IllegalArgumentException("请检查url是否填写"); - } - } -} diff --git a/flinkx-metadata/flinkx-metadata-reader/src/main/java/com/dtstack/flinkx/metadata/inputformat/MetadataInputSplit.java b/flinkx-metadata/flinkx-metadata-reader/src/main/java/com/dtstack/flinkx/metadata/inputformat/MetadataInputSplit.java deleted file mode 100644 index 658cf1252c..0000000000 --- a/flinkx-metadata/flinkx-metadata-reader/src/main/java/com/dtstack/flinkx/metadata/inputformat/MetadataInputSplit.java +++ /dev/null @@ -1,67 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. 
- */ -package com.dtstack.flinkx.metadata.inputformat; - -import com.google.gson.GsonBuilder; -import org.apache.flink.core.io.InputSplit; - -import java.util.List; - -/** - * @author : tiezhu - * @date : 2020/3/20 - */ -public class MetadataInputSplit implements InputSplit { - - private static final long serialVersionUID = -4483633039887822171L; - - private int splitNumber; - - private String dbName; - - private List tableList; - - public MetadataInputSplit(int splitNumber, String dbName, List tableList) { - this.splitNumber = splitNumber; - this.dbName = dbName; - this.tableList = tableList; - } - - public String getDbName() { - return dbName; - } - - public List getTableList() { - return tableList; - } - - @Override - public String toString() { - return "MetadataInputSplit{" + - "splitNumber=" + splitNumber + - ", dbName='" + dbName + '\'' + - ", tableList=" + new GsonBuilder().serializeNulls().create().toJson(tableList) + - '}'; - } - - @Override - public int getSplitNumber() { - return splitNumber; - } -} - diff --git a/flinkx-metadata/flinkx-metadata-reader/src/main/java/com/dtstack/flinkx/metadata/reader/MetadataReader.java b/flinkx-metadata/flinkx-metadata-reader/src/main/java/com/dtstack/flinkx/metadata/reader/MetadataReader.java deleted file mode 100644 index 2e8a95e843..0000000000 --- a/flinkx-metadata/flinkx-metadata-reader/src/main/java/com/dtstack/flinkx/metadata/reader/MetadataReader.java +++ /dev/null @@ -1,75 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - *

- * http://www.apache.org/licenses/LICENSE-2.0 - *

- * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ -package com.dtstack.flinkx.metadata.reader; - -import com.dtstack.flinkx.config.DataTransferConfig; -import com.dtstack.flinkx.config.ReaderConfig; -import com.dtstack.flinkx.inputformat.BaseRichInputFormat; -import com.dtstack.flinkx.metadata.inputformat.MetadataInputFormatBuilder; -import com.dtstack.flinkx.reader.BaseDataReader; -import org.apache.flink.streaming.api.datastream.DataStream; -import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; -import org.apache.flink.types.Row; -import com.dtstack.flinkx.metadata.MetaDataCons; - -import java.util.List; -import java.util.Map; - -/** - * @author : tiezhu - * @date : 2020/3/8 - */ -public class MetadataReader extends BaseDataReader { - protected String jdbcUrl; - protected List> dbList; - protected String username; - protected String password; - protected String driverName; - - @SuppressWarnings("unchecked") - protected MetadataReader(DataTransferConfig config, StreamExecutionEnvironment env) { - super(config, env); - - ReaderConfig readerConfig = config.getJob().getContent().get(0).getReader(); - - jdbcUrl = readerConfig.getParameter().getStringVal(MetaDataCons.KEY_JDBC_URL); - username = readerConfig.getParameter().getStringVal(MetaDataCons.KEY_CONN_USERNAME); - password = readerConfig.getParameter().getStringVal(MetaDataCons.KEY_CONN_PASSWORD); - dbList = (List>) readerConfig.getParameter().getVal(MetaDataCons.KEY_DB_LIST); - - } - - @Override - public DataStream readData() { - MetadataInputFormatBuilder builder = getBuilder(); - builder.setDataTransferConfig(dataTransferConfig); - builder.setDbUrl(jdbcUrl); - builder.setPassword(password); - builder.setUsername(username); - builder.setDriverName(driverName); - builder.setDbList(dbList); - - BaseRichInputFormat format = builder.finish(); - - return createInput(format); - } - - protected MetadataInputFormatBuilder getBuilder(){ - throw new RuntimeException("子类必须覆盖getBuilder方法"); - } -} diff --git a/flinkx-mongodb/flinkx-mongodb-core/src/main/java/com/dtstack/flinkx/mongodb/MongodbConfig.java b/flinkx-mongodb/flinkx-mongodb-core/src/main/java/com/dtstack/flinkx/mongodb/MongodbConfig.java index b435493a22..b9976161d1 100644 --- a/flinkx-mongodb/flinkx-mongodb-core/src/main/java/com/dtstack/flinkx/mongodb/MongodbConfig.java +++ b/flinkx-mongodb/flinkx-mongodb-core/src/main/java/com/dtstack/flinkx/mongodb/MongodbConfig.java @@ -56,6 +56,10 @@ public class MongodbConfig implements Serializable { private List monitorCollections; + private List operateType; + + private boolean pavingData; + private String clusterMode; private int startLocation; @@ -263,14 +267,29 @@ public void setMongodbConfig(ConnectionConfig mongodbConfig) { this.mongodbConfig = mongodbConfig; } + public List getOperateType() { + return operateType; + } + + public void setOperateType(List operateType) { + this.operateType = operateType; + } + + public boolean getPavingData() { + return pavingData; + } + + public void setPavingData(boolean pavingData) { + this.pavingData = pavingData; + } + @Override public String toString() { - // TODO 密码脱敏 return "MongodbConfig{" + "hostPorts='" + hostPorts + '\'' + ", url='" + url + '\'' + ", username='" + username + '\'' + - ", 
password='" + "******" + '\'' + + ", password='******" + '\'' + ", authenticationMechanism='" + authenticationMechanism + '\'' + ", database='" + database + '\'' + ", collectionName='" + collectionName + '\'' + @@ -278,6 +297,13 @@ public String toString() { ", fetchSize=" + fetchSize + ", writeMode='" + writeMode + '\'' + ", replaceKey='" + replaceKey + '\'' + + ", monitorDatabases=" + monitorDatabases + + ", monitorCollections=" + monitorCollections + + ", operateType=" + operateType + + ", pavingData=" + pavingData + + ", clusterMode='" + clusterMode + '\'' + + ", startLocation=" + startLocation + + ", excludeDocId=" + excludeDocId + ", mongodbConfig=" + mongodbConfig + '}'; } diff --git a/flinkx-mongodb/flinkx-mongodb-reader/pom.xml b/flinkx-mongodb/flinkx-mongodb-reader/pom.xml index 27aea029af..9e532bf9ad 100644 --- a/flinkx-mongodb/flinkx-mongodb-reader/pom.xml +++ b/flinkx-mongodb/flinkx-mongodb-reader/pom.xml @@ -89,7 +89,7 @@ + tofile="${basedir}/../../syncplugins/mongodbreader/${project.name}-${package.name}.jar" /> diff --git a/flinkx-mongodb/flinkx-mongodb-writer/pom.xml b/flinkx-mongodb/flinkx-mongodb-writer/pom.xml index 41287f5adc..83b9d3003a 100644 --- a/flinkx-mongodb/flinkx-mongodb-writer/pom.xml +++ b/flinkx-mongodb/flinkx-mongodb-writer/pom.xml @@ -89,7 +89,7 @@ + tofile="${basedir}/../../syncplugins/mongodbwriter/${project.name}-${package.name}.jar" /> diff --git a/flinkx-websocket/flinkx-websocket-reader/pom.xml b/flinkx-mongodb/flinkx-mongodboplog-reader/pom.xml similarity index 78% rename from flinkx-websocket/flinkx-websocket-reader/pom.xml rename to flinkx-mongodb/flinkx-mongodboplog-reader/pom.xml index 8af1e260b7..6ff375839c 100644 --- a/flinkx-websocket/flinkx-websocket-reader/pom.xml +++ b/flinkx-mongodb/flinkx-mongodboplog-reader/pom.xml @@ -3,19 +3,18 @@ xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"> - flinkx-websocket + flinkx-mongodb com.dtstack.flinkx 1.6 4.0.0 - flinkx-websocket-reader - + flinkx-mongodboplog-reader + flinkx-mongodb-core com.dtstack.flinkx - flinkx-websocket-core 1.6 @@ -34,13 +33,6 @@ false - - - org.slf4j:slf4j-api - log4j:log4j - ch.qos.logback:* - - *:* @@ -52,10 +44,6 @@ - - io.netty - shade.websocketreader.io.netty - com.google.common shade.core.com.google.common @@ -83,13 +71,13 @@ - + - + - + @@ -97,5 +85,4 @@ - \ No newline at end of file diff --git a/flinkx-mongodb/flinkx-mongodboplog-reader/src/main/java/com/dtstack/flinkx/mongodboplog/format/MongodbEventHandler.java b/flinkx-mongodb/flinkx-mongodboplog-reader/src/main/java/com/dtstack/flinkx/mongodboplog/format/MongodbEventHandler.java new file mode 100644 index 0000000000..7a0c6c694b --- /dev/null +++ b/flinkx-mongodb/flinkx-mongodboplog-reader/src/main/java/com/dtstack/flinkx/mongodboplog/format/MongodbEventHandler.java @@ -0,0 +1,96 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. 
You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+
+package com.dtstack.flinkx.mongodboplog.format;
+
+import com.dtstack.flinkx.util.SnowflakeIdWorker;
+import org.apache.flink.types.Row;
+import org.bson.BsonTimestamp;
+import org.bson.Document;
+
+import java.util.*;
+import java.util.concurrent.atomic.AtomicLong;
+
+/**
+ * @author jiangbo
+ * @date 2019/12/5
+ */
+public class MongodbEventHandler {
+
+    public final static String EVENT_KEY_OP = "op";
+    public final static String EVENT_KEY_NS = "ns";
+    public final static String EVENT_KEY_TS = "ts";
+    public final static String EVENT_KEY_DATA = "o";
+
+    private static SnowflakeIdWorker idWorker = new SnowflakeIdWorker(1, 1);
+
+    public static Row handleEvent(final Document event, AtomicLong offset, boolean excludeDocId, boolean pavingData) {
+        MongodbOperation mongodbOperation = MongodbOperation.getByInternalNames(event.getString(EVENT_KEY_OP));
+        Map<String, Object> eventMap = new LinkedHashMap<>();
+        eventMap.put("type", mongodbOperation.name());
+
+        parseDbAndCollection(event, eventMap);
+
+        BsonTimestamp timestamp = event.get(EVENT_KEY_TS, BsonTimestamp.class);
+        // "ts" carries a generated snowflake id; the real oplog timestamp only advances the offset below
+        eventMap.put("ts", idWorker.nextId());
+
+        final Document data = (Document) event.get(EVENT_KEY_DATA);
+        Set<String> keys = data.keySet();
+        // optionally drop MongoDB's internal _id from the emitted fields
+        if (excludeDocId) {
+            keys.remove("_id");
+        }
+
+        // pavingData flattens fields into after_*/before_* columns; otherwise they are nested under "message"
+        if (pavingData) {
+            for (String key : keys) {
+                eventMap.put("after_" + key, data.get(key));
+            }
+
+            for (String key : keys) {
+                eventMap.put("before_" + key, null);
+            }
+        } else {
+            eventMap.put("before", processColumnList(keys, data, true));
+            eventMap.put("after", processColumnList(keys, data, false));
+            eventMap = Collections.singletonMap("message", eventMap);
+        }
+
+        offset.set(timestamp.getValue());
+        return Row.of(eventMap);
+    }
+
+    private static Map<String, Object> processColumnList(Set<String> keys, Document data, boolean valueNull) {
+        Map<String, Object> map = new HashMap<>(keys.size());
+        for (String key : keys) {
+            if (valueNull) {
+                map.put(key, null);
+            } else {
+                map.put(key, data.get(key));
+            }
+        }
+
+        return map;
+    }
+
+    private static void parseDbAndCollection(final Document event, Map<String, Object> eventMap) {
+        String dbCollection = event.getString(EVENT_KEY_NS);
+        // the namespace is "<db>.<collection>"; collection names may themselves contain dots, so split only once
+        String[] split = dbCollection.split("\\.", 2);
+        eventMap.put("schema", split[0]);
+        eventMap.put("table", split[1]);
+    }
+}
diff --git a/flinkx-mongodb/flinkx-mongodboplog-reader/src/main/java/com/dtstack/flinkx/mongodboplog/format/MongodbOperation.java b/flinkx-mongodb/flinkx-mongodboplog-reader/src/main/java/com/dtstack/flinkx/mongodboplog/format/MongodbOperation.java
new file mode 100644
index 0000000000..adfac99650
--- /dev/null
+++ b/flinkx-mongodb/flinkx-mongodboplog-reader/src/main/java/com/dtstack/flinkx/mongodboplog/format/MongodbOperation.java
@@ -0,0 +1,67 @@
+package com.dtstack.flinkx.mongodboplog.format;
+
+
+import java.util.ArrayList;
+import java.util.List;
+
+/**
+ * @author jiangbo
+ * @date 2019/12/5
+ */
+public enum MongodbOperation {
+
+    /**
+     * insert
+     */
+    INSERT("i"),
+
+    /**
+     * update
+     */
+    UPDATE("u"),
+
+    /**
+     * delete
+     */
+    DELETE("d");
+
+    private String internalName;
+
+    MongodbOperation(String internalName) {
+        this.internalName = internalName;
+    }
+
+    public String getInternalName() {
+        return internalName;
+    }
+
+    public static List<String> getInternalNames(List<String> names) {
+        List<String> internalNames = new ArrayList<>(names.size());
+        for (String name : names) {
+            MongodbOperation operation = getByName(name);
+            internalNames.add(operation.getInternalName());
+        }
+
+        return internalNames;
+    }
+
+    public static MongodbOperation getByName(String name) {
+        for (MongodbOperation value : MongodbOperation.values()) {
+            if (value.name().equalsIgnoreCase(name)){
+                return value;
+            }
+        }
+
+        throw new RuntimeException("Unsupported operation type: " + name);
+    }
+
+    public static MongodbOperation getByInternalNames(String name){
+        for (MongodbOperation value : MongodbOperation.values()) {
+            if (value.getInternalName().equalsIgnoreCase(name)){
+                return value;
+            }
+        }
+
+        throw new RuntimeException("Unsupported operation type: " + name);
+    }
+}
diff --git a/flinkx-mongodb/flinkx-mongodboplog-reader/src/main/java/com/dtstack/flinkx/mongodboplog/format/MongodboplogInputFormat.java b/flinkx-mongodb/flinkx-mongodboplog-reader/src/main/java/com/dtstack/flinkx/mongodboplog/format/MongodboplogInputFormat.java
new file mode 100644
index 0000000000..acd66c4dba
--- /dev/null
+++ b/flinkx-mongodb/flinkx-mongodboplog-reader/src/main/java/com/dtstack/flinkx/mongodboplog/format/MongodboplogInputFormat.java
@@ -0,0 +1,208 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+
+package com.dtstack.flinkx.mongodboplog.format;
+
+import com.dtstack.flinkx.inputformat.BaseRichInputFormat;
+import com.dtstack.flinkx.mongodb.MongodbClientUtil;
+import com.dtstack.flinkx.mongodb.MongodbConfig;
+import com.dtstack.flinkx.restore.FormatState;
+import com.mongodb.CursorType;
+import com.mongodb.MongoClient;
+import com.mongodb.client.FindIterable;
+import com.mongodb.client.MongoCollection;
+import com.mongodb.client.MongoCursor;
+import com.mongodb.client.model.Filters;
+import org.apache.commons.collections.CollectionUtils;
+import org.apache.commons.lang.StringUtils;
+import org.apache.flink.core.io.GenericInputSplit;
+import org.apache.flink.core.io.InputSplit;
+import org.apache.flink.types.Row;
+import org.bson.BsonTimestamp;
+import org.bson.Document;
+import org.bson.conversions.Bson;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.concurrent.atomic.AtomicLong;
+
+/**
+ * @author jiangbo
+ * @date 2019/12/5
+ */
+public class MongodboplogInputFormat extends BaseRichInputFormat {
+
+    private final static String OPLOG_DB = "local";
+    private final static String REPLICA_SET_COLLECTION = "oplog.rs";
+    private final static String MASTER_SLAVE_COLLECTION = "oplog.$main";
+
+    protected MongodbConfig mongodbConfig;
+
+    private transient MongoClient client;
+
+    private transient MongoCursor<Document> cursor;
+
+    private AtomicLong offset = new AtomicLong();
+
+    private InputSplit inputSplit;
+
+    @Override
+    protected void openInternal(InputSplit inputSplit) throws IOException {
+        this.inputSplit = inputSplit;
+        initOffset();
+
+        client = MongodbClientUtil.getClient(mongodbConfig);
+        MongoCollection<Document> oplog = getOplogCollection();
+        FindIterable<Document> results = oplog.find(buildFilter())
+                .sort(new Document("$natural", 1))
+                .oplogReplay(true)
+                .cursorType(CursorType.TailableAwait);
+
+        cursor = results.iterator();
+    }
+
+    /**
+     * In a master/slave deployment the oplog lives in local.oplog.$main;
+     * in a replica set it lives in local.oplog.rs.
+     */
+    private MongoCollection<Document> getOplogCollection(){
+        if ("REPLICA_SET".equalsIgnoreCase(mongodbConfig.getClusterMode())) {
+            return client.getDatabase(OPLOG_DB).getCollection(REPLICA_SET_COLLECTION);
+        } else if("MASTER_SLAVE".equalsIgnoreCase(mongodbConfig.getClusterMode())){
+            return client.getDatabase(OPLOG_DB).getCollection(MASTER_SLAVE_COLLECTION);
+        } else {
+            throw new RuntimeException("Unsupported cluster mode: " + mongodbConfig.getClusterMode());
+        }
+    }
+
+    private void initOffset(){
+        BsonTimestamp startLocation = new BsonTimestamp(mongodbConfig.getStartLocation(), 0);
+        if (formatState != null && formatState.getState() != null) {
+            long state = (Long) formatState.getState();
+            // resume from whichever is newer: the configured start location or the checkpointed state
+            if (startLocation.compareTo(new BsonTimestamp(state)) > 0) {
+                offset.set(startLocation.getValue());
+            } else {
+                offset.set(state);
+            }
+        } else {
+            offset.set(startLocation.getValue());
+        }
+    }
+
+    private Bson buildFilter(){
+        List<Bson> filters = new ArrayList<>();
+
+        // read only entries after the current offset
+        filters.add(Filters.gt(MongodbEventHandler.EVENT_KEY_TS, new BsonTimestamp(offset.get())));
+
+        // skip operations produced by shard chunk migration
+        filters.add(Filters.exists("fromMigrate", false));
+
+        // filter by database and collection
+        String pattern = buildPattern();
+        if (pattern != null) {
+            filters.add(Filters.regex(MongodbEventHandler.EVENT_KEY_NS, pattern));
+        }
+
+        // filter out internal system entries
+        filters.add(Filters.ne(MongodbEventHandler.EVENT_KEY_NS, "config.system.sessions"));
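+
+        // Illustrative only (not in the original source): with monitorDatabases=["db1"] and
+        // monitorCollections=["orders"], the filter assembled here behaves like
+        //   and( ts > <offset>, fromMigrate not exists, ns matches "(db1)\.(orders)",
+        //        ns != "config.system.sessions", op in ["i", "u", "d"] )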
+
+        // filter by operation type
+        if(CollectionUtils.isNotEmpty(mongodbConfig.getOperateType())) {
+            List<String> operateTypes = MongodbOperation.getInternalNames(mongodbConfig.getOperateType());
+            filters.add(Filters.in(MongodbEventHandler.EVENT_KEY_OP, operateTypes));
+        }
+
+        return Filters.and(filters);
+    }
+
+    private String buildPattern() {
+        if (CollectionUtils.isEmpty(mongodbConfig.getMonitorDatabases()) && CollectionUtils.isEmpty(mongodbConfig.getMonitorCollections())){
+            return null;
+        }
+
+        StringBuilder pattern = new StringBuilder();
+        if(CollectionUtils.isNotEmpty(mongodbConfig.getMonitorDatabases())){
+            mongodbConfig.getMonitorDatabases().removeIf(StringUtils::isEmpty);
+            if(CollectionUtils.isNotEmpty(mongodbConfig.getMonitorDatabases())){
+                String databasePattern = StringUtils.join(mongodbConfig.getMonitorDatabases(), "|");
+                pattern.append("(").append(databasePattern).append(")");
+            } else {
+                pattern.append(".*");
+            }
+        }
+
+        pattern.append("\\.");
+
+        if(CollectionUtils.isNotEmpty(mongodbConfig.getMonitorCollections())){
+            // StringUtils.isEmpty is null-safe, unlike String::isEmpty
+            mongodbConfig.getMonitorCollections().removeIf(StringUtils::isEmpty);
+            if(CollectionUtils.isNotEmpty(mongodbConfig.getMonitorCollections())){
+                String collectionPattern = StringUtils.join(mongodbConfig.getMonitorCollections(), "|");
+                pattern.append("(").append(collectionPattern).append(")");
+            } else {
+                pattern.append(".*");
+            }
+        }
+
+        return pattern.toString();
+    }
+
+    @Override
+    protected Row nextRecordInternal(Row row) throws IOException {
+        return MongodbEventHandler.handleEvent(cursor.next(), offset, mongodbConfig.getExcludeDocId(), mongodbConfig.getPavingData());
+    }
+
+    @Override
+    public FormatState getFormatState() {
+        super.getFormatState();
+
+        if (formatState != null){
+            formatState.setState(offset.get());
+        }
+
+        return formatState;
+    }
+
+    @Override
+    protected void closeInternal() throws IOException {
+        MongodbClientUtil.close(client, cursor);
+    }
+
+    @Override
+    public InputSplit[] createInputSplitsInternal(int minNumSplits) throws IOException {
+        return new InputSplit[]{new GenericInputSplit(1,1)};
+    }
+
+    @Override
+    public boolean reachedEnd() throws IOException {
+        try {
+            return !cursor.hasNext();
+        } catch (Exception e) {
+            // An exception here usually means one cluster node went down. Do not fail the job;
+            // reconnect via openInternal and resume syncing from the current offset. If the whole
+            // cluster is unavailable, openInternal will terminate the job itself.
+            LOG.warn("Failed to fetch data, a node may be down; reconnecting to another node automatically", e);
+            closeInternal();
+            openInternal(inputSplit);
+
+            return false;
+        }
+    }
+}
diff --git a/flinkx-mongodb/flinkx-mongodboplog-reader/src/main/java/com/dtstack/flinkx/mongodboplog/format/MongodboplogInputFormatBuilder.java b/flinkx-mongodb/flinkx-mongodboplog-reader/src/main/java/com/dtstack/flinkx/mongodboplog/format/MongodboplogInputFormatBuilder.java
new file mode 100644
index 0000000000..f1975fe397
--- /dev/null
+++ b/flinkx-mongodb/flinkx-mongodboplog-reader/src/main/java/com/dtstack/flinkx/mongodboplog/format/MongodboplogInputFormatBuilder.java
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and + * limitations under the License. + */ + + +package com.dtstack.flinkx.mongodboplog.format; + +import com.dtstack.flinkx.inputformat.BaseRichInputFormatBuilder; +import com.dtstack.flinkx.mongodb.MongodbConfig; + +/** + * @author jiangbo + * @date 2019/12/5 + */ +public class MongodboplogInputFormatBuilder extends BaseRichInputFormatBuilder { + + private MongodboplogInputFormat format; + + public MongodboplogInputFormatBuilder() { + super.format = this.format = new MongodboplogInputFormat(); + } + + public void setMongodbConfig(MongodbConfig mongodbConfig){ + format.mongodbConfig = mongodbConfig; + } + + @Override + protected void checkFormat() { + + } +} diff --git a/flinkx-mongodb/flinkx-mongodboplog-reader/src/main/java/com/dtstack/flinkx/mongodboplog/reader/MongodboplogReader.java b/flinkx-mongodb/flinkx-mongodboplog-reader/src/main/java/com/dtstack/flinkx/mongodboplog/reader/MongodboplogReader.java new file mode 100644 index 0000000000..3d67f6b4ca --- /dev/null +++ b/flinkx-mongodb/flinkx-mongodboplog-reader/src/main/java/com/dtstack/flinkx/mongodboplog/reader/MongodboplogReader.java @@ -0,0 +1,62 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */
+
+
+package com.dtstack.flinkx.mongodboplog.reader;
+
+import com.dtstack.flinkx.config.DataTransferConfig;
+import com.dtstack.flinkx.config.ReaderConfig;
+import com.dtstack.flinkx.mongodb.MongodbConfig;
+import com.dtstack.flinkx.reader.BaseDataReader;
+import com.dtstack.flinkx.mongodboplog.format.MongodboplogInputFormatBuilder;
+import org.apache.flink.streaming.api.datastream.DataStream;
+import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
+import org.apache.flink.types.Row;
+
+/**
+ * @author jiangbo
+ * @date 2019/12/5
+ */
+public class MongodboplogReader extends BaseDataReader {
+
+    private MongodbConfig mongodbConfig;
+
+    public MongodboplogReader(DataTransferConfig config, StreamExecutionEnvironment env) {
+        super(config, env);
+
+        ReaderConfig readerConfig = config.getJob().getContent().get(0).getReader();
+        try {
+            mongodbConfig = objectMapper.readValue(objectMapper.writeValueAsString(readerConfig.getParameter().getAll()), MongodbConfig.class);
+        } catch (Exception e) {
+            throw new RuntimeException("Failed to parse mongodb oplog reader config", e);
+        }
+    }
+
+    @Override
+    public DataStream<Row> readData() {
+        MongodboplogInputFormatBuilder builder = new MongodboplogInputFormatBuilder();
+        builder.setDataTransferConfig(dataTransferConfig);
+        builder.setMongodbConfig(mongodbConfig);
+        builder.setMonitorUrls(monitorUrls);
+        builder.setBytes(bytes);
+        builder.setRestoreConfig(restoreConfig);
+        builder.setLogConfig(logConfig);
+        builder.setTestConfig(testConfig);
+        return createInput(builder.finish());
+    }
+}
diff --git a/flinkx-mongodb/pom.xml b/flinkx-mongodb/pom.xml
index d8a418734c..f16a24b779 100644
--- a/flinkx-mongodb/pom.xml
+++ b/flinkx-mongodb/pom.xml
@@ -14,6 +14,7 @@
         <module>flinkx-mongodb-core</module>
         <module>flinkx-mongodb-reader</module>
+        <module>flinkx-mongodboplog-reader</module>
         <module>flinkx-mongodb-writer</module>
diff --git a/flinkx-mysql/flinkx-mysql-dreader/pom.xml b/flinkx-mysql/flinkx-mysql-dreader/pom.xml
index 9fb24688ee..0751f5c80a 100644
--- a/flinkx-mysql/flinkx-mysql-dreader/pom.xml
+++ b/flinkx-mysql/flinkx-mysql-dreader/pom.xml
@@ -95,7 +95,7 @@
+                                      tofile="${basedir}/../../syncplugins/mysqldreader/${project.name}-${package.name}.jar" />
diff --git a/flinkx-mysql/flinkx-mysql-reader/pom.xml b/flinkx-mysql/flinkx-mysql-reader/pom.xml
index 50421e8ced..5f82074c6f 100644
--- a/flinkx-mysql/flinkx-mysql-reader/pom.xml
+++ b/flinkx-mysql/flinkx-mysql-reader/pom.xml
@@ -95,7 +95,7 @@
+                                      tofile="${basedir}/../../syncplugins/mysqlreader/${project.name}-${package.name}.jar" />
diff --git a/flinkx-mysql/flinkx-mysql-writer/pom.xml b/flinkx-mysql/flinkx-mysql-writer/pom.xml
index bc7d8f5eb8..de701fc94d 100644
--- a/flinkx-mysql/flinkx-mysql-writer/pom.xml
+++ b/flinkx-mysql/flinkx-mysql-writer/pom.xml
@@ -95,7 +95,7 @@
+                                      tofile="${basedir}/../../syncplugins/mysqlwriter/${project.name}-${package.name}.jar" />
diff --git a/flinkx-odps/flinkx-odps-reader/pom.xml b/flinkx-odps/flinkx-odps-reader/pom.xml
index d9970f8cc7..33d7cbd1c5 100644
--- a/flinkx-odps/flinkx-odps-reader/pom.xml
+++ b/flinkx-odps/flinkx-odps-reader/pom.xml
@@ -85,7 +85,7 @@
+                                      tofile="${basedir}/../../syncplugins/odpsreader/${project.name}-${package.name}.jar" />
diff --git a/flinkx-odps/flinkx-odps-writer/pom.xml b/flinkx-odps/flinkx-odps-writer/pom.xml
index 99b672cb84..b1575373aa 100644
--- a/flinkx-odps/flinkx-odps-writer/pom.xml
+++ b/flinkx-odps/flinkx-odps-writer/pom.xml
@@ -90,7 +90,7 @@
+                                      tofile="${basedir}/../../syncplugins/odpswriter/${project.name}-${package.name}.jar" />
diff --git a/flinkx-oracle/flinkx-oracle-reader/pom.xml
b/flinkx-oracle/flinkx-oracle-reader/pom.xml index ddbc91c274..ab8164a41c 100644 --- a/flinkx-oracle/flinkx-oracle-reader/pom.xml +++ b/flinkx-oracle/flinkx-oracle-reader/pom.xml @@ -95,7 +95,7 @@ + tofile="${basedir}/../../syncplugins/oraclereader/${project.name}-${package.name}.jar" /> diff --git a/flinkx-oracle/flinkx-oracle-writer/pom.xml b/flinkx-oracle/flinkx-oracle-writer/pom.xml index 826ed5e124..45ef58b1fd 100644 --- a/flinkx-oracle/flinkx-oracle-writer/pom.xml +++ b/flinkx-oracle/flinkx-oracle-writer/pom.xml @@ -96,7 +96,7 @@ + tofile="${basedir}/../../syncplugins/oraclewriter/${project.name}-${package.name}.jar" /> diff --git a/flinkx-oraclelogminer/flinkx-oraclelogminer-reader/pom.xml b/flinkx-oraclelogminer/flinkx-oraclelogminer-reader/pom.xml index d3ecb477dd..a9cec6b8f8 100644 --- a/flinkx-oraclelogminer/flinkx-oraclelogminer-reader/pom.xml +++ b/flinkx-oraclelogminer/flinkx-oraclelogminer-reader/pom.xml @@ -1,6 +1,6 @@ - flinkx-oraclelogminer @@ -95,7 +95,7 @@ + tofile="${basedir}/../../syncplugins/oraclelogminerreader/${project.name}-${package.name}.jar" /> diff --git a/flinkx-oraclelogminer/flinkx-oraclelogminer-reader/src/main/java/com/dtstack/flinkx/oraclelogminer/format/LogMinerConnection.java b/flinkx-oraclelogminer/flinkx-oraclelogminer-reader/src/main/java/com/dtstack/flinkx/oraclelogminer/format/LogMinerConnection.java index 98d99128e8..45775e6293 100644 --- a/flinkx-oraclelogminer/flinkx-oraclelogminer-reader/src/main/java/com/dtstack/flinkx/oraclelogminer/format/LogMinerConnection.java +++ b/flinkx-oraclelogminer/flinkx-oraclelogminer-reader/src/main/java/com/dtstack/flinkx/oraclelogminer/format/LogMinerConnection.java @@ -440,7 +440,7 @@ private List queryLogFiles(Long scn) throws SQLException{ LogFile logFile = new LogFile(); logFile.setFileName(rs.getString("name")); logFile.setFirstChange(rs.getLong("first_change#")); - logFile.setNextChange(MAX_SCN); + logFile.setNextChange(rs.getLong("next_change#")); logFiles.add(logFile); } diff --git a/flinkx-oraclelogminer/flinkx-oraclelogminer-reader/src/main/java/com/dtstack/flinkx/oraclelogminer/util/SqlUtil.java b/flinkx-oraclelogminer/flinkx-oraclelogminer-reader/src/main/java/com/dtstack/flinkx/oraclelogminer/util/SqlUtil.java index 2569755926..3bab8c86c4 100644 --- a/flinkx-oraclelogminer/flinkx-oraclelogminer-reader/src/main/java/com/dtstack/flinkx/oraclelogminer/util/SqlUtil.java +++ b/flinkx-oraclelogminer/flinkx-oraclelogminer-reader/src/main/java/com/dtstack/flinkx/oraclelogminer/util/SqlUtil.java @@ -189,13 +189,14 @@ public class SqlUtil { public final static String SQL_QUERY_LOG_FILE = "SELECT\n" + " MIN(name) name,\n" + - " first_change#\n" + + " first_change#,\n" + + " MIN(next_change#) next_change#\n" + "FROM\n" + " (\n" + " SELECT\n" + " MIN(member) AS name,\n" + " first_change#,\n" + - " 281474976710655 AS next_change#\n" + + " MIN(next_change#) next_change#\n" + " FROM\n" + " v$log l\n" + " INNER JOIN v$logfile f ON l.group# = f.group#\n" + @@ -223,7 +224,8 @@ public class SqlUtil { public final static String SQL_QUERY_LOG_FILE_10 = "SELECT\n" + " MIN(name) name,\n" + - " first_change#\n" + + " first_change#,\n" + + " MIN(next_change#) next_change#\n" + "FROM\n" + " (\n" + " SELECT\n" + diff --git a/flinkx-oraclelogminer/pom.xml b/flinkx-oraclelogminer/pom.xml index bc4b8e878e..ef17277ffb 100644 --- a/flinkx-oraclelogminer/pom.xml +++ b/flinkx-oraclelogminer/pom.xml @@ -1,6 +1,6 @@ - flinkx-all diff --git a/flinkx-oss/flinkx-oss-core/pom.xml b/flinkx-oss/flinkx-oss-core/pom.xml new file mode 100644 
index 0000000000..50f7fcc952 --- /dev/null +++ b/flinkx-oss/flinkx-oss-core/pom.xml @@ -0,0 +1,133 @@ + + + + flinkx-oss + com.dtstack.flinkx + 1.6 + + 4.0.0 + + flinkx-oss-core + + + + com.amazonaws + aws-java-sdk-s3 + 1.11.689 + + + + com.amazonaws + aws-java-sdk-dynamodb + 1.11.689 + + + + commons-lang + commons-lang + 2.6 + provided + + + + org.apache.hadoop + hadoop-aws + 3.1.0 + + + com.amazonaws + aws-java-sdk-bundle + + + + + + org.apache.hive + hive-exec + ${hive.version} + + + calcite-core + org.apache.calcite + + + calcite-avatica + org.apache.calcite + + + derby + org.apache.derby + + + org.xerial.snappy + snappy-java + + + com.fasterxml.jackson.core + jackson-databind + + + com.fasterxml.jackson.core + jackson-annotations + + + com.fasterxml.jackson.core + jackson-core + + + + + + org.apache.hive + hive-serde + ${hive.version} + + + org.apache.hadoop + hadoop-common + + + org.apache.hadoop + hadoop-yarn-api + + + org.xerial.snappy + snappy-java + + + com.fasterxml.jackson.core + jackson-databind + + + com.fasterxml.jackson.core + jackson-annotations + + + com.fasterxml.jackson.core + jackson-core + + + + + + parquet-hadoop + org.apache.parquet + 1.8.3 + + + org.xerial.snappy + snappy-java + + + + + + org.xerial.snappy + snappy-java + 1.1.4 + + + + \ No newline at end of file diff --git a/flinkx-oss/flinkx-oss-core/src/main/java/com/dtstack/flinkx/oss/ECompressType.java b/flinkx-oss/flinkx-oss-core/src/main/java/com/dtstack/flinkx/oss/ECompressType.java new file mode 100644 index 0000000000..761790580a --- /dev/null +++ b/flinkx-oss/flinkx-oss-core/src/main/java/com/dtstack/flinkx/oss/ECompressType.java @@ -0,0 +1,98 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package com.dtstack.flinkx.oss; + +import org.apache.commons.lang.StringUtils; + +/** + * @author wangyulei + * @date 2021-06-29 + */ +public enum ECompressType { + + /** + * text file + */ + TEXT_GZIP("GZIP", "text", ".gz", 0.331F), + TEXT_BZIP2("BZIP2", "text", ".bz2", 0.259F), + TEXT_LZO("LZO", "text", ".lzo", 1.0F), + TEXT_NONE("NONE", "text", "", 0.637F), + + /** + * orc file + */ + ORC_SNAPPY("SNAPPY", "orc", ".snappy", 0.233F), + ORC_GZIP("GZIP", "orc", ".gz", 1.0F), + ORC_BZIP("BZIP", "orc", ".bz", 1.0F), + ORC_LZ4("LZ4", "orc", ".lz4", 1.0F), + ORC_NONE("NONE", "orc", "", 0.233F), + + /** + * parquet file + */ + PARQUET_SNAPPY("SNAPPY", "parquet", ".snappy", 0.274F), + PARQUET_GZIP("GZIP", "parquet", ".gz", 1.0F), + PARQUET_LZO("LZO", "parquet", ".lzo", 1.0F), + PARQUET_NONE("NONE", "parquet", "", 1.0F); + + private String type; + + private String fileType; + + private String suffix; + + private float deviation; + + ECompressType(String type, String fileType, String suffix, float deviation) { + this.type = type; + this.fileType = fileType; + this.suffix = suffix; + this.deviation = deviation; + } + + public static ECompressType getByTypeAndFileType(String type, String fileType){ + if (StringUtils.isEmpty(type)) { + type = "NONE"; + } + + for (ECompressType value : ECompressType.values()) { + if (value.getType().equalsIgnoreCase(type) && value.getFileType().equalsIgnoreCase(fileType)){ + return value; + } + } + + throw new IllegalArgumentException("No enum constant " + type); + } + + public String getType() { + return type; + } + + public String getFileType() { + return fileType; + } + + public String getSuffix() { + return suffix; + } + + public float getDeviation() { + return deviation; + } +} diff --git a/flinkx-oss/flinkx-oss-core/src/main/java/com/dtstack/flinkx/oss/OssConfigKeys.java b/flinkx-oss/flinkx-oss-core/src/main/java/com/dtstack/flinkx/oss/OssConfigKeys.java new file mode 100644 index 0000000000..cb16edac70 --- /dev/null +++ b/flinkx-oss/flinkx-oss-core/src/main/java/com/dtstack/flinkx/oss/OssConfigKeys.java @@ -0,0 +1,63 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package com.dtstack.flinkx.oss; + +/** + * @author wangyulei + * @date 2021-06-28 + */ +public class OssConfigKeys { + public static final String KEY_FIELD_DELIMITER = "fieldDelimiter"; + + public static final String KEY_ACCESS_KEY = "accessKey"; + + public static final String KEY_SECRET_KEY = "secretKey"; + + public static final String KEY_ENDPOINT = "endpoint"; + + public static final String KEY_PATH = "path"; + + public static final String KEY_FILTER = "filterRegex"; + + public static final String KEY_FILE_TYPE = "fileType"; + + public static final String KEY_WRITE_MODE = "writeMode"; + + public static final String KEY_FULL_COLUMN_NAME_LIST = "fullColumnName"; + + public static final String KEY_FULL_COLUMN_TYPE_LIST = "fullColumnType"; + + public static final String KEY_COLUMN_NAME = "name"; + + public static final String KEY_COLUMN_TYPE = "type"; + + public static final String KEY_COMPRESS = "compress"; + + public static final String KEY_FILE_NAME = "fileName"; + + public static final String KEY_ENCODING = "encoding"; + + public static final String KEY_ROW_GROUP_SIZE = "rowGroupSize"; + + public static final String KEY_MAX_FILE_SIZE = "maxFileSize"; + + public static final String KEY_FLUSH_INTERVAL = "flushInterval"; + + public static final String KEY_ENABLE_DICTIONARY = "enableDictionary"; +} diff --git a/flinkx-oss/flinkx-oss-core/src/main/java/com/dtstack/flinkx/oss/OssUtil.java b/flinkx-oss/flinkx-oss-core/src/main/java/com/dtstack/flinkx/oss/OssUtil.java new file mode 100644 index 0000000000..bbc3d1667a --- /dev/null +++ b/flinkx-oss/flinkx-oss-core/src/main/java/com/dtstack/flinkx/oss/OssUtil.java @@ -0,0 +1,202 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package com.dtstack.flinkx.oss; + +import com.dtstack.flinkx.enums.ColumnType; +import org.apache.hadoop.hive.common.type.HiveDecimal; +import org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe; +import org.apache.hadoop.hive.serde2.io.DateWritable; +import org.apache.hadoop.hive.serde2.io.HiveDecimalWritable; +import org.apache.hadoop.hive.serde2.io.TimestampWritable; +import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector; +import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory; +import org.apache.hadoop.io.*; +import org.apache.parquet.io.api.Binary; + + +/** + * Utilities for OssReader and OssWriter + * + * @author wangyulei + * @date 2021-06-29 + */ +public class OssUtil { + + public static final String NULL_VALUE = "\\N"; + + private static final long NANO_SECONDS_PER_DAY = 86400_000_000_000L; + + private static final long JULIAN_EPOCH_OFFSET_DAYS = 2440588; + + private static final double SCALE_TWO = 2.0; + private static final double SCALE_TEN = 10.0; + private static final int BIT_SIZE = 8; + + public static Object getWritableValue(Object writable) { + Class clz = writable.getClass(); + Object ret = null; + + if (clz == IntWritable.class) { + ret = ((IntWritable) writable).get(); + } else if (clz == Text.class) { + ret = ((Text) writable).toString(); + } else if (clz == LongWritable.class) { + ret = ((LongWritable) writable).get(); + } else if (clz == ByteWritable.class) { + ret = ((ByteWritable) writable).get(); + } else if (clz == DateWritable.class) { + ret = ((DateWritable) writable).get(); + } else if (writable instanceof DoubleWritable){ + ret = ((DoubleWritable) writable).get(); + } else if (writable instanceof TimestampWritable){ + ret = ((TimestampWritable) writable).getTimestamp(); + } else if (writable instanceof DateWritable){ + ret = ((DateWritable) writable).get(); + } else if (writable instanceof FloatWritable){ + ret = ((FloatWritable) writable).get(); + } else if (writable instanceof BooleanWritable){ + ret = ((BooleanWritable) writable).get(); + } else { + ret = writable.toString(); + } + return ret; + } + + public static ObjectInspector columnTypeToObjectInspetor(ColumnType columnType) { + ObjectInspector objectInspector = null; + switch (columnType) { + case TINYINT: + objectInspector = ObjectInspectorFactory.getReflectionObjectInspector(Byte.class, ObjectInspectorFactory.ObjectInspectorOptions.JAVA); + break; + case SMALLINT: + objectInspector = ObjectInspectorFactory.getReflectionObjectInspector(Short.class, ObjectInspectorFactory.ObjectInspectorOptions.JAVA); + break; + case INT: + objectInspector = ObjectInspectorFactory.getReflectionObjectInspector(Integer.class, ObjectInspectorFactory.ObjectInspectorOptions.JAVA); + break; + case BIGINT: + objectInspector = ObjectInspectorFactory.getReflectionObjectInspector(Long.class, ObjectInspectorFactory.ObjectInspectorOptions.JAVA); + break; + case FLOAT: + objectInspector = ObjectInspectorFactory.getReflectionObjectInspector(Float.class, ObjectInspectorFactory.ObjectInspectorOptions.JAVA); + break; + case DOUBLE: + objectInspector = ObjectInspectorFactory.getReflectionObjectInspector(Double.class, ObjectInspectorFactory.ObjectInspectorOptions.JAVA); + break; + case DECIMAL: + objectInspector = ObjectInspectorFactory.getReflectionObjectInspector(HiveDecimalWritable.class, ObjectInspectorFactory.ObjectInspectorOptions.JAVA); + break; + case TIMESTAMP: + objectInspector = ObjectInspectorFactory.getReflectionObjectInspector(java.sql.Timestamp.class, 
ObjectInspectorFactory.ObjectInspectorOptions.JAVA); + break; + case DATE: + objectInspector = ObjectInspectorFactory.getReflectionObjectInspector(java.sql.Date.class, ObjectInspectorFactory.ObjectInspectorOptions.JAVA); + break; + case STRING: + case VARCHAR: + case CHAR: + objectInspector = ObjectInspectorFactory.getReflectionObjectInspector(String.class, ObjectInspectorFactory.ObjectInspectorOptions.JAVA); + break; + case BOOLEAN: + objectInspector = ObjectInspectorFactory.getReflectionObjectInspector(Boolean.class, ObjectInspectorFactory.ObjectInspectorOptions.JAVA); + break; + case BINARY: + objectInspector = ObjectInspectorFactory.getReflectionObjectInspector(BytesWritable.class, ObjectInspectorFactory.ObjectInspectorOptions.JAVA); + break; + default: + throw new IllegalArgumentException("You should not be here"); + } + return objectInspector; + } + + + public static Binary decimalToBinary(final HiveDecimal hiveDecimal, int prec, int scale) { + byte[] decimalBytes = hiveDecimal.setScale(scale).unscaledValue().toByteArray(); + + // Estimated number of bytes needed. + int precToBytes = ParquetHiveSerDe.PRECISION_TO_BYTE_COUNT[prec - 1]; + if (precToBytes == decimalBytes.length) { + // No padding needed. + return Binary.fromReusedByteArray(decimalBytes); + } + + byte[] tgt = new byte[precToBytes]; + if (hiveDecimal.signum() == -1) { + // For negative number, initializing bits to 1 + for (int i = 0; i < precToBytes; i++) { + tgt[i] |= 0xFF; + } + } + + // Padding leading zeroes/ones. + System.arraycopy(decimalBytes, 0, tgt, precToBytes - decimalBytes.length, decimalBytes.length); + return Binary.fromReusedByteArray(tgt); + } + + public static int computeMinBytesForPrecision(int precision){ + int numBytes = 1; + while (Math.pow(SCALE_TWO, BIT_SIZE * numBytes - 1.0) < Math.pow(SCALE_TEN, precision)) { + numBytes += 1; + } + return numBytes; + } + + public static byte[] longToByteArray(long data){ + long nano = data * 1000_000; + + int julianDays = (int) ((nano / NANO_SECONDS_PER_DAY) + JULIAN_EPOCH_OFFSET_DAYS); + byte[] julianDaysBytes = getBytes(julianDays); + flip(julianDaysBytes); + + long lastDayNanos = nano % NANO_SECONDS_PER_DAY; + byte[] lastDayNanosBytes = getBytes(lastDayNanos); + flip(lastDayNanosBytes); + + byte[] dst = new byte[12]; + + System.arraycopy(lastDayNanosBytes, 0, dst, 0, 8); + System.arraycopy(julianDaysBytes, 0, dst, 8, 4); + + return dst; + } + + private static byte[] getBytes(long i) { + byte[] bytes=new byte[8]; + bytes[0] = (byte)((i >> 56) & 0xFF); + bytes[1] = (byte)((i >> 48) & 0xFF); + bytes[2] = (byte)((i >> 40) & 0xFF); + bytes[3] = (byte)((i >> 32) & 0xFF); + bytes[4] = (byte)((i >> 24) & 0xFF); + bytes[5] = (byte)((i >> 16) & 0xFF); + bytes[6] = (byte)((i >> 8) & 0xFF); + bytes[7] = (byte)(i & 0xFF); + return bytes; + } + + /** + * @param bytes + */ + private static void flip(byte[] bytes) { + for (int i = 0, j = bytes.length-1; i < j; i++, j--) { + byte t = bytes[i]; + bytes[i] = bytes[j]; + bytes[j] = t; + } + } +} diff --git a/flinkx-oss/flinkx-oss-core/src/main/java/com/dtstack/flinkx/oss/util/StrUtil.java b/flinkx-oss/flinkx-oss-core/src/main/java/com/dtstack/flinkx/oss/util/StrUtil.java new file mode 100644 index 0000000000..68638c2952 --- /dev/null +++ b/flinkx-oss/flinkx-oss-core/src/main/java/com/dtstack/flinkx/oss/util/StrUtil.java @@ -0,0 +1,185 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. 
See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package com.dtstack.flinkx.oss.util;
+
+/**
+ * String utilities adapted from Apache Commons Lang's StringUtils.
+ *
+ * @author wangyulei
+ * @date 2021-06-29
+ */
+public class StrUtil {
+    /**
+     * <p>Check if a String starts with a specified prefix.</p>
+     *
+     * <p><code>null</code>s are handled without exceptions. Two <code>null</code>
+     * references are considered to be equal. The comparison is case sensitive.</p>
+     *
+     * <pre>
+     * StringUtils.startsWith(null, null)      = true
+     * StringUtils.startsWith(null, "abc")     = false
+     * StringUtils.startsWith("abcdef", null)  = false
+     * StringUtils.startsWith("abcdef", "abc") = true
+     * StringUtils.startsWith("ABCDEF", "abc") = false
+     * </pre>
+     *
+     * @see java.lang.String#startsWith(String)
+     * @param str the String to check, may be null
+     * @param prefix the prefix to find, may be null
+     * @return <code>true</code> if the String starts with the prefix, case sensitive, or
+     *  both <code>null</code>
+     * @since 2.4
+     */
+    public static boolean startsWith(String str, String prefix) {
+        return startsWith(str, prefix, false);
+    }
+
+    /**
+     * <p>Case insensitive check if a String starts with a specified prefix.</p>
+     *
+     * <p><code>null</code>s are handled without exceptions. Two <code>null</code>
+     * references are considered to be equal. The comparison is case insensitive.</p>
+     *
+     * <pre>
+     * StringUtils.startsWithIgnoreCase(null, null)      = true
+     * StringUtils.startsWithIgnoreCase(null, "abc")     = false
+     * StringUtils.startsWithIgnoreCase("abcdef", null)  = false
+     * StringUtils.startsWithIgnoreCase("abcdef", "abc") = true
+     * StringUtils.startsWithIgnoreCase("ABCDEF", "abc") = true
+     * </pre>
+     *
+     * @see java.lang.String#startsWith(String)
+     * @param str the String to check, may be null
+     * @param prefix the prefix to find, may be null
+     * @return <code>true</code> if the String starts with the prefix, case insensitive, or
+     *  both <code>null</code>
+     * @since 2.4
+     */
+    public static boolean startsWithIgnoreCase(String str, String prefix) {
+        return startsWith(str, prefix, true);
+    }
+
+    /**
+     * <p>Check if a String starts with a specified prefix (optionally case insensitive).</p>
+     *
+     * @see java.lang.String#startsWith(String)
+     * @param str the String to check, may be null
+     * @param prefix the prefix to find, may be null
+     * @param ignoreCase indicates whether the compare should ignore case
+     *  (case insensitive) or not.
+     * @return <code>true</code> if the String starts with the prefix or
+     *  both <code>null</code>
+     */
+    private static boolean startsWith(String str, String prefix, boolean ignoreCase) {
+        if (str == null || prefix == null) {
+            return (str == null && prefix == null);
+        }
+        if (prefix.length() > str.length()) {
+            return false;
+        }
+        return str.regionMatches(ignoreCase, 0, prefix, 0, prefix.length());
+    }
+
+    /**
+     * <p>Check if a String ends with a specified suffix.</p>
+     *
+     * <p><code>null</code>s are handled without exceptions. Two <code>null</code>
+     * references are considered to be equal. The comparison is case sensitive.</p>
+     *
+     * <pre>
+     * StringUtils.endsWith(null, null)      = true
+     * StringUtils.endsWith(null, "def")     = false
+     * StringUtils.endsWith("abcdef", null)  = false
+     * StringUtils.endsWith("abcdef", "def") = true
+     * StringUtils.endsWith("ABCDEF", "def") = false
+     * StringUtils.endsWith("ABCDEF", "cde") = false
+     * </pre>
+     *
+     * @see java.lang.String#endsWith(String)
+     * @param str the String to check, may be null
+     * @param suffix the suffix to find, may be null
+     * @return <code>true</code> if the String ends with the suffix, case sensitive, or
+     *  both <code>null</code>
+     * @since 2.4
+     */
+    public static boolean endsWith(String str, String suffix) {
+        return endsWith(str, suffix, false);
+    }
+
+    /**
+     * <p>Case insensitive check if a String ends with a specified suffix.</p>
+     *
+     * <p><code>null</code>s are handled without exceptions. Two <code>null</code>
+     * references are considered to be equal. The comparison is case insensitive.</p>
+     *
+     * <pre>
+     * StringUtils.endsWithIgnoreCase(null, null)      = true
+     * StringUtils.endsWithIgnoreCase(null, "def")     = false
+     * StringUtils.endsWithIgnoreCase("abcdef", null)  = false
+     * StringUtils.endsWithIgnoreCase("abcdef", "def") = true
+     * StringUtils.endsWithIgnoreCase("ABCDEF", "def") = true
+     * StringUtils.endsWithIgnoreCase("ABCDEF", "cde") = false
+     * </pre>
+     *
+     * @see java.lang.String#endsWith(String)
+     * @param str the String to check, may be null
+     * @param suffix the suffix to find, may be null
+     * @return <code>true</code> if the String ends with the suffix, case insensitive, or
+     *  both <code>null</code>
+     * @since 2.4
+     */
+    public static boolean endsWithIgnoreCase(String str, String suffix) {
+        return endsWith(str, suffix, true);
+    }
+
+    /**
+     * <p>Check if a String ends with a specified suffix (optionally case insensitive).</p>
+     *
+     * @see java.lang.String#endsWith(String)
+     * @param str the String to check, may be null
+     * @param suffix the suffix to find, may be null
+     * @param ignoreCase indicates whether the compare should ignore case
+     *  (case insensitive) or not.
+     * @return <code>true</code> if the String ends with the suffix or
+     *  both <code>null</code>
+     */
+    private static boolean endsWith(String str, String suffix, boolean ignoreCase) {
+        if (str == null || suffix == null) {
+            return (str == null && suffix == null);
+        }
+        if (suffix.length() > str.length()) {
+            return false;
+        }
+        int strOffset = str.length() - suffix.length();
+        return str.regionMatches(ignoreCase, strOffset, suffix, 0, suffix.length());
+    }
+
+    public static Boolean parseBoolean(String str) {
+        if (null == str || "null".equalsIgnoreCase(str)) {
+            return Boolean.FALSE;
+        }
+
+        if ("1".equals(str)) {
+            return Boolean.TRUE;
+        } else if ("0".equals(str)) {
+            return Boolean.FALSE;
+        } else {
+            return Boolean.parseBoolean(str);
+        }
+    }
+}
diff --git a/flinkx-oss/flinkx-oss-core/src/main/java/org/apache/hadoop/hive/shims/ShimLoader.java b/flinkx-oss/flinkx-oss-core/src/main/java/org/apache/hadoop/hive/shims/ShimLoader.java
new file mode 100644
index 0000000000..11c91dfb2f
--- /dev/null
+++ b/flinkx-oss/flinkx-oss-core/src/main/java/org/apache/hadoop/hive/shims/ShimLoader.java
@@ -0,0 +1,103 @@
+//
+// Source code recreated from a .class file by IntelliJ IDEA
+// (powered by FernFlower decompiler)
+//
+
+package org.apache.hadoop.hive.shims;
+
+import org.apache.hadoop.util.VersionInfo;
+import org.apache.log4j.AppenderSkeleton;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.HashMap;
+import java.util.Map;
+
+public abstract class ShimLoader {
+    private static final Logger LOG = LoggerFactory.getLogger(ShimLoader.class);
+    public static final String HADOOP23VERSIONNAME = "0.23";
+    private static volatile HadoopShims hadoopShims;
+    private static JettyShims jettyShims;
+    private static AppenderSkeleton eventCounter;
+    private static SchedulerShim schedulerShim;
+    private static final HashMap<String, String> HADOOP_SHIM_CLASSES = new HashMap<>();
+    private static final HashMap<String, String> EVENT_COUNTER_SHIM_CLASSES;
+    private static final HashMap<String, String> HADOOP_THRIFT_AUTH_BRIDGE_CLASSES;
+    private static final String SCHEDULER_SHIM_CLASSE = "org.apache.hadoop.hive.schshim.FairSchedulerShim";
+
+    public static HadoopShims getHadoopShims() {
+        if (hadoopShims == null) {
+            synchronized (ShimLoader.class) {
+                if (hadoopShims == null) {
+                    try {
+                        hadoopShims = loadShims(HADOOP_SHIM_CLASSES, HadoopShims.class);
+                    } catch (Throwable var3) {
+                        LOG.error("Error loading shims", var3);
+                        throw new RuntimeException(var3);
+                    }
+                }
+            }
+        }
+
+        return hadoopShims;
+    }
+
+    public static synchronized AppenderSkeleton getEventCounter() {
+        if (eventCounter == null) {
+            eventCounter = loadShims(EVENT_COUNTER_SHIM_CLASSES, AppenderSkeleton.class);
+        }
+
+        return eventCounter;
+    }
+
+    public static synchronized SchedulerShim getSchedulerShims() {
+        if (schedulerShim == null) {
+            schedulerShim = createShim("org.apache.hadoop.hive.schshim.FairSchedulerShim", SchedulerShim.class);
+        }
+
+        return schedulerShim;
+    }
+
+    private static <T> T loadShims(Map<String, String> classMap, Class<T> xface) {
+        String vers = getMajorVersion();
+        String className = classMap.get(vers);
+        return createShim(className, xface);
+    }
+
+    private static <T> T createShim(String className, Class<T> xface) {
+        try {
+            Class<?> clazz = Class.forName(className);
return xface.cast(clazz.newInstance()); + } catch (Exception var3) { + throw new RuntimeException("Could not load shims in class " + className, var3); + } + } + + public static String getMajorVersion() { + String vers = VersionInfo.getVersion(); + String[] parts = vers.split("\\."); + if (parts.length < 2) { + throw new RuntimeException("Illegal Hadoop Version: " + vers + " (expected A.B.* format)"); + } else { + switch(Integer.parseInt(parts[0])) { + case 2: + case 3: + return "0.23"; + default: + throw new IllegalArgumentException("Unrecognized Hadoop major version number: " + vers); + } + } + } + + private ShimLoader() { + } + + static { + HADOOP_SHIM_CLASSES.put("0.23", "org.apache.hadoop.hive.shims.Hadoop23Shims"); + EVENT_COUNTER_SHIM_CLASSES = new HashMap(); + EVENT_COUNTER_SHIM_CLASSES.put("0.23", "org.apache.hadoop.log.metrics.EventCounter"); + HADOOP_THRIFT_AUTH_BRIDGE_CLASSES = new HashMap(); + HADOOP_THRIFT_AUTH_BRIDGE_CLASSES.put("0.23", "org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge23"); + } +} \ No newline at end of file diff --git a/flinkx-oss/flinkx-oss-writer/pom.xml b/flinkx-oss/flinkx-oss-writer/pom.xml new file mode 100644 index 0000000000..963060549d --- /dev/null +++ b/flinkx-oss/flinkx-oss-writer/pom.xml @@ -0,0 +1,148 @@ + + + + flinkx-oss + com.dtstack.flinkx + 1.6 + + 4.0.0 + + flinkx-oss-writer + + + + + org.anarres.lzo + lzo-core + 1.0.2 + + + org.anarres.lzo + lzo-hadoop + 1.0.5 + + + hadoop-core + org.apache.hadoop + + + + + + com.dtstack.flinkx + flinkx-oss-core + 1.6 + + + httpcore + org.apache.httpcomponents + + + httpclient + org.apache.httpcomponents + + + + + + httpcore + org.apache.httpcomponents + 4.4.5 + + + + httpclient + org.apache.httpcomponents + 4.5.2 + + + com.dtstack.flinkx + flinkx-oss-core + 1.6 + compile + + + + + + + + org.apache.maven.plugins + maven-shade-plugin + 3.1.0 + + + package + + shade + + + false + + + org.slf4j:slf4j-api + ch.qos.logback:* + com.google.code.gson:* + com.data-artisans:* + org.scala-lang:* + io.netty:* + + + + + *:* + + META-INF/*.SF + META-INF/*.DSA + META-INF/*.RSA + + + + + + com.google.common + shade.core.com.google.common + + + com.google.thirdparty + shade.core.com.google.thirdparty + + + + + + + + + maven-antrun-plugin + 1.2 + + + copy-resources + + package + + run + + + + + + + + + + + + + + + + + + + + \ No newline at end of file diff --git a/flinkx-oss/flinkx-oss-writer/src/main/java/com/dtstack/flinkx/oss/writer/BaseOssOutputFormat.java b/flinkx-oss/flinkx-oss-writer/src/main/java/com/dtstack/flinkx/oss/writer/BaseOssOutputFormat.java new file mode 100644 index 0000000000..c6a12168b2 --- /dev/null +++ b/flinkx-oss/flinkx-oss-writer/src/main/java/com/dtstack/flinkx/oss/writer/BaseOssOutputFormat.java @@ -0,0 +1,361 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package com.dtstack.flinkx.oss.writer;
+
+import com.dtstack.flinkx.constants.ConstantValue;
+import com.dtstack.flinkx.outputformat.BaseFileOutputFormat;
+import com.dtstack.flinkx.util.ColumnTypeUtil;
+import com.dtstack.flinkx.util.SysUtil;
+import com.google.gson.Gson;
+import org.apache.commons.lang3.StringUtils;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.PathFilter;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Map;
+
+
+/**
+ * The oss implementation of OutputFormat
+ *
+ * @author wangyulei
+ * @date 2021-06-30
+ */
+public abstract class BaseOssOutputFormat extends BaseFileOutputFormat {
+
+    private static final int FILE_NAME_PART_SIZE = 3;
+
+    protected int rowGroupSize;
+
+    protected FileSystem fs;
+
+    protected String endpoint;
+
+    protected String accessKey;
+
+    protected String secretKey;
+
+    protected List<String> columnTypes;
+
+    protected List<String> columnNames;
+
+    protected List<String> fullColumnNames;
+
+    protected List<String> fullColumnTypes;
+
+    protected String delimiter;
+
+    protected int[] colIndices;
+
+    protected Configuration conf;
+
+    protected boolean enableDictionary;
+
+    protected transient Map<String, ColumnTypeUtil.DecimalInfo> decimalColInfo;
+
+    /**
+     * Values that are a Map or a List are converted to JSON with Gson before being written.
+     */
+    protected transient Gson gson;
+
+    @Override
+    protected void openInternal(int taskNumber, int numTasks) throws IOException {
+        gson = new Gson();
+
+        initColIndices();
+        super.openInternal(taskNumber, numTasks);
+    }
+
+    @Override
+    protected void checkOutputDir() {
+        try {
+            Path dir = new Path(outputFilePath);
+
+            if (fs.exists(dir)) {
+                if (fs.getFileStatus(dir).isFile()) {
+                    throw new RuntimeException("Can't write new files under common file: " + dir + "\n"
+                            + "One can only write new files under directories");
+                }
+            } else {
+                if (!makeDir) {
+                    throw new RuntimeException("Output path not exists:" + outputFilePath);
+                }
+            }
+        } catch (IOException e){
+            throw new RuntimeException("Check output path error", e);
+        }
+    }
+
+    @Override
+    protected void createActionFinishedTag() {
+        try {
+            if (fs.createNewFile(new Path(actionFinishedTag))) {
+                LOG.info("Success to create action finished tag:{}", actionFinishedTag);
+            } else {
+                LOG.warn("Failed to create action finished tag:{}", actionFinishedTag);
+            }
+        } catch (Exception e){
+            throw new RuntimeException("create action finished tag error:", e);
+        }
+    }
+
+    @Override
+    protected void waitForActionFinishedBeforeWrite() {
+        try {
+            Path path = new Path(actionFinishedTag);
+            boolean readyWrite = fs.exists(path);
+            int n = 0;
+            while (!readyWrite) {
+                if (n > SECOND_WAIT) {
+                    throw new RuntimeException("Wait action finished before write timeout");
+                }
+
+                SysUtil.sleep(1000);
+                readyWrite = fs.exists(path);
+                n++;
+            }
+        } catch (Exception e) {
+            LOG.warn("Call method waitForActionFinishedBeforeWrite error", e);
+        }
+    }
+
+    @Override
+    protected void cleanDirtyData() {
+        int fileIndex = formatState.getFileIndex();
+        String lastJobId = formatState.getJobId();
+        LOG.info("start to cleanDirtyData, fileIndex = {}, lastJobId = {}", fileIndex, lastJobId);
+        if (StringUtils.isBlank(lastJobId)) {
+            return;
+        }
+
+        PathFilter filter = new PathFilter() {
+            @Override
+            public boolean accept(Path path) {
+                String fileName = path.getName();
+                if (!fileName.contains(lastJobId)) {
+                    return false;
+                }
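+
+                // Data file names have three dot-separated parts (see FILE_NAME_PART_SIZE),
+                // the last one being a block index; any file whose index is greater than the
+                // checkpointed fileIndex was written after the last successful checkpoint
+                // and is treated as dirty data.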
+ + String[] splits = fileName.split("\\."); + if (splits.length == FILE_NAME_PART_SIZE) { + return Integer.parseInt(splits[2]) > fileIndex; + } + + return false; + } + }; + + try { + FileStatus[] dirtyData = fs.listStatus(new Path(outputFilePath), filter); + if (dirtyData != null && dirtyData.length > 0) { + for (FileStatus dirtyDatum : dirtyData) { + fs.delete(dirtyDatum.getPath(), false); + LOG.info("Delete dirty data file:{}", dirtyDatum.getPath()); + } + } + } catch (Exception e) { + LOG.error("Clean dirty data error:", e); + throw new RuntimeException(e); + } + } + + @Override + protected void openSource() throws IOException{ + try { + conf = new Configuration(); + conf.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem"); + conf.set("fs.file.impl", "org.apache.hadoop.fs.LocalFileSystem"); + conf.set("fs.s3a.connection.ssl.enabled", "false"); + conf.set("fs.s3a.path.style.access", "true"); + conf.set("fs.s3a.endpoint", endpoint); + conf.set("fs.s3a.access.key", accessKey); + conf.set("fs.s3a.secret.key", secretKey); + fs = new Path(path).getFileSystem(conf); + } catch (Exception e) { + LOG.error("Failed to get S3AFileSystem with exception : " + e.getMessage()); + throw new RuntimeException("Failed to get S3AFileSystem with exception", e); + } + } + + private void initColIndices() { + if (fullColumnNames == null || fullColumnNames.size() == 0) { + fullColumnNames = columnNames; + } + + if (fullColumnTypes == null || fullColumnTypes.size() == 0) { + fullColumnTypes = columnTypes; + } + + colIndices = new int[fullColumnNames.size()]; + for (int i = 0; i < fullColumnNames.size(); ++i) { + int j = 0; + for (; j < columnNames.size(); ++j) { + if (fullColumnNames.get(i).equalsIgnoreCase(columnNames.get(j))) { + colIndices[i] = j; + break; + } + } + if (j == columnNames.size()) { + colIndices[i] = -1; + } + } + } + + @Override + protected void moveTemporaryDataBlockFileToDirectory(){ + try { + if (currentBlockFileName != null && currentBlockFileName.startsWith(ConstantValue.POINT_SYMBOL)) { + Path src = new Path(tmpPath + SP + currentBlockFileName); + if (!fs.exists(src)) { + LOG.warn("block file {} not exists", currentBlockFileName); + return; + } + + String dataFileName = currentBlockFileName.replaceFirst("\\.",""); + Path dist = new Path(tmpPath + SP + dataFileName); + + if (fs.rename(src, dist)) { + LOG.info("Rename temporary data block file:{} to:{}", src, dist); + } else { + LOG.info("Failed to rename temporary data block file:{} to:{}", src, dist); + } + } + } catch (Exception e){ + LOG.error("Failed to rename file with exception : " + e.getMessage()); + throw new RuntimeException(e); + } + } + + @Override + protected void clearTemporaryDataFiles() throws IOException{ + Path finishedDir = null, tmpDir = null; + if (outputFilePath.endsWith("/")) { + finishedDir = new Path(outputFilePath, FINISHED_SUBDIR); + tmpDir = new Path(outputFilePath, DATA_SUBDIR); + } else { + finishedDir = new Path(outputFilePath + SP + FINISHED_SUBDIR); + tmpDir = new Path(outputFilePath + SP + DATA_SUBDIR); + } + + if (fs.delete(finishedDir, true)) { + LOG.info("Success to delete .finished dir:{}", finishedDir); + } else { + LOG.warn("Failed to delete .finished dir:{}", finishedDir); + } + + if (fs.delete(tmpDir, true)) { + LOG.info("Success to delete .data dir:{}", tmpDir); + } else { + LOG.warn("Failed to delete .data dir:{}", tmpDir); + } + } + + @Override + protected void closeSource() throws IOException { + if (fs != null) { + fs.close(); + } + } + + @Override + protected void createFinishedTag() 
throws IOException{ + if (fs != null) { + fs.createNewFile(new Path(finishedPath)); + LOG.info("Create finished tag dir:{}", finishedPath); + } + } + + @Override + protected void waitForAllTasksToFinish() throws IOException{ + Path finishedDir = new Path(outputFilePath + SP + FINISHED_SUBDIR); + final int maxRetryTime = 100; + int i = 0; + for (; i < maxRetryTime; ++i) { + if (fs.listStatus(finishedDir).length == numTasks) { + break; + } + SysUtil.sleep(3000); + } + + if (i == maxRetryTime) { + String subTaskDataPath = outputFilePath + SP + DATA_SUBDIR; + fs.delete(new Path(subTaskDataPath), true); + LOG.info("waitForAllTasksToFinish: delete path:[{}]", subTaskDataPath); + + fs.delete(finishedDir, true); + LOG.info("waitForAllTasksToFinish: delete finished dir:[{}]", finishedDir); + + throw new RuntimeException("timeout when gathering finish tags for each subtasks"); + } + } + + @Override + protected void coverageData() throws IOException{ + LOG.info("Overwrite the original data"); + + Path dir = new Path(outputFilePath); + if (!fs.exists(dir)) { + return; + } + + fs.delete(dir, true); + fs.mkdirs(dir); + } + + @Override + protected void moveTemporaryDataFileToDirectory() throws IOException { + PathFilter pathFilter = path -> path.getName().startsWith(String.valueOf(taskNumber)); + Path dir = new Path(outputFilePath); + Path tmpDir = new Path(tmpPath); + + FileStatus[] dataFiles = fs.listStatus(tmpDir, pathFilter); + for (FileStatus dataFile : dataFiles) { + if (fs.rename(dataFile.getPath(), new Path(dir, dataFile.getPath().getName()))) { + LOG.info("Rename temp file:{} to dir:{}", dataFile.getPath(), dir); + } else { + LOG.info("Failed to rename temp file:{} to dir:{}", dataFile.getPath(), dir); + } + } + } + + @Override + protected void moveAllTemporaryDataFileToDirectory() throws IOException { + PathFilter pathFilter = path -> !path.getName().startsWith("."); + Path dir = new Path(outputFilePath); + Path tmpDir = new Path(tmpPath); + + FileStatus[] dataFiles = fs.listStatus(tmpDir, pathFilter); + for (FileStatus dataFile : dataFiles) { + if (fs.rename(dataFile.getPath(), new Path(dir, dataFile.getPath().getName()))) { + LOG.info("Rename temp file:{} to dir:{}", dataFile.getPath(), dir); + } else { + LOG.warn("Failed to rename temp file:{} to dir:{}", dataFile.getPath(), dir); + } + } + } + + @Override + protected void writeMultipleRecordsInternal() throws Exception { + notSupportBatchWrite("OssWriter"); + } +} diff --git a/flinkx-oss/flinkx-oss-writer/src/main/java/com/dtstack/flinkx/oss/writer/OssOutputFormatBuilder.java b/flinkx-oss/flinkx-oss-writer/src/main/java/com/dtstack/flinkx/oss/writer/OssOutputFormatBuilder.java new file mode 100644 index 0000000000..e4af3bca12 --- /dev/null +++ b/flinkx-oss/flinkx-oss-writer/src/main/java/com/dtstack/flinkx/oss/writer/OssOutputFormatBuilder.java @@ -0,0 +1,101 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.dtstack.flinkx.oss.writer; + +import com.dtstack.flinkx.constants.ConstantValue; +import com.dtstack.flinkx.outputformat.FileOutputFormatBuilder; + +import java.util.List; + +/** + * The builder class of HdfsOutputFormat + * + * @author wangyulei + * @date 2021-06-30 + */ +public class OssOutputFormatBuilder extends FileOutputFormatBuilder { + + private BaseOssOutputFormat format; + + public OssOutputFormatBuilder(String type) { + switch (type.toUpperCase()) { + case "TEXT": + format = new OssTextOutputFormat(); + break; + default: + throw new IllegalArgumentException("Unsupported Oss file type: " + type); + } + + super.setFormat(format); + } + + public void setColumnNames(List columnNames) { + format.columnNames = columnNames; + } + + public void setColumnTypes(List columnTypes) { + format.columnTypes = columnTypes; + } + + public void setEndpoint(String endpoint) { + format.endpoint = endpoint; + } + + public void setAccessKey(String accessKey) { + format.accessKey = accessKey; + } + + public void setSecretKey(String secretKey) { + format.secretKey = secretKey; + } + + public void setFullColumnNames(List fullColumnNames) { + format.fullColumnNames = fullColumnNames; + } + + public void setDelimiter(String delimiter) { + format.delimiter = delimiter; + } + + public void setRowGroupSize(int rowGroupSize){ + format.rowGroupSize = rowGroupSize; + } + + public void setFullColumnTypes(List fullColumnTypes) { + format.fullColumnTypes = fullColumnTypes; + } + + public void setEnableDictionary(boolean enableDictionary) { + format.enableDictionary = enableDictionary; + } + + @Override + protected void checkFormat() { + super.checkFormat(); + + if (super.format.getPath() == null || super.format.getPath().length() == 0) { + throw new IllegalArgumentException("No valid path supplied."); + } + + if (!super.format.getPath().startsWith(ConstantValue.PROTOCOL_S3A)) { + throw new IllegalArgumentException("Path should start with s3a://"); + } + } + +} diff --git a/flinkx-oss/flinkx-oss-writer/src/main/java/com/dtstack/flinkx/oss/writer/OssTextOutputFormat.java b/flinkx-oss/flinkx-oss-writer/src/main/java/com/dtstack/flinkx/oss/writer/OssTextOutputFormat.java new file mode 100644 index 0000000000..784bebd813 --- /dev/null +++ b/flinkx-oss/flinkx-oss-writer/src/main/java/com/dtstack/flinkx/oss/writer/OssTextOutputFormat.java @@ -0,0 +1,257 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package com.dtstack.flinkx.oss.writer;
+
+import com.dtstack.flinkx.enums.ColumnType;
+import com.dtstack.flinkx.exception.WriteRecordException;
+import com.dtstack.flinkx.oss.ECompressType;
+import com.dtstack.flinkx.oss.OssUtil;
+import com.dtstack.flinkx.oss.util.StrUtil;
+import com.dtstack.flinkx.util.DateUtil;
+import org.apache.commons.compress.compressors.bzip2.BZip2CompressorOutputStream;
+import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream;
+import org.apache.flink.types.Row;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.common.type.HiveDecimal;
+import org.apache.hadoop.io.compress.CompressionCodecFactory;
+
+import java.io.IOException;
+import java.io.OutputStream;
+import java.math.BigDecimal;
+import java.math.BigInteger;
+import java.sql.Timestamp;
+import java.text.SimpleDateFormat;
+import java.util.Date;
+import java.util.List;
+import java.util.Map;
+
+/**
+ * The text-file implementation of BaseOssOutputFormat
+ *
+ * @author wangyulei
+ * @date 2021-06-30
+ */
+public class OssTextOutputFormat extends BaseOssOutputFormat {
+
+    private static final int NEWLINE = 10;
+    private transient OutputStream stream;
+
+    private static final int BUFFER_SIZE = 1000;
+
+    @Override
+    public void flushDataInternal() throws IOException {
+        LOG.info("Close current text stream, write data size:[{}]", bytesWriteCounter.getLocalValue());
+
+        if (stream != null){
+            stream.flush();
+            stream.close();
+            stream = null;
+        }
+    }
+
+    @Override
+    public float getDeviation(){
+        ECompressType compressType = ECompressType.getByTypeAndFileType(compress, "text");
+        return compressType.getDeviation();
+    }
+
+    @Override
+    public String getExtension() {
+        ECompressType compressType = ECompressType.getByTypeAndFileType(compress, "text");
+        return compressType.getSuffix();
+    }
+
+    @Override
+    protected void nextBlock(){
+        super.nextBlock();
+
+        if (stream != null) {
+            return;
+        }
+
+        try {
+            String currentBlockTmpPath = null;
+            if (tmpPath.endsWith("/")) {
+                currentBlockTmpPath = tmpPath + currentBlockFileName;
+            } else {
+                currentBlockTmpPath = tmpPath + SP + currentBlockFileName;
+            }
+            Path p = new Path(currentBlockTmpPath);
+
+            ECompressType compressType = ECompressType.getByTypeAndFileType(compress, "text");
+            if (ECompressType.TEXT_NONE.equals(compressType)) {
+                stream = fs.create(p);
+            } else {
+                if (compressType == ECompressType.TEXT_GZIP){
+                    stream = new GzipCompressorOutputStream(fs.create(p));
+                } else if(compressType == ECompressType.TEXT_BZIP2){
+                    stream = new BZip2CompressorOutputStream(fs.create(p));
+                } else if (compressType == ECompressType.TEXT_LZO) {
+                    CompressionCodecFactory factory = new CompressionCodecFactory(new Configuration());
+                    stream = factory.getCodecByClassName("com.hadoop.compression.lzo.LzopCodec").createOutputStream(fs.create(p));
+                }
+            }
+
+            LOG.info("subtask:[{}] create block file:{}", taskNumber, currentBlockTmpPath);
+
+            blockIndex++;
+        } catch (Exception e){
+            LOG.error(e.getMessage(), e);
+            throw new RuntimeException(e);
+        }
+    }
+
+    @Override
+    public void writeSingleRecordToFile(Row row) throws WriteRecordException {
+        if (stream == null) {
+            nextBlock();
+        }
+
+        StringBuilder sb = new StringBuilder();
+        int i = 0;
+        try {
+            int cnt = fullColumnNames.size();
+            for (; i < cnt; ++i) {
+                int j = colIndices[i];
+                if (j == -1) {
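+                    // j == -1 means fullColumnNames[i] has no matching field in the
+                    // incoming Row (colIndices maps output positions to input indices)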
+                    // still emit the delimiter for the absent column so that the
+                    // written fields keep their positions
+                    if (i != 0) {
+                        sb.append(delimiter);
+                    }
+                    continue;
+                }
+
+                if (i != 0) {
+                    sb.append(delimiter);
+                }
+
+                appendDataToString(sb, row.getField(j), ColumnType.fromString(columnTypes.get(j)));
+            }
+        } catch (Exception e) {
+            if (i < row.getArity()) {
+                throw new WriteRecordException(recordConvertDetailErrorMessage(i, row), e, i, row);
+            }
+            throw new WriteRecordException(e.getMessage(), e);
+        }
+
+        try {
+            byte[] bytes = sb.toString().getBytes(this.charsetName);
+            this.stream.write(bytes);
+            this.stream.write(NEWLINE);
+            rowsOfCurrentBlock++;
+
+            if (restoreConfig.isRestore()) {
+                lastRow = row;
+            }
+
+            if (rowsOfCurrentBlock % BUFFER_SIZE == 0) {
+                this.stream.flush();
+            }
+        } catch (IOException e) {
+            LOG.error(e.getMessage(), e);
+            throw new WriteRecordException(String.format("Failed to write data to OSS, row:{%s}", row), e);
+        }
+    }
+
+    private void appendDataToString(StringBuilder sb, Object column, ColumnType columnType) {
+        if (column == null) {
+            sb.append(OssUtil.NULL_VALUE);
+            return;
+        }
+
+        String rowData = column.toString();
+        if (rowData.length() == 0) {
+            sb.append("");
+        } else {
+            switch (columnType) {
+                case TINYINT:
+                    sb.append(Byte.valueOf(rowData));
+                    break;
+                case SMALLINT:
+                    sb.append(Short.valueOf(rowData));
+                    break;
+                case INT:
+                    sb.append(Integer.valueOf(rowData));
+                    break;
+                case BIGINT:
+                    if (column instanceof Timestamp){
+                        column = ((Timestamp) column).getTime();
+                        sb.append(column);
+                        break;
+                    }
+
+                    BigInteger data = new BigInteger(rowData);
+                    if (data.compareTo(new BigInteger(String.valueOf(Long.MAX_VALUE))) > 0){
+                        sb.append(data);
+                    } else {
+                        sb.append(Long.valueOf(rowData));
+                    }
+                    break;
+                case FLOAT:
+                    sb.append(Float.valueOf(rowData));
+                    break;
+                case DOUBLE:
+                    sb.append(Double.valueOf(rowData));
+                    break;
+                case DECIMAL:
+                    sb.append(HiveDecimal.create(new BigDecimal(rowData)));
+                    break;
+                case STRING:
+                case VARCHAR:
+                case CHAR:
+                    if (column instanceof Timestamp) {
+                        SimpleDateFormat fm = DateUtil.getDateTimeFormatterForMillisencond();
+                        sb.append(fm.format(column));
+                    } else if (column instanceof Map || column instanceof List) {
+                        sb.append(gson.toJson(column));
+                    } else {
+                        sb.append(rowData);
+                    }
+                    break;
+                case BOOLEAN:
+                    sb.append(StrUtil.parseBoolean(rowData));
+                    break;
+                case DATE:
+                    column = DateUtil.columnToDate(column, null);
+                    sb.append(DateUtil.dateToString((Date) column));
+                    break;
+                case TIMESTAMP:
+                    column = DateUtil.columnToTimestamp(column, null);
+                    sb.append(DateUtil.timestampToString((Date) column));
+                    break;
+                default:
+                    throw new IllegalArgumentException("Unsupported column type: " + columnType);
+            }
+        }
+    }
+
+    @Override
+    protected String recordConvertDetailErrorMessage(int pos, Row row) {
+        return "\nOssTextOutputFormat [" + jobName + "] writeRecord error: when converting field[" + columnNames.get(pos) + "] in Row(" + row + ")";
+    }
+
+    @Override
+    public void closeSource() throws IOException {
+        OutputStream s = this.stream;
+        if (s != null) {
+            s.flush();
+            this.stream = null;
+            s.close();
+        }
+    }
+
+}
diff --git a/flinkx-oss/flinkx-oss-writer/src/main/java/com/dtstack/flinkx/oss/writer/OssWriter.java b/flinkx-oss/flinkx-oss-writer/src/main/java/com/dtstack/flinkx/oss/writer/OssWriter.java
new file mode 100644
index 0000000000..526c636455
--- /dev/null
+++ b/flinkx-oss/flinkx-oss-writer/src/main/java/com/dtstack/flinkx/oss/writer/OssWriter.java
@@ -0,0 +1,173 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. 
The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package com.dtstack.flinkx.oss.writer; + +import com.dtstack.flinkx.config.DataTransferConfig; +import com.dtstack.flinkx.config.WriterConfig; +import com.dtstack.flinkx.constants.ConstantValue; +import com.dtstack.flinkx.oss.util.StrUtil; +import com.dtstack.flinkx.writer.BaseDataWriter; +import org.apache.flink.streaming.api.datastream.DataStream; +import org.apache.flink.streaming.api.datastream.DataStreamSink; +import org.apache.flink.types.Row; +import org.apache.parquet.hadoop.ParquetWriter; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.ArrayList; +import java.util.List; +import java.util.Map; + +import static com.dtstack.flinkx.oss.OssConfigKeys.*; + +/** + * The writer plugin of oss + * + * @author wangyulei + * @date 2021-06-29 + */ +public class OssWriter extends BaseDataWriter { + + protected final Logger LOG = LoggerFactory.getLogger(getClass()); + + protected String fileType; + + protected String path; + + protected String fieldDelimiter; + + protected String compress; + + protected String fileName; + + protected List columnName; + + protected List columnType; + + protected String endpoint; + + protected String accessKey; + + protected String secretKey; + + protected String charSet; + + protected List fullColumnName; + + protected List fullColumnType; + + protected int rowGroupSize; + + protected long maxFileSize; + + protected long flushInterval; + + protected boolean enableDictionary; + + public OssWriter(DataTransferConfig config) { + super(config); + WriterConfig writerConfig = config.getJob().getContent().get(0).getWriter(); + endpoint = writerConfig.getParameter().getStringVal(KEY_ENDPOINT); + accessKey = writerConfig.getParameter().getStringVal(KEY_ACCESS_KEY); + secretKey = writerConfig.getParameter().getStringVal(KEY_SECRET_KEY); + List columns = writerConfig.getParameter().getColumn(); + fileType = writerConfig.getParameter().getStringVal(KEY_FILE_TYPE); + + path = writerConfig.getParameter().getStringVal(KEY_PATH); + fieldDelimiter = writerConfig.getParameter().getStringVal(KEY_FIELD_DELIMITER); + charSet = writerConfig.getParameter().getStringVal(KEY_ENCODING); + rowGroupSize = writerConfig.getParameter().getIntVal(KEY_ROW_GROUP_SIZE, ParquetWriter.DEFAULT_BLOCK_SIZE); + maxFileSize = writerConfig.getParameter().getLongVal(KEY_MAX_FILE_SIZE, ConstantValue.STORE_SIZE_G); + flushInterval = writerConfig.getParameter().getLongVal(KEY_FLUSH_INTERVAL, 0); + enableDictionary = writerConfig.getParameter().getBooleanVal(KEY_ENABLE_DICTIONARY, true); + + if (fieldDelimiter == null || fieldDelimiter.length() == 0) { + fieldDelimiter = "\001"; + } else { + fieldDelimiter = com.dtstack.flinkx.util.StringUtil.convertRegularExpr(fieldDelimiter); + } + + compress = writerConfig.getParameter().getStringVal(KEY_COMPRESS); + fileName = writerConfig.getParameter().getStringVal(KEY_FILE_NAME, ""); + if (columns != null && columns.size() > 0) { + columnName = new 
ArrayList<>(); + columnType = new ArrayList<>(); + for (Object column : columns) { + Map sm = (Map) column; + columnName.add((String) sm.get(KEY_COLUMN_NAME)); + columnType.add((String) sm.get(KEY_COLUMN_TYPE)); + } + } + + fullColumnName = (List) writerConfig.getParameter().getVal(KEY_FULL_COLUMN_NAME_LIST); + fullColumnType = (List) writerConfig.getParameter().getVal(KEY_FULL_COLUMN_TYPE_LIST); + + mode = writerConfig.getParameter().getStringVal(KEY_WRITE_MODE); + } + + @Override + public DataStreamSink writeData(DataStream dataSet) { + OssOutputFormatBuilder builder = new OssOutputFormatBuilder(fileType); + builder.setPath(dealWithPath(path)); + builder.setEndpoint(endpoint); + builder.setAccessKey(accessKey); + builder.setSecretKey(secretKey); + builder.setFileName(fileName); + builder.setWriteMode(mode); + builder.setColumnNames(columnName); + builder.setColumnTypes(columnType); + builder.setCompress(compress); + builder.setMonitorUrls(monitorUrls); + builder.setErrors(errors); + builder.setErrorRatio(errorRatio); + builder.setFullColumnNames(fullColumnName); + builder.setFullColumnTypes(fullColumnType); + builder.setDirtyPath(dirtyPath); + builder.setDirtyHadoopConfig(dirtyHadoopConfig); + builder.setSrcCols(srcCols); + builder.setCharSetName(charSet); + builder.setDelimiter(fieldDelimiter); + builder.setRowGroupSize(rowGroupSize); + builder.setRestoreConfig(restoreConfig); + builder.setMaxFileSize(maxFileSize); + builder.setFlushBlockInterval(flushInterval); + builder.setEnableDictionary(enableDictionary); + + return createOutput(dataSet, builder.finish()); + } + + private String dealWithPath(String path) { + String pathWithPrefix = path; + if (!StrUtil.startsWith(path, "s3a://")) { + if (StrUtil.startsWith(path, "//")) { + pathWithPrefix = "s3a:" + path; + } else if (StrUtil.startsWith(path, "/")) { + pathWithPrefix = "s3a:/" + path; + } else { + pathWithPrefix = "s3a://" + path; + } + } + + if (!StrUtil.endsWith(pathWithPrefix,"/")) { + pathWithPrefix = pathWithPrefix + "/"; + } + + LOG.debug("Path = " + pathWithPrefix); + return pathWithPrefix; + } +} diff --git a/flinkx-metadata-mysql/pom.xml b/flinkx-oss/pom.xml similarity index 86% rename from flinkx-metadata-mysql/pom.xml rename to flinkx-oss/pom.xml index a899bb5965..510558dc36 100644 --- a/flinkx-metadata-mysql/pom.xml +++ b/flinkx-oss/pom.xml @@ -9,11 +9,11 @@ 4.0.0 - flinkx-metadata-mysql + flinkx-oss pom - - flinkx-metadata-mysql-reader + flinkx-oss-core + flinkx-oss-writer diff --git a/flinkx-websocket/flinkx-websocket-core/pom.xml b/flinkx-pgwal/flinkx-pgwal-core/pom.xml similarity index 52% rename from flinkx-websocket/flinkx-websocket-core/pom.xml rename to flinkx-pgwal/flinkx-pgwal-core/pom.xml index 3da8e4ed43..ebbdb1958a 100644 --- a/flinkx-websocket/flinkx-websocket-core/pom.xml +++ b/flinkx-pgwal/flinkx-pgwal-core/pom.xml @@ -1,24 +1,22 @@ - - flinkx-websocket + flinkx-pgwal com.dtstack.flinkx 1.6 4.0.0 - flinkx-websocket-core + flinkx-pgwal-core - io.netty - netty-all - 4.0.23.Final + org.postgresql + postgresql + 42.2.8 - - \ No newline at end of file diff --git a/flinkx-pgwal/flinkx-pgwal-core/src/main/java/com/dtstack/flinkx/pgwal/PgDecoder.java b/flinkx-pgwal/flinkx-pgwal-core/src/main/java/com/dtstack/flinkx/pgwal/PgDecoder.java new file mode 100644 index 0000000000..ffa87be7ef --- /dev/null +++ b/flinkx-pgwal/flinkx-pgwal-core/src/main/java/com/dtstack/flinkx/pgwal/PgDecoder.java @@ -0,0 +1,274 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license 
agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package com.dtstack.flinkx.pgwal; + +import com.dtstack.flinkx.reader.MetaColumn; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.nio.ByteBuffer; +import java.sql.SQLException; +import java.time.Instant; +import java.time.LocalDate; +import java.time.ZoneOffset; +import java.time.temporal.ChronoUnit; +import java.util.ArrayList; +import java.util.HashMap; +import java.util.List; +import java.util.Map; + +/** + * Date: 2019/12/14 + * Company: www.dtstack.com + * + * reference to https://github.com/debezium/debezium & http://www.postgres.cn/docs/10/protocol-logicalrep-message-formats.html + * + * @author tudou + */ +public class PgDecoder { + private static final Logger LOG = LoggerFactory.getLogger(PgDecoder.class); + + private static Instant PG_EPOCH = LocalDate.of(2000, 1, 1).atStartOfDay().toInstant(ZoneOffset.UTC); + + private Map tableMap = new HashMap<>(64); + private Map pgTypeMap; + private volatile long currentLsn; + private volatile long ts; + + public PgDecoder(Map pgTypeMap) { + this.pgTypeMap = pgTypeMap; + } + + private static String readColumnValueAsString(ByteBuffer buffer) { + //Int32 列值的长度 + int length = buffer.getInt(); + byte[] value = new byte[length]; + //Byte(n) 该列的值,以文本格式显示。n是上面的长度 + buffer.get(value, 0, length); + return new String(value); + } + + private static String readString(ByteBuffer buffer) { + StringBuilder sb = new StringBuilder(); + byte b = 0; + while ((b = buffer.get()) != 0) { + sb.append((char) b); + } + return sb.toString(); + } + + public static String unquoteIdentifierPart(String identifierPart) { + if (identifierPart == null || identifierPart.length() < 2) { + return identifierPart; + } + + Character quotingChar = deriveQuotingChar(identifierPart); + if (quotingChar != null) { + identifierPart = identifierPart.substring(1, identifierPart.length() - 1); + identifierPart = identifierPart.replace(quotingChar.toString() + quotingChar.toString(), quotingChar.toString()); + } + + return identifierPart; + } + + private static Character deriveQuotingChar(String identifierPart) { + char first = identifierPart.charAt(0); + char last = identifierPart.charAt(identifierPart.length() - 1); + + if (first == last && (first == '"' || first == '\'' || first == '`')) { + return first; + } + + return null; + } + + public Table decode(ByteBuffer buffer) throws SQLException { + Table table = new Table(); + PgMessageTypeEnum type = PgMessageTypeEnum.forType((char) buffer.get()); + switch (type) { + case BEGIN: + //Byte1('B') 将消息标识为开始消息 + handleBeginMessage(buffer); + break; + case COMMIT: + //Byte1('C') 将消息标识为提交消息 + handleCommitMessage(buffer); + break; + case RELATION: + //Byte1('R') 将消息标识为关系消息 + handleRelationMessage(buffer); + break; + case INSERT: + //Byte1('I') 将消息标识为插入消息 + table = decodeInsert(buffer); + break; + case UPDATE: + 
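+                // an UPDATE may carry an optional 'K' (key) or 'O' (old tuple)
+                // sub-message ahead of the new tuple; decodeUpdate() handles both layouts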
//Byte1('U') 将消息标识为更新消息 + table = decodeUpdate(buffer); + break; + case DELETE: + //Byte1('D') 将消息标识为删除消息 + table = decodeDelete(buffer); + break; + default: + break; + } + table.setType(type); + return table; + } + + private void handleBeginMessage(ByteBuffer buffer) { + //Int64 事务的结束LSN + long lsn = buffer.getLong(); + //Int64 提交事务的时间戳。自PostgreSQL纪元(2000-01-01)以来的数值是微秒数 + Instant plus = PG_EPOCH.plus(buffer.getLong(), ChronoUnit.MICROS); + //Int32 事务的Xid + int anInt = buffer.getInt(); + currentLsn = lsn; + ts = plus.toEpochMilli(); + LOG.trace("handleBeginMessage result = { lsn = {}, plus = {}, anInt = {}}", lsn, plus, anInt); + } + + private void handleCommitMessage(ByteBuffer buffer) { + if(LOG.isTraceEnabled()){ + //Int8 标志;目前未使用(必须为0) + int flags = buffer.get(); + //Int64 提交的LSN + long lsn = buffer.getLong(); + //Int64 事务的结束LSN + long endLsn = buffer.getLong(); + //Int64 提交事务的时间戳。自PostgreSQL纪元(2000-01-01)以来的数值是微秒数 + Instant commitTimestamp = PG_EPOCH.plus(buffer.getLong(), ChronoUnit.MICROS); + LOG.trace("handleCommitMessage result = { flags = {}, lsn = {}, endLsn = {}, commitTimestamp = {}}", flags, lsn, endLsn, commitTimestamp); + } + } + + private void handleRelationMessage(ByteBuffer buffer) throws SQLException { + //Int32 关系的ID + int relationId = buffer.getInt(); + //String 命名空间(pg_catalog的空字符串) + String schemaName = readString(buffer); + //String 关系名称 + String tableName = readString(buffer); + //Int8 该关系的副本标识设置(与pg_class 中的relreplident相同) + int replicaIdentityId = buffer.get(); + //Int16 列数 + short columnCount = buffer.getShort(); + LOG.debug("handleRelationMessage result = { schemaName = {}, tableName = {}}", schemaName, tableName); + if(!tableMap.containsKey(relationId)){ + List columnList = new ArrayList<>(columnCount); + for (int i = 0; i < columnCount; i++) { + //Int8 列的标志。当前可以是0表示没有标记或1表示将列标记为关键字的一部分 + byte flags = buffer.get(); + //String 列的名称 + String name = unquoteIdentifierPart(readString(buffer)); + //Int32 列的数据类型的ID + String type = pgTypeMap.get(buffer.getInt()); + MetaColumn metaColumn = new MetaColumn(); + metaColumn.setIndex(i); + metaColumn.setName(name); + metaColumn.setType(type); + columnList.add(metaColumn); + //Int32 列的类型修饰符(atttypmod) + int attypmod = buffer.getInt(); + } + Table table = new Table(schemaName, tableName, columnList); + tableMap.put(relationId, table); + } + } + + private Table decodeInsert(ByteBuffer buffer) { + //Int32 与关系消息中的ID对应的关系的ID + int relationId = buffer.getInt(); + //Byte1('N') 将以下TupleData消息标识为新元组 + char tupleType = (char) buffer.get(); + //TupleData TupleData消息部分表示新元组的内容 + Object[] newData = resolveColumnsFromStreamTupleData(buffer); + Table table = tableMap.get(relationId); + table.setOldData(new Object[newData.length]); + table.setNewData(newData); + table.setCurrentLsn(currentLsn); + table.setTs(ts); + return table; + } + + private Table decodeUpdate(ByteBuffer buffer) throws SQLException { + //Int32 与关系消息中的ID对应的关系的ID + int relationId = buffer.getInt(); + Table table = tableMap.get(relationId); + //Byte1('K') 将以下TupleData子消息标识为键。该字段是可选的, 并且只有在更新改变了REPLICA IDENTITY索引一部分的任何一列中的数据时才存在 + //Byte1('O') 将以下TupleData子消息标识为旧元组。此字段是可选的, 并且仅当发生更新的表的REPLICA IDENTITY设置为FULL时才存在 + //更新消息可以包含'K'消息部分或者'O'消息部分或者都不包含它们,但不同时包括它们两者 + char tupleType = (char) buffer.get(); + if ('O' == tupleType || 'K' == tupleType) { + //TupleData TupleData消息部分表示旧元组或主键的内容。 只有在前面的'O'或'K'部分存在时才存在 + Object[] oldData = resolveColumnsFromStreamTupleData(buffer); + table.setOldData(oldData); + // Read the 'N' tuple type + // This is necessary so the stream position is 
accurate for resolving the column tuple data + //Byte1('N') 将以下TupleData消息标识为新元组 + tupleType = (char) buffer.get(); + } + //TupleData TupleData消息部分表示新元组的内容 + Object[] newData = resolveColumnsFromStreamTupleData(buffer); + table.setNewData(newData); + table.setCurrentLsn(currentLsn); + table.setTs(ts); + return table; + } + + private Table decodeDelete(ByteBuffer buffer) throws SQLException { + //Int32 与关系消息中的ID对应的关系的ID + int relationId = buffer.getInt(); + Table table = tableMap.get(relationId); + //Byte1('K') 将以下TupleData子消息标识为键。 如果发生删除的表使用索引作为REPLICA IDENTITY,则此字段存在 + //Byte1('O') 将以下TupleData消息标识为旧元组。 如果发生删除的表的REPLICA IDENTITY设置为FULL,则此字段存在 + //删除消息可能包含'K'消息部分或'O'消息部分,但不会同时包含这两个部分 + char tupleType = (char) buffer.get(); + //TupleData TupleData消息部分,表示旧元组或主键的内容,具体取决于前一个字段 + Object[] oldData = resolveColumnsFromStreamTupleData(buffer); + table.setOldData(oldData); + table.setNewData(new Object[oldData.length]); + table.setCurrentLsn(currentLsn); + table.setTs(ts); + return table; + } + + private Object[] resolveColumnsFromStreamTupleData(ByteBuffer buffer) { + //Int16 列数 + short numberOfColumns = buffer.getShort(); + Object[] data = new Object[numberOfColumns]; + for (int i = 0; i < numberOfColumns; i++) { + + //Byte1('n') 将数据标识为NULL值 + //Byte1('u') 识别未更改的TOASTed值(实际值未发送) + //Byte1('t') 将数据标识为文本格式的值 + char type = (char) buffer.get(); + if (type == 't') { + data[i] = readColumnValueAsString(buffer); + } else if (type == 'u') { + data[i] = null; + } else if (type == 'n') { + data[i] = null; + } + } + return data; + } + + +} diff --git a/flinkx-pgwal/flinkx-pgwal-core/src/main/java/com/dtstack/flinkx/pgwal/PgMessageTypeEnum.java b/flinkx-pgwal/flinkx-pgwal-core/src/main/java/com/dtstack/flinkx/pgwal/PgMessageTypeEnum.java new file mode 100644 index 0000000000..c12a614149 --- /dev/null +++ b/flinkx-pgwal/flinkx-pgwal-core/src/main/java/com/dtstack/flinkx/pgwal/PgMessageTypeEnum.java @@ -0,0 +1,51 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package com.dtstack.flinkx.pgwal; + +/** + * Date: 2019/12/14 + * Company: www.dtstack.com + * + * reference to https://github.com/debezium/debezium & http://www.postgres.cn/docs/10/protocol-logicalrep-message-formats.html + * + * @author tudou + */ +public enum PgMessageTypeEnum { + RELATION, + BEGIN, + COMMIT, + INSERT, + UPDATE, + DELETE, + TYPE, + ORIGIN; + + public static PgMessageTypeEnum forType(char type) { + switch (type) { + case 'R': return RELATION; + case 'B': return BEGIN; + case 'C': return COMMIT; + case 'I': return INSERT; + case 'U': return UPDATE; + case 'D': return DELETE; + case 'Y': return TYPE; + case 'O': return ORIGIN; + default: throw new IllegalArgumentException("Unsupported message type: " + type); + } + } +} diff --git a/flinkx-pgwal/flinkx-pgwal-core/src/main/java/com/dtstack/flinkx/pgwal/PgRelicationSlot.java b/flinkx-pgwal/flinkx-pgwal-core/src/main/java/com/dtstack/flinkx/pgwal/PgRelicationSlot.java new file mode 100644 index 0000000000..f002f50d6d --- /dev/null +++ b/flinkx-pgwal/flinkx-pgwal-core/src/main/java/com/dtstack/flinkx/pgwal/PgRelicationSlot.java @@ -0,0 +1,161 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package com.dtstack.flinkx.pgwal; + +/** + * Date: 2019/12/13 + * Company: www.dtstack.com + * + * @author tudou + */ +public class PgRelicationSlot { + private String slotName; + private String plugin; + private String slotType; + private Integer datoid; + private String database; + private String temporary; + private String active; + private Integer activePid; + private String xmin; + private String catalogXmin; + private String restartLsn; + private String confirmedFlushLsn; + + public boolean isActive(){ + return "t".equalsIgnoreCase(active); + } + + public boolean isNotActive(){ + return !isActive(); + } + + public String getSlotName() { + return slotName; + } + + public void setSlotName(String slotName) { + this.slotName = slotName; + } + + public String getPlugin() { + return plugin; + } + + public void setPlugin(String plugin) { + this.plugin = plugin; + } + + public String getSlotType() { + return slotType; + } + + public void setSlotType(String slotType) { + this.slotType = slotType; + } + + public Integer getDatoid() { + return datoid; + } + + public void setDatoid(Integer datoid) { + this.datoid = datoid; + } + + public String getDatabase() { + return database; + } + + public void setDatabase(String database) { + this.database = database; + } + + public String getTemporary() { + return temporary; + } + + public void setTemporary(String temporary) { + this.temporary = temporary; + } + + public String getActive() { + return active; + } + + public void setActive(String active) { + this.active = active; + } + + public Integer getActivePid() { + return activePid; + } + + public void setActivePid(Integer activePid) { + this.activePid = activePid; + } + + public String getXmin() { + return xmin; + } + + public void setXmin(String xmin) { + this.xmin = xmin; + } + + public String getCatalogXmin() { + return catalogXmin; + } + + public void setCatalogXmin(String catalogXmin) { + this.catalogXmin = catalogXmin; + } + + public String getRestartLsn() { + return restartLsn; + } + + public void setRestartLsn(String restartLsn) { + this.restartLsn = restartLsn; + } + + public String getConfirmedFlushLsn() { + return confirmedFlushLsn; + } + + public void setConfirmedFlushLsn(String confirmedFlushLsn) { + this.confirmedFlushLsn = confirmedFlushLsn; + } + + @Override + public String toString() { + return "PgRelicationSlots{" + + "slotName='" + slotName + '\'' + + ", plugin='" + plugin + '\'' + + ", slotType='" + slotType + '\'' + + ", datoid=" + datoid + + ", database='" + database + '\'' + + ", temporary='" + temporary + '\'' + + ", active='" + active + '\'' + + ", activePid='" + activePid + '\'' + + ", xmin='" + xmin + '\'' + + ", catalogXmin='" + catalogXmin + '\'' + + ", restartLsn='" + restartLsn + '\'' + + ", conFirmedFlushLsn='" + confirmedFlushLsn + '\'' + + '}'; + } +} diff --git a/flinkx-pgwal/flinkx-pgwal-core/src/main/java/com/dtstack/flinkx/pgwal/PgWalConfigKeys.java b/flinkx-pgwal/flinkx-pgwal-core/src/main/java/com/dtstack/flinkx/pgwal/PgWalConfigKeys.java new file mode 100644 index 0000000000..c0d3f0a03f --- /dev/null +++ b/flinkx-pgwal/flinkx-pgwal-core/src/main/java/com/dtstack/flinkx/pgwal/PgWalConfigKeys.java @@ -0,0 +1,51 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. 
The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.dtstack.flinkx.pgwal; + +/** + * Date: 2019/12/13 + * Company: www.dtstack.com + * + * @author tudou + */ +public class PgWalConfigKeys { + public static final String KEY_USER_NAME = "username"; + + public static final String KEY_PASSWORD = "password"; + + public static final String KEY_URL = "url"; + + public final static String KEY_DATABASE_NAME = "databaseName"; + + public final static String KEY_CATALOG = "cat"; + + public final static String KEY_PAVING_DATA = "pavingData"; + + public final static String KEY_TABLE_LIST = "tableList"; + + public final static String KEY_STATUS_INTERVAL = "statusInterval"; + + public final static String KEY_LSN = "lsn"; + + public final static String KEY_SLOT_NAME = "slotName"; + + public final static String KEY_ALLOW_CREATE_SLOT = "allowCreateSlot"; + + public final static String KEY_TEMPORARY = "temporary"; +} diff --git a/flinkx-pgwal/flinkx-pgwal-core/src/main/java/com/dtstack/flinkx/pgwal/PgWalUtil.java b/flinkx-pgwal/flinkx-pgwal-core/src/main/java/com/dtstack/flinkx/pgwal/PgWalUtil.java new file mode 100644 index 0000000000..4f0e9dd212 --- /dev/null +++ b/flinkx-pgwal/flinkx-pgwal-core/src/main/java/com/dtstack/flinkx/pgwal/PgWalUtil.java @@ -0,0 +1,254 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package com.dtstack.flinkx.pgwal; + +import com.dtstack.flinkx.util.ClassUtil; +import com.dtstack.flinkx.util.ExceptionUtil; +import com.dtstack.flinkx.util.TelnetUtil; +import org.postgresql.PGProperty; +import org.postgresql.core.ServerVersion; +import org.postgresql.jdbc.PgConnection; +import org.postgresql.replication.ReplicationSlotInfo; +import org.postgresql.replication.fluent.logical.ChainedLogicalCreateSlotBuilder; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.sql.*; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import java.util.Properties; + +/** + * Date: 2019/12/13 + * Company: www.dtstack.com + * + * @author tudou + */ +public class PgWalUtil { + + public static final String DRIVER = "org.postgresql.Driver"; + public static final String SLOT_PRE = "flinkx_"; + public static final String PUBLICATION_NAME = "dtstack_flinkx"; + public static final String QUERY_LEVEL = "SHOW wal_level;"; + public static final String QUERY_MAX_SLOT = "SHOW max_replication_slots;"; + public static final String QUERY_SLOT = "SELECT * FROM pg_replication_slots;"; + public static final String QUERY_TABLE_REPLICA_IDENTITY = "SELECT relreplident FROM pg_catalog.pg_class c LEFT JOIN pg_catalog.pg_namespace n ON c.relnamespace=n.oid WHERE n.nspname='%s' and c.relname='%s';"; + public static final String UPDATE_REPLICA_IDENTITY = "ALTER TABLE %s REPLICA IDENTITY FULL;"; + public static final String QUERY_PUBLICATION = "SELECT COUNT(1) FROM pg_publication WHERE pubname = '%s';"; + public static final String CREATE_PUBLICATION = "CREATE PUBLICATION %s FOR ALL TABLES;"; + public static final String QUERY_TYPES = "SELECT t.oid AS oid, t.typname AS name FROM pg_catalog.pg_type t JOIN pg_catalog.pg_namespace n ON (t.typnamespace = n.oid) WHERE n.nspname != 'pg_toast' AND t.typcategory <> 'A';"; + private static final Logger LOG = LoggerFactory.getLogger(PgWalUtil.class); + + public static PgRelicationSlot checkPostgres(PgConnection conn, boolean allowCreateSlot, String slotName, List tableList) throws Exception{ + ResultSet resultSet; + PgRelicationSlot availableSlot = null; + + //1. check postgres version + // this Judge maybe not need? + if (!conn.haveMinimumServerVersion(ServerVersion.v10)){ + String version = conn.getDBVersionNumber(); + LOG.error("postgres version must > 10, current = [{}]", version); + throw new UnsupportedOperationException("postgres version must >= 10, current = " + version); + } + + //2. 
check postgres wal_level + resultSet = conn.execSQLQuery(QUERY_LEVEL); + resultSet.next(); + String wal_level = resultSet.getString(1); + if(!"logical".equalsIgnoreCase(wal_level)){ + LOG.error("postgres wal_level must be logical, current = [{}]", wal_level); + throw new UnsupportedOperationException("postgres wal_level must be logical, current = " + wal_level); + } + + //3.check postgres slot + resultSet = conn.execSQLQuery(QUERY_MAX_SLOT); + resultSet.next(); + int maxSlot = resultSet.getInt(1); + int slotCount = 0; + resultSet = conn.execSQLQuery(QUERY_SLOT); + while(resultSet.next()){ + PgRelicationSlot slot = new PgRelicationSlot(); + String name = resultSet.getString("slot_name"); + slot.setSlotName(name); + slot.setActive(resultSet.getString("active")); + + if(name.equalsIgnoreCase(slotName) && slot.isNotActive()){ + slot.setPlugin(resultSet.getString("plugin")); + slot.setSlotType(resultSet.getString("slot_type")); + slot.setDatoid(resultSet.getInt("datoid")); + slot.setDatabase(resultSet.getString("database")); + slot.setTemporary(resultSet.getString("temporary")); + slot.setActivePid(resultSet.getInt("active_pid")); + slot.setXmin(resultSet.getString("xmin")); + slot.setCatalogXmin(resultSet.getString("catalog_xmin")); + slot.setRestartLsn(resultSet.getString("restart_lsn")); + slot.setConfirmedFlushLsn(resultSet.getString("confirmed_flush_lsn")); + availableSlot = slot; + break; + } + slotCount++; + } + + if(availableSlot == null){ + if(!allowCreateSlot){ + String msg = String.format("there is no available slot named [%s], please check whether slotName[%s] is correct, or set allowCreateSlot = true", slotName, slotName); + LOG.error(msg); + throw new UnsupportedOperationException(msg); + }else if(slotCount >= maxSlot){ + LOG.error("the number of slot reaches max_replication_slots[{}], please turn up max_replication_slots or remove unused slot", maxSlot); + throw new UnsupportedOperationException("the number of slot reaches max_replication_slots[" + maxSlot + "], please turn up max_replication_slots or remove unused slot"); + } + } + + //4.check table replica identity + for (String table : tableList) { + //schema.tableName + String[] tables = table.split("\\."); + resultSet = conn.execSQLQuery(String.format(QUERY_TABLE_REPLICA_IDENTITY, tables[0], tables[1])); + resultSet.next(); + String identity = parseReplicaIdentity(resultSet.getString(1)); + if(!"full".equals(identity)){ + LOG.warn("update {} replica identity, set {} to full", table, identity); + conn.createStatement().execute(String.format(UPDATE_REPLICA_IDENTITY, table)); + } + } + + //5.check publication + resultSet = conn.execSQLQuery(String.format(QUERY_PUBLICATION, PUBLICATION_NAME)); + resultSet.next(); + long count = resultSet.getLong(1); + if(count == 0L){ + LOG.warn("no publication named [{}] existed, flinkx will create one", PUBLICATION_NAME); + conn.createStatement().execute(String.format(CREATE_PUBLICATION, PUBLICATION_NAME)); + } + + closeDBResources(resultSet, null, null, false); + return availableSlot; + } + + public static PgRelicationSlot createSlot(PgConnection conn, String slotName, boolean temporary) throws SQLException{ + ChainedLogicalCreateSlotBuilder builder = conn.getReplicationAPI() + .createReplicationSlot() + .logical() + .withSlotName(slotName) + .withOutputPlugin("pgoutput"); + if(temporary){ + builder.withTemporaryOption(); + } + ReplicationSlotInfo replicationSlotInfo = builder.make(); + PgRelicationSlot slot = new PgRelicationSlot(); + slot.setSlotName(slotName); + 
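+        // assumption based on the pgjdbc replication API: the consistent point reported
+        // for a freshly created slot is the first LSN from which it can safely stream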
slot.setConfirmedFlushLsn(replicationSlotInfo.getConsistentPoint().asString()); + slot.setPlugin(replicationSlotInfo.getOutputPlugin()); + return slot; + } + + public static Map queryTypes(PgConnection conn) throws SQLException{ + Map map = new HashMap<>(512); + ResultSet resultSet = conn.execSQLQuery(QUERY_TYPES); + while (resultSet.next()){ + int oid = (int) resultSet.getLong("oid"); + String typeName = resultSet.getString("name"); + map.put(oid, typeName); + } + closeDBResources(resultSet, null, null, false); + return map; + } + + public static String parseReplicaIdentity(String s) { + switch (s) { + case "n": + return "nothing"; + case "d": + return "default"; + case "i" : + return "index"; + case "f" : + return "full"; + default: + return "unknown"; + } + } + + /** + * 获取jdbc连接(超时10S) + * @param url url + * @param username 账号 + * @param password 密码 + * @return + * @throws SQLException + */ + public static PgConnection getConnection(String url, String username, String password) throws SQLException { + Connection dbConn; + ClassUtil.forName(PgWalUtil.DRIVER, PgWalUtil.class.getClassLoader()); + Properties props = new Properties(); + PGProperty.USER.set(props, username); + PGProperty.PASSWORD.set(props, password); + PGProperty.REPLICATION.set(props, "database"); + PGProperty.PREFER_QUERY_MODE.set(props, "simple"); + //postgres version must > 10 + PGProperty.ASSUME_MIN_SERVER_VERSION.set(props, "10"); + synchronized (ClassUtil.LOCK_STR) { + DriverManager.setLoginTimeout(10); + // telnet + TelnetUtil.telnet(url); + dbConn = DriverManager.getConnection(url, props); + } + + return dbConn.unwrap(PgConnection.class); + } + + /** + * 关闭连接资源 + * + * @param rs ResultSet + * @param stmt Statement + * @param conn Connection + * @param commit + */ + public static void closeDBResources(ResultSet rs, Statement stmt, Connection conn, boolean commit) { + if (null != rs) { + try { + rs.close(); + } catch (SQLException e) { + LOG.warn("Close resultSet error: {}", ExceptionUtil.getErrorMessage(e)); + } + } + + if (null != stmt) { + try { + stmt.close(); + } catch (SQLException e) { + LOG.warn("Close statement error:{}", ExceptionUtil.getErrorMessage(e)); + } + } + + if (null != conn) { + try { + if (commit && !conn.isClosed()) { + conn.commit(); + } + conn.close(); + } catch (SQLException e) { + LOG.warn("Close connection error:{}", ExceptionUtil.getErrorMessage(e)); + } + } + } + +} diff --git a/flinkx-pgwal/flinkx-pgwal-core/src/main/java/com/dtstack/flinkx/pgwal/Table.java b/flinkx-pgwal/flinkx-pgwal-core/src/main/java/com/dtstack/flinkx/pgwal/Table.java new file mode 100644 index 0000000000..442ae67718 --- /dev/null +++ b/flinkx-pgwal/flinkx-pgwal-core/src/main/java/com/dtstack/flinkx/pgwal/Table.java @@ -0,0 +1,124 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+ * See the License for the specific language governing permissions and + * limitations under the License. + */ +package com.dtstack.flinkx.pgwal; + +import com.dtstack.flinkx.reader.MetaColumn; +import org.postgresql.replication.LogSequenceNumber; + +import java.util.List; + +/** + * Date: 2019/12/14 + * Company: www.dtstack.com + * + * @author tudou + */ +public class Table { + private String id; + private String schema; + private String table; + private List columnList; + private Object[] oldData; + private Object[] newData; + private PgMessageTypeEnum type; + + private long currentLsn; + private long ts; + + public Table(String schema, String table, List columnList) { + this.schema = schema; + this.table = table; + this.columnList = columnList; + this.id = schema + "." + table; + } + + public Table() { + } + + public String getId() { + return id; + } + + public void setId(String id) { + this.id = id; + } + + public String getSchema() { + return schema; + } + + public void setSchema(String schema) { + this.schema = schema; + } + + public String getTable() { + return table; + } + + public void setTable(String table) { + this.table = table; + } + + public List getColumnList() { + return columnList; + } + + public void setColumnList(List columnList) { + this.columnList = columnList; + } + + public Object[] getOldData() { + return oldData; + } + + public void setOldData(Object[] oldData) { + this.oldData = oldData; + } + + public Object[] getNewData() { + return newData; + } + + public void setNewData(Object[] newData) { + this.newData = newData; + } + + public PgMessageTypeEnum getType() { + return type; + } + + public void setType(PgMessageTypeEnum type) { + this.type = type; + } + + public long getCurrentLsn() { + return currentLsn; + } + + public void setCurrentLsn(long currentLsn) { + this.currentLsn = currentLsn; + } + + public long getTs() { + return ts; + } + + public void setTs(long ts) { + this.ts = ts; + } +} diff --git a/flinkx-pgwal/flinkx-pgwal-reader/pom.xml b/flinkx-pgwal/flinkx-pgwal-reader/pom.xml new file mode 100644 index 0000000000..363722cc11 --- /dev/null +++ b/flinkx-pgwal/flinkx-pgwal-reader/pom.xml @@ -0,0 +1,69 @@ + + + + flinkx-pgwal + com.dtstack.flinkx + 1.6 + + 4.0.0 + + flinkx-pgwal-reader + + + + com.dtstack.flinkx + flinkx-pgwal-core + 1.6 + + + + + + + org.apache.maven.plugins + maven-shade-plugin + 3.1.0 + + + package + + shade + + + false + + + + + + + maven-antrun-plugin + 1.2 + + + copy-resources + + package + + run + + + + + + + + + + + + + + + + + + \ No newline at end of file diff --git a/flinkx-pgwal/flinkx-pgwal-reader/src/main/java/com/dtstack/flinkx/pgwal/format/PgWalInputFormat.java b/flinkx-pgwal/flinkx-pgwal-reader/src/main/java/com/dtstack/flinkx/pgwal/format/PgWalInputFormat.java new file mode 100644 index 0000000000..aed7f96f48 --- /dev/null +++ b/flinkx-pgwal/flinkx-pgwal-reader/src/main/java/com/dtstack/flinkx/pgwal/format/PgWalInputFormat.java @@ -0,0 +1,202 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. 
You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package com.dtstack.flinkx.pgwal.format;
+
+import com.dtstack.flinkx.inputformat.BaseRichInputFormat;
+import com.dtstack.flinkx.pgwal.PgRelicationSlot;
+import com.dtstack.flinkx.pgwal.PgWalUtil;
+import com.dtstack.flinkx.pgwal.listener.PgWalListener;
+import com.dtstack.flinkx.restore.FormatState;
+import com.dtstack.flinkx.util.ExceptionUtil;
+import org.apache.commons.lang.StringUtils;
+import org.apache.flink.core.io.GenericInputSplit;
+import org.apache.flink.core.io.InputSplit;
+import org.apache.flink.types.Row;
+import org.postgresql.jdbc.PgConnection;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.BlockingQueue;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.SynchronousQueue;
+
+/**
+ * Date: 2019/12/13
+ * Company: www.dtstack.com
+ *
+ * @author tudou
+ */
+public class PgWalInputFormat extends BaseRichInputFormat {
+    protected String username;
+    protected String password;
+    protected String url;
+    protected String databaseName;
+    protected boolean pavingData = false;
+    protected List<String> tableList;
+    protected String cat;
+    protected Integer statusInterval;
+    protected Long lsn;
+    protected String slotName;
+    protected boolean allowCreateSlot;
+    protected boolean temporary;
+
+    private PgConnection conn;
+    private volatile long startLsn;
+
+    private transient BlockingQueue<Map<String, Object>> queue;
+    private transient ExecutorService executor;
+    private volatile boolean running = false;
+
+    @Override
+    public void openInputFormat() throws IOException{
+        super.openInputFormat();
+        executor = Executors.newFixedThreadPool(1);
+        queue = new SynchronousQueue<>(true);
+    }
+
+    @Override
+    protected void openInternal(InputSplit inputSplit) throws IOException {
+        if (inputSplit.getSplitNumber() != 0) {
+            LOG.info("PgWalInputFormat openInternal split number:{} abort...", inputSplit.getSplitNumber());
+            return;
+        }
+        LOG.info("PgWalInputFormat openInternal split number:{} start...", inputSplit.getSplitNumber());
+        try {
+            conn = PgWalUtil.getConnection(url, username, password);
+            if (StringUtils.isBlank(slotName)) {
+                slotName = PgWalUtil.SLOT_PRE + jobId;
+            }
+            PgRelicationSlot availableSlot = PgWalUtil.checkPostgres(conn, allowCreateSlot, slotName, tableList);
+            if (availableSlot == null) {
+                PgWalUtil.createSlot(conn, slotName, temporary);
+            }
+            if (lsn != 0) {
+                startLsn = lsn;
+            } else if (formatState != null && formatState.getState() != null) {
+                startLsn = (long) formatState.getState();
+            }
+
+            executor.submit(new PgWalListener(this));
+            running = true;
+        } catch (Exception e) {
+            LOG.error("PgWalInputFormat open() failed, e = {}", ExceptionUtil.getErrorMessage(e));
+            throw new RuntimeException("PgWalInputFormat open() failed, e = " + ExceptionUtil.getErrorMessage(e));
+        }
+        LOG.info("PgWalInputFormat[{}]open: end", jobName);
+    }
+
+    @Override
+    protected Row nextRecordInternal(Row row) throws IOException {
+        try {
+            Map<String, Object> map = queue.take();
+            if (map.size() == 1) {
+                throw new IOException((String) map.get("e"));
+            } else {
+                startLsn = (long) map.get("lsn");
+                row = Row.of(map);
+            }
+        } catch (InterruptedException e) {
+            LOG.error("takeEvent interrupted error:{}", ExceptionUtil.getErrorMessage(e));
+        }
+        return row;
+    }
+
+    @Override
+    public FormatState getFormatState() {
+        if (!restoreConfig.isRestore()) {
+            LOG.info("return null for formatState");
+            return null;
+        }
+
+        super.getFormatState();
+        if (formatState != null) {
+            formatState.setState(startLsn);
+        }
+        return formatState;
+    }
+
+    @Override
+    protected void closeInternal() throws IOException {
+        if (running) {
+            executor.shutdownNow();
+            running = false;
+            LOG.warn("shutdown PgWalListener......");
+        }
+    }
+
+    @Override
+    public InputSplit[] createInputSplitsInternal(int minNumSplits) throws IOException {
+        InputSplit[] splits = new InputSplit[minNumSplits];
+        for (int i = 0; i < minNumSplits; i++) {
+            splits[i] = new GenericInputSplit(i, minNumSplits);
+        }
+        return splits;
+    }
+
+    @Override
+    public boolean reachedEnd() throws IOException {
+        return false;
+    }
+
+    public void processEvent(Map<String, Object> event) {
+        try {
+            queue.put(event);
+        } catch (InterruptedException e) {
+            LOG.error("takeEvent interrupted event:{} error:{}", event, ExceptionUtil.getErrorMessage(e));
+        }
+    }
+
+    public boolean isPavingData() {
+        return pavingData;
+    }
+
+    public List<String> getTableList() {
+        return tableList;
+    }
+
+    public String getCat() {
+        return cat;
+    }
+
+    public Integer getStatusInterval() {
+        return statusInterval;
+    }
+
+    public String getSlotName() {
+        return slotName;
+    }
+
+    public PgConnection getConn() {
+        return conn;
+    }
+
+    public long getStartLsn() {
+        return startLsn;
+    }
+
+    public boolean isRunning() {
+        return running;
+    }
+}
\ No newline at end of file
diff --git a/flinkx-pgwal/flinkx-pgwal-reader/src/main/java/com/dtstack/flinkx/pgwal/format/PgWalInputFormatBuilder.java b/flinkx-pgwal/flinkx-pgwal-reader/src/main/java/com/dtstack/flinkx/pgwal/format/PgWalInputFormatBuilder.java
new file mode 100644
index 0000000000..2beae76f04
--- /dev/null
+++ b/flinkx-pgwal/flinkx-pgwal-reader/src/main/java/com/dtstack/flinkx/pgwal/format/PgWalInputFormatBuilder.java
@@ -0,0 +1,113 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */ + +package com.dtstack.flinkx.pgwal.format; + +import com.dtstack.flinkx.inputformat.BaseRichInputFormatBuilder; +import org.apache.commons.collections.CollectionUtils; +import org.apache.commons.lang3.StringUtils; + +import java.util.List; + +/** + * Date: 2019/12/13 + * Company: www.dtstack.com + * + * @author tudou + */ +public class PgWalInputFormatBuilder extends BaseRichInputFormatBuilder { + + protected PgWalInputFormat format; + + public PgWalInputFormatBuilder() { + super.format = this.format = new PgWalInputFormat(); + } + + public void setUsername(String username) { + format.username = username; + } + + public void setPassword(String password) { + format.password = password; + } + + public void setUrl(String url) { + format.url = url; + } + + public void setDatabaseName(String databaseName) { + format.databaseName = databaseName; + } + + public void setPavingData(boolean pavingData) { + format.pavingData = pavingData; + } + + public void setTableList(List tableList) { + format.tableList = tableList; + } + + public void setCat(String cat) { + format.cat = cat; + } + + public void setStatusInterval(Integer statusInterval) { + format.statusInterval = statusInterval; + } + + public void setLsn(Long lsn) { + format.lsn = lsn; + } + + public void setAllowCreateSlot(Boolean allowCreateSlot) { + format.allowCreateSlot = allowCreateSlot; + } + + public void setSlotName(String slotName) { + format.slotName = slotName; + } + + public void setTemporary(Boolean temporary) { + format.temporary = temporary; + } + + @Override + protected void checkFormat() { + if (StringUtils.isBlank(format.username)) { + throw new IllegalArgumentException("No username supplied"); + } + if (StringUtils.isBlank(format.password)) { + throw new IllegalArgumentException("No password supplied"); + } + if (StringUtils.isBlank(format.url)) { + throw new IllegalArgumentException("No url supplied"); + } + if (StringUtils.isBlank(format.databaseName)) { + throw new IllegalArgumentException("No databaseName supplied"); + } + if (CollectionUtils.isEmpty(format.tableList)) { + throw new IllegalArgumentException("No tableList supplied"); + } + if (StringUtils.isBlank(format.cat)) { + throw new IllegalArgumentException("No cat supplied"); + } + if(!format.allowCreateSlot && StringUtils.isBlank(format.slotName)){ + throw new IllegalArgumentException("slotName can not be null if allowCreateSlot is false"); + } + } +} \ No newline at end of file diff --git a/flinkx-pgwal/flinkx-pgwal-reader/src/main/java/com/dtstack/flinkx/pgwal/listener/PgWalListener.java b/flinkx-pgwal/flinkx-pgwal-reader/src/main/java/com/dtstack/flinkx/pgwal/listener/PgWalListener.java new file mode 100644 index 0000000000..8e5efb743f --- /dev/null +++ b/flinkx-pgwal/flinkx-pgwal-reader/src/main/java/com/dtstack/flinkx/pgwal/listener/PgWalListener.java @@ -0,0 +1,148 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.dtstack.flinkx.pgwal.listener; + +import com.dtstack.flinkx.pgwal.PgDecoder; +import com.dtstack.flinkx.pgwal.PgWalUtil; +import com.dtstack.flinkx.pgwal.Table; +import com.dtstack.flinkx.pgwal.format.PgWalInputFormat; +import com.dtstack.flinkx.reader.MetaColumn; +import com.dtstack.flinkx.util.ExceptionUtil; +import com.google.gson.Gson; +import org.apache.commons.lang3.StringUtils; +import org.postgresql.jdbc.PgConnection; +import org.postgresql.replication.LogSequenceNumber; +import org.postgresql.replication.PGReplicationStream; +import org.postgresql.replication.fluent.logical.ChainedLogicalStreamBuilder; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.nio.ByteBuffer; +import java.util.*; +import java.util.concurrent.TimeUnit; + +/** + * Date: 2019/12/14 + * Company: www.dtstack.com + * + * @author tudou + */ +public class PgWalListener implements Runnable { + private static final Logger LOG = LoggerFactory.getLogger(PgWalListener.class); + private static Gson gson = new Gson(); + + private PgWalInputFormat format; + private PgConnection conn; + private Set tableSet; + private Set cat; + private boolean pavingData; + + private PGReplicationStream stream; + private PgDecoder decoder; + + public PgWalListener(PgWalInputFormat format) { + this.format = format; + this.conn = format.getConn(); + this.tableSet = new HashSet<>(format.getTableList()); + this.cat = new HashSet<>(); + for (String type : format.getCat().split(",")) { + cat.add(type.toLowerCase()); + } + this.pavingData = format.isPavingData(); + } + + public void init() throws Exception{ + decoder = new PgDecoder(PgWalUtil.queryTypes(conn)); + ChainedLogicalStreamBuilder builder = conn.getReplicationAPI() + .replicationStream() + .logical() + .withSlotName(format.getSlotName()) + //协议版本。当前仅支持版本1 + .withSlotOption("proto_version", "1") + //逗号分隔的要订阅的发布名称列表(接收更改)。 单个发布名称被视为标准对象名称,并可根据需要引用 + .withSlotOption("publication_names", PgWalUtil.PUBLICATION_NAME) + .withStatusInterval(format.getStatusInterval(), TimeUnit.MILLISECONDS); + long lsn = format.getStartLsn(); + if(lsn != 0){ + builder.withStartPosition(LogSequenceNumber.valueOf(lsn)); + } + stream = builder.start(); + TimeUnit.SECONDS.sleep(1); + stream.forceUpdateStatus(); + LOG.info("init PGReplicationStream successfully..."); + } + + @Override + public void run() { + LOG.info("PgWalListener start running....."); + try { + init(); + while (format.isRunning()) { + ByteBuffer buffer = stream.readPending(); + if (buffer == null) { + continue; + } + Table table = decoder.decode(buffer); + if(StringUtils.isBlank(table.getId())){ + continue; + } + String type = table.getType().name().toLowerCase(); + if(!cat.contains(type)){ + continue; + } + if(!tableSet.contains(table.getId())){ + continue; + } + LOG.trace("table = {}",gson.toJson(table)); + Map map = new LinkedHashMap<>(); + map.put("type", type); + map.put("schema", table.getSchema()); + map.put("table", table.getTable()); + map.put("lsn", table.getCurrentLsn()); + map.put("ts", table.getTs()); + map.put("ingestion", System.nanoTime()); + if(pavingData){ + int i = 0; + 
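+                    // pavingData=true flattens the event: each column value becomes a
+                    // top-level "before_<name>"/"after_<name>" entry instead of nested maps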
+                if(pavingData){
+                    int i = 0;
+                    for (MetaColumn column : table.getColumnList()) {
+                        map.put("before_" + column.getName(), table.getOldData()[i]);
+                        map.put("after_" + column.getName(), table.getNewData()[i]);
+                        i++;
+                    }
+                }else {
+                    Map<String, Object> before = new LinkedHashMap<>();
+                    Map<String, Object> after = new LinkedHashMap<>();
+                    int i = 0;
+                    for (MetaColumn column : table.getColumnList()) {
+                        before.put(column.getName(), table.getOldData()[i]);
+                        after.put(column.getName(), table.getNewData()[i]);
+                        i++;
+                    }
+                    map.put("before", before);
+                    map.put("after", after);
+                }
+                format.processEvent(map);
+            }
+        }catch (Exception e){
+            String errorMessage = ExceptionUtil.getErrorMessage(e);
+            LOG.error(errorMessage);
+            format.processEvent(Collections.singletonMap("e", errorMessage));
+
+        }
+    }
+}
diff --git a/flinkx-pgwal/flinkx-pgwal-reader/src/main/java/com/dtstack/flinkx/pgwal/reader/PgwalReader.java b/flinkx-pgwal/flinkx-pgwal-reader/src/main/java/com/dtstack/flinkx/pgwal/reader/PgwalReader.java
new file mode 100644
index 0000000000..5dcfaf888c
--- /dev/null
+++ b/flinkx-pgwal/flinkx-pgwal-reader/src/main/java/com/dtstack/flinkx/pgwal/reader/PgwalReader.java
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package com.dtstack.flinkx.pgwal.reader;
+
+import com.dtstack.flinkx.config.DataTransferConfig;
+import com.dtstack.flinkx.config.ReaderConfig;
+import com.dtstack.flinkx.pgwal.PgWalConfigKeys;
+import com.dtstack.flinkx.pgwal.format.PgWalInputFormatBuilder;
+import com.dtstack.flinkx.reader.BaseDataReader;
+import org.apache.flink.streaming.api.datastream.DataStream;
+import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
+import org.apache.flink.types.Row;
+
+import java.util.List;
+
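+// A job config sketch for this reader. The JSON key names here are assumed from the
+// PgWalConfigKeys constants used in the constructor below; all values are illustrative only:
+// {
+//   "name": "pgwalreader",
+//   "parameter": {
+//     "username": "postgres", "password": "***",
+//     "url": "jdbc:postgresql://localhost:5432/test", "databaseName": "test",
+//     "cat": "insert,update,delete", "pavingData": true, "tableList": ["public.user"],
+//     "statusInterval": 20000, "lsn": 0, "slotName": "flinkx_slot",
+//     "allowCreateSlot": true, "temporary": true
+//   }
+// }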
+/**
+ * Date: 2019/12/13
+ * Company: www.dtstack.com
+ *
+ * @author tudou
+ */
+public class PgwalReader extends BaseDataReader {
+    private String username;
+    private String password;
+    private String url;
+    private String databaseName;
+    private String cat;
+    private boolean pavingData;
+    private List<String> tableList;
+    private Integer statusInterval;
+    private Long lsn;
+    private String slotName;
+    private boolean allowCreateSlot;
+    private boolean temporary;
+
+    @SuppressWarnings("unchecked")
+    public PgwalReader(DataTransferConfig config, StreamExecutionEnvironment env) {
+        super(config, env);
+        ReaderConfig readerConfig = config.getJob().getContent().get(0).getReader();
+        username = readerConfig.getParameter().getStringVal(PgWalConfigKeys.KEY_USER_NAME);
+        password = readerConfig.getParameter().getStringVal(PgWalConfigKeys.KEY_PASSWORD);
+        url = readerConfig.getParameter().getStringVal(PgWalConfigKeys.KEY_URL);
+        databaseName = readerConfig.getParameter().getStringVal(PgWalConfigKeys.KEY_DATABASE_NAME);
+        cat = readerConfig.getParameter().getStringVal(PgWalConfigKeys.KEY_CATALOG);
+        pavingData = readerConfig.getParameter().getBooleanVal(PgWalConfigKeys.KEY_PAVING_DATA, false);
+        tableList = (List<String>) readerConfig.getParameter().getVal(PgWalConfigKeys.KEY_TABLE_LIST);
+        statusInterval = readerConfig.getParameter().getIntVal(PgWalConfigKeys.KEY_STATUS_INTERVAL, 20000);
+        lsn = readerConfig.getParameter().getLongVal(PgWalConfigKeys.KEY_LSN, 0);
+        slotName = readerConfig.getParameter().getStringVal(PgWalConfigKeys.KEY_SLOT_NAME);
+        allowCreateSlot = readerConfig.getParameter().getBooleanVal(PgWalConfigKeys.KEY_ALLOW_CREATE_SLOT, true);
+        temporary = readerConfig.getParameter().getBooleanVal(PgWalConfigKeys.KEY_TEMPORARY, true);
+    }
+
+    @Override
+    public DataStream<Row> readData() {
+        PgWalInputFormatBuilder builder = new PgWalInputFormatBuilder();
+        builder.setUsername(username);
+        builder.setPassword(password);
+        builder.setUrl(url);
+        builder.setDatabaseName(databaseName);
+        builder.setCat(cat);
+        builder.setPavingData(pavingData);
+        builder.setTableList(tableList);
+        builder.setRestoreConfig(restoreConfig);
+        builder.setStatusInterval(statusInterval);
+        builder.setLsn(lsn);
+        builder.setSlotName(slotName);
+        builder.setAllowCreateSlot(allowCreateSlot);
+        builder.setTemporary(temporary);
+        return createInput(builder.finish(), "pgwalreader");
+    }
+}
diff --git a/flinkx-metadata-hive1/pom.xml b/flinkx-pgwal/pom.xml
similarity index 85%
rename from flinkx-metadata-hive1/pom.xml
rename to flinkx-pgwal/pom.xml
index cb5e94d91c..01b1f0d7c8 100644
--- a/flinkx-metadata-hive1/pom.xml
+++ b/flinkx-pgwal/pom.xml
@@ -8,12 +8,15 @@ 1.6 4.0.0 - - flinkx-metadata-hive1 pom + + flinkx-pgwal + - flinkx-metadata-hive1-reader + flinkx-pgwal-core + flinkx-pgwal-reader + + com.dtstack.flinkx
diff --git a/flinkx-phoenix/flinkx-phoenix-reader/pom.xml b/flinkx-phoenix/flinkx-phoenix-reader/pom.xml
index 8729b10ac9..195da1bee4 100644
--- a/flinkx-phoenix/flinkx-phoenix-reader/pom.xml
+++ 
b/flinkx-phoenix/flinkx-phoenix-reader/pom.xml @@ -91,7 +91,7 @@ + tofile="${basedir}/../../syncplugins/phoenixreader/${project.name}-${package.name}.jar" /> diff --git a/flinkx-phoenix/flinkx-phoenix-reader/src/main/java/com/dtstack/flinkx/phoenix/format/PhoenixInputFormat.java b/flinkx-phoenix/flinkx-phoenix-reader/src/main/java/com/dtstack/flinkx/phoenix/format/PhoenixInputFormat.java index 656763e9b5..59602e71ee 100644 --- a/flinkx-phoenix/flinkx-phoenix-reader/src/main/java/com/dtstack/flinkx/phoenix/format/PhoenixInputFormat.java +++ b/flinkx-phoenix/flinkx-phoenix-reader/src/main/java/com/dtstack/flinkx/phoenix/format/PhoenixInputFormat.java @@ -30,6 +30,7 @@ import org.apache.flink.core.io.InputSplit; import org.apache.flink.runtime.execution.librarycache.FlinkUserCodeClassLoaders; import org.apache.flink.types.Row; +import org.apache.flink.util.FlinkUserCodeClassLoader; import sun.misc.URLClassPath; import java.io.IOException; @@ -72,7 +73,7 @@ public void openInternal(InputSplit inputSplit) throws IOException { String[] alwaysParentFirstPatterns = new String[2]; alwaysParentFirstPatterns[0] = "org.apache.flink"; alwaysParentFirstPatterns[1] = "com.dtstack.flinkx"; - URLClassLoader childFirstClassLoader = FlinkUserCodeClassLoaders.childFirst(needJar.toArray(new URL[0]), parentClassLoader, alwaysParentFirstPatterns); + URLClassLoader childFirstClassLoader = FlinkUserCodeClassLoaders.childFirst(needJar.toArray(new URL[0]), parentClassLoader, alwaysParentFirstPatterns, FlinkUserCodeClassLoader.NOOP_EXCEPTION_HANDLER); ClassUtil.forName(driverName, childFirstClassLoader); diff --git a/flinkx-phoenix/flinkx-phoenix-writer/pom.xml b/flinkx-phoenix/flinkx-phoenix-writer/pom.xml index a284085fa6..4f69ff0a3f 100644 --- a/flinkx-phoenix/flinkx-phoenix-writer/pom.xml +++ b/flinkx-phoenix/flinkx-phoenix-writer/pom.xml @@ -91,7 +91,7 @@ + tofile="${basedir}/../../syncplugins/phoenixwriter/${project.name}-${package.name}.jar" /> diff --git a/flinkx-phoenix/flinkx-phoenix-writer/src/main/java/com/dtstack/flinkx/phoenix/format/PhoenixOutputFormat.java b/flinkx-phoenix/flinkx-phoenix-writer/src/main/java/com/dtstack/flinkx/phoenix/format/PhoenixOutputFormat.java index 35597aff05..303e438850 100644 --- a/flinkx-phoenix/flinkx-phoenix-writer/src/main/java/com/dtstack/flinkx/phoenix/format/PhoenixOutputFormat.java +++ b/flinkx-phoenix/flinkx-phoenix-writer/src/main/java/com/dtstack/flinkx/phoenix/format/PhoenixOutputFormat.java @@ -26,6 +26,7 @@ import org.apache.commons.collections.CollectionUtils; import org.apache.commons.io.FilenameUtils; import org.apache.flink.runtime.execution.librarycache.FlinkUserCodeClassLoaders; +import org.apache.flink.util.FlinkUserCodeClassLoader; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import sun.misc.URLClassPath; @@ -66,7 +67,7 @@ protected void openInternal(int taskNumber, int numTasks){ String[] alwaysParentFirstPatterns = new String[2]; alwaysParentFirstPatterns[0] = "org.apache.flink"; alwaysParentFirstPatterns[1] = "com.dtstack.flinkx"; - URLClassLoader childFirstClassLoader = FlinkUserCodeClassLoaders.childFirst(needJar.toArray(new URL[0]), parentClassLoader, alwaysParentFirstPatterns); + URLClassLoader childFirstClassLoader = FlinkUserCodeClassLoaders.childFirst(needJar.toArray(new URL[0]), parentClassLoader, alwaysParentFirstPatterns, FlinkUserCodeClassLoader.NOOP_EXCEPTION_HANDLER); ClassUtil.forName(driverName, childFirstClassLoader); dbConn = PhoenixUtil.getConnectionInternal(dbUrl, username, password, childFirstClassLoader); diff --git 
a/flinkx-phoenix5/flinkx-phoenix5-reader/pom.xml b/flinkx-phoenix5/flinkx-phoenix5-reader/pom.xml index ab6fc2962a..e07787b6cf 100644 --- a/flinkx-phoenix5/flinkx-phoenix5-reader/pom.xml +++ b/flinkx-phoenix5/flinkx-phoenix5-reader/pom.xml @@ -90,7 +90,7 @@ + tofile="${basedir}/../../syncplugins/phoenix5reader/${project.name}-${package.name}.jar" /> diff --git a/flinkx-phoenix5/flinkx-phoenix5-reader/src/main/java/com/dtstack/flinkx/phoenix5/format/Phoenix5InputFormat.java b/flinkx-phoenix5/flinkx-phoenix5-reader/src/main/java/com/dtstack/flinkx/phoenix5/format/Phoenix5InputFormat.java index d960251ab3..10beef7869 100644 --- a/flinkx-phoenix5/flinkx-phoenix5-reader/src/main/java/com/dtstack/flinkx/phoenix5/format/Phoenix5InputFormat.java +++ b/flinkx-phoenix5/flinkx-phoenix5-reader/src/main/java/com/dtstack/flinkx/phoenix5/format/Phoenix5InputFormat.java @@ -35,6 +35,7 @@ import org.apache.flink.core.io.InputSplit; import org.apache.flink.runtime.execution.librarycache.FlinkUserCodeClassLoaders; import org.apache.flink.types.Row; +import org.apache.flink.util.FlinkUserCodeClassLoader; import org.apache.hadoop.conf.Configuration; import org.apache.hadoop.hbase.HBaseConfiguration; import org.apache.hadoop.hbase.HConstants; @@ -268,7 +269,7 @@ protected Connection getConnection() throws SQLException { list.add("org.apache.flink"); list.add("com.dtstack.flinkx"); - childFirstClassLoader = FlinkUserCodeClassLoaders.childFirst(needJar.toArray(new URL[0]), parentClassLoader, list.toArray(new String[0])); + childFirstClassLoader = FlinkUserCodeClassLoaders.childFirst(needJar.toArray(new URL[0]), parentClassLoader, list.toArray(new String[0]), FlinkUserCodeClassLoader.NOOP_EXCEPTION_HANDLER); ClassUtil.forName(driverName, childFirstClassLoader); if(StringUtils.isNotEmpty(username)){ diff --git a/flinkx-phoenix5/flinkx-phoenix5-writer/pom.xml b/flinkx-phoenix5/flinkx-phoenix5-writer/pom.xml index 07912fc063..de68f42658 100644 --- a/flinkx-phoenix5/flinkx-phoenix5-writer/pom.xml +++ b/flinkx-phoenix5/flinkx-phoenix5-writer/pom.xml @@ -91,7 +91,7 @@ + tofile="${basedir}/../../syncplugins/phoenix5writer/${project.name}-${package.name}.jar" /> diff --git a/flinkx-phoenix5/flinkx-phoenix5-writer/src/main/java/com/dtstack/flinkx/phoenix5/format/Phoenix5OutputFormat.java b/flinkx-phoenix5/flinkx-phoenix5-writer/src/main/java/com/dtstack/flinkx/phoenix5/format/Phoenix5OutputFormat.java index 07efa749a3..6957a2c47b 100644 --- a/flinkx-phoenix5/flinkx-phoenix5-writer/src/main/java/com/dtstack/flinkx/phoenix5/format/Phoenix5OutputFormat.java +++ b/flinkx-phoenix5/flinkx-phoenix5-writer/src/main/java/com/dtstack/flinkx/phoenix5/format/Phoenix5OutputFormat.java @@ -27,6 +27,7 @@ import org.apache.commons.io.FilenameUtils; import org.apache.commons.lang3.StringUtils; import org.apache.flink.runtime.execution.librarycache.FlinkUserCodeClassLoaders; +import org.apache.flink.util.FlinkUserCodeClassLoader; import org.apache.phoenix.query.QueryServices; import org.slf4j.Logger; import org.slf4j.LoggerFactory; @@ -68,7 +69,7 @@ protected void openInternal(int taskNumber, int numTasks){ String[] alwaysParentFirstPatterns = new String[2]; alwaysParentFirstPatterns[0] = "org.apache.flink"; alwaysParentFirstPatterns[1] = "com.dtstack.flinkx"; - URLClassLoader childFirstClassLoader = FlinkUserCodeClassLoaders.childFirst(needJar.toArray(new URL[0]), parentClassLoader, alwaysParentFirstPatterns); + URLClassLoader childFirstClassLoader = FlinkUserCodeClassLoaders.childFirst(needJar.toArray(new URL[0]), parentClassLoader, 
alwaysParentFirstPatterns, FlinkUserCodeClassLoader.NOOP_EXCEPTION_HANDLER);
 
         ClassUtil.forName(driverName, childFirstClassLoader);
 
         if(StringUtils.isNotEmpty(username)){
diff --git a/flinkx-phoenix5/pom.xml b/flinkx-phoenix5/pom.xml
index 921299bcb2..cc8aeef584 100644
--- a/flinkx-phoenix5/pom.xml
+++ b/flinkx-phoenix5/pom.xml
@@ -1,6 +1,6 @@ - flinkx-all
diff --git a/flinkx-polardb/flinkx-polardb-dreader/pom.xml b/flinkx-polardb/flinkx-polardb-dreader/pom.xml
index 05c314e99c..d4bd90d2c4 100644
--- a/flinkx-polardb/flinkx-polardb-dreader/pom.xml
+++ b/flinkx-polardb/flinkx-polardb-dreader/pom.xml
@@ -100,7 +100,7 @@ + tofile="${basedir}/../../syncplugins/polardbdreader/${project.name}-${package.name}.jar" />
diff --git a/flinkx-polardb/flinkx-polardb-reader/pom.xml b/flinkx-polardb/flinkx-polardb-reader/pom.xml
index 895ed074c0..60e39a8821 100644
--- a/flinkx-polardb/flinkx-polardb-reader/pom.xml
+++ b/flinkx-polardb/flinkx-polardb-reader/pom.xml
@@ -99,7 +99,7 @@ + tofile="${basedir}/../../syncplugins/polardbreader/${project.name}-${package.name}.jar" />
diff --git a/flinkx-polardb/flinkx-polardb-writer/pom.xml b/flinkx-polardb/flinkx-polardb-writer/pom.xml
index 9d8060a532..100ecc20ad 100644
--- a/flinkx-polardb/flinkx-polardb-writer/pom.xml
+++ b/flinkx-polardb/flinkx-polardb-writer/pom.xml
@@ -100,7 +100,7 @@ + tofile="${basedir}/../../syncplugins/polardbwriter/${project.name}-${package.name}.jar" />
diff --git a/flinkx-postgresql/flinkx-postgresql-core/src/main/java/com/dtstack/flinkx/postgresql/PostgresqlDatabaseMeta.java b/flinkx-postgresql/flinkx-postgresql-core/src/main/java/com/dtstack/flinkx/postgresql/PostgresqlDatabaseMeta.java
index 4af93b9480..191dd0e493 100644
--- a/flinkx-postgresql/flinkx-postgresql-core/src/main/java/com/dtstack/flinkx/postgresql/PostgresqlDatabaseMeta.java
+++ b/flinkx-postgresql/flinkx-postgresql-core/src/main/java/com/dtstack/flinkx/postgresql/PostgresqlDatabaseMeta.java
@@ -20,7 +20,10 @@
 import com.dtstack.flinkx.enums.EDatabaseType;
 import com.dtstack.flinkx.rdb.BaseDatabaseMeta;
+import org.apache.commons.lang.StringUtils;
 
+import java.util.ArrayList;
+import java.util.Iterator;
 import java.util.List;
 import java.util.Map;
 
@@ -73,10 +76,22 @@ public String getSqlQueryColumnFields(List<String> column, String table) {
                 "where attrelid = '%s' ::regclass and attnum > 0 and attisdropped = 'f'";
         return String.format(sql,table);
     }
-
     @Override
     public String getUpsertStatement(List<String> column, String table, Map<String, List<String>> updateKey) {
-        throw new UnsupportedOperationException("PostgreSQL not support update mode");
+        return "INSERT INTO " + quoteTable(table)
+                + " (" + quoteColumns(column) + ") values "
+                + makeValues(column.size())
+                + " ON CONFLICT (" + StringUtils.join(updateKey.get("key"), ",") + ") DO UPDATE SET "
+                + makeUpdatePart(column);
+    }
+
+    private String makeUpdatePart (List<String> column) {
+        List<String> updateList = new ArrayList<>();
+        for(String col : column) {
+            String quotedCol = quoteColumn(col);
+            updateList.add(quotedCol + "=excluded." + quotedCol);
+        }
+        return StringUtils.join(updateList, ",");
+    }
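+    // For illustration: with columns ["id","name"] and updateKey {"key": ["id"]},
+    // getUpsertStatement builds roughly
+    //   INSERT INTO "t" ("id","name") values (?,?)
+    //   ON CONFLICT (id) DO UPDATE SET "id"=excluded."id","name"=excluded."name"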
 
     @Override
@@ -98,4 +113,8 @@ public int getFetchSize(){
     public int getQueryTimeout(){
         return 1000;
     }
+
+    private String makeValues(int nCols) {
+        return "(" + StringUtils.repeat("?", ",", nCols) + ")";
+    }
 }
diff --git a/flinkx-postgresql/flinkx-postgresql-core/src/main/java/com/dtstack/flinkx/postgresql/PostgresqlTypeConverter.java b/flinkx-postgresql/flinkx-postgresql-core/src/main/java/com/dtstack/flinkx/postgresql/PostgresqlTypeConverter.java
index b4bff44466..a405cab134 100644
--- a/flinkx-postgresql/flinkx-postgresql-core/src/main/java/com/dtstack/flinkx/postgresql/PostgresqlTypeConverter.java
+++ b/flinkx-postgresql/flinkx-postgresql-core/src/main/java/com/dtstack/flinkx/postgresql/PostgresqlTypeConverter.java
@@ -19,13 +19,18 @@
 package com.dtstack.flinkx.postgresql;
 
 import com.dtstack.flinkx.rdb.type.TypeConverterInterface;
+import com.google.common.base.Splitter;
+import com.google.common.collect.Iterables;
 import org.apache.commons.lang3.StringUtils;
 
 import java.math.BigDecimal;
 import java.util.Arrays;
 import java.util.Collections;
+import java.util.Iterator;
 import java.util.List;
 import java.util.Locale;
+import java.util.function.Function;
+import java.util.function.Predicate;
 
 /**
  * The type converter for PostgreSQL database
@@ -72,7 +77,38 @@ public Object convert(Object data,String typeName) {
         } else if(bitTypes.contains(typeName)){
             //
         }else if(byteTypes.contains(typeName)){
-            data = Byte.valueOf(dataValue);
+            // According to https://www.postgresql.org/docs/current/datatype-binary.html
+            // the bytea data type corresponds to a byte array (byte[]) in Java.
+            if (!(data instanceof byte[])) {
+                // convert binary string to byte[]
+                // - escape format, e.g. \153\154\155\251\124 (3 octal digits preceded by a backslash per byte)
+                // - hex format, e.g. \xDEADBEEF (2 hex digits per byte)
+
+                // NOTE: we suppose the given binary string is valid,
+                // otherwise it makes no sense.
+                if (dataValue.startsWith("\\x")) { // hex format
+                    data =
+                            parseBinaryString2ByteArray(
+                                    dataValue.substring(2).replace(" ", ""),
+                                    2,
+                                    16,
+                                    s -> s.length() == 2,
+                                    Function.identity());
+                } else if (dataValue.startsWith("\\")) { // escape format
+                    data =
+                            parseBinaryString2ByteArray(
+                                    dataValue,
+                                    4,
+                                    8,
+                                    s -> s.length() == 4 && s.startsWith("\\"),
+                                    s -> s.replace("\\", ""));
+                } else {
+                    throw new IllegalArgumentException(
+                            String.format(
+                                    "Invalid binary string [%s]. cannot convert to bytea type.",
+                                    dataValue));
+                }
+            }
         } else if(intTypes.contains(typeName)){
             if(dataValue.contains(".")){
                 dataValue = new BigDecimal(dataValue).stripTrailingZeros().toPlainString();
@@ -82,4 +118,26 @@
 
         return data;
     }
+
+    private byte[] parseBinaryString2ByteArray(
+            String s,
+            int numsPerGroup,
+            int radix,
+            Predicate<String> checker,
+            Function<String, String> groupProcessor) {
+        Iterable<String> it = Splitter.fixedLength(numsPerGroup).split(s);
+        byte[] ret = new byte[Iterables.size(it)];
+        Iterator<String> iterator = it.iterator();
+        int i = 0;
+        while (iterator.hasNext()) {
+            String nums = iterator.next();
+            if (!checker.test(nums)) {
+                throw new IllegalArgumentException(
+                        String.format(
+                                "Invalid binary string [%s]. cannot parse to bytea type.", s));
+            }
+            // Byte.parseByte rejects values above 127 (e.g. the hex group "DE" or the octal
+            // group "251"), so parse as int and narrow to byte instead
+            ret[i++] = (byte) Integer.parseInt(groupProcessor.apply(nums), radix);
+        }
+        return ret;
+    }
 }
diff --git a/flinkx-postgresql/flinkx-postgresql-reader/pom.xml b/flinkx-postgresql/flinkx-postgresql-reader/pom.xml
index b69180d66d..296ca62033 100644
--- a/flinkx-postgresql/flinkx-postgresql-reader/pom.xml
+++ b/flinkx-postgresql/flinkx-postgresql-reader/pom.xml
@@ -96,7 +96,7 @@ + tofile="${basedir}/../../syncplugins/postgresqlreader/${project.name}-${package.name}.jar" />
diff --git a/flinkx-postgresql/flinkx-postgresql-writer/pom.xml b/flinkx-postgresql/flinkx-postgresql-writer/pom.xml
index 9a8a3edae8..5a64580055 100644
--- a/flinkx-postgresql/flinkx-postgresql-writer/pom.xml
+++ b/flinkx-postgresql/flinkx-postgresql-writer/pom.xml
@@ -95,7 +95,7 @@ + tofile="${basedir}/../../syncplugins/postgresqlwriter/${project.name}-${package.name}.jar" />
diff --git a/flinkx-kafka09/flinkx-kafka09-reader/pom.xml b/flinkx-pulsar/flinkx-pulsar-writer/pom.xml
similarity index 62%
rename from flinkx-kafka09/flinkx-kafka09-reader/pom.xml
rename to flinkx-pulsar/flinkx-pulsar-writer/pom.xml
index 71ade56f12..49ec0fab1d 100644
--- a/flinkx-kafka09/flinkx-kafka09-reader/pom.xml
+++ b/flinkx-pulsar/flinkx-pulsar-writer/pom.xml
@@ -1,23 +1,15 @@ - - flinkx-kafka09 + flinkx-pulsar com.dtstack.flinkx 1.6 4.0.0 - flinkx-kafka09-reader - - - - com.dtstack.flinkx - flinkx-kb-reader - 1.6 - - + flinkx-pulsar-writer
@@ -33,16 +25,16 @@ false - - - com.google.common - shade.core.com.google.common - - - com.google.thirdparty - shade.core.com.google.thirdparty - - + + + *:* + + META-INF/*.SF + META-INF/*.DSA + META-INF/*.RSA + + +
@@ -60,14 +52,13 @@ - + - - +
@@ -75,5 +66,4 @@ - \ No newline at end of file
diff --git a/flinkx-pulsar/flinkx-pulsar-writer/src/main/java/com/dtstack/flinkx/pulsar/format/Constants.java b/flinkx-pulsar/flinkx-pulsar-writer/src/main/java/com/dtstack/flinkx/pulsar/format/Constants.java
new file mode 100644
index 0000000000..029ac9ecc8
--- /dev/null
+++ b/flinkx-pulsar/flinkx-pulsar-writer/src/main/java/com/dtstack/flinkx/pulsar/format/Constants.java
@@ -0,0 +1,30 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package com.dtstack.flinkx.pulsar.format;
+
+/**
+ * @author: pierre
+ * @create: 2020/3/21
+ */
+public class Constants {
+    public static final String KEY_TOPIC = "topic";
+    public static final String KEY_PULSAR_SERVICE_URL = "pulsarServiceUrl";
+    public static final String KEY_PRODUCER_SETTINGS = "producerSettings";
+    public static final String KEY_TABLE_FIELDS = "tableFields";
+    public static final String KEY_TOKEN = "token";
+}
diff --git a/flinkx-pulsar/flinkx-pulsar-writer/src/main/java/com/dtstack/flinkx/pulsar/format/PulsarOutputFormat.java b/flinkx-pulsar/flinkx-pulsar-writer/src/main/java/com/dtstack/flinkx/pulsar/format/PulsarOutputFormat.java
new file mode 100644
index 0000000000..efbc945c7b
--- /dev/null
+++ b/flinkx-pulsar/flinkx-pulsar-writer/src/main/java/com/dtstack/flinkx/pulsar/format/PulsarOutputFormat.java
@@ -0,0 +1,121 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package com.dtstack.flinkx.pulsar.format;
+
+import com.dtstack.flinkx.decoder.JsonDecoder;
+import com.dtstack.flinkx.exception.WriteRecordException;
+import com.dtstack.flinkx.outputformat.BaseRichOutputFormat;
+import com.dtstack.flinkx.util.ExceptionUtil;
+import com.dtstack.flinkx.util.MapUtil;
+import org.apache.flink.types.Row;
+import org.apache.flink.util.StringUtils;
+import org.apache.pulsar.client.api.AuthenticationFactory;
+import org.apache.pulsar.client.api.Producer;
+import org.apache.pulsar.client.api.PulsarClient;
+import org.apache.pulsar.client.api.Schema;
+
+import java.io.IOException;
+import java.util.Collections;
+import java.util.LinkedHashMap;
+import java.util.List;
+import java.util.Map;
+
+/**
+ * @author: pierre
+ * @create: 2020/3/21
+ */
+public class PulsarOutputFormat extends BaseRichOutputFormat {
+
+    private transient Producer<String> producer;
+
+    protected String topic;
+    protected String pulsarServiceUrl;
+    protected String token;
+    protected Map<String, Object> producerSettings;
+
+    protected List<String> tableFields;
+    protected static JsonDecoder jsonDecoder = new JsonDecoder();
+
+    @Override
+    protected void openInternal(int taskNumber, int numTasks) throws IOException {
+        PulsarClient client;
+
+        if (null != token) {
+            client = PulsarClient.builder()
+                    .serviceUrl(pulsarServiceUrl)
+                    .authentication(AuthenticationFactory.token(token))
+                    .build();
+        } else {
+            client = PulsarClient.builder()
+                    .serviceUrl(pulsarServiceUrl)
+                    .build();
+        }
+        // pulsar-client 2.4.0: loadConf has a bug
+        producer = client.newProducer(Schema.STRING)
+                .topic(topic)
+                .loadConf(producerSettings)
+                .create();
+    }
+
+    @Override
+    protected void writeSingleRecordInternal(Row row) throws WriteRecordException {
+        // copied from kafka-writer
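+        // e.g. with tableFields=["id","name"], Row.of(1,"foo") is emitted as {"id":"1","name":"foo"};
+        // without tableFields, a single-field Row holding a Map or a JSON string is used as-is,
+        // and anything else falls back to {"message": row.toString()}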
+        try {
+            Map<String, Object> map;
+            int arity = row.getArity();
+            if (tableFields != null && tableFields.size() >= arity) {
+                map = new LinkedHashMap<>((arity << 2) / 3);
+                for (int i = 0; i < arity; i++) {
+                    map.put(tableFields.get(i), StringUtils.arrayAwareToString(row.getField(i)));
+                }
+            } else {
+                if (arity == 1) {
+                    Object obj = row.getField(0);
+                    if (obj instanceof Map) {
+                        map = (Map<String, Object>) obj;
+                    } else if (obj instanceof String) {
+                        map = jsonDecoder.decode(obj.toString());
+                    } else {
+                        map = Collections.singletonMap("message", row.toString());
+                    }
+                } else {
+                    map = Collections.singletonMap("message", row.toString());
+                }
+            }
+            emit(map);
+        } catch (Throwable e) {
+            LOG.error("pulsar writeSingleRecordInternal error:{}", ExceptionUtil.getErrorMessage(e));
+            throw new WriteRecordException(e.getMessage(), e);
+        }
+    }
+
+    protected void emit(Map<String, Object> event) throws IOException {
+        producer.send(MapUtil.writeValueAsString(event));
+    }
+
+    @Override
+    protected void writeMultipleRecordsInternal() {
+        throw new UnsupportedOperationException();
+    }
+
+    @Override
+    public void closeInternal() throws IOException {
+        LOG.warn("pulsar output closeInternal.");
+        producer.close();
+    }
+}
diff --git a/flinkx-pulsar/flinkx-pulsar-writer/src/main/java/com/dtstack/flinkx/pulsar/format/PulsarOutputFormatBuilder.java b/flinkx-pulsar/flinkx-pulsar-writer/src/main/java/com/dtstack/flinkx/pulsar/format/PulsarOutputFormatBuilder.java
new file mode 100644
index 0000000000..c2ed5acf97
--- /dev/null
+++ b/flinkx-pulsar/flinkx-pulsar-writer/src/main/java/com/dtstack/flinkx/pulsar/format/PulsarOutputFormatBuilder.java
@@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package com.dtstack.flinkx.pulsar.format;
+
+import com.dtstack.flinkx.outputformat.BaseRichOutputFormatBuilder;
+
+import java.util.List;
+import java.util.Map;
+
+/**
+ * @author: pierre
+ * @create: 2020/3/21
+ */
+public class PulsarOutputFormatBuilder extends BaseRichOutputFormatBuilder {
+
+    private PulsarOutputFormat format;
+
+    public PulsarOutputFormatBuilder() {
+        super.format = format = new PulsarOutputFormat();
+    }
+
+    public void setTopic(String topic) {
+        format.topic = topic;
+    }
+
+    public void setToken(String token) {
+        format.token = token;
+    }
+
+    public void setPulsarServiceUrl(String pulsarServiceUrl) {
+        format.pulsarServiceUrl = pulsarServiceUrl;
+    }
+
+    public void setProducerSettings(Map<String, Object> producerSettings) {
+        format.producerSettings = producerSettings;
+    }
+
+    public void setTableFields(List<String> tableFields) {
+        // forwards tableFields to the format, which uses them in writeSingleRecordInternal
+        format.tableFields = tableFields;
+    }
+
+    @Override
+    protected void checkFormat() {
+        // no options are validated for the pulsar writer yet
+    }
+}
diff --git a/flinkx-pulsar/flinkx-pulsar-writer/src/main/java/com/dtstack/flinkx/pulsar/writer/PulsarWriter.java b/flinkx-pulsar/flinkx-pulsar-writer/src/main/java/com/dtstack/flinkx/pulsar/writer/PulsarWriter.java
new file mode 100644
index 0000000000..dc09d6fb7a
--- /dev/null
+++ b/flinkx-pulsar/flinkx-pulsar-writer/src/main/java/com/dtstack/flinkx/pulsar/writer/PulsarWriter.java
@@ -0,0 +1,63 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package com.dtstack.flinkx.pulsar.writer;
+
+import com.dtstack.flinkx.config.DataTransferConfig;
+import com.dtstack.flinkx.pulsar.format.PulsarOutputFormatBuilder;
+import com.dtstack.flinkx.writer.BaseDataWriter;
+import org.apache.flink.streaming.api.datastream.DataStream;
+import org.apache.flink.streaming.api.datastream.DataStreamSink;
+import org.apache.flink.types.Row;
+
+import java.util.List;
+import java.util.Map;
+
+import static com.dtstack.flinkx.pulsar.format.Constants.*;
+
+
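+// A job config sketch for this writer. The JSON key names are the literal values of the
+// Constants imported above; the plugin name, topic, and settings values are illustrative
+// only ("sendTimeoutMs" is just one example of a pulsar producer option):
+// {
+//   "name": "pulsarwriter",
+//   "parameter": {
+//     "topic": "persistent://public/default/demo",
+//     "pulsarServiceUrl": "pulsar://localhost:6650",
+//     "token": "",
+//     "producerSettings": {"sendTimeoutMs": 30000},
+//     "tableFields": ["id", "name"]
+//   }
+// }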
+/**
+ * @author: pierre
+ * @create: 2020/3/21
+ */
+public class PulsarWriter extends BaseDataWriter {
+    protected String topic;
+    protected String token;
+    protected String pulsarServiceUrl;
+    protected List<String> tableFields;
+    protected Map<String, Object> producerSettings;
+
+    @SuppressWarnings("unchecked")
+    public PulsarWriter(DataTransferConfig config){
+        super(config);
+        topic = config.getJob().getContent().get(0).getWriter().getParameter().getStringVal(KEY_TOPIC);
+        token = config.getJob().getContent().get(0).getWriter().getParameter().getStringVal(KEY_TOKEN);
+        pulsarServiceUrl = config.getJob().getContent().get(0).getWriter().getParameter().getStringVal(KEY_PULSAR_SERVICE_URL);
+        producerSettings = (Map<String, Object>) config.getJob().getContent().get(0).getWriter().getParameter().getVal(KEY_PRODUCER_SETTINGS);
+        tableFields = (List<String>) config.getJob().getContent().get(0).getWriter().getParameter().getVal(KEY_TABLE_FIELDS);
+    }
+
+    @Override
+    public DataStreamSink<Row> writeData(DataStream<Row> dataSet) {
+        PulsarOutputFormatBuilder builder = new PulsarOutputFormatBuilder();
+        builder.setTopic(topic);
+        builder.setPulsarServiceUrl(pulsarServiceUrl);
+        builder.setProducerSettings(producerSettings);
+        builder.setToken(token);
+        // without this call the format's tableFields would always stay null
+        builder.setTableFields(tableFields);
+        return createOutput(dataSet, builder.finish());
+    }
+}
diff --git a/flinkx-metadata-phoenix5/pom.xml b/flinkx-pulsar/pom.xml
similarity index 66%
rename from flinkx-metadata-phoenix5/pom.xml
rename to flinkx-pulsar/pom.xml
index a121dea856..09ae5b675a 100644
--- a/flinkx-metadata-phoenix5/pom.xml
+++ b/flinkx-pulsar/pom.xml
@@ -8,21 +8,31 @@ 1.6 4.0.0 + + flinkx-pulsar pom + + flinkx-pulsar-writer + - flinkx-metadata-phoenix5 + + 2.5.0 + + + org.apache.pulsar + pulsar-client + ${pulsar.version} + compile + com.dtstack.flinkx flinkx-core 1.6 provided - - - flinkx-metadata-phoenix5-reader + \ No newline at end of file
diff --git a/flinkx-rdb/flinkx-rdb-core/pom.xml b/flinkx-rdb/flinkx-rdb-core/pom.xml
index 161d4a640a..36a9305c05 100644
--- a/flinkx-rdb/flinkx-rdb-core/pom.xml
+++ b/flinkx-rdb/flinkx-rdb-core/pom.xml
@@ -33,7 +33,7 @@ + tofile="${basedir}/../../syncplugins/common/${project.name}-${package.name}.jar"/>
diff --git a/flinkx-rdb/flinkx-rdb-core/src/main/java/com/dtstack/flinkx/rdb/BaseDatabaseMeta.java b/flinkx-rdb/flinkx-rdb-core/src/main/java/com/dtstack/flinkx/rdb/BaseDatabaseMeta.java
index 758be23ace..cc5f9bffc1 100644
--- a/flinkx-rdb/flinkx-rdb-core/src/main/java/com/dtstack/flinkx/rdb/BaseDatabaseMeta.java
+++ b/flinkx-rdb/flinkx-rdb-core/src/main/java/com/dtstack/flinkx/rdb/BaseDatabaseMeta.java
@@ -169,7 +169,7 @@ protected String getUpdateSql(List<String> column, String leftTable, String righ
         String prefixRight = StringUtils.isBlank(rightTable) ? "" : quoteTable(rightTable) + ".";
"" : quoteTable(rightTable) + "."; List list = new ArrayList<>(); for(String col : column) { - list.add(prefixLeft + col + "=" + prefixRight + col); + list.add(prefixLeft + quoteColumn(col) + "=" + prefixRight + quoteColumn(col)); } return StringUtils.join(list, ","); } diff --git a/flinkx-rdb/flinkx-rdb-core/src/main/java/com/dtstack/flinkx/rdb/util/DbUtil.java b/flinkx-rdb/flinkx-rdb-core/src/main/java/com/dtstack/flinkx/rdb/util/DbUtil.java index b69d59aad4..c9a1e86489 100644 --- a/flinkx-rdb/flinkx-rdb-core/src/main/java/com/dtstack/flinkx/rdb/util/DbUtil.java +++ b/flinkx-rdb/flinkx-rdb-core/src/main/java/com/dtstack/flinkx/rdb/util/DbUtil.java @@ -41,6 +41,7 @@ import java.sql.Statement; import java.util.ArrayList; import java.util.HashMap; +import java.util.LinkedHashMap; import java.util.List; import java.util.Map; import java.util.regex.Pattern; @@ -283,18 +284,23 @@ public static List analyzeColumnType(ResultSet resultSet, List nameTypeMap = new HashMap<>((rd.getColumnCount() << 2) / 3); + Map nameTypeMap = new LinkedHashMap<>((rd.getColumnCount() << 2) / 3); for(int i = 0; i < rd.getColumnCount(); ++i) { nameTypeMap.put(rd.getColumnName(i+1),rd.getColumnTypeName(i+1)); } - for (MetaColumn metaColumn : metaColumns) { - if(metaColumn.getValue() != null){ - columnTypeList.add("VARCHAR"); - } else { - columnTypeList.add(nameTypeMap.get(metaColumn.getName())); + if (ConstantValue.STAR_SYMBOL.equals(metaColumns.get(0).getName())){ + columnTypeList.addAll(nameTypeMap.values()); + }else{ + for (MetaColumn metaColumn : metaColumns) { + if(metaColumn.getValue() != null){ + columnTypeList.add("VARCHAR"); + } else { + columnTypeList.add(nameTypeMap.get(metaColumn.getName())); + } } } + } catch (SQLException e) { String message = String.format("error to analyzeSchema, resultSet = %s, columnTypeList = %s, e = %s", resultSet, diff --git a/flinkx-rdb/flinkx-rdb-reader/pom.xml b/flinkx-rdb/flinkx-rdb-reader/pom.xml index 6bfc0b3048..1bd6ff5a5d 100644 --- a/flinkx-rdb/flinkx-rdb-reader/pom.xml +++ b/flinkx-rdb/flinkx-rdb-reader/pom.xml @@ -42,7 +42,7 @@ + tofile="${basedir}/../../syncplugins/common/${project.name}-${package.name}.jar"/> diff --git a/flinkx-rdb/flinkx-rdb-writer/pom.xml b/flinkx-rdb/flinkx-rdb-writer/pom.xml index f8e776598c..87cdb442d3 100644 --- a/flinkx-rdb/flinkx-rdb-writer/pom.xml +++ b/flinkx-rdb/flinkx-rdb-writer/pom.xml @@ -42,7 +42,7 @@ + tofile="${basedir}/../../syncplugins/common/${project.name}-${package.name}.jar"/> diff --git a/flinkx-redis/flinkx-redis-writer/pom.xml b/flinkx-redis/flinkx-redis-writer/pom.xml index 1581688b13..725c3a1df3 100644 --- a/flinkx-redis/flinkx-redis-writer/pom.xml +++ b/flinkx-redis/flinkx-redis-writer/pom.xml @@ -89,7 +89,7 @@ + tofile="${basedir}/../../syncplugins/rediswriter/${project.name}-${package.name}.jar" /> diff --git a/flinkx-restapi/flinkx-restapi-core/pom.xml b/flinkx-restapi/flinkx-restapi-core/pom.xml index 756ba6f80b..359dc9293d 100644 --- a/flinkx-restapi/flinkx-restapi-core/pom.xml +++ b/flinkx-restapi/flinkx-restapi-core/pom.xml @@ -1,6 +1,6 @@ - flinkx-restapi diff --git a/flinkx-restapi/flinkx-restapi-reader/pom.xml b/flinkx-restapi/flinkx-restapi-reader/pom.xml index 05f0e53591..151b903e3f 100644 --- a/flinkx-restapi/flinkx-restapi-reader/pom.xml +++ b/flinkx-restapi/flinkx-restapi-reader/pom.xml @@ -1,6 +1,6 @@ - flinkx-restapi @@ -89,7 +89,7 @@ + tofile="${basedir}/../../syncplugins/restapireader/${project.name}-${package.name}.jar"/> diff --git a/flinkx-restapi/flinkx-restapi-reader/restapi-reader.json 
deleted file mode 100644
index 955f938e66..0000000000
--- a/flinkx-restapi/flinkx-restapi-reader/restapi-reader.json
+++ /dev/null
@@ -1,32 +0,0 @@
-{
-  "job": {
-    "content": [
-      {
-        "writer": {
-          "parameter": {
-            "print": true
-          },
-          "name": "streamwriter"
-        },
-        "reader":{
-          "parameter":{
-            "url": "http://172.16.8.109/server/index.php?g=Web&c=Mock&o=mock&projectID=58&uri=/api/tiezhu/test/get",
-            "body": "",
-            "method": "get",
-            "params": ""
-          },
-          "name":"restapireader"
-        }
-      }
-    ],
-    "setting": {
-      "errorLimit": {
-        "record": 100
-      },
-      "speed": {
-        "bytes": 1048576,
-        "channel": 1
-      }
-    }
-  }
-}
\ No newline at end of file
diff --git a/flinkx-restapi/flinkx-restapi-reader/src/main/java/com/dtstack/flinkx/restapi/client/DefaultRestHandler.java b/flinkx-restapi/flinkx-restapi-reader/src/main/java/com/dtstack/flinkx/restapi/client/DefaultRestHandler.java
index c7866a0f36..bb02385969 100644
--- a/flinkx-restapi/flinkx-restapi-reader/src/main/java/com/dtstack/flinkx/restapi/client/DefaultRestHandler.java
+++ b/flinkx-restapi/flinkx-restapi-reader/src/main/java/com/dtstack/flinkx/restapi/client/DefaultRestHandler.java
@@ -70,7 +70,7 @@ public Strategy chooseStrategy(List<Strategy> strategies, Map<String, Object> re
             MetaParam metaParam = new MetaParam();
             // the key must be a dynamic variable and must not be a built-in variable
             if (!MetaparamUtils.isDynamic(i.getKey()) && !MetaparamUtils.isInnerParam(i.getKey())) {
-                throw new IllegalArgumentException("strategy key " + i.getKey() + " is error,wo just support ${response.},${param.},${body.}");
+                throw new IllegalArgumentException("strategy key " + i.getKey() + " is error,we just support ${response.},${param.},${body.}");
             }
             HttpRequestParam copy = HttpRequestParam.copy(httpRequestParam);
diff --git a/flinkx-restapi/flinkx-restapi-reader/src/main/java/com/dtstack/flinkx/restapi/format/RestapiInputFormat.java b/flinkx-restapi/flinkx-restapi-reader/src/main/java/com/dtstack/flinkx/restapi/format/RestapiInputFormat.java
index fcba49719c..952c651aa2 100644
--- a/flinkx-restapi/flinkx-restapi-reader/src/main/java/com/dtstack/flinkx/restapi/format/RestapiInputFormat.java
+++ b/flinkx-restapi/flinkx-restapi-reader/src/main/java/com/dtstack/flinkx/restapi/format/RestapiInputFormat.java
@@ -113,10 +113,10 @@ protected Row nextRecordInternal(Row row) {
         ResponseValue value = (ResponseValue) row.getField(0);
         if (value.isNormal()) {
             // if status is 0, the "stop" exception strategy was triggered, so reachEnd is set to true
+            // TODO batch jobs will later need a "finished" strategy so a job can end normally rather than with an exceptional stop
             if (value.getStatus() == 0) {
                 throw new RuntimeException("the strategy [" + value.getErrorMsg() + " ] is triggered ,and the request param is [" + value.getRequestParam().toString() + "]" + " and the response value is " + value.getOriginResponseValue() + " job end" );
             }
-            // TODO batch jobs will later need a "finished" strategy so a job can end normally rather than with an exceptional stop
             state = new ResponseValue("", HttpRequestParam.copy(value.getRequestParam()), value.getOriginResponseValue());
             return Row.of(value.getData());
         } else {
diff --git a/flinkx-restapi/flinkx-restapi-writer/pom.xml b/flinkx-restapi/flinkx-restapi-writer/pom.xml
index 9867acc645..723c37691b 100644
--- a/flinkx-restapi/flinkx-restapi-writer/pom.xml
+++ b/flinkx-restapi/flinkx-restapi-writer/pom.xml
@@ -1,6 +1,6 @@ - flinkx-restapi
@@ -89,7 +89,7 @@ + tofile="${basedir}/../../syncplugins/restapiwriter/${project.name}-${package.name}.jar" />
diff --git a/flinkx-restapi/pom.xml b/flinkx-restapi/pom.xml
index 5d54a7cac9..876877c87a 100644
--- a/flinkx-restapi/pom.xml
+++ b/flinkx-restapi/pom.xml
@@ -1,6 +1,6 @@ - flinkx-all
diff --git 
a/flinkx-saphana/flinkx-saphana-reader/pom.xml b/flinkx-saphana/flinkx-saphana-reader/pom.xml index 5f25d18b46..7c68672a0a 100644 --- a/flinkx-saphana/flinkx-saphana-reader/pom.xml +++ b/flinkx-saphana/flinkx-saphana-reader/pom.xml @@ -91,7 +91,7 @@ + tofile="${basedir}/../../syncplugins/saphanareader/${project.name}-${package.name}.jar" /> diff --git a/flinkx-saphana/flinkx-saphana-writer/pom.xml b/flinkx-saphana/flinkx-saphana-writer/pom.xml index f061f2e766..db4debcc3e 100644 --- a/flinkx-saphana/flinkx-saphana-writer/pom.xml +++ b/flinkx-saphana/flinkx-saphana-writer/pom.xml @@ -91,7 +91,7 @@ + tofile="${basedir}/../../syncplugins/saphanawriter/${project.name}-${package.name}.jar" /> diff --git a/flinkx-socket/flinkx-socket-core/src/main/java/com/dtstack/flinkx/socket/util/DtClientHandler.java b/flinkx-socket/flinkx-socket-core/src/main/java/com/dtstack/flinkx/socket/util/DtClientHandler.java deleted file mode 100644 index 58e7cc2826..0000000000 --- a/flinkx-socket/flinkx-socket-core/src/main/java/com/dtstack/flinkx/socket/util/DtClientHandler.java +++ /dev/null @@ -1,98 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. 
- */ - -package com.dtstack.flinkx.socket.util; - -import com.dtstack.flinkx.decoder.DecodeEnum; -import com.dtstack.flinkx.decoder.IDecode; -import com.dtstack.flinkx.decoder.JsonDecoder; -import com.dtstack.flinkx.decoder.TextDecoder; -import com.dtstack.flinkx.util.ExceptionUtil; -import io.netty.buffer.ByteBuf; -import io.netty.channel.ChannelHandlerContext; -import io.netty.channel.ChannelInboundHandlerAdapter; -import org.apache.commons.lang.StringUtils; -import org.apache.flink.types.Row; -import org.slf4j.Logger; -import org.slf4j.LoggerFactory; - -import java.nio.charset.Charset; -import java.util.Map; -import java.util.concurrent.SynchronousQueue; - -import static com.dtstack.flinkx.socket.constants.SocketCons.KEY_EXIT0; - -/** - * 自定义handler - * @author kunni@dtstack.com - */ - -public class DtClientHandler extends ChannelInboundHandlerAdapter { - - protected final Logger LOG = LoggerFactory.getLogger(getClass()); - - protected SynchronousQueue queue; - - protected IDecode decoder; - - protected String encoding; - - public DtClientHandler(SynchronousQueue queue, String decoder, String encoding){ - this.queue = queue; - this.decoder = getDecoder(decoder); - this.encoding = encoding; - } - - @Override - public void channelRead(ChannelHandlerContext ctx, Object msg) { - ByteBuf byteBuf = (ByteBuf) msg; - Map event = decoder.decode(byteBuf.toString(Charset.forName(encoding))); - Row row = new Row(event.size()); - int count = 0; - for(Map.Entry entry : event.entrySet()){ - row.setField(count++, entry.getValue()); - } - try{ - queue.put(row); - }catch (InterruptedException e){ - LOG.error(ExceptionUtil.getErrorMessage(e)); - } - } - - public IDecode getDecoder(String codeC){ - switch (DecodeEnum.valueOf(StringUtils.upperCase(codeC))){ - case JSON: - return new JsonDecoder(); - case TEXT: - default: - return new TextDecoder(); - } - } - - @Override - public void exceptionCaught(ChannelHandlerContext ctx, Throwable cause) { - String error = ExceptionUtil.getErrorMessage(cause); - LOG.error(error); - ctx.close(); - try { - queue.put(Row.of(KEY_EXIT0 + error)); - } catch (InterruptedException ex) { - LOG.error(ExceptionUtil.getErrorMessage(ex)); - } - } -} diff --git a/flinkx-socket/flinkx-socket-core/src/main/java/com/dtstack/flinkx/socket/util/DtSocketClient.java b/flinkx-socket/flinkx-socket-core/src/main/java/com/dtstack/flinkx/socket/util/DtSocketClient.java deleted file mode 100644 index 5e5e4dde3a..0000000000 --- a/flinkx-socket/flinkx-socket-core/src/main/java/com/dtstack/flinkx/socket/util/DtSocketClient.java +++ /dev/null @@ -1,110 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. 
- */ - -package com.dtstack.flinkx.socket.util; - -import com.dtstack.flinkx.util.ExceptionUtil; -import io.netty.bootstrap.Bootstrap; -import io.netty.channel.Channel; -import io.netty.channel.ChannelInitializer; -import io.netty.channel.ChannelOption; -import io.netty.channel.EventLoopGroup; -import io.netty.channel.nio.NioEventLoopGroup; -import io.netty.channel.socket.SocketChannel; -import io.netty.channel.socket.nio.NioSocketChannel; -import org.apache.flink.types.Row; -import org.slf4j.Logger; -import org.slf4j.LoggerFactory; - -import java.io.Closeable; -import java.io.Serializable; -import java.util.concurrent.SynchronousQueue; - -import static com.dtstack.flinkx.socket.constants.SocketCons.KEY_EXIT0; - -/** 采用netty实现Socket Client - * @author kunni.dtstack.com - */ - -public class DtSocketClient implements Closeable, Serializable { - - private static final long serialVersionUID = 1L; - - protected String host; - protected int port; - protected String encoding = "UTF-8"; - - protected String codeC; - protected EventLoopGroup group = new NioEventLoopGroup(); - protected SynchronousQueue queue; - - protected final Logger LOG = LoggerFactory.getLogger(getClass()); - - public Channel channel; - - public DtSocketClient(String host, int port, SynchronousQueue queue){ - this.host = host; - this.port = port; - this.queue = queue; - } - - public void start() { - Bootstrap bootstrap = new Bootstrap(); - bootstrap.group(group) - .channel(NioSocketChannel.class) - .option(ChannelOption.TCP_NODELAY, true) - .handler(new ChannelInitializer() { - @Override - public void initChannel(SocketChannel ch) { - ch.pipeline().addLast(new DtClientHandler(queue, codeC, encoding)); - } - }); - channel = bootstrap.connect(host, port).addListener(future -> { - if(future.isSuccess()) { - LOG.info("connect [{}:{}] success", host, port); - }else { - String error = String.format("connect [%s:%d] failed", host, port); - try { - queue.put(Row.of(KEY_EXIT0 + error)); - } catch (InterruptedException ex) { - LOG.error(ExceptionUtil.getErrorMessage(ex)); - } - } - }).channel(); - } - - public void setCodeC(String codeC) { - this.codeC = codeC; - } - - public void setEncoding(String encoding){ - this.encoding = encoding; - } - - @Override - public void close() { - LOG.error("close channel!!! "); - if(channel != null){ - channel.close(); - } - if(group != null) { - group.shutdownGracefully(); - } - } - -} diff --git a/flinkx-socket/flinkx-socket-core/src/test/java/com/dtstack/flinkx/socket/util/DtSocketClientTest.java b/flinkx-socket/flinkx-socket-core/src/test/java/com/dtstack/flinkx/socket/util/DtSocketClientTest.java deleted file mode 100644 index fe761fb692..0000000000 --- a/flinkx-socket/flinkx-socket-core/src/test/java/com/dtstack/flinkx/socket/util/DtSocketClientTest.java +++ /dev/null @@ -1,88 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
- * See the License for the specific language governing permissions and - * limitations under the License. - */ - -package com.dtstack.flinkx.socket.util; - -import org.apache.flink.types.Row; -import org.junit.Assert; -import org.junit.Before; -import org.junit.Test; - -import java.io.BufferedWriter; -import java.io.OutputStreamWriter; -import java.net.ServerSocket; -import java.net.Socket; -import java.util.concurrent.SynchronousQueue; - -import static com.dtstack.flinkx.socket.constants.SocketCons.KEY_EXIT0; - -public class DtSocketClientTest { - - public static final String HOST = "localhost"; - - public static final int PORT = 8000; - - public static final String MESSAGE = "{\"key\":\"value\"}"; - - public static final String TEST = "text"; - - public SynchronousQueue queue; - - public DtSocketClient client; - - @Before - public void init(){ - queue = new SynchronousQueue<>(); - client = new DtSocketClient(HOST, PORT, queue); - client.setCodeC(TEST); - } - - @Test - public void testStart() throws InterruptedException { - new Thread(this::socketServer).start(); - client.start(); - Row row = queue.take(); - client.close(); - Assert.assertEquals(row.getField(0), MESSAGE); - } - - @Test - public void testStartFailed() throws InterruptedException { - client = new DtSocketClient(HOST, PORT, queue); - client.start(); - Row row = queue.take(); - client.close(); - Assert.assertTrue(((String) row.getField(0)).startsWith(KEY_EXIT0)); - } - - public void socketServer() { - try{ - ServerSocket ss = new ServerSocket(PORT); - Socket s = ss.accept(); - BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(s.getOutputStream())); - bw.write(MESSAGE); - bw.flush(); - s.close(); - }catch (Exception ignored){ - } - - } - - - -} diff --git a/flinkx-socket/flinkx-socket-reader/src/main/java/com/dtstack/flinkx/socket/format/SocketInputFormat.java b/flinkx-socket/flinkx-socket-reader/src/main/java/com/dtstack/flinkx/socket/format/SocketInputFormat.java deleted file mode 100644 index cd50693552..0000000000 --- a/flinkx-socket/flinkx-socket-reader/src/main/java/com/dtstack/flinkx/socket/format/SocketInputFormat.java +++ /dev/null @@ -1,121 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. 
- */ - -package com.dtstack.flinkx.socket.format; - -import com.dtstack.flinkx.constants.ConstantValue; -import com.dtstack.flinkx.inputformat.BaseRichInputFormat; -import com.dtstack.flinkx.socket.util.DtSocketClient; -import com.dtstack.flinkx.util.ExceptionUtil; -import com.dtstack.flinkx.util.ValueUtil; -import org.apache.commons.lang.StringUtils; -import org.apache.flink.core.io.GenericInputSplit; -import org.apache.flink.core.io.InputSplit; -import org.apache.flink.types.Row; - -import java.io.IOException; -import java.util.ArrayList; -import java.util.concurrent.SynchronousQueue; - -import static com.dtstack.flinkx.socket.constants.SocketCons.KEY_EXIT0; - -/** 读取socket传入的数据 - * - * @author kunni@dtstack.com - */ - -public class SocketInputFormat extends BaseRichInputFormat { - - private static final long serialVersionUID = 1L; - - protected String host; - protected int port; - protected String codeC; - protected ArrayList columns; - protected String encoding; - - protected DtSocketClient client; - protected SynchronousQueue queue; - - @Override - protected void openInternal(InputSplit inputSplit) { - queue = new SynchronousQueue<>(); - client = new DtSocketClient(host, port, queue); - client.setCodeC(codeC); - client.setEncoding(encoding); - client.start(); - } - - @Override - protected InputSplit[] createInputSplitsInternal(int splitNum) { - InputSplit[] splits = new InputSplit[splitNum]; - for (int i = 0; i < splitNum; i++) { - splits[i] = new GenericInputSplit(i,splitNum); - } - - return splits; - } - - @Override - protected Row nextRecordInternal(Row row) throws IOException { - try { - row = queue.take(); - // 设置特殊字符串,作为失败标志 - if(StringUtils.startsWith((String) row.getField(0), KEY_EXIT0)){ - throw new IOException("socket client lost connection completely, job failed " + row.getField(0)); - } - } catch (InterruptedException e) { - LOG.error("takeEvent interrupted error: {}", ExceptionUtil.getErrorMessage(e)); - throw new IOException(e); - } - return row; - } - - - @Override - protected void closeInternal() { - if(client != null){ - client.close(); - } - } - - - @Override - public boolean reachedEnd() { - return false; - } - - public void setAddress(String address) { - String[] hostPort = StringUtils.split(address, ConstantValue.COLON_SYMBOL); - this.host = hostPort[0]; - this.port = ValueUtil.getIntegerVal(hostPort[1]); - } - - public void setCodeC(String codeC){ - this.codeC = codeC; - } - - public void setColumns(ArrayList columns){ - this.columns = columns; - } - - public void setEncoding(String encoding){ - this.encoding = encoding; - } - -} diff --git a/flinkx-socket/flinkx-socket-reader/src/main/java/com/dtstack/flinkx/socket/reader/SocketBuilder.java b/flinkx-socket/flinkx-socket-reader/src/main/java/com/dtstack/flinkx/socket/reader/SocketBuilder.java deleted file mode 100644 index 2f527ef6cf..0000000000 --- a/flinkx-socket/flinkx-socket-reader/src/main/java/com/dtstack/flinkx/socket/reader/SocketBuilder.java +++ /dev/null @@ -1,96 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. 
You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -package com.dtstack.flinkx.socket.reader; - -import com.dtstack.flinkx.config.SpeedConfig; -import com.dtstack.flinkx.constants.ConstantValue; -import com.dtstack.flinkx.inputformat.BaseRichInputFormatBuilder; -import com.dtstack.flinkx.socket.format.SocketInputFormat; -import com.dtstack.flinkx.util.TelnetUtil; -import com.dtstack.flinkx.util.ValueUtil; -import org.apache.commons.lang3.StringUtils; - -import java.util.ArrayList; - -/** 构建InputFormat - * - * @author by kunni@dtstack.com - */ - -public class SocketBuilder extends BaseRichInputFormatBuilder { - - protected SocketInputFormat format; - - protected String address; - - private static final int ADDRESS_SPLITS = 2; - - public SocketBuilder(){ - super.format = format = new SocketInputFormat(); - } - - public void setAddress(String address) { - this.address = address; - format.setAddress(address); - } - - protected void setEncoding(String encoding){ - format.setEncoding(encoding); - } - - public void setCodeC(String codeC){ - format.setCodeC(codeC); - } - - public void setColumns(ArrayList columns){ - format.setColumns(columns); - } - - @Override - protected void checkFormat() { - SpeedConfig speed = format.getDataTransferConfig().getJob().getSetting().getSpeed(); - StringBuilder sb = new StringBuilder(256); - if(StringUtils.isBlank(address)){ - sb.append("config error:[address] cannot be blank \n"); - } - String[] hostPort = org.apache.commons.lang.StringUtils.split(address, ConstantValue.COLON_SYMBOL); - if(hostPort.length != ADDRESS_SPLITS){ - sb.append("please check your host format \n"); - } - String host = hostPort[0]; - int port = ValueUtil.getIntegerVal(hostPort[1]); - try{ - TelnetUtil.telnet(host, port); - }catch (Exception e){ - sb.append("could not establish connection to ").append(address).append("\n"); - } - if(speed.getReaderChannel() > 1){ - sb.append("Socket can not support readerChannel bigger than 1, current readerChannel is [") - .append(speed.getReaderChannel()) - .append("];\n"); - }else if(speed.getChannel() > 1){ - sb.append("Socket can not support channel bigger than 1, current channel is [") - .append(speed.getChannel()) - .append("];\n"); - } - if(sb.length() > 0){ - throw new IllegalArgumentException(sb.toString()); - } - } -} diff --git a/flinkx-socket/flinkx-socket-reader/src/main/java/com/dtstack/flinkx/socket/reader/SocketReader.java b/flinkx-socket/flinkx-socket-reader/src/main/java/com/dtstack/flinkx/socket/reader/SocketReader.java deleted file mode 100644 index 17f4946128..0000000000 --- a/flinkx-socket/flinkx-socket-reader/src/main/java/com/dtstack/flinkx/socket/reader/SocketReader.java +++ /dev/null @@ -1,70 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. 
diff --git a/flinkx-socket/flinkx-socket-reader/src/main/java/com/dtstack/flinkx/socket/reader/SocketReader.java b/flinkx-socket/flinkx-socket-reader/src/main/java/com/dtstack/flinkx/socket/reader/SocketReader.java
deleted file mode 100644
index 17f4946128..0000000000
--- a/flinkx-socket/flinkx-socket-reader/src/main/java/com/dtstack/flinkx/socket/reader/SocketReader.java
+++ /dev/null
@@ -1,70 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package com.dtstack.flinkx.socket.reader;
-
-import com.dtstack.flinkx.config.DataTransferConfig;
-import com.dtstack.flinkx.config.ReaderConfig;
-import com.dtstack.flinkx.reader.BaseDataReader;
-import org.apache.flink.streaming.api.datastream.DataStream;
-import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
-import org.apache.flink.types.Row;
-
-import java.util.ArrayList;
-
-import static com.dtstack.flinkx.socket.constants.SocketCons.DEFAULT_ENCODING;
-import static com.dtstack.flinkx.socket.constants.SocketCons.KEY_ADDRESS;
-import static com.dtstack.flinkx.socket.constants.SocketCons.KEY_ENCODING;
-import static com.dtstack.flinkx.socket.constants.SocketCons.KEY_PARSE;
-
-/** Reads the parameters supplied by the user
- *
- * @author kunni@dtstack.com
- */
-
-public class SocketReader extends BaseDataReader {
-
-    protected String address;
-
-    protected String codeC;
-
-    protected ArrayList columns;
-
-    protected String encoding;
-
-    @SuppressWarnings("unchecked")
-    public SocketReader(DataTransferConfig config, StreamExecutionEnvironment env) {
-        super(config, env);
-        ReaderConfig.ParameterConfig parameter = config.getJob().getContent().get(0).getReader().getParameter();
-        address = parameter.getStringVal(KEY_ADDRESS);
-        codeC = parameter.getStringVal(KEY_PARSE);
-        encoding = parameter.getStringVal(KEY_ENCODING, DEFAULT_ENCODING);
-        columns = (ArrayList) parameter.getColumn();
-    }
-
-    @Override
-    public DataStream<Row> readData() {
-        SocketBuilder socketBuilder = new SocketBuilder();
-        socketBuilder.setAddress(address);
-        socketBuilder.setCodeC(codeC);
-        socketBuilder.setColumns(columns);
-        socketBuilder.setEncoding(encoding);
-        socketBuilder.setDataTransferConfig(dataTransferConfig);
-        return createInput(socketBuilder.finish());
-    }
-}
diff --git a/flinkx-socket/flinkx-socket-reader/src/test/java/com/dtstack/flinkx/socket/format/SocketInputFormatTest.java b/flinkx-socket/flinkx-socket-reader/src/test/java/com/dtstack/flinkx/socket/format/SocketInputFormatTest.java
deleted file mode 100644
index c9142c7fb5..0000000000
--- a/flinkx-socket/flinkx-socket-reader/src/test/java/com/dtstack/flinkx/socket/format/SocketInputFormatTest.java
+++ /dev/null
@@ -1,115 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */ - -package com.dtstack.flinkx.socket.format; - -import com.dtstack.flinkx.socket.util.DtSocketClient; -import org.apache.flink.types.Row; -import org.junit.Assert; -import org.junit.Test; -import org.mockito.Mockito; - -import java.io.IOException; -import java.util.ArrayList; -import java.util.concurrent.SynchronousQueue; - -import static com.dtstack.flinkx.socket.constants.SocketCons.KEY_EXIT0; - -public class SocketInputFormatTest { - - private SocketInputFormat inputFormat = Mockito.mock(SocketInputFormat.class); - - private SocketInputFormat inputFormat2 = Mockito.spy(SocketInputFormat.class); - - @Test - public void testNextRecord() throws IOException { - Row row = Row.of("test"); - inputFormat.queue = new SynchronousQueue<>(); - Mockito.when(inputFormat.nextRecordInternal(row)).thenCallRealMethod(); - new Thread(() ->{ - try{ - inputFormat.queue.put(Row.of("test")); - }catch (Exception ignored){ - } - } - ).start(); - inputFormat.nextRecordInternal(row); - } - - @Test(expected = IOException.class) - public void testNextRecord2() throws IOException { - Row row1 = Row.of("test"); - inputFormat.queue = new SynchronousQueue<>(); - Mockito.when(inputFormat.nextRecordInternal(Mockito.any(Row.class))).thenCallRealMethod(); - new Thread(() ->{ - try{ - inputFormat.queue.put(Row.of(KEY_EXIT0)); - }catch (Exception ignored){ - } - } - ).start(); - inputFormat.nextRecordInternal(row1); - } - - - @Test - public void testSetAddress(){ - SocketInputFormat inputFormat = new SocketInputFormat(); - inputFormat.setAddress("localhost:8000"); - Assert.assertEquals("localhost", "localhost"); - Assert.assertEquals(8000, 8000); - } - - @Test - public void testReachedEnd(){ - Assert.assertFalse(new SocketInputFormat().reachedEnd()); - } - - @Test - public void testCreateInputSplitsInternal(){ - Assert.assertEquals(3, new SocketInputFormat().createInputSplitsInternal(3).length); - } - - @Test - public void testSetCodeC(){ - String codeC = "text"; - inputFormat2.setCodeC(codeC); - Assert.assertEquals(codeC, inputFormat2.codeC); - } - - @Test - public void testSetColumns(){ - ArrayList columns = new ArrayList<>(); - inputFormat2.setColumns(columns); - Assert.assertEquals(columns, inputFormat2.columns); - } - - @Test - public void testCloseInternal() { - inputFormat2.client = Mockito.mock(DtSocketClient.class); - inputFormat2.closeInternal(); - Mockito.verify(inputFormat2.client, Mockito.times(1)).close(); - } - - @Test - public void testOpenInternal(){ - inputFormat2.host = "localhost"; - inputFormat2.port = 8000; - inputFormat2.openInternal(null); - } -} diff --git a/flinkx-socket/pom.xml b/flinkx-socket/pom.xml deleted file mode 100644 index 82509844f2..0000000000 --- a/flinkx-socket/pom.xml +++ /dev/null @@ -1,29 +0,0 @@ - - - - flinkx-all - com.dtstack.flinkx - 1.6 - - 4.0.0 - - flinkx-socket - pom - - flinkx-socket-core - flinkx-socket-reader - - - - - - com.dtstack.flinkx - flinkx-core - 1.6 - provided - - - - \ No newline at end of file diff --git a/flinkx-sqlserver/flinkx-sqlserver-reader/pom.xml b/flinkx-sqlserver/flinkx-sqlserver-reader/pom.xml index 06934bdb07..a826c51267 100644 --- a/flinkx-sqlserver/flinkx-sqlserver-reader/pom.xml +++ b/flinkx-sqlserver/flinkx-sqlserver-reader/pom.xml @@ -95,7 +95,7 @@ + tofile="${basedir}/../../syncplugins/sqlserverreader/${project.name}-${package.name}.jar" /> diff --git a/flinkx-sqlserver/flinkx-sqlserver-writer/pom.xml b/flinkx-sqlserver/flinkx-sqlserver-writer/pom.xml index a9ed0ef18d..9ae14cbb76 100644 --- 
a/flinkx-sqlserver/flinkx-sqlserver-writer/pom.xml +++ b/flinkx-sqlserver/flinkx-sqlserver-writer/pom.xml @@ -96,7 +96,7 @@ + tofile="${basedir}/../../syncplugins/sqlserverwriter/${project.name}-${package.name}.jar" /> diff --git a/flinkx-sqlservercdc/flinkx-sqlservercdc-reader/pom.xml b/flinkx-sqlservercdc/flinkx-sqlservercdc-reader/pom.xml index 8b5417bad6..669f6b047a 100644 --- a/flinkx-sqlservercdc/flinkx-sqlservercdc-reader/pom.xml +++ b/flinkx-sqlservercdc/flinkx-sqlservercdc-reader/pom.xml @@ -68,7 +68,7 @@ + tofile="${basedir}/../../syncplugins/sqlservercdcreader/${project.name}-${package.name}.jar" /> diff --git a/flinkx-stream/flinkx-stream-reader/pom.xml b/flinkx-stream/flinkx-stream-reader/pom.xml index f33da66915..2ec8b70719 100644 --- a/flinkx-stream/flinkx-stream-reader/pom.xml +++ b/flinkx-stream/flinkx-stream-reader/pom.xml @@ -84,7 +84,7 @@ + tofile="${basedir}/../../syncplugins/streamreader/${project.name}-${package.name}.jar" /> diff --git a/flinkx-stream/flinkx-stream-writer/pom.xml b/flinkx-stream/flinkx-stream-writer/pom.xml index 6933af8935..540c3d9d43 100644 --- a/flinkx-stream/flinkx-stream-writer/pom.xml +++ b/flinkx-stream/flinkx-stream-writer/pom.xml @@ -80,7 +80,7 @@ + tofile="${basedir}/../../syncplugins/streamwriter/${project.name}-${package.name}.jar" /> diff --git a/flinkx-socket/flinkx-socket-core/pom.xml b/flinkx-teradata/flinkx-teradata-core/pom.xml similarity index 57% rename from flinkx-socket/flinkx-socket-core/pom.xml rename to flinkx-teradata/flinkx-teradata-core/pom.xml index 3695aca270..2af2575d7e 100644 --- a/flinkx-socket/flinkx-socket-core/pom.xml +++ b/flinkx-teradata/flinkx-teradata-core/pom.xml @@ -3,21 +3,13 @@ xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"> - flinkx-socket + flinkx-teradata com.dtstack.flinkx 1.6 4.0.0 - flinkx-socket-core + flinkx-teradata-core - - - io.netty - netty-all - 4.0.23.Final - - - \ No newline at end of file diff --git a/flinkx-teradata/flinkx-teradata-core/src/main/java/com/dtstack/flinkx/teradata/Phoenix5ConfigKeys.java b/flinkx-teradata/flinkx-teradata-core/src/main/java/com/dtstack/flinkx/teradata/Phoenix5ConfigKeys.java new file mode 100644 index 0000000000..e69de29bb2 diff --git a/flinkx-teradata/flinkx-teradata-core/src/main/java/com/dtstack/flinkx/teradata/Phoenix5DatabaseMeta.java b/flinkx-teradata/flinkx-teradata-core/src/main/java/com/dtstack/flinkx/teradata/Phoenix5DatabaseMeta.java new file mode 100644 index 0000000000..e69de29bb2 diff --git a/flinkx-teradata/flinkx-teradata-core/src/main/java/com/dtstack/flinkx/teradata/Phoenix5InputSplit.java b/flinkx-teradata/flinkx-teradata-core/src/main/java/com/dtstack/flinkx/teradata/Phoenix5InputSplit.java new file mode 100644 index 0000000000..e69de29bb2 diff --git a/flinkx-teradata/flinkx-teradata-core/src/main/java/com/dtstack/flinkx/teradata/TeradataDatabaseMeta.java b/flinkx-teradata/flinkx-teradata-core/src/main/java/com/dtstack/flinkx/teradata/TeradataDatabaseMeta.java new file mode 100644 index 0000000000..0f6535497b --- /dev/null +++ b/flinkx-teradata/flinkx-teradata-core/src/main/java/com/dtstack/flinkx/teradata/TeradataDatabaseMeta.java @@ -0,0 +1,126 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. 
The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package com.dtstack.flinkx.teradata;
+
+
+import com.dtstack.flinkx.enums.EDatabaseType;
+import com.dtstack.flinkx.rdb.BaseDatabaseMeta;
+import org.apache.commons.lang.StringUtils;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+
+/**
+ * The database prototype class for Teradata
+ *
+ * Company: www.dtstack.com
+ * @author wuhui
+ */
+public class TeradataDatabaseMeta extends BaseDatabaseMeta {
+
+    @Override
+    public EDatabaseType getDatabaseType() {
+        return EDatabaseType.TeraData;
+    }
+
+    @Override
+    public String getDriverClass() {
+        return "com.teradata.jdbc.TeraDriver";
+    }
+
+    @Override
+    public String getSqlQueryFields(String tableName) {
+        return "SELECT * FROM " + tableName + " QUALIFY SUM(1) OVER (ROWS UNBOUNDED PRECEDING) BETWEEN 0 AND 0";
+    }
+
+    @Override
+    public String getSqlQueryColumnFields(List<String> column, String table) {
+        return "SELECT " + quoteColumns(column) + " FROM " + quoteTable(table) + " QUALIFY SUM(1) OVER (ROWS UNBOUNDED"
+                + " PRECEDING) BETWEEN 0 AND 0";
+    }
+
+    @Override
+    public String getStartQuote() {
+        return "\"";
+    }
+
+    @Override
+    public String getEndQuote() {
+        return "\"";
+    }
+
+    @Override
+    public String quoteValue(String value, String column) {
+        return String.format("\"%s\" as %s", value, column);
+    }
+
+    @Override
+    public String getReplaceStatement(List<String> column, List<String> fullColumn, String table, Map<String, List<String>> updateKey) {
+        throw new UnsupportedOperationException();
+    }
+
+    @Override
+    public String getUpsertStatement(List<String> column, String table, Map<String, List<String>> updateKey) {
+        throw new UnsupportedOperationException();
+    }
+
+    private String makeUpdatePart (List<String> column) {
+        List<String> updateList = new ArrayList<>();
+        for(String col : column) {
+            String quotedCol = quoteColumn(col);
+            updateList.add(quotedCol + "=values(" + quotedCol + ")");
+        }
+        return StringUtils.join(updateList, ",");
+    }
+
+    @Override
+    public String getSplitFilter(String columnName) {
+        return String.format("%s mod ${N} = ${M}", getStartQuote() + columnName + getEndQuote());
+    }
+
+    @Override
+    public String getSplitFilterWithTmpTable(String tmpTable, String columnName){
+        return String.format("%s.%s mod ${N} = ${M}", tmpTable, getStartQuote() + columnName + getEndQuote());
+    }
+
+    @Override
+    public String getRowNumColumn(String orderBy) {
+        throw new RuntimeException("The row_number function is not supported");
+    }
+
+    private String makeValues(int nCols) {
+        return "(" + StringUtils.repeat("?", ",", nCols) + ")";
+    }
+
+    @Override
+    protected String makeValues(List<String> column) {
+        throw new UnsupportedOperationException();
+    }
+
+    @Override
+    public int getFetchSize(){
+        return 1000;
+    }
+
+    @Override
+    public int getQueryTimeout(){
+        return 1000;
+    }
+}
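Two of the SQL fragments above are worth unpacking. The probe queries end in QUALIFY SUM(1) OVER (ROWS UNBOUNDED PRECEDING) BETWEEN 0 AND 0: the running SUM(1) is at least 1 on every row, so the predicate keeps nothing and the statement returns only result-set metadata — Teradata's flavor of a WHERE 1=2 schema probe. The split filter uses Teradata's mod operator with the ${N}/${M} placeholders that FlinkX later substitutes with the channel count and channel index. A small sketch that prints these fragments, assuming flinkx-rdb-core is on the classpath; the table and column names are placeholders:

```java
import com.dtstack.flinkx.teradata.TeradataDatabaseMeta;

import java.util.Arrays;

// Prints the SQL fragments generated by the dialect above; the table and
// column names are placeholders.
public class TeradataMetaDemo {

    public static void main(String[] args) {
        TeradataDatabaseMeta meta = new TeradataDatabaseMeta();

        // Zero-row schema probe: the running SUM(1) is >= 1 for every row, so
        // QUALIFY ... BETWEEN 0 AND 0 filters everything and only metadata returns.
        System.out.println(meta.getSqlQueryFields("demo_db.demo_table"));

        System.out.println(meta.getSqlQueryColumnFields(Arrays.asList("id", "name"), "demo_db.demo_table"));

        // Split predicate; FlinkX substitutes ${N} (channel count) and ${M} (channel index).
        System.out.println(meta.getSplitFilter("id"));
    }
}
```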
diff --git a/flinkx-teradata/flinkx-teradata-core/src/main/java/com/dtstack/flinkx/teradata/util/DBUtil.java b/flinkx-teradata/flinkx-teradata-core/src/main/java/com/dtstack/flinkx/teradata/util/DBUtil.java
new file mode 100644
index 0000000000..1c84adcfa4
--- /dev/null
+++ b/flinkx-teradata/flinkx-teradata-core/src/main/java/com/dtstack/flinkx/teradata/util/DBUtil.java
@@ -0,0 +1,35 @@
+package com.dtstack.flinkx.teradata.util;
+
+import com.dtstack.flinkx.util.ClassUtil;
+
+import java.sql.Connection;
+import java.sql.DriverManager;
+import java.sql.SQLException;
+
+/**
+ * @author wuhui
+ */
+public class DBUtil {
+    /**
+     * Obtains a database connection. This deliberately does not use the getConnection in DbUtil,
+     * in order to skip its Telnet check, because the JDBC 4 and JDBC 3 drivers differ here.
+     * @param url      the connection url
+     * @param username the user name
+     * @param password the password
+     * @return the connection
+     * @throws SQLException thrown when the connection cannot be established
+     */
+    public static Connection getConnection(String url, String username, String password) throws SQLException {
+        Connection dbConn;
+        synchronized (ClassUtil.LOCK_STR){
+            DriverManager.setLoginTimeout(10);
+
+            if (username == null) {
+                dbConn = DriverManager.getConnection(url);
+            } else {
+                dbConn = DriverManager.getConnection(url, username, password);
+            }
+        }
+
+        return dbConn;
+    }
+}
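A hedged usage sketch for the helper above. The host, database, and credentials are placeholders; the URL follows the standard Teradata JDBC form jdbc:teradata://&lt;host&gt;/DATABASE=&lt;db&gt;,DBS_PORT=&lt;port&gt;, and the driver class is the one returned by TeradataDatabaseMeta#getDriverClass:

```java
import com.dtstack.flinkx.teradata.util.DBUtil;

import java.sql.Connection;
import java.sql.SQLException;

// Illustrative only: host, database, and credentials are placeholders, and the
// Teradata JDBC driver must be on the classpath.
public class DBUtilDemo {

    public static void main(String[] args) throws ClassNotFoundException, SQLException {
        // Load the driver named by TeradataDatabaseMeta#getDriverClass.
        Class.forName("com.teradata.jdbc.TeraDriver");

        String url = "jdbc:teradata://td-host/DATABASE=demo_db,DBS_PORT=1025";

        try (Connection conn = DBUtil.getConnection(url, "demo_user", "demo_password")) {
            System.out.println(conn.getMetaData().getDatabaseProductName());
        }
    }
}
```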
diff --git a/flinkx-teradata/flinkx-teradata-core/src/main/java/com/dtstack/flinkx/teradata/util/IPhoenix5Helper.java b/flinkx-teradata/flinkx-teradata-core/src/main/java/com/dtstack/flinkx/teradata/util/IPhoenix5Helper.java
new file mode 100644
index 0000000000..e69de29bb2
diff --git a/flinkx-metadata-vertica/flinkx-metadata-vertica-reader/pom.xml b/flinkx-teradata/flinkx-teradata-reader/pom.xml
similarity index 76%
rename from flinkx-metadata-vertica/flinkx-metadata-vertica-reader/pom.xml
rename to flinkx-teradata/flinkx-teradata-reader/pom.xml
index 5d0790674c..ef7cea1706 100644
--- a/flinkx-metadata-vertica/flinkx-metadata-vertica-reader/pom.xml
+++ b/flinkx-teradata/flinkx-teradata-reader/pom.xml
@@ -3,24 +3,25 @@ xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"> - flinkx-metadata-vertica + flinkx-teradata com.dtstack.flinkx 1.6 4.0.0 - flinkx-metadata-vertica-reader + flinkx-teradata-reader com.dtstack.flinkx - flinkx-metadata-reader + flinkx-teradata-core 1.6 - fakepath - vertica-jdbc - 9.1.1-0 + com.dtstack.flinkx + flinkx-rdb-reader + 1.6 + provided @@ -58,15 +59,11 @@ io.netty - shade.metadataverticareader.io.netty - - - com.google.common - shade.metadataverticareader.com.google.common + shade.teradatareader.io.netty - com.google.thirdparty - shade.metadataverticareader.com.google.thirdparty + com.google + shade.teradatareader.com.google @@ -87,14 +84,19 @@ - + - - + + + + + + +
diff --git a/flinkx-teradata/flinkx-teradata-reader/src/main/java/com/dtstack/flinkx/teradata/format/TeradataInputFormat.java b/flinkx-teradata/flinkx-teradata-reader/src/main/java/com/dtstack/flinkx/teradata/format/TeradataInputFormat.java
new file mode 100644
index 0000000000..12dd9b833e
--- /dev/null
+++ b/flinkx-teradata/flinkx-teradata-reader/src/main/java/com/dtstack/flinkx/teradata/format/TeradataInputFormat.java
@@ -0,0 +1,130 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package com.dtstack.flinkx.teradata.format;
+
+import com.dtstack.flinkx.rdb.inputformat.JdbcInputFormat;
+import com.dtstack.flinkx.rdb.util.DbUtil;
+import com.dtstack.flinkx.teradata.util.DBUtil;
+import com.dtstack.flinkx.util.ClassUtil;
+import org.apache.commons.collections.CollectionUtils;
+import org.apache.commons.lang3.StringUtils;
+import org.apache.flink.core.io.InputSplit;
+import org.apache.flink.types.Row;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.sql.SQLException;
+import java.sql.Statement;
+import java.util.List;
+
+import static com.dtstack.flinkx.rdb.util.DbUtil.clobToString;
+
+/**
+ * Company: www.dtstack.com
+ *
+ * @author wuhui
+ */
+public class TeradataInputFormat extends JdbcInputFormat {
+    private static final Logger LOG = LoggerFactory.getLogger(TeradataInputFormat.class);
+
+    protected List<String> descColumnTypeList;
+
+    @Override
+    public void openInternal(InputSplit inputSplit) throws IOException {
+        try {
+            LOG.info(inputSplit.toString());
+
+            ClassUtil.forName(driverName, getClass().getClassLoader());
+
+            if (incrementConfig.isIncrement() && incrementConfig.isUseMaxFunc()){
+                getMaxValue(inputSplit);
+            }
+
+            initMetric(inputSplit);
+
+            if(!canReadData(inputSplit)){
+                LOG.warn("No data to read when the start location equals the end location");
+
+                hasNext = false;
+                return;
+            }
+
+            dbConn = DBUtil.getConnection(dbUrl, username, password);
+
+            // some drivers only honor the fetchSize parameter once auto-commit has been disabled
+            dbConn.setAutoCommit(false);
+
+            Statement statement = dbConn.createStatement(resultSetType, resultSetConcurrency);
+
+            statement.setFetchSize(0);
+
+            statement.setQueryTimeout(queryTimeOut);
+            String querySql = buildQuerySql(inputSplit);
+            resultSet = statement.executeQuery(querySql);
+            columnCount = resultSet.getMetaData().getColumnCount();
+
+            boolean splitWithRowCol = numPartitions > 1 && StringUtils.isNotEmpty(splitKey) && splitKey.contains("(");
+            if(splitWithRowCol){
+                columnCount = columnCount-1;
+            }
+            checkSize(columnCount, metaColumns);
+            hasNext = resultSet.next();
+
+            if(descColumnTypeList == null) {
+                descColumnTypeList = DbUtil.analyzeColumnType(resultSet, metaColumns);
+            }
+
+        } catch (SQLException se) {
+            throw new IllegalArgumentException("open() failed. " + se.getMessage(), se);
+        }
+
+        LOG.info("JdbcInputFormat[{}]open: end", jobName);
+    }
+
+    @Override
+    public Row nextRecordInternal(Row row) throws IOException {
+        if (!hasNext) {
+            return null;
+        }
+        row = new Row(columnCount);
+
+        try {
+            for (int pos = 0; pos < row.getArity(); pos++) {
+                Object obj = resultSet.getObject(pos + 1);
+                if(obj != null) {
+                    if(CollectionUtils.isNotEmpty(descColumnTypeList)) {
+                        String columnType = descColumnTypeList.get(pos);
+                        if("byteint".equalsIgnoreCase(columnType)) {
+                            if(obj instanceof Boolean) {
+                                obj = ((Boolean) obj ? 1 : 0);
+                            }
+                        }
+                    }
+                    obj = clobToString(obj);
+                }
+
+                row.setField(pos, obj);
+            }
+            return super.nextRecordInternal(row);
+        }catch (Exception e) {
+            throw new IOException("Couldn't read data - " + e.getMessage(), e);
+        }
+    }
+
+}
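The open path above works around two driver quirks: fetchSize is only honored by some JDBC drivers once auto-commit is disabled, and Teradata BYTEINT columns can come back from getObject as java.lang.Boolean, which nextRecordInternal normalizes to 1/0. A plain-JDBC sketch of the same streaming-read setup; the URL, credentials, table, and column names are placeholders:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Plain-JDBC version of the read path above; URL, credentials, table and
// column names are placeholders.
public class StreamingReadDemo {

    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:teradata://td-host/DATABASE=demo_db", "demo_user", "demo_password")) {

            // Some drivers silently ignore fetchSize while auto-commit is on,
            // which is why openInternal() switches it off before reading.
            conn.setAutoCommit(false);

            try (Statement stmt = conn.createStatement()) {
                stmt.setFetchSize(1000);
                stmt.setQueryTimeout(1000);

                try (ResultSet rs = stmt.executeQuery("SELECT flag_col FROM demo_table")) {
                    while (rs.next()) {
                        Object obj = rs.getObject(1);
                        // BYTEINT may surface as Boolean; normalize it to 1/0
                        // exactly like nextRecordInternal() does.
                        if (obj instanceof Boolean) {
                            obj = ((Boolean) obj) ? 1 : 0;
                        }
                        System.out.println(obj);
                    }
                }
            }
        }
    }
}
```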
diff --git a/flinkx-teradata/flinkx-teradata-reader/src/main/java/com/dtstack/flinkx/teradata/reader/Phoenix5InputFormatBuilder.java b/flinkx-teradata/flinkx-teradata-reader/src/main/java/com/dtstack/flinkx/teradata/reader/Phoenix5InputFormatBuilder.java
new file mode 100644
index 0000000000..e69de29bb2
diff --git a/flinkx-metadata-mysql/flinkx-metadata-mysql-reader/src/main/java/com/dtstack/flinkx/metadatamysql/reader/MetadatamysqlReader.java b/flinkx-teradata/flinkx-teradata-reader/src/main/java/com/dtstack/flinkx/teradata/reader/TeradataReader.java
similarity index 56%
rename from flinkx-metadata-mysql/flinkx-metadata-mysql-reader/src/main/java/com/dtstack/flinkx/metadatamysql/reader/MetadatamysqlReader.java
rename to flinkx-teradata/flinkx-teradata-reader/src/main/java/com/dtstack/flinkx/teradata/reader/TeradataReader.java
index 61fb16c8d8..e5bbdd8e97 100644
--- a/flinkx-metadata-mysql/flinkx-metadata-mysql-reader/src/main/java/com/dtstack/flinkx/metadatamysql/reader/MetadatamysqlReader.java
+++ b/flinkx-teradata/flinkx-teradata-reader/src/main/java/com/dtstack/flinkx/teradata/reader/TeradataReader.java
@@ -6,9 +6,9 @@
  * to you under the Apache License, Version 2.0 (the
  * "License"); you may not use this file except in compliance
  * with the License. You may obtain a copy of the License at
- * 
- * http://www.apache.org/licenses/LICENSE-2.0
- * 
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
  * Unless required by applicable law or agreed to in writing, software
  * distributed under the License is distributed on an "AS IS" BASIS,
  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
@@ -16,27 +16,31 @@
  * limitations under the License.
  */
 
-package com.dtstack.flinkx.metadatamysql.reader;
+package com.dtstack.flinkx.teradata.reader;
 
 import com.dtstack.flinkx.config.DataTransferConfig;
-import com.dtstack.flinkx.metadata.inputformat.MetadataInputFormatBuilder;
-import com.dtstack.flinkx.metadatamysql.inputformat.MetadatamysqlInputFormat;
-import com.dtstack.flinkx.metadatatidb.reader.MetadatatidbReader;
+import com.dtstack.flinkx.rdb.datareader.JdbcDataReader;
+import com.dtstack.flinkx.rdb.inputformat.JdbcInputFormatBuilder;
+import com.dtstack.flinkx.teradata.TeradataDatabaseMeta;
+import com.dtstack.flinkx.teradata.format.TeradataInputFormat;
 import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
 
+
 /**
- * @author : kunni@dtstack.com
- * @date : 2020/6/8
+ * Teradata reader plugin
+ *
+ * Company: www.dtstack.com
+ * @author wuhui
  */
+public class TeradataReader extends JdbcDataReader {
 
-public class MetadatamysqlReader extends MetadatatidbReader {
-
-    public MetadatamysqlReader(DataTransferConfig config, StreamExecutionEnvironment env) {
+    public TeradataReader(DataTransferConfig config, StreamExecutionEnvironment env) {
         super(config, env);
+        setDatabaseInterface(new TeradataDatabaseMeta());
     }
 
     @Override
-    protected MetadataInputFormatBuilder getBuilder(){
-        return new MetadataInputFormatBuilder(new MetadatamysqlInputFormat());
+    protected JdbcInputFormatBuilder getBuilder() {
+        return new JdbcInputFormatBuilder(new TeradataInputFormat());
     }
 }
diff --git a/flinkx-metadata-es6/flinkx-metadata-es6-reader/pom.xml b/flinkx-teradata/flinkx-teradata-writer/pom.xml
similarity index 76%
rename from flinkx-metadata-es6/flinkx-metadata-es6-reader/pom.xml
rename to flinkx-teradata/flinkx-teradata-writer/pom.xml
index 0086677d01..b1b009442b 100644
--- a/flinkx-metadata-es6/flinkx-metadata-es6-reader/pom.xml
+++ b/flinkx-teradata/flinkx-teradata-writer/pom.xml
@@ -3,26 +3,25 @@ xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"> - flinkx-metadata-es6 + flinkx-teradata com.dtstack.flinkx 1.6 4.0.0 - flinkx-metadata-es6-reader + flinkx-teradata-writer com.dtstack.flinkx - flinkx-metadata-reader + flinkx-teradata-core 1.6 - compile - org.elasticsearch.client - elasticsearch-rest-client - 6.3.0 - compile + com.dtstack.flinkx + flinkx-rdb-writer + 1.6 + provided @@ -60,15 +59,11 @@ io.netty - shade.metadataes6reader.io.netty - - - com.google.common - shade.core.com.google.common + shade.teradatawriter.io.netty - com.google.thirdparty - shade.core.com.google.thirdparty + com.google + shade.teradatawriter.com.google @@ -89,14 +84,19 @@ - + - - + + + + + + + @@ -105,5 +105,4 @@ -
diff --git a/flinkx-teradata/flinkx-teradata-writer/src/main/java/com/dtstack/flinkx/teradata/format/TeradataOutputFormat.java b/flinkx-teradata/flinkx-teradata-writer/src/main/java/com/dtstack/flinkx/teradata/format/TeradataOutputFormat.java
new file mode 100644
index 0000000000..e84473a9cf
--- /dev/null
+++ b/flinkx-teradata/flinkx-teradata-writer/src/main/java/com/dtstack/flinkx/teradata/format/TeradataOutputFormat.java
@@ -0,0 +1,78 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package com.dtstack.flinkx.teradata.format;
+
+import com.dtstack.flinkx.enums.EWriteMode;
+import com.dtstack.flinkx.rdb.outputformat.JdbcOutputFormat;
+import com.dtstack.flinkx.teradata.util.DBUtil;
+import com.dtstack.flinkx.util.ClassUtil;
+import org.apache.commons.collections.CollectionUtils;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.sql.SQLException;
+
+/**
+ * Company: www.dtstack.com
+ *
+ * @author wuhui
+ */
+public class TeradataOutputFormat extends JdbcOutputFormat {
+    private static final Logger LOG = LoggerFactory.getLogger(TeradataOutputFormat.class);
+
+    @Override
+    protected void openInternal(int taskNumber, int numTasks) {
+        try {
+            ClassUtil.forName(driverName, getClass().getClassLoader());
+            dbConn = DBUtil.getConnection(dbUrl, username, password);
+
+            if (restoreConfig.isRestore()){
+                dbConn.setAutoCommit(false);
+            }
+
+            if(CollectionUtils.isEmpty(fullColumn)) {
+                fullColumn = probeFullColumns(table, dbConn);
+            }
+
+            if (!EWriteMode.INSERT.name().equalsIgnoreCase(mode)){
+                if(updateKey == null || updateKey.size() == 0) {
+                    updateKey = probePrimaryKeys(table, dbConn);
+                }
+            }
+
+            if(fullColumnType == null) {
+                fullColumnType = analyzeTable();
+            }
+
+            for(String col : column) {
+                for (int i = 0; i < fullColumn.size(); i++) {
+                    if (col.equalsIgnoreCase(fullColumn.get(i))){
+                        columnType.add(fullColumnType.get(i));
+                        break;
+                    }
+                }
+            }
+
+            preparedStatement = prepareTemplates();
+            readyCheckpoint = false;
+
+            LOG.info("subTask[{}] open finished", taskNumber);
+        } catch (SQLException sqe) {
+            throw new IllegalArgumentException("open() failed.", sqe);
+        }
+    }
+}
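Since getReplaceStatement and getUpsertStatement in TeradataDatabaseMeta both throw UnsupportedOperationException, only EWriteMode.INSERT has a working statement template, and the writer ends up batching plain INSERTs through the PreparedStatement built in prepareTemplates(). A minimal sketch of that insert path with the dialect's double-quoted identifiers; the URL, credentials, table, and columns are placeholders:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Sketch of the only supported write mode (plain INSERT), with the dialect's
// double-quoted identifiers. URL, credentials, table and columns are placeholders.
public class InsertBatchDemo {

    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:teradata://td-host/DATABASE=demo_db", "demo_user", "demo_password");
             PreparedStatement ps = conn.prepareStatement(
                     "INSERT INTO \"demo_table\" (\"id\", \"name\") VALUES (?, ?)")) {

            conn.setAutoCommit(false);
            for (int i = 0; i < 3; i++) {
                ps.setInt(1, i);
                ps.setString(2, "row-" + i);
                ps.addBatch();
            }
            ps.executeBatch();
            conn.commit();
        }
    }
}
```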
diff --git a/flinkx-metadata-hbase/flinkx-metadata-hbase-reader/src/main/java/com/dtstack/flinkx/metadatahbase/util/HbaseCons.java b/flinkx-teradata/flinkx-teradata-writer/src/main/java/com/dtstack/flinkx/teradata/writer/TeradataWriter.java
similarity index 52%
rename from flinkx-metadata-hbase/flinkx-metadata-hbase-reader/src/main/java/com/dtstack/flinkx/metadatahbase/util/HbaseCons.java
rename to flinkx-teradata/flinkx-teradata-writer/src/main/java/com/dtstack/flinkx/teradata/writer/TeradataWriter.java
index b829bee566..e69facae73 100644
--- a/flinkx-metadata-hbase/flinkx-metadata-hbase-reader/src/main/java/com/dtstack/flinkx/metadatahbase/util/HbaseCons.java
+++ b/flinkx-teradata/flinkx-teradata-writer/src/main/java/com/dtstack/flinkx/teradata/writer/TeradataWriter.java
@@ -16,32 +16,29 @@
  * limitations under the License.
  */
 
-package com.dtstack.flinkx.metadatahbase.util;
+package com.dtstack.flinkx.teradata.writer;
 
-import com.dtstack.flinkx.metadata.MetaDataCons;
+import com.dtstack.flinkx.config.DataTransferConfig;
+import com.dtstack.flinkx.rdb.datawriter.JdbcDataWriter;
+import com.dtstack.flinkx.rdb.outputformat.JdbcOutputFormatBuilder;
+import com.dtstack.flinkx.teradata.TeradataDatabaseMeta;
+import com.dtstack.flinkx.teradata.format.TeradataOutputFormat;
 
 /**
- * Property definitions
- * @author kunni@dtstack.com
+ * Teradata writer plugin
+ *
+ * Company: www.dtstack.com
+ * @author wuhui
  */
-public class HbaseCons extends MetaDataCons {
-
-    public static final String KEY_PATH = "path";
-
-    public static final String KEY_TABLE_NAME = "table_name";
-
-    public static final String KEY_NAMESPACE = "namespace";
-
-    public static final String KEY_COLUMN_FAMILY = "column_family";
-
-    public static final String KEY_CREATE_TIME = "createTime";
-
-    public static final String KEY_REGION_COUNT = "regionCount";
+public class TeradataWriter extends JdbcDataWriter {
 
-    public static final String KEY_STORAGE_SIZE = "totalSize";
+    public TeradataWriter(DataTransferConfig config) {
+        super(config);
+        setDatabaseInterface(new TeradataDatabaseMeta());
+    }
 
-    /**
-     * The following are parameters required by the reader
-     */
-    public static final String KEY_HADOOP_CONFIG = "hadoopConfig";
+    @Override
+    protected JdbcOutputFormatBuilder getBuilder() {
+        return new JdbcOutputFormatBuilder(new TeradataOutputFormat());
+    }
 }
diff --git a/flinkx-metadata-vertica/pom.xml b/flinkx-teradata/pom.xml
similarity index 65%
rename from flinkx-metadata-vertica/pom.xml
rename to flinkx-teradata/pom.xml
index 2e55576c80..9bfb00d6d4 100644
--- a/flinkx-metadata-vertica/pom.xml
+++ b/flinkx-teradata/pom.xml
@@ -9,10 +9,12 @@ 4.0.0 - flinkx-metadata-vertica + flinkx-teradata pom - flinkx-metadata-vertica-reader + flinkx-teradata-reader + flinkx-teradata-core + flinkx-teradata-writer @@ -22,6 +24,12 @@ 1.6 provided + + com.dtstack.flinkx + flinkx-rdb-core + 1.6 + provided +
diff --git a/flinkx-test/pom.xml b/flinkx-test/pom.xml
index 85eb539a0c..125eadee28 100644
--- a/flinkx-test/pom.xml
+++ b/flinkx-test/pom.xml
@@ -334,18 +334,6 @@ 1.6 - - com.dtstack.flinkx - flinkx-kafka09-reader - 1.6 - - - - com.dtstack.flinkx - flinkx-kafka09-writer - 1.6 - - com.dtstack.flinkx flinkx-kudu-reader @@ -436,48 +424,11 @@ flinkx-emqx-writer 1.6 - - com.dtstack.flinkx - flinkx-metadata-hive2-reader - 1.6 - compile - - - hbase-client - org.apache.hbase - - - hbase-common - org.apache.hbase - - - - - com.dtstack.flinkx - flinkx-metadata-hive2-reader - 1.6 - compile - - - hbase-client - org.apache.hbase - - - hbase-common - org.apache.hbase - - - - com.dtstack.flinkx flinkx-restapi-reader 1.6 - - com.dtstack.flinkx - flinkx-restapi-writer - 1.6 - com.dtstack.flinkx flinkx-dm-reader @@ -499,21 +450,6 @@ flinkx-greenplum-writer 1.6 - - com.dtstack.flinkx - flinkx-metadata-tidb-reader - 1.6 - - - com.dtstack.flinkx - flinkx-metadata-oracle-reader - 1.6 - - - com.dtstack.flinkx - flinkx-metadata-mysql-reader - 1.6 - com.dtstack.flinkx flinkx-phoenix5-reader @@ -524,11 +460,6 @@ flinkx-phoenix5-writer 1.6 - - com.dtstack.flinkx - flinkx-metadata-sqlserver-reader - 1.6 - com.dtstack.flinkx flinkx-kingbase-reader @@ -541,27 +472,7 @@ com.dtstack.flinkx - flinkx-metadata-phoenix-reader - 1.6 - - - com.dtstack.flinkx - flinkx-metadata-hbase-reader - 1.6 - - - - - - - - com.dtstack.flinkx - flinkx-metadata-vertica-reader - 1.6 - - - com.dtstack.flinkx - flinkx-socket-reader + flinkx-alluxio-writer 1.6
diff --git 
a/flinkx-test/src/main/java/com/dtstack/flinkx/test/LocalTest.java b/flinkx-test/src/main/java/com/dtstack/flinkx/test/LocalTest.java index 03f24323b7..bef838e445 100644 --- a/flinkx-test/src/main/java/com/dtstack/flinkx/test/LocalTest.java +++ b/flinkx-test/src/main/java/com/dtstack/flinkx/test/LocalTest.java @@ -18,6 +18,7 @@ package com.dtstack.flinkx.test; import com.dtstack.flink.api.java.MyLocalStreamEnvironment; +import com.dtstack.flinkx.alluxio.writer.AlluxioWriter; import com.dtstack.flinkx.binlog.reader.BinlogReader; import com.dtstack.flinkx.carbondata.reader.CarbondataReader; import com.dtstack.flinkx.carbondata.writer.CarbondataWriter; @@ -47,8 +48,6 @@ import com.dtstack.flinkx.hive.writer.HiveWriter; import com.dtstack.flinkx.kafka.reader.KafkaReader; import com.dtstack.flinkx.kafka.writer.KafkaWriter; -import com.dtstack.flinkx.kafka09.reader.Kafka09Reader; -import com.dtstack.flinkx.kafka09.writer.Kafka09Writer; import com.dtstack.flinkx.kafka10.reader.Kafka10Reader; import com.dtstack.flinkx.kafka10.writer.Kafka10Writer; import com.dtstack.flinkx.kafka11.reader.Kafka11Reader; @@ -57,15 +56,6 @@ import com.dtstack.flinkx.kingbase.writer.KingbaseWriter; import com.dtstack.flinkx.kudu.reader.KuduReader; import com.dtstack.flinkx.kudu.writer.KuduWriter; -import com.dtstack.flinkx.metadatahbase.reader.MetadatahbaseReader; -//import com.dtstack.flinkx.metadataes6.reader.Metadataes6Reader; -import com.dtstack.flinkx.metadatahive2.reader.Metadatahive2Reader; -import com.dtstack.flinkx.metadatamysql.reader.MetadatamysqlReader; -import com.dtstack.flinkx.metadataoracle.reader.MetadataoracleReader; -import com.dtstack.flinkx.metadataphoenix5.reader.MetadataphoenixReader; -import com.dtstack.flinkx.metadatasqlserver.reader.MetadatasqlserverReader; -import com.dtstack.flinkx.metadatatidb.reader.MetadatatidbReader; -import com.dtstack.flinkx.metadatavertica.reader.MetadataverticaReader; import com.dtstack.flinkx.mongodb.reader.MongodbReader; import com.dtstack.flinkx.mongodb.writer.MongodbWriter; import com.dtstack.flinkx.mysql.reader.MysqlReader; @@ -85,8 +75,6 @@ import com.dtstack.flinkx.reader.BaseDataReader; import com.dtstack.flinkx.redis.writer.RedisWriter; import com.dtstack.flinkx.restapi.reader.RestapiReader; -import com.dtstack.flinkx.restapi.writer.RestapiWriter; -import com.dtstack.flinkx.socket.reader.SocketReader; import com.dtstack.flinkx.sqlserver.reader.SqlserverReader; import com.dtstack.flinkx.sqlserver.writer.SqlserverWriter; import com.dtstack.flinkx.sqlservercdc.reader.SqlservercdcReader; @@ -138,7 +126,7 @@ public static void main(String[] args) throws Exception{ // conf.setString("metrics.reporter.promgateway.randomJobNameSuffix","true"); // conf.setString("metrics.reporter.promgateway.deleteOnShutdown","true"); - String jobPath = "D:\\daishu\\metaes.json"; + String jobPath = "your json file's absolute path"; JobExecutionResult result = LocalTest.runJob(new File(jobPath), confProperties, null); ResultPrintUtil.printResult(result); System.exit(0); @@ -209,48 +197,36 @@ private static BaseDataReader buildDataReader(DataTransferConfig config, StreamE String readerName = config.getJob().getContent().get(0).getReader().getName(); BaseDataReader reader ; switch (readerName){ - case PluginNameConstrant.STREAM_READER : reader = new StreamReader(config, env); break; - case PluginNameConstrant.CARBONDATA_READER : reader = new CarbondataReader(config, env); break; - case PluginNameConstrant.ORACLE_READER : reader = new OracleReader(config, env); break; - case 
PluginNameConstrant.POSTGRESQL_READER : reader = new PostgresqlReader(config, env); break; - case PluginNameConstrant.SQLSERVER_READER : reader = new SqlserverReader(config, env); break; - case PluginNameConstrant.MYSQLD_READER : reader = new MysqldReader(config, env); break; - case PluginNameConstrant.MYSQL_READER : reader = new MysqlReader(config, env); break; - case PluginNameConstrant.DB2_READER : reader = new Db2Reader(config, env); break; - case PluginNameConstrant.GBASE_READER : reader = new GbaseReader(config, env); break; - case PluginNameConstrant.ES_READER : reader = new EsReader(config, env); break; - case PluginNameConstrant.FTP_READER : reader = new FtpReader(config, env); break; - case PluginNameConstrant.HBASE_READER : reader = new HbaseReader(config, env); break; - case PluginNameConstrant.HDFS_READER : reader = new HdfsReader(config, env); break; - case PluginNameConstrant.MONGODB_READER : reader = new MongodbReader(config, env); break; - case PluginNameConstrant.ODPS_READER : reader = new OdpsReader(config, env); break; - case PluginNameConstrant.BINLOG_READER : reader = new BinlogReader(config, env); break; - case PluginNameConstrant.KAFKA09_READER : reader = new Kafka09Reader(config, env); break; - case PluginNameConstrant.KAFKA10_READER : reader = new Kafka10Reader(config, env); break; - case PluginNameConstrant.KAFKA11_READER : reader = new Kafka11Reader(config, env); break; - case PluginNameConstrant.KAFKA_READER : reader = new KafkaReader(config, env); break; - case PluginNameConstrant.KUDU_READER : reader = new KuduReader(config, env); break; - case PluginNameConstrant.CLICKHOUSE_READER : reader = new ClickhouseReader(config, env); break; - case PluginNameConstrant.POLARDB_READER : reader = new PolardbReader(config, env); break; - case PluginNameConstrant.ORACLE_LOG_MINER_READER : reader = new OraclelogminerReader(config, env); break; -// case PluginNameConstrant.PHOENIX_READER : reader = new PhoenixReader(config, env); break; - case PluginNameConstrant.SQLSERVER_CDC_READER : reader = new SqlservercdcReader(config, env); break; - case PluginNameConstrant.EMQX_READER : reader = new EmqxReader(config, env); break; - case PluginNameConstrant.METADATAHIVE2_READER : reader = new Metadatahive2Reader(config, env);break; - case PluginNameConstrant.DM_READER : reader = new DmReader(config, env); break; - case PluginNameConstrant.METADATAMYSQL_READER : reader = new MetadatamysqlReader(config, env); break; - case PluginNameConstrant.METADATATIDB_READER : reader = new MetadatatidbReader(config, env); break; - case PluginNameConstrant.METADATAORACLE_READER : reader = new MetadataoracleReader(config, env); break; - case PluginNameConstrant.METADATASQLSERVER_READER : reader = new MetadatasqlserverReader(config, env); break; - case PluginNameConstrant.METADATAPHOENIX_READER : reader = new MetadataphoenixReader(config, env); break; - case PluginNameConstrant.METADATAHBASE_READER : reader = new MetadatahbaseReader(config, env); break; -// case PluginNameConstrant.METADATAES6_READER : reader = new Metadataes6Reader(config, env); break; - case PluginNameConstrant.METADATAVERTICA_READER : reader = new MetadataverticaReader(config, env); break; - case PluginNameConstrant.GREENPLUM_READER : reader = new GreenplumReader(config, env); break; - case PluginNameConstrant.PHOENIX5_READER : reader = new Phoenix5Reader(config, env); break; - case PluginNameConstrant.KINGBASE_READER : reader = new KingbaseReader(config, env); break; - case PluginNameConstrant.RESTAPI_READER: reader = new 
RestapiReader(config, env); break; - case PluginNameConstrant.SOCKET_READER : reader = new SocketReader(config, env); break; + case PluginNameConstants.STREAM_READER : reader = new StreamReader(config, env); break; + case PluginNameConstants.CARBONDATA_READER : reader = new CarbondataReader(config, env); break; + case PluginNameConstants.ORACLE_READER : reader = new OracleReader(config, env); break; + case PluginNameConstants.POSTGRESQL_READER : reader = new PostgresqlReader(config, env); break; + case PluginNameConstants.SQLSERVER_READER : reader = new SqlserverReader(config, env); break; + case PluginNameConstants.MYSQLD_READER : reader = new MysqldReader(config, env); break; + case PluginNameConstants.MYSQL_READER : reader = new MysqlReader(config, env); break; + case PluginNameConstants.DB2_READER : reader = new Db2Reader(config, env); break; + case PluginNameConstants.GBASE_READER : reader = new GbaseReader(config, env); break; + case PluginNameConstants.ES_READER : reader = new EsReader(config, env); break; + case PluginNameConstants.FTP_READER : reader = new FtpReader(config, env); break; + case PluginNameConstants.HBASE_READER : reader = new HbaseReader(config, env); break; + case PluginNameConstants.HDFS_READER : reader = new HdfsReader(config, env); break; + case PluginNameConstants.MONGODB_READER : reader = new MongodbReader(config, env); break; + case PluginNameConstants.ODPS_READER : reader = new OdpsReader(config, env); break; + case PluginNameConstants.BINLOG_READER : reader = new BinlogReader(config, env); break; + case PluginNameConstants.KAFKA10_READER : reader = new Kafka10Reader(config, env); break; + case PluginNameConstants.KAFKA11_READER : reader = new Kafka11Reader(config, env); break; + case PluginNameConstants.KAFKA_READER : reader = new KafkaReader(config, env); break; + case PluginNameConstants.KUDU_READER : reader = new KuduReader(config, env); break; + case PluginNameConstants.CLICKHOUSE_READER : reader = new ClickhouseReader(config, env); break; + case PluginNameConstants.POLARDB_READER : reader = new PolardbReader(config, env); break; + case PluginNameConstants.EMQX_READER : reader = new EmqxReader(config, env); break; + case PluginNameConstants.DM_READER : reader = new DmReader(config, env); break; + case PluginNameConstants.GREENPLUM_READER : reader = new GreenplumReader(config, env); break; + case PluginNameConstants.PHOENIX5_READER : reader = new Phoenix5Reader(config, env); break; + case PluginNameConstants.KINGBASE_READER : reader = new KingbaseReader(config, env); break; + case PluginNameConstants.ORACLE_LOG_MINER_READER : reader = new OraclelogminerReader(config, env); break; + case PluginNameConstants.RESTAPI_READER: reader = new RestapiReader(config, env); break; + case PluginNameConstants.SQLSERVER_CDC_READER: reader = new SqlservercdcReader(config, env); break; default:throw new IllegalArgumentException("Can not find reader by name:" + readerName); } @@ -261,36 +237,36 @@ private static BaseDataWriter buildDataWriter(DataTransferConfig config){ String writerName = config.getJob().getContent().get(0).getWriter().getName(); BaseDataWriter writer; switch (writerName){ - case PluginNameConstrant.STREAM_WRITER : writer = new StreamWriter(config); break; - case PluginNameConstrant.CARBONDATA_WRITER : writer = new CarbondataWriter(config); break; - case PluginNameConstrant.MYSQL_WRITER : writer = new MysqlWriter(config); break; - case PluginNameConstrant.SQLSERVER_WRITER : writer = new SqlserverWriter(config); break; - case 
PluginNameConstrant.ORACLE_WRITER : writer = new OracleWriter(config); break; - case PluginNameConstrant.POSTGRESQL_WRITER : writer = new PostgresqlWriter(config); break; - case PluginNameConstrant.DB2_WRITER : writer = new Db2Writer(config); break; - case PluginNameConstrant.GBASE_WRITER : writer = new GbaseWriter(config); break; - case PluginNameConstrant.ES_WRITER : writer = new EsWriter(config); break; - case PluginNameConstrant.FTP_WRITER : writer = new FtpWriter(config); break; - case PluginNameConstrant.HBASE_WRITER : writer = new HbaseWriter(config); break; - case PluginNameConstrant.HDFS_WRITER : writer = new HdfsWriter(config); break; - case PluginNameConstrant.MONGODB_WRITER : writer = new MongodbWriter(config); break; - case PluginNameConstrant.ODPS_WRITER : writer = new OdpsWriter(config); break; - case PluginNameConstrant.REDIS_WRITER : writer = new RedisWriter(config); break; - case PluginNameConstrant.HIVE_WRITER : writer = new HiveWriter(config); break; - case PluginNameConstrant.KAFKA09_WRITER : writer = new Kafka09Writer(config); break; - case PluginNameConstrant.KAFKA10_WRITER : writer = new Kafka10Writer(config); break; - case PluginNameConstrant.KAFKA11_WRITER : writer = new Kafka11Writer(config); break; - case PluginNameConstrant.KUDU_WRITER : writer = new KuduWriter(config); break; - case PluginNameConstrant.CLICKHOUSE_WRITER : writer = new ClickhouseWriter(config); break; - case PluginNameConstrant.POLARDB_WRITER : writer = new PolardbWriter(config); break; - case PluginNameConstrant.KAFKA_WRITER : writer = new KafkaWriter(config); break; -// case PluginNameConstrant.PHOENIX_WRITER : writer = new PhoenixWriter(config); break; - case PluginNameConstrant.EMQX_WRITER : writer = new EmqxWriter(config); break; -// case PluginNameConstrant.RESTAPI_WRITER : writer = new RestapiWriter(config);break; - case PluginNameConstrant.DM_WRITER : writer = new DmWriter(config); break; - case PluginNameConstrant.GREENPLUM_WRITER : writer = new GreenplumWriter(config); break; - case PluginNameConstrant.PHOENIX5_WRITER : writer = new Phoenix5Writer(config); break; - case PluginNameConstrant.KINGBASE_WRITER : writer = new KingbaseWriter(config); break; + case PluginNameConstants.STREAM_WRITER : writer = new StreamWriter(config); break; + case PluginNameConstants.CARBONDATA_WRITER : writer = new CarbondataWriter(config); break; + case PluginNameConstants.MYSQL_WRITER : writer = new MysqlWriter(config); break; + case PluginNameConstants.SQLSERVER_WRITER : writer = new SqlserverWriter(config); break; + case PluginNameConstants.ORACLE_WRITER : writer = new OracleWriter(config); break; + case PluginNameConstants.POSTGRESQL_WRITER : writer = new PostgresqlWriter(config); break; + case PluginNameConstants.DB2_WRITER : writer = new Db2Writer(config); break; + case PluginNameConstants.GBASE_WRITER : writer = new GbaseWriter(config); break; + case PluginNameConstants.ES_WRITER : writer = new EsWriter(config); break; + case PluginNameConstants.FTP_WRITER : writer = new FtpWriter(config); break; + case PluginNameConstants.HBASE_WRITER : writer = new HbaseWriter(config); break; + case PluginNameConstants.HDFS_WRITER : writer = new HdfsWriter(config); break; + case PluginNameConstants.MONGODB_WRITER : writer = new MongodbWriter(config); break; + case PluginNameConstants.ODPS_WRITER : writer = new OdpsWriter(config); break; + case PluginNameConstants.REDIS_WRITER : writer = new RedisWriter(config); break; + case PluginNameConstants.HIVE_WRITER : writer = new HiveWriter(config); break; + case 
PluginNameConstants.KAFKA10_WRITER : writer = new Kafka10Writer(config); break;
+            case PluginNameConstants.KAFKA11_WRITER : writer = new Kafka11Writer(config); break;
+            case PluginNameConstants.KUDU_WRITER : writer = new KuduWriter(config); break;
+            case PluginNameConstants.CLICKHOUSE_WRITER : writer = new ClickhouseWriter(config); break;
+            case PluginNameConstants.POLARDB_WRITER : writer = new PolardbWriter(config); break;
+            case PluginNameConstants.KAFKA_WRITER : writer = new KafkaWriter(config); break;
+            case PluginNameConstants.EMQX_WRITER : writer = new EmqxWriter(config); break;
+            case PluginNameConstants.DM_WRITER : writer = new DmWriter(config); break;
+            case PluginNameConstants.GREENPLUM_WRITER : writer = new GreenplumWriter(config); break;
+            case PluginNameConstants.PHOENIX5_WRITER : writer = new Phoenix5Writer(config); break;
+            case PluginNameConstants.KINGBASE_WRITER : writer = new KingbaseWriter(config); break;
+            case PluginNameConstants.RESTAPI_WRITER: writer = new RestapiWriter(config); break;
+            case PluginNameConstants.ALLUXIO_WRITER: writer = new AlluxioWriter(config); break;
+            case PluginNameConstants.OSS_WRITER: writer = new OssWriter(config); break;
             default:throw new IllegalArgumentException("Can not find writer by name:" + writerName);
         }
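LocalTest resolves a plugin class from the `name` field of the job JSON's reader/writer blocks via the switch statements above. Note that this change renames PluginNameConstrant to PluginNameConstants in the following diff but does not register the new Teradata plugins here; if they were wired in, the registration would presumably follow the same pattern. Everything in this sketch — the TERADATA_* constants and the name strings — is hypothetical, not part of this diff:

```java
// Hypothetical wiring only: this diff does NOT register the Teradata plugins
// in LocalTest. The constants and switch arm below just mirror the pattern.
public class TeradataPluginWiring {

    public static final String TERADATA_READER = "teradatareader";
    public static final String TERADATA_WRITER = "teradatawriter";

    // Returns which class would be instantiated for a given plugin name.
    public static String resolve(String pluginName) {
        switch (pluginName) {
            case TERADATA_READER: return "com.dtstack.flinkx.teradata.reader.TeradataReader";
            case TERADATA_WRITER: return "com.dtstack.flinkx.teradata.writer.TeradataWriter";
            default: throw new IllegalArgumentException("Can not find plugin by name:" + pluginName);
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve(TERADATA_READER));
    }
}
```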
diff --git a/flinkx-test/src/main/java/com/dtstack/flinkx/test/PluginNameConstrant.java b/flinkx-test/src/main/java/com/dtstack/flinkx/test/PluginNameConstants.java
similarity index 85%
rename from flinkx-test/src/main/java/com/dtstack/flinkx/test/PluginNameConstrant.java
rename to flinkx-test/src/main/java/com/dtstack/flinkx/test/PluginNameConstants.java
index f3c5a4c197..39fcc1e38a 100644
--- a/flinkx-test/src/main/java/com/dtstack/flinkx/test/PluginNameConstrant.java
+++ b/flinkx-test/src/main/java/com/dtstack/flinkx/test/PluginNameConstants.java
@@ -21,7 +21,7 @@
 /**
  * @author jiangbo
  */
-public class PluginNameConstrant {
+public class PluginNameConstants {
 
     public static final String STREAM_READER = "streamreader";
     public static final String CARBONDATA_READER = "carbondatareader";
@@ -40,6 +40,7 @@ public class PluginNameConstrant {
     public static final String GBASE_READER = "gbasereader";
     public static final String KUDU_READER = "kudureader";
     public static final String BINLOG_READER = "binlogreader";
+    public static final String SQLSERVER_CDC_READER = "sqlservercdcreader";
     public static final String KAFKA09_READER = "kafka09reader";
     public static final String KAFKA10_READER = "kafka10reader";
     public static final String KAFKA11_READER = "kafka11reader";
@@ -49,22 +50,12 @@ public class PluginNameConstrant {
     public static final String ORACLE_LOG_MINER_READER = "oraclelogminerreader";
     public static final String PHOENIX_READER = "phoenixreader";
     public static final String EMQX_READER = "emqxreader";
-    public static final String SQLSERVER_CDC_READER = "sqlservercdcreader";
-    public static final String METADATAHIVE2_READER = "metadatahive2reader";
     public static final String DM_READER = "dmreader";
-    public static final String METADATATIDB_READER = "metadatatidbreader";
-    public static final String METADATAORACLE_READER = "metadataoraclereader";
-    public static final String METADATAMYSQL_READER = "metadatamysqlreader";
-    public static final String METADATASQLSERVER_READER = "metadatasqlserverreader";
-    public static final String METADATAPHOENIX_READER = "metadataphoenixreader";
-    public static final String METADATAHBASE_READER = "metadatahbasereader";
-    public static final String METADATAES6_READER = "metadataes6reader";
-    public static final String METADATAVERTICA_READER = "metadataverticareader";
     public static final String GREENPLUM_READER = "greenplumreader";
     public static final String PHOENIX5_READER = "phoenix5reader";
     public static final String KINGBASE_READER = "kingbasereader";
     public static final String RESTAPI_READER = "restapireader";
-    public static final String SOCKET_READER = "socketreader";
+
 
     public static final String STREAM_WRITER = "streamwriter";
     public static final String CARBONDATA_WRITER = "carbondatawriter";
@@ -96,4 +87,6 @@ public class PluginNameConstrant {
     public static final String GREENPLUM_WRITER = "greenplumwriter";
     public static final String PHOENIX5_WRITER = "phoenix5writer";
     public static final String KINGBASE_WRITER = "kingbasewriter";
+    public static final String ALLUXIO_WRITER = "alluxiowriter";
+    public static final String OSS_WRITER = "osswriter";
 }
diff --git a/flinkx-websocket/flinkx-websocket-core/src/main/java/com/dtstack/flinkx/websocket/constants/WebSocketConfig.java b/flinkx-websocket/flinkx-websocket-core/src/main/java/com/dtstack/flinkx/websocket/constants/WebSocketConfig.java
deleted file mode 100644
index 5ede60ca6d..0000000000
--- a/flinkx-websocket/flinkx-websocket-core/src/main/java/com/dtstack/flinkx/websocket/constants/WebSocketConfig.java
+++ /dev/null
@@ -1,52 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package com.dtstack.flinkx.websocket.constants;
-
-/** Constant definitions
- * @Company: www.dtstack.com
- * @author kunni@dtstack.com
- */
-
-public class WebSocketConfig {
-
-    public static final int DEFAULT_RETRY_TIME = 5;
-
-    public static final int DEFAULT_RETRY_INTERVAL = 2000;
-
-    /**
-     * Marker set when a webSocket client has failed
-     */
-    public static final String KEY_EXIT0 = "exit0";
-
-    /**
-     * The following are the keys read on the reader side
-     */
-    public static final String KEY_WEB_SOCKET_SERVER_URL = "url";
-
-    public static final String KEY_RETRY_TIME = "retry";
-
-    public static final String KEY_RETRY_INTERVAL = "interval";
-
-    public static final String KEY_MESSAGE = "message";
-
-    public static final String KEY_CODEC = "codec";
-
-    public static final String KEY_PARAMS = "webSocketParams";
-
-}
diff --git a/flinkx-websocket/flinkx-websocket-reader/src/main/java/com/dtstack/flinkx/websocket/format/WebSocketClient.java b/flinkx-websocket/flinkx-websocket-reader/src/main/java/com/dtstack/flinkx/websocket/format/WebSocketClient.java
deleted file mode 100644
index e23ed44a6e..0000000000
--- a/flinkx-websocket/flinkx-websocket-reader/src/main/java/com/dtstack/flinkx/websocket/format/WebSocketClient.java
+++ /dev/null
@@ -1,157 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package com.dtstack.flinkx.websocket.format;
-
-import com.dtstack.flinkx.util.ExceptionUtil;
-import io.netty.bootstrap.Bootstrap;
-import io.netty.channel.ChannelInitializer;
-import io.netty.channel.ChannelOption;
-import io.netty.channel.EventLoopGroup;
-import io.netty.channel.nio.NioEventLoopGroup;
-import io.netty.channel.socket.SocketChannel;
-import io.netty.channel.socket.nio.NioSocketChannel;
-import io.netty.handler.codec.http.HttpClientCodec;
-import io.netty.handler.codec.http.HttpObjectAggregator;
-import io.netty.handler.logging.LogLevel;
-import io.netty.handler.logging.LoggingHandler;
-import org.apache.flink.types.Row;
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
-
-import java.net.URI;
-import java.net.URISyntaxException;
-import java.util.concurrent.SynchronousQueue;
-import java.util.concurrent.TimeUnit;
-
-import static com.dtstack.flinkx.websocket.constants.WebSocketConfig.KEY_EXIT0;
-
-/**
- * A Netty client based on webSocket
- * @Company: www.dtstack.com
- * @author kunni@dtstack.com
- */
-
-public class WebSocketClient {
-
-    protected final Logger LOG = LoggerFactory.getLogger(getClass());
-
-    private URI uri;
-
-    private EventLoopGroup group;
-
-    private WebSocketClientHandler webSocketClientHandler;
-
-    /**
-     * number of retries
-     */
-    protected int retryTime;
-
-    /**
-     * retry interval in milliseconds
-     */
-    protected int retryInterval;
-
-    /**
-     * message sent after the handshake to start reading and writing
-     */
-    protected String message;
-
-    protected SynchronousQueue<Row> queue;
-
-    public WebSocketClient(SynchronousQueue<Row> queue, String serverUrl) throws URISyntaxException {
-        uri = new URI(serverUrl);
-        this.queue = queue;
-    }
-
-    public void run() {
-        if(retryTime==0){
-            return;
-        }
-        Bootstrap boot = new Bootstrap();
-        group = new NioEventLoopGroup();
-        webSocketClientHandler = new WebSocketClientHandler(queue, uri, this);
-        webSocketClientHandler.setMessage(message);
-        boot.option(ChannelOption.SO_KEEPALIVE, true)
-                .option(ChannelOption.TCP_NODELAY, true)
-                .group(group)
-                .handler(new LoggingHandler(LogLevel.INFO))
-                .channel(NioSocketChannel.class)
-                .handler(new ChannelInitializer<SocketChannel>() {
-                    @Override
-                    protected void initChannel(SocketChannel socketChannel) {
-                        socketChannel.pipeline()
-                                .addLast(new HttpClientCodec())
-                                .addLast(new HttpObjectAggregator(1024*1024*10))
-                                .addLast(webSocketClientHandler);
-                    }
-                });
-        LOG.info("start bootstrapping");
-        try{
-            connect(boot, uri, retryTime, retryInterval);
-        }catch (Exception e){
-            LOG.error(ExceptionUtil.getErrorMessage(e));
-        }
-    }
-                queue.put(Row.of(KEY_EXIT0));
-                throw new RuntimeException("connect failed");
-            }else{
-                LOG.info("connect attempt {} to {}.", this.retryTime - retry + 1, uri.getRawPath());
-                // reschedule on the bootstrap's event loop
-                boot.group().schedule(
-                        () -> connect(boot, uri, retry - 1, delay), delay, TimeUnit.MILLISECONDS);
-            }
-        });
-    }
-
-
-    public WebSocketClient setRetryTime(int retryTime){
-        this.retryTime = retryTime;
-        return this;
-    }
-
-    public WebSocketClient setRetryInterval(int retryInterval){
-        this.retryInterval = retryInterval;
-        return this;
-    }
-
-    public WebSocketClient setMessage(String message){
-        this.message = message;
-        return this;
-    }
-
-    public void close(){
-        group.shutdownGracefully();
-    }
-
-}
diff --git a/flinkx-websocket/flinkx-websocket-reader/src/main/java/com/dtstack/flinkx/websocket/format/WebSocketClientHandler.java b/flinkx-websocket/flinkx-websocket-reader/src/main/java/com/dtstack/flinkx/websocket/format/WebSocketClientHandler.java
deleted file mode 100644
index 238ff4d40e..0000000000
--- a/flinkx-websocket/flinkx-websocket-reader/src/main/java/com/dtstack/flinkx/websocket/format/WebSocketClientHandler.java
+++ /dev/null
@@ -1,172 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- *     http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package com.dtstack.flinkx.websocket.format;
-
-import com.dtstack.flinkx.util.ExceptionUtil;
-import io.netty.channel.Channel;
-import io.netty.channel.ChannelHandler;
-import io.netty.channel.ChannelHandlerContext;
-import io.netty.channel.ChannelPromise;
-import io.netty.channel.SimpleChannelInboundHandler;
-import io.netty.handler.codec.http.DefaultHttpHeaders;
-import io.netty.handler.codec.http.FullHttpResponse;
-import io.netty.handler.codec.http.websocketx.CloseWebSocketFrame;
-import io.netty.handler.codec.http.websocketx.PongWebSocketFrame;
-import io.netty.handler.codec.http.websocketx.TextWebSocketFrame;
-import io.netty.handler.codec.http.websocketx.WebSocketClientHandshaker;
-import io.netty.handler.codec.http.websocketx.WebSocketClientHandshakerFactory;
-import io.netty.handler.codec.http.websocketx.WebSocketFrame;
-import io.netty.handler.codec.http.websocketx.WebSocketHandshakeException;
-import io.netty.handler.codec.http.websocketx.WebSocketVersion;
-import io.netty.util.CharsetUtil;
-import org.apache.flink.types.Row;
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
-
-import java.net.URI;
-import java.util.concurrent.SynchronousQueue;
-
-/**
- * webSocket message handling.
- * @Company: www.dtstack.com
- * @author kunni@dtstack.com
- */
-
-@ChannelHandler.Sharable
-public class WebSocketClientHandler extends SimpleChannelInboundHandler<Object> {
-
-    protected final Logger LOG = LoggerFactory.getLogger(getClass());
-
-    private WebSocketClientHandshaker handShaker;
-
-    /**
-     * Promise that records whether the handshake succeeded.
-     */
-    private ChannelPromise handshakeFuture;
-
-    private SynchronousQueue<Row> queue;
-
-    /**
-     * Keeps the client around for reconnection.
-     */
-    private WebSocketClient client;
-
-    protected String message;
-
-    public WebSocketClientHandler(SynchronousQueue<Row> queue, URI uri, WebSocketClient client) {
-        this.queue = queue;
-        this.client = client;
-        // use the default handshaker
-        handShaker = WebSocketClientHandshakerFactory.newHandshaker(
-                uri, WebSocketVersion.V13, null, true, new DefaultHttpHeaders());
-    }
-    /**
-     * Creates a new Promise.
-     * Success or failure is set in {@link WebSocketClientHandler#channelRead0(ChannelHandlerContext, Object)}.
-     */
-    @Override
-    public void handlerAdded(ChannelHandlerContext ctx) {
-        this.handshakeFuture = ctx.newPromise();
-        handshakeFuture.addListener((future) -> {
-            if(future.isSuccess()){
-                LOG.info("handshake success!");
-                // send the message that starts reading and writing
-                LOG.info("send start command {}", message);
-                WebSocketFrame frame = new TextWebSocketFrame(message);
-                ctx.channel().writeAndFlush(frame);
-            }else {
-                LOG.info("handshake failed");
-            }
-        });
-    }
-
-    @Override
-    public void channelActive(ChannelHandlerContext ctx) {
-        handShaker.handshake(ctx.channel());
-    }
-
-    /**
-     * Closes the channel and sets the failure marker on an exception.
-     * @param ctx channel context
-     * @param cause the exception
-     */
-    @Override
-    public void exceptionCaught(ChannelHandlerContext ctx, Throwable cause) {
-        LOG.error("connect exception :{}", ExceptionUtil.getErrorMessage(cause));
-        if (!handshakeFuture.isDone()) {
-            handshakeFuture.setFailure(cause);
-        }
-        ctx.close();
-        LOG.info("connection is closed by server");
-        LOG.info("reconnecting .......");
-        // retry the connection by calling run()
-        client.run();
-    }
-
-    /**
-     * Handles read events.
-     * @param ctx channel context
-     * @param msg message body
-     * @throws Exception exception
-     */
-    @Override
-    protected void channelRead0(ChannelHandlerContext ctx, Object msg) throws Exception {
-        Channel ch = ctx.channel();
-        // record the handshake result
-        if (!handShaker.isHandshakeComplete()) {
-            try {
-                handShaker.finishHandshake(ch, (FullHttpResponse) msg);
-                LOG.info("WebSocket Client connected!");
-                handshakeFuture.setSuccess();
-            } catch (WebSocketHandshakeException e) {
-                LOG.info("WebSocket Client failed to connect");
-                handshakeFuture.setFailure(e);
-            }
-            return;
-        }
-        if (msg instanceof FullHttpResponse) {
-            FullHttpResponse response = (FullHttpResponse) msg;
-            throw new IllegalStateException("Unexpected FullHttpResponse (content=" + response.content().toString(CharsetUtil.UTF_8) + ')');
-        }
-        // message received from the server
-        WebSocketFrame frame = (WebSocketFrame) msg;
-        // text frame
-        if (frame instanceof TextWebSocketFrame) {
-            TextWebSocketFrame textFrame = (TextWebSocketFrame) frame;
-            queue.put(Row.of(textFrame.text()));
-            LOG.debug("print webSocket message: {}", textFrame.text());
-        }
-        // pong frame
-        if (frame instanceof PongWebSocketFrame) {
-            LOG.info("WebSocket Client received pong");
-        }
-        // close frame
-        if (frame instanceof CloseWebSocketFrame) {
-            LOG.info("receive close frame");
-            ch.close();
-            // retry the connection so the job result gets set
-            client.run();
-        }
-    }
-
-    public void setMessage(String message){
-        this.message = message;
-    }
-
-}
diff --git a/flinkx-websocket/flinkx-websocket-reader/src/main/java/com/dtstack/flinkx/websocket/format/WebSocketInputFormat.java b/flinkx-websocket/flinkx-websocket-reader/src/main/java/com/dtstack/flinkx/websocket/format/WebSocketInputFormat.java
deleted file mode 100644
index 82fd8cfd1d..0000000000
--- a/flinkx-websocket/flinkx-websocket-reader/src/main/java/com/dtstack/flinkx/websocket/format/WebSocketInputFormat.java
+++ /dev/null
@@ -1,140 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- *     http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package com.dtstack.flinkx.websocket.format;
-
-import com.dtstack.flinkx.inputformat.BaseRichInputFormat;
-import com.dtstack.flinkx.util.ExceptionUtil;
-import org.apache.commons.lang3.StringUtils;
-import org.apache.flink.core.io.GenericInputSplit;
-import org.apache.flink.core.io.InputSplit;
-import org.apache.flink.types.Row;
-
-import java.io.IOException;
-import java.util.concurrent.SynchronousQueue;
-
-import static com.dtstack.flinkx.websocket.constants.WebSocketConfig.KEY_EXIT0;
-
-/** Reads data from the given WebSocket URL.
- * @Company: www.dtstack.com
- * @author kunni
- */
-
-public class WebSocketInputFormat extends BaseRichInputFormat {
-
-    private static final long serialVersionUID = 1L;
-
-    public String serverUrl;
-
-    protected WebSocketClient client;
-
-    /**
-     * Maximum number of retries.
-     */
-    protected int retryTime;
-
-    /**
-     * Interval between retries.
-     */
-    protected int retryInterval;
-
-    /**
-     * Message notifying the server to start reading and writing.
-     */
-    protected String message;
-
-    /**
-     * todo message codec, not used yet
-     */
-    protected String codec;
-
-    /**
-     * Queue that holds the incoming data.
-     */
-    private final SynchronousQueue<Row> queue = new SynchronousQueue<>();
-
-
-    @Override
-    protected void openInternal(InputSplit inputSplit) throws IOException {
-        try {
-            client = new WebSocketClient(queue, serverUrl)
-                    .setRetryInterval(retryInterval)
-                    .setRetryTime(retryTime)
-                    .setMessage(message);
-            client.run();
-        }catch (Exception e){
-            throw new IOException(e);
-        }
-    }
-
-    @Override
-    protected InputSplit[] createInputSplitsInternal(int minNumSplits) {
-        InputSplit[] inputSplits = new InputSplit[minNumSplits];
-        for (int i = 0; i < minNumSplits; i++) {
-            inputSplits[i] = new GenericInputSplit(i, minNumSplits);
-        }
-        return inputSplits;
-    }
-
-    @Override
-    protected Row nextRecordInternal(Row row) throws IOException {
-        try {
-            row = queue.take();
-            // a special string is used as the failure marker
-            if(StringUtils.equals((CharSequence) row.getField(0), KEY_EXIT0)){
-                throw new RuntimeException("webSocket client lost connection completely, job failed.");
-            }
-        } catch (InterruptedException e) {
-            LOG.error("takeEvent interrupted error: {}", ExceptionUtil.getErrorMessage(e));
-            throw new IOException(e);
-        }
-        return row;
-    }
-
-    @Override
-    protected void closeInternal() {
-        if(client != null){
-            client.close();
-        }
-    }
-
-    @Override
-    public boolean reachedEnd() {
-        return false;
-    }
-
-    public void setServerUrl(String serverUrl){
-        this.serverUrl = serverUrl;
-    }
-
-    public void setRetryTime(int retryTime){
-        this.retryTime = retryTime;
-    }
-
-    public void setRetryInterval(int retryInterval){
-        this.retryInterval = retryInterval;
-    }
-
-    public void setMessage(String message) {
-        this.message = message;
-    }
-
-    public void setCodec(String codec){
-        this.codec = codec;
-    }
-}
diff --git a/flinkx-websocket/flinkx-websocket-reader/src/main/java/com/dtstack/flinkx/websocket/reader/WebSocketInputFormatBuilder.java b/flinkx-websocket/flinkx-websocket-reader/src/main/java/com/dtstack/flinkx/websocket/reader/WebSocketInputFormatBuilder.java
deleted file mode 100644
index b50072c25c..0000000000
--- a/flinkx-websocket/flinkx-websocket-reader/src/main/java/com/dtstack/flinkx/websocket/reader/WebSocketInputFormatBuilder.java
+++ /dev/null
@@ -1,125 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- *     http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package com.dtstack.flinkx.websocket.reader;
-
-import com.dtstack.flinkx.config.SpeedConfig;
-import com.dtstack.flinkx.constants.ConstantValue;
-import com.dtstack.flinkx.inputformat.BaseRichInputFormatBuilder;
-import com.dtstack.flinkx.util.ExceptionUtil;
-import com.dtstack.flinkx.util.TelnetUtil;
-import com.dtstack.flinkx.websocket.format.WebSocketInputFormat;
-import org.apache.commons.collections.MapUtils;
-import org.apache.commons.lang3.StringUtils;
-
-import java.net.URI;
-import java.util.Iterator;
-import java.util.Map;
-import java.util.Set;
-
-/** Builds the WebSocketInputFormat.
- * @Company: www.dtstack.com
- * @author kunni@dtstack.com
- */
-
-public class WebSocketInputFormatBuilder extends BaseRichInputFormatBuilder {
-
-    private WebSocketInputFormat format;
-
-    private String serverUrl;
-
-    /**
-     * webSocket URL prefix.
-     */
-    private static final String WEB_SOCKET_PREFIX = "ws";
-
-    public WebSocketInputFormatBuilder(){
-        super.format = format = new WebSocketInputFormat();
-    }
-
-
-    protected void setServerUrl(String serverUrl, Map<String, String> params){
-        // append the auth parameters to the URL
-        if(MapUtils.isNotEmpty(params)){
-            StringBuilder stringBuilder = new StringBuilder(30);
-            stringBuilder.append('?');
-            Set<Map.Entry<String, String>> set = params.entrySet();
-            Iterator<Map.Entry<String, String>> iterator = set.iterator();
-            while (iterator.hasNext()){
-                Map.Entry<String, String> entry = iterator.next();
-                stringBuilder.append(entry.getKey())
-                        .append(ConstantValue.EQUAL_SYMBOL)
-                        .append(entry.getValue());
-                if(iterator.hasNext()){
-                    stringBuilder.append('&');
-                }
-            }
-            serverUrl += stringBuilder.toString();
-        }
-        this.serverUrl = serverUrl;
-        format.setServerUrl(serverUrl);
-    }
-
-    protected void setRetryTime(int retryTime){
-        format.setRetryTime(retryTime);
-    }
-
-    protected void setRetryInterval(int retryInterval){
-        format.setRetryInterval(retryInterval);
-    }
-
-    protected void setMessage(String message) {
-        format.setMessage(message);
-    }
-
-    protected void setCodec(String codec){
-        format.setCodec(codec);
-    }
-
-    @Override
-    protected void checkFormat() {
-        SpeedConfig speed = format.getDataTransferConfig().getJob().getSetting().getSpeed();
-        StringBuilder sb = new StringBuilder(256);
-        if(StringUtils.isBlank(serverUrl)){
-            sb.append("config error:[serverUrl] cannot be blank \n");
-        }else{
-            if(StringUtils.startsWith(serverUrl, WEB_SOCKET_PREFIX)){
-                try{
-                    URI uri = new URI(serverUrl);
-                    TelnetUtil.telnet(uri.getHost(), uri.getPort());
-                } catch (Exception e) {
-                    sb.append(String.format("telnet error:[serverUrl] = %s, e = %s ", serverUrl, ExceptionUtil.getErrorMessage(e))).append(" \n");
-                }
-            }else {
-                sb.append("config error:[serverUrl] must start with [ws], current serverUrl is ").append(serverUrl).append(" \n");
-            }
-        }
-        if(speed.getReaderChannel() > 1){
-            sb.append("webSocket can not support readerChannel bigger than 1, current readerChannel is [")
-                    .append(speed.getReaderChannel())
-                    .append("];\n");
-        }else if(speed.getChannel() > 1){
-            sb.append("webSocket can not support channel bigger than 1, current channel is [")
-                    .append(speed.getChannel())
-                    .append("];\n");
-        }
-        if(sb.length() > 0){
-            throw new IllegalArgumentException(sb.toString());
-        }
-    }
-}
diff --git a/flinkx-websocket/flinkx-websocket-reader/src/main/java/com/dtstack/flinkx/websocket/reader/WebsocketReader.java b/flinkx-websocket/flinkx-websocket-reader/src/main/java/com/dtstack/flinkx/websocket/reader/WebsocketReader.java
deleted file mode 100644
index 844ff9a943..0000000000
--- a/flinkx-websocket/flinkx-websocket-reader/src/main/java/com/dtstack/flinkx/websocket/reader/WebsocketReader.java
+++ /dev/null
@@ -1,77 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- *     http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package com.dtstack.flinkx.websocket.reader;
-
-import com.dtstack.flinkx.config.DataTransferConfig;
-import com.dtstack.flinkx.config.ReaderConfig;
-import com.dtstack.flinkx.reader.BaseDataReader;
-import org.apache.flink.streaming.api.datastream.DataStream;
-import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
-import org.apache.flink.types.Row;
-
-import java.util.Map;
-
-import static com.dtstack.flinkx.websocket.constants.WebSocketConfig.DEFAULT_RETRY_INTERVAL;
-import static com.dtstack.flinkx.websocket.constants.WebSocketConfig.DEFAULT_RETRY_TIME;
-import static com.dtstack.flinkx.websocket.constants.WebSocketConfig.KEY_CODEC;
-import static com.dtstack.flinkx.websocket.constants.WebSocketConfig.KEY_MESSAGE;
-import static com.dtstack.flinkx.websocket.constants.WebSocketConfig.KEY_PARAMS;
-import static com.dtstack.flinkx.websocket.constants.WebSocketConfig.KEY_RETRY_INTERVAL;
-import static com.dtstack.flinkx.websocket.constants.WebSocketConfig.KEY_RETRY_TIME;
-import static com.dtstack.flinkx.websocket.constants.WebSocketConfig.KEY_WEB_SOCKET_SERVER_URL;
-
-
-/** Extracts the configuration from the job parameters.
- * @Company: www.dtstack.com
- * @author kunni@dtstack.com
- */
-
-public class WebsocketReader extends BaseDataReader {
-
-    protected String serverUrl;
-    protected int retryTime;
-    protected int interval;
-    protected String message;
-    protected String codec;
-    protected Map<String, String> params;
-
-    @SuppressWarnings("unchecked")
-    public WebsocketReader(DataTransferConfig config, StreamExecutionEnvironment env) {
-        super(config, env);
-        ReaderConfig readerConfig = config.getJob().getContent().get(0).getReader();
-        serverUrl = readerConfig.getParameter().getStringVal(KEY_WEB_SOCKET_SERVER_URL);
-        retryTime = readerConfig.getParameter().getIntVal(KEY_RETRY_TIME, DEFAULT_RETRY_TIME);
-        interval = readerConfig.getParameter().getIntVal(KEY_RETRY_INTERVAL, DEFAULT_RETRY_INTERVAL);
-        message = readerConfig.getParameter().getStringVal(KEY_MESSAGE);
-        codec = readerConfig.getParameter().getStringVal(KEY_CODEC);
-        params = (Map<String, String>) readerConfig.getParameter().getVal(KEY_PARAMS);
-    }
-
-    @Override
-    public DataStream<Row> readData() {
-        WebSocketInputFormatBuilder builder = new WebSocketInputFormatBuilder();
-        builder.setServerUrl(serverUrl, params);
-        builder.setRetryTime(retryTime);
-        builder.setRetryInterval(interval);
-        builder.setMessage(message);
-        builder.setCodec(codec);
-        builder.setDataTransferConfig(dataTransferConfig);
-        return createInput(builder.finish());
-    }
-}
diff --git a/flinkx-websocket/pom.xml b/flinkx-websocket/pom.xml
deleted file mode 100644
index 6cecb78342..0000000000
--- a/flinkx-websocket/pom.xml
+++ /dev/null
@@ -1,28 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>
-<project xmlns="http://maven.apache.org/POM/4.0.0"
-         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
-         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
-    <parent>
-        <artifactId>flinkx-all</artifactId>
-        <groupId>com.dtstack.flinkx</groupId>
-        <version>1.6</version>
-    </parent>
-    <modelVersion>4.0.0</modelVersion>
-
-    <artifactId>flinkx-websocket</artifactId>
-    <packaging>pom</packaging>
-    <modules>
-        <module>flinkx-websocket-core</module>
-        <module>flinkx-websocket-reader</module>
-    </modules>
-
-    <dependencies>
-        <dependency>
-            <groupId>com.dtstack.flinkx</groupId>
-            <artifactId>flinkx-core</artifactId>
-            <version>1.6</version>
-            <scope>provided</scope>
-        </dependency>
-    </dependencies>
-
-</project>
\ No newline at end of file
diff --git a/jars/readme.md b/jars/readme.md
index e6a930809e..312bd13725 100644
--- a/jars/readme.md
+++ b/jars/readme.md
@@ -1,6 +1,6 @@
-# Temporary workaround for db2 and oracle driver jars missing during packaging
+# How to resolve missing driver jars during packaging
 
-Download these two driver jars and install them into the local repository:
+Download the corresponding driver jars and install them into the local repository:
 
 db2: [download](db2jcc-3.72.44.jar)
 
@@ -10,16 +10,28 @@
 gbase: [download](gbase-8.3.81.53.jar)
 
 Dameng (DM): [download](Dm7JdbcDriver18.jar)
 
+Kingbase: [download](kingbase8-8.2.0.jar)
+
+vertica: [download](vertica-jdbc-9.1.1-0.jar)
+
 Then install them into the local repository:
 
 ```
-mvn install:install-file -DgroupId=com.ibm.db2 -DartifactId=db2jcc -Dversion=3.72.44 -Dpackaging=jar -Dfile=db2jcc-3.72.44.jar
+## db2 driver
+mvn install:install-file -DgroupId=com.ibm.db2 -DartifactId=db2jcc -Dversion=3.72.44 -Dpackaging=jar -Dfile=../jars/db2jcc-3.72.44.jar
 
-mvn install:install-file -DgroupId=com.github.noraui -DartifactId=ojdbc8 -Dversion=12.2.0.1 -Dpackaging=jar -Dfile=ojdbc8-12.2.0.1.jar
+## oracle driver
+mvn install:install-file -DgroupId=com.github.noraui -DartifactId=ojdbc8 -Dversion=12.2.0.1 -Dpackaging=jar -Dfile=../jars/ojdbc8-12.2.0.1.jar
 
-mvn install:install-file -DgroupId=com.esen.jdbc -DartifactId=gbase -Dversion=8.3.81.53 -Dpackaging=jar -Dfile=gbase-8.3.81.53.jar
+## gbase driver
+mvn install:install-file -DgroupId=com.esen.jdbc -DartifactId=gbase -Dversion=8.3.81.53 -Dpackaging=jar -Dfile=../jars/gbase-8.3.81.53.jar
 
-mvn install:install-file -DgroupId=com.dm -DartifactId=Dm7JdbcDriver18 -Dversion=7.6.0.197 -Dpackaging=jar -Dfile=Dm7JdbcDriver18.jar
-```
+## dm driver
+mvn install:install-file -DgroupId=dm.jdbc.driver -DartifactId=dm7 -Dversion=18.0.0 -Dpackaging=jar -Dfile=../jars/Dm7JdbcDriver18.jar
 
-Note: these driver jars are available in our own repository, and these driver versions are already in use in production, so the versions cannot be bumped quickly and would need further testing first. We will upgrade these drivers in a later release; until then, downloading and installing the drivers manually works around the problem.
+## kingbase driver
+mvn install:install-file -DgroupId=com.kingbase8 -DartifactId=kingbase8 -Dversion=8.2.0 -Dpackaging=jar -Dfile=../jars/kingbase8-8.2.0.jar
+
+## vertica driver
+mvn install:install-file -DgroupId=fakepath -DartifactId=vertica-jdbc -Dversion=9.1.1-0 -Dpackaging=jar -Dfile=../jars/vertica-jdbc-9.1.1-0.jar
+```
diff --git a/jars/settings.xml b/jars/settings.xml
new file mode 100644
index 0000000000..85c4fa82f5
--- /dev/null
+++ b/jars/settings.xml
@@ -0,0 +1,18 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0"
+          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+          xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0 http://maven.apache.org/xsd/settings-1.0.0.xsd">
+
+    <localRepository>/home/apache-maven-3.6.1/repository</localRepository>
+
+    <mirrors>
+        <mirror>
+            <id>alimaven</id>
+            <name>aliyun maven</name>
+            <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
+            <mirrorOf>central</mirrorOf>
+        </mirror>
+    </mirrors>
+
+</settings>
\ No newline at end of file
diff --git a/pom.xml b/pom.xml
index 19cbe63546..23a8facc61 100644
--- a/pom.xml
+++ b/pom.xml
@@ -13,7 +13,7 @@
         flinkx-core
         flinkx-launcher
-
+        flinkx-test
         flinkx-stream
@@ -28,20 +28,23 @@
         flinkx-gbase
         flinkx-clickhouse
         flinkx-saphana
+        flinkx-teradata
         flinkx-greenplum
         flinkx-kingbase
         flinkx-hdfs
+        flinkx-oss
         flinkx-hive
         flinkx-es
         flinkx-ftp
         flinkx-odps
         flinkx-hbase
-
+        flinkx-hbase2
        flinkx-phoenix5
        flinkx-carbondata
        flinkx-kudu
        flinkx-cassandra
+        flinkx-alluxio
        flinkx-redis
        flinkx-mongodb
@@ -49,39 +52,27 @@
        flinkx-binlog
        flinkx-kb
-        flinkx-kafka09
        flinkx-kafka10
        flinkx-kafka11
        flinkx-kafka
        flinkx-emqx
-
+        flinkx-pulsar
+        flinkx-pgwal
+        flinkx-restapi
        flinkx-sqlservercdc
        flinkx-oraclelogminer
-
-
-        flinkx-metadata
-        flinkx-metadata-hive2
-        flinkx-metadata-hive1
-        flinkx-metadata-tidb
-        flinkx-metadata-oracle
-        flinkx-metadata-mysql
-        flinkx-metadata-sqlserver
-        flinkx-metadata-phoenix5
-        flinkx-metadata-hbase
-        flinkx-metadata-vertica
-
-
-        flinkx-restapi
-        flinkx-websocket
-        flinkx-socket
+        flinkx-hudi

        UTF-8
-        1.10.1
+        1.11.3
+        2.12
        2.7.3
+        2.3.1
        4.5.3
        ${basedir}/dev
+        release_1.11.0
@@ -97,6 +88,12 @@
            org.apache.hadoop
            hadoop-common
            ${hadoop.version}
+
+
+                    org.apache.commons
+                    commons-math3
+
+
            provided
@@ -142,7 +139,7 @@
            junit
            junit
-            4.12
+            4.13.1
            test
@@ -201,28 +198,6 @@
            flinkx-java-docs
-
-                pl.project13.maven
-                git-commit-id-plugin
-                2.2.6
-
-
-
-                        revision
-
-
-
-
-                yyyy.MM.dd HH:mm:ss
-                true
-                true
-
-                false
-                -dirty
-                false
-
-
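The deleted WebSocket reader above hinged on one pattern worth noting: the Netty callback thread hands each frame to the Flink record-reading thread through a `SynchronousQueue<Row>`, and it reports an unrecoverable connection loss by enqueueing the sentinel value `exit0`, which `nextRecordInternal` then converts into a job failure. The sketch below is a minimal, self-contained illustration of that handoff; the class, thread, and marker names are illustrative stand-ins, not FlinkX APIs.

```java
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.TimeUnit;

/**
 * Minimal sketch of the sentinel ("poison pill") handoff used by the deleted
 * WebSocket reader. All names here are illustrative, not FlinkX APIs.
 */
public class SentinelHandoffSketch {

    /** Mirrors WebSocketConfig.KEY_EXIT0: marks a permanently failed connection. */
    private static final String EXIT_MARKER = "exit0";

    private static final SynchronousQueue<String> QUEUE = new SynchronousQueue<>();

    public static void main(String[] args) throws Exception {
        // Producer stands in for the Netty channelRead0() callback thread.
        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < 3; i++) {
                    QUEUE.put("frame-" + i);           // blocks until the consumer takes it
                    TimeUnit.MILLISECONDS.sleep(100);  // simulated network latency
                }
                QUEUE.put(EXIT_MARKER);                // retries exhausted: enqueue the sentinel
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        // Consumer stands in for nextRecordInternal(): block on take(), then check the sentinel.
        while (true) {
            String record = QUEUE.take();
            if (EXIT_MARKER.equals(record)) {
                // Mirrors the reader: turn the sentinel into a job failure.
                throw new RuntimeException("webSocket client lost connection completely, job failed.");
            }
            System.out.println("read: " + record);
        }
    }
}
```

Because a `SynchronousQueue` has no capacity, each `put` blocks until the matching `take`, so the producer is naturally back-pressured to the reader's pace; the sentinel is then the only way a reader blocked in `take()` can learn that the connection died after all retries were exhausted.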