Environment
Hudi version: 0.14 (whose pom pins the HBase version to 2.4.9)
Spark version: 3.3.3
Hadoop version: 3.1.3
Storage: HDFS
Problem
Inserting data into a Hudi table from Spark 3 fails with the error below:
insert into hudi_cow_pt_tbl partition (dt, hh)
select 1 as id, 'a1' as name, 1000 as ts, '2021-12-09' as dt, '10' as hh;
java.lang.NoSuchMethodError: org.apache.hadoop.hdfs.client.HdfsDataInputStream.getReadStatistics()Lorg/apache/hadoop/hdfs/DFSInputStream$ReadStatistics;
at org.apache.hudi.org.apache.hadoop.hbase.io.FSDataInputStreamWrapper.updateInputStreamStatistics(FSDataInputStreamWrapper.java:249)
at org.apache.hudi.org.apache.hadoop.hbase.io.FSDataInputStreamWrapper.close(FSDataInputStreamWrapper.java:296)
at org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.closeStreams(HFileBlock.java:1825)
at org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFilePreadReader.close(HFilePreadReader.java:107)
at org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.close(HFileReaderImpl.java:1421)
at org.apache.hudi.io.storage.HoodieHFileReader.close(HoodieHFileReader.java:218)
at org.apache.hudi.metadata.HoodieBackedTableMetadata.closeReader(HoodieBackedTableMetadata.java:574)
at org.apache.hudi.metadata.HoodieBackedTableMetadata.close(HoodieBackedTableMetadata.java:567)
at org.apache.hudi.metadata.HoodieBackedTableMetadata.close(HoodieBackedTableMetadata.java:554)
at org.apache.hudi.metadata.HoodieMetadataFileSystemView.close(HoodieMetadataFileSystemView.java:83)
at org.apache.hudi.common.table.view.FileSystemViewManager.clearFileSystemView(FileSystemViewManager.java:86)
at org.apache.hudi.timeline.service.handlers.FileSliceHandler.refreshTable(FileSliceHandler.java:118)
at org.apache.hudi.timeline.service.RequestHandler.lambda$registerFileSlicesAPI$19(RequestHandler.java:390)
at org.apache.hudi.timeline.service.RequestHandler$ViewHandler.handle(RequestHandler.java:501)
at io.javalin.security.SecurityUtil.noopAccessManager(SecurityUtil.kt:22)
at io.javalin.Javalin.lambda$addHandler$0(Javalin.java:606)
at io.javalin.core.JavalinServlet$service$2$1.invoke(JavalinServlet.kt:46)
at io.javalin.core.JavalinServlet$service$2$1.invoke(JavalinServlet.kt:17)
at io.javalin.core.JavalinServlet$service$1.invoke(JavalinServlet.kt:143)
at io.javalin.core.JavalinServlet$service$2.invoke(JavalinServlet.kt:41)
Solution
The top frame is FSDataInputStreamWrapper.updateInputStreamStatistics. FSDataInputStreamWrapper is a class in the hbase-server jar, so I looked at the FSDataInputStreamWrapper::updateInputStreamStatistics method in the HBase source:
private void updateInputStreamStatistics(FSDataInputStream stream) {
  // If the underlying file system is HDFS, update read statistics upon close.
  if (stream instanceof HdfsDataInputStream) {
    /**
     * Because HDFS ReadStatistics is calculated per input stream, it is not
     * feasible to update the aggregated number in real time. Instead, the
     * metrics are updated when an input stream is closed.
     */
    HdfsDataInputStream hdfsDataInputStream = (HdfsDataInputStream) stream;
    synchronized (readStatistics) {
      readStatistics.totalBytesRead += hdfsDataInputStream.getReadStatistics().getTotalBytesRead();
      readStatistics.totalLocalBytesRead += hdfsDataInputStream.getReadStatistics().getTotalLocalBytesRead();
      readStatistics.totalShortCircuitBytesRead += hdfsDataInputStream.getReadStatistics().getTotalShortCircuitBytesRead();
      readStatistics.totalZeroCopyBytesRead += hdfsDataInputStream.getReadStatistics().getTotalZeroCopyBytesRead();
    }
  }
}
The error is thrown at hdfsDataInputStream.getReadStatistics(). Stepping into getReadStatistics() (it lives in the hadoop-hdfs-client jar), the method on the classpath returns org.apache.hadoop.hdfs.ReadStatistics, while the NoSuchMethodError shows the caller was compiled to expect org.apache.hadoop.hdfs.DFSInputStream$ReadStatistics. The return types clearly do not match!
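This is a binary-compatibility problem: the JVM links a call site by the full method descriptor, which includes the return type, so a method with the same name and parameters but a different return type still fails to link with NoSuchMethodError. A minimal, self-contained sketch (toy class names, not Hadoop's) showing that the return type is part of what a caller binds to:

```java
import java.lang.reflect.Method;

public class DescriptorDemo {
    // Hypothetical stand-ins for the two incompatible ReadStatistics types.
    static class ReadStatisticsOld {}  // plays the role of DFSInputStream.ReadStatistics (Hadoop 2)
    static class Stream {
        ReadStatisticsOld getReadStatistics() { return new ReadStatisticsOld(); }
    }

    // The return type is recorded in the method descriptor. A caller compiled
    // against a version where getReadStatistics() returned a different type
    // fails at link time with NoSuchMethodError, even though a method with
    // the same name and parameter list exists on the runtime classpath.
    static String linkedReturnType() {
        try {
            Method m = Stream.class.getDeclaredMethod("getReadStatistics");
            return m.getReturnType().getSimpleName();
        } catch (NoSuchMethodException e) {
            throw new AssertionError(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(linkedReturnType());
    }
}
```

This is why the stack trace blames the hbase-server code bundled into Hudi: it was compiled against one return type but ran against a client providing the other.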
This pointed to a dependency version mismatch. Checking the hadoop-hdfs-client version in the HBase root pom showed that HBase 2.4.9 pulls in Hadoop 2 by default, and the Hadoop 2 hadoop-hdfs-client has no top-level ReadStatistics class at all!
Change that version to hadoop-three.version and reload the project in Maven. Then rebuild HBase, install the resulting hbase-server-2.4.9 jar into the local Maven repository, and rebuild Hudi against it.
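Concretely, the edit amounts to pointing the HDFS client dependency at the Hadoop 3 property. A sketch of the change (the hadoop-three.version property exists in the HBase 2.4.x pom; the exact dependency location in your checkout may differ):

```xml
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-hdfs-client</artifactId>
  <version>${hadoop-three.version}</version>
</dependency>
```

After the edit, mvn clean install -DskipTests in the HBase root installs hbase-server-2.4.9 into the local repository, and the same command in the Hudi root rebuilds the bundle against it. (HBase 2.x also ships a hadoop-3.0 build profile, selectable with -Dhadoop.profile=3.0, as an alternative to editing the pom by hand.)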