hadoop(2) - Principles and Setup of HA/Federation Clusters

1. The HA Mechanism

HA: High Availability

  1. Before Hadoop 2.0, the NameNode was a single point of failure (SPOF) in an HDFS cluster. With only one NameNode, if that machine went down (a crash, or a software/hardware upgrade), the entire cluster was unusable until the NameNode was brought back up.

  2. How is this solved? HDFS HA addresses the problem by configuring two NameNodes in an Active/Standby pair, giving the cluster a hot standby for the NameNode. If a failure occurs, such as a machine crash, or a machine needs to go down for maintenance or an upgrade, the NameNode role can quickly be switched to the other machine.

  3. In a typical HDFS HA cluster, two separate machines are configured as NameNodes. At any point in time, exactly one of them is in the Active state and the other is in Standby. The Active NameNode handles all client operations in the cluster, while the Standby NameNode acts purely as a hot backup, ready to take over quickly if the Active NameNode fails.

  4. To keep the metadata (in practice, the edit log) of the Active and Standby NameNodes synchronized in real time, a shared storage system is needed; this can be NFS, QJM (Quorum Journal Manager), or ZooKeeper-based. The Active NameNode writes its edits to the shared storage, and the Standby watches it: whenever new data is written, the Standby reads it and replays it into its own memory, keeping its in-memory state essentially consistent with the Active NameNode's so that it can quickly become the Active NameNode in an emergency. For fast failover the Standby also needs up-to-date block location information, so the DataNodes are configured with the addresses of both NameNodes and send block reports and heartbeats to both.

In a typical HA cluster, two separate machines are configured as NameNodes. At any point in time, one NameNode is active and the other is in standby.

To keep the standby node's state synchronized with the active node, both nodes communicate with a separate group of daemons called JournalNodes (JNs).

Machine allocation:

  • NN: needs machines with plenty of memory
  • DN: needs machines with plenty of disk capacity
  • JN: no special requirements; it only reads and writes edit data

The JournalNode cluster keeps the data of the two NameNodes in sync, and it works even without ZooKeeper. However, failover then has to be done by hand: even if the active NN dies, the standby NN must be activated manually, for example with the haadmin commands sketched below.
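A minimal sketch of a manual failover in a QJM-only setup (no ZKFC), assuming the NameNode IDs nn11 and nn12 that are configured later in this article:

$ hdfs haadmin -getServiceState nn11       # check which NameNode is currently active
$ hdfs haadmin -transitionToStandby nn11   # demote the old active (if it is still reachable)
$ hdfs haadmin -transitionToActive nn12    # promote the standby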

2. Building an HDFS HA Cluster

In hadoop(1) we built the simplest possible Hadoop cluster. In this article we set up an HDFS HA environment.
ZK and ZKFC are the components used for automatic failover; without ZooKeeper you can still fail over manually. The components required for HDFS HA are:

  • ActiveNameNode
  • StandbyNameNode
  • JournalNode
  • ZK
  • ZKFC

For synchronization speed, we normally put ZKFC, the JN, and the NameNode on the same machine (production environments do this too). For example:

Machine      Services
master1      NameNode, ZKFC, JN
master2      NameNode, ZKFC, JN
zk1          ZK, JN
zk2          ZK
zk3          ZK
hadoop1~6    DN

Deployment steps overview

  1. Prepare the HA configuration files for the JournalNodes (given below).
  2. Start the JournalNode cluster: hadoop-daemon.sh start journalnode
  3. Format the first NameNode: hdfs namenode -format (the JournalNodes must be running before formatting, because formatting creates the fsimage and edit log; if both NameNodes were formatted independently their fsimage data would be inconsistent, so start the JournalNodes first, format one NameNode, and then share its fsimage with the other NameNode).
  4. Start the first NameNode: hadoop-daemon.sh start namenode
  5. Bring up the second NameNode with -bootstrapStandby, which makes it the standby NameNode: hdfs namenode -bootstrapStandby. If this step fails, see the notes below.
  6. Start the ZooKeeper cluster; on every ZooKeeper node run: zkServer.sh start
  7. Format ZKFC: hdfs zkfc -formatZK (ZKFC depends on ZooKeeper: formatting creates a znode on the ZooKeeper cluster that serves as the lock, so this step fails if ZooKeeper is not running).
  8. Start a ZooKeeper client and check whether ZKFC created the corresponding znodes: zkCli.sh
  9. Start the Hadoop cluster: start-dfs.sh
  10. In the ZooKeeper client you can now see the lock that was created; use get to inspect it, as shown in the sketch below. Here the lock is held by node1.
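A minimal sketch of inspecting the lock in ZooKeeper (these are the znodes that ZKFC creates under /hadoop-ha for the nameservice ns1 used in this article):

$ zkCli.sh
[zk: localhost:2181(CONNECTED) 0] ls /hadoop-ha/ns1
[ActiveBreadCrumb, ActiveStandbyElectorLock]
[zk: localhost:2181(CONNECTED) 1] get /hadoop-ha/ns1/ActiveStandbyElectorLock
(the znode data identifies which NameNode currently holds the lock)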

Prerequisites:

  • SSH environment
  • JDK environment

1. JN configuration

core-site.xml

<configuration>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>zk1,zk2,zk3</value>
  </property>
  <property>
    <name>ha.zookeeper.session-timeout.ms</name>
    <value>10000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/cluster-data</value>
  </property>
</configuration>

hdfs-site.xml

<configuration>
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/home/hadoop/hadoopdata/journaldata</value>
  </property>
  <property>
    <name>dfs.namenode.shared.edits.dir.ns1</name>
    <value>qjournal://master1:8485;master2:8485;zk1:8485/ns1</value>
  </property>
</configuration>

Note: the ns1 in dfs.namenode.shared.edits.dir.ns1 must exactly match the nameservice you configured.

Commands run

[hadoop@master1 ~]$ tar -zxf hadoop-3.2.0.tar.gz
[hadoop@master1 ~]$ mv hadoop-3.2.0 hadoop-journalnode-3.2.0
[hadoop@master1 ~]$ ln -s hadoop-journalnode-3.2.0 journalnode-current
[hadoop@master1 ~]$ cd journalnode-current/etc/hadoop/
# edit core-site.xml and hdfs-site.xml
# distribute to the other machines
[hadoop@master1 ~]$ scp -r hadoop-journalnode-3.2.0 master2:~/
[hadoop@master1 ~]$ scp -r hadoop-journalnode-3.2.0 zk1:~/
# start
[hadoop@master1 ~]$ ./journalnode-current/sbin/hadoop-daemon.sh start journalnode
WARNING: Attempting to execute replacement "hdfs --daemon start" instead.
[hadoop@master1 ~]$ jps -l
48405 org.apache.hadoop.hdfs.qjournal.server.JournalNode
48493 sun.tools.jps.Jps

Start the JN on all 3 machines: ./journalnode-current/sbin/hadoop-daemon.sh start journalnode. You can verify that they are up as sketched below.
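A quick check that the JournalNodes are running (a sketch; 8480 is the default JournalNode HTTP port, since dfs.journalnode.http-address was not changed above):

# the JournalNode process should show up on each JN host
[hadoop@master1 ~]$ jps -l | grep JournalNode
# optionally query the JournalNode web/JMX endpoint
[hadoop@master1 ~]$ curl -s http://master1:8480/jmx | head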


2. NN configuration

hadoop-env.sh

export JAVA_HOME=/home/hadoop/java-current
export HADOOP_TMP_DIR=$HOME/cluster-data
export HADOOP_LOG_DIR=${HADOOP_TMP_DIR}/logs

core-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://ns1</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/cluster-data</value>
  </property>
</configuration>

hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>

<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>ns1</value>
  </property>
 <property>
    <name>dfs.client.failover.proxy.provider.ns1</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
     <name>dfs.ha.fencing.methods</name>
     <value>sshfence</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/hadoop/.ssh/id_rsa</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.ns1</name>
    <value>nn11,nn12</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.ns1.nn11</name>
    <value>master1:50070</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.ns1.nn12</name>
    <value>master2:50070</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///home/hadoop/hadoopdata/hadoop-namenode</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns1.nn11</name>
    <value>master1:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns1.nn12</name>
    <value>master2:8020</value>
  </property>
    <property>
    <name>dfs.namenode.servicerpc-address.ns1.nn11</name>
    <value>master1:8021</value>
  </property>
  <property>
    <name>dfs.namenode.servicerpc-address.ns1.nn12</name>
    <value>master2:8021</value>
  </property>
    <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://master1:8485;master2:8485;zk1:8485/ns1</value>
  </property>
  <property>
    <name>dfs.namenode.lifeline.rpc-address.ns1.nn11</name>
    <value>master1:8022</value>
  </property>
  <property>
    <name>dfs.namenode.lifeline.rpc-address.ns1.nn12</name>
    <value>master2:8022</value>
  </property>
</configuration>

Notes:

  1. Starting the NameNode right away fails with an error saying the namenode directory above must be created first: /home/hadoop/hadoopdata/hadoop-namenode
  2. Even after creating that directory, starting again fails because the NameNode has not been formatted yet; format it first.

Format: ./hadoop-current/bin/hdfs namenode -format
Start:  ./hadoop-current/bin/hdfs --daemon start namenode

The NameNode on this master is now running, but in standby mode.

At this point the HDFS filesystem cannot be used yet; an active NameNode needs to be brought up.
When initializing nn2, do not run the same format command again; instead have it sync nn1's metadata as a standby:
on the other machine run hdfs namenode -bootstrapStandby (or hadoop namenode -bootstrapStandby), then start it:

[hadoop@master1]$ scp -r hadoop-namenode-3.2.0 master2:~/
[hadoop@master2 ~]$ ln -s  hadoop-namenode-3.2.0 hadoop-current
[hadoop@master2 ~]$ mkdir -p /home/hadoop/hadoopdata/hadoop-namenode

# initialize nn2 (first time only); do not use the same format command as on nn1
[hadoop@master2 ~]$ ./hadoop-current/bin/hdfs namenode -bootstrapStandby

The bootstrapStandby output shows the metadata being synchronized:

STARTUP_MSG:   java = 1.8.0_77
************************************************************/
2020-11-08 18:52:43,545 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
2020-11-08 18:52:43,626 INFO namenode.NameNode: createNameNode [-bootstrapStandby]
2020-11-08 18:52:43,726 INFO ha.BootstrapStandby: Found nn: nn11, ipc: master1/10.96.83.87:8021
=====================================================
About to bootstrap Standby ID nn12 from:
           Nameservice ID: ns1
        Other Namenode ID: nn11
  Other NN's HTTP address: http://master1:50070
  Other NN's IPC  address: master1/10.96.83.87:8021
             Namespace ID: 1133849645
            Block pool ID: BP-1585020858-10.96.83.87-1604831278266
               Cluster ID: CID-bbab029f-eb45-4021-8d1e-e6e00fa73f2d
           Layout version: -64
       isUpgradeFinalized: true
=====================================================
2020-11-08 18:52:44,639 INFO common.Storage: Storage directory /home/hadoop/hadoopdata/hadoop-namenode has been successfully formatted.
2020-11-08 18:52:44,690 INFO namenode.FSEditLog: Edit logging is async:true
2020-11-08 18:52:44,782 INFO namenode.TransferFsImage: Opening connection to http://master1:50070/imagetransfer?getimage=1&txid=0&storageInfo=-64:1133849645:1604831278266:CID-bbab029f-eb45-4021-8d1e-e6e00fa73f2d&bootstrapstandby=true
2020-11-08 18:52:44,805 INFO common.Util: Combined time for file download and fsync to all disks took 0.00s. The file download took 0.00s at 0.00 KB/s. Synchronous (fsync) write to disk of /home/hadoop/hadoopdata/hadoop-namenode/current/fsimage.ckpt_0000000000000000000 took 0.00s.
2020-11-08 18:52:44,805 INFO namenode.TransferFsImage: Downloaded file fsimage.ckpt_0000000000000000000 size 391 bytes.
2020-11-08 18:52:44,818 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at master2/10.96.95.222
************************************************************/

After starting the NameNode, master2 also has a NameNode process:

[hadoop@master2 ~]$ ./hadoop-current/bin/hdfs --daemon start namenode

master2 is also in standby state at this point.

[Making one NameNode active] Fail over from nn12 (master2) to nn11 (master1); master1 then becomes the active NameNode:

[hadoop@master2 ~]$ ./hadoop-current/bin/hdfs haadmin -failover --forceactive nn12 nn11
Failover from nn12 to nn11 successful

[Troubleshooting]

  1. Note: the failover above once failed because hdfs-site.xml was missing configuration; failover requires a fencing method. The three HA-related properties below must be present in hdfs-site.xml (the configuration above has already been updated to include them):
[hadoop@master2 ~]$ ./hadoop-current/bin/hdfs haadmin -failover --forceactive nn12 nn11
Illegal argument: failover requires a fencer
 <property>
    <name>dfs.client.failover.proxy.provider.ns1</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
      <name>dfs.ha.fencing.methods</name>
      <value>sshfence</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/hadoop/.ssh/id_rsa</value>
  </property>
  2. Note: namenode format may only be run once! Accidentally running format on a slave caused the clusterID in the VERSION file to no longer match, and the DataNodes could not start.
  3. Do not replace nn11/nn12 above with hostnames, or they will not be recognized; nn11/nn12 are the logical IDs configured in hdfs-site.xml.
  4. When something goes wrong, check the NN or DN startup logs.

Other failover-related commands (use the NameNode IDs configured above, e.g. nn11):

$ hdfs haadmin -transitionToActive nn11
$ hdfs haadmin -getServiceState nn11
active

3. Manual HA Failover

Shut down the active NN, either with stop or by killing it:

[hadoop@master1 ~]$ hadoop-current/bin/hdfs --daemon stop namenode

At this point the HDFS filesystem can no longer be used, e.g. hadoop fs -ls / fails.
Activate master2 with:

[hadoop@master2 ~]$ hadoop-current/bin/hdfs haadmin -failover --forceactive nn11 nn12
Failover from nn11 to nn12 successful

The failover succeeds; the error about being unable to connect to master1 is expected, since master1 has been shut down.
HDFS access now works again.

Configure the slaves file and refresh (optional)
The NN needs to know where its slaves file is (by default etc/hadoop/slaves; in Hadoop 3 the file is named etc/hadoop/workers) and what it contains.
Just list the hostnames:

hadoop1
hadoop2
hadoop3
hadoop4
hadoop5
hadoop6

Refresh the slaves configuration on the NN:

[hadoop@master1 ~]$ ./hadoop-current/bin/hadoop dfsadmin  -refreshNodes
Refresh nodes successful for master1/10.96.83.87:8020
Refresh nodes successful for master2/10.96.95.222:8020

Note: if the slaves file is not configured, it also works to simply start the DNs and let them register with the NameNodes on their own.


4. Starting the DNs

DN configuration (note: in a test environment the DNs can simply reuse the NN configuration):
core-site.xml

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://ns1</value>
  </property>
</configuration>

hdfs-site.xml

<configuration>
  <!-- need modify-->
  <property>
    <name>dfs.data.dir</name>
    <value>file:///home/hadoop/data/dfs/dn</value>
  </property>
<property>
   <name>dfs.datanode.failed.volumes.tolerated</name>
   <value>0</value>
</property>
  <property>
    <name>dfs.nameservices</name>
    <value>ns1</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.ns1</name>
    <value>nn11,nn12</value>
  </property>

  <property>
    <name>dfs.namenode.rpc-address.ns1.nn11</name>
    <value>master1:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns1.nn12</name>
    <value>master2:8020</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.ns1.nn11</name>
    <value>master1:50070</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.ns1.nn12</name>
    <value>master2:50070</value>
  </property>
  <property>
    <name>dfs.namenode.servicerpc-address.ns1.nn11</name>
    <value>master1:8021</value>
  </property>
  <property>
    <name>dfs.namenode.servicerpc-address.ns1.nn12</name>
    <value>master2:8021</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.ns1</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
    <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>8192</value>
  </property>
  <property>
    <name>dfs.datanode.handler.count</name>
    <value>196</value>
  </property>
  <property>
    <name>dfs.datanode.balance.bandwidthPerSec</name>
    <value>104857600</value>
  </property>
  <property>
    <name>dfs.datanode.balance.max.concurrent.moves</name>
    <value>100</value>
  </property>
<!--
  <property>
    <name>dfs.client.read.shortcircuit</name>
    <value>true</value>
  </property>

  <property>
    <name>dfs.domain.socket.path</name>
    <value>/var/lib/hadoop-hdfs/dn_socket</value>
  </property>
-->
  <property>
    <name>dfs.datanode.du.reserved</name>
    <value>322122547200</value>
  </property>
  <property>
    <name>fs.du.interval</name>
    <value>3600000</value>
  </property>

    <property>
    <name>dfs.namenode.lifeline.rpc-address.ns1.nn11</name>
    <value>master1:8022</value>
  </property>
  <property>
    <name>dfs.namenode.lifeline.rpc-address.ns1.nn12</name>
    <value>master2:8022</value>
  </property>

   <!-- DN port configuration; you can also leave these out and use the defaults. Some default ports differ between HDFS 2 and HDFS 3, e.g. 50010. -->
   <property>
    <name>dfs.datanode.https.address</name>
    <value>0.0.0.0:50475</value>
    <description>The datanode secure http server address and port.</description>
  </property>
   <property>
    <name>dfs.datanode.address</name>
    <value>0.0.0.0:50010</value>
    <description>
    The datanode server address and port for data transfer.
    </description>
  </property>

  <property>
    <name>dfs.datanode.http.address</name>
    <value>0.0.0.0:50075</value>
    <description>
    The datanode http server address and port.
    </description>
  </property>

  <property>
    <name>dfs.datanode.ipc.address</name>
    <value>0.0.0.0:50020</value>
    <description>
    The datanode ipc server address and port.
    </description>
  </property>

    <property>
    <name>dfs.blockreport.intervalMsec</name>
    <value>43200000</value>
    <description>Determines block reporting interval in milliseconds.</description>
  </property>
</configuration>

Start one DN:

$ ./hadoop-current/bin/hdfs --daemon start datanode

Distribute the configuration to the other machines and start their DN processes. You can verify that the DataNodes registered as sketched below.
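A minimal way to confirm the DataNodes registered with the NameNodes (a sketch, run from any node that has the client configuration):

# reports total capacity and the list of live DataNodes
[hadoop@master1 ~]$ ./hadoop-current/bin/hdfs dfsadmin -report
# look for the "Live datanodes (N):" section and the per-node capacity entries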

[Troubleshooting]
1. dfs.datanode.failed.volumes.tolerated must be smaller than the number of configured volumes:

Invalid value configured for dfs.datanode.failed.volumes.tolerated - 1. Value configured is >= to the number of configured volumes (1)

2. Short-circuit read socket path:

Although a UNIX domain socket path is configured as /var/lib/hadoop-hdfs/dn_socket, we cannot start a localDataXceiverServer because libhadoop cannot be loaded

3. Starting a DN may fail with an "Incompatible clusterIDs" error; the cause is that this DN was previously part of another cluster. Delete the DN data directory to fix it. Note that even with federation, the cluster ID is the same across nameservices.

2021-01-17 18:45:53,160 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /home/hadoop/data/dfs/dn/in_use.lock acquired by nodename 239@hadoop2
2021-01-17 18:45:53,160 WARN org.apache.hadoop.hdfs.server.common.Storage: Failed to add storage directory [DISK]file:/home/hadoop/data/dfs/dn
java.io.IOException: Incompatible clusterIDs in /home/hadoop/data/dfs/dn: namenode clusterID = CID-bbab029f-eb45-4021-8d1e-e6e00fa73f2d; datanode clusterID = CID-35e7b038-59af-4f89-989e-bf21834a537b
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:736)
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.loadStorageDirectory(DataStorage.java:294)
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.loadDataStorage(DataStorage.java:407)

4. The cluster's total capacity shows as 0 (Configured Capacity: 0 B) on both the active and standby NN. Troubleshooting steps:

  • (1) The hosts file still contained mappings for 127.0.0.1 localhost. Removing all 127.0.0.1 mappings did not help. Then, on a DN:
[hadoop@10 ~]$ ./hadoop-current/bin/hdfs dfsadmin -report
2021-01-29 17:28:17,880 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
report: The short-circuit local reads feature is enabled but dfs.domain.socket.path is not set.
  • (2) The warning above appears because short-circuit reads are enabled but the short-circuit socket path is not set. Setting the path then fails because the native library cannot be loaded, so in the test environment short-circuit reads were simply left unconfigured:
java.lang.RuntimeException: Although a UNIX domain socket path is configured as /var/lib/hadoop-hdfs/dn_socket, we cannot start a localDataXceiverServer because libhadoop cannot be loaded.
  • (3) After removing the short-circuit read configuration, capacity was still 0. The real culprit turned out to be the following configuration:
  <property>
    <name>dfs.datanode.du.reserved</name>
    <value>322122547200</value>
  </property>
  <property>
    <name>fs.du.interval</name>
    <value>3600000</value>
  </property>

dfs.datanode.du.reserved is the space reserved per volume for non-HDFS use, i.e. for the operating system. The 322122547200 bytes above is 300 GiB, which is obviously far too much for these disks; removing the setting (or lowering it, as sketched below) fixed the problem.
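For reference, 322122547200 bytes = 300 × 1024³ bytes = 300 GiB reserved per volume. If you still want a small reservation on test disks, a much lower value works, for example (a sketch; 10 GiB is an arbitrary choice):

  <property>
    <name>dfs.datanode.du.reserved</name>
    <!-- 10 GiB = 10 * 1024^3 = 10737418240 bytes reserved per volume for non-HDFS use -->
    <value>10737418240</value>
  </property>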

5. ZK and ZKFC

For how ZK and ZKFC work, see the separate article: ZKFC原理解析.
ZKFC depends on a running ZooKeeper ensemble, so make sure the ZK service is started first.
ZKFC is itself a process started from the Hadoop package, and it needs core-site.xml and hdfs-site.xml configured.

core-site.xml

<configuration>
  <property>
    <name>ha.failover-controller.graceful-fence.rpc-timeout.ms</name>
    <value>600000</value>
  </property>
  <property>
    <name>ha.failover-controller.new-active.rpc-timeout.ms</name>
    <value>600000</value>
  </property>
  <property>
    <name>ha.health-monitor.rpc-timeout.ms</name>
    <value>360000</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>zk1,zk2,zk3</value>
  </property>
  <property>
    <name>ha.zookeeper.session-timeout.ms</name>
    <value>10000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/cluster-data</value>
  </property>
</configuration>

hdfs-site.xml

Note: once ZKFC is added, the NameNode configuration also needs to change. The main point is that the three dfs.ha-related entries must be made consistent with the ZKFC configuration; the original dfs.ha.fencing.methods entry can simply be removed from the NN configuration.

<configuration>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/hadoop/.ssh/id_rsa</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.ns1</name>
    <value>nn11,nn12</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns1.nn11</name>
    <value>master1:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns1.nn12</name>
    <value>master2:8020</value>
  </property>
  <property>
    <name>dfs.nameservices</name>
    <value>ns1</value>
  </property>
    <property>
    <name>dfs.ha.fencing.methods</name>
    <value>
      sshfence
      sshfence
      sshfence
      shell(/bin/true)
    </value>
  </property>
</configuration>

[hadoop@master1 ~]$ cp -r hadoop-3.2.0-125 hadoop-zkfc-3.2.0-125
[hadoop@master1 ~]$ ln -snf hadoop-zkfc-3.2.0-125 zkfc-current
# edit zkfc-current/etc/hadoop/core-site.xml and hdfs-site.xml
  • Format ZKFC (run on either machine):
[hadoop@master1 ~]$ ./zkfc-current/bin/hdfs zkfc -formatZK
2021-02-03 17:44:40,061 INFO ha.ActiveStandbyElector: Successfully created /hadoop-ha/ns1 in ZK.
SHUTDOWN_MSG: Shutting down DFSZKFailoverController at master1/10.96.83.87
  • Start ZKFC (verification sketched below):
[hadoop@master1 ~]$ ./zkfc-current/bin/hdfs --daemon start zkfc
# log: /home/hadoop/cluster-data/logs/hadoop-hadoop-zkfc-master1.log
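To confirm that the failover controllers are working, a quick sketch:

# the ZKFC process should be running on both masters
[hadoop@master1 ~]$ jps -l | grep DFSZKFailoverController
# one NameNode should now report active and the other standby
[hadoop@master1 ~]$ ./hadoop-current/bin/hdfs haadmin -getServiceState nn11
[hadoop@master1 ~]$ ./hadoop-current/bin/hdfs haadmin -getServiceState nn12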

[Troubleshooting]
The log shows:

2021-02-03 17:48:08,380 INFO org.apache.hadoop.ha.ActiveStandbyElector: Session connected.
2021-02-03 17:48:08,380 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down for session: 0x2150ffb4c7d0020
2021-02-03 17:48:08,383 INFO org.apache.hadoop.ha.ActiveStandbyElector: Checking for any old active which needs to be fenced...
2021-02-03 17:48:08,385 INFO org.apache.hadoop.ha.ActiveStandbyElector: Old node exists: 0a036e733112046e6e31311a076d61737465723120d43e28d33e
2021-02-03 17:48:08,385 INFO org.apache.hadoop.ha.ActiveStandbyElector: But old node has our own data, so don't need to fence it.
2021-02-03 17:48:08,385 INFO org.apache.hadoop.ha.ActiveStandbyElector: Writing znode /hadoop-ha/ns1/ActiveBreadCrumb to indicate that the local node is the most recent active...
2021-02-03 17:48:08,391 INFO org.apache.hadoop.ha.ZKFailoverController: Trying to make NameNode at master1/10.96.83.87:8020 active...
2021-02-03 17:48:08,392 ERROR org.apache.hadoop.ha.ZKFailoverController: Couldn't make NameNode at master1/10.96.83.87:8020 active
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Request from ZK failover controller at 10.96.83.87 denied since automatic HA is not enabled
        at org.apache.hadoop.hdfs.server.namenode.NameNode.checkHaStateChange(NameNode.java:2045)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.transitionToActive(NameNodeRpcServer.java:1851)
//...
2021-02-03 17:56:52,009 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master1/10.96.83.87:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
2021-02-03 17:56:52,010 WARN org.apache.hadoop.ha.HealthMonitor: Transport-level exception trying to monitor health of NameNode at master1/10.96.83.87:8020
java.net.ConnectException: Call From master1/10.96.83.87 to master1:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
        at sun.reflect.GeneratedConstructorAccessor9.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:833)
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:757)
        at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1515)
        at org.apache.hadoop.ipc.Client.call(Client.java:1457)
        at org.apache.hadoop.ipc.Client.call(Client.java:1367)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
        at com.sun.proxy.$Proxy9.getServiceStatus(Unknown Source)
        at org.apache.hadoop.ha.protocolPB.HAServiceProtocolClientSideTranslatorPB.getServiceStatus(HAServiceProtocolClientSideTranslatorPB.java:122)
        at org.apache.hadoop.ha.HealthMonitor.doHealthChecks(HealthMonitor.java:202)
        at org.apache.hadoop.ha.HealthMonitor.access$600(HealthMonitor.java:49)
        at org.apache.hadoop.ha.HealthMonitor$MonitorDaemon.run(HealthMonitor.java:296)
Caused by: java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
        at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:533)
        at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:690)
        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:794)
        at org.apache.hadoop.ipc.Client$Connection.access$3700(Client.java:411)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:1572)
        at org.apache.hadoop.ipc.Client.call(Client.java:1403)

After the error above, the NN stopped automatically. The cause was that dfs.ha.automatic-failover.enabled was not configured in the NN's hdfs-site.xml; after adding it (shown below), everything worked.
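For reference, this is the property that had to be added to the NameNode's own hdfs-site.xml (the same setting already present in the ZKFC configuration above):

  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>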
Once both machines are started, one of them automatically becomes active.


After killing the NameNode on master1, master2 automatically takes over as active.

At this point, the HDFS HA cluster setup is complete!


3. Federation Cluster Setup

  1. Copy the JN package hadoop-journalnode-3.2.0-125 to master3, master4, and zk2.
  2. Copy the NameNode package hadoop-3.2.0-125 to master3 and master4.
  3. Copy the ZKFC package hadoop-zkfc-3.2.0-125 to master3 and master4.
  4. Create the corresponding symlinks and edit the configuration.
[hadoop@master1 ~]$ scp -r -q hadoop-journalnode-3.2.0-125 master3:~/
[hadoop@master1 ~]$ scp -r -q hadoop-3.2.0-125  master3:~/
[hadoop@master1 ~]$ scp -r -q hadoop-zkfc-3.2.0-125 master3:~/
[hadoop@master3 ~]$ ln -s hadoop-journalnode-3.2.0-125 journalnode-current
[hadoop@master3 ~]$ ln -s hadoop-3.2.0-125 hadoop-current
[hadoop@master3 ~]$ ln -s hadoop-zkfc-3.2.0-125 zkfc-current
  5. Start the JNs:
$ ./journalnode-current/sbin/hadoop-daemon.sh start journalnode
  6. Format one NN. Note: this time the clusterId must be specified, because all nameservices in a federation share the same clusterId (see the sketch after this list for how to look it up):
[hadoop@master3 ~]$ ./hadoop-current/bin/hdfs namenode -format -clusterid CID-bbab029f-eb45-4021-8d1e-e6e00fa73f2d
  7. Start the NN that was just formatted; it comes up in standby mode:
$ ./hadoop-current/bin/hdfs --daemon start namenode
  8. The other NN cannot simply be started; it must first sync via bootstrapStandby:
$ hdfs namenode -bootstrapStandby
  9. Start that NN as well; it also comes up in standby mode. There are now two NNs running in standby.
  10. Format ZKFC on both machines:
$ ./zkfc-current/bin/hdfs zkfc -formatZK
  11. Start ZKFC on both machines; once started, the NNs automatically settle into one active and one standby:
$ ./zkfc-current/bin/hdfs --daemon start zkfc
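As referenced in step 6, the existing cluster ID can be read from the VERSION file under the first NameNode's metadata directory (a sketch, using the dfs.namenode.name.dir path configured earlier in this article):

[hadoop@master1 ~]$ grep clusterID /home/hadoop/hadoopdata/hadoop-namenode/current/VERSION
clusterID=CID-bbab029f-eb45-4021-8d1e-e6e00fa73f2d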

At this point the federation NameNode environment is set up. However, the capacity reported for the new nameservice is still empty; all of the DataNodes must be reconfigured and refreshed so that they recognize the new nameservice (NS).

  1. Add the ns2 configuration to the DataNode configuration (a sketch is shown below) and refresh every DN.
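A sketch of the additional ns2 entries for the DataNodes' hdfs-site.xml. The IDs nn21/nn22 and the master3/master4 addresses are assumptions that mirror the ns1 naming; adjust them to whatever was actually configured for the second nameservice:

  <!-- list both nameservices -->
  <property>
    <name>dfs.nameservices</name>
    <value>ns1,ns2</value>
  </property>
  <!-- ns2 NameNode IDs and RPC addresses (hypothetical, mirroring ns1) -->
  <property>
    <name>dfs.ha.namenodes.ns2</name>
    <value>nn21,nn22</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns2.nn21</name>
    <value>master3:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns2.nn22</name>
    <value>master4:8020</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.ns2</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>

After updating each DN, either restart the DataNode process or refresh it in place (hdfs dfsadmin -refreshNamenodes <dn-host>:50020, using the DN IPC port configured above) so that it registers with the new nameservice.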

At this point, the federation environment setup is complete.
