Deploying Hadoop:
Before configuring anything, first settle how the cluster roles are distributed across the nodes:
Node distribution:
HDFS nodes: master: NameNode; workers: DataNode;
YARN nodes: master: ResourceManager; workers: NodeManager;
bigdata-01.superyong.com NodeManager DataNode NameNode(active)
bigdata-02.superyong.com NodeManager DataNode NameNode(standby)
bigdata-03.superyong.com NodeManager DataNode ResourceManager
Note: the high-availability (active/standby NameNode) setup will be configured later, when ZooKeeper is installed and configured.
Next I will configure Hadoop module by module. Before that, the basic Hadoop environment must already be in place, as described in the earlier post "Hadoop分布式集群的搭建(Apache 版本) 上" (Building an Apache Hadoop distributed cluster, part 1).
Common module:
core-site.xml
<configuration>
<!-- Specify the hostname and port on which the HDFS NameNode runs -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://bigdata-01.superyong.com:8020</value>
</property>
<!-- Specify Hadoop's local temporary storage directory; defaults to the Linux system's /tmp directory -->
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/modules/hadoop-2.7.3/data/tmpData</value>
</property>
</configuration>
HDFS module:
hdfs-site.xml
<configuration>
<!-- HDFS splits files into blocks and by default keeps three replicas of each block; the replica count is configured here -->
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<!-- HDFS is a master/worker architecture: where the master (NameNode) runs was specified above via fs.defaultFS, and the workers are listed in the slaves file -->
<!-- Specify which machine runs the SecondaryNameNode; it generally runs on the same machine as the NameNode and assists it with its work -->
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>bigdata-01.superyong.com:50090</value>
</property>
</configuration>
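As a quick check (assuming you run from the Hadoop home, /opt/modules/hadoop-2.7.3, on a machine where these files are in place), hdfs getconf prints the values Hadoop actually resolves from the config files:

```shell
# Each command prints the effective value of one configuration key.
bin/hdfs getconf -confKey fs.defaultFS       # should print hdfs://bigdata-01.superyong.com:8020
bin/hdfs getconf -confKey dfs.replication    # should print 3
```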
slaves
Note: this file specifies which hosts run the DataNode workers. YARN is also a master/worker architecture, and its workers (the NodeManagers) are specified here as well. Each line of the slaves file holds one hostname; list the addresses of all worker nodes.
bigdata-01.superyong.com
bigdata-02.superyong.com
bigdata-03.superyong.com
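The slaves file can be written in one step with a heredoc. A minimal sketch, assuming your config directory is where $HADOOP_CONF_DIR points (the snippet falls back to a scratch directory so it runs anywhere; on the real cluster it would be /opt/modules/hadoop-2.7.3/etc/hadoop):

```shell
# Overwrite the slaves file with all worker hostnames, one per line.
conf_dir="${HADOOP_CONF_DIR:-$(mktemp -d)}"
cat > "$conf_dir/slaves" <<'EOF'
bigdata-01.superyong.com
bigdata-02.superyong.com
bigdata-03.superyong.com
EOF
wc -l < "$conf_dir/slaves"   # prints 3: one line per worker host
```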
YARN module:
A quick digression:
YARN is a distributed resource-management and job-scheduling framework; many kinds of applications can run on top of it:
-》MapReduce
a parallel data-processing framework
-》Spark
an in-memory distributed computing framework
-》Storm/Flink
real-time stream-processing frameworks
yarn-site.xml
<configuration>
<!-- Tell the NodeManagers to run the MapReduce shuffle as an auxiliary service, so that MapReduce programs can run on YARN -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<!-- Specify which host runs the YARN master, the ResourceManager -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>bigdata-03.superyong.com</value>
</property>
</configuration>
MapReduce module:
A parallel computation framework.
mapred-site.xml
When you go to edit this file you will find it does not exist; only its template, mapred-site.xml.template, ships with Hadoop. Rename (or copy) the template to mapred-site.xml, then configure it:
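The rename step is a single copy (cp keeps the template around, which is handy if you ever want a clean starting point); the path assumes the install location used throughout this post:

```shell
cd /opt/modules/hadoop-2.7.3/etc/hadoop
cp mapred-site.xml.template mapred-site.xml
```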
<configuration>
<!-- Run MapReduce programs on YARN; the default is local, i.e. in-process, execution -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
That completes the basic configuration. Now distribute it.
==================================================
Earlier we configured the Hadoop framework on the host bigdata-01.superyong.com; now distribute the configured installation to every other node machine:
Use scp to send the configured hadoop directory to each of the other two hosts:
scp -r hadoop-2.7.3/ super-yong@bigdata-02.superyong.com:/opt/modules/
scp -r hadoop-2.7.3/ super-yong@bigdata-03.superyong.com:/opt/modules/
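As an optional sanity check (assuming passwordless SSH, which the one-step startup below relies on anyway), you can compare checksums of the config files on each host; all three printed lines should be identical:

```shell
# Checksum the *-site.xml files on each remote host, then locally, and compare.
for host in bigdata-02.superyong.com bigdata-03.superyong.com; do
  ssh super-yong@"$host" 'cat /opt/modules/hadoop-2.7.3/etc/hadoop/*-site.xml | md5sum'
done
cat /opt/modules/hadoop-2.7.3/etc/hadoop/*-site.xml | md5sum   # local copy, for comparison
```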
That finishes the configuration; next, test the cluster we just built:
Testing:
Before starting anything, the HDFS file system must first be formatted:
bin/hdfs namenode -format
The sign of success:
util.ExitUtil: Exiting with status 0
Then start the NameNode with the daemon start script:
[super-yong@bigdata-01 hadoop-2.7.3]$ sbin/hadoop-daemon.sh start namenode
starting namenode, logging to /opt/modules/hadoop-2.7.3/logs/hadoop-super-yong-namenode-bigdata-01.superyong.com.out
[super-yong@bigdata-01 hadoop-2.7.3]$ jps
3591 Jps
3516 NameNode
[super-yong@bigdata-01 hadoop-2.7.3]$
You can see that the NameNode is now running;
Next, start the worker processes:
bigdata-01.superyong.com:
[super-yong@bigdata-01 hadoop-2.7.3]$ sbin/hadoop-daemon.sh start datanode
starting datanode, logging to /opt/modules/hadoop-2.7.3/logs/hadoop-super-yong-datanode-bigdata-01.superyong.com.out
[super-yong@bigdata-01 hadoop-2.7.3]$ jps
3704 Jps
3625 DataNode
3516 NameNode
[super-yong@bigdata-01 hadoop-2.7.3]$
[super-yong@bigdata-01 hadoop-2.7.3]$ sbin/hadoop-daemon.sh start secondarynamenode
starting secondarynamenode, logging to /opt/modules/hadoop-2.7.3/logs/hadoop-super-yong-secondarynamenode-bigdata-01.superyong.com.out
[super-yong@bigdata-01 hadoop-2.7.3]$ jps
4769 NameNode
4991 Jps
4943 SecondaryNameNode
[super-yong@bigdata-01 hadoop-2.7.3]$
bigdata-02.superyong.com:
[super-yong@bigdata-02 hadoop-2.7.3]$ sbin/hadoop-daemon.sh start datanode
starting datanode, logging to /opt/modules/hadoop-2.7.3/logs/hadoop-super-yong-datanode-bigdata-02.superyong.com.out
[super-yong@bigdata-02 hadoop-2.7.3]$ jps
2821 DataNode
2894 Jps
[super-yong@bigdata-02 hadoop-2.7.3]$
bigdata-03.superyong.com:
[super-yong@bigdata-03 hadoop-2.7.3]$ sbin/hadoop-daemon.sh start datanode
starting datanode, logging to /opt/modules/hadoop-2.7.3/logs/hadoop-super-yong-datanode-bigdata-03.superyong.com.out
[super-yong@bigdata-03 hadoop-2.7.3]$ jps
2864 Jps
2820 DataNode
[super-yong@bigdata-03 hadoop-2.7.3]$
You can also check through the Web UI on port 50070 (the NameNode web UI runs on whichever machine hosts the NameNode process):
http://bigdata-01.superyong.com:50070
Starting the daemons one by one like that is tedious; since passwordless SSH was configured earlier, everything can also be brought up in one step:
[super-yong@bigdata-01 hadoop-2.7.3]$ sbin/start-dfs.sh
Starting namenodes on [bigdata-01.superyong.com]
bigdata-01.superyong.com: starting namenode, logging to /opt/modules/hadoop-2.7.3/logs/hadoop-super-yong-namenode-bigdata-01.superyong.com.out
bigdata-01.superyong.com: starting datanode, logging to /opt/modules/hadoop-2.7.3/logs/hadoop-super-yong-datanode-bigdata-01.superyong.com.out
bigdata-02.superyong.com: starting datanode, logging to /opt/modules/hadoop-2.7.3/logs/hadoop-super-yong-datanode-bigdata-02.superyong.com.out
bigdata-03.superyong.com: starting datanode, logging to /opt/modules/hadoop-2.7.3/logs/hadoop-super-yong-datanode-bigdata-03.superyong.com.out
Starting secondary namenodes [0.0.0.0]
The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.
RSA key fingerprint is a6:8d:2e:4d:dd:ef:20:d3:d7:87:db:7a:33:5a:04:e3.
Are you sure you want to continue connecting (yes/no)? yes
0.0.0.0: Warning: Permanently added '0.0.0.0' (RSA) to the list of known hosts.
0.0.0.0: starting secondarynamenode, logging to /opt/modules/hadoop-2.7.3/logs/hadoop-super-yong-secondarynamenode-bigdata-01.superyong.com.out
[super-yong@bigdata-01 hadoop-2.7.3]$ jps
3926 NameNode
4234 SecondaryNameNode
4364 Jps
4063 DataNode
[super-yong@bigdata-01 hadoop-2.7.3]$
With that, all of the HDFS processes are up and running;
Next, start the YARN processes (run this on whichever machine hosts the ResourceManager):
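With the HDFS daemons up, a quick smoke test confirms the file system actually accepts writes and reads. This is a hedged sketch; the directory and file names below are arbitrary examples, not required by anything above:

```shell
# Run from the Hadoop home on any cluster node.
bin/hdfs dfs -mkdir -p /user/super-yong          # create an illustrative home directory
echo "hello hdfs" > /tmp/test.txt                # a small local file to upload
bin/hdfs dfs -put /tmp/test.txt /user/super-yong/
bin/hdfs dfs -cat /user/super-yong/test.txt      # read it back through HDFS
```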
[super-yong@bigdata-03 hadoop-2.7.3]$ sbin/start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /opt/modules/hadoop-2.7.3/logs/yarn-super-yong-resourcemanager-bigdata-03.superyong.com.out
bigdata-03.superyong.com: starting nodemanager, logging to /opt/modules/hadoop-2.7.3/logs/yarn-super-yong-nodemanager-bigdata-03.superyong.com.out
bigdata-02.superyong.com: starting nodemanager, logging to /opt/modules/hadoop-2.7.3/logs/yarn-super-yong-nodemanager-bigdata-02.superyong.com.out
bigdata-01.superyong.com: starting nodemanager, logging to /opt/modules/hadoop-2.7.3/logs/yarn-super-yong-nodemanager-bigdata-01.superyong.com.out
[super-yong@bigdata-03 hadoop-2.7.3]$
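To exercise YARN end to end, you can submit the example job bundled with Hadoop (the jar path below matches the 2.7.3 tarball layout; the map/sample counts are arbitrary small values):

```shell
# Run from the Hadoop home; estimates pi with 2 map tasks x 4 samples each.
bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar pi 2 4
```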
Testing complete!