Deploying Hadoop:
Before configuring anything, first settle how the cluster roles are distributed across the nodes:
Node distribution:
HDFS nodes: master: NameNode; workers: DataNode;
YARN nodes: master: ResourceManager; workers: NodeManager;
bigdata-01.superyong.com NodeManager DataNode NameNode(active)
bigdata-02.superyong.com NodeManager DataNode NameNode(standby)
bigdata-03.superyong.com NodeManager DataNode ResourceManager
Note: the high-availability (active/standby NameNode) setup will be configured later, when ZooKeeper is installed and configured.
Next I will configure Hadoop module by module. Before that, the basic Hadoop environment must already be in place, as described in the earlier post "Hadoop分布式集群的搭建(Apache 版本) 上" (Building an Apache Hadoop distributed cluster, part 1).
Common module:
core-site.xml
<configuration>
<!-- Specify the hostname and port on which the HDFS NameNode runs -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://bigdata-01.superyong.com:8020</value>
</property>
<!-- Specify Hadoop's local temporary storage directory; defaults to the Linux system's /tmp directory -->
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/modules/hadoop-2.7.3/data/tmpData</value>
</property>
</configuration>
HDFS module:
hdfs-site.xml
<configuration>
<!-- HDFS splits files into blocks and by default keeps three replicas of each block; the replica count is configured here -->
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<!-- HDFS is a master/worker architecture: where the master (NameNode) runs was specified above via fs.defaultFS, and the workers are listed in the slaves file -->
<!-- Specify which machine runs the SecondaryNameNode; it generally runs on the same machine as the NameNode and assists it with its work -->
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>bigdata-01.superyong.com:50090</value>
</property>
</configuration>
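As a quick check (assuming you run from the Hadoop home, /opt/modules/hadoop-2.7.3, on a machine where these files are in place), hdfs getconf prints the values Hadoop actually resolves from the config files:

```shell
# Each command prints the effective value of one configuration key.
bin/hdfs getconf -confKey fs.defaultFS       # should print hdfs://bigdata-01.superyong.com:8020
bin/hdfs getconf -confKey dfs.replication    # should print 3
```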
slaves
Note: this file specifies which hosts run the DataNode workers. YARN is also a master/worker architecture, and its workers (the NodeManagers) are specified here as well. Each line of the slaves file holds one hostname; list the addresses of all worker nodes.
bigdata-01.superyong.com
bigdata-02.superyong.com
bigdata-03.superyong.com
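The slaves file can be written in one step with a heredoc. A minimal sketch, assuming your config directory is where $HADOOP_CONF_DIR points (the snippet falls back to a scratch directory so it runs anywhere; on the real cluster it would be /opt/modules/hadoop-2.7.3/etc/hadoop):

```shell
# Overwrite the slaves file with all worker hostnames, one per line.
conf_dir="${HADOOP_CONF_DIR:-$(mktemp -d)}"
cat > "$conf_dir/slaves" <<'EOF'
bigdata-01.superyong.com
bigdata-02.superyong.com
bigdata-03.superyong.com
EOF
wc -l < "$conf_dir/slaves"   # prints 3: one line per worker host
```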
YARN module:
A quick digression:
YARN is a distributed resource-management and job-scheduling framework; many kinds of applications can run on top of it:
-》MapReduce
a parallel data-processing framework
-》Spark
an in-memory distributed computing framework
-》Storm/Flink
real-time stream-processing frameworks
yarn-site.xml
<configuration>
<!-- Tell the NodeManagers to run the MapReduce shuffle as an auxiliary service, so that MapReduce programs can run on YARN -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<!-- Specify which host runs the YARN master, the ResourceManager -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>bigdata-03.superyong.com</value>
</property>
</configuration>
MapReduce module:
A parallel computation framework.
mapred-site.xml
When you go to edit this file you will find it does not exist; only its template, mapred-site.xml.template, ships with Hadoop. Rename (or copy) the template to mapred-site.xml, then configure it:
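The rename step is a single copy (cp keeps the template around, which is handy if you ever want a clean starting point); the path assumes the install location used throughout this post:

```shell
cd /opt/modules/hadoop-2.7.3/etc/hadoop
cp mapred-site.xml.template mapred-site.xml
```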
<configuration>
<!-- Run MapReduce programs on YARN; the default is local, i.e. in-process, execution -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
That completes the basic configuration. Now distribute it.
==================================================
Earlier we configured the Hadoop framework on the host bigdata-01.superyong.com; now distribute the configured installation to every other node machine:
Use scp to send the configured hadoop directory to each of the other two hosts:
scp -r hadoop-2.7.3/ super-yong@bigdata-02.superyong.com:/opt/modules/
scp -r hadoop-2.7.3/ super-yong@bigdata-03.superyong.com:/opt/modules/
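As an optional sanity check (assuming passwordless SSH, which the one-step startup below relies on anyway), you can compare checksums of the config files on each host; all three printed lines should be identical:

```shell
# Checksum the *-site.xml files on each remote host, then locally, and compare.
for host in bigdata-02.superyong.com bigdata-03.superyong.com; do
  ssh super-yong@"$host" 'cat /opt/modules/hadoop-2.7.3/etc/hadoop/*-site.xml | md5sum'
done
cat /opt/modules/hadoop-2.7.3/etc/hadoop/*-site.xml | md5sum   # local copy, for comparison
```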
That finishes the configuration; next, test the cluster we just built:
Testing:
Before starting anything, the HDFS file system must first be formatted:
bin/hdfs namenode -format
The sign of success:
util.ExitUtil: Exiting with status 0
Then start the NameNode with the daemon start script:
[super-yong@bigdata-01 hadoop-2.7.3]$ sbin/hadoop-daemon.sh start namenode
starting namenode, logging to /opt/modules/hadoop-2.7.3/logs/hadoop-super-yong-namenode-bigdata-01.superyong.com.out
[super-yong@bigdata-01 hadoop-2.7.3]$ jps
3591 Jps
3516 NameNode
[super-yong@bigdata-01 hadoop-2.7.3]$
You can see that the NameNode is now running;
Next, start the worker processes:
bigdata-01.superyong.com:
[super-yong@bigdata-01 hadoop-2.7.3]$ sbin/hadoop-daemon.sh start datanode
starting datanode, logging to /opt/modules/hadoop-2.7.3/logs/hadoop-super-yong-datanode-bigdata-01.superyong.com.out
[super-yong@bigdata-01 hadoop-2.7.3]$ jps
3704 Jps
3625 DataNode
3516 NameNode
[super-yong@bigdata-01 hadoop-2.7.3]$
[super-yong@bigdata-01 hadoop-2.7.3]$ sbin/hadoop-daemon.sh start secondarynamenode
starting secondarynamenode, logging to /opt/modules/hadoop-2.7.3/logs/hadoop-super-yong-secondarynamenode-bigdata-01.superyong.com.out
[super-yong@bigdata-01 hadoop-2.7.3]$ jps
4769 NameNode
4991 Jps
4943 SecondaryNameNode
[super-yong@bigdata-01 hadoop-2.7.3]$
bigdata-02.superyong.com:
[super-yong@bigdata-02 hadoop-2.7.3]$ sbin/hadoop-daemon.sh start datanode
starting datanode, logging to /opt/modules/hadoop-2.7.3/logs/hadoop-super-yong-datanode-bigdata-02.superyong.com.out
[super-yong@bigdata-02 hadoop-2.7.3]$ jps
2821 DataNode
2894 Jps
[super-yong@bigdata-02 hadoop-2.7.3]$
bigdata-03.superyong.com:
[super-yong@bigdata-03 hadoop-2.7.3]$ sbin/hadoop-daemon.sh start datanode
starting datanode, logging to /opt/modules/hadoop-2.7.3/logs/hadoop-super-yong-datanode-bigdata-03.superyong.com.out
[super-yong@bigdata-03 hadoop-2.7.3]$ jps
2864 Jps
2820 DataNode
[super-yong@bigdata-03 hadoop-2.7.3]$
You can also check through the Web UI on port 50070 (the NameNode web UI runs on whichever machine hosts the NameNode process):
http://bigdata-01.superyong.com:50070
Starting the daemons one by one like that is tedious; since passwordless SSH was configured earlier, everything can also be brought up in one step:
[super-yong@bigdata-01 hadoop-2.7.3]$ sbin/start-dfs.sh
Starting namenodes on [bigdata-01.superyong.com]
bigdata-01.superyong.com: starting namenode, logging to /opt/modules/hadoop-2.7.3/logs/hadoop-super-yong-namenode-bigdata-01.superyong.com.out
bigdata-01.superyong.com: starting datanode, logging to /opt/modules/hadoop-2.7.3/logs/hadoop-super-yong-datanode-bigdata-01.superyong.com.out
bigdata-02.superyong.com: starting datanode, logging to /opt/modules/hadoop-2.7.3/logs/hadoop-super-yong-datanode-bigdata-02.superyong.com.out
bigdata-03.superyong.com: starting datanode, logging to /opt/modules/hadoop-2.7.3/logs/hadoop-super-yong-datanode-bigdata-03.superyong.com.out
Starting secondary namenodes [0.0.0.0]
The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.
RSA key fingerprint is a6:8d:2e:4d:dd:ef:20:d3:d7:87:db:7a:33:5a:04:e3.
Are you sure you want to continue connecting (yes/no)? yes
0.0.0.0: Warning: Permanently added '0.0.0.0' (RSA) to the list of known hosts.
0.0.0.0: starting secondarynamenode, logging to /opt/modules/hadoop-2.7.3/logs/hadoop-super-yong-secondarynamenode-bigdata-01.superyong.com.out
[super-yong@bigdata-01 hadoop-2.7.3]$ jps
3926 NameNode
4234 SecondaryNameNode
4364 Jps
4063 DataNode
[super-yong@bigdata-01 hadoop-2.7.3]$
With that, all of the HDFS processes are up and running;
Next, start the YARN processes (run this on whichever machine hosts the ResourceManager):
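With the HDFS daemons up, a quick smoke test confirms the file system actually accepts writes and reads. This is a hedged sketch; the directory and file names below are arbitrary examples, not required by anything above:

```shell
# Run from the Hadoop home on any cluster node.
bin/hdfs dfs -mkdir -p /user/super-yong          # create an illustrative home directory
echo "hello hdfs" > /tmp/test.txt                # a small local file to upload
bin/hdfs dfs -put /tmp/test.txt /user/super-yong/
bin/hdfs dfs -cat /user/super-yong/test.txt      # read it back through HDFS
```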
[super-yong@bigdata-03 hadoop-2.7.3]$ sbin/start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /opt/modules/hadoop-2.7.3/logs/yarn-super-yong-resourcemanager-bigdata-03.superyong.com.out
bigdata-03.superyong.com: starting nodemanager, logging to /opt/modules/hadoop-2.7.3/logs/yarn-super-yong-nodemanager-bigdata-03.superyong.com.out
bigdata-02.superyong.com: starting nodemanager, logging to /opt/modules/hadoop-2.7.3/logs/yarn-super-yong-nodemanager-bigdata-02.superyong.com.out
bigdata-01.superyong.com: starting nodemanager, logging to /opt/modules/hadoop-2.7.3/logs/yarn-super-yong-nodemanager-bigdata-01.superyong.com.out
[super-yong@bigdata-03 hadoop-2.7.3]$
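To exercise YARN end to end, you can submit the example job bundled with Hadoop (the jar path below matches the 2.7.3 tarball layout; the map/sample counts are arbitrary small values):

```shell
# Run from the Hadoop home; estimates pi with 2 map tasks x 4 samples each.
bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar pi 2 4
```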
Testing complete!