1. Download the installation package
Download the appropriate version from:
http://archive.cloudera.com/cdh5/cdh/5/
2. Basic environment configuration
2.1 Disable the firewall
[root@host151 bigdata]# systemctl stop firewalld
[root@host151 bigdata]# systemctl disable firewalld
2.2 Disable SELinux
[root@host151 bigdata]# setenforce 0
[root@host151 bigdata]# sed -i 's/enforcing/disabled/' /etc/sysconfig/selinux
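As a quick check that SELinux is no longer enforcing (it reports Permissive immediately after setenforce 0, and Disabled after a reboot):
[root@host151 bigdata]# getenforce
Permissive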
2.3 Configure passwordless SSH
Generate a key pair. The installation runs as the hadoop user, and YARN jobs are submitted as the root user:
[hadoop@host151 bigdata]$ ssh-keygen -t rsa
Distribute the public key:
[hadoop@host151 bigdata]$ ssh-copy-id root@host151
Note: passwordless SSH must be configured between every machine in the cluster, including from each machine to itself.
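A simple way to verify the passwordless login is to run a remote command; it should return without prompting for a password:
[hadoop@host151 bigdata]$ ssh root@host151 hostname
host151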
2.4 Configure the JDK environment variables
[hadoop@host151 bigdata]$ vim /home/hadoop/.bash_profile
Add the following:
export JAVA_HOME=/opt/jdk1.8.0_131
export PATH=$PATH:$JAVA_HOME/bin
Apply the changes:
[hadoop@host151 bigdata]$ source /home/hadoop/.bash_profile
Note: the profile can be distributed to the other machines with scp; do not forget to source it on each machine so the JDK settings take effect.
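To confirm the JDK is picked up from the new PATH, check the version (the first line should report 1.8.0_131):
[hadoop@host151 bigdata]$ java -version
java version "1.8.0_131"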
3. Install Hadoop
3.1 Unpack Hadoop and rename the directory to hadoop
[hadoop@host151 bigdata]$ tar -zxvf hadoop-2.6.0-cdh5.8.4.tar.gz
[hadoop@host151 bigdata]$ mv hadoop-2.6.0-cdh5.8.4 hadoop
3.2 Edit the configuration files
[hadoop@host151 bigdata]$ cd hadoop/etc/hadoop
Edit hadoop-env.sh
[hadoop@host151 hadoop]$ vim hadoop-env.sh
Set the JDK path:
export JAVA_HOME=/opt/jdk1.8.0_131
Edit yarn-env.sh
[hadoop@host151 hadoop]$ vim yarn-env.sh
Set the JDK path:
export JAVA_HOME=/opt/jdk1.8.0_131
Edit mapred-env.sh
[hadoop@host151 hadoop]$ vim mapred-env.sh
Set the JDK path:
export JAVA_HOME=/opt/jdk1.8.0_131
Edit core-site.xml and add the following inside <configuration>:
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://host151</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>file:/home/hadoop/bigdata/datas/hadoop/tmp</value>
</property>
<property>
  <name>io.file.buffer.size</name>
  <value>131072</value>
</property>
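The hadoop.tmp.dir directory above may not be created automatically, so it is safest to create it by hand before formatting HDFS:
[hadoop@host151 bigdata]$ mkdir -p /home/hadoop/bigdata/datas/hadoop/tmp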
Edit hdfs-site.xml and add the following inside <configuration>, then create the corresponding data directories (the mkdir commands are shown after the block):
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:/home/hadoop/bigdata/datas/hadoop/dfs/name</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:/home/hadoop/bigdata/datas/hadoop/dfs/data</value>
</property>
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>
<property>
  <name>dfs.namenode.secondary.http-address</name>
  <value>host151:50090</value>
</property>
<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>
<property>
  <name>dfs.balance.bandwidthPerSec</name>
  <value>10485760</value>
</property>
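The mkdir commands for the directories referenced above:
[hadoop@host151 bigdata]$ mkdir -p /home/hadoop/bigdata/datas/hadoop/dfs/name
[hadoop@host151 bigdata]$ mkdir -p /home/hadoop/bigdata/datas/hadoop/dfs/data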
Edit mapred-site.xml: copy mapred-site.xml.template to mapred-site.xml (the command is shown after the block), then add:
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
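The copy command mentioned above:
[hadoop@host151 hadoop]$ cp mapred-site.xml.template mapred-site.xml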
Edit yarn-site.xml and add the following inside <configuration>:
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>host151</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>
<property>
  <name>yarn.resourcemanager.address</name>
  <value>host151:8032</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>host151:8030</value>
</property>
<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>host151:8031</value>
</property>
Edit the slaves file and add the worker nodes; the result should look like this:
[hadoop@host151 hadoop]$ cat slaves
host151
host152
Edit the profile again and add Hadoop to the environment variables:
[hadoop@host151 hadoop]$ vim /home/hadoop/.bash_profile
export HADOOP_HOME=/home/hadoop/bigdata/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib:$HADOOP_COMMON_LIB_NATIVE_DIR"
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Apply the changes:
[hadoop@host151 hadoop]$ source /home/hadoop/.bash_profile
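To confirm the Hadoop binaries are now on the PATH (the first line of output should show the CDH version):
[hadoop@host151 hadoop]$ hadoop version
Hadoop 2.6.0-cdh5.8.4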
Distribute the hadoop directory to the other nodes:
[hadoop@host151 bigdata]$ scp -r hadoop hadoop@host152:/home/hadoop/bigdata
[hadoop@host151 bigdata]$ scp -r hadoop hadoop@host153:/home/hadoop/bigdata
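The environment file is per-machine, so also copy it (or reproduce the exports) on every node, for example:
[hadoop@host151 bigdata]$ scp /home/hadoop/.bash_profile hadoop@host152:/home/hadoop/
[hadoop@host151 bigdata]$ scp /home/hadoop/.bash_profile hadoop@host153:/home/hadoop/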
4. Initialization and startup
4.1 Format HDFS
[hadoop@host151 bigdata]$ hdfs namenode -format
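If formatting succeeds, the NameNode metadata directory configured in hdfs-site.xml should now contain a current/ subdirectory with a VERSION file and an initial fsimage:
[hadoop@host151 bigdata]$ ls /home/hadoop/bigdata/datas/hadoop/dfs/name/current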
4.2 Start HDFS and YARN
[hadoop@host151 sbin]$ start-dfs.sh
[hadoop@host151 sbin]$ start-yarn.sh
Check the processes:
[hadoop@host151 sbin]$ jps
6178 NameNode
8844 ResourceManager
12477 Jps
6367 SecondaryNameNode
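On the worker nodes listed in slaves, jps should instead show a DataNode and a NodeManager; you can check remotely (assuming jps is on the remote PATH):
[hadoop@host151 sbin]$ ssh host152 jps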
HDFS and YARN can also be started in one step:
[hadoop@host151 sbin]$ start-all.sh
Run the MapReduce Pi example:
[hadoop@host151 mapreduce]$ hadoop jar /home/hadoop/bigdata/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.8.4.jar pi 5 10
Result: Estimated value of Pi is 3.28000000000000000000
Web UI addresses for HDFS and YARN:
HDFS NameNode UI: http://192.168.206.150:50070/
YARN ResourceManager UI: http://192.168.206.150:8088/
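Cluster health can also be checked from the command line; the report should list each live DataNode, and the node list each running NodeManager:
[hadoop@host151 sbin]$ hdfs dfsadmin -report
[hadoop@host151 sbin]$ yarn node -list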
5. Testing
5.1 Common HDFS commands
Create directories:
[hadoop@host151 sbin]$ hadoop fs -mkdir /input
[hadoop@host151 sbin]$ hadoop fs -mkdir /output
List files:
[hadoop@host151 sbin]$ hadoop fs -ls /
Found 2 items
drwxr-xr-x - root supergroup 0 2020-01-24 16:02 /input
drwxr-xr-x - root supergroup 0 2020-01-24 16:02 /output
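If hello.txt does not exist locally yet, create it first; the contents below match the cat output shown later in this section:
[hadoop@host151 bigdata]$ cat > hello.txt <<EOF
hello
hello
my name is job
windows
spark
EOF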
Upload a file:
[hadoop@host151 bigdata]$ hadoop fs -put hello.txt /input
View the file contents:
[hadoop@host151 bigdata]$ hadoop fs -cat /input/hello.txt
hello
hello
my name is job
windows
spark
5.2 Run a wordcount test on hello.txt
Run wordcount:
[hadoop@host151 mapreduce]$ hadoop jar /home/hadoop/bigdata/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.8.4.jar wordcount /input/hello.txt /output/wordcounttest
List the output files, then view the results sorted by count in descending order:
[hadoop@host151 mapreduce]$ hadoop fs -ls /output/wordcounttest
Found 2 items
-rw-r--r-- 2 root supergroup 0 2020-01-24 16:13 /output/wordcounttest/_SUCCESS
-rw-r--r-- 2 root supergroup 49 2020-01-24 16:13 /output/wordcounttest/part-r-00000
[hadoop@host151 mapreduce]$ hadoop fs -cat /output/wordcounttest/part-r-00000 | sort -k2 -nr | head
hello 2
windows 1
spark 1
name 1
my 1
job 1
is 1
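Note that MapReduce refuses to write to an existing output directory, so remove it before rerunning the job:
[hadoop@host151 mapreduce]$ hadoop fs -rm -r /output/wordcounttest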