The previous post covered installing CentOS 7. Now let's walk through installing Hadoop.
Download the 2.6 release from hadoop.apache.org.
Install the JDK, ssh, rsync and wget:
>yum -y install java-1.8.0-openjdk-devel
>yum -y install openssh openssh-clients openssh-server
>yum -y install rsync
>yum -y install wget
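A quick check that everything installed correctly before moving on (these commands just print version information):
>java -version
>ssh -V
>rsync --version
>wget --version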
>wget http://apache.fayea.com/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
>tar xzvf hadoop-2.6.0.tar.gz
>mv hadoop-2.6.0 /usr/local/
Set up passwordless SSH login.
Now check that you can ssh to the localhost without a passphrase:
$ ssh localhost
If you cannot ssh to localhost without a passphrase, execute the following commands:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
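Because the slave machines in this post are created later by cloning this VM, the key generated above will already be present in authorized_keys on every node, so passwordless SSH from the master to the slaves should work automatically. If you build the slaves separately instead, a minimal sketch for pushing the key (assuming the slave hostnames from the /etc/hosts section below are already resolvable):
$ chmod 0600 ~/.ssh/authorized_keys
$ ssh-copy-id Slave1.Hadoop
$ ssh-copy-id Slave2.Hadoop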
CentOS 7.0 uses firewalld as its default firewall. For this test cluster the simplest approach is to stop it and disable it at boot, so the Hadoop daemons on the different nodes can reach each other:
firewalld:
systemctl start firewalld.service      # start firewalld
systemctl stop firewalld.service       # stop firewalld
systemctl disable firewalld.service    # disable firewalld at boot
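If you would rather keep firewalld running, an alternative is to open the required ports instead of disabling the firewall. A sketch covering only the main ports used in this post (the other ports configured below would need the same treatment):
firewall-cmd --permanent --add-port=9000/tcp     # NameNode RPC (fs.defaultFS)
firewall-cmd --permanent --add-port=50070/tcp    # NameNode web UI
firewall-cmd --permanent --add-port=8088/tcp     # ResourceManager web UI
firewall-cmd --reload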
Set up host name resolution (static entries in /etc/hosts rather than real DNS):
>vi /etc/hosts
192.168.1.147 Master.Hadoop
192.168.1.148 Slave1.Hadoop
192.168.1.149 Slave2.Hadoop
192.168.1.150 Slave3.Hadoop
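A quick way to confirm the entries are picked up:
>getent hosts Master.Hadoop
>ping -c 1 Slave1.Hadoop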
>mv /usr/local/hadoop-2.6.0 /usr/local/hadoop
>chown -R hadoop:hadoop /usr/local/hadoop
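The chown above assumes a hadoop user and group already exist on the system; if they do not, create them first (a minimal sketch):
>useradd hadoop
>passwd hadoop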
>cd /usr/local/hadoop
>vi etc/hadoop/hadoop-env.sh
# set to the root of your Java installation
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.45-30.b13.el7_1.x86_64/jre
# Assuming your installation directory is /usr/local/hadoop
export HADOOP_PREFIX=/usr/local/hadoop
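The exact OpenJDK directory name varies with the installed build, so the JAVA_HOME value above is an example; one way to find the right path on your own machine:
>readlink -f /usr/bin/java
This prints something like /usr/lib/jvm/java-1.8.0-openjdk-&lt;build&gt;/jre/bin/java; drop the trailing /bin/java and use the rest as JAVA_HOME.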
>bin/hadoop
Running bin/hadoop with no arguments should print the usage message for the hadoop script, which confirms the installation and JAVA_HOME setting work.
Configure the cluster
etc/hadoop/core-site.xml:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/hadoop/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://Master.Hadoop:9000</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>4096</value>
    </property>
</configuration>
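Note that hadoop.tmp.dir points to /usr/hadoop/tmp, which lies outside the /usr/local/hadoop install tree and is not created automatically. Assuming you keep this path, create it and make it writable by the hadoop user before starting anything:
>mkdir -p /usr/hadoop/tmp
>chown -R hadoop:hadoop /usr/hadoop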
etc/hadoop/hdfs-site.xml:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///usr/hadoop/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///usr/hadoop/dfs/data</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.nameservices</name>
        <value>hadoop-cluster1</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>Master.Hadoop:50090</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
</configuration>
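As with hadoop.tmp.dir, the NameNode and DataNode directories configured above are not created automatically. Create them now (the slaves are cloned from this VM later, so doing it here covers them as well):
>mkdir -p /usr/hadoop/dfs/name /usr/hadoop/dfs/data
>chown -R hadoop:hadoop /usr/hadoop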
If you set dfs.namenode.datanode.registration.ip-hostname-check to false, the NameNode skips this check, which can be useful when the cluster runs inside an AWS VPC and you do not have proper reverse DNS.
Another use of this setting is to decommission a DataNode by listing it in the hosts.deny file.
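For reference, a sketch of how that property would be added to etc/hadoop/hdfs-site.xml (only needed in environments like the one described above; the default is true):
    <property>
        <name>dfs.namenode.datanode.registration.ip-hostname-check</name>
        <value>false</value>
    </property>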
etc/hadoop/mapred-site.xml (Hadoop 2.6 ships only etc/hadoop/mapred-site.xml.template, so copy it to mapred-site.xml before editing):
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
        <final>true</final>
    </property>
    <property>
        <name>mapreduce.jobtracker.http.address</name>
        <value>Master.Hadoop:50030</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>Master.Hadoop:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>Master.Hadoop:19888</value>
    </property>
    <property>
        <name>mapred.job.tracker</name>
        <value>http://Master.Hadoop:9001</value>
    </property>
</configuration>
etc/hadoop/yarn-site.xml:
<?xml version="1.0"?>
<configuration>
    <!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>Master.Hadoop</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>Master.Hadoop:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>Master.Hadoop:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>Master.Hadoop:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>Master.Hadoop:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>Master.Hadoop:8088</value>
    </property>
</configuration>
>vi /usr/local/hadoop/etc/hadoop/slaves
Slave1.Hadoop
Slave2.Hadoop
>shutdown now
Make three copies of this VM, for Master.Hadoop, Slave1.Hadoop and Slave2.Hadoop respectively.
Use nmtui on each clone to change its IP address.
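Each clone also needs its own hostname matching the /etc/hosts entries; nmtui can set this too, or use hostnamectl on each node with its respective name, for example:
>hostnamectl set-hostname Master.Hadoop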
Then log in to Master.Hadoop.
Format and start the NameNode and the other master-side daemons (run the sbin commands from /usr/local/hadoop):
>/usr/local/hadoop/bin/hdfs namenode -format
>sbin/hadoop-daemon.sh --config etc/hadoop --script hdfs start namenode
>sbin/yarn-daemon.sh --config etc/hadoop start resourcemanager
>sbin/yarn-daemon.sh start proxyserver --config etc/hadoop
>sbin/mr-jobhistory-daemon.sh start historyserver --config etc/hadoop/
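A quick sanity check after starting these daemons (jps ships with the openjdk-devel package installed earlier); the output should include at least NameNode, ResourceManager and JobHistoryServer:
>jps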
http://master.hadoop:50070 (NameNode web UI)
http://master.hadoop:8088 (ResourceManager web UI)
Start the DataNodes
Log in to each slave (DataNode) host and run, from /usr/local/hadoop:
>sbin/hadoop-daemon.sh --config etc/hadoop --script hdfs start datanode
>sbin/yarn-daemon.sh --config etc/hadoop start nodemanager
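Back on Master.Hadoop, you can confirm that the slaves have registered; both commands ship with the Hadoop distribution:
>bin/hdfs dfsadmin -report     # should list the live DataNodes
>bin/yarn node -list           # should list the NodeManagers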