hadoop cluster tips

Running the textbook's standalone test program as user `user` fails:

org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://Master:9000/user/root/input

Analysis: the input path does not exist.

Fix: create the input directory in the distributed environment; apparently this has to be done as the root user:

  hdfs dfs -mkdir -p /user/hadoop

  hdfs dfs -mkdir input

  hdfs dfs -put ./*.xml input

hadoop jar ...........
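For reference, a fuller sketch of the run (the example jar path and the grep pattern are assumptions based on a stock Hadoop distribution, not taken from the session above):

  hdfs dfs -mkdir -p /user/root/input    # the path the failing job was actually looking for

  hdfs dfs -put etc/hadoop/*.xml /user/root/input

  hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar grep input output 'dfs[a-z.]+'

  hdfs dfs -cat output/*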


hdfs://localhost:9000/user/local/urls is a path on the Hadoop distributed filesystem (HDFS); you can check it with hdfs dfs -ls /user/local/urls, provided the file exists.

hadoop fs -ls /user/hadoop/output #list the files and directories under the given path; /user/hadoop/output is a directory on HDFS, not a local one

How to view HDFS directories and file contents

[user@node-17h hadoop]$ bin/hadoop fs -ls /user/root/output  

Found 2 items

-rw-r--r--   1 root supergroup          0 2019-03-22 03:16 /user/root/output/_SUCCESS

-rw-r--r--   1 root supergroup         77 2019-03-22 03:16 /user/root/output/part-r-00000

[user@node-17h hadoop]$ bin/hadoop fs -cat /user/root/output/*

1     dfsadmin

1     dfs.replication

1     dfs.namenode.name.dir

1     dfs.datanode.data.dir

HDFS web UI

Visit http://node-17h:50070/               (node-17h is the hostname)

Click "Browse the file system" to look up files; a file can be downloaded and viewed via its download link.

Paths generally start from /.

 

Using a plugin: Hadoop-Eclipse-Plugin

This approach requires Eclipse; for installing and using the plugin, see the blog post 使用Eclipse编译运行MapReduce程序_Hadoop2.6.0_Ubuntu/CentOS.

Eclipse only supports simple HDFS operations, such as viewing, deleting, uploading, and downloading.

Some operations on HDFS files

Viewing HDFS contents with the hadoop command

[user@node-17h hadoop]$ bin/hadoop fs -ls /user/

Found 2 items

drwxr-xr-x   - root supergroup          0 2019-03-22 03:11 /user/hadoop

drwxr-xr-x   - root supergroup          0 2019-03-22 03:22 /user/root

[user@node-17h hadoop]$ sudo bin/hdfs dfs -ls /

Found 1 items

-rw-r--r--   1 root supergroup       1148 2019-03-23 05:09 /hdfs-site.xml

Creating a directory in HDFS with the hdfs command

[user@node-17h hadoop]$ bin/hdfs dfs -mkdir /usertest

mkdir: Permission denied: user=user, access=WRITE, inode="/":root:supergroup:drwxr-xr-x

[user@node-17h hadoop]$ sudo bin/hdfs dfs -mkdir /usertest

[user@node-17h hadoop]$ bin/hadoop fs -ls /

Found 2 items

drwxr-xr-x   - root supergroup          0 2019-03-22 03:11 /user

drwxr-xr-x   - root supergroup          0 2019-03-22 06:37 /usertest

 

 

Error when formatting the HDFS filesystem:

Format aborted in /home/hadoop/dfs/name

/home/hadoop/dfs/name is the local path configured by dfs.name.dir; delete that directory and the format will then succeed.

Do not format more than once.

If you really have to reformat, first delete the name and data folders generated at the locations specified by dfs.namenode.name.dir and dfs.datanode.data.dir in hdfs-site.xml, then run the format again.
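A sketch of that procedure (the rm paths are placeholders for whatever the two getconf calls print):

  hdfs getconf -confKey dfs.namenode.name.dir

  hdfs getconf -confKey dfs.datanode.data.dir

  rm -rf <name.dir> <data.dir>    # placeholders: substitute the paths printed above

  hdfs namenode -format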

 

Install software under /home/user.

Don't put everything into /root…

 

NameNode fails to start:

Cannot create directory /home/user/bigdata/hadoop/tmp/dfs/name/current

Fix: make the directory writable:

[user@node-17h hadoop]$ sudo chmod -R a+w /home/user/bigdata/hadoop/tmp/dfs/
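An alternative sketch (assuming the directory ended up root-owned after an earlier sudo run) is to hand it back to the user rather than opening it world-writable:

[user@node-17h hadoop]$ sudo chown -R user:user /home/user/bigdata/hadoop/tmp/dfs/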

 

 

Hadoop 2.5.2 study and practice notes (5): common HDFS shell command-line operations

Appendix: HDFS shell guide documentation:

http://hadoop.apache.org/docs/r2.5.2/hadoop-project-dist/hadoop-common/FileSystemShell.html

 

After HDFS is started, running the hadoop fs command with no arguments prints the usage of the common HDFS commands:

 

[hadoop@localhost hadoop-2.5.2]$ hadoop fs

Usage: hadoop fs [generic options]

    [-appendToFile <localsrc> ... <dst>]

    [-cat [-ignoreCrc] <src> ...]

    [-checksum <src> ...]

    [-chgrp [-R] GROUP PATH...]

    [-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]

    [-chown [-R] [OWNER][:[GROUP]] PATH...]

    [-copyFromLocal [-f] [-p] <localsrc> ... <dst>]

    [-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]

    [-count [-q] <path> ...]

    [-cp [-f] [-p | -p[topax]] <src> ... <dst>]

    [-createSnapshot <snapshotDir> [<snapshotName>]]

    [-deleteSnapshot <snapshotDir> <snapshotName>]

    [-df [-h] [<path> ...]]

    [-du [-s] [-h] <path> ...]

    [-expunge]

    [-get [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]

    [-getfacl [-R] <path>]

    [-getfattr [-R] {-n name | -d} [-e en] <path>]

    [-getmerge [-nl] <src> <localdst>]

    [-help [cmd ...]]

    [-ls [-d] [-h] [-R] [<path> ...]]

    [-mkdir [-p] <path> ...]

    [-moveFromLocal <localsrc> ... <dst>]

    [-moveToLocal <src> <localdst>]

    [-mv <src> ... <dst>]

    [-put [-f] [-p] <localsrc> ... <dst>]

    [-renameSnapshot <snapshotDir> <oldName> <newName>]

    [-rm [-f] [-r|-R] [-skipTrash] <src> ...]

    [-rmdir [--ignore-fail-on-non-empty] <dir> ...]

    [-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]

    [-setfattr {-n name [-v value] | -x name} <path>]

    [-setrep [-R] [-w] <rep> <path> ...]

    [-stat [format] <path> ...]

    [-tail [-f] <file>]

    [-test -[defsz] <path>]

    [-text [-ignoreCrc] <src> ...]

    [-touchz <path> ...]

    [-usage [cmd ...]]

 

Generic options supported are

-conf <configuration file>     specify an application configuration file

-D <property=value>            use value for given property

-fs <local|namenode:port>      specify a namenode

-jt <local|jobtracker:port>    specify a job tracker

-files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster

-libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.

-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.

 

The general command line syntax is

bin/hadoop command [genericOptions] [commandOptions]

 

 

 

>Help commands

  • usage

  Show how a command is used, e.g. the usage of ls:

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -usage ls

Usage: hadoop fs [generic options] -ls [-d] [-h] [-R] [<path> ...]

  • help

  Show detailed help for a command, e.g. for ls:

 

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -help ls

-ls [-d] [-h] [-R] [<path> ...] :

  List the contents that match the specified file pattern. If path is not

  specified, the contents of /user/<currentUser> will be listed. Directory entries

  are of the form:

      permissions - userId groupId sizeOfDirectory(in bytes)

  modificationDate(yyyy-MM-dd HH:mm) directoryName

 

  and file entries are of the form:

      permissions numberOfReplicas userId groupId sizeOfFile(in bytes)

  modificationDate(yyyy-MM-dd HH:mm) fileName

                                                                                

  -d  Directories are listed as plain files.                                     

  -h  Formats the sizes of files in a human-readable fashion rather than a number

      of bytes.                                                                 

  -R  Recursively list the contents of directories.

 

 

>Viewing commands

  • ls

  List files or directories. In the example below, hdfs://localhost:9000 is the value configured for fs.defaultFS, so hdfs://localhost:9000/ denotes the root of the HDFS filesystem; when HDFS is the filesystem in use, it can be abbreviated to /.

 

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -ls  hdfs://localhost:9000/

Found 3 items

drwxr-xr-x   - hadoop supergroup          0 2015-03-27 19:19 hdfs://localhost:9000/input

-rw-r--r--   1 hadoop supergroup         14 2015-03-31 07:17 hdfs://localhost:9000/input1.txt

drwxr-xr-x   - hadoop supergroup          0 2015-03-27 19:16 hdfs://localhost:9000/output

 

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -ls /

Found 3 items

drwxr-xr-x   - hadoop supergroup          0 2015-03-27 19:19 /input

-rw-r--r--   1 hadoop supergroup         14 2015-03-31 07:17 /input1.txt

drwxr-xr-x   - hadoop supergroup          0 2015-03-27 19:16 /output

 

   Option -R: also list the files in subdirectories, e.g.:

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -ls -R /

drwxr-xr-x   - hadoop supergroup          0 2015-03-27 19:19 /input

-rw-r--r--   1 hadoop supergroup         14 2015-03-27 19:19 /input/input1.txt         --files in subdirectories are listed too

-rw-r--r--   1 hadoop supergroup         32 2015-03-27 19:19 /input/input2.txt

-rw-r--r--   1 hadoop supergroup         14 2015-03-31 07:17 /input1.txt

drwxr-xr-x   - hadoop supergroup          0 2015-03-27 19:16 /output

  • cat

  Print file contents

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -cat /input1.txt

hello hadoop!

hello hadoop!

  • text

  Output the given file as text; supported formats are zip, TextRecordInputStream, and Avro. For a plain text file it is equivalent to cat. Example:

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -text /input1.txt

hello hadoop!

  • tail

  Print the last 1KB of a file

  Option -f: keep printing data as it is appended to the file
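  For example (a sketch; not from the original session, output assumes the two-line file shown earlier):

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -tail /input1.txt

hello hadoop!

hello hadoop!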

  • checksum

  Print checksum information for a file. Because it must contact the datanodes storing each block of the file, running it against many files can be slow

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -checksum /input.zip

/input.zip        MD5-of-0MD5-of-0CRC32        00000000000000000000000070bc8f4b72a86921468bf8e8441dce51

 

>File and directory commands

  • touchz

  Create a zero-length file; if a non-empty file with the given name already exists, an error is returned

 

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -ls /

Found 3 items

drwxr-xr-x   - hadoop supergroup          0 2015-03-27 19:19 /input

-rw-r--r--   1 hadoop supergroup        184 2015-03-31 08:14 /input1.zip

drwxr-xr-x   - hadoop supergroup          0 2015-04-02 08:34 /output

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -touchz /input1.zip

touchz: `/input1.zip': Not a zero-length file        --error because the existing file is non-empty

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -touchz /input.zip

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -ls /

Found 4 items

drwxr-xr-x   - hadoop supergroup          0 2015-03-27 19:19 /input

-rw-r--r--   1 hadoop supergroup          0 2015-04-02 08:43 /input.zip   --created successfully

-rw-r--r--   1 hadoop supergroup        184 2015-03-31 08:14 /input1.zip

drwxr-xr-x   - hadoop supergroup          0 2015-04-02 08:34 /output

 

  • appendToFile

  Append content to an existing file, e.g.:

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -text /input1.txt

hello hadoop!

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -appendToFile ~/Desktop/input1.txt /input1.txt

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -text /input1.txt

hello hadoop!

hello hadoop!   --file content after the append

  • put

  Upload files from the local filesystem to HDFS

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -put ~/Desktop/input1.txt /

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -text /input1.txt     --view the uploaded file's content

hello hadoop!

  Option -f: overwrite the destination file if it already exists

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -put ~/Desktop/input1.txt /

put: `/input1.txt': File exists   --error because the file already exists

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -put -f ~/Desktop/input1.txt /

[hadoop@localhost hadoop-2.5.2]$     --no error this time, thanks to -f

  Option -p: preserve the source file's access and modification times, owner and group, and permissions

 

[hadoop@localhost hadoop-2.5.2]$ ll ~/input1.txt

-rw-r--r--. 1 hadoop hadoops 28 Mar 31 08:59 /home/hadoop/input1.txt   --local file attributes

[hadoop@localhost hadoop-2.5.2]$ chmod 777 ~/input1.txt    --change permissions to rwxrwxrwx

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -put ~/input1.txt /

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -ls /input1.txt

-rw-r--r--   1 hadoop supergroup         28 2015-04-02 05:19 /input1.txt   --attributes after upload without -p

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -put -f -p ~/input1.txt /

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -ls /input1.txt

-rwxrwxrwx   1 hadoop hadoops         28 2015-03-31 08:59 /input1.txt    --attributes after upload with -p

 

  • get

  Download a file from HDFS to the local filesystem; unlike put, there is no option to overwrite an existing local file

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -get /input1.txt ~

[hadoop@localhost hadoop-2.5.2]$ cat ~/input1.txt  --view the downloaded local file

hello hadoop!

hello hadoop!

  • getmerge

  Merge the files under a given HDFS source directory into a single file and download it to the local filesystem; the source files are kept

 

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -text /input/input1.txt

hello hadoop!    --content of input1.txt

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -text /input/input2.txt

welcome to the world of hadoop!    --content of input2.txt

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -getmerge /input/ ~/merge.txt

[hadoop@localhost hadoop-2.5.2]$ cat ~/merge.txt

hello hadoop!        --content of the merged local file

welcome to the world of hadoop!

 

  Option -nl: add a newline after each file

 

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -getmerge -nl /input/ ~/merge.txt

[hadoop@localhost hadoop-2.5.2]$ cat ~/merge.txt

hello hadoop!

         --newline added after input1.txt

welcome to the world of hadoop!

         --newline added after input2.txt

[hadoop@localhost hadoop-2.5.2]$

 

  • copyFromLocal

  Upload from the local filesystem to HDFS; identical to the put command (a usage sketch for this group of commands follows below)

  • copyToLocal

  Download from HDFS to the local filesystem; identical to the get command

  • moveFromLocal

  Same as put, except that the local source file is deleted after a successful upload

  • moveToLocal

  Not implemented yet
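  A quick sketch of the first three (the file names are hypothetical, not from the original session):

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -copyFromLocal ~/a.txt /a.txt        --same effect as put

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -copyToLocal /a.txt ~/a_copy.txt     --same effect as get

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -moveFromLocal ~/b.txt /b.txt        --~/b.txt is removed locally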

  • mv

  Like the Linux mv command: move or rename files

 

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -ls /

Found 5 items

drwxr-xr-x   - hadoop supergroup          0 2015-03-27 19:19 /input

-rw-r--r--   1 hadoop supergroup        184 2015-03-31 08:14 /input.zip

-rwxrwxrwx   1 hadoop hadoops            28 2015-03-31 08:59 /input1.txt

drwxr-xr-x   - hadoop supergroup          0 2015-03-27 19:16 /output

drwxr-xr-x   - hadoop supergroup          0 2015-04-02 07:10 /text

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -mv /input.zip /input1.zip

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -ls /

Found 5 items

drwxr-xr-x   - hadoop supergroup          0 2015-03-27 19:19 /input

-rwxrwxrwx   1 hadoop hadoops            28 2015-03-31 08:59 /input1.txt

-rw-r--r--   1 hadoop supergroup        184 2015-03-31 08:14 /input1.zip  --renamed

drwxr-xr-x   - hadoop supergroup          0 2015-03-27 19:16 /output

drwxr-xr-x   - hadoop supergroup          0 2015-04-02 07:10 /text

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -mv /input1.zip /text/

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -ls -R /

drwxr-xr-x   - hadoop supergroup          0 2015-03-27 19:19 /input

-rw-r--r--   1 hadoop supergroup         14 2015-03-27 19:19 /input/input1.txt

-rw-r--r--   1 hadoop supergroup         32 2015-03-27 19:19 /input/input2.txt

-rwxrwxrwx   1 hadoop hadoops            28 2015-03-31 08:59 /input1.txt

drwxr-xr-x   - hadoop supergroup          0 2015-03-27 19:16 /output

drwxr-xr-x   - hadoop supergroup          0 2015-04-02 07:12 /text

-rw-r--r--   1 hadoop supergroup        184 2015-03-31 08:14 /text/input1.zip   --file moved

 

  • cp

  Copy files

 

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -ls -R /

drwxr-xr-x   - hadoop supergroup          0 2015-03-27 19:19 /input

-rw-r--r--   1 hadoop supergroup         14 2015-03-27 19:19 /input/input1.txt

-rw-r--r--   1 hadoop supergroup         32 2015-03-27 19:19 /input/input2.txt

-rwxrwxrwx   1 hadoop hadoops            28 2015-03-31 08:59 /input1.txt

-rw-r--r--   1 hadoop supergroup        184 2015-03-31 08:14 /input1.zip

drwxr-xr-x   - hadoop supergroup          0 2015-03-27 19:16 /output

drwxr-xr-x   - hadoop supergroup          0 2015-04-02 07:29 /text

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -cp /input1.txt /input.txt

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -ls -R /

drwxr-xr-x   - hadoop supergroup          0 2015-03-27 19:19 /input

-rw-r--r--   1 hadoop supergroup         14 2015-03-27 19:19 /input/input1.txt

-rw-r--r--   1 hadoop supergroup         32 2015-03-27 19:19 /input/input2.txt

-rw-r--r--   1 hadoop supergroup         28 2015-04-02 07:31 /input.txt   --the newly copied file

-rwxrwxrwx   1 hadoop hadoops            28 2015-03-31 08:59 /input1.txt

-rw-r--r--   1 hadoop supergroup        184 2015-03-31 08:14 /input1.zip

drwxr-xr-x   - hadoop supergroup          0 2015-03-27 19:16 /output

drwxr-xr-x   - hadoop supergroup          0 2015-04-02 07:29 /text

 

  Option -f: overwrite the destination file if it already exists

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -cp /input1.txt /input.txt

cp: `/input.txt': File exists     --error because the file already exists

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -cp -f /input1.txt /input.txt

[hadoop@localhost hadoop-2.5.2]$

  • mkdir

  Create directories

 

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -mkdir /text

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -ls /

Found 5 items

drwxr-xr-x   - hadoop supergroup          0 2015-03-27 19:19 /input

-rw-r--r--   1 hadoop supergroup        184 2015-03-31 08:14 /input.zip

-rw-r--r--   1 hadoop supergroup        210 2015-03-31 07:49 /input1.txt

drwxr-xr-x   - hadoop supergroup          0 2015-03-27 19:16 /output

drwxr-xr-x   - hadoop supergroup          0 2015-03-31 08:23 /text

 

  Option -p: recursively create any missing parent directories

 

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -mkdir /text1/text2

mkdir: `/text1/text2': No such file or directory    --error because the parent directory does not exist

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -mkdir -p /text1/text2

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -ls -R /

drwxr-xr-x   - hadoop supergroup          0 2015-03-27 19:19 /input

-rw-r--r--   1 hadoop supergroup         14 2015-03-27 19:19 /input/input1.txt

-rw-r--r--   1 hadoop supergroup         32 2015-03-27 19:19 /input/input2.txt

-rw-r--r--   1 hadoop supergroup        184 2015-03-31 08:14 /input.zip

-rw-r--r--   1 hadoop supergroup        210 2015-03-31 07:49 /input1.txt

drwxr-xr-x   - hadoop supergroup          0 2015-03-27 19:16 /output

drwxr-xr-x   - hadoop supergroup          0 2015-03-31 08:23 /text

drwxr-xr-x   - hadoop supergroup          0 2015-03-31 08:26 /text1

drwxr-xr-x   - hadoop supergroup          0 2015-03-31 08:26 /text1/text2   --created successfully with -p

 

  • rm

  Delete files

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -rm /input.zip

15/03/31 08:02:32 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.

Deleted /input.zip

  Option -r: delete recursively; removes non-empty directories

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -rm /text

rm: `/text': Is a directory    --error when deleting a directory without -r

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -rm -r /text  --with -r, the directory and the files under it are deleted

15/04/02 08:28:42 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.

Deleted /text

  • rmdir

  Delete empty directories

 

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -ls -R /

drwxr-xr-x   - hadoop supergroup          0 2015-03-27 19:19 /input

-rw-r--r--   1 hadoop supergroup         14 2015-03-27 19:19 /input/input1.txt

-rw-r--r--   1 hadoop supergroup         32 2015-03-27 19:19 /input/input2.txt

-rw-r--r--   1 hadoop supergroup        184 2015-03-31 08:14 /input1.zip

drwxr-xr-x   - hadoop supergroup          0 2015-04-02 08:34 /output

-rwxrwxrwx   1 hadoop hadoops            28 2015-03-31 08:59 /output/input1.txt

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -rmdir /output

rmdir: `/output': Directory is not empty     --cannot delete a non-empty directory

 

  Option --ignore-fail-on-non-empty: suppress the error when the directory is not empty

 

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -rmdir --ignore-fail-on-non-empty /output 

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -ls -R /

drwxr-xr-x   - hadoop supergroup          0 2015-03-27 19:19 /input

-rw-r--r--   1 hadoop supergroup         14 2015-03-27 19:19 /input/input1.txt

-rw-r--r--   1 hadoop supergroup         32 2015-03-27 19:19 /input/input2.txt

-rw-r--r--   1 hadoop supergroup        184 2015-03-31 08:14 /input1.zip

drwxr-xr-x   - hadoop supergroup          0 2015-04-02 08:34 /output    --no error reported, but the directory is not deleted

-rwxrwxrwx   1 hadoop hadoops            28 2015-03-31 08:59 /output/input1.txt

 

  • setrep

  Change the replication factor of a file

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -stat %r /input.zip

1    --original replication factor

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -setrep  2 /input.zip

Replication 2 set: /input.zip

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -stat %r /input.zip

2     --replication factor after the change

  Option -w: wait for the replication adjustment to complete

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -setrep -w 1 /input.zip

Replication 1 set: /input.zip

Waiting for /input.zip ... done

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -stat %r /input.zip

1

  • expunge

  Empty the trash

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -expunge

15/04/03 01:52:46 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.

  • chgrp

  Change the group of a file

 

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -ls -R /

drwxr-xr-x   - hadoop supergroup          0 2015-03-27 19:19 /input

-rw-r--r--   1 hadoop supergroup         14 2015-03-27 19:19 /input/input1.txt

-rw-r--r--   1 hadoop supergroup         32 2015-03-27 19:19 /input/input2.txt

-rw-r--r--   1 hadoop supergroup          0 2015-04-02 08:43 /input.zip

-rw-r--r--   1 hadoop supergroup        184 2015-03-31 08:14 /input1.zip

drwxr-xr-x   - hadoop supergroup          0 2015-04-02 08:34 /output                   --original group

-rwxrwxrwx   1 hadoop hadoops            28 2015-03-31 08:59 /output/input1.txt

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -chgrp test /output

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -ls -R /

drwxr-xr-x   - hadoop supergroup          0 2015-03-27 19:19 /input

-rw-r--r--   1 hadoop supergroup         14 2015-03-27 19:19 /input/input1.txt

-rw-r--r--   1 hadoop supergroup         32 2015-03-27 19:19 /input/input2.txt

-rw-r--r--   1 hadoop supergroup          0 2015-04-02 08:43 /input.zip

-rw-r--r--   1 hadoop supergroup        184 2015-03-31 08:14 /input1.zip

drwxr-xr-x   - hadoop test                0 2015-04-02 08:34 /output                     --group after the change (the test group was never created, yet the command still succeeds)

-rwxrwxrwx   1 hadoop hadoops            28 2015-03-31 08:59 /output/input1.txt    --the group of files inside the directory is unchanged

 

  Option -R: recursive; if the target is a directory, also change the files and directories under it

 

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -chgrp -R testgrp /output

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -ls -R /

drwxr-xr-x   - hadoop supergroup          0 2015-03-27 19:19 /input

-rw-r--r--   1 hadoop supergroup         14 2015-03-27 19:19 /input/input1.txt

-rw-r--r--   1 hadoop supergroup         32 2015-03-27 19:19 /input/input2.txt

-rw-r--r--   1 hadoop supergroup          0 2015-04-02 08:43 /input.zip

-rw-r--r--   1 hadoop supergroup        184 2015-03-31 08:14 /input1.zip

drwxr-xr-x   - hadoop testgrp             0 2015-04-02 08:34 /output            --the directory and the files under it are both changed

-rwxrwxrwx   1 hadoop testgrp            28 2015-03-31 08:59 /output/input1.txt

 

  • chmod

  Change file permissions; the mode syntax is the same as the Linux shell chmod command

 

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -ls -R /

drwxr-xr-x   - hadoop supergroup          0 2015-03-27 19:19 /input

-rw-r--r--   1 hadoop supergroup         14 2015-03-27 19:19 /input/input1.txt

-rw-r--r--   1 hadoop supergroup         32 2015-03-27 19:19 /input/input2.txt

-rw-r--r--   1 hadoop supergroup          0 2015-04-02 08:43 /input.zip

-rw-r--r--   1 hadoop supergroup        184 2015-03-31 08:14 /input1.zip

drwxr-xr-x   - hadoop supergroup          0 2015-04-02 08:34 /output    --original permissions

-rwxrwxrwx   1 hadoop supergroup         28 2015-03-31 08:59 /output/input1.txt

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -chmod 754 /output

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -ls -R /

drwxr-xr-x   - hadoop supergroup          0 2015-03-27 19:19 /input

-rw-r--r--   1 hadoop supergroup         14 2015-03-27 19:19 /input/input1.txt

-rw-r--r--   1 hadoop supergroup         32 2015-03-27 19:19 /input/input2.txt

-rw-r--r--   1 hadoop supergroup          0 2015-04-02 08:43 /input.zip

-rw-r--r--   1 hadoop supergroup        184 2015-03-31 08:14 /input1.zip

drwxr-xr--   - hadoop supergroup          0 2015-04-02 08:34 /output      --permissions after the change

-rwxrwxrwx   1 hadoop supergroup         28 2015-03-31 08:59 /output/input1.txt  --permissions of files inside the directory unchanged

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -chmod -R 775 /output

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -ls -R /

drwxr-xr-x   - hadoop supergroup          0 2015-03-27 19:19 /input

-rw-r--r--   1 hadoop supergroup         14 2015-03-27 19:19 /input/input1.txt

-rw-r--r--   1 hadoop supergroup         32 2015-03-27 19:19 /input/input2.txt

-rw-r--r--   1 hadoop supergroup          0 2015-04-02 08:43 /input.zip

-rw-r--r--   1 hadoop supergroup        184 2015-03-31 08:14 /input1.zip

drwxrwxr-x   - hadoop supergroup          0 2015-04-02 08:34 /output         --the directory and the files under it are both changed

-rwxrwxr-x   1 hadoop supergroup         28 2015-03-31 08:59 /output/input1.txt

 

  • chown

  Change the owner and/or group of a file

 

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -ls -R /

drwxr-xr-x   - hadoop supergroup          0 2015-03-27 19:19 /input

-rw-r--r--   1 hadoop supergroup         14 2015-03-27 19:19 /input/input1.txt

-rw-r--r--   1 hadoop supergroup         32 2015-03-27 19:19 /input/input2.txt

-rw-r--r--   1 hadoop supergroup          0 2015-04-02 08:43 /input.zip

-rw-r--r--   1 hadoop supergroup        184 2015-03-31 08:14 /input1.zip

drwxrwxr-x   - hadoop supergroup          0 2015-04-02 08:34 /output       --original owner and group

-rwxrwxr-x   1 hadoop supergroup         28 2015-03-31 08:59 /output/input1.txt

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -chown test /output

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -ls -R /

drwxr-xr-x   - hadoop supergroup          0 2015-03-27 19:19 /input

-rw-r--r--   1 hadoop supergroup         14 2015-03-27 19:19 /input/input1.txt

-rw-r--r--   1 hadoop supergroup         32 2015-03-27 19:19 /input/input2.txt

-rw-r--r--   1 hadoop supergroup          0 2015-04-02 08:43 /input.zip

-rw-r--r--   1 hadoop supergroup        184 2015-03-31 08:14 /input1.zip

drwxrwxr-x   - test   supergroup          0 2015-04-02 08:34 /output           --owner after the change (the test user was never created, yet the command still succeeds)

-rwxrwxr-x   1 hadoop supergroup         28 2015-03-31 08:59 /output/input1.txt    --the owner of files inside the directory is unchanged

 

  Option -R: recursive; if the target is a directory, also change the files and directories under it

 

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -chown -R testown:testgrp /output

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -ls -R /

drwxr-xr-x   - hadoop  supergroup          0 2015-03-27 19:19 /input

-rw-r--r--   1 hadoop  supergroup         14 2015-03-27 19:19 /input/input1.txt

-rw-r--r--   1 hadoop  supergroup         32 2015-03-27 19:19 /input/input2.txt

-rw-r--r--   1 hadoop  supergroup          0 2015-04-02 08:43 /input.zip

-rw-r--r--   1 hadoop  supergroup        184 2015-03-31 08:14 /input1.zip

drwxrwxr-x   - testown testgrp             0 2015-04-02 08:34 /output       --the directory and the files under it are both changed

-rwxrwxr-x   1 testown testgrp            28 2015-03-31 08:59 /output/input1.txt

 

  • getfacl

  Display Access Control Lists (ACLs)

 

[hadoop@localhost bin]$ hadoop fs -getfacl /input.zip

# file: /input.zip

# owner: hadoop

# group: supergroup

user::rw-

group::r--

other::r--

 

  Option -R: display recursively

 

[hadoop@localhost bin]$ hadoop fs -getfacl -R /input

# file: /input

# owner: hadoop

# group: supergroup

user::rwx

group::r-x

other::r-x

 

# file: /input/input1.txt

# owner: hadoop

# group: supergroup

user::rw-

group::r--

other::r--

 

# file: /input/input2.txt

# owner: hadoop

# group: supergroup

user::rw-

group::r--

other::r--

 

  • setfacl

  Set Access Control Lists. ACLs are disabled by default, so running the command straight away fails:

[hadoop@localhost bin]$ hadoop fs -setfacl -b /output/input1.txt

setfacl: The ACL operation has been rejected.  Support for ACLs has been disabled by setting dfs.namenode.acls.enabled to false.

  Enable ACLs by configuring hdfs-site.xml:

[hadoop@localhost hadoop-2.5.2]$ vi etc/hadoop/hdfs-site.xml

<property>

    <name>dfs.namenode.acls.enabled</name>

    <value>true</value>

</property>
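  The new setting only takes effect after HDFS is restarted (a sketch, assuming the stock sbin scripts):

[hadoop@localhost hadoop-2.5.2]$ sbin/stop-dfs.sh

[hadoop@localhost hadoop-2.5.2]$ sbin/start-dfs.sh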

  Option -m: modify ACL entries

 

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -getfacl /output/input1.txt

# file: /output/input1.txt

# owner: testown

# group: testgrp

user::rwx

group::rwx

other::r-x

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -setfacl -m user::rw-,user:hadoop:rw-,group::r--,other::r-- /output/input1.txt

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -getfacl /output/input1.txt

# file: /output/input1.txt

# owner: testown

# group: testgrp

user::rw-

user:hadoop:rw-

group::r--

mask::rw-

other::r--

 

  Option -x: remove the specified ACL entries

 

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -setfacl -m user::rw-,user:hadoop:rw-,group::r--,other::r-- /output/input1.txt

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -getfacl /output/input1.txt

# file: /output/input1.txt

# owner: testown

# group: testgrp

user::rw-

user:hadoop:rw-

group::r--

mask::rw-

other::r--

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -setfacl -x user:hadoop /output/input1.txt

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -getfacl /output/input1.txt

# file: /output/input1.txt

# owner: testown

# group: testgrp

user::rw-

group::r--

mask::r--

other::r--

 

  The following options were not tested (a usage sketch follows below):

  Option -b: keep only the base ACL entries (owner, group, others) and remove all other entries.

  Option -k: remove the default ACL
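  An untested sketch, based only on the option descriptions above (default ACLs exist only on directories, hence -k targets /output):

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -setfacl -b /output/input1.txt

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -setfacl -k /output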

  • setfattr

  Set the name and value of an extended attribute

  Option -n: attribute name      Option -v: attribute value

 

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -getfattr -d /input.zip

# file: /input.zip

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -setfattr -n user.web -v www.baidu.com /input.zip

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -getfattr -d /input.zip

# file: /input.zip

user.web="www.baidu.com"

  Option -x: remove an extended attribute

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -getfattr -d /input.zip

# file: /input.zip

user.web="www.baidu.com"

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -setfattr -x user.web /input.zip

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -getfattr -d /input.zip

# file: /input.zip

  • getfattr

  Display the names and values of extended attributes

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -getfattr -d /input.zip

# file: /input.zip

user.web="www.baidu.com"

user.web2="www.google.com"

  Option -n: display the value of the named attribute

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -getfattr -n user.web /input.zip

# file: /input.zip

user.web="www.baidu.com"

 

>Statistics commands

  • count

Display DIR_COUNT, FILE_COUNT, CONTENT_SIZE, and FILE_NAME for the given file or directory: the number of subdirectories (including the directory itself if the given path is a directory), the number of files, the number of bytes used, and the file or directory name.

 

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -ls -R /

drwxr-xr-x   - hadoop supergroup          0 2015-03-27 19:19 /input

-rw-r--r--   1 hadoop supergroup         14 2015-03-27 19:19 /input/input1.txt

-rw-r--r--   1 hadoop supergroup         32 2015-03-27 19:19 /input/input2.txt

-rw-r--r--   1 hadoop supergroup         28 2015-04-02 07:32 /input.txt

-rwxrwxrwx   1 hadoop hadoops            28 2015-03-31 08:59 /input1.txt

-rw-r--r--   1 hadoop supergroup        184 2015-03-31 08:14 /input1.zip

drwxr-xr-x   - hadoop supergroup          0 2015-03-27 19:16 /output

drwxr-xr-x   - hadoop supergroup          0 2015-04-02 07:29 /text

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -count /

           4            5                286 /

 

  Option -q: display quota information (on a shared cluster, setting a quota on a directory limits how much users can write into it, preventing someone from accidentally using up all the space). The quota columns are QUOTA, REMAINING_QUOTA, SPACE_QUOTA, and REMAINING_SPACE_QUOTA: the total allowed number of files and directories under the path, the remaining allowance, the space quota, and the remaining space.

  Formulas:

  QUOTA - (DIR_COUNT + FILE_COUNT) = REMAINING_QUOTA;

  SPACE_QUOTA - CONTENT_SIZE = REMAINING_SPACE_QUOTA.

  none and inf mean the quota is not configured.

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -count -q /

9223372036854775807 9223372036854775798            none             inf            4            5                286 /
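  Checking the output above against the first formula: 9223372036854775807 - (4 + 5) = 9223372036854775798, which matches the REMAINING_QUOTA column.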

  • du

  Display file sizes; for a directory, the size of each file in it is shown

 

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -ls -R /

drwxr-xr-x   - hadoop supergroup          0 2015-03-27 19:19 /input

-rw-r--r--   1 hadoop supergroup         14 2015-03-27 19:19 /input/input1.txt

-rw-r--r--   1 hadoop supergroup         32 2015-03-27 19:19 /input/input2.txt

-rw-r--r--   1 hadoop supergroup         28 2015-04-02 07:32 /input.txt

-rwxrwxrwx   1 hadoop hadoops            28 2015-03-31 08:59 /input1.txt

-rw-r--r--   1 hadoop supergroup        184 2015-03-31 08:14 /input1.zip

drwxr-xr-x   - hadoop supergroup          0 2015-03-27 19:16 /output

drwxr-xr-x   - hadoop supergroup          0 2015-04-02 07:29 /text

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -du /

46   /input

28   /input.txt

28   /input1.txt

184  /input1.zip

0    /output

0    /text

 

  Option -s: show an aggregate summary instead of per-file sizes

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -du -s /

286  /

  • df

  Report the filesystem's disk space usage

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -df /

Filesystem                    Size   Used   Available  Use%

hdfs://localhost:9000  18713219072  73728  8864460800    0%

  • stat

  Display statistics for a file.

  Format specifiers: %b - file size in blocks; %g - group the file belongs to; %n - file name; %o - block size; %r - replication factor; %u - owner; %y - modification time

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -stat %b,%g,%n,%o,%r,%u,%y /input.zip

0,supergroup,input.zip,134217728,1,hadoop,2015-04-02 15:43:24

 

>Snapshot commands

  • createSnapshot

  Create a snapshot.

  Reference: official docs http://hadoop.apache.org/docs/r2.5.2/hadoop-project-dist/hadoop-hdfs/HdfsSnapshots.html

  A snapshot is an image of the whole filesystem, or of one directory, at a point in time. Creating one merely places a snapshot tag on the directory's inode: no data blocks are copied and read/write performance is unaffected, though the namenode does use some extra memory to hold the metadata of files and directories modified after the snapshot

 

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -ls -R /output

-rwxrwxr-x   1 testown testgrp         28 2015-03-31 08:59 /output/input1.txt

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -createSnapshot /output s1

createSnapshot: Directory is not a snapshottable directory: /output   --creating one directly fails

[hadoop@localhost hadoop-2.5.2]$ hdfs dfsadmin -allowSnapshot /output  --enable the snapshot feature for a directory

Allowing snaphot on /output succeeded

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -createSnapshot /output s1  --create the snapshot

Created snapshot /output/.snapshot/s1

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -ls -R /output

-rwxrwxr-x   1 testown testgrp         28 2015-03-31 08:59 /output/input1.txt

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -ls /output/.snapshot/s1

Found 1 items

-rwxrwxr-x   1 testown testgrp         28 2015-03-31 08:59 /output/.snapshot/s1/input1.txt  --view the snapshot

 

  • renameSnapshot

  Rename a snapshot

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -ls /output/.snapshot/s1

Found 1 items

-rwxrwxr-x   1 testown testgrp         28 2015-03-31 08:59 /output/.snapshot/s1/input1.txt  --the original snapshot

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -renameSnapshot /output/ s1 s2

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -ls /output/.snapshot/s2

Found 1 items

-rwxrwxr-x   1 testown testgrp         28 2015-03-31 08:59 /output/.snapshot/s2/input1.txt    --the renamed snapshot

  • deleteSnapshot

  Delete a snapshot

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -ls /output/.snapshot/s2

Found 1 items

-rwxrwxr-x   1 testown testgrp         28 2015-03-31 08:59 /output/.snapshot/s2/input1.txt

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -deleteSnapshot /output/ s2

[hadoop@localhost hadoop-2.5.2]$ hadoop fs -ls /output/.snapshot/s2

ls: `/output/.snapshot/s2': No such file or directory

A blogger who wrote an excellent guide to setting up pseudo-distributed Hadoop:

https://www.cnblogs.com/biehongli/p/7026809.html

 

[user@node-17h bin]$ hadoop fs -ls hdfs://node-17h:50070/ fails with:

ls: Failed on local exception: com.google.protobuf.InvalidProtocolBufferException: Protocol message end-group tag did not match expected tag.; Host Details : local host is: "node-17h/127.0.0.1"; destination host is: "node-17h":50070;

The cause is a wrong port: 50070 is the NameNode web UI port, not the filesystem RPC port.

[user@node-17h bin]$ hdfs getconf confKey fs.default.name     --the dash before -confKey was omitted, so the usage text is printed instead:

hdfs getconf is utility for getting configuration information from the config file.

 

hadoop getconf

       [-namenodes]                   gets list of namenodes in the cluster.

       [-secondaryNameNodes]               gets list of secondary namenodes in the cluster.

       [-backupNodes]               gets list of backup nodes in the cluster.

       [-includeFile]                    gets the include file path that defines the datanodes that can join the cluster.

       [-excludeFile]                    gets the exclude file path that defines the datanodes that need to decommissioned.

       [-nnRpcAddresses]                  gets the namenode rpc addresses

       [-confKey [key]]                gets a specific key from the configuration

 

[user@node-17h bin]$ hdfs getconf -confKey fs.default.name

19/03/23 06:40:34 INFO Configuration.deprecation: fs.default.name is deprecated. Instead, use fs.defaultFS

hdfs://node-17h:9000
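With that RPC port, the listing works (a sketch, not from the original session):

[user@node-17h bin]$ hadoop fs -ls hdfs://node-17h:9000/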

Checking whether the cluster is healthy fails with Connection refused

[fyj@localhost bin]$ sudo ./hdfs dfsadmin -report

19/03/29 02:37:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

report: Call From localhost/127.0.0.1 to localhost:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused

Cause: the (pseudo-distributed) cluster has not been started.

Fix:

[fyj@localhost sbin]$ ./start-all.sh

This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh

19/03/29 02:44:16 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Starting namenodes on [localhost]

localhost: starting namenode, logging to /home/fyj/hadoop/logs/hadoop-fyj-namenode-localhost.localdomain.out

localhost: starting datanode, logging to /home/fyj/hadoop/logs/hadoop-fyj-datanode-localhost.localdomain.out

Starting secondary namenodes [0.0.0.0]

0.0.0.0: starting secondarynamenode, logging to /home/fyj/hadoop/logs/hadoop-fyj-secondarynamenode-localhost.localdomain.out

19/03/29 02:45:04 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

starting yarn daemons

starting resourcemanager, logging to /home/fyj/hadoop/logs/yarn-fyj-resourcemanager-localhost.localdomain.out

localhost: starting nodemanager, logging to /home/fyj/hadoop/logs/yarn-fyj-nodemanager-localhost.localdomain.out

[fyj@localhost sbin]$ jps

3745 NameNode

3841 DataNode

4004 SecondaryNameNode

4325 Jps

4293 NodeManager

4191 ResourceManager

Ad-hoc login via Xshell: [C:\~]$ ssh root@192.168.40.20

When bringing up a newly configured Hadoop cluster, run ./start-all.sh first, and only then check the daemons with jps or -report.

2019-04-12

Today I hit the connection-refused problem above again on the lab VMs.

./start-all.sh no longer helped.

When that happens, read the logs!

Look at the .log files under the logs folder.

Today's cause: the port was already in use.

Back at the command line:

Commands to inspect ports:

$ netstat -ant

$ netstat -ant | grep 50070

Neither showed port 50070 listening.

One possibility is that a previous namenode start hung: the process never released the port properly and was left in a deadlocked state.
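A sketch for tracking down the process holding a port (assuming netstat -p and lsof are available; 9000 here stands for whichever port is stuck):

$ netstat -antp | grep 9000

$ lsof -i :9000

$ kill <pid>          # terminate the stuck process, then restart HDFS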

The instructor said rebooting the VM would solve it, but class was over so I left it for later, heh.

 

If the alive-node count is 0, one possible cause is that the clocks of the two slaves have drifted too far apart.
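A quick check/fix sketch (assuming ntpdate is installed; the NTP server name is a placeholder):

$ date                          # compare across the nodes

$ sudo ntpdate pool.ntp.org     # resync the clock on each node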
