https://blog.csdn.net/jediael_lu/article/details/76902739
1. Tool / Configurable / Configured
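The Tool interface supports the handling of generic command-line options. Tool is the standard abstraction for any Map-Reduce tool or application. The tool/application should delegate the handling of standard command-line options to ToolRunner.run(Tool, String[]) and handle only its own custom arguments.
The Tool interface extends Configurable and declares just one method: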
public interface Tool extends Configurable {
    /**
     * Execute the command with the given arguments.
     *
     * @param args command specific arguments.
     * @return exit code.
     * @throws Exception
     */
    int run(String[] args) throws Exception;
}
The source of Configurable is as follows:
public interface Configurable {
    void setConf(Configuration conf);

    Configuration getConf();
}
An implementation of Tool can therefore be used to print every property in the configuration, as in the following custom program, ConfigurationPrinter:
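import java.util.Map.Entry;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class ConfigurationPrinter extends Configured implements Tool {
    // Load the HDFS and MapReduce resources in addition to the core defaults.
    static {
        Configuration.addDefaultResource("hdfs-default.xml");
        Configuration.addDefaultResource("hdfs-site.xml");
        Configuration.addDefaultResource("mapred-default.xml");
        Configuration.addDefaultResource("mapred-site.xml");
    }

    @Override
    public int run(String[] args) throws Exception {
        // The Configuration was already populated by ToolRunner.
        Configuration conf = getConf();
        // Configuration iterates over its properties as String key/value entries.
        for (Entry<String, String> entry : conf) {
            System.out.printf("%s=%s\n", entry.getKey(), entry.getValue());
        }
        return 0;
    }

    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new ConfigurationPrinter(), args);
        System.exit(exitCode);
    }
}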
Now look at Configured:
public class Configured implements Configurable {
    private Configuration conf;

    public Configured() {
        this(null);
    }

    public Configured(Configuration conf) {
        setConf(conf);
    }

    @Override
    public void setConf(Configuration conf) {
        this.conf = conf;
    }

    @Override
    public Configuration getConf() {
        return conf;
    }
}
Configured does nothing more than hold the Configuration: setConf() stores it and getConf() returns it.
A typical Tool implementation therefore extends Configured and implements Tool; the only method it has to write itself is run().
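import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// MyMapper and MyReducer are the application's own classes (not shown here).
public class MyApp extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        // Configuration processed by ToolRunner
        Configuration conf = getConf();
        // Create a JobConf using the processed conf
        JobConf job = new JobConf(conf, MyApp.class);
        // Process custom command-line options
        Path in = new Path(args[1]);
        Path out = new Path(args[2]);
        // Specify various job-specific parameters
        job.setJobName("my-app");
        // JobConf.setInputPath/setOutputPath are gone from later Hadoop releases;
        // the paths are set through the input/output format classes instead.
        FileInputFormat.setInputPaths(job, in);
        FileOutputFormat.setOutputPath(job, out);
        job.setMapperClass(MyMapper.class);
        job.setReducerClass(MyReducer.class);
        // Submit the job, then poll for progress until the job is complete
        RunningJob runningJob = JobClient.runJob(job);
        if (runningJob.isSuccessful()) {
            return 0;
        } else {
            return 1;
        }
    }

    public static void main(String[] args) throws Exception {
        // Let ToolRunner handle generic command-line options
        int res = ToolRunner.run(new Configuration(), new MyApp(), args);
        System.exit(res);
    }
}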
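As the example shows, the typical way to use ToolRunner is:
- Define a class that extends Configured and implements the Tool interface. Configured provides getConf() and setConf(), while Tool declares run().
- In main(), call ToolRunner.run(...), which invokes that class's run(String[]) method.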
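The two steps above can be tried out without a Hadoop installation. What follows is only a minimal sketch, not Hadoop code: PatternDemo, Greeter, runTool and the greeting property are hypothetical names introduced for illustration, a plain Map<String, String> stands in for Configuration, and the nested interfaces merely imitate the shape of the real ones.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-ins for the Hadoop types, reduced to what the pattern needs.
// These are NOT the real org.apache.hadoop classes.
class PatternDemo {
    interface Configurable {
        void setConf(Map<String, String> conf);
        Map<String, String> getConf();
    }

    // The real Tool.run also declares "throws Exception"; omitted to keep the sketch short.
    interface Tool extends Configurable {
        int run(String[] args);
    }

    // Plays the role of Configured: it only stores the configuration.
    static class Configured implements Configurable {
        private Map<String, String> conf;

        @Override
        public void setConf(Map<String, String> conf) { this.conf = conf; }

        @Override
        public Map<String, String> getConf() { return conf; }
    }

    // Step 1: the application class extends Configured and implements Tool.
    static class Greeter extends Configured implements Tool {
        String lastMessage; // kept so the result can be inspected afterwards

        @Override
        public int run(String[] args) {
            if (args.length != 1) {
                return 1; // non-zero exit code signals a usage error
            }
            String greeting = getConf().getOrDefault("greeting", "hello");
            lastMessage = greeting + " " + args[0];
            System.out.println(lastMessage);
            return 0;
        }
    }

    // Plays the role of ToolRunner.run(conf, tool, args), minus option parsing.
    static int runTool(Map<String, String> conf, Tool tool, String[] args) {
        tool.setConf(conf);
        return tool.run(args);
    }

    // Step 2: main() hands the tool to the runner instead of calling run() itself.
    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("greeting", "hi");
        int exitCode = runTool(conf, new Greeter(), new String[] {"hadoop"});
        System.out.println("exit code: " + exitCode);
    }
}
```

The point of the split is that Greeter never builds its own configuration: by the time run() executes, the runner has already injected it through setConf().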
2. ToolRunner
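ToolRunner has no inheritance or implementation relationship with the classes and interfaces described above: it extends only Object and implements no interface. ToolRunner makes it easy to run classes that implement the Tool interface (by calling their run(String[]) method), and it relies on GenericOptionsParser to handle the generic Hadoop command-line options.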
An outline of the ToolRunner class:
public class ToolRunner {
    public static int run(Configuration conf, Tool tool, String[] args)
            throws Exception {
        ...
    }

    public static int run(Tool tool, String[] args) throws Exception {
        return run(tool.getConf(), tool, args);
    }

    public static void printGenericCommandUsage(PrintStream out) { ... }

    public static boolean confirmPrompt(String prompt) { ... }
}
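ToolRunner does two things:
(1) It provides the Tool with a Configuration object, creating a new one when the caller passes null.
(2) It lets the program pick up configuration parameters conveniently: GenericOptionsParser applies the generic Hadoop options on the command line to that Configuration, and the Tool receives only the remaining arguments.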
The run method is as follows:
public static int run(Configuration conf, Tool tool, String[] args)
        throws Exception {
    if (conf == null) {
        conf = new Configuration();
    }
    GenericOptionsParser parser = new GenericOptionsParser(conf, args);
    // set the configuration back, so that Tool can configure itself
    tool.setConf(conf);
    // get the args w/o generic hadoop args
    String[] toolArgs = parser.getRemainingArgs();
    return tool.run(toolArgs);
}
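To make the "generic options go into the configuration, everything else goes to the tool" split concrete, here is a rough, self-contained approximation of that one step. It is not how GenericOptionsParser is implemented: GenericOptionsSketch and parse are hypothetical names, a Map<String, String> stands in for Configuration, only the "-D key=value" form is recognised, and color=yellow is just sample input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in for the option handling inside ToolRunner.run.
// Only "-D key=value" is recognised; the real parser supports more
// options and writes into a Hadoop Configuration.
class GenericOptionsSketch {

    // Copies every "-D key=value" pair into conf and returns the other arguments.
    static String[] parse(Map<String, String> conf, String[] args) {
        List<String> remaining = new ArrayList<>();
        for (int i = 0; i < args.length; i++) {
            if (args[i].equals("-D") && i + 1 < args.length) {
                String pair = args[++i]; // consume the argument after -D
                int eq = pair.indexOf('=');
                if (eq > 0) {
                    conf.put(pair.substring(0, eq), pair.substring(eq + 1));
                }
                // a pair without a key and '=' is silently dropped in this sketch
            } else {
                remaining.add(args[i]); // not a generic option: leave it for the tool
            }
        }
        return remaining.toArray(new String[0]);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        String[] toolArgs = parse(conf, new String[] {"-D", "color=yellow", "in", "out"});
        System.out.println(conf);                      // prints {color=yellow}
        System.out.println(Arrays.toString(toolArgs)); // prints [in, out]
    }
}
```

This is why a Tool's run(String[]) can index its own arguments without skipping over properties set on the command line: by that point they have already been moved into the configuration returned by getConf().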