
This walkthrough assumes you have completed the Day 4 exercises.

The VoteCountApplication example starts execution in main(), like any other Java program.

//Calls ToolRunner.run() with a Configuration instance, a VoteCountApplication instance
//and the command-line arguments.
int res = ToolRunner.run(new Configuration(), new VoteCountApplication(), args);

//run() takes the input file path and the output file path from the command line
//and checks whether both have been provided.
public int run(String[] args) throws Exception {
if (args.length != 2) {
System.out.println("usage: [input] [output]");
System.exit(-1);
}

//A new Job instance is created.
Job job = Job.getInstance(new Configuration());

//The output key and value classes are set: each output record is a Text key
//followed by an IntWritable value, e.g. Arvind 75
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);

//The Mapper and Reducer classes are set.
job.setMapperClass(VoteCountMapper.class);
job.setReducerClass(VoteCountReducer.class);

//The input format and output format classes are set.
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);

//The input file path and the output path are specified here.
FileInputFormat.setInputPaths(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));

//The class used to locate the job JAR is set.
job.setJarByClass(VoteCountApplication.class);

//The job is submitted. submit() returns immediately without waiting for the job to finish.
job.submit();
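
One detail worth seeing in isolation is the argument check. The sketch below is plain Java with no Hadoop classes; DriverArgsSketch and checkArgs() are hypothetical names used only for illustration and are not part of the example. It returns a status code instead of calling System.exit(), on the assumption that the caller decides what to do with it, much as main() above receives res from ToolRunner.run().

```java
class DriverArgsSketch {
    // Hypothetical helper: returns 0 when exactly two paths are given, -1 otherwise.
    public static int checkArgs(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: [input] [output]");
            return -1;
        }
        return 0;
    }

    public static void main(String[] args) {
        // Placeholder paths, chosen only for this illustration.
        String[] good = {"votes.txt", "result"};
        String[] bad = {"votes.txt"};
        System.out.println("two arguments -> " + checkArgs(good));
        System.out.println("one argument  -> " + checkArgs(bad));
    }
}
```

Returning the code keeps the check easy to test, because nothing terminates the JVM halfway through.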

VoteCountMapper.java tokenises each input line and generates key/value pairs.

//Used as the value written against each candidate name.
private final static IntWritable one = new IntWritable(1);

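//map() takes 3 inputs:
//key – identifies the current input record; map() is called once per record. With
//TextInputFormat the key is the byte offset of the line in the file, not the candidate name.
//value – holds one line from the input file, i.e. the candidate name
//output – used to write (candidate name, 1) to the output.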
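
To make the map step concrete, here is a minimal plain-Java sketch with no Hadoop classes involved. The class MapPhaseSketch and its map() helper are made up for illustration, and the sketch assumes every whitespace-separated token on a line is a candidate name; the real VoteCountMapper may tokenise differently.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

class MapPhaseSketch {
    // Mimics one map() call: one input line in, one (name, 1) pair out per token.
    // The offset parameter stands in for the record key and is not used.
    public static List<Map.Entry<String, Integer>> map(long offset, String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            out.add(new SimpleEntry<>(tokens.nextToken(), 1));
        }
        return out;
    }

    public static void main(String[] args) {
        // Sample votes, invented for this illustration.
        String[] lines = {"Arvind", "Rahul", "Arvind"};
        long offset = 0;
        for (String line : lines) {
            System.out.println(map(offset, line));
            offset += line.length() + 1; // next line starts after the newline
        }
    }
}
```

Note that the mapper does no counting at all. It only emits a 1 for every name it sees, and the totals are worked out later.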

After the map phase, the framework itself shuffles and sorts the intermediate key/value pairs.
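
The effect of shuffling and sorting can be imitated in plain Java. This is only a rough sketch under simplifying assumptions: ShuffleSortSketch is a hypothetical name, everything happens in one process, and a TreeMap with ordinary String ordering stands in for the framework's own sorting of keys.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ShuffleSortSketch {
    // Groups the 1s emitted by the map step under their key, with keys kept sorted.
    public static Map<String, List<Integer>> shuffleAndSort(List<String> mapOutputKeys) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String key : mapOutputKeys) {
            grouped.computeIfAbsent(key, k -> new ArrayList<>()).add(1);
        }
        return grouped;
    }

    public static void main(String[] args) {
        // Keys as the map step might have emitted them, in arrival order.
        System.out.println(shuffleAndSort(List.of("Rahul", "Arvind", "Rahul")));
        // prints {Arvind=[1], Rahul=[1, 1]}
    }
}
```

Each entry of the resulting map corresponds to one reduce call: a key together with all the values that were emitted for it.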

VoteCountReducer.java runs the reduce step on the sorted key/value pairs and generates the result.

//reduce() takes 3 inputs:
//key – a Text value, i.e. the candidate name
//values – an iterable of the values that share the same key, i.e. the same candidate name
//output – used to write the result in key/value format

//Loops through the values and adds each one to the count.
for (IntWritable value : values) {
    voteCount += value.get();
}

//Writes the key and the total count to the output.
output.write(key, new IntWritable(voteCount));
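
The same summing logic can be tried out in plain Java, without Hadoop. In this sketch ReducePhaseSketch is a hypothetical name, plain Integer values stand in for IntWritable, and the result is returned rather than written to an output.

```java
import java.util.List;

class ReducePhaseSketch {
    // Mimics one reduce() call: adds up all the values that arrived for one key.
    // The key is passed only to mirror the shape of reduce(); the sum does not depend on it.
    public static int reduce(String key, Iterable<Integer> values) {
        int voteCount = 0;
        for (int value : values) {
            voteCount += value;
        }
        return voteCount;
    }

    public static void main(String[] args) {
        // Three votes for one candidate, invented for this illustration.
        String key = "Arvind";
        int total = reduce(key, List.of(1, 1, 1));
        // TextOutputFormat separates key and value with a tab by default.
        System.out.println(key + "\t" + total);
    }
}
```

Because voteCount is a local variable that starts at 0 inside reduce(), each candidate's total is computed independently of the others.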

I hope this is understandable. Cheers!

Follow Day 6 for Hive basics.
