[Hadoop] MapReduce Programming Example: WordCount

2022-08-31

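When a complete MapReduce program runs in distributed mode, it has three kinds of instance processes:

(1) MRAppMaster: schedules the whole program and coordinates its state.

(2) MapTask: handles the entire data-processing flow of the Map stage.

(3) ReduceTask: handles the entire data-processing flow of the Reduce stage.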

1. Mapper stage

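(1) A user-defined Mapper must extend the framework's parent class, Mapper;

(2) The Mapper's input data comes as KV pairs (the K and V types are customizable);

(3) The Mapper's business logic goes in the map() method;

(4) The Mapper's output data is also emitted as KV pairs (the K and V types are customizable);

(5) The map() method is called once for every input <K,V> pair, inside the MapTask process.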
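The "once per <K,V> pair" contract can be illustrated in plain Java without a cluster. The sketch below is a simplified, hypothetical model: `MapContractSketch`, `LineRecord`, `toRecords` and `runMapPhase` are illustrative names, not Hadoop API. It assumes `'\n'` line endings and mimics what Hadoop's default `TextInputFormat` hands to a Mapper: each line as the value, keyed by the byte offset where that line starts (which is why the WordCount Mapper's input key type is `LongWritable`).

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

// Hypothetical plain-Java model of how map() is driven; this is not Hadoop code.
public class MapContractSketch {

    // One input record: key = byte offset where the line starts, value = line text.
    public static final class LineRecord {
        public final long offset;
        public final String line;

        LineRecord(long offset, String line) {
            this.offset = offset;
            this.line = line;
        }
    }

    // Split text into (offset, line) records, assuming '\n' line endings,
    // roughly what TextInputFormat passes to a Mapper.
    public static List<LineRecord> toRecords(String text) {
        List<LineRecord> records = new ArrayList<>();
        if (text.isEmpty()) {
            return records; // an empty input yields no records
        }
        long offset = 0;
        for (String line : text.split("\n")) {
            records.add(new LineRecord(offset, line));
            // Advance past the line's UTF-8 bytes plus the '\n' terminator.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return records;
    }

    // Invoke the map function exactly once per record; return the number of calls.
    public static int runMapPhase(String text, BiConsumer<Long, String> map) {
        int calls = 0;
        for (LineRecord r : toRecords(text)) {
            map.accept(r.offset, r.line);
            calls++;
        }
        return calls;
    }

    public static void main(String[] args) {
        int calls = runMapPhase("hello world\nhello hadoop\n",
                (offset, line) -> System.out.println(offset + " -> " + line));
        System.out.println("map() calls: " + calls);
    }
}
```

For the two-line input in `main`, map is invoked twice, with keys 0 and 12 ("hello world" is 11 bytes plus the newline). Real Hadoop also handles `\r\n` endings and input splits, which this sketch ignores.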

2. Reducer stage

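(1) A user-defined Reducer must extend the framework's parent class, Reducer;

(2) The Reducer's input KV types must match the Mapper's output KV types;

(3) The Reducer's business logic goes in the reduce() method;

(4) The ReduceTask process calls reduce() once for each group of <k,v> pairs that share the same key k.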
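To make "once per group of pairs with the same key" concrete, the following plain-Java sketch models grouping map output by key and then running a sum-style reduce once per group. `ReduceContractSketch`, `group` and `reduceAll` are hypothetical names, not Hadoop API. A `TreeMap` is used so keys come out sorted, loosely mirroring how keys arrive in sorted order within one reduce task; Hadoop's real shuffle (partitioning, spilling, merge-sort) is not modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plain-Java model of shuffle grouping + reduce; this is not Hadoop code.
public class ReduceContractSketch {

    // Group map output pairs by key; TreeMap keeps keys in sorted order.
    public static TreeMap<String, List<Integer>> group(List<Map.Entry<String, Integer>> mapOutput) {
        TreeMap<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> kv : mapOutput) {
            groups.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }
        return groups;
    }

    // Run one "reduce()" per key group, summing its values as WordCount does.
    public static TreeMap<String, Integer> reduceAll(TreeMap<String, List<Integer>> groups) {
        TreeMap<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> g : groups.entrySet()) {
            int sum = 0;
            for (int count : g.getValue()) {
                sum += count;
            }
            result.put(g.getKey(), sum);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapOutput = List.of(
                Map.entry("hello", 1), Map.entry("world", 1),
                Map.entry("hello", 1), Map.entry("hadoop", 1));
        TreeMap<String, List<Integer>> groups = group(mapOutput);
        System.out.println(groups);            // {hadoop=[1], hello=[1, 1], world=[1]}
        System.out.println(reduceAll(groups)); // {hadoop=1, hello=2, world=1}
    }
}
```

Four map output pairs collapse into three groups, so reduce runs three times, not four.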

3. WordCount example

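Input: text made up of words separated by spaces.

Output: <k,v> pairs of (word, count), i.e. the number of times each word occurs.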

(1) Write the Mapper class

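import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Input:  <byte offset of the line, line text>
// Output: <word, 1> for every word in the line
public class WordcountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

	// Reused across map() calls to avoid allocating new objects for every record
	Text k = new Text();
	IntWritable v = new IntWritable(1);

	@Override
	protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {

		// 1. Get one line of input
		String line = value.toString();

		// 2. Split the line into words on spaces
		String[] words = line.split(" ");

		// 3. Emit <word, 1> for each word
		for (String word : words) {
			k.set(word);
			context.write(k, v);
		}
	}
}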
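One detail worth checking in the Mapper above: `line.split(" ")` splits on exactly one space, so a leading space or two consecutive spaces produce empty strings, which the Mapper would emit and count as the "word" `""`. The plain-Java sketch below demonstrates this. The `tokenize` method is a hypothetical alternative (trim, split on the regex `\\s+`, drop empty tokens), offered as a suggestion rather than what the original article uses; `SplitSketch` and both method names are illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SplitSketch {

    // Same tokenization as the Mapper above: split on a single space.
    public static List<String> splitOnSpace(String line) {
        return Arrays.asList(line.split(" "));
    }

    // Hypothetical alternative: split on any run of whitespace and skip empty tokens.
    public static List<String> tokenize(String line) {
        List<String> words = new ArrayList<>();
        for (String w : line.trim().split("\\s+")) {
            if (!w.isEmpty()) { // "".split("\\s+") yields [""], so filter it out
                words.add(w);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        String line = " hello  world";
        System.out.println(splitOnSpace(line)); // [, hello, , world]
        System.out.println(tokenize(line));     // [hello, world]
    }
}
```

Inside a Mapper, each word returned by such a tokenizer would be passed to `k.set(word)` exactly as before; `\\s+` also treats tabs as separators.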

(2) Write the Reducer class

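import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Input:  <word, [1, 1, ...]> (all counts emitted for one word)
// Output: <word, total count>
public class WordcountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

	int sum;
	// Reused across reduce() calls to avoid allocating a new object per key
	IntWritable v = new IntWritable();

	@Override
	protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {

		// 1. Sum up the counts for this word
		sum = 0;
		for (IntWritable count : values) {
			sum += count.get();
		}

		// 2. Emit <word, total>
		v.set(sum);
		context.write(key, v);
	}
}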
