java - Chain multiple MapReduce jobs while sending data to the same mappers


I'm facing an issue with chaining multiple MapReduce jobs.

The current scenario works as follows: the application processes 2 data sources, and each of them goes to a separate mapper, using "MultipleInputs" on 2 different directories.

The first job reads the inputs in 2 mappers, processes the data, and writes the output to 2 different directories using "MultipleOutputs".

Now, the second job should work on the output of the first job, using the same task IDs as the first job.


For example:

job1: 2 different mappers, 1 reducer

  • mapper1_1 reads the datasource1 directory, creates 2 tasks to process it, and outputs the intermediate files ds1/ds1-m-00000 and ds1/ds1-m-00001

  • mapper1_2 reads the datasource2 directory, creates 1 task to process it, and outputs the intermediate file ds2/ds2-m-00002

  • reducer1 makes the calculations and outputs statistics

job2: 2 different mappers, 1 reducer

  • mapper2_1 reads the ds1 directory and creates 2 tasks to process the 2 intermediate files

  • mapper2_2 reads the ds2 directory and creates 1 task to process the intermediate file

  • reducer2 makes the calculations and outputs the final result

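For some reasons, the task IDs used in the second job must be the same as the ones generated in the first job. In practice, the task IDs are generated randomly: in the second job they are sometimes the same as in the first job, and the process completes, and sometimes they are not.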

Is there any way to control the generation so that the same task ID is used in the 2 jobs, or some other method to generate a unique number that identifies the same file in both jobs?
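
One direction that might work, as a rough sketch rather than a confirmed solution: the intermediate file names in the example (ds1/ds1-m-00000, ds2/ds2-m-00002) already carry the number of the job1 task that wrote them, so job2 could derive its identifier from the name of the file it is reading instead of from its own task ID. The class `PartitionIds` and its methods below are hypothetical helpers, not part of Hadoop, and the sketch assumes that the file name is available to the job2 mapper and that each intermediate file is handled by a single task.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not a Hadoop class): recovers the job1 task number from an output file name. */
public final class PartitionIds {

    // Matches names such as "ds1-m-00001" or "part-r-00003", with an optional extension:
    // group 1 = base name, group 2 = task type (m or r), group 3 = zero-padded task number.
    private static final Pattern NAME =
            Pattern.compile("^(.+)-([mr])-(\\d{1,9})(\\..*)?$");

    private PartitionIds() {}

    /** Returns the task number embedded in the file name, or empty if the name has another shape. */
    public static OptionalInt partitionOf(String fileName) {
        Matcher m = NAME.matcher(fileName);
        if (!m.matches()) {
            return OptionalInt.empty();
        }
        return OptionalInt.of(Integer.parseInt(m.group(3)));
    }

    /** Builds a key such as "ds1:1" that is the same in both jobs for the same file. */
    public static String fileKey(String fileName) {
        Matcher m = NAME.matcher(fileName);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unexpected file name: " + fileName);
        }
        return m.group(1) + ":" + Integer.parseInt(m.group(3));
    }
}
```

With this, `PartitionIds.partitionOf("ds2-m-00002")` yields 2 no matter which task ID job2 assigns to the mapper that reads the file, and names that do not follow the pattern (for example `_SUCCESS`) yield an empty result.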

