Mistakes in my first MapReduce program

I took the Big Data University course on MapReduce and ran into the following issues in the lab exercise.

The setup was:

The sampleData folder is on the Hadoop file system (HDFS), not the local file system.
test.jar is on the local file system.
The program uses the MapReduce V1 model.

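1. When providing the input file location, I used /sampleData/XXX.dat instead of sampleData/XXX.dat. The leading slash makes the path absolute from the HDFS root, while a path without it is resolved relative to my HDFS working directory (by default the home directory, /user/<username>), which is where sampleData actually was.


This resulted in: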

Exception in thread "main" org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://yyy:8020/sampleData/XXX.dat

The command that worked, with relative paths:

hadoop jar test.jar com.pk.hadoop.MapReduce.Samp sampleData/XXX.dat sampleData/XXX.dat.out
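To make the difference between the two paths concrete, here is a minimal sketch of the resolution rule. It does not call Hadoop at all; it only mimics the rule with java.net.URI. The home directory /user/pk is an assumption (a made-up user name), and yyy:8020 is the placeholder address from the error message above.

```java
import java.net.URI;

public class HdfsPathDemo {
    // Assumed HDFS home directory. "pk" is a hypothetical user name;
    // "yyy:8020" is the placeholder NameNode address from the exception.
    public static final URI HOME = URI.create("hdfs://yyy:8020/user/pk/");

    /** Resolves a path argument against the assumed home directory. */
    public static String resolve(String pathArg) {
        // A relative path is appended to the base path; a path that
        // starts with "/" replaces the base path entirely.
        return HOME.resolve(pathArg).toString();
    }

    public static void main(String[] args) {
        // prints hdfs://yyy:8020/user/pk/sampleData/XXX.dat
        System.out.println(resolve("sampleData/XXX.dat"));
        // prints hdfs://yyy:8020/sampleData/XXX.dat
        System.out.println(resolve("/sampleData/XXX.dat"));
    }
}
```

The second result is the location named in the InvalidInputException, and nothing exists there because the data sits under the home directory.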


2. Error: java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, received org.apache.hadoop.io.LongWritable

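Every time the code ran, it threw the above error.

The problem was that I forgot to set the mapper and reducer classes on the job (the setMapperClass(...) and setReducerClass(...) calls in the driver). The job therefore fell back to its defaults and failed: the default mapper passes each input record through unchanged, so the map output key was the LongWritable byte offset of the input line rather than the Text key the job was configured to expect.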
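Here is a small stand-alone sketch of why the default mapper trips this check. It is a simulation, not Hadoop code: LongWritable, Text, Collector and Mapper below are simplified stand-ins I made up, named after the types in the error message, and LAB_MAPPER is a hypothetical mapper that is not the one from the lab.

```java
import java.io.IOException;

public class KeyTypeCheckDemo {
    // Stand-ins for org.apache.hadoop.io.LongWritable and Text (not the real classes).
    public static final class LongWritable {
        final long value;
        LongWritable(long value) { this.value = value; }
    }
    public static final class Text {
        final String value;
        Text(String value) { this.value = value; }
    }

    /** Receives whatever the mapper emits. */
    public interface Collector {
        void collect(Object key, Object value) throws IOException;
    }

    /** Simplified mapper contract: input is (byte offset, line). */
    public interface Mapper {
        void map(LongWritable offset, Text line, Collector out) throws IOException;
    }

    // Behaviour when no mapper class is set: the record passes through unchanged.
    public static final Mapper IDENTITY =
            (offset, line, out) -> out.collect(offset, line);

    // Hypothetical lab mapper: emits the line as a Text key with a count of 1.
    public static final Mapper LAB_MAPPER =
            (offset, line, out) -> out.collect(line, new LongWritable(1L));

    /** Feeds one record to the mapper; returns "ok" or the error message. */
    public static String run(Mapper mapper, Class<?> expectedKeyClass) {
        // The collector rejects any key whose class differs from the configured one.
        Collector checking = (key, value) -> {
            if (key.getClass() != expectedKeyClass) {
                throw new IOException("Type mismatch in key from map: expected "
                        + expectedKeyClass.getSimpleName()
                        + ", received " + key.getClass().getSimpleName());
            }
        };
        try {
            mapper.map(new LongWritable(0L), new Text("first line"), checking);
            return "ok";
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // prints Type mismatch in key from map: expected Text, received LongWritable
        System.out.println(run(IDENTITY, Text.class));
        // prints ok
        System.out.println(run(LAB_MAPPER, Text.class));
    }
}
```

With the pass-through mapper the key that reaches the check is still the input offset, which is the same mismatch the real job reported. Once a mapper that emits a Text key is plugged in, the check passes.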

