
Posts

Showing posts from February, 2017

Mistakes in my first MapRed prog

I took the Big Data University course on MapReduce and hit the following issues in the lab exercise.

Setup: the sampleData folder is located on the Hadoop file system (HDFS), not the local one. test.jar is located on the local file system. The program uses MapReduce Model V1.

1. When providing the input file location, I used /sampleData/XXX.dat instead of sampleData/XXX.dat. This resulted in:

    Exception in thread "main" org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://yyy:8020/sampleData/XXX.dat

A path without a leading slash is resolved relative to my HDFS working directory (by default the home directory), whereas the leading slash made Hadoop look under the HDFS root. The command that works:

    hadoop jar test.jar com.pk.hadoop.MapReduce.Samp sampleData/XXX.dat sampleData/XXX.dat.out

2. Every time the code executed, it threw this error:

    Error: java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, received org.apache.hadoop.io.LongWritable

The problem was that I forgot to declare the mapper and reducer classes for the job (the setMapperClass and setReducerClass calls). The job therefore fell back to its defaults: the default identity mapper passes the input key, a LongWritable byte offset under the default text input format, straight through, while the job expected Text keys, so it failed.
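The path mistake in item 1 can be mimicked without a cluster. This is only a sketch using java.net.URI from the JDK, not Hadoop's own Path class, and the home directory /user/hypothetical/ is an assumption made up for illustration (HDFS conventionally uses /user/<username>). It shows why the leading slash lands on exactly the path reported in the exception.

```java
import java.net.URI;

final class HdfsPathSketch {
    // Hypothetical HDFS home directory; the real one depends on the user name.
    static final URI HOME = URI.create("hdfs://yyy:8020/user/hypothetical/");

    // Resolve a path the way a relative/absolute reference resolves against HOME.
    static String resolve(String path) {
        return HOME.resolve(path).toString();
    }

    public static void main(String[] args) {
        // Relative: stays under the home directory.
        System.out.println(resolve("sampleData/XXX.dat"));
        // Absolute: jumps to the file system root, as in the exception message.
        System.out.println(resolve("/sampleData/XXX.dat"));
    }
}
```

The second call yields hdfs://yyy:8020/sampleData/XXX.dat, the same location the InvalidInputException complained about.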

overhead of String comparison over int

Refer to the last 2 successful submissions on my CodeChef page for the PERMUT2 problem: https://www.codechef.com/problems/PERMUT2

As you can see, the only difference between them is the use of int[] in place of String[], so int values are compared with each other instead of strings. The int[] version is ~40% faster than the String[] version.
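To make the difference concrete, here is a minimal sketch of the two styles. It is not the submitted code; the class and method names are made up for illustration, and it assumes the usual reading of PERMUT2 (a permutation is "ambiguous" when it equals its own inverse). The int[] version does one primitive comparison per element, while the String[] version has to parse one value and compare strings character by character.

```java
final class AmbiguousPermutation {
    // int[] version: each check is a single primitive comparison.
    static boolean isAmbiguousInts(int[] perm) {
        for (int i = 0; i < perm.length; i++) {
            // Values are 1-based; following perm twice must lead back to i + 1.
            if (perm[perm[i] - 1] != i + 1) {
                return false;
            }
        }
        return true;
    }

    // String[] version: same logic, but with parsing and String.equals per element.
    static boolean isAmbiguousStrings(String[] perm) {
        for (int i = 0; i < perm.length; i++) {
            int target = Integer.parseInt(perm[i]) - 1;
            if (!perm[target].equals(String.valueOf(i + 1))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isAmbiguousInts(new int[] {1, 4, 3, 2}));
        System.out.println(isAmbiguousStrings(new String[] {"2", "3", "4", "5", "1"}));
    }
}
```

Both methods give the same answers; only the cost per element differs.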

To use lambda or not

For a problem as simple as repeated division of each of N numbers read from the console, you might be tempted to write a single line of code using lambda expressions, like this:

    reader.lines().limit(numOfTCs).mapToInt(Integer::parseInt).forEach(x -> buffer.append(x / divisor));

But the choice will cost you performance and readability! At times, it's the more verbose but easier-to-understand piece of code that wins over the newer style of coding.

Lesson: just because you can code in Java 8 syntax doesn't mean you should overuse it.
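For comparison, here is a small sketch of both styles side by side. The class name, method names, and the newline after each result are my own choices for illustration, not part of the original problem. Both methods read up to count lines, divide each number by divisor, and collect the results.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

final class DivideEach {
    // Stream/lambda style: compact, but the control flow is hidden in the pipeline.
    static String withStream(BufferedReader reader, int count, int divisor) {
        StringBuilder out = new StringBuilder();
        reader.lines()
              .limit(count)
              .mapToInt(Integer::parseInt)
              .forEach(x -> out.append(x / divisor).append('\n'));
        return out.toString();
    }

    // Plain loop style: more lines, but every step is explicit.
    static String withLoop(BufferedReader reader, int count, int divisor) {
        StringBuilder out = new StringBuilder();
        try {
            for (int i = 0; i < count; i++) {
                String line = reader.readLine();
                if (line == null) {
                    break; // input ended early
                }
                out.append(Integer.parseInt(line) / divisor).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "10\n20\n7\n";
        System.out.print(withStream(new BufferedReader(new StringReader(input)), 3, 2));
        System.out.print(withLoop(new BufferedReader(new StringReader(input)), 3, 2));
    }
}
```

Both produce identical output, so the choice comes down to speed and how easily the next reader can follow the code.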

why not to sysout from inside a loop!!

After 2 years of being an out-of-touch coder, I committed the sin and paid the penalty: "a code that is 5 times slower."

    try (BufferedReader reader = new BufferedReader(new InputStreamReader(System.in))) {
        while (...) {
            ...
            System.out.println(..myresult..);
            ...
        }
    }

v/s

    try (BufferedReader reader = new BufferedReader(new InputStreamReader(System.in));
         PrintWriter printWriter = new PrintWriter(System.out)) {
        while (...) {
            ...
            printWriter.write(..myresult.. + "\n");
            ...
        }
        printWriter.flush();
    }

The improved code can be written in multiple ways: use a StringBuilder/StringBuffer, append each result inside the loop, and sysout once after the loop ends, OR, like the code above, spool the results to a writer and flush it at the end.
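Here is a runnable sketch of the two improved variants. The class name, method names, and the "print squares" workload are placeholders I made up, since the real loop body is elided above. The idea is the same in both: collect output inside the loop and hand it to the underlying stream once, instead of calling System.out.println (which typically flushes on every line) per iteration.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.PrintWriter;

final class BufferedOutputDemo {
    // Variant 1: spool results to a PrintWriter and flush once at the end.
    static void writeWithWriter(int n, OutputStream sink) {
        // Not closed here on purpose: closing would also close the sink (e.g. System.out).
        PrintWriter writer = new PrintWriter(sink);
        for (int i = 1; i <= n; i++) {
            writer.write((long) i * i + "\n");
        }
        writer.flush();
    }

    // Variant 2: append to a StringBuilder and print once after the loop.
    static void writeWithBuilder(int n, PrintStream out) {
        StringBuilder buffer = new StringBuilder();
        for (int i = 1; i <= n; i++) {
            buffer.append((long) i * i).append('\n');
        }
        out.print(buffer);
        out.flush();
    }

    public static void main(String[] args) {
        writeWithWriter(3, System.out);
        writeWithBuilder(3, System.out);
    }
}
```

The writer variant keeps memory use bounded by its internal buffer, while the StringBuilder variant holds the whole output in memory until the end, which is fine for contest-sized outputs.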