The Assignment
• You are provided with United States Census data
– Download the Zip Code Tabulation Areas Gazetteer File (1.1 MB), which contains:
• Zip code identifying an area
• Population count
• Housing unit count
• Land area (m²), water area (m²)
• Latitude, longitude
• Find the potential trend of rising housing prices in the northeast, northwest, southeast, and southwest.
– The potential trend of rising housing prices is based simply on the ratio of supply to demand: housing unit count / population density. The smaller the result, the stronger the trend. This ignores other factors such as public infrastructure, community, environment, etc.
– Population density is simply population count / land area.
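The two formulas above can be combined into a small helper. This is only a sketch of the metric itself; the class and method names, units, and the sample numbers are illustrative, not taken from the actual Gazetteer file.

```java
// Sketch of the assignment's trend metric (assumed units: people, housing units, m^2).
public class HousingTrend {
    // Population density = population count / land area (people per m^2)
    static double populationDensity(long population, double landAreaM2) {
        return population / landAreaM2;
    }

    // Trend ratio = housing unit count / population density;
    // smaller values suggest a stronger upward price trend.
    static double trendRatio(long housingUnits, long population, double landAreaM2) {
        return housingUnits / populationDensity(population, landAreaM2);
    }

    public static void main(String[] args) {
        // Hypothetical ZCTA record: 200 people, 100 housing units, 1 km^2 of land
        System.out.println(trendRatio(100, 200, 1.0e6));  // prints 500000.0
    }
}
```

Note that the ratio simplifies to housing units × land area / population, so zero-population ZCTAs would need special handling in a real job.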
• Install Java v1.7+
• Add a dedicated Hadoop system user
• Configure SSH access
• Disable IPv6
• Or configure your Hadoop environment on LCSR:
– http://www.cs.rutgers.edu/~watrous/hadoop.html
– We will give instructions on setting up your own cluster in this recitation
public void map(Object key, Text value, Context context)
    throws IOException, InterruptedException {
  // Tokenize the input line and emit (word, 1) for each token.
  // word is a reusable Text field; one is a field holding new IntWritable(1).
  StringTokenizer itr = new StringTokenizer(value.toString());
  while (itr.hasMoreTokens()) {
    word.set(itr.nextToken());
    context.write(word, one);
  }
}
public void reduce(Text key, Iterable<IntWritable> values, Context context)
    throws IOException, InterruptedException {
  // Sum all counts emitted for this word
  int sum = 0;
  for (IntWritable val : values) {
    sum += val.get();
  }
  result.set(sum);  // result is a reusable IntWritable field
  context.write(key, result);
}
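To adapt the WordCount skeleton to the assignment, the map phase could key each record by region instead of by word. Below is a minimal sketch of one way to split ZCTAs into quadrants by latitude and longitude; the reference point (roughly the center of the contiguous U.S.) and all names here are assumptions, not part of the assignment spec.

```java
public class Quadrant {
    // Assumed reference point: approximate center of the contiguous U.S.
    static final double REF_LAT = 39.8;
    static final double REF_LON = -98.6;

    // Classify a point as NE/NW/SE/SW relative to the reference point.
    // U.S. longitudes are negative; larger (less negative) means farther east.
    static String classify(double lat, double lon) {
        String ns = (lat >= REF_LAT) ? "north" : "south";
        String ew = (lon >= REF_LON) ? "east" : "west";
        return ns + ew;
    }

    public static void main(String[] args) {
        // A mapper could emit (classify(lat, lon), trendRatio) for each ZCTA,
        // and the reducer could then aggregate the ratios per region.
        System.out.println(classify(40.5, -74.4));  // prints northeast
    }
}
```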
CS417 11/10/15 Paul Krzyzanowski
Step 3: Copy data to HDFS & run the jar file
• Before running the actual MapReduce job, you must first copy the input file from your local file system to Hadoop's HDFS
• Download the input data and copy it to HDFS
• Run the MapReduce job
$ bin/hadoop jar hadoop*your_program*.jar \
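Concretely, the copy step might look like the following. The local filename and HDFS paths are placeholders, not values given in the assignment, and the exact invocation depends on how your cluster is configured.

```shell
# Create an input directory in HDFS and copy the local Gazetteer file into it
$ bin/hadoop fs -mkdir -p input
$ bin/hadoop fs -put Gaz_zcta_national.txt input/
# Verify the file arrived
$ bin/hadoop fs -ls input
```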
Documentation
• Document your work NEATLY
• For your submission, explain:
– The files you're submitting and what they do
– How input is mapped into (key, value) pairs
– How (key, value) pairs are processed by the reduce phase
– If the job cannot be done in a single map-reduce pass, describe how it would be structured into two or more map-reduce jobs
– How to compile & run
– Any bugs or peculiarities