
Download Examples Jar

1. Locate hadoop-mapreduce-examples-*.jar in the sandbox. It can usually be found under /usr/lib/hadoop/.
Alternatively, you can download it from here.

Add your local folder to sandbox

2. Extract the downloaded jar on your local machine under a bigdata directory. I have created a directory, e.g. d:\bigdata.
There is no restriction on the path or directory name.

3. If your sandbox is running, shut it down: click View -> Close -> Power Off.

4. Now in VirtualBox, click on “Shared Folders”.

5. Add the bigdata folder from your machine under Shared Folders.

6. Select the “Auto Mount” check box and click “OK”.

7. Start the sandbox and log in.

8. Type: cd /media/sf_bigdata

9. Type: ls

It should list the files of your bigdata directory from your machine.

Creating Sample Input Files

10. Create another folder in the bigdata directory: d:\bigdata\wcinput

11. Create two files, file1.txt and file2.txt, and put the text “is this a tough program or this is a simple program or this is wrong program” in both files.
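The file-creation step above can be sketched in Python. This is a minimal sketch, not part of the original tutorial: it writes to a temporary directory so it runs on any machine, whereas in the tutorial the target would be d:\bigdata\wcinput.

```python
import os
import tempfile

# Hypothetical stand-in for d:\bigdata\wcinput; a temp dir keeps the sketch portable.
wcinput = os.path.join(tempfile.mkdtemp(), "wcinput")
os.makedirs(wcinput)

text = "is this a tough program or this is a simple program or this is wrong program"

# Step 11: write the same sample text into both input files.
for name in ("file1.txt", "file2.txt"):
    with open(os.path.join(wcinput, name), "w") as f:
        f.write(text)

print(sorted(os.listdir(wcinput)))  # the two input files WordCount will read
```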

Run WordCount Example

12. Type the command:
hadoop jar hadoop-mapreduce-examples-2.5.2.jar wordcount sf_bigdata/wcinput sf_bigdata/output

– I have used hadoop-mapreduce-examples-2.5.2.jar, but you can use your version instead.
– When local directories are mounted, the “sf_” prefix is added automatically.
– The output directory will be created automatically (the job fails if it already exists).
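To see what the wordcount job computes over the two input files, here is a minimal Python sketch of the same logic (mapper tokenizes and emits (word, 1); reducer sums per word). The counts shown are for the sample text from step 11, present in both files.

```python
from collections import Counter

# Sample text from step 11; both file1.txt and file2.txt contain this line.
text = "is this a tough program or this is a simple program or this is wrong program"

# WordCount's mapper emits (word, 1) per whitespace-separated token;
# the reducer sums the counts per word. Counter reproduces that here.
counts = Counter()
for _ in range(2):  # two input files with identical content
    counts.update(text.split())

for word, count in sorted(counts.items()):
    print(word, count)
# → a 4, is 6, or 4, program 6, simple 2, this 6, tough 2, wrong 2
```

The output of the real job lands in part-r-* files under the output directory in the same word-count format.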

Check JobStatus

13. In a browser, type the cluster URL.
It will open the cluster view and show the running job.

14. Click on the “ID”; it will open the job status page.

15. Click on History to see it running. If it shows 404 Not Found, replace the host URL and try again.

I hope it worked for you. Have fun!

Follow Day 4 for a sample use case to understand how the New Delhi State Elections vote counting happened.