
Flume Installation

Reading tweets from Twitter requires a Flume setup. I am using the Hortonworks 2.2 sandbox, and Flume is already installed on it.

If you don’t have Flume, install it with:

/> yum install flume
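Before installing, it can help to check whether Flume is already present. A minimal sketch, assuming the package puts the `flume-ng` launcher on the PATH (the variable name `flume_status` is only illustrative):

```shell
# Check whether the flume-ng launcher is already on the PATH.
if command -v flume-ng >/dev/null 2>&1; then
    flume_status="installed"
else
    flume_status="missing"   # install it with: yum install flume
fi
echo "Flume is $flume_status"
```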

Create Twitter Account and Application

The next step after installing Flume is to create a Twitter developer account and application. Follow the steps below:

  • Open dev.twitter.com.
  • Log in with your Twitter credentials. If you are not registered, sign up and then log in.
  • At the bottom of the page there is a link, “Manage Your Apps”. Click it. Assuming you are already logged in, it takes you to a page that lists any applications you have already created.
  • Click “Create New App”.
  • Enter the required details in the form and accept the agreement. Your mobile number must be registered with Twitter before you can create an app.
  • A page for the newly created application opens. Click the Test OAuth button at the top right. It generates the required keys.

Update the flume.conf file

  • Copy those keys one by one into the file below:


TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS

TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = <>
TwitterAgent.sources.Twitter.consumerSecret = <>
TwitterAgent.sources.Twitter.accessToken = <>
TwitterAgent.sources.Twitter.accessTokenSecret = <>

TwitterAgent.sources.Twitter.keywords = #CWC15

TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://sandbox.hortonworks.com:8020/user/tweets/%Y/%m/%d/%H/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000

TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 100


flume.conf is located at <flume_home>/conf/

  • Update the keys with the values shown on the Twitter screen.
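A quick way to confirm that no key was left as the `<>` placeholder is to scan the file before starting Flume. This is a minimal sketch: `check_twitter_keys` is a hypothetical helper name, and it only detects blank or `<>` values for the four key properties shown above, not keys that are wrong or missing entirely.

```shell
# Hypothetical helper: flag Twitter credentials that are empty or still "<>".
# Usage: check_twitter_keys <flume_home>/conf/flume.conf
check_twitter_keys() {
    conf_file="$1"
    # grep prints every offending line, e.g. "...consumerKey = <>".
    if grep -E '^TwitterAgent\.sources\.Twitter\.(consumerKey|consumerSecret|accessToken|accessTokenSecret)[[:space:]]*=[[:space:]]*(<>)?[[:space:]]*$' "$conf_file"; then
        echo "Fill in the keys listed above" >&2
        return 1
    fi
    echo "No empty Twitter keys found"
}
```

The function returns a non-zero status when a placeholder remains, so it can also be used in a startup script to stop before the agent is launched.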

Keywords are the terms you want to fetch from Twitter. I have entered #CWC15, i.e. tweets related to the Cricket World Cup 2015.

The path must be correct for the data to be stored. You can find the HDFS path information in core-site.xml. I have created a /tweets folder in the /user directory to store tweets.
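The path setup can be sketched from the command line as follows. This assumes the `hdfs` client is on the PATH and that GNU `date` is available; `TWEET_ROOT` and `hourly_dir` are illustrative names. The `%Y/%m/%d/%H` escapes in hdfs.path are resolved by Flume from each event's timestamp, and the helper only imitates that expansion in UTC to show what the resulting directories look like.

```shell
# Target directory, taken from hdfs.path in flume.conf.
TWEET_ROOT=/user/tweets

if command -v hdfs >/dev/null 2>&1; then
    # Print the NameNode URI that core-site.xml defines (fs.defaultFS).
    hdfs getconf -confKey fs.defaultFS
    # Create the directory the HDFS sink will write into.
    hdfs dfs -mkdir -p "$TWEET_ROOT"
fi

# Illustrative only: show the hourly subdirectory for a given epoch second (UTC).
hourly_dir() {
    date -u -d "@$1" +"$TWEET_ROOT/%Y/%m/%d/%H/"
}
```

For example, `hourly_dir "$(date +%s)"` prints the directory that the current hour would map to in UTC.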

Create Application to fetch Tweets

  • Download Maven (mvn).
  • Download the Twitter source code.
  • Build the Twitter source code using Maven:

-> add the Maven bin directory to your PATH
-> set JAVA_HOME
-> go to the Twitter source folder
-> run the “mvn package” command

It will build the jar for you.
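The build steps can be sketched as a short script. All three locations below are hypothetical defaults, so adjust them to wherever Maven, the JDK and the downloaded source actually live on your machine.

```shell
# Hypothetical locations; override them in the environment or edit them here.
MAVEN_HOME="${MAVEN_HOME:-/opt/apache-maven}"
JAVA_HOME="${JAVA_HOME:-/usr/lib/jvm/java}"
SRC_DIR="${SRC_DIR:-$HOME/twitter-source}"

export JAVA_HOME
export PATH="$MAVEN_HOME/bin:$JAVA_HOME/bin:$PATH"

if command -v mvn >/dev/null 2>&1 && [ -d "$SRC_DIR" ]; then
    # Build the jar; Maven writes it under target/ by default.
    (cd "$SRC_DIR" && mvn package)
else
    echo "mvn or $SRC_DIR not found; check the paths above" >&2
fi
```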

  • Copy the jar to the flume/lib/ folder using the command below:

/> cp <jarlocation>/flumejarname.jar /<flume_home>/lib/

  • Add the lib folder to the classpath using the command below:

/> export CLASSPATH=/<flume_home>/lib/*

At this point the source jar is built, Flume is set up with the jar files on the classpath, and the Twitter access keys are configured.

Running Flume to fetch Tweets


Start the Flume agent and redirect its output to a flume_twitteragent.log file, which helps with debugging if an error occurs.
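One way such a command could look is sketched below. It assumes the agent name TwitterAgent from flume.conf and a hypothetical install location in `FLUME_HOME`; `start_twitter_agent` and `LOG_FILE` are illustrative names, and the logger setting is optional.

```shell
# Hypothetical install location; adjust to the real <flume_home>.
FLUME_HOME="${FLUME_HOME:-/usr/lib/flume}"
LOG_FILE=flume_twitteragent.log

start_twitter_agent() {
    # --name must match the agent prefix used in flume.conf (TwitterAgent).
    flume-ng agent \
        --conf "$FLUME_HOME/conf" \
        --conf-file "$FLUME_HOME/conf/flume.conf" \
        --name TwitterAgent \
        -Dflume.root.logger=INFO,console \
        > "$LOG_FILE" 2>&1 &
    echo "Agent started in background with PID $!; output goes to $LOG_FILE"
}
```

Call `start_twitter_agent`, then follow the log with `tail -f flume_twitteragent.log` to watch for errors.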

Now open a browser to see the tweets:

open http://localhost:8000/filebrowser/#/user/tweets

and then browse into the folder.


There are so many tweets listed. Have fun!
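The same files can also be inspected from the command line. A minimal sketch, assuming the `hdfs` client is on the PATH; `list_tweets` and `peek_tweets` are illustrative helper names.

```shell
TWEET_ROOT=/user/tweets

list_tweets() {
    # Recursively list everything the HDFS sink has written so far.
    hdfs dfs -ls -R "$TWEET_ROOT"
}

peek_tweets() {
    # Print the first 500 bytes of one file; pass a path taken from list_tweets.
    hdfs dfs -cat "$1" | head -c 500
}
```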

Follow me for upcoming posts on analyzing these tweets and generating reports from them.