What is Piggybank?
Piggybank is a JAR that contains a collection of user-contributed UDFs (user-defined functions) and is released along with Pig. These UDFs are not included in the Pig JAR, so we have to register Piggybank manually in our script.
1. Download piggybank.jar
2. Copy this jar to /usr/lib/pig/lib
Terminal > sudo cp /home/cloudera/Desktop/piggybank.jar /usr/lib/pig/lib/
3. Register this jar with Pig:
Terminal > pig
Grunt > REGISTER /usr/lib/pig/lib/piggybank.jar;
4. Now we are set to use Piggybank's UDFs, for example to load a CSV file in Pig:
Grunt > tweets = LOAD '/user/cloudera/tweets.csv' USING org.apache.pig.piggybank.storage.CSVExcelStorage() AS (date:chararray, timing:chararray, Tweet_Text:chararray, Type:chararray, Media_Type:chararray, Hashtags:chararray, Tweet_Id:long,
Tweet_Url:chararray, twt_favourites:long, Retweets:long, col1:chararray, col2:chararray);
5. Dump the result:
Grunt > DUMP tweets;
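The same steps can also be kept in a script file instead of being typed at the Grunt prompt. The sketch below is one way to do that, assuming the jar sits in /usr/lib/pig/lib as in step 2 and the CSV file is at the HDFS path used above. The alias sample_tweets and the row count of 10 are made up for illustration. It adds DESCRIBE to check the schema and LIMIT so that DUMP prints only a few rows rather than the whole file.

```pig
-- Register Piggybank by its full local path (location assumed from step 2).
REGISTER /usr/lib/pig/lib/piggybank.jar;

-- Load the CSV with Piggybank's CSVExcelStorage, which handles quoted
-- fields (for example, tweet text that itself contains commas).
tweets = LOAD '/user/cloudera/tweets.csv'
    USING org.apache.pig.piggybank.storage.CSVExcelStorage()
    AS (date:chararray, timing:chararray, Tweet_Text:chararray,
        Type:chararray, Media_Type:chararray, Hashtags:chararray,
        Tweet_Id:long, Tweet_Url:chararray, twt_favourites:long,
        Retweets:long, col1:chararray, col2:chararray);

-- Print the schema Pig has attached to the relation.
DESCRIBE tweets;

-- Keep only a handful of rows so the output stays readable
-- (sample_tweets and the value 10 are illustrative choices).
sample_tweets = LIMIT tweets 10;
DUMP sample_tweets;
```

If this is saved as, say, tweets.pig (a hypothetical file name), it can be run from the terminal with:
Terminal > pig tweets.pig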