hadoop uncompress / process gzip

A Stack Overflow post kindly provides the answer (I used different keywords at first, so it took several hops to find it):

TextInputFormat and its descendants should automatically handle .gz compressed files. You can also implement your own InputFormat (which splits the input file into chunks for processing) and RecordReader (which extracts one record at a time from a chunk).
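To make the quoted answer a bit more concrete, here is a rough plain-Python sketch of the idea: pick a decompressor from the file extension, then hand out one line-record at a time. It is only an analogy and does not use any Hadoop API; the names open_maybe_gzip, read_records and write_sample, and the sample data, are my own assumptions for illustration.

```python
import gzip
import os
import tempfile


def open_maybe_gzip(path):
    """Open path for binary reading, decompressing when the name ends in .gz."""
    # Assumption: the extension alone decides whether to decompress.
    if path.endswith(".gz"):
        return gzip.open(path, "rb")
    return open(path, "rb")


def read_records(path):
    """Yield (offset, line) pairs, one record per line of the decompressed data."""
    offset = 0  # byte position of the current line in the decompressed data
    with open_maybe_gzip(path) as stream:
        for raw in stream:
            yield offset, raw.rstrip(b"\r\n").decode("utf-8")
            offset += len(raw)


def write_sample(directory):
    """Create a small gzip file to read back (hypothetical sample data)."""
    path = os.path.join(directory, "sample.txt.gz")
    with gzip.open(path, "wb") as out:
        out.write(b"alpha\nbeta\ngamma\n")
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for offset, line in read_records(write_sample(tmp)):
            print(offset, line)
```

Note that the whole file is read as one sequential stream here. A gzip file cannot be split at arbitrary points, which is why a single .gz input ends up with only one mapper.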

 

Keywords:

Searched for hadoop process compressed file 3:38pm
compression – Very basic question about… – stackoverflow.com 3:38pm
Searched for hadoop decompress 3:37pm
Parallel LZO: Splittable Compression… – cloudera.com 3:37pm
Searched for hadoop uncompress 3:37pm
Searched for hadoop unzip mapper 3:35pm
Searched for hadoop gzip mapper 3:34pm
Hadoop gzip input file using only one… – stackoverflow.com 3:35pm
Hadoop Streaming – apache.org 3:35pm
Searched for hadoop gzip 3:33pm
java – Hadoop gzip compressed files -… – stackoverflow.com 3:33pm
Hadoop at Twitter (part 1): Splittable… – cloudera.com

About Neil Rubens

see http://ActiveIntelligence.org
