Wednesday, March 26, 2014

Compressing big data files in R

Right now I am working with a big data file (more than 1 million rows and several columns). Let's suppose that the file is named bigdata.txt. Its size is around 128 MB, so copying it into my Dropbox is not a good idea. However, after compressing the file from R, its size dropped dramatically, to 3.5 MB.


To do so, after reading the original file (the big one) into R, just run the following code:


system("gzip bigdata.txt")

The command creates a new file, bigdata.txt.gz, which by default replaces the original bigdata.txt, and you can read it directly with the read.table function.
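
If gzip is not available on the command line, a similar result can probably be had from inside R with a gzfile() connection. This is only a minimal sketch: the small data frame df and the temporary path out_path are made up for illustration and are not part of my original workflow.

```r
# Hypothetical stand-in for the big data set; replace with your own data
df <- data.frame(id = 1:1000, x = rnorm(1000))

# Write a gzip-compressed, tab-separated file through a gzfile() connection
out_path <- file.path(tempdir(), "bigdata.txt.gz")
con <- gzfile(out_path, open = "wt")
write.table(df, con, sep = "\t", row.names = FALSE, quote = FALSE)
close(con)

# read.table() detects the gzip compression and decompresses on the fly
df2 <- read.table(out_path, header = TRUE, sep = "\t")

# Size of the compressed file on disk, in bytes
file.size(out_path)
```

Because the compression happens inside R, this approach does not depend on an external gzip program being installed.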

3 comments:

  1. And what if you read and compress it as .dta or .rdata?

    http://experienceinstatistics.blogspot.com/2010/01/tiempo-de-lectura-rapida-y-comprension_17.html

  2. Gzip is a Linux command. In this case it is invoked by the system function; I am not sure whether it works on other operating systems.

  3. OK, I ran the test and it does not work on Windows. My operating system is OS X on a Mac. So, for Windows, we have Alex's solution.
