Gzip is the ubiquitous compression tool for linux and other *nix based systems. But even given that it is fairly quick, when you are working with a large archive it can take a while. I am sure you have asked yourself the same question I have. How can I speed up gzip compression time?
There are a couple different ways to speed up Gzip compression. Obviously you can get the smallest archives by using the “-9” compression flag. But this takes the longest amount of time.
~/$ gzip -9 file.txt
So switching to the least compression reduces the compression time. But at the cost of not saving as much disk space.
~/$ gzip -1 file.txt
Let’s Really Speed Up Gzip Compression
If you have watched your CPU usage while using Gzip you may have noticed that your CPU is pegged. In the age of multi-core systems, you might notice that only one of your computer or servers cores are pegged out. This is because the Gzip process is only single threaded. So it operates by taking the file(s) that are being compressed one bit at a time and compressing it.
This is obviously not the most efficient practice, especially when you have 2 or more idle cores available on your system. But since Gzip is a single threaded application, there is no way to utilize all those idle cores.
The Best Way To Speed Up Gzip is Not To Use Gzip
There is an alternative that will speed up your Gzip compression. Pigz is a threaded implementation of Gzip. It allows you to still use Gzip compression without having to wait so long. This is especially important when working with a very large archive.
Pigz breaks the compression task in to multiple pieces which allows the process to accelerate the compression x the number of available cores. So if you have four available cores, you can expect the compression to complete in about 1/4th the time. Don’t be worried about using all the CPU resources on your system since you can specify the number of cores to use.
Here is a basic Pigz example with the highest compression:
tar -c /inputDirectory/ | pigz -9 > outputFile.tar.gz
In this example we are using “tar” to “-c” create an archive from the contents of “/inputDirectory/”. The output of “tar” is then piped into the Pigz command which compresses it with the highest compression “-9”. That compressed content is then redirected into the file “outputFile.tar.gz”. By default the command will utilize all the available cores on the system.
We can then take the same command and alter it a bit to reduce it’s resource usage and minimize impact on the system load. While still able to speed up the Gzip compresson.
tar -c /inputDirectory/ | pigz -9 -p2 > outputFile.tar.gz
Using the “-p2” option limits the process to using 2 cores. Changing that option to be “-p3” would limit it to 3 cores, and “-p4” would limit it to 4, etc…
Call Pigz just like Gzip
There are some other ways to call Pigz. You can use it directly like vanilla Gzip.
pigz -9 compressfile.tar
By default the above command will replace the original file with the new compressed file “compressfile.tar.gz”. If you want to keep the original uncompressed file and just create a new file along side it add the “-k” or keep option.
pigz -k -9 compressfile.tar
Or you can use the more common formatting of “tar” just by adding a long form option.
tar cf outputFile.tar.gz --use-compress-prog=pigz inputDirectory/
So there you have the best way to speed up Gzip compression. Hopefully it saves you some time and frustration next time you have a large archive. It might even be able to compress your mysqldump output?