Bzip2 is easily the best compression tool when it comes to speed and archive size. But even given that it is fast, Bzip2 can still seem to take forever to complete the shrinking of an archive. I am sure you have asked yourself the same question I have. How can I speed up Bzip2 Compression time? Wether you are performing a backup, or just archiving some files Bzip2 does a good job.
There are a couple different ways to speed up Bzip2 compression. Obviously you can get the smallest archives by using the “-9” compression flag. But this takes the longest amount of time.
~/$ bzip2 -9 file.txt
So switching to the least compression reduces the compression time. But at the cost of not saving as much disk space.
~/$ bzip2 -1 file.txt
Let’s Really Speed Up Bzip2 Compression
If you have watched your CPU usage while using Bzip2 you have probably noticed that your CPU is pegged. In the age of multi-core systems, you will easily notice that only one of your computer or servers cores are pegged out. This is because the Bzip2 process is only single threaded. So it operates by taking the file(s) that are being compressed one bit at a time and compressing it.
This is obviously not the most efficient practice, especially when you have 2, 4, 6, or more idle cores available on your system. But Bzip2 is a single threaded application, so there is no way to utilize those idle cores.
The Best Way To Use Bzip2 is Not To Use Bzip2
Fortunately there is an alternative that will speed up Bzip2 compression. Pbzip2 is a threaded implementation of Bzip2. It allows you to still use Bzip2 compression without having to wait. This is especially important when working with a very large archive.
Pbzip2 breaks the compression task in to multiple pieces which allows the process to accelerate the compression x the number of available cores. Don’t be worried about using all the CPU resources on your system since you can specify the number of cores to use. Or even load the file completely into RAM before starting compression to speed up the process.
Here is a basic example with the highest compression:
tar -c /inputDirectory/ | pbzip2 -c -9 > outputFile.tar.bz2
In this example we are using “tar” to “-c” create an archive from the contents of “/inputDirectory/”. The output of “tar” is then piped into the Pbzip2 command which compresses it and “-c” outputs to stdout with the highest compression “-9”. That compressed content is then redirected into the file “outputFile.tar.bz2”. By default the command will utilize all the available cores on the system.
We can then take the same command and alter it a bit to reduce it’s resource usage and minimize impact on the system load. While still able to speed up the Bzip2 compresson.
tar -c /inputDirectory/ | pbzip2 -c -9 -p2 -m50 > outputFile.tar.bz2
Using the “-p2” option limits the process to using 2 cores. Changing that option to be “-p3” would limit it to 3 cores, and “-p4” would limit it to 4, etc… The “-m100” option limits the amount of RAM that the process utilizes. Our example shows it is limited to 50MB of RAM.
There are some other ways to call Pbzip2. You can use it directly like vanilla Bzip2.
pbzip2 -9 compressfile.tar
Or you can use the more common formatting of “tar” just by adding a long form option.
tar cf outputFile.tar.bz2 --use-compress-prog=pbzip2 inputDirectory/
So there you have the best way to speed up bzip2 compression. Hopefully it saves you some time and frustration next time you have a large archive. It might even be able to compress your mysqldump output?