How To Speed Up Gzip Compression

Gzip is the ubiquitous compression tool for Linux and other *nix-based systems. It is fairly quick, but when you are working with a large archive it can still take a while. I am sure you have asked yourself the same question I have: how can I speed up Gzip compression time?

There are a couple of different ways to speed up Gzip compression. You get the smallest archives by using the “-9” compression flag, but it also takes the longest amount of time.

 ~/$ gzip -9 file.txt

Switching to the lowest compression level, “-1”, reduces the compression time, but at the cost of not saving as much disk space.

 ~/$ gzip -1 file.txt
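For reference, Gzip’s default is level 6, a middle ground between those two extremes; you can pass it explicitly if you want to be deliberate about it.

 ~/$ gzip -6 file.txt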

Let’s Really Speed Up Gzip Compression

If you have watched your CPU usage while running Gzip you may have noticed that one CPU is pegged. In the age of multi-core systems, you might notice that only one of your computer’s or server’s cores is maxed out. This is because Gzip is single threaded: it works through the file(s) being compressed as one stream, on one core.

This is obviously not the most efficient approach, especially when you have two or more idle cores available on your system. But since Gzip is a single-threaded application, there is no way for it to utilize all those idle cores.

The Best Way To Speed Up Gzip is Not To Use Gzip

There is an alternative that will speed up your Gzip compression. Pigz is a parallel, multi-threaded implementation of Gzip. It lets you keep using Gzip compression without having to wait so long, which is especially important when working with a very large archive.

Pigz breaks the compression task into multiple pieces, which lets the work scale roughly with the number of available cores. So if you have four available cores, you can expect the compression to complete in about a quarter of the time. Don’t worry about consuming all the CPU resources on your system, either, since you can specify the number of cores to use.
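If you want to see the difference on your own hardware, a quick side-by-side timing is easy to run. This is just a sketch: “bigfile.dat” is a placeholder for any large file you have handy, “nproc” assumes GNU coreutils, and streaming to /dev/null keeps the test from touching your original file.

# how many cores pigz has to work with
nproc

# single-threaded gzip, writing to stdout so the original file is untouched
time gzip -9 -c bigfile.dat > /dev/null

# pigz on the same file, using all available cores by default
time pigz -9 -c bigfile.dat > /dev/null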

Here is a basic Pigz example with the highest compression:

tar -cf - /inputDirectory/ | pigz -9 > outputFile.tar.gz

In this example we use “tar” with “-c” to create an archive from the contents of “/inputDirectory/” and “-f -” to write that archive to standard output. The output of “tar” is then piped into Pigz, which compresses it at the highest level, “-9”. That compressed stream is redirected into the file “outputFile.tar.gz”. By default the command will utilize all the available cores on the system.

We can then take the same command and alter it a bit to reduce its resource usage and minimize the impact on system load, while still speeding up the Gzip compression.

tar -cf - /inputDirectory/ | pigz -9 -p2 > outputFile.tar.gz

Using the “-p2” option limits the process to 2 cores. Changing that option to “-p3” would limit it to 3 cores, “-p4” to 4, and so on.
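If you would rather not hard-code the number, you can work it out from the machine at run time. Here is a hypothetical example that leaves half of the cores free for other work (it assumes GNU coreutils for “nproc” and a machine with at least 2 cores):

# use half of the available cores for compression
CORES=$(( $(nproc) / 2 ))
tar -cf - /inputDirectory/ | pigz -9 -p "$CORES" > outputFile.tar.gz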

Call Pigz just like Gzip

There are some other ways to call Pigz. You can use it directly like vanilla Gzip.

pigz -9 compressfile.tar

By default the above command will replace the original file with the new compressed file “compressfile.tar.gz”. If you want to keep the original uncompressed file and just create a new file alongside it, add the “-k” (keep) option.

pigz -k -9 compressfile.tar

Or you can use the more familiar “tar” form just by adding a long option.

tar cf outputFile.tar.gz --use-compress-program=pigz inputDirectory/
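Going the other direction works much the same way. A quick sketch of decompression; note that Pigz only parallelizes compression, so decompression runs mostly on a single core:

# decompress a standalone .gz file in place (pigz also ships an "unpigz" shortcut)
pigz -d compressfile.tar.gz

# extract a tar archive, running the stream back through pigz
tar -xf outputFile.tar.gz --use-compress-program=pigz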

So there you have the best way to speed up Gzip compression. Hopefully it saves you some time and frustration the next time you have a large archive. It can even compress your mysqldump output.

Speed Up Bzip2 Compression

Bzip2 strikes an excellent balance between archive size and speed, typically producing smaller archives than Gzip. But that extra compression comes at a cost, and Bzip2 can seem to take forever to finish shrinking an archive. I am sure you have asked yourself the same question I have: how can I speed up Bzip2 compression time? Whether you are performing a backup or just archiving some files, Bzip2 does a good job.

There are a couple of different ways to speed up Bzip2 compression. You get the smallest archives by using the “-9” compression flag, but it also takes the longest amount of time.

 ~/$ bzip2 -9 file.txt

Switching to the lowest compression level, “-1”, reduces the compression time, but at the cost of not saving as much disk space.

 ~/$ bzip2 -1 file.txt

Let’s Really Speed Up Bzip2 Compression

If you have watched your CPU usage while running Bzip2 you have probably noticed that one CPU is pegged. In the age of multi-core systems, you will easily notice that only one of your computer’s or server’s cores is maxed out. This is because Bzip2 is single threaded: it works through the file(s) being compressed as one stream, on one core.

This is obviously not the most efficient approach, especially when you have 2, 4, 6, or more idle cores available on your system. But Bzip2 is a single-threaded application, so there is no way for it to utilize those idle cores.

The Best Way To Use Bzip2 is Not To Use Bzip2

Fortunately there is an alternative that will speed up Bzip2 compression. Pbzip2 is a parallel, multi-threaded implementation of Bzip2. It lets you keep using Bzip2 compression without having to wait so long, which is especially important when working with a very large archive.

Pbzip2 breaks the compression task into multiple pieces, which lets the work scale roughly with the number of available cores. Don’t worry about consuming all the CPU resources on your system, since you can specify the number of cores to use, or even load the file completely into RAM before starting compression to speed up the process.

Here is a basic example with the highest compression:

tar -cf - /inputDirectory/ | pbzip2 -c -9 > outputFile.tar.bz2

In this example we use “tar” with “-c” to create an archive from the contents of “/inputDirectory/” and “-f -” to write that archive to standard output. The output of “tar” is then piped into Pbzip2, which compresses it at the highest level, “-9”, and writes the result to stdout thanks to “-c”. That compressed stream is redirected into the file “outputFile.tar.bz2”. By default the command will utilize all the available cores on the system.

We can then take the same command and alter it a bit to reduce its resource usage and minimize the impact on system load, while still speeding up the Bzip2 compression.

tar -cf - /inputDirectory/ | pbzip2 -c -9 -p2 -m50 > outputFile.tar.bz2

Using the “-p2” option limits the process to 2 cores; “-p3” would limit it to 3 cores, “-p4” to 4, and so on. The “-m50” option limits the amount of RAM the process utilizes, in this example to 50MB (the default is 100MB).
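The RAM trick mentioned earlier has its own flag. This is a sketch assuming Pbzip2’s “-r” option, which reads the entire input file into memory before splitting it between cores, so only use it on files that comfortably fit in RAM:

# load the whole file into RAM before compressing it in parallel
pbzip2 -r -9 file.txt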

There are some other ways to call Pbzip2. You can use it directly like vanilla Bzip2.

pbzip2 -9 compressfile.tar

Or you can use the more familiar “tar” form just by adding a long option.

tar cf outputFile.tar.bz2 --use-compress-program=pbzip2 inputDirectory/
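Decompression follows the same pattern. A rough sketch; keep in mind that Pbzip2 generally only decompresses in parallel when the archive was created by Pbzip2 in the first place:

# decompress a standalone .bz2 file
pbzip2 -d compressfile.tar.bz2

# extract a tar archive, running the stream back through pbzip2
tar -xf outputFile.tar.bz2 --use-compress-program=pbzip2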

So there you have the best way to speed up Bzip2 compression. Hopefully it saves you some time and frustration the next time you have a large archive. It can even compress your mysqldump output.

How To Compress Mysqldump Output

If you read my previous writeup on dumping all MySQL databases you will recognize some of this information. I wanted to pay specific attention to some of the different methods for compressing mysqldump output.

Obviously, compressing your MySQL database exports can have some major benefits, the biggest being the reduced file size. MySQL databases, and really all databases, have a tendency to grow large. Even small websites can quickly find hundreds of megabytes worth of data in their database. Storing large database export files in your backups can eat up disk space rapidly, and compressing your mysqldump output can reduce the size of an export file by a factor of seven or more.

If you need to keep individual database backups then compression really makes sense. But if you are using something like rdiff-backup then it makes more sense to skip the compression. Rdiff-backup is unable to do a diff on the compressed data, so it won’t save the space you expect.

Basic Mysqldump Compression Commands

Here are a few different variations of mysqldump piped into compression commands, which we will break down.

1: mysqldump -u dbUser -p DBName > OutputFile.sql
2: mysqldump -u dbUser -p DBName | gzip > OutputFile.sql.gz
3: mysqldump -u dbUser -p DBName | gzip -9 > OutputFile.sql.gz
4: mysqldump -u dbUser -p DBName | zip > OutputFile.sql.zip
5: mysqldump -u dbUser -p DBName | bzip2 > OutputFile.sql.bz2

In these examples the same database is exported by each command, but there are a few differences: #1 uses no compression, #2 uses gzip with its default settings, #3 uses gzip with maximum compression, #4 uses zip to perform its compression, and finally #5 uses bzip2 to perform its compression.
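For completeness, restoring from one of the compressed dumps just reverses the pipe. A quick sketch using the gzip variant (the bzip2 file works the same way with bunzip2), with the same “dbUser” and “DBName” placeholders as above:

gunzip < OutputFile.sql.gz | mysql -u dbUser -p DBName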

Compression Commands Comparison

Testing the commands above on the same database and on the same hardware yielded the following results.

Command   File size   Output time
#1        391MB       13.827s
#2        57MB        16.122s
#3        55MB        32.357s
#4        57MB        16.169s
#5        44MB        1m 18.701s

Output MySQL database command results

The table above shows the effectiveness of each compression method on the same dataset. The first command sets the baseline for data export with no compression. Gzip applies basic compression and gives a significant size reduction with a very small speed hit. It comes in just a hair faster than zip with about the same compression results.

Adding the -9 to the Gzip command in #3 doubles the output time and only provides 2MB of additional space savings. Bzip2, in command #5, weighs in at an extra minute over Gzip or Zip; that extra minute is what it takes to pack the file small enough to save another 13MB of space.

Compress Mysqldump Output Conclusions

If you can compress your database output, then you will see significant space savings in your backup storage. Even if backup speed is essential, gzip or zip offer a major reduction in size for minimal extra time. And if time is not a major issue then going with bzip2 will give you much larger space savings in exchange.

Understanding and utilizing compression as part of your backup methodology is an essential element of storage success. Proper implementation ensures that you save the needed space and reduce backup transfer time, especially in the event that you need to transfer your backup over a slow network connection; compression will come to your aid and save the day. So don’t hesitate to compress mysqldump output, it might be just what the doctor ordered.

Further Reading

For additional details and info, check out this post, which talks more about compressing mysqldump output.

Dump All MySQL Databases into Individual SQL Files

Anyone who is responsible for managing a MySQL database will eventually run into this problem: you either need to dump all your MySQL databases for a backup, or to prepare for an upgrade. Whatever your circumstances, there are several different methods that can be employed to dump your MySQL DBs. Hopefully I can give you the basic tools to get you on your way.

Dump All DBs into a Single SQL File

This first method is what I would call the quick and dirty method. It is straightforward and just dumps all of the databases on a server into a single SQL file. For many individuals this is sufficient, but it may not be a good option for larger databases or backup/restore processes, since all of the DBs are bundled into a single file. Each of the values in brackets “[]” is a placeholder for your own values.

mysqldump -u [username] -p --all-databases > allDB.sql

If all you need is a quick backup of everything, this may be your ticket. The “--all-databases” flag does the magic here, dumping all of the MySQL databases into a single SQL file. More details on this method can be found here.
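Restoring from that single file is just the reverse: mysql replays the SQL statements from standard input (same “[username]” placeholder as above).

mysql -u [username] -p < allDB.sql

But if you want to be able to easily restore an individual database, you may want to use an approach like the one below.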

Dump All MySQL Databases into Individual Files

for I in $(mysql -u [username] -p[mypassword] -h [Hostname/IP] -e 'show databases' -s --skip-column-names); do mysqldump -u [username] -p[mypassword] -h [Hostname/IP] $I > "/home/user/$I.sql"; done

This command first calls “mysql” and gets a list of databases on the server. The command feeds that list of MySQL databases into “mysqldump” to get an individual SQL file for each database on the server. Finally those SQL files are then saved in the location indicated, “/home/user/dbname.sql” in this example.

A few things to note about this command. First, you will notice that I have included the password directly in it. MySQL will complain that using the password on the command line is not safe, but it is needed here so the loop can run without prompting you for a password for every database. The “-p” is immediately followed by the password, without a space; that is how the option works, and if you add a space it will not work.

The “$I” is a variable that holds each database name in turn as the loop iterates through your list of DBs. As you modify the command to fit your specific setup, just make sure to keep it consistent in both places it appears.
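If you would rather keep the password off the command line entirely, one common alternative is a client options file. This is a sketch assuming a “~/.my.cnf” that only you can read; both “mysql” and “mysqldump” will pick up the credentials from it automatically:

# ~/.my.cnf (chmod 600 so only you can read it)
[client]
user=[username]
password=[mypassword]

With that file in place you can drop the “-u” and “-p” options from the loop, and the password warning goes away.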

Additional Mysqldump Options

There are a couple of additional mysqldump options that you may want to add depending on your requirements.

--single-transaction

This option keeps mysqldump from trying to take a full lock on the database. It tells mysqldump to dump the DB contents as a consistent snapshot inside a single transaction (this applies to transactional storage engines like InnoDB). It can be especially helpful when you are dumping a DB that is especially large, or on a very busy server; without this option mysqldump may time out while waiting to get a lock on the DB. For more details on how this option works check out this post.

--default-character-set=utf8mb4 

If you are working with data that may contain emoticons or emoji you will want this flag. It ensures that your dumped SQL file carries the correct character set information to recreate them when the file gets reimported. Without this option, your emoticons will show up as strange/random text characters when you restore your backup.
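Putting those options together with a basic dump looks something like this (just a sketch, using the same bracketed placeholders as before):

mysqldump -u [username] -p --single-transaction --default-character-set=utf8mb4 [DBName] > [DBName].sql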

Adding Compression

I don’t typically use compression on my SQL dumps since I like to use rdiff-backup as my backup mechanism. But for those who would like to compress their mysqldump in one step here is the basic gist of it.

mysqldump -u [username] -p --all-databases | gzip > allDB.sql.gz

You just pipe the output from the mysqldump command into gzip, or bzip2, to compress the contents. That can easily be added to the command above to dump all MySQL databases into individual compressed files, like so.

for I in $(mysql -u [username] -p[mypassword] -h [Hostname/IP] -e 'show databases' -s --skip-column-names); do mysqldump -u [username] -p[mypassword] -h [Hostname/IP] $I | gzip > "/home/user/$I.sql.gz"; done

Hopefully these examples help you get the backups you need. Have some fun along the way.