Change Your Hostname in CentOS 8

Changing your computer's or server's hostname is an infrequent activity for most. But if you are like me, you will periodically provision a VM in haste, and only realize after the provisioning is complete that you should have used a more descriptive hostname, or chosen one that fits the theme of your other servers (Middle Earth, Stormlight Archive, Planets, etc.). Sometimes that process can be tedious and leave you questioning whether you got it right. Fortunately it is easy to change your hostname in CentOS 8.

The ever useful “hostnamectl” command makes this a simple process. If you execute the command with no options it will give you the current hostname as well as many details about the system.

[bdoga@host ~]$ hostnamectl
   Static hostname: host.bdoga.local
         Icon name: computer-vm
           Chassis: vm
        Machine ID: b1ce9c049f6d4a9589ad540ae9aa1c43
           Boot ID: 1906ec0120c246aa84bd407e46a237b6
    Virtualization: kvm
  Operating System: CentOS Linux 8 (Core)
       CPE OS Name: cpe:/o:centos:centos:8
            Kernel: Linux 4.18.0-147.8.1.el8.lve.1.x86_64
      Architecture: x86-64

Change Your Hostname in CentOS 8

As shown in the example above, this server's hostname is “host.bdoga.local”. But I am ready for a change, and want to start naming my servers after Stormlight Archive characters. One of my favorites is Kaladin, and I want this server on my full domain “bdoga.com”. So to change the hostname to “kaladin.bdoga.com” I would issue the following command.

[bdoga@host ~]$ sudo hostnamectl set-hostname kaladin.bdoga.com

After issuing the command you will not see any sort of confirmation. You should just be greeted with an empty command prompt showing your new hostname. Depending on your shell, the prompt may not pick up the new name until you start a new session.

[bdoga@host ~]$ sudo hostnamectl set-hostname kaladin.bdoga.com
[bdoga@kaladin ~]$
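Since hostnamectl gives no confirmation, it can be worth checking a candidate name before applying it. Here is a minimal sketch, assuming the usual RFC 1123 rules (labels made of letters, digits and hyphens, 1 to 63 characters each, no leading or trailing hyphen, 253 characters in total). The is_valid_hostname function is a hypothetical helper for illustration, not part of hostnamectl.

```shell
# Hypothetical helper: succeed only if $1 looks like a valid hostname.
is_valid_hostname() {
    # One label: starts and ends with a letter or digit, hyphens allowed
    # in between, at most 63 characters long.
    _label='[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?'
    # The whole name may not exceed 253 characters.
    [ "${#1}" -le 253 ] || return 1
    # One or more labels separated by single dots.
    printf '%s\n' "$1" | grep -Eq "^${_label}(\.${_label})*\$"
}
```

You could then guard the change with something like: is_valid_hostname "$new_name" && sudo hostnamectl set-hostname "$new_name"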

And there you have it, you have changed your hostname in CentOS 8. This method should also work for Ubuntu 16.04+, Debian 8.0+, CentOS 7+, and other systemd-based systems.

To learn some more details about this and other tools for changing your hostname on CentOS 8 please visit linuxize’s post.

And feel free to check out some more of our content regarding CentOS based systems. Or visit some of our posts that will help you increase your Command Line prowess.

Cron Time String Modifiers

Cron is one of the most useful elements of any *nix based system. It gives you an easy interface to run any command on a periodic basis with down-to-the-minute granularity. As a systems administrator or systems user you will find yourself using cron to schedule tasks on a regular basis. But to get the best granularity you may need to use the full list of available time string modifiers. This will ensure your process only runs when you absolutely need it to.

The Cron Time String Format

The cron time string has a simple format with five fields: Minute / Hour / Day of the Month / Month / Day of the Week. For a full rundown on proper cron time string formatting please visit this post.

Cron Time String Modifier List

Here is a full list of the available modifiers for your cron time string:

Modifier   Purpose
*          Matches All Values
-          Specify a range of values
,          Specify a list of values
/          Skip a given number of values
Cron time string modifiers

Cron Scheduling Examples With Modifiers

You can use modifiers to match some pretty specific time intervals for scheduling your process. If you wanted to run a process at noon on the first day of every 3rd month (January, April, July and October) you would write your cron time string like this.

0 12 1 */3 *

As another example, for a process that you want to run every 15 minutes during the 2AM through 5AM hours (2:00 through 5:45) every Monday, Wednesday and Friday, you would format your cron string like this.

*/15 2-5 * * 1,3,5
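To see exactly which values a field with modifiers matches, here is a small sketch that expands a single numeric field into its list of values. It is only an illustration of the modifier rules, not how cron itself is implemented; expand_cron_field is a hypothetical function, and it only handles numbers (not names like jan or mon). The second and third arguments are the lowest and highest value allowed for that field.

```shell
# Hypothetical helper: expand_cron_field FIELD MIN MAX
# The body runs in a subshell so the IFS and globbing changes stay local.
expand_cron_field() (
    field=$1 min=$2 max=$3
    set -f              # keep a bare * from being expanded to file names
    IFS=,
    set -- $field       # split the comma separated list into parts
    out=
    for part in "$@"; do
        step=1
        case $part in
            */*) step=${part#*/}; part=${part%%/*} ;;   # value/step
        esac
        case $part in
            '*') lo=$min; hi=$max ;;                    # every value
            *-*) lo=${part%-*}; hi=${part#*-} ;;        # range
            *)   lo=$part; hi=$part ;;                  # single value
        esac
        v=$lo
        while [ "$v" -le "$hi" ]; do
            out="$out $v"
            v=$((v + step))
        done
    done
    echo "${out# }"
)

expand_cron_field '*/15' 0 59    # minutes: 0 15 30 45
expand_cron_field '2-5' 0 23     # hours: 2 3 4 5
expand_cron_field '*/3' 1 12     # months: 1 4 7 10
```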

So now you know how to format your cron time string so that you can easily set your process to happen whenever you need it to run.

For more details on this topic visit this post on Formatting your Cron String or for help setting up your cron time string you can use the Crontab Guru’s interactive interface to create your time string.

And if you like this post check out one of our other posts on how to Speed Up Gzip Compression

Cron Time String Format

Cron is one of the most useful elements of any *nix based system. It gives you an easy interface to run any command on a periodic basis with down-to-the-minute granularity. As a systems administrator or systems user you will find yourself using cron to schedule tasks on a regular basis. But as useful as cron is, and as frequently as it gets used, I regularly need a reference for the cron time string format. Hopefully this simple reference will help you and me remember the format for future scheduled processes.

The Cron Time String Format

The cron time string has a simple format with five fields: Minute / Hour / Day of the Month / Month / Day of the Week

Position   Description        Usable Values
1          Minute             0 – 59, or * (Every Minute)
2          Hour               0 – 23, or * (Every Hour)
3          Day of the Month   1 – 31, or * (Every Day of the Month)
4          Month              1 – 12, jan – dec, JAN – DEC, or * (Every Month)
5          Day of the Week    0 – 7, sun – sat, SUN – SAT (0 and 7 both equal Sunday), or * (Every Day of the Week)
Cron Time String Values

Cron Scheduling Examples

So given the values above, a service that you would like to have run at 1:35 AM every day would be formatted like this.

35 1 * * *

But maybe we just want that process to run once a week. We can modify the string and just add a value for the day of the week you want it to run. So for a process that you want to run every Thursday at 11:50PM you would format it like so.

50 23 * * 4
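In a real crontab the five time fields are followed by the command to run. As a rough sketch of how such a line breaks down, the hypothetical describe_cron_line function below splits a line into the five fields plus the command. The backup.sh path in the usage example is made up for illustration.

```shell
# Hypothetical helper: label the five time fields and the command of a
# crontab line passed as a single argument.
describe_cron_line() {
    printf '%s\n' "$1" | {
        # read splits on whitespace; the last variable gets the rest of the line.
        read -r minute hour dom month dow command
        printf 'minute=%s hour=%s day-of-month=%s month=%s day-of-week=%s command=%s\n' \
            "$minute" "$hour" "$dom" "$month" "$dow" "$command"
    }
}

describe_cron_line '50 23 * * 4 /usr/local/bin/backup.sh --full'
```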

But you can then use modifiers to match some pretty specific time intervals for scheduling your process. The accepted modifiers are ‘/’ (skip a given number of values: */3 for every 3rd, */10 for every 10th), ‘,’ (for a list of acceptable values), and ‘-’ (for a range of values). So if you wanted to run a process at noon on the first day of every 3rd month (January, April, July and October) you would write your cron time string like this.

0 12 1 */3 *

As another example, for a process that you want to run every 15 minutes during the 2AM through 5AM hours (2:00 through 5:45) every Monday, Wednesday and Friday, you would format your cron string like this.

*/15 2-5 * * 1,3,5

So now you know how to format your cron time string so that you can easily set your process to happen whenever you need it to run.

For more details on this topic visit this post on Formatting your Cron String or for help setting up your cron time string you can use the Crontab Guru’s interactive interface to create your time string.

And if you like this post check out one of our other posts on how to fix an APT NO_PUBKEY Error.

Fix Apt NO_PUBKEY Error

If you have used Debian, Ubuntu, Mint or any other Linux distribution that uses the APT package management system, you are sure to have run into the NO_PUBKEY error. It can be frustrating, but fortunately it is easy to fix the APT NO_PUBKEY error and get your system back up and ready to roll.

What is the NO_PUBKEY error?

The APT NO_PUBKEY error shows up when your system does not have the public key needed to verify the signature of one of your APT repositories, typically because the repository's key pair has changed or the key was never imported. Without the correct public key your local system or server cannot verify the repository, and therefore you get the error. This process is in place to ensure you don’t accidentally download packages from an unknown APT source.

Fix the NO_PUBKEY error

There is a simple command that you can run to download the missing public key from one of the APT key servers. You will just need to replace the portion of the command that says “THE_MISSING_KEY_HERE” with the key that is reported in the error.

sudo apt-key adv --keyserver hkp://pool.sks-keyservers.net:80 --recv-keys THE_MISSING_KEY_HERE

So if you receive the following error

W: Failed to fetch http://ppa.launchpad.net/myrepository/apps/ubuntu/dists/bionic/InRelease The following signatures couldn't be verified because the public key is not available: NO_PUBKEY EA8CACC073C3DB2A

you would run the following command to get the working public key for the apt repository.

sudo apt-key adv --keyserver hkp://pool.sks-keyservers.net:80 --recv-keys EA8CACC073C3DB2A

After the key has been updated you can then run your “apt update” and it should complete successfully.

Fix Multiple Keys with One Command

The following command can be used to fix multiple NO_PUBKEY errors in one go. It can also be used to fix a single NO_PUBKEY error without having to edit the command. It might be overkill but will still get the job done.

sudo apt update 2>&1 1>/dev/null | sed -ne 's/.*NO_PUBKEY //p' | while read key; do if ! [[ ${keys[*]} =~ "$key" ]]; then sudo apt-key adv --keyserver hkp://pool.sks-keyservers.net:80 --recv-keys "$key"; keys+=("$key"); fi; done
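The core of that one-liner is pulling the key IDs out of the apt output. Here is a small sketch of just that step, assuming the error format shown earlier. The extract_missing_keys function is a hypothetical helper; it only prints the IDs and leaves fetching them to whichever key server and command you use. Unlike the sed expression above, it also picks up more than one NO_PUBKEY on the same line.

```shell
# Hypothetical helper: read apt output on stdin and print each missing
# key ID once, one per line.
extract_missing_keys() {
    grep -o 'NO_PUBKEY [0-9A-Fa-f][0-9A-Fa-f]*' |   # every "NO_PUBKEY <id>" match
        awk '{ print $2 }' |                         # keep only the ID
        sort -u                                      # drop duplicates
}
```

For example, piping in the output of sudo apt update 2>&1 should list every key that apt complained about.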

So now you know how to fix the APT NO_PUBKEY error. This will keep you up and running, and ensure that you don’t fall behind on your package updates.

For additional details check out Linux Uprising’s article about fixing NO_PUBKEY errors.

If you like this post, you might also like my post about how to Recursively Count the number of folders in a directory.

Recursively Count the Number of Files in a Directory

Why would you want to recursively count the number of files or folders in a directory? There could be a lot of different reasons. In my case, I had a client that repeatedly added new directories to a folder. Some of those directories had unique contents, and some were copies of other folders. The folders contained text documents, zip files, images, database files; you name it, it was in there. Running a recursive ‘du’ command on the root folder showed a size of approximately 50GB, and it was obvious that there were thousands of folders and subfolders to check.

One might think of trying to use ‘ls’ (list) to count the number of files in a directory. But running an ‘ls’ command alone will only show you the entries in the directory. It won’t count them for you. You can pair it with the ‘wc’ (word count) command and get a count of the number of lines returned. Using a command like this will give you the number of entries in your current working directory:

ls -1 | wc -l

But that will only give us the number of files and folders directly inside the current directory. It will not give you an accurate picture of the number of files or folders in subfolders of your current working directory.

How To Recursively Count the Number of Files in a Directory

So since the “ls” command won’t give us a recursive listing of files or folders we will have to turn to the “find” utility to fulfill that requirement. Find searches recursively through a directory tree to find specific filenames or attributes you want to search for. We can use its versatility to fulfill the searching requirement of our command. For example the following command will search recursively through your current directory tree to hunt for all files and return a list of those files.

find . -type f

And likewise you can do the same to specify searching for only directories.

find . -type d

Or removing the “-type” option will return all files and folders in this folder and its children.

find .

So now that we have the list of all folders or files in this directory and its subdirectories we can count them up by adding our old friend “wc” again. Thus with a command like this we can get a count of all the files in your current working directory and its children:

find . -type f | wc -l

or for directories only (note that this count includes the starting directory “.” itself):

find . -type d | wc -l
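If you find yourself running both counts often, they can be wrapped in a small function. This is just a sketch: count_entries is a hypothetical name, and it assumes a find that supports -mindepth (GNU and BSD find both do), which is used here so the starting directory itself is not counted. File names that contain newlines would be counted more than once.

```shell
# Hypothetical helper: count_entries DIR TYPE
# TYPE is the letter find expects: f for files, d for directories.
count_entries() {
    find "$1" -mindepth 1 -type "$2" | wc -l | tr -d ' '
}

count_entries . f    # files below the current directory
count_entries . d    # directories below the current directory
```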

Now you can quickly count the files and folders in a given directory to easily assess how many files you are dealing with.

A special thanks to these sites that I referenced when researching this topic myself. They may have some more details for you. You can visit those sites Here and Here.

How To Speed Up Gzip Compression

Gzip is the ubiquitous compression tool for Linux and other *nix based systems. But even though it is fairly quick, working with a large archive can take a while. I am sure you have asked yourself the same question I have: how can I speed up gzip compression time?

There are a couple different ways to speed up Gzip compression. Obviously you can get the smallest archives by using the “-9” compression flag. But this takes the longest amount of time.

 ~/$ gzip -9 file.txt

So switching to the least compression reduces the compression time, but at the cost of not saving as much disk space.

 ~/$ gzip -1 file.txt

Let’s Really Speed Up Gzip Compression

If you have watched your CPU usage while using Gzip you may have noticed that your CPU is pegged. In the age of multi-core systems, you might notice that only one of your computer's or server's cores is pegged. This is because the Gzip process is single threaded. It works through the file(s) being compressed one piece at a time, on a single core.

This is obviously not the most efficient practice, especially when you have 2 or more idle cores available on your system. But since Gzip is a single threaded application, there is no way to utilize all those idle cores.

The Best Way To Speed Up Gzip is Not To Use Gzip

There is an alternative that will speed up your Gzip compression. Pigz is a threaded implementation of Gzip. It allows you to still use Gzip compression without having to wait so long. This is especially important when working with a very large archive.

Pigz breaks the compression task into multiple pieces, which lets it spread the work across the available cores. So if you have four available cores, you can expect the compression to complete in roughly a quarter of the time. Don’t be worried about using all the CPU resources on your system, since you can specify the number of cores to use.

Here is a basic Pigz example with the highest compression:

tar -c /inputDirectory/ | pigz -9 > outputFile.tar.gz

In this example we are using “tar” to “-c” create an archive from the contents of “/inputDirectory/”. The output of “tar” is then piped into the Pigz command which compresses it with the highest compression “-9”. That compressed content is then redirected into the file “outputFile.tar.gz”. By default the command will utilize all the available cores on the system.

We can then take the same command and alter it a bit to reduce its resource usage and minimize the impact on the system load, while still speeding up the Gzip compression.

tar -c /inputDirectory/ | pigz -9 -p2 > outputFile.tar.gz

Using the “-p2” option limits the process to using 2 cores. Changing that option to be “-p3” would limit it to 3 cores, and “-p4” would limit it to 4, etc…
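Not every machine has Pigz installed, so a script that should run anywhere may want to fall back to plain Gzip. A minimal sketch, assuming both tools accept the same basic flags (they do for -c and the -1 to -9 levels); pick_gzip_tool is a hypothetical helper name.

```shell
# Hypothetical helper: print "pigz" when it is installed, otherwise "gzip".
pick_gzip_tool() {
    if command -v pigz >/dev/null 2>&1; then
        echo pigz
    else
        echo gzip
    fi
}
```

It could then be dropped into the earlier pipeline, for example: tar -c /inputDirectory/ | "$(pick_gzip_tool)" -9 > outputFile.tar.gz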

Call Pigz just like Gzip

There are some other ways to call Pigz. You can use it directly like vanilla Gzip.

pigz -9 compressfile.tar

By default the above command will replace the original file with the new compressed file “compressfile.tar.gz”. If you want to keep the original uncompressed file and just create a new file alongside it, add the “-k” (keep) option.

pigz -k -9 compressfile.tar

Or you can use the more common formatting of “tar” just by adding a long form option.

tar cf outputFile.tar.gz --use-compress-prog=pigz inputDirectory/

So there you have the best way to speed up Gzip compression. Hopefully it saves you some time and frustration next time you have a large archive. It might even be able to compress your mysqldump output?

Change the SNMP Log Level in Ubuntu

The default SNMP settings for an Ubuntu server can end up filling your syslog file with tons of unnecessary entries. This makes it virtually impossible to sift through for anything which is actually useful. So it can be very advantageous to change the SNMP log level in Ubuntu.

I have a Cacti setup which I use to log and report on the details of many Linux and Windows servers. This tool is amazing, and really gives me some great information to diagnose issues, or to catch issues as they are progressing but before they become urgent. Sometimes it is just easier to see something when your data is represented visually.

Cacti relies upon SNMP as the technology to grab data from the machines or devices that it is monitoring. SNMP is an industry standard, supported by all major operating systems and network enabled devices. But by default, at least in Ubuntu, the log level is set so high that every SNMP request that comes to the server is reported in your syslog file. Cacti polls lots of different SNMP records to build its graphs. Under those default settings it can leave dozens of entries in the syslog every 5 minutes. As you can imagine, this can quickly fill up your log file and make it virtually unusable. Fortunately we just need to make a quick adjustment in order to change the SNMP log level in Ubuntu. Here is a quick example of some of the syslog entries that you may be receiving.

Jul 8 06:28:48 server snmpd[7885]: error on subcontainer 'ia_addr' insert (-1)
Jul 8 06:29:18 server snmpd[7885]: error on subcontainer 'ia_addr' insert (-1)
Jul 8 06:29:48 server snmpd[7885]: error on subcontainer 'ia_addr' insert (-1)
Jul 8 06:30:02 server snmpd[7885]: Connection from UDP: [Originating IP]:41028->[Current Host IP]:161
Jul 8 06:30:02 server snmpd[7885]: Connection from UDP: [Originating IP]:48694->[Current Host IP]:161
Jul 8 06:30:02 server snmpd[7885]: Connection from UDP: [Originating IP]:39372->[Current Host IP]:161
Jul 8 06:30:02 server snmpd[7885]: Connection from UDP: [Originating IP]:54823->[Current Host IP]:161

Change the SNMP Log Level in Ubuntu

The change is just a quick flag in the /etc/default/snmpd file which changes how the system logs SNMP requests. The different log levels that are available are:

0 or ! for LOG_EMERG
1 or a for LOG_ALERT
2 or c for LOG_CRIT
3 or e for LOG_ERR
4 or w for LOG_WARNING
5 or n for LOG_NOTICE
6 or i for LOG_INFO
7 or d for LOG_DEBUG

By default a log level is not set, so it is dumping at either the info or debug level. I prefer to switch it to level 3 (Error), which ensures that I still see any errors that come through but doesn’t tell me every time a connection is made. This change can be made very easily: just open up the /etc/default/snmpd file in your favorite editor and change the following line (Ubuntu 14.04 and 16.04).

SNMPDOPTS='-Lsd -Lf /dev/null -u snmp -g snmp -I -smux,mteTrigger,mteTriggerConf -p /run/snmpd.pid'

To look like this:

SNMPDOPTS='-LS3d -Lf /dev/null -u snmp -g snmp -I -smux,mteTrigger,mteTriggerConf -p /run/snmpd.pid'

The only part that changed was the “-Lsd” flag, which became “-LS3d”. The default entry is a little different between 14.04/16.04, 18.04 and 20.04, so I have included a single command for each that you can copy/paste into your terminal to make the change. Run them as root, or prefix them with sudo.

Copy/Paste Command Line Changes

For Ubuntu 14.04 and 16.04:

sed -i -- "s@SNMPDOPTS='-Lsd -Lf /dev/null -u snmp -g snmp -I -smux,mteTrigger,mteTriggerConf -p /run/snmpd.pid'@SNMPDOPTS='-LS3d -Lf /dev/null -u snmp -g snmp -I -smux,mteTrigger,mteTriggerConf -p /run/snmpd.pid'@g" /etc/default/snmpd
service snmpd restart

In Ubuntu 18.04:

sed -i -- "s@SNMPDOPTS='-Lsd -Lf /dev/null -u Debian-snmp -g Debian-snmp -I -smux,mteTrigger,mteTriggerConf -p /run/snmpd.pid'@SNMPDOPTS='-LS3d -Lf /dev/null -u Debian-snmp -g Debian-snmp -I -smux,mteTrigger,mteTriggerConf -p /run/snmpd.pid'@g" /etc/default/snmpd
service snmpd restart

Finally Ubuntu 20.04:

sed -i -- "s@#SNMPDOPTS='-LSwd -Lf /dev/null -u Debian-snmp -g Debian-snmp -I -smux,mteTrigger,mteTriggerConf -p /run/snmpd.pid'@SNMPDOPTS='-LS3d -Lf /dev/null -u Debian-snmp -g Debian-snmp -I -smux,mteTrigger,mteTriggerConf -p /run/snmpd.pid'@g" /etc/default/snmpd
service snmpd restart
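The three commands above differ only in the exact default line they match. As an alternative, here is a looser sketch that rewrites just the logging flag and leaves the rest of the line alone. It assumes the defaults shown above (“-Lsd” or a commented out “-LSwd”); set_snmpd_loglevel is a hypothetical helper, and it reads from stdin and writes to stdout so nothing is changed in place.

```shell
# Hypothetical helper: rewrite the SNMPDOPTS logging flag to -LS3d.
# Input: the contents of /etc/default/snmpd on stdin. Output: edited copy.
set_snmpd_loglevel() {
    # ^#?        an optional leading comment marker (the 20.04 default)
    # -L(sd|Swd) the default logging flag on the releases covered above
    sed -E "s/^#?(SNMPDOPTS=')-L(sd|Swd) /\1-LS3d /"
}
```

One way to use it would be to write the result to a temporary file (set_snmpd_loglevel < /etc/default/snmpd > /tmp/snmpd.new), compare it with the original using diff, and only then copy it into place and restart snmpd.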

So there you go, now you can stop those annoying log messages from filling up your syslog file. A big thanks to this ServerFault post on the subject for helping me figure it out.

Make a Full Disk Backup with DD

Recently I had a drive that was showing the early warning signs of failure. So I decided I had better make a backup copy of the drive, and then push that image onto another drive before it failed. As it turned out the drive was fine; it was the SATA cable that was failing. But the process reminded me of what a useful tool dd is, refreshed my knowledge of how to use it, and reminded me how to make a full disk backup with dd.

What is DD?

DD stands for “Data Definition”, and it has been around since about 1974. It can be used to read, write and convert data between files, partitions and other block level devices. As a result dd can be used effectively for copying the content of a partition, obtaining a fixed amount of random data from /dev/random, or performing a byte order transformation on data.

So Let’s Make a Full Disk Backup with DD

I will start with the command I used to make a full disk backup with dd. And then give you a breakdown of the different command elements to help you understand what it is doing.

dd if=/dev/sdc conv=sync,noerror status=progress bs=64K | gzip -c > backup_image.img.gz

The command options break down like this:

if=/dev/sdc this defines the “input file” which in this case is the full drive “/dev/sdc”. You could do the same with a single partition like “/dev/sdc1”, but I want all the partitions on the drive stored in the same image.

conv=sync,noerror the “noerror” portion prevents dd from stopping when a read error is encountered. The “sync” part tells dd to pad each incomplete block with nulls, so that if a block cannot be read in full, the data that follows still lands at its original offset in the image. The “sync” and “noerror” options are almost always used together.

status=progress tells the command to regularly give an update on how much data has been copied. Without this option the command will still run but it won’t give any output until the command is complete. So making a backup of a very large drive could sit for hours before letting you know it is done. With this option a line like this is constantly updated to let you know how far along the process has gone.

1993998336 bytes (2.0 GB, 1.9 GiB) copied, 59.5038 s, 33.5 MB/s

bs=64K specifies that the “Block Size” of each chunk of data processed will be 64 Kilobytes. The block size can greatly affect the speed of the copy process. A larger block size will typically accelerate the copy process unless the block size is so large that it overwhelms the amount of RAM on your computer.

Making a compressed backup image file

At this point you could use the “of=/dev/sdb” option to output the contents directly to another drive /dev/sdb. But I opted to make an image file of the drive, and piping the dd output through gzip allowed me to compress the resulting image into a much smaller image file.

| gzip -c pipes the output of dd into the gzip command and writes the compressed data to stdout. Other options could be added here to change the compression ratio, but the default compression was sufficient for my needs.

> backup_image.img.gz redirects the output of the gzip command into the backup_image.img.gz file.

With that command complete I had copied my 115GB drive into a 585MB compressed image. Most of the drive had been empty space, but without the compression the image would have been 115GB. So this approach can make a lot of sense if you are planning on keeping the image around. If you are just copying from one drive to another then no compression is needed.

So there you have it, the process of making a full disk backup with dd. But I guess that is only half the story, so now I will share the command I used to restore that image file to another drive with dd.

Restoring a Full Drive Backup with DD

Fortunately the dd restore process is a bit more straightforward than the backup process. So without further ado here is the command.

gunzip -c backup_image.img.gz | dd of=/dev/sdc status=progress

gunzip -c backup_image.img.gz right off the bat “gunzip” starts decompressing the file “backup_image.img.gz” and the “-c” sends the decompressed output to stdout.

| dd of=/dev/sdc pipes the output from gunzip into the dd command which is only specifying the “output file” of “/dev/sdc”.

status=progress again this option displays some useful stats about how the dd process is proceeding.

Once dd has completed the transfer you should be good to go. But there are a few caveats to remember. First, the drive you restore to should be the same size or larger than the backup drive. Second, if the restore drive is larger, you will end up with unused space after the restore is complete. For example, a 115GB image restored to a 200GB drive will result in the first 115GB of the drive being usable, and 85GB of unallocated space at the end of the drive. So you may want to expand the restored partition(s) to fill up the extra space on the new drive with parted, or a similar tool. Lastly, if you use a smaller drive for the restore, dd will not warn you that the image won’t fit. It will just start copying and will fail when it runs out of space.
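Before pointing either command at a real drive, it can help to rehearse the round trip on an ordinary file. The following is a rough sketch, assuming a scratch directory passed as the first argument. The function name and the file names inside it are made up for illustration, and bs=65536 is simply 64K written out in bytes.

```shell
# Hypothetical rehearsal: back up a stand-in "disk" file, restore it,
# and confirm the restored copy is identical.
backup_and_restore_demo() {
    workdir=$1
    # Stand-in disk: 2048 lines of 64 bytes each = 131072 bytes (two 64K blocks).
    awk 'BEGIN { for (i = 0; i < 2048; i++) printf "%063d\n", i }' > "$workdir/disk.img"
    # Same shape as the backup command: read in 64K blocks, compress the stream.
    dd if="$workdir/disk.img" conv=sync,noerror bs=65536 2>/dev/null |
        gzip -c > "$workdir/backup_image.img.gz"
    # Same shape as the restore command: decompress and write out with dd.
    gunzip -c "$workdir/backup_image.img.gz" |
        dd of="$workdir/restored.img" bs=65536 2>/dev/null
    # cmp exits 0 only when the two files are byte for byte identical.
    cmp "$workdir/disk.img" "$workdir/restored.img"
}
```

The same cmp check is a reasonable habit after a real restore too, as long as the source drive is still readable.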

Conclusion

DD is an amazing tool that has been around for a while. And it continues to be relevant and useful each day. It can get you out of a bind and save your data, so give it a whirl and see what it can help you with today.

Here are a couple resources that I referenced to help me build my dd command. A guide on making a full metal backup with dd. And a general DD usage guide.

Speed Up Bzip2 Compression

Bzip2 is a great compression tool when it comes to archive size. But Bzip2 can still seem to take forever to complete the shrinking of an archive. I am sure you have asked yourself the same question I have: how can I speed up Bzip2 compression time? Whether you are performing a backup or just archiving some files, Bzip2 does a good job.

There are a couple different ways to speed up Bzip2 compression. Obviously you can get the smallest archives by using the “-9” compression flag. But this takes the longest amount of time.

 ~/$ bzip2 -9 file.txt

So switching to the least compression reduces the compression time, but at the cost of not saving as much disk space.

 ~/$ bzip2 -1 file.txt

Let’s Really Speed Up Bzip2 Compression

If you have watched your CPU usage while using Bzip2 you have probably noticed that your CPU is pegged. In the age of multi-core systems, you will easily notice that only one of your computer's or server's cores is pegged. This is because the Bzip2 process is single threaded. It works through the file(s) being compressed one piece at a time, on a single core.

This is obviously not the most efficient practice, especially when you have 2, 4, 6, or more idle cores available on your system. But Bzip2 is a single threaded application, so there is no way to utilize those idle cores.

The Best Way To Use Bzip2 is Not To Use Bzip2

Fortunately there is an alternative that will speed up Bzip2 compression. Pbzip2 is a threaded implementation of Bzip2. It allows you to still use Bzip2 compression without having to wait. This is especially important when working with a very large archive.

Pbzip2 breaks the compression task into multiple pieces, which lets it spread the work across the available cores. Don’t be worried about using all the CPU resources on your system, since you can specify the number of cores to use. You can even load the file completely into RAM before starting compression to speed up the process.

Here is a basic example with the highest compression:

tar -c /inputDirectory/ | pbzip2 -c -9 > outputFile.tar.bz2

In this example we are using “tar” to “-c” create an archive from the contents of “/inputDirectory/”. The output of “tar” is then piped into the Pbzip2 command which compresses it and “-c” outputs to stdout with the highest compression “-9”. That compressed content is then redirected into the file “outputFile.tar.bz2”. By default the command will utilize all the available cores on the system.

We can then take the same command and alter it a bit to reduce its resource usage and minimize the impact on the system load, while still speeding up the Bzip2 compression.

tar -c /inputDirectory/ | pbzip2 -c -9 -p2 -m50 > outputFile.tar.bz2

Using the “-p2” option limits the process to using 2 cores. Changing that option to “-p3” would limit it to 3 cores, “-p4” would limit it to 4, etc. The “-m” option limits the amount of RAM that the process utilizes, in megabytes. Our example uses “-m50”, so it is limited to 50MB of RAM.
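Rather than hard-coding the core count, one option is to derive it from the machine. This is only a sketch of that idea: cores_to_use is a hypothetical helper, and using half of the online cores is an arbitrary choice, not a recommendation from the Pbzip2 authors.

```shell
# Hypothetical helper: print half of the online CPU cores, but at least 1.
cores_to_use() {
    # getconf reports the number of online processors on Linux and macOS;
    # fall back to 1 if it is unavailable.
    total=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
    half=$((total / 2))
    [ "$half" -ge 1 ] || half=1
    echo "$half"
}
```

The result can be passed straight to the option, for example: tar -c /inputDirectory/ | pbzip2 -c -9 -p"$(cores_to_use)" > outputFile.tar.bz2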

There are some other ways to call Pbzip2. You can use it directly like vanilla Bzip2.

pbzip2 -9 compressfile.tar

Or you can use the more common formatting of “tar” just by adding a long form option.

tar cf outputFile.tar.bz2 --use-compress-prog=pbzip2 inputDirectory/

So there you have the best way to speed up bzip2 compression. Hopefully it saves you some time and frustration next time you have a large archive. It might even be able to compress your mysqldump output?

How To Compress Mysqldump Output

If you read my previous writeup on dumping all MySQL databases you will recognize some of this information. I wanted to pay some specific attention to some of the different methods for how to compress mysqldump output.

Obviously compressing your MySQL database exports can have some major benefits. The biggest benefit is the smaller file size. MySQL databases, and really all databases, have the tendency to grow to large sizes. Even small websites can quickly find hundreds of megabytes worth of data in their database. Storing large database export files in your backup can eat up disk space pretty rapidly. Compressing your mysqldump output can reduce the size of your export file by seven or more times.

If you need to keep individual database backups then compression really makes sense. But if you are using something like rdiff-backup then it makes more sense to skip the compression. Rdiff-backup is unable to do a diff on the compressed data, so it won’t save the space you expect.

Basic Mysqldump Compression Commands

Here are a few different variations of mysqldump piped compression commands which we will break down.

1: mysqldump -u dbUser -p DBName > OutputFile.sql
2: mysqldump -u dbUser -p DBName | gzip > OutputFile.sql.gz
3: mysqldump -u dbUser -p DBName | gzip -9 > OutputFile.sql.gz
4: mysqldump -u dbUser -p DBName | zip > OutputFile.sql.zip
5: mysqldump -u dbUser -p DBName | bzip2 > OutputFile.sql.bz2

In these examples we see the same database being exported in each command. But there are a few differences. In #1 we are employing no compression. Command #2 is using gzip with its default settings. Command #3 is utilizing gzip with maximum compression. Command #4 is using zip to perform its compression. And finally command #5 is using bzip2 with its default settings.

Compression Commands Comparison

Testing the commands above on the same database and on the same hardware yielded the following results.

Command   Filesize   Output Time
#1        391MB      13.827s
#2        57MB       16.122s
#3        55MB       32.357s
#4        57MB       16.169s
#5        44MB       1m 18.701s
Output Mysql Database command results

The table above shows the effectiveness of each compression method on the same dataset. The first command sets the baseline for data export with no compression. Gzip applies basic compression and gives a significant size reduction with a very small speed hit. It comes in just a hair faster than zip with about the same compression results.

Adding the -9 to the Gzip command in #3 doubles the output time, and only provides 2MB of space savings. But then Bzip2 weighs in on command #5 taking an extra minute over Gzip or Zip. That extra minute was required to pack the file small enough to rescue another 13MB of space.
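Your own data will compress differently than mine, so it is worth repeating the comparison on one of your own dump files. Here is a minimal sketch that prints the size in bytes of a file as-is and after each compressor. compare_compressors is a hypothetical helper, it only covers gzip and bzip2 at their default settings, and it skips any tool that is not installed.

```shell
# Hypothetical helper: compare_compressors FILE
# Prints one line per method: the method name and the resulting size in bytes.
compare_compressors() {
    file=$1
    printf '%-8s %s\n' none "$(wc -c < "$file" | tr -d ' ')"
    for tool in gzip bzip2; do
        # Skip compressors that are not available on this system.
        command -v "$tool" >/dev/null 2>&1 || continue
        # -c writes the compressed data to stdout, so the input file is untouched.
        printf '%-8s %s\n' "$tool" "$("$tool" -c "$file" | wc -c | tr -d ' ')"
    done
}
```

Running it as compare_compressors OutputFile.sql gives the size side of the table above. For the time side you can prefix an individual compression command with time.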

Compress Mysqldump Output Conclusions

If you can compress your database output, then you will see significant space savings in your backup storage. Even if backup speed is essential, gzip or zip offer a major reduction in size for minimal extra time. And if time is not a major issue then going with bzip2 will give you much larger space savings in exchange.

Understanding and utilizing compression as part of your backup methodology is an essential element for storage success. Proper implementation can ensure that you save the needed space and reduce backup transfer time, especially in the event that you need to transfer your backup over a slow network connection. Compression will come to your aid and save the day. So don’t hesitate to compress mysqldump output, it might be just what the doctor ordered.

Further Reading

For additional details and info check out this post which talks more about Compressing Mysqldump Output