How To Run Program Before Login Prompt Ubuntu

I recently installed a new server in my home office. I typically leave my servers running headless, but with an old monitor lying around and plenty of idle CPU time I decided to play a bit. I mounted the monitor to my office rack and then started to work.

Rather than just display the normal text login prompt, I wanted it to show something cool at boot. I started to dig around on the web and found this article. It quickly described how to run a program before the login prompt on Ubuntu 16.04+.

Run Your Program before login

So I wrote a simple script /root/loginMatrix.sh which would simply run cmatrix on the main (tty1) console. Once I exited cmatrix it would display the normal login prompt. The sample script is as follows:

#!/bin/sh
/usr/bin/cmatrix -abs
exec /bin/login

I then edited the config file for getty@tty1 here (for Ubuntu 16.04+ only, not sure on other distributions):

/etc/systemd/system/getty@tty1.service.d/override.conf

I changed the contents to be:

[Service]
ExecStart=
ExecStart=-/root/loginMatrix.sh
StandardInput=tty
StandardOutput=tty

and then I ran the following command to activate it:

systemctl daemon-reload; systemctl restart getty@tty1.service

After the change the system started to show the cmatrix terminal animation immediately. But once I quit the application it was back to the login prompt.

[Image: cmatrix running on the console prior to the login prompt]

Getting tricky

After running cmatrix for a few days straight, I decided that I wanted to change it up a bit. So I made a few adjustments to the /root/loginMatrix.sh script to make it a bit more dynamic. With the following changes the console displayed something different each time the script started.

#!/bin/bash
# Programs to choose from
declare -a arr=("/usr/bin/cmatrix -abs" "/snap/bin/asciiquarium" "/usr/sbin/iftop" "/usr/bin/htop")
size=${#arr[@]}
# Pick a random index and run that entry
index=$(($RANDOM % $size))
eval "${arr[$index]}"

exec /bin/login

These changes told the script to randomly choose either cmatrix, asciiquarium, iftop, or htop and execute it. Then, as before, once I quit the randomly chosen application the login prompt was displayed again. My kids got way too excited when asciiquarium was chosen and had to watch the fish swim by. This solution worked for a while, but eventually I got tired of having to change the displayed program manually. So I started playing with options to automate the program change.
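The selection step can be tried on its own without launching anything. The sketch below is only an illustration: pick_random is a made-up helper name, and it relies on bash's $RANDOM just like the script above.

```shell
#!/bin/bash
# Hypothetical helper: print one of its arguments, chosen at random.
pick_random() {
    # $RANDOM (a bash feature) yields 0..32767; the modulo maps it onto
    # an offset between 0 and (argument count - 1).
    shift $((RANDOM % $#))
    printf '%s\n' "$1"
}

choice=$(pick_random "/usr/bin/cmatrix -abs" "/snap/bin/asciiquarium" \
                     "/usr/sbin/iftop" "/usr/bin/htop")
echo "Selected: $choice"
```

Printing the choice instead of running it makes it easy to check that every entry can come up before wiring the list into the login script.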

Automating the switch

These changes got a bit trickier. The script had to track the application PID so it could kill it when the timeout was reached. After trying several different methods I finally ran across this basic method for timing out a process. The process isn’t perfect, but it does rotate through the different options on a ten minute interval. That part works, but exiting to the login prompt doesn’t, so it’s only most of the way there. Here is my current /root/loginMatrix.sh script:

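#!/bin/bash
# Programs to rotate through on the console
declare -a arr=("/usr/bin/cmatrix -abs" "/snap/bin/asciiquarium" "/usr/sbin/iftop" "/usr/bin/cacafire")
size=${#arr[@]}
continue=1
timeout=600   # seconds each program is shown
interval=1    # amount subtracted from the countdown on each pass

while [ $continue -eq 1 ]
do
    # Start a randomly chosen program in the background and remember its PID
    index=$(($RANDOM % $size))
    eval "${arr[$index]} &"
    cmdpid=$!

    ((t = timeout))

    # Count down, checking each second that the program is still running.
    # If it is gone, the whole script exits here rather than falling
    # through to the login prompt at the bottom.
    while ((t > 0)); do
        sleep 1
        kill -0 $cmdpid || exit 0
        ((t -= interval))
    done

    exit_status=$?
    echo $exit_status > ext.txt
    if [[ $exit_status -ne 1 ]]; then
        continue=0
    fi

    # Timeout reached: ask the program to stop, then loop to the next one
    kill -s SIGTERM $cmdpid && kill -0 $cmdpid || exit 0
    sleep 1
    #kill -s SIGKILL $cmdpid

done

exec /bin/login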

So this script accomplishes the switching of applications on the primary console. And I was able to add cacafire to the mix for a nice colored ascii fire animation. But if I have to use the console for the login, I will have to hit ctrl-alt-F2 and switch over to tty2. That won’t be the end of the world, lol. And in the meantime I have some fun console effects to keep my office interesting.
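For anyone who wants to experiment with the timeout part in isolation, here is a rough sketch. The function name run_with_timeout and the return value 124 are my own choices, not part of the script above. The difference from the script is that it returns to the caller instead of calling exit, so a calling script would be free to continue on to exec /bin/login. I have only exercised it with sleep, not with a program attached to a real console.

```shell
#!/bin/bash
# Hypothetical helper: run a command and stop it after LIMIT seconds.
# Returns 0 if the command ended on its own, 124 if it had to be stopped.
run_with_timeout() {
    local limit=$1 elapsed=0 cmdpid
    shift
    "$@" &
    cmdpid=$!
    # kill -0 sends no signal; it only tests whether the process still exists.
    while [ "$elapsed" -lt "$limit" ] && kill -0 "$cmdpid" 2>/dev/null; do
        sleep 1
        elapsed=$((elapsed + 1))
    done
    if kill -0 "$cmdpid" 2>/dev/null; then
        kill -s TERM "$cmdpid"
        wait "$cmdpid" 2>/dev/null
        return 124
    fi
    wait "$cmdpid" 2>/dev/null
    return 0
}

if run_with_timeout 1 sleep 3; then
    echo "program ended on its own"
else
    echo "program was stopped after the timeout"
fi
```

Because the function returns a status, the caller decides what happens next: loop to another program after a timeout, or fall through to the login prompt when the program was closed by hand.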

[Image: cacafire running on the console prior to the login prompt]


How To Use Rsync Between Computers

If you are new to Rsync, please visit our How To Use Rsync – The Basics post. In it we break down what Rsync is and its basic usage. It will provide you with a good background to understand the details of using Rsync between computers.

Rsync Between Remote Computers

Although Rsync does a great job of synchronizing files between local folders it really shines when working between remote computers. And if you are familiar with using ssh from the command line, you will find it relatively easy to use Rsync.

The basic command is pretty simple, and so long as you have ssh available and rsync installed on the remote machine(s) this format will work.

From a remote source:

rsync [options] [user]@[source computer]:[source folder] [destination folder]

Or to a remote destination:

rsync [options] [source folder] [user]@[destination computer]:[destination folder]

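Rsync cannot copy directly between two remote computers. If both the source and the destination are remote it stops with the error “The source and destination cannot both be remote.” To move data between two remote machines, log in to one of them with ssh and run rsync from there (that machine must be able to authenticate to the other one):

ssh [user]@[source computer] 'rsync [options] [source folder] [user]@[destination computer]:[destination folder]'

Rsync Between Remote Computers with SSH Examples

rsync user@192.168.1.1:~/source/file /home/user/destination/
rsync /home/bdoga/source/file user@192.168.1.2:~/destination/
ssh user@192.168.1.1 'rsync ~/source/file user@192.168.1.2:~/destination/'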

In these examples the “file” will be placed in the destination directory on either the local or remote computers. Also for the remote machines you will notice that a single “:” colon was used. This indicates that rsync should use a remote shell, typically SSH to make the connection. And it will fire up rsync on the remote side of the connection to handle the details. Additionally you can force the connection to use an rsync daemon by specifying a “::” double colon instead.

Using the native rsync protocol alone is a little faster, because it doesn’t have any SSH connection overhead. But it is also not an encrypted connection, so there are trade-offs to either option. I typically just use the SSH option since I usually have SSH already available and configured on my servers.

Some more useful options

I already discussed the “-a” archive option in my Rsync Basics post. But it is my go-to option for ensuring an exact copy, permissions and all, is made. Now that we are connecting to a remote machine, the “-z” (compress) option gets the chance to shine a bit. When you are transferring data over the internet you may not always have a fast connection. Compressing the data in transit means that, potentially, much less bandwidth is required to transfer it.

Another option that is sometimes useful with remote connections is the “-P” option, which is shorthand for --partial --progress. This will display the current progress of the file that is being copied. And it will keep “partial” copies of files if a transfer gets interrupted during the sync. In my opinion the progress that is displayed is great if you are transferring larger files. But if you are moving lots of little files the output is not very useful. And the overhead to produce the progress output can cause some noticeable slowdown in a transfer.

One additional pair of options is --include and --exclude. They are pretty self explanatory, in that they allow you to include or exclude specific files from your sync. These options can be used to fine tune what you are copying from a directory, and ensure you only get what you want.

More Remote Computer Rsync Examples

rsync -avzhP user@192.168.1.1:/home/user/source/ /home/user/destination/
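rsync -avzhP --exclude 'dbname*.sql' --include '*/' --include '*.sql' --exclude '*' user@192.168.1.1:/home/user/source/ /home/user/destination/

In the above example only .sql files would be copied from the source, and no .sql files where the file name started with “dbname”. Rsync acts on the first rule that matches a file, so the --exclude for 'dbname*.sql' has to come before the broader --include for '*.sql'. The --include '*/' rule lets rsync descend into subdirectories, and the final --exclude '*' skips everything that was not matched earlier. Or you could add multiple entries to ensure you got all the files you needed in one go.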

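rsync -avzhP --include '*/' --include '*.html' --include '*.php' --exclude '*' user@192.168.1.1:/home/user/source/ /home/user/destination/

In this example, all .html and .php files will be copied, but no other files. The trailing --exclude '*' is what keeps everything else out; without it the --include rules alone would not filter anything.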
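Since rsync checks its filter rules from left to right and acts on the first one that matches, it is worth trying a rule set on a scratch directory before pointing it at real data. The sketch below does that locally; the directory and file names are invented for the demonstration and it assumes rsync is installed.

```shell
#!/bin/bash
# Build a throwaway source tree (all names here are made up for the demo).
work=$(mktemp -d)
mkdir -p "$work/source/sub" "$work/destination"
touch "$work/source/site.sql" "$work/source/dbname_old.sql" \
      "$work/source/notes.txt" "$work/source/sub/extra.sql"

# Rules are checked left to right and the first match decides:
#   1. skip dbname*.sql      2. descend into directories
#   3. keep other *.sql      4. skip everything else
rsync -a --exclude 'dbname*.sql' --include '*/' --include '*.sql' --exclude '*' \
      "$work/source/" "$work/destination/"

# Show what actually arrived in the destination.
(cd "$work/destination" && find . -type f | sort)
```

Only site.sql and sub/extra.sql should arrive in the destination. Swapping the first and third rules is a quick way to see how the order changes the result.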

Conclusion

Rsync continues to be a super useful utility in your systems administration toolkit. Now that you have a good understanding of its usage you are ready to tackle some of Rsync’s more advanced features. Or learn how other programs like Rdiff-backup build upon it to create awesome tools. And a big thanks to some other sites which we have referenced over the years. Check them out here, and here.

How To Use Rsync – The Basics

Rsync is one of the most useful tools for a systems administrator. Regardless of what your specific role or responsibility is, at some point you are going to need to copy data from one place to another. Rsync is the tool which will help ensure you quickly and accurately make a copy of your data. So in this post I hope to convey how to use Rsync, focusing on the basic uses that I find most helpful each day.

What is Rsync

Rsync was initially built as a basic clone of “rcp” (Remote Copy) but with a handful of additional features. That handful of additional features has expanded over the years and made Rsync an indispensable tool. This simple tool can be used to copy files between directories on a local computer. Or you can use it to copy files to and from remote systems. My favorite part of Rsync is its ability to quickly compare the source and target locations. This ensures that only new or changed files are transferred, helping you save time and bandwidth when copying larger numbers of files.

So How do I Use Rsync?

The basic command is pretty simple, rsync [options] [source] [destination], and in this simple form you can easily copy data between local directories. ie:

rsync /home/bdoga/source/file /home/bdoga/destination/

This command will take “file” and place it inside the “/home/bdoga/destination/” directory. If you instead would like to copy all of the contents of one directory into another you simply need to add the “-r” (recursive) option. ie:

rsync -r /home/bdoga/source/ /home/bdoga/destination/

Thus all of the contents of “/home/bdoga/source/” will now be copied into “/home/bdoga/destination”. But it is important to note, that if a file with an identical name exists in the destination, it will be overwritten. In addition the “-r” option does not preserve ownership, permissions, or access/modification timestamps. But that is where the next option comes in “-a” (archive).

It is also important to note that if you want to copy just the contents of the source directory, you must end with a trailing “/”. If you fail to add the trailing “/” Rsync will copy the specified directory as well as the contents into the destination. Rather than just the contents of the directory.
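The trailing slash behavior is easy to see on a scratch directory. This is a small local sketch, assuming rsync is installed; the directory names are invented for the demonstration.

```shell
#!/bin/bash
work=$(mktemp -d)
mkdir -p "$work/source" "$work/with_slash" "$work/without_slash"
touch "$work/source/file"

# Trailing slash on the source: only the contents of source/ are copied.
rsync -r "$work/source/" "$work/with_slash/"
# No trailing slash: the source directory itself is recreated in the target.
rsync -r "$work/source" "$work/without_slash/"

ls "$work/with_slash"      # lists: file
ls "$work/without_slash"   # lists: source
```

The slash on the destination makes no difference here; it is only the slash on the source that changes what gets copied.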

The most useful options

The archive option not only copies the files recursively, but also preserves symlinks, permissions, ownership and modification timestamps (it is shorthand for -rlptgoD). I find this the most useful option because when I want to copy a source directory I typically want to be able to restore it with the permissions intact.

Another option that is sometimes useful, depending on the scenario, is the “-z” (compress) option. It instructs Rsync to compress the file data as it is transferred so it uses less bandwidth. Not always useful when copying files over a Gigabit or faster LAN, but it can be helpful over a slower internet connection.

The next most useful option I frequently use is “-v” (verbose) which tells Rsync to give you more information about the files being transferred. This can be useful to see exactly what is being transferred. It also lets you know exactly what was and was not copied if there is an issue.

And then there is the “-h” (Human Readable) option which makes sure that all numbers/sizes are printed in an easily readable format. For instance rather than reporting that 856342348 bytes were transferred, it would report 856.34M. A single “-h” uses units of 1000; doubling it to “-hh” switches to units of 1024.

And all of these options can be used together as needed. As in this example which will recursively transfer the files while preserving their permissions and timestamps. Also giving verbose output and zipping the files during transfer.

rsync -avzh /home/bdoga/source/ /home/bdoga/destination/

Sample Command Output

 
 root@bdoga:~/test# ls -lah test1
 total 4.0K
 drwxr-xr-x 3 root  root    76 Dec 21 18:33 .
 drwxr-xr-x 4 root  root    32 Dec 21 17:45 ..
 -rw-r--r-- 1 bdoga bdoga    7 Dec 21 17:47 bob
 -rw-r--r-- 1 bdoga bdoga    0 Dec 21 17:46 doug
 drwxr-xr-x 2 root  root    18 Dec 21 17:46 subdir
 -rw-r--r-- 1 bdoga bdoga  10M Dec 21 18:33 test.img
 -rw-r--r-- 1 bdoga bdoga 100M Dec 21 18:33 test2.img
 
 root@bdoga:~/test# rsync -avh ./test1/ ./test2
 sending incremental file list
 ./
 bob
 doug
 test.img
 test2.img
 subdir/
 subdir/file
 
 sent 115.37M bytes  received 122 bytes  46.15M bytes/sec
 total size is 115.34M  speedup is 1.00

 root@bdoga:~/test# rm -rf test2/*
 root@bdoga:~/test# rsync -avzh ./test1/ ./test2
 sending incremental file list
 ./
 bob
 doug
 test.img
 test2.img
 subdir/
 subdir/file
 
 sent 112.61K bytes  received 122 bytes  25.05K bytes/sec
 total size is 115.34M  speedup is 1,023.21
 
 root@bdoga:~/test# ls -lah test2
 total 111M
 drwxr-xr-x 3 root  root    76 Dec 21 18:33 .
 drwxr-xr-x 4 root  root    32 Dec 21 17:45 ..
 -rw-r--r-- 1 bdoga bdoga    7 Dec 21 17:47 bob
 -rw-r--r-- 1 bdoga bdoga    0 Dec 21 17:46 doug
 drwxr-xr-x 2 root  root    18 Dec 21 17:46 subdir
 -rw-r--r-- 1 bdoga bdoga  10M Dec 21 18:33 test.img
 -rw-r--r-- 1 bdoga bdoga 100M Dec 21 18:33 test2.img 

The above command output shows the contents of the source and destination directories. And also shows the difference between running rsync with and without the “-z” option.

Conclusion

Rsync will become a super useful part of your systems administration toolkit. Now that you have a basic understanding of how to use Rsync you are ready to see how to connect to a remote computer. Or learn how other programs like Rdiff-backup build upon it to create awesome tools. And a big thanks to some other sites which we have referenced over the years. Check them out here, and here.

Cron Time String Modifiers

Cron is one of the most useful elements of any *nix based system. It gives you an easy interface to run any command on a periodic basis with down to the minute granularity. As a systems administrator or systems user you will find yourself using cron to schedule tasks on a regular basis. But to get the best granularity you may need to use the full list of available time string modifiers. This will ensure your process only runs when you absolutely need it to.

The Cron Time String Format

The cron time string has a simple format. Minute / Hour / Day of the Month / Month / Day of the week. For a full run down on proper Cron Time String Formatting please visit this post.

Cron Time String Modifier List

Here is a full list of the available modifiers for your Cron Time String

Modifier   Purpose
*          Matches All Values
-          Specify a range of values
,          Specify a list of values
/          Skip a given number of values
Cron time string modifiers

Cron Scheduling Examples With Modifiers

You can use modifiers to match some pretty specific time intervals for scheduling your process. If you wanted to run a process at noon on the first day of every 3rd month you would write your cron time string like this.

0 12 1 */3 *

Or another example for a process that you want to run every 15 minutes from 2-5AM every Monday, Wednesday and Friday you would format your cron string like this. The hour range 2-5 is inclusive, so the last run of the day is at 5:45AM.

*/15 2-5 * * 1,3,5
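To see which values a modifier actually selects, it can help to expand a single field by hand. The function below is only a sketch for experimenting, not a cron parser: the name expand_cron_field is made up, it handles numeric values only (no jan or mon names), and the minimum and maximum for the field have to be passed in.

```shell
#!/bin/bash
# Hypothetical helper: expand one cron field into the values it matches.
# Usage: expand_cron_field FIELD MIN MAX
expand_cron_field() {
    local field=$1 min=$2 max=$3 out="" item step lo hi v
    local IFS=,          # "," separates the entries of a list
    set -f               # keep "*" from being expanded into filenames
    for item in $field; do
        step=1
        case $item in
            */*) step=${item#*/}; item=${item%%/*} ;;   # "/" sets the step
        esac
        case $item in
            '*') lo=$min; hi=$max ;;                    # "*" is the full range
            *-*) lo=${item%-*}; hi=${item#*-} ;;        # "-" is a range
            *)   lo=$item; hi=$item ;;                  # a single value
        esac
        v=$lo
        while [ "$v" -le "$hi" ]; do
            out="$out $v"
            v=$((v + step))
        done
    done
    set +f
    printf '%s\n' "${out# }"
}

expand_cron_field '*/15' 0 59    # minutes:  0 15 30 45
expand_cron_field '2-5' 0 23     # hours:    2 3 4 5
expand_cron_field '1,3,5' 0 7    # weekdays: 1 3 5
expand_cron_field '*/3' 1 12     # months:   1 4 7 10
```

The last line shows why */3 in the month field means January, April, July and October: the step starts counting from the lowest allowed value.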

So now you know how to format your cron time string so that you can easily set your process to happen whenever you need it to run.

For more details on this topic visit this post on Formatting your Cron String or for help setting up your cron time string you can use the Crontab Guru’s interactive interface to create your time string.


Cron Time String Format

Cron is one of the most useful elements of any *nix based system. It gives you an easy interface to run any command on a periodic basis with down to the minute granularity. As a systems administrator or systems user you will find yourself using cron to schedule tasks on a regular basis. But as useful as cron is, and as frequently as it gets used, I regularly need a reference for the cron time string format. Hopefully this simple reference will help you and me remember the format for future scheduled processes.

The Cron Time String Format

The cron time string has a simple format. Minute / Hour / Day of the Month / Month / Day of the week

Position  Description       Usable Values
1         Minute            0 – 59, or * (Every Minute)
2         Hour              0 – 23, or * (Every Hour)
3         Day of the Month  1 – 31, or * (Every Day of the Month)
4         Month             1 – 12, jan – dec, JAN – DEC, or * (Every Month)
5         Day of the Week   0 – 7, sun – sat, SUN – SAT (0 and 7 both equal Sunday), or * (Every Day of the Week)
Cron Time String Values

Cron Scheduling Examples

So given the values above, a job that you would like to have run at 1:35AM every day would be formatted like this.

35 1 * * *

But maybe we just want that process to run once a week. We can modify the string and add a value for the day of the week we want it to run. So for a process that you want to run every Thursday at 11:50PM you would format it like so.

50 23 * * 4
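When I am unsure which position is which, a tiny helper that labels the five fields is handy. This is just an illustrative sketch; describe_cron is a made-up name and it does no validation of the values.

```shell
#!/bin/bash
# Hypothetical helper: label the five fields of a cron time string.
describe_cron() {
    local minute hour dom month dow
    # read splits on whitespace and does not expand "*" into filenames.
    read -r minute hour dom month dow <<EOF
$1
EOF
    printf 'minute=%s hour=%s day-of-month=%s month=%s day-of-week=%s\n' \
        "$minute" "$hour" "$dom" "$month" "$dow"
}

describe_cron '35 1 * * *'
describe_cron '50 23 * * 4'
```

Quoting the time string when calling the function matters, otherwise the shell would expand the asterisks before the function ever sees them.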

But you can then use modifiers to match some pretty specific time intervals for scheduling your process. The accepted modifiers are ‘/’ (skip a given number of values */3 for every 3rd, */10 for every 10th), ‘,’ (for a list of acceptable values), and ‘-‘ (for a range of values). So if you wanted to run a process at noon on the first day of every 3rd month you would write your cron time string like this.

0 12 1 */3 *

Or another example for a process that you want to run every 15 minutes from 2-5AM every Monday, Wednesday and Friday you would format your cron string like this. The hour range 2-5 is inclusive, so the last run of the day is at 5:45AM.

*/15 2-5 * * 1,3,5

So now you know how to format your cron time string so that you can easily set your process to happen whenever you need it to run.

For more details on this topic visit this post on Formatting your Cron String or for help setting up your cron time string you can use the Crontab Guru’s interactive interface to create your time string.


Recursively Count the Number of Files in a Directory

Why would you want to recursively count the number of files or folders in a directory? There could be a lot of different reasons. For myself, I had a client that repeatedly added new directories to a folder. Some of those directories had unique contents in them, and some were copies of other folders. The folders contained text documents, zip files, images, database files, you name it it was in there. Running a recursive ‘du’ command on the root folder showed a size of approximately 50GB. And it was obvious that there were thousands of folders and subfolders to check.

One might think of trying to use ‘ls’ (list) to count the number of files in a directory. But running an ‘ls’ command alone will only show you the files in the directory. It won’t count the files for you. You can pair it with the ‘wc’ (word count) command and get a count of the number of lines returned. Using a command like this will give you the number of entries in your current working directory:

ls -1 | wc -l

But that will only give us the number of files and folders in the current directory. So it will not give you an accurate picture of the number of files or folders in subfolders of your current working directory.

How To Recursively Count the Number of Files in a Directory

So since the “ls” command won’t give us a recursive listing of files or folders we will have to turn to the “find” utility to fulfill that requirement. Find searches recursively through a directory tree to find specific filenames or attributes you want to search for. We can use its versatility to fulfill the searching requirement of our command. For example the following command will search recursively through your current directory tree to hunt for all files and return a list of those files.

find . -type f

And likewise you can do the same to specify searching for only directories.

find . -type d

Or removing the “-type” option will return all files and folders in this folder and its children.

find .

So now that we have the list of all folders or files in this directory and its subdirectories we can count them up by adding our old friend “wc” again. Thus with a command like this we can get the total count of all the files in your current working directory and its children:

find . -type f | wc -l

or for directories only:

find . -type d | wc -l

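Now you can quickly count the files and folders in a given directory to easily assess how many files you are dealing with. Keep in mind that the directory count includes the starting directory “.” itself, so it is one higher than the number of subdirectories.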
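The two counts can be wrapped into one small function. This is a sketch with a made-up name, count_entries, and it assumes a find that supports -print0 and -mindepth (GNU and BSD find both do).

```shell
#!/bin/bash
# Hypothetical helper: print "<files> <directories>" for a directory tree.
count_entries() {
    local dir=${1:-.} files dirs
    # -print0 ends each name with a NUL byte, so a filename that contains a
    # newline is still counted once (counting lines would count it twice).
    files=$(find "$dir" -type f -print0 | tr -cd '\0' | wc -c)
    # -mindepth 1 leaves the starting directory itself out of the count.
    dirs=$(find "$dir" -mindepth 1 -type d -print0 | tr -cd '\0' | wc -c)
    echo "$((files)) $((dirs))"
}

# Small scratch tree to try it on (names invented for the demo).
work=$(mktemp -d)
mkdir -p "$work/a/b" "$work/c"
touch "$work/top.txt" "$work/a/one.txt" "$work/a/b/two.txt"
count_entries "$work"    # prints: 3 3
```

Counting NUL bytes instead of lines is a little more typing, but it stays correct for the odd file names that tend to show up in client uploads.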

A special thanks to these sites that I referenced when searching this topic myself. And may have some more details for you. You can visit those sites Here and Here.

How To Speed Up Gzip Compression

Gzip is the ubiquitous compression tool for linux and other *nix based systems. But even given that it is fairly quick, when you are working with a large archive it can take a while. I am sure you have asked yourself the same question I have. How can I speed up gzip compression time?

There are a couple different ways to speed up Gzip compression. Obviously you can get the smallest archives by using the “-9” compression flag. But this takes the longest amount of time.

 ~/$ gzip -9 file.txt

So switching to the least compression reduces the compression time. But at the cost of not saving as much disk space.

 ~/$ gzip -1 file.txt
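To get a feel for the trade-off on your own data, compress the same file at both levels and compare the sizes. The sketch below uses a generated text file as a stand-in for a real one; results on other data will differ.

```shell
#!/bin/bash
work=$(mktemp -d)
# Generate some compressible sample text (a stand-in for a real file).
seq 1 200000 > "$work/sample.txt"

# -c writes to stdout, so the original file is left in place.
gzip -1 -c "$work/sample.txt" > "$work/fast.gz"
gzip -9 -c "$work/sample.txt" > "$work/best.gz"

size_fast=$(wc -c < "$work/fast.gz")
size_best=$(wc -c < "$work/best.gz")
echo "gzip -1: $((size_fast)) bytes"
echo "gzip -9: $((size_best)) bytes"
```

Prefixing each gzip line with the shell's time keyword shows the other half of the trade-off, the extra seconds that -9 costs.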

Let’s Really Speed Up Gzip Compression

If you have watched your CPU usage while using Gzip you may have noticed that your CPU is pegged. In the age of multi-core systems, you might notice that only one of your computer’s or server’s cores is pegged. This is because the Gzip process is single threaded. It works through the file(s) being compressed sequentially, one chunk at a time, on a single core.

This is obviously not the most efficient practice, especially when you have 2 or more idle cores available on your system. But since Gzip is a single threaded application, there is no way to utilize all those idle cores.

The Best Way To Speed Up Gzip is Not To Use Gzip

There is an alternative that will speed up your Gzip compression. Pigz is a threaded implementation of Gzip. It allows you to still use Gzip compression without having to wait so long. This is especially important when working with a very large archive.

Pigz breaks the compression task into multiple pieces, which lets the compression speed up roughly in proportion to the number of available cores. So if you have four available cores, you can expect the compression to complete in about a quarter of the time. Don’t be worried about using all the CPU resources on your system since you can specify the number of cores to use.

Here is a basic Pigz example with the highest compression:

tar -c /inputDirectory/ | pigz -9 > outputFile.tar.gz

In this example we are using “tar” to “-c” create an archive from the contents of “/inputDirectory/”. The output of “tar” is then piped into the Pigz command which compresses it with the highest compression “-9”. That compressed content is then redirected into the file “outputFile.tar.gz”. By default the command will utilize all the available cores on the system.

We can then take the same command and alter it a bit to reduce its resource usage and minimize the impact on the system load, while still being able to speed up the Gzip compression.

tar -c /inputDirectory/ | pigz -9 -p2 > outputFile.tar.gz

Using the “-p2” option limits the process to using 2 cores. Changing that option to be “-p3” would limit it to 3 cores, and “-p4” would limit it to 4, etc…
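In scripts that may run on machines without Pigz, one option is to fall back to plain Gzip. The following is a sketch under a few assumptions: the helper name compress, the choice of half the cores, and the scratch directory are all mine, and nproc is assumed to be available to report the core count.

```shell
#!/bin/bash
work=$(mktemp -d)
mkdir -p "$work/inputDirectory"
seq 1 100000 > "$work/inputDirectory/data.txt"   # sample content for the demo

# Use half of the available cores, but never fewer than one.
cores=$(nproc 2>/dev/null || echo 2)
threads=$(( cores > 1 ? cores / 2 : 1 ))

if command -v pigz >/dev/null 2>&1; then
    compress() { pigz -9 -p "$threads"; }
else
    compress() { gzip -9; }      # single threaded fallback
fi

# "-f -" sends the archive to stdout; "-C" avoids storing the full path.
tar -cf - -C "$work" inputDirectory | compress > "$work/outputFile.tar.gz"
gzip -t "$work/outputFile.tar.gz" && echo "archive OK"
```

Either way the result is an ordinary gzip file, so it can be unpacked with tar -xzf on a machine that has never heard of Pigz.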

Call Pigz just like Gzip

There are some other ways to call Pigz. You can use it directly like vanilla Gzip.

pigz -9 compressfile.tar

By default the above command will replace the original file with the new compressed file “compressfile.tar.gz”. If you want to keep the original uncompressed file and just create a new file alongside it, add the “-k” or keep option.

pigz -k -9 compressfile.tar

Or you can use the more common formatting of “tar” just by adding a long form option.

tar cf outputFile.tar.gz --use-compress-program=pigz inputDirectory/

So there you have the best way to speed up Gzip compression. Hopefully it saves you some time and frustration next time you have a large archive. It might even be able to compress your mysqldump output?

Make a Full Disk Backup with DD

Recently I had a drive that was showing the early warning signs of failure. So I decided I had better make a backup copy of the drive, and then push that image onto another drive to avoid losing data. As it turned out the drive was fine. It was the SATA cable that was failing. But the process reminded me of what a useful tool dd is, refreshed my knowledge of how to use it, and reminded me how to make a full disk backup with dd.

What is DD?

DD stands for “Data Definition”, and it has been around since about 1974. It can be used to read, write and convert data between files, partitions and other block level devices. As a result dd can be used effectively for copying the content of a partition, obtaining a fixed amount of random data from /dev/random, or performing a byte order transformation on data.

So Let’s Make a Full Disk Backup with DD

I will start with the command I used to make a full disk backup with dd. And then give you a breakdown of the different command elements to help you understand what it is doing.

dd if=/dev/sdc conv=sync,noerror status=progress bs=64K | gzip -c > backup_image.img.gz

The command options break down like this:

if=/dev/sdc this defines the “input file” which in this case is the full drive “/dev/sdc”. You could do the same with a single partition like “/dev/sdc1”, but I want all the partitions on the drive stored in the same image.

conv=sync,noerror the “sync” part tells dd to pad each block with nulls, so that if there is an error and the full block cannot be read, the data that follows stays at its original offset in the image. The “noerror” portion prevents dd from stopping when an error is encountered. The “sync” and “noerror” options are almost always used together.

status=progress tells the command to regularly give an update on how much data has been copied. Without this option the command will still run but it won’t give any output until the command is complete. So making a backup of a very large drive could sit for hours before letting you know it is done. With this option a line like this is constantly updated to let you know how far along the process has gone.

1993998336 bytes (2.0 GB, 1.9 GiB) copied, 59.5038 s, 33.5 MB/s

bs=64K specifies that the “Block Size” of each chunk of data processed will be 64 Kilobytes. The block size can greatly affect the speed of the copy process. A larger block size will typically accelerate the copy process unless the block size is so large that it overwhelms the amount of RAM on your computer.

Making a compressed backup image file

At this point you could use the “of=/dev/sdb” option to output the contents directly to another drive /dev/sdb. But I opted to make an image file of the drive, and piping the dd output through gzip allowed me to compress the resulting image into a much smaller image file.

| gzip -c pipes the output of dd into the gzip command and writes the compressed data to stdout. Other options could be added here to change the compression ratio, but the default compression was sufficient for my needs.

> backup_image.img.gz redirects the output of the gzip command into the backup_image.img.gz file.

With that command complete I had copied my 115GB drive into a 585MB compressed image. Most of the drive had been empty space, but without the compression the image would have been 115GB. So this approach can make a lot of sense if you are planning on keeping the image around. If you are just copying from one drive to another then no compression is needed.
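The reason empty space shrinks so dramatically is easy to reproduce without touching a real drive. This sketch reads 16 MiB of zero bytes from /dev/zero as a stand-in for an empty region of a disk and counts how many bytes come out of gzip.

```shell
#!/bin/bash
# Stand-in for an empty drive: 256 blocks of 64K zero bytes (16 MiB).
raw_bytes=$((256 * 64 * 1024))
compressed_bytes=$(dd if=/dev/zero bs=64K count=256 2>/dev/null | gzip -c | wc -c)
echo "raw: $raw_bytes bytes, compressed: $((compressed_bytes)) bytes"
```

A drive that has held data before will not compress this well, since deleted files leave their old contents behind in the free space.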

So there you have it, the process of making a full disk backup with dd. But I guess that is only half the story, so now I will share the command I used to restore that image file to another drive with dd.

Restoring a Full Drive Backup with DD

Fortunately the dd restore process is a bit more straightforward than the backup process. So without further ado here is the command.

gunzip -c backup_image.img.gz | dd of=/dev/sdc status=progress

gunzip -c backup_image.img.gz right off the bat “gunzip” starts decompressing the file “backup_image.img.gz” and the “-c” sends the decompressed output to stdout.

| dd of=/dev/sdc pipes the output from gunzip into the dd command which is only specifying the “output file” of “/dev/sdc”.

status=progress again this option displays some useful stats about how the dd process is proceeding.

Once dd has completed the transfer you should be good to go. But there are a couple of caveats to remember. First, the drive you restore to should be the same size or larger than the backup drive. Second, if the restore drive is larger, you will end up with empty space after the restore is complete. For example, a 115GB image restored to a 200GB drive will result in the first 115GB of the drive being usable, and 85GB of free space at the end of the drive. So you may want to expand the restored partition(s) to fill up the extra space on the new drive with parted, or a similar tool. Lastly, if you use a smaller drive for the restore, dd will not warn you that it won’t fit; it will just start copying and will fail when it runs out of space.
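Before trusting the procedure with a real drive, the whole round trip can be rehearsed on ordinary files. In this sketch the two .img files are stand-ins for the source and target drives (the file names are invented), and status=progress is left out to keep the output quiet.

```shell
#!/bin/bash
work=$(mktemp -d)
# File-backed stand-in for the source drive: 1 MiB of random data. That is
# an exact multiple of the 64K block size, so conv=sync adds no padding.
head -c 1048576 /dev/urandom > "$work/source_disk.img"

# Backup: read the "drive" in 64K blocks and compress the stream.
dd if="$work/source_disk.img" conv=sync,noerror bs=64K 2>/dev/null \
    | gzip -c > "$work/backup_image.img.gz"

# Restore: decompress the image onto the second stand-in "drive".
gunzip -c "$work/backup_image.img.gz" \
    | dd of="$work/target_disk.img" bs=64K 2>/dev/null

# cmp exits 0 only when the two images are byte-for-byte identical.
cmp "$work/source_disk.img" "$work/target_disk.img" && echo "restore matches source"
```

Comparing the images this way is only practical for a rehearsal. On real drives a checksum of the source device and of the restored device gives the same assurance.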

Conclusion

DD is an amazing tool that has been around for a while. And it continues to be relevant and useful each day. It can get you out of a bind and save your data, so give it a whirl and see what it can help you with today.

Here are a couple resources that I referenced to help me build my dd command. A guide on making a full metal backup with dd. And a general DD usage guide.

Speed Up Bzip2 Compression

Bzip2 is one of the best compression tools when it comes to archive size. But it is not the fastest, and it can seem to take forever to complete the shrinking of an archive. I am sure you have asked yourself the same question I have. How can I speed up Bzip2 compression time? Whether you are performing a backup, or just archiving some files, Bzip2 does a good job.

There are a couple different ways to speed up Bzip2 compression. Obviously you can get the smallest archives by using the “-9” compression flag. But this takes the longest amount of time.

 ~/$ bzip2 -9 file.txt

So switching to the least compression reduces the compression time. But at the cost of not saving as much disk space.

 ~/$ bzip2 -1 file.txt

Let’s Really Speed Up Bzip2 Compression

If you have watched your CPU usage while using Bzip2 you have probably noticed that your CPU is pegged. In the age of multi-core systems, you will easily notice that only one of your computer’s or server’s cores is pegged. This is because the Bzip2 process is single threaded. It works through the file(s) being compressed sequentially, one block at a time, on a single core.

This is obviously not the most efficient practice, especially when you have 2, 4, 6, or more idle cores available on your system. But Bzip2 is a single threaded application, so there is no way to utilize those idle cores.

The Best Way To Use Bzip2 is Not To Use Bzip2

Fortunately there is an alternative that will speed up Bzip2 compression. Pbzip2 is a threaded implementation of Bzip2. It allows you to still use Bzip2 compression without having to wait. This is especially important when working with a very large archive.

Pbzip2 breaks the compression task into multiple pieces and compresses them in parallel, which speeds up compression roughly in proportion to the number of available cores. Don’t be worried about it using all the CPU resources on your system, since you can specify the number of cores to use. You can even load the file completely into RAM before starting compression to speed up the process.

Here is a basic example with the highest compression:

tar -c /inputDirectory/ | pbzip2 -c -9 > outputFile.tar.bz2

In this example we are using “tar” to “-c” create an archive from the contents of “/inputDirectory/”. The output of “tar” is then piped into the Pbzip2 command which compresses it and “-c” outputs to stdout with the highest compression “-9”. That compressed content is then redirected into the file “outputFile.tar.bz2”. By default the command will utilize all the available cores on the system.

We can then take the same command and alter it a bit to reduce its resource usage and minimize the impact on system load, while still speeding up the Bzip2 compression.

tar -c /inputDirectory/ | pbzip2 -c -9 -p2 -m50 > outputFile.tar.bz2

Using the “-p2” option limits the process to using 2 cores. Changing that option to “-p3” would limit it to 3 cores, “-p4” would limit it to 4, and so on. The “-m50” option limits the amount of RAM that the process uses, in this case to 50MB.
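Rather than hard-coding the core count, you could derive it from the machine the command runs on. This is just a sketch: the helper name half_cores is made up, and it relies on nproc from GNU coreutils to report the available cores. It picks half of them and never fewer than one.

```shell
#!/bin/sh
# Hypothetical helper: print half of the available cores, minimum 1.
# An optional first argument overrides the detected core count.
half_cores() {
    cores=${1:-$(nproc)}
    threads=$((cores / 2))
    if [ "$threads" -lt 1 ]; then
        threads=1
    fi
    echo "$threads"
}

# Example, reusing the paths from the command above:
# tar -c /inputDirectory/ | pbzip2 -c -9 -p"$(half_cores)" -m50 > outputFile.tar.bz2
```

On an 8 core server this passes “-p4” to Pbzip2, leaving the other half of the cores free for everything else.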

There are some other ways to call Pbzip2. You can use it directly like vanilla Bzip2.

pbzip2 -9 compressfile.tar

Or you can use the more common formatting of “tar” just by adding a long form option.

tar cf outputFile.tar.bz2 --use-compress-prog=pbzip2 inputDirectory/

So there you have the best way to speed up bzip2 compression. Hopefully it saves you some time and frustration next time you have a large archive. It might even be able to compress your mysqldump output.

How To Compress Mysqldump Output

If you read my previous writeup on dumping all MySQL databases you will recognize some of this information. I wanted to pay specific attention to the different methods for compressing mysqldump output.

Obviously compressing your MySQL database exports can have some major benefits. The biggest benefit is the smaller file size. MySQL databases, and really all databases, have a tendency to grow large. Even small websites can quickly accumulate hundreds of megabytes of data in their database. Storing large database export files in your backups can eat up disk space pretty rapidly. Compressing your mysqldump output can reduce the size of your export file by seven or more times.

If you need to keep individual database backups then compression really makes sense. But if you are using something like rdiff-backup then it makes more sense to skip the compression. Rdiff-backup is unable to do a diff on the compressed data, so it won’t save the space you expect.

Basic Mysqldump Compression Commands

Here are a few different variations of mysqldump piped compression commands, which we will break down.

1: mysqldump -u dbUser -p DBName > OutputFile.sql
2: mysqldump -u dbUser -p DBName | gzip > OutputFile.sql.gz
3: mysqldump -u dbUser -p DBName | gzip -9 > OutputFile.sql.gz
4: mysqldump -u dbUser -p DBName | zip > OutputFile.sql.zip
5: mysqldump -u dbUser -p DBName | bzip2 > OutputFile.sql.bz2

In these examples we see the same database being exported in each command, but with a few differences. Command #1 employs no compression. Command #2 uses gzip with its default settings. Command #3 uses gzip with maximum compression. Command #4 uses zip, and finally command #5 uses bzip2.

Compression Commands Comparison

Testing the commands above on the same database and on the same hardware yielded the following results.

Command   File size   Output time
#1        391MB       13.827s
#2        57MB        16.122s
#3        55MB        32.357s
#4        57MB        16.169s
#5        44MB        1m 18.701s
Output Mysql Database command results

The table above shows the effectiveness of each compression method on the same dataset. The first command sets the baseline for data export with no compression. Gzip applies basic compression and gives a significant size reduction with a very small speed hit. It comes in just a hair faster than zip with about the same compression results.

Adding the -9 to the Gzip command in #3 doubles the output time, and only provides 2MB of space savings. But then Bzip2 weighs in on command #5 taking an extra minute over Gzip or Zip. That extra minute was required to pack the file small enough to rescue another 13MB of space.
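If you want to run a similar comparison on your own data, here is a rough sketch. It is not the method used to produce the numbers above. The function name bench_compressors is made up, and it assumes you already have an uncompressed dump file to feed through each compressor. Timing comes from date +%s, so it is only accurate to whole seconds.

```shell
#!/bin/sh
# Hypothetical benchmark: run the same input through several compressors and
# print the output size and elapsed wall-clock time for each one.
# Timing uses whole seconds from `date +%s`, so short runs may show 0s.
bench_compressors() {
    input=$1
    shift
    for cmd in "$@"; do
        out=$(mktemp)
        start=$(date +%s)
        # $cmd is left unquoted on purpose so "gzip -9" splits into command and flag
        $cmd < "$input" > "$out"
        end=$(date +%s)
        printf '%-10s %12s bytes %5ss\n' "$cmd" "$(wc -c < "$out")" "$((end - start))"
        rm -f "$out"
    done
}

# Example, assuming OutputFile.sql was created with command #1:
# bench_compressors OutputFile.sql cat gzip "gzip -9" bzip2
```

Using cat as the first "compressor" gives you the uncompressed baseline to compare against.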

Compress Mysqldump Output Conclusions

If you can compress your database output, then you will see significant space savings in your backup storage. Even if backup speed is essential, gzip or zip offer a major reduction in size for minimal extra time. And if time is not a major issue, then bzip2 will give you the smallest file, 44MB versus 57MB in the test above, in exchange for about an extra minute.

Understanding and utilizing compression as part of your backup methodology is essential for storage success. Proper implementation can ensure that you save the needed space and reduce backup transfer time, especially if you need to transfer your backup over a slow network connection. So don’t hesitate to compress mysqldump output; it might be just what the doctor ordered.

Further Reading

For additional details and info check out this post which talks more about Compressing Mysqldump Output