Recursively Count the Number of Files in a Directory

Why would you want to recursively count the number of files or folders in a directory? There could be a lot of different reasons. For myself, I had a client that repeatedly added new directories to a folder. Some of those directories had unique contents in them, and some were copies of other folders. The folders contained text documents, zip files, images, database files, you name it it was in there. Running a recursive ‘du’ command on the root folder showed a size of approximately 50GB. And it was obvious that there were thousands of folders and subfolders to check.

One might think of trying to use ‘ls’ (list) to get count the number of files in a directory. But running an ‘ls’ command alone will only show you the files in the directory. It won’t count the files for you. You can pair it with the ‘wc’ (word count) command and get a count of the number of lines returned. Using a command like this will give you the number of files in your current working directory:

ls -1 | wc -l

But that will only give us the number of files and folders in the current directory. So it will not give you an accurate picture of the number of files or folders in subfolders of your current working directory.

How To Recursively Count the Number of Files in a Directory

So since the “ls” command won’t give us a recursive listing of files or folders we will have to turn to the “find” utility to fulfill that requirement. Find searches recursively through a directory tree to find specific filenames or attributes you want to search for. We can use its versatility to fulfill the searching requirement of our command. For example the following command will search recursively through your current directory tree to hunt for all files and return a list of those files.

find . -type f

And likewise you can do the same to specify searching for only directories.

find . -type d

Or removing the “-type” option will return all files and folders in this folder and its children.

find .

So now that we have the list of all folders or files in this directory and its subdirectories we can count them up by adding our old friend “wc” again. Thus with a command like this we can get the full list of all the files in your current working directory and its children:

find . -type f | wc -l

or for directories only:

find . -type d | wc -l

Now you can quickly count the files and folders in a given directory to easily assess how many files you are dealing with.

A special thanks to these sites that I referenced when searching this topic myself. And may have some more details for you. You can visit those sites Here and Here.

How To Speed Up Gzip Compression

Gzip is the ubiquitous compression tool for linux and other *nix based systems. But even given that it is fairly quick, when you are working with a large archive it can take a while. I am sure you have asked yourself the same question I have. How can I speed up gzip compression time?

There are a couple different ways to speed up Gzip compression. Obviously you can get the smallest archives by using the “-9” compression flag. But this takes the longest amount of time.

 ~/$ gzip -9 file.txt

So switching to the least compression reduces the compression time. But at the cost of not saving as much disk space.

 ~/$ gzip -1 file.txt

Let’s Really Speed Up Gzip Compression

If you have watched your CPU usage while using Gzip you may have noticed that your CPU is pegged. In the age of multi-core systems, you might notice that only one of your computer or servers cores are pegged out. This is because the Gzip process is only single threaded. So it operates by taking the file(s) that are being compressed one bit at a time and compressing it.

This is obviously not the most efficient practice, especially when you have 2 or more idle cores available on your system. But since Gzip is a single threaded application, there is no way to utilize all those idle cores.

The Best Way To Speed Up Gzip is Not To Use Gzip

There is an alternative that will speed up your Gzip compression. Pigz is a threaded implementation of Gzip. It allows you to still use Gzip compression without having to wait so long. This is especially important when working with a very large archive.

Pigz breaks the compression task in to multiple pieces which allows the process to accelerate the compression x the number of available cores. So if you have four available cores, you can expect the compression to complete in about 1/4th the time. Don’t be worried about using all the CPU resources on your system since you can specify the number of cores to use.

Here is a basic Pigz example with the highest compression:

tar -c /inputDirectory/ | pigz -9 > outputFile.tar.gz

In this example we are using “tar” to “-c” create an archive from the contents of “/inputDirectory/”. The output of “tar” is then piped into the Pigz command which compresses it with the highest compression “-9”. That compressed content is then redirected into the file “outputFile.tar.gz”. By default the command will utilize all the available cores on the system.

We can then take the same command and alter it a bit to reduce it’s resource usage and minimize impact on the system load. While still able to speed up the Gzip compresson.

tar -c /inputDirectory/ | pigz -9 -p2 > outputFile.tar.gz

Using the “-p2” option limits the process to using 2 cores. Changing that option to be “-p3” would limit it to 3 cores, and “-p4” would limit it to 4, etc…

Call Pigz just like Gzip

There are some other ways to call Pigz. You can use it directly like vanilla Gzip.

pigz -9 compressfile.tar

By default the above command will replace the original file with the new compressed file “compressfile.tar.gz”. If you want to keep the original uncompressed file and just create a new file along side it add the “-k” or keep option.

pigz -k -9 compressfile.tar

Or you can use the more common formatting of “tar” just by adding a long form option.

tar cf outputFile.tar.gz --use-compress-prog=pigz inputDirectory/

So there you have the best way to speed up Gzip compression. Hopefully it saves you some time and frustration next time you have a large archive. It might even be able to compress your mysqldump output?

Change the SNMP Log Level in Ubuntu

The default SNMP settings for a Ubuntu server can end up filling your syslog file with tons of unnecessary entries. This makes it virtually impossible to sift through for anything which is actually useful. So it can be very advantageous to change the SNMP log level in Ubuntu.

I have a cacti setup which I use to log and report on the details of many linux and windows servers. This tool is amazing, and really gives me some great information to diagnose issues. Or catch issues as they are progressing, but before they become urgent. Sometimes it is just easier to see something when your data is represented visually.

Cacti relies upon SNMP as the technology to grab data from the machines or devices that it is monitoring. SNMP is an industry standard, supported by all major operating systems and network enabled devices. But by default, at least in Ubuntu, the log level is set so high that every SNMP request that comes to the server is reported in your syslog file. Cacti polls lots of different SNMP records to build its graphs. Under those default settings it can leave dozens of entries in the syslog every 5 minutes. As you could imagine this can quickly fill up your log file and make it virtually unusable. Fortunately we just need to make a quick adjustment in order to change the SNMP log level in Ubuntu. Here is a quick example of some of the Syslog entries that I you may be receiving.

Jul 8 06:28:48 server snmpd[7885]: error on subcontainer 'ia_addr' insert (-1)
Jul 8 06:29:18 server snmpd[7885]: error on subcontainer 'ia_addr' insert (-1)
Jul 8 06:29:48 server snmpd[7885]: error on subcontainer 'ia_addr' insert (-1)
Jul 8 06:30:02 server snmpd[7885]: Connection from UDP: [Originating IP]:41028->[Current Host IP]:161
Jul 8 06:30:02 server snmpd[7885]: Connection from UDP: [Originating IP]:48694->[Current Host IP]:161
Jul 8 06:30:02 server snmpd[7885]: Connection from UDP: [Originating IP]:39372->[Current Host IP]:161
Jul 8 06:30:02 server snmpd[7885]: Connection from UDP: [Originating IP]:54823->[Current Host IP]:161

Change the SNMP Log Level in Ubuntu

The change is just a quick flag in the /etc/default/snmpd file which changes how the system logs SNMP requests. The different log levels that are available are:

0 or ! for LOG_EMERG
1 or a for LOG_ALERT
2 or c for LOG_CRIT
3 or e for LOG_ERR
4 or w for LOG_WARNING
5 or n for LOG_NOTICE
6 or i for LOG_INFO
7 or d for LOG_DEBUG

By default a log level is not set so it is either dumping at the info or debug level. I prefer to switch it to level 3 (Error) which ensures that I still see any errors that come through. But doesn’t tell me every time a connection is made. This change can be made very easily. Basically you can just open up the /etc/default/snmpd file in your favorite editor and change the following line (Ubuntu 14.04 and 16.04).

SNMPDOPTS='-Lsd -Lf /dev/null -u snmp -g snmp -I -smux,mteTrigger,mteTriggerConf -p /run/snmpd.pid'

To look like this:

SNMPDOPTS='-LS3d -Lf /dev/null -u snmp -g snmp -I -smux,mteTrigger,mteTriggerConf -p /run/snmpd.pid'

The only part that changed was the “-Lsd” flags that changed to be “-LS3d”. The default entry is a little different between 14.04/16.04, 18.04 and 20.04. But I have included a few single commands you can copy/paste into your terminal to make the change.

Copy/Paste Command Line Changes

For Ubuntu 14.04 and 16.04:

sed -i -- "s@SNMPDOPTS='-Lsd -Lf /dev/null -u snmp -g snmp -I -smux,mteTrigger,mteTriggerConf -p /run/snmpd.pid'@SNMPDOPTS='-LS3d -Lf /dev/null -u snmp -g snmp -I -smux,mteTrigger,mteTriggerConf -p /run/snmpd.pid'@g" /etc/default/snmpd
service snmpd restart

In Ubuntu 18.04:

sed -i -- "s@SNMPDOPTS='-Lsd -Lf /dev/null -u Debian-snmp -g Debian-snmp -I -smux,mteTrigger,mteTriggerConf -p /run/snmpd.pid'@SNMPDOPTS='-LS3d -Lf /dev/null -u Debian-snmp -g Debian-snmp -I -smux,mteTrigger,mteTriggerConf -p /run/snmpd.pid'@g" /etc/default/snmpd
service snmpd restart

Finally Ubuntu 20.04:

sed -i -- "s@#SNMPDOPTS='-LSwd -Lf /dev/null -u Debian-snmp -g Debian-snmp -I -smux,mteTrigger,mteTriggerConf -p /run/snmpd.pid'@SNMPDOPTS='-LS3d -Lf /dev/null -u Debian-snmp -g Debian-snmp -I -smux,mteTrigger,mteTriggerConf -p /run/snmpd.pid'@g" /etc/default/snmpd
service snmpd restart

So there you go, now you can stop those annoying error log messages from filling up your syslog file. A big thanks to this ServerFault post on the subject for helping me figure it out.

Make a Full Disk Backup with DD

Recently I had a drive that was showing the early warning signs of failure. So I decided I had better make a backup copy of the drive. And then subsequently push that image onto another drive to avoid failure. Consequently I found that the drive was fine. It was the SATA cable that was failing. But the process helped remind me of what a useful tool dd is. Subsequently it refreshed my knowledge of how to use this remarkable tool. And finally helped remind me how to make a full disk backup with dd.

What is DD?

DD stands for “Data Definition”, it has been around since about 1974. It can be used to read write and convert data between filesystems, folders and other block level devices. As a result dd can be used effectively for copying the content of a partition, obtaining a fixed amount of random data from /dev/random, or performing a byte order transformation on data.

So Lets Make a Full Disk Backup with DD

I will start with the command I used to make a full disk backup with dd. And then give you a breakdown of the different command elements to help you understand what it is doing.

dd if=/dev/sdc conv=sync,noerror status=progress bs=64K | gzip -c > backup_image.img.gz

The command options break down like this:

if=/dev/sdc this defines the “input file” which in this case is the full drive “/dev/sdc”. You could do the same with a single partition like “/dev/sdc1”, but I want all the partitions on the drive stored in the same image.

conv=sync,noerror the “sync” part tells dd to pad each block with nulls, so that if there is an error and the full block cannot be read the original data will be preserved. The “noerror” portion prevents dd from stopping when an error is encountered. The “sync” and “noerror” options are almost always used together.

status=progress tells the command to regularly give an update on how much data has been copied. Without this option the command will still run but it won’t give any output until the command is complete. So making a backup of a very large drive could sit for hours before letting you know it is done. With this option a line like this is constantly updated to let you know how far along the process has gone.

1993998336 bytes (2.0 GB, 1.9 GiB) copied, 59.5038 s, 33.5 MB/s

bs=64K specifies that the “Block Size” of each chunk of data processed will be 64 Kilobytes. The block size can greatly affect the speed of the copy process. A larger block size will typically accelerate the copy process unless the block size is so large that it overwhelms the amount of RAM on your computer.

Making a compressed backup image file

At this point you could use the “of=/dev/sdb” option to output the contents directly to another drive /dev/sdb. But I opted to make an image file of the drive, and piping the dd output through gzip allowed me to compress the resulting image into a much smaller image file.

| gzip -c pipes the output of dd into the gzip command and writes the compressed data to stdout. Other options could be added here to change the compression ratio, but the default compression was sufficient for my needs.

> backup_image.img.gz redirects the output of the gzip command into the backup_image.img.gz file.

With that command complete I had copied my 115GB drive into a 585MB compressed image. Most of the drive had been empty space, but without the compression the image would have been 115GB. So this approach can make a lot of sense if you are planning on keeping the image around. If you are just copying from one drive to another then no compression is needed.

So there you have it, the process of making a full disk backup with dd. But I guess that is only half the story, so now I will share the command I used to restore that image file to another drive with dd.

Restoring a Full Drive Backup with DD

Fortunately the dd restore process is a bit more straightforward than the backup process. So without further adieu here is the command.

gunzip -c backup_image.img.gz | dd of=/dev/sdc status=progress

gunzip -c backup_image.img.gz right off the bat “gunzip” starts decompressing the file “backup_image.img.gz” and the “-c” sends the decompressed output to stdout.

| dd of=/dev/sdc pipes the output from gunzip into the dd command which is only specifying the “output file” of “/dev/sdc”.

status=progress again this option displays some useful stats about how the dd process is proceeding.

Once the has completed the transfer you should be good to go. But a couple caveats to remember. First the drive you restore to should be the same size or larger than the backup drive. Second, if the restore drive is larger, you will end up with empty space after the restore is complete. ie: 115GB image restored to a 200GB drive will result in the first 115GB of the drive being usable, and 85GB of free space at the end of the drive. So you may want to expand the restored partition(s) to fill up the extra space on the new drive with parted, or a similar tool. Lastly, if you use a smaller drive for the restore dd will not warn you that it won’t fit, it will just start copying and will fail when it runs out of space.

Conclusion

DD is an amazing tool that has been around for a while. And it continues to be relevant and useful each day. It can get you out of a bind and save your data, so give it a whirl and see what it can help you with today.

Here are a couple resources that I referenced to help me build my dd command. A guide on making a full metal backup with dd. And a general DD usage guide.

Speed Up Bzip2 Compression

Bzip2 is easily the best compression tool when it comes to speed and archive size. But even given that it is fast, Bzip2 can still seem to take forever to complete the shrinking of an archive. I am sure you have asked yourself the same question I have. How can I speed up Bzip2 Compression time? Wether you are performing a backup, or just archiving some files Bzip2 does a good job.

There are a couple different ways to speed up Bzip2 compression. Obviously you can get the smallest archives by using the “-9” compression flag. But this takes the longest amount of time.

 ~/$ bzip2 -9 file.txt

So switching to the least compression reduces the compression time. But at the cost of not saving as much disk space.

 ~/$ bzip2 -1 file.txt

Let’s Really Speed Up Bzip2 Compression

If you have watched your CPU usage while using Bzip2 you have probably noticed that your CPU is pegged. In the age of multi-core systems, you will easily notice that only one of your computer or servers cores are pegged out. This is because the Bzip2 process is only single threaded. So it operates by taking the file(s) that are being compressed one bit at a time and compressing it.

This is obviously not the most efficient practice, especially when you have 2, 4, 6, or more idle cores available on your system. But Bzip2 is a single threaded application, so there is no way to utilize those idle cores.

The Best Way To Use Bzip2 is Not To Use Bzip2

Fortunately there is an alternative that will speed up Bzip2 compression. Pbzip2 is a threaded implementation of Bzip2. It allows you to still use Bzip2 compression without having to wait. This is especially important when working with a very large archive.

Pbzip2 breaks the compression task in to multiple pieces which allows the process to accelerate the compression x the number of available cores. Don’t be worried about using all the CPU resources on your system since you can specify the number of cores to use. Or even load the file completely into RAM before starting compression to speed up the process.

Here is a basic example with the highest compression:

tar -c /inputDirectory/ | pbzip2 -c -9 > outputFile.tar.bz2

In this example we are using “tar” to “-c” create an archive from the contents of “/inputDirectory/”. The output of “tar” is then piped into the Pbzip2 command which compresses it and “-c” outputs to stdout with the highest compression “-9”. That compressed content is then redirected into the file “outputFile.tar.bz2”. By default the command will utilize all the available cores on the system.

We can then take the same command and alter it a bit to reduce it’s resource usage and minimize impact on the system load. While still able to speed up the Bzip2 compresson.

tar -c /inputDirectory/ | pbzip2 -c -9 -p2 -m50 > outputFile.tar.bz2

Using the “-p2” option limits the process to using 2 cores. Changing that option to be “-p3” would limit it to 3 cores, and “-p4” would limit it to 4, etc… The “-m100” option limits the amount of RAM that the process utilizes. Our example shows it is limited to 50MB of RAM.

There are some other ways to call Pbzip2. You can use it directly like vanilla Bzip2.

pbzip2 -9 compressfile.tar

Or you can use the more common formatting of “tar” just by adding a long form option.

tar cf outputFile.tar.bz2 --use-compress-prog=pbzip2 inputDirectory/

So there you have the best way to speed up bzip2 compression. Hopefully it saves you some time and frustration next time you have a large archive. It might even be able to compress your mysqldump output?

How To Compress Mysqldump Output

if you read my previous writeup on dumping all mysql databases you will recognize some of this information. I wanted to pay some specific attention to some of the different methods for how to compress mysqldump output.

Obviously compressing your mysql databased exports can have some major benefits. The biggest benefit is the smallness of the file size. Mysql databases and really all databases have the tendency to grow to large sizes. Even small websites can quickly find hundreds of megabytes worth of data in their database. Storing large database export files in your backup can eat up disk space pretty rapidly. Compressing your mysql output can reduce the size of your export file by seven or more times.

If you need to keep individual database backups then compression really makes sense. But if you are using something like rdiff-backup then it makes more sense to skip the compression. Rdiff-backup is unable to do a diff on the compressed data, so it won’t save the space you expect.

Basic Mysqldump Compression Commands

Here are a couple different variations of mysqldump piped compression commands which we will breakdown.

1: mysqldump -u dbUser -p DBName > OutputFile.sql
2: mysqldump -u dbUser -p DBName | gzip > OutputFile.sql.gz
3: mysqldump -u dbUser -p DBName | gzip -9 > OutputFile.sql.gz
4: mysqldump -u dbUser -p DBName | zip > OutputFile.sql.zip
5: mysqldump -u dbUser -p DBName | bzip2 > OutputFile.sql.bz2

In these examples we see the same database being exported in each command. But there are a couple differences, in #1 we are employing no compression. Command #2 is using gzip with its default settings. Then command #3 is utilizing gzip with maximum compression. Command #4 is using zip to perform its compression. And finally command #4 is using bzip2 to perform its compression.

Compression Commands Comparison

Testing the commands above on the same database and on the same hardware yielded the following results.

CommandFilesizeOutput Time
#1391MB13.827s
#257MB16.122s
#355MB32.357s
#457MB16.169s
#544MB1m 18.701s
Output Mysql Database command results

The table above shows the effectiveness of each compression method on the same dataset. The first command sets the baseline for data export with no compression. Gzip applies basic compression and gives a significant size reduction with a very small speed hit. It comes in just a hair faster than zip with about the same compression results.

Adding the -9 to the Gzip command in #3 doubles the output time, and only provides 2MB of space savings. But then Bzip2 weighs in on command #5 taking an extra minute over Gzip or Zip. That extra minute was required to pack the file small enough to rescue another 13MB of space.

Compress Mysqldump Output Conclusions

If you can compress your database output, then you will see significant space savings in your backup storage. Even if backup speed is essential, gzip or zip offer a major reduction in size for minimal extra time. And if time is not a major issue then going with bzip2 will give you much larger space savings in exchange.

Understanding and utilizing compression as part of your backup methodology is an essential element for storage success. Proper implementation can ensure that you save the needed space and reduce backup transfer time. Especially in the event that you need to transfer your backup over a slow network connection. Compression will come to your aid and save the day. So don’t hesitate to compress mysqldump output, it might be just what the doctor ordered.

Further Reading

For additional details and info check out this post which talks more about Compressing Mysqldump Output

Dump All MySQL Databases into Individual SQL Files

Anyone who is responsible for managing a MySQL database will eventually run into this problem. You either need to dump all your MySQL databases for a backup, or to prepare for an upgrade. Whatever your circumstances are there are several different methods that can be employed to dump your MySQL DBs. Hopefully I can give you the basic tools to get you on your way.

Dump All DBs into a Single SQL File

This first method is what I would call the quick and dirty method. It is straight forward and just dumps all of the databases on a server into a single SQL file. For many individuals this is sufficient, but may not be a good option for larger databases or backup/restore processes. Since all of the DBs are bundled into a single file. Each of the values in brackets “[]” are placeholders for your own values.

mysqldump -u [username] -p --all-databases > allDB.sql

If all you need is to get a quick backup of everything this may be your ticket. The “–all-databases” flag does the magic here, dumping all of the MySQL databases into a single SQL file. More details on this method can be found here. But if you are wanting to be able to easily restore an individual database you may want to use an approach like this.

Dump All MySQL Databases into Individual Files

for I in $(mysql -u [username] -p[mypassword] -h [Hostname/IP] -e 'show databases' -s --skip-column-names); do mysqldump -u [username] -p[mypassword] -h [Hostname/IP] $I > "/home/user/$I.sql"; done

This command first calls “mysql” and gets a list of databases on the server. The command feeds that list of MySQL databases into “mysqldump” to get an individual SQL file for each database on the server. Finally those SQL files are then saved in the location indicated, “/home/user/dbname.sql” in this example.

A few things to note in this command are that first you will notice that I have included the password in the command. The command will complain that using the password on the command line is not safe. However it is required for the command to work. The “-p” is immediately followed by the password, without a space. This is how the option functions, if you add a space it will not work.

The “$I” is a variable and it will have all the database names in it, as it iterates through your listing of DBs. So as you modify the command to fit your specific setup, just make sure to keep that for consistency.

Additional Mysqldump Options

There are a couple of additional mysqldump options that you may want to add depending on your requirements.

--single-transaction

This option keeps mysqldump from trying to get a complete lock on the database. It tells mysqldump to just grab a single transaction and dump the DB contents at the time of that transaction. This can be especially helpful when you are dumping a DB that is especially large, or on a very busy server. Without this option mysqldump may timeout while waiting to get a lock on the DB. For more details on how this option works check out this post.

--default-character-set=utf8mb4 

If you are working with data that may contain emoticons you will want this flag. This ensures that your dumped sql file has the correct information to recreate the emoticons when the file gets reimported. Without this option, your emoticons will show up as strange/random text characters when you restore your backup.

Adding Compression

I don’t typically use compression on my SQL dumps since I like to use rdiff-backup as my backup mechanism. But for those who would like to compress their mysqldump in one step here is the basic gist of it.

mysqldump -u [username] -p --all-databases | gzip > allDB.sql.gz

You just pipe the output from the mysqldump command into gzip, or bzip2 to compress the contents. That can be easily added to the command above to dump all MySQL databases into individual files like so.

for I in $(mysql -u [username] -p[mypassword] -h [Hostname/IP] -e 'show databases' -s --skip-column-names); do mysqldump -u [username] -p[mypassword] -h [Hostname/IP] $I | gzip > "/home/user/$I.sql.gz"; done

Hopefully these examples help you get the backups you need. Have some fun along the way.

How to Use Rdiff-backup – A Simply Powerful Backup Tool

In this post I hope to help you understand how to use Rdiff-backup. I stumbled across Rdiff-backup several years ago. It has helped me streamline and simplify most of my file based backup processes from Linux servers. As the name implies, a diff is performed on the files being backed up, so only differences get backed up. As you can probably guess only storing the diffs of changes can lead to a much smaller backup.

Some advantages of Rdiff-backup is that it utilizes rsync, so it can quickly and easily mirror a directory. Backups can happen in seconds if there have been few changes. Your data all travels over a secure ssh connection, so your files are safe during transport. And being a simple command line tool, you can easily script out a backup scenario. Or just use it straight from your terminal.

How I Use Rdiff-backup

I typically use Rdiff-backup with my web hosting clients, it allows for quick backup and restore of their web files. And because most of the the files don’t change from day to day the backup is lightning fast.

rdiff-backup --exclude '**cache/' --exclude '**debug.log' /var/www user@ipAddress::/home/user/backup
rdiff-backup --remove-older-than 52W user@ipAddress::/home/user/backup

Let’s break down the command(s), the first Rdiff-backup command has two “–exclude” options. Those options will as indicated exclude the referenced files or directories from the backup. The exclude options can either have a full relative directory structure or in this case a “**” will match any path. A single asterisk “*” could also be used, but it matches any part of a path not containing a “/” in it.

The “/var/www” part of the command is the directory to backup. Using standard ssh authentication methods “user@ipAddress” ie:”bob@10.1.1.1″ for the login credentials. And then a double colon, which is different from normal rsync/scp formatting. The “::” proceeds the backup destination directory.

So to explain in basic terms, the first command will backup everything under the “/var/www” directory, except any directory ending in “cache/” or file ending in “debug.log”. The backup will be made in the “/home/user/backup” directory.

Backup Lifecycle

Now that you have a backup created how do you manage how long the backup will be kept? Without a command like the second one above, the Rdiff-backup will be kept indefinitely. But adding the second command helps us manage the how long to keep the backups.

rdiff-backup --remove-older-than 52W user@ipAddress::/home/user/backup

This command is a little simpler than the first, it doesn’t specify a source directory. But rather only specifies how long to keep files. The “–remove-older-than” option can take a number of different options. I like to keep my backups for a year, so “52W” gives me 52 weeks of backups. Any existing diffs older than the specified time are removed. Other options are s, m, h, D, W, M, or Y (indicating seconds, minutes, hours, days, weeks, months, or years respectively). Additionally a “B” can be used to indicate the number of backups, ie: “3B” would keep the last three backups.

After specifying the number or timeframe of backups to keep, the only other thing to specify is the backup location. The trimming of backup content is then performed on the given location.

Performing a Restore

So now you have your content backed up, and you need to restore something. A backup is only as good as the data you can restore out of it right? Fortunately the restore process fairly simple as well.

rdiff-backup -r 5D user@ipAddress::/home/user/backup/example.com/index.php /home/localUser/www/restore/

The “-r” option tells Rdiff-backup to restore files, and uses the same time format as the delete option above. In this case we are restoring a version of the file “example.com/index.php” from 5 days ago. The file is being restored/copied to “/home/localUser/www/restore/”. The same can be done for an entire directory structure.

rdiff-backup -r 3B user@ipAddress::/home/user/backup/example.com/ /home/localUser/www/restore/

This command restores/copies all the contents of the example.com directory to “/home/localUser/www/restore/” from 3 Backups ago. Or if you are hunting for a specific day you can always do something like this.

rdiff-backup -r 03-05-2020 user@ipAddress::/home/user/backup/example.com/ /home/localUser/www/restore/

That will perform the same restore, but specifically as of the 5th of March 2020. The date used can be “03/05/2020” or “2020-03-05”, and all indicate midnight as of that day.

For a full rundown on all the options and other details for Rdiff-backup check out the project documentation.

There you have a basic rundown on how to use Rdiff-backup. I find it a very useful and powerful tool, and hope it will help you keep your backups running.

Find Files Not Owned By Specific User or Group

Recently ran into an issue where I needed to search recursively through a file structure. And find files that were not owned by a specific user or group. The issue came up because a client would run a recursive chown on a directory before running git. As a result of the chown they would periodically see the process hang. The chown was to ensure that all the files were writable and wouldn’t gum up the git process. However the hung chown command would cause extra load on the server and lead to system instability.

These files were shared between a few web servers on an NFS share. After some lengthy research I am still stumped as to exactly what the source of the issue is. But it lead me to craft a command that would allow the client to easily check permissions on their files. But without forcing a change to the ownership, which seemed to possibly be the cause of the hanging process.

Find files not owned by specific user

The command was created using some help from this site and this site.

find . ! -group web -or ! -user web -printf "%p - user:%u group:%g\n"

The find command searches from the “.” current directory and recursively checks all files and folders. The “! -group web” section tells the find command to check for files that are NOT “!” owned by the “web” group. Then “! -user web” specifies that it should check for files that are NOT “!” owned by the “web” user. The “-or” tells the find command to match files that are either not owned by the “web” user or group. Changing that to “-and” would only show files that are not owned by the “web” user or group.

The final section ‘-printf “%p – user:%u group:%g\n”‘ tells find how to output the results. “%p” outputs the filename and relative directory structure. “%u” and “%g” output the user and group of the file that is found. Some sample output would look like this.

./logs/access.log.2016_09_30.03.gz - user:root group:web
./logs/access.log.2016_10_04.09.gz - user:root group:web
./logs/access.log.2016_10_06.32.gz - user:root group:web
./logs/access.log.2016_10_04.38.gz - user:root group:web

This command easily helps you determine if your permissions are set correctly. And identify which files will need to have their ownership changed. It is more lightweight and doesn’t force any changes to the filesystem when they are unneeded.

Allow Let’s Encrypt to Bypass HTTP Auth

The story is a common one, you or your client wants to have a site protected by password. But you also want it to be secured with a Let’s Encrypt SSL certificate. The Certificate was easy to provision when you first set up the website. But then you added basic HTTP authentication or IP restrictions to your htaccess file. Now Let’s Encrypt can no longer connect to your site to renew the ssl cert. So how can we allow Let’s Encrypt to bypass HTTP auth and issue your cert?

The following method will work for either IP or basic authentication settings.

How to Allow Let’s Encrypt to Bypass HTTP Auth

Let’s Encrypt needs to access the hidden folder called .well-known contained within your document root. Typically your authentication settings or IP restrictions are in the htaccess file in your document root. This establishes the rules for all child folders. But by default htaccess files found inside child folders can override the parent settings.

By adding a htaccess file in your .well-known directory you can allow Let’s Encrypt to bypass HTTP auth security. Your htaccess file should contain the following:

Order allow,deny
Allow from all
Satisfy Any

These directives tell the web server to allow traffic from any source to access that directory and all child folders.

Thanks to these sites for guiding me to this solution in my own search. You can find their write-ups here and here.

HTTPS Redirect Breaks Let’s Encrypt Renewal

One other issue I have run into that causes problems with the renewal process is your sites HTTPS redirect. In some instances once you add your redirect rule to force all traffic to use https, this can break your Let’s Encrypt Renewal process. And no longer allow the Let’s Encrypt Certbot to validate your site.

Making a slight change to your rewriterule can fix your problem. Your initial rule may have looked something like this:

# Redirection to HTTPS
 RewriteCond %{SERVER_PORT} ^80$ [OR]
 RewriteCond %{HTTPS} =off
 RewriteRule ^(.*)$ https://yoursite.com$1 [R=301,L]

That rule checks for traffic coming in on port 80 or that does not use https and then redirects all of it to the https version of your site.

# Redirection to HTTPS
 RewriteCond %{SERVER_PORT} ^80$ [OR]
 RewriteCond %{HTTPS} =off
 RewriteRule ^(?!/.well-known(?:$|/)).* https://yoursite.com$0 [R=301,L]

By adding this change to the Rewriterule line we have told the server to redirect all traffic that does not start with “/.well-known”. So anything under that directory (Let’s Encrypt) are not redirected to HTTPS.

Thanks to this site for posting their fix so I could find it.