
sipXecs Backups to Amazon S3

I recently had a situation where around 20 servers needed to be backed up. Here is how I tackled it in sipXecs 4.6 (for those running 4.4, gpg seems to be an issue, so I haven't pursued that).

1. Install the s3tools repo (the .repo file has to land in /etc/yum.repos.d for yum to find it):

cd /etc/yum.repos.d
wget http://s3tools.org/repo/RHEL_6/s3tools.repo

2. Install s3cmd via yum:

yum install s3cmd

3. Once s3cmd is installed, you need an Amazon S3 account set up, and you need to know your Access Key ID and Secret Access Key, which you can get in your Amazon portal under "security credentials". You will also need to create a bucket and a policy.
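If you prefer the command line, once s3cmd is configured (step 5 below) you can create the bucket with its mb command; a quick sketch, using the example bucket name "alvin" that appears later in this post:

s3cmd mb s3://alvin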

4. Bucket Policy: Everyone's security needs differ. Here is a very basic policy to get you started; attach it to the bucket through the bucket policy editor in the S3 console, substituting your bucket name for "alvin". Note that this example grants public read access to everything in the bucket, so tighten it before storing real backups (more on this in the conclusion).

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AddPerm",
            "Effect": "Allow",
            "Principal": {
                "AWS": "*"
            },
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::alvin/*"
        }
    ]
}

5. Test and configure s3cmd with:

s3cmd --configure

The configure wizard offers to test access with the credentials you enter; when the test succeeds, choose the option to save the settings so s3cmd can use them.
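When saved, the settings land in a plain-text config file (more on its location in the postscript). An illustrative excerpt, with placeholder values for the two keys you entered:

[default]
access_key = YOUR_ACCESS_KEY_ID
secret_key = YOUR_SECRET_ACCESS_KEY
use_https = True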

6. Once the test works, you can create a local backup and try to sync it. Here is the basic command to sync the local backup directory to your bucket. We'll use the example bucket name "alvin", and inside the bucket we created a directory called "pbx1" for this particular system to send its backups to:

s3cmd sync -r /var/sipxdata/backup/local/ s3://alvin/pbx1/

Pay attention to 'sync -r': it syncs recursively, so the next time the cron job runs it will transfer whatever has changed. This also means that if you manually created a backup outside of the schedule and left it on the system, the cron job will pick up just the changes and sync those too. One caveat: by default s3cmd sync does not delete remote files, so for the bucket to prune itself based on the local number of backups (just mirroring what is local) you need to add the --delete-removed flag, as shown below. When I ran it, it showed almost no RAM and 1.3% CPU, so it does not appear to be service impacting either.
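A sketch of the mirroring variant, if you do want deletions on the server to propagate to the bucket:

s3cmd sync -r --delete-removed /var/sipxdata/backup/local/ s3://alvin/pbx1/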

7. Automate it with cron. Everyone's needs differ, but since the backups land on the local drive, figure out what you need to save and schedule that in sipXecs, then automate the offsite sync based on that schedule. For example: I have a system that needs its configuration backed up weekly, keeping the last 52 backups on board (locally on the sipXecs server). That's a year's worth. I schedule sipXecs to do the local backups Friday morning at 2am, then run a cron job every Friday at 3am to sync the local backup directory. My basic config backup is 80MB in this example, so 52 of them come to roughly 4.2GB of storage for one system (configuration only; everyone's size will differ). Storage at Amazon for around 5GB cost me about 1 cent per month, so it won't be pricey.

 0 3 * * 5 s3cmd sync -r --delete-removed /var/sipxdata/backup/local/ s3://alvin/pbx1/

...at zero minutes past the third hour on the fifth day of the week (Friday, since cron counts Sunday as 0), run this command.
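To install it, add the line to root's crontab (root, so that the /root/.s3cfg created earlier gets picked up):

crontab -e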

Conclusion: This totally works, and while not for the faint of heart, the payoff is completely worthwhile from a cost perspective. I do strongly suggest saving your Amazon credentials in more than one secure place, because without them you can't get to your bucket to retrieve the data either. You can also use different Amazon S3 credentials for different systems, so each system's storage can be billed under the credential owner's account.

To figure out your storage costs, do a local backup, note its size, determine your backup frequency and retention count, then run the numbers through the Amazon storage calculator. I think my monthly cost came to 1 cent for a year's worth of backups tallied up.

You do need to adjust the access policy in S3 according to your security needs. My example is deliberately permissive so that anyone somewhat new to S3 can get a backup done quickly, and can then work out how they want to construct policies, access keys, buckets, folders and the like.

Postscript:

When you have one organization and multiple systems to configure like this, keep in mind that s3cmd has a corresponding config file; if someone wanted to take this up, it could be implemented and configured from within sipXecs as well. Using cron, you could capture the output of each run, inspect it, and on an error/failure feed it to the alarms group so that notification goes out via email. There are simply lots of options. Also keep in mind that if you installed and configured s3cmd as root, you will find the config file at: /root/.s3cfg
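As one possible starting point, here is a minimal wrapper sketch for the cron job that mails the log on failure. The recipient address is hypothetical, and it assumes a working mail command on the server:

#!/bin/bash
# Sync the local sipXecs backups to S3; mail the log only if s3cmd fails.
LOG=$(mktemp)
if ! s3cmd sync -r --delete-removed /var/sipxdata/backup/local/ s3://alvin/pbx1/ > "$LOG" 2>&1; then
    mail -s "sipXecs S3 backup failed on $(hostname)" admin@example.com < "$LOG"
fi
rm -f "$LOG"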

Manually copying this file to multiple systems (ahem... replicating it via mongo, as an example) makes the process that much simpler.
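For a plain manual copy, something like this works (the target hostname is hypothetical):

scp /root/.s3cfg root@pbx2.example.com:/root/.s3cfg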

My initial aim was just to get people looking at this. I found s3cmd painful until I got the syntax right; while not hard, it's unforgiving, and the error messages during my initial trials were mostly useless.

Happy S3'ing!