Running My Own Web Services — Calendar

Calendar

In my project to remove my dependence on Google services, I’ve also moved my calendar to my own server. ownCloud, which I talked about setting up back here, includes a calendar web application and a CalDAV server. Thunderbird has a calendar extension called Lightning, which I’ve pointed at my CalDAV server. Like IMAP, CalDAV ensures that your clients always reflect the current state of the server, so things stay consistent across devices. On Android, I got an app to provide CalDAV sync, and then continued to use the stock calendar app. This transition was surprisingly easy.

Of course, there was a catch. When I was using Google Calendar, my girlfriend had access to my calendar, and I had access to hers. This sharing is easy within the Google ecosystem, but turned out to be a lot harder outside. She could easily provide me with a publicly accessible link to an iCal file containing all of her calendar events, and I subscribed to this in my various calendar clients. Unfortunately, ownCloud did not provide a way for me to give her a similar link. ownCloud does have an iCal URL, but it requires a username and password to access. Google Calendar does not like this; it only lets you subscribe to non-authenticated URLs. I hacked together this page, which lets you get an iCal version of your calendar without authentication:

<?php
// Shared-secret iCal export for the ownCloud calendar app.
// Place this at apps/calendar/share.php in your ownCloud installation.
OCP\App::checkAppEnabled('calendar');

$cal = isset($_GET['calid']) ? $_GET['calid'] : null;
$given_secret = isset($_GET['secret']) ? $_GET['secret'] : null;
$secret = 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXX';

// Reject any request that doesn't carry the shared secret.
if ($given_secret !== $secret) {
  header('HTTP/1.0 403 Forbidden');
  exit;
}

if (!is_null($cal)) {
  // Look up the calendar without checking ownership or share permissions.
  $calendar = OC_Calendar_App::getCalendar($cal, false, false);
  if (!$calendar) {
    header('HTTP/1.0 403 Forbidden');
    exit;
  }
  // Serve the whole calendar as an .ics file.
  header('Content-Type: text/calendar');
  header('Content-Disposition: inline; filename=' . str_replace(' ', '-', $calendar['displayname']) . '.ics');
  echo OC_Calendar_Export::export($cal, OC_Calendar_Export::CALENDAR);
} else {
  echo 'Must provide calid';
}

Place this in apps/calendar/share.php in your ownCloud installation. Then, go to http://my.server.name:port/public.php?service=calendar&calid=X&secret=XXXXXXXXXXXXXXXXXXXXXXXXXXXXX. This should return your entire calendar in iCal format. You can then pass this URL to Google Calendar in their “Import by URL” option. Hopefully ownCloud will provide a more elegant way to share calendars without requiring login in the future. For now, this will have to do.
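
Before handing the URL to Google Calendar, it’s worth a quick sanity check from a shell (calid=1 is just an example; substitute your own host, port, calendar ID, and secret). The output should begin with BEGIN:VCALENDAR:

curl -s 'https://my.server.name:port/public.php?service=calendar&calid=1&secret=XXXXXXXXXXXXXXXXXXXXXXXXXXXXX' | head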


Running My Own Web Services — Backups

Backups

Now that my email is stored on my own server, backups are critical. Relying solely on DigitalOcean’s backups was not acceptable to me; those backups are easy to use, but they are on-site with my server. And of course, relying on a single backup system is never a good idea. I still have backups turned on in my DigitalOcean account, but I wanted a second backup system as well. I chose Tarsnap because it’s easy to use, cheap, and secure. To make a backup, you run a command telling Tarsnap which directories to back up, and it uploads a copy to its S3-backed storage. Backups are incremental, so frequent backups cost only a little more than infrequent ones. To save space, I decided to back up only the directories where real data is stored, or where I’ve made changes from a stock Ubuntu installation. I set up a cron job to make a new backup every night. Cron runs this command:

tarsnap --keyfile /home/charles/keys/tarsnap.wasp.nodelete.key -c -f $(hostname)-$(date +"%Y-%m-%d") $(cat /home/charles/tarsnap/backup_paths.txt)
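
One gotcha if this command goes directly into a crontab rather than into a wrapper script: cron treats % as a line separator, so the date format has to be escaped. The entry would look something like this (the 02:00 start time is an arbitrary choice):

0 2 * * * tarsnap --keyfile /home/charles/keys/tarsnap.wasp.nodelete.key -c -f $(hostname)-$(date +"\%Y-\%m-\%d") $(cat /home/charles/tarsnap/backup_paths.txt)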

This backup system currently costs me about 2 cents per day, and I feel good about its security. I’ve made two copies of my key file: one that can create backups without requiring a password, and one with full power (including deleting backups) that is protected by a password. Both key files have been copied to my laptop and to external storage media, so I can be confident that I’ll always be able to recover my backups.
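
For reference, keys like these are produced with tarsnap-keymgmt, which writes new key files carrying a subset of a master key’s powers. Something like the following, where the full-power key’s filename is my invention:

# Read+write key: can create and read archives, but not delete them.
tarsnap-keymgmt --outkeyfile tarsnap.wasp.nodelete.key -r -w tarsnap.key
# Full-power key (adds delete), wrapped in a passphrase.
tarsnap-keymgmt --passphrased --outkeyfile tarsnap.wasp.full.key -r -w -d tarsnap.key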

Realistically, one of the biggest uses of a backup system is restoring a file to its previous state (usually a configuration file that I’ve messed up). Tarsnap isn’t great for this. Rolling back a file requires downloading the entire backup (which is a tar file), and then fishing the file you want out of it. Maybe putting /etc under git would be a better way to go.
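
The fishing step, at least, can be folded into the download: tarsnap’s extract mode accepts specific paths, tar-style, with the leading slash stripped. The archive name and file here are made up:

tarsnap --keyfile /home/charles/keys/tarsnap.wasp.nodelete.key -x -f wasp-2013-12-01 etc/apache2/apache2.conf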


Estimating π using MapReduce

I am taking a class called Senior Seminar, in which we each needed to give a talk about a technical subject. I wanted my talk to be unique, so I thought about doing some research on concurrency performance on mobile devices versus PCs. I quickly decided this was fairly boring, since I could see the conclusion coming from a mile away: concurrency works a lot better on PCs than on mobile devices.

I was using the calculation of π as a CPU-bound workload for this testing, so I decided to pivot: in particular, how can concurrency help with the computation of π itself? I found the paper “The BBP Algorithm for Pi,” by David H. Bailey, which gave me a detailed description of a good way forward. The paper details a method for computing a handful of hexadecimal digits of π starting at a given position after the point. Since I’m interested in finding new ways to use MapReduce, I decided to implement the ideas of the paper inside a small MapReduce job. For a requested number of digits n, I apply this method repeatedly to get the digits from position 0 through n. Each call to map() generates one handful of digits (currently set to 5). The further a handful is from the point, the more time it takes to generate (roughly proportional to its position), so the total running time grows roughly quadratically with n. On a high-performance Hadoop cluster, calculating 100,000 digits took 31 seconds. This comes nowhere close to breaking any records, and is probably slower than you could manage on your personal computer with a more efficient (but less distributable) algorithm. Nonetheless, this was a very educational project.
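
My actual Hadoop code is on GitHub (linked below), but the digit-extraction step that each map() call performs is easy to sketch on its own. Here is a minimal standalone Python version of the BBP technique, not taken from my job code. The trick is that pow(16, d - k, 8k + j) does modular exponentiation, keeping every intermediate value small while preserving the fractional part we care about:

def bbp_pi_hex_digits(start, count=5):
    """Return `count` hex digits of pi, where position 0 is the first
    digit after the (hexadecimal) point. Double precision limits each
    call to roughly 10 reliable digits."""
    def series(j, d):
        # Fractional part of: sum over k of 16^(d-k) / (8k + j).
        s = 0.0
        for k in range(d + 1):
            # Modular exponentiation keeps 16^(d-k) mod (8k+j) small.
            s = (s + pow(16, d - k, 8 * k + j) / (8 * k + j)) % 1.0
        for k in range(d + 1, d + 20):
            # Tail terms, where 16^(d-k) is already tiny.
            s = (s + 16.0 ** (d - k) / (8 * k + j)) % 1.0
        return s

    x = (4 * series(1, start) - 2 * series(4, start)
         - series(5, start) - series(6, start)) % 1.0
    digits = ''
    for _ in range(count):
        x *= 16
        digits += '%x' % int(x)
        x %= 1.0
    return digits

print(bbp_pi_hex_digits(0, 10))  # 243f6a8885 -- pi is 3.243f6a8885... in hex

Each mapper essentially runs one call like this for its assigned offset, and the reduce side only has to reassemble the handfuls in position order.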

The source code for the project is available on my GitHub. The results of computing the first 100,000 digits are also available.


Running My Own Web Services — Email and Nagios

Email

One of my big goals was to have my own email server. I followed this tutorial on how to set it up. There’s a surprising amount of manual SQL interaction. I had an SSL certificate that covered “www.charlesconnell.com” and “charlesconnell.com”, but the machine itself is named “wasp.charlesconnell.com”, and I decided to connect to the mail server under that name, so I got a second certificate for it. This was another free certificate from StartSSL.

I do my email from a Windows machine, an Ubuntu machine, and my Android phone. I set up Thunderbird on Windows and Linux, connecting to my server with IMAP. I seriously recommend the “Thunderbird Conversations” add-on. I got the K-9 Mail app on Android, which I’ve been happy with. IMAP does a good job of keeping your email consistent across devices, so the experience is not much different from GMail. Both Thunderbird and K-9 have pretty easy shortcuts for archiving emails, which is also crucial for a former GMail user.

I’ve now been using my own email server for several months. The experience has been great overall. It took some time to get accustomed to the new clients, but now I have no trouble with them.

Nagios

On several occasions, my MySQL daemon stopped running, so Postfix couldn’t accept mail. The first time this happened, I noticed after 30 minutes. The next time, it took me 8 hours to notice. While I eventually solved the problem with MySQL, I decided to set up monitoring to make sure I find out right away if it ever happens again.

I started up a second droplet on DigitalOcean, in a different data center from my original one. This machine’s purpose is to monitor my email server. Nagios comes with configurations for monitoring SMTP and MySQL, so I set those up. This requires exposing your MySQL server, but I made sure that IPTables only allows connections to it from the IP of my monitoring box. Nagios’s default behavior when it notices a problem is to send you an email. Since the purpose of this system is to alert me when my email is not working, that obviously would not do! I wanted a text message, since that would alert me quickly and reliably. SendHub has free accounts that let you send texts to whatever number you like. I got one of their accounts, set up my own phone number as the only one in my SendHub contacts, and wrote this little script:

#!/bin/sh
# Send the first argument as an SMS via SendHub's REST API.
# Requires HTTPie (the `http` command). XXXXX is my SendHub contact ID.
echo "{\"contacts\":[XXXXX], \"text\":\"$1\"}" > /tmp/sendhub-input-$(whoami).json
http POST 'https://api.sendhub.com/v1/messages/?username=5555555555&api_key=XXXX' < /tmp/sendhub-input-$(whoami).json

This will send the text in the script’s first argument to the specified contact. Then I defined this command in /etc/nagios3/commands.cfg:

# 'notify-service-by-sms' command definition
define command{
        command_name    notify-service-by-sms
        command_line    /home/charles/sms_me.sh "** $NOTIFICATIONTYPE$ Service Alert: $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **"
        }
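
Defining the command isn’t enough by itself; Nagios only sends SMS notifications to contacts that reference it. The wiring looks roughly like this sketch (the contact name and email address are made up, and it falls back to the stock notify-host-by-email command for host-level problems, since the SMS command above uses service macros):

define contact{
        contact_name                    charles-sms
        alias                           Charles (SMS)
        service_notification_period     24x7
        host_notification_period        24x7
        service_notification_options    w,u,c,r
        host_notification_options       d,r
        service_notification_commands   notify-service-by-sms
        host_notification_commands      notify-host-by-email
        email                           charles@charlesconnell.com
        }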

Of course, I’m letting Nagios notifications go to my email as well. Most are warnings about CPU usage or the number of processes. Every so often, checks fail because the box is unreachable. This is outside of my control, and never seems to last very long.

Overall, my email has been quite reliable. There have only been a few downtimes, and only one was long. Emails sent to me during this time came in once Postfix was working again, so there have been no bad consequences of this. As you can see, getting a reasonably reliable setup was not just a matter of installing some packages and leaving the default settings, but it wasn’t too bad.


AudioCompare

I just finished up a semester project for my Software Development class. Our assignment was to write a program that compares audio files to see if any are similar. I’m very proud of my group’s result, which I’m calling AudioCompare. AudioCompare accepts WAVE and MP3 files, and is agnostic to sample rate and number of channels. You might find this program useful if you have a large music library and suspect that you have duplicate files for the same song. If there’s no way to spot the duplicates using filenames or tag data, here’s your solution.

Efficiency at scale is my guiding principle when working on software, and this program shows it. The algorithm used to compare audio files was designed from the beginning so that comparing two large sets of files can be done in linear time; the naive approach would, of course, take quadratic time. I also made use of concurrency as much as possible. Most of the work done in AudioCompare happens on a per-file basis, making it easy to parallelize. For the majority of its running time, all CPUs on your system should be at full capacity. In a test comparing a set of 77 short files against themselves (almost 6,000 pairs) on a modern machine, AudioCompare took just 20 seconds. The code is entirely Python, but thanks to the awesome NumPy module, it runs at close to C speed.
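
The TECHNICAL_OVERVIEW file mentioned below describes the real matching algorithm, but the linear-time idea has a simple shape: index one set of files by fingerprint in a hash table, then probe it with the other set, so files are compared only when their fingerprints collide. A toy Python sketch of that shape, with a deliberately dumb stand-in fingerprint:

import hashlib
from collections import defaultdict

def fingerprint(path):
    # Toy stand-in: hash the raw bytes. The real AudioCompare derives
    # fingerprints from the decoded audio, so one song still matches
    # across sample rates, channel counts, and encodings.
    with open(path, 'rb') as f:
        return hashlib.md5(f.read()).hexdigest()

def find_matches(paths_a, paths_b):
    # Index set A by fingerprint: one pass over A.
    index = defaultdict(list)
    for path in paths_a:
        index[fingerprint(path)].append(path)
    # Probe with set B: one pass over B. Total work is proportional to
    # the number of files (plus collisions), not the number of pairs.
    matches = []
    for path in paths_b:
        for candidate in index.get(fingerprint(path), []):
            matches.append((candidate, path))
    return matches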

For more detail about how it works, check out the file TECHNICAL_OVERVIEW in the code repository.


Running My Own Web Services, Part 2

You may want to start with Part 1. In this post, I continue to add functionality to the server that I’ve started administering.

DNS

When I set up my server, I pointed my domain’s nameservers at DigitalOcean and added an A record and a CNAME record in DigitalOcean’s DNS control panel. I knew this wasn’t really good enough, especially if I wanted to run an email server. I used this tutorial to help me add more records. My configuration turned out as follows.

DNS records screenshot

I am using A records to associate charlesconnell.com and wasp.charlesconnell.com with my droplet’s IP address. I also added a CNAME record so that <anything>.charlesconnell.com points at the same IP. Finally, I added an MX record so that email sent to <anything>@charlesconnell.com would be directed to wasp.charlesconnell.com.
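
In zone-file notation, the records look roughly like this (the MX priority of 10 is an assumption, and DigitalOcean’s panel displays things a bit differently):

charlesconnell.com.        IN  A      37.139.15.208
wasp.charlesconnell.com.   IN  A      37.139.15.208
*.charlesconnell.com.      IN  CNAME  charlesconnell.com.
charlesconnell.com.        IN  MX 10  wasp.charlesconnell.com.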

It is also important to ensure that reverse DNS is set up. The DNS system lets computers translate hostnames to IP addresses. Reverse DNS does the opposite, translating IP addresses to hostnames. In DigitalOcean, to set up reverse DNS, just rename your droplet to its full hostname. I set mine to be wasp.charlesconnell.com.
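
Both directions are easy to verify with dig; the -x flag performs a reverse lookup:

dig +short wasp.charlesconnell.com    # should print 37.139.15.208
dig -x 37.139.15.208 +short           # should print wasp.charlesconnell.com.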

Lastly, I needed to configure my droplet to know its full hostname. I added this line to the very beginning of /etc/hosts:

37.139.15.208 wasp.charlesconnell.com wasp

It’s important that this be the first line of the file; I was stumped about why my hostname setup wasn’t working until I discovered that the ordering was the problem. To verify I set this up correctly, I checked that hostname returned wasp and that hostname -f returned wasp.charlesconnell.com.

ownCloud

One of the cloud services I use most heavily is Dropbox. I love being able to have a place I can put files and know they will just appear on my other devices automatically. There is an open-source alternative to Dropbox called ownCloud that allows you to run a file sync server yourself. This runs as a PHP service, so it sits behind your web server just like WordPress does. ownCloud also has extensions that provide other services, like a calendar web app.

I wanted to make sure that all traffic to ownCloud would be over SSL, so I decided to create a new virtual host in Apache, this time on a nonstandard port. Recall that you need to add an exception in your IPTables configuration for this port. This virtual host points to the document root of ownCloud and has SSL enabled. I approximately followed this tutorial on how to install ownCloud and configure Apache to work with it. The most annoying part of this was adding a new package repository to Ubuntu. I visited charlesconnell.com:8080 to log into ownCloud and configure it. It prompts you to download the desktop clients to set up syncing. Downloading the desktop client on Linux requires messing with your repository sources again, but the Windows version is simple.
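
The vhost itself is short. A sketch of its shape (the paths and certificate filenames are assumptions; the tutorial covers the real details):

# Listen on the extra port (in /etc/apache2/ports.conf or the vhost file).
Listen 8080

<VirtualHost *:8080>
    ServerName charlesconnell.com
    DocumentRoot /var/www/owncloud
    SSLEngine on
    SSLCertificateFile    /etc/ssl/certs/charlesconnell.com.crt
    SSLCertificateKeyFile /etc/ssl/private/charlesconnell.com.key
</VirtualHost>

The matching IPTables exception is along the lines of iptables -A INPUT -p tcp --dport 8080 -j ACCEPT, inserted above any catch-all reject rule.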

After using ownCloud for a few weeks, I’m not super happy with the desktop clients on Linux or Windows. On a few occasions they seemed to stop syncing after resuming my laptop from sleep. They give me frequent notifications that they are ignoring certain files, seemingly at random. That said, they do sync my files correctly most of the time. When I edit a file, they notice and upload the new version within a few seconds. I am thinking of moving to a different file synchronization service, but ownCloud will work for now.

Piwik

Since I’m running my own website now, I thought it would be interesting to do log-based website analytics alongside the more common Javascript analytics. It’s debatable which system is more accurate. Log-based analytics will capture visitors who have Javascript disabled or run with Ghostery enabled (like me). However, log-based analytics also tend to over-report the number of visitors, because they capture robotic crawlers and do not do well at distinguishing unique visitors from repeat visitors.

I chose to use Piwik for website analytics. They have a nice installation tutorial that I followed. I created two “sites” in my Piwik installation, both for tracking visitors to this blog. For one site, I copied the Javascript tracking snippet into my WordPress settings so it would appear in the header of every page. This was all the configuration needed to make this work.

For the other site, I set up a nightly cron job to process my site’s access logs and import that data into Piwik. The Piwik site has instructions on how to run a script that imports your web server’s access logs into Piwik. After trying different combinations of command line arguments, I settled on the command that I wanted to run every night, listed here:

/home/charles/public/charlesconnell.com/public/piwik/misc/log-analytics/import_logs.py --url=http://charlesconnell.com/piwik /home/charles/public/charlesconnell.com/log/access.log --idsite=1 --recorders=1 --enable-http-errors --enable-http-redirects --enable-static --enable-reverse-dns --exclude-path=/blog --exclude-path=/piwik

I put this in a shell script I called piwik_log_import.sh. Then, I added a call to it to my crontab. Running crontab -e opens an editor with your user-specific crontab in it. I added this line to run my log import every night at 3am:

0 3 * * * /home/charles/public/charlesconnell.com/piwik_log_import.sh

Up Next

My next post will discuss how I set up my own email server, which was the hardest thing I’ve done so far.


Running My Own Web Services, Part 1

For some time I have been thinking about moving my personal data away from cloud service providers and into my own infrastructure. This became especially important when it was revealed that the US government had backdoors into most of the large US tech companies. I spend my life on the Internet, so changing how I use it will not be an easy task. Cloud services provided by companies like Google, Yahoo, Microsoft, Dropbox, etc. make it easy and usually free to have your data always available on every device. I just started this blog on my new personal server. Let’s go over what I’ve done so far. This is not intended to be a detailed tutorial. You can follow the individual tutorials that I will link to, just like I did.

Infrastructure

My server is a DigitalOcean “droplet.” I got the cheapest option they have, for $6 per month with backups ($5 without). This is a virtual private server (VPS), meaning they give me root access to a virtual machine with a public IP address. I chose to have my droplet hosted in their Amsterdam data center because it is outside US territory. I also chose the “LAMP on Ubuntu 12.04” machine image because I’m comfortable with Ubuntu.

I’m essentially renting space on another company’s computer, so I still don’t have total control over my data, but it’s better than before.

The first thing I did was follow the “Getting Started” and “Securing Your Server” tutorials on Linode’s website. When you’re running your own server and services on it, security is serious business. A lack of attention to security issues, or just a mistake, can have serious consequences. The number one reason for security problems in software is bad configuration, not bad code. I’m not an expert in tech ops, so I try to follow instructions as closely as I can.

The $5 droplet is a low-power machine with only 512 MB of memory, and a 20 GB disk. So far, it’s been plenty powerful, and I think it will continue to be fine in the future, unless this blog starts generating large amounts of traffic. In that case, I can move the blog onto its own droplet with more power.

Shell

I am picky about my shell. I like to use zsh with the oh-my-zsh extension installed. Installing this is just a matter of running the one-liner listed on the oh-my-zsh README. I’m going to be spending a lot of time logged into my droplet over SSH, and I already spend a lot of time doing command line work on my local machine. I wanted the shell prompt on my droplet to make it immediately obvious to me that I was not working locally. I modified the “phillips” theme that comes with oh-my-zsh to be just the way I like it. I called the new theme “connell”. It shows your username, machine name, and the deepest 5 levels of the directory tree that you’re in. It does all of this with a minimum of extra characters.

Web Server

I had a personal website hosted by a traditional web host, which worked well for me for a long time. I never had any problems with them, but since I was now running my own server, I wanted to host my website myself. I decided to switch over to WordPress from a custom site in the process, so I could blog. At the same time, I bought a domain name for my girlfriend and offered to host her site myself as well.

First, I needed to get my domain name pointing at my new server. I updated the nameserver entries with my registrar to point to DigitalOcean’s nameservers, and then added an A record there pointing at my server’s IP. I did the same with my girlfriend’s new domain. As always, this can take a day to propagate.

I wanted a legitimate SSL certificate so I could use my site securely. StartSSL offers free certificates that browsers will accept, and they verify that you control the domain you’re requesting a certificate for. There is a series of steps that StartSSL makes you go through. When asked, I generated my own private key and certificate request, rather than allowing StartSSL to generate the key for me, since that would defeat the purpose of a private key.
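
Generating the key and certificate signing request yourself comes down to two OpenSSL commands; the filenames here are my own choice, and StartSSL then signs the CSR:

openssl genrsa -out charlesconnell.com.key 4096
openssl req -new -key charlesconnell.com.key -out charlesconnell.com.csr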

I chose to use Apache for my web server since I am already comfortable configuring it. I set up 3 virtual hosts inside Apache. On port 80, I have a virtual host for unencrypted traffic to my site, plus a name-based virtual host (Apache’s NameVirtualHost) for port-80 traffic to my girlfriend’s domain; Apache discriminates between them based on the domain requested. These two virtual hosts obviously point to different document roots, each underneath our home folders. A third virtual host handles encrypted traffic on port 443 to my main site, pointing to the same document root as my first virtual host. You can follow this tutorial about how to set up SSL on Apache.
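
The three vhosts look roughly like this under Apache 2.2 (my girlfriend’s domain is replaced with a placeholder, and the document roots are illustrative):

NameVirtualHost *:80

<VirtualHost *:80>
    ServerName charlesconnell.com
    ServerAlias www.charlesconnell.com
    DocumentRoot /home/charles/public/charlesconnell.com/public
</VirtualHost>

<VirtualHost *:80>
    ServerName her-domain.example
    DocumentRoot /home/her/public/her-domain.example/public
</VirtualHost>

<VirtualHost *:443>
    ServerName charlesconnell.com
    DocumentRoot /home/charles/public/charlesconnell.com/public
    SSLEngine on
    SSLCertificateFile    /etc/ssl/certs/charlesconnell.com.crt
    SSLCertificateKeyFile /etc/ssl/private/charlesconnell.com.key
</VirtualHost>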

WordPress

I chose to use WordPress to run my website because it’s easy, looks good, and I was familiar with installing it. I had to create a new database in MySQL, create a new user, and give that user access to the database. Then I moved the WordPress files into my website’s document root. The tricky part was getting file permissions right. I needed to set the group of every file and directory in the WordPress directory, as well as the document root itself, to www-data (sudo chgrp -R www-data <document_root>). The files already had group write permission, so this allowed WordPress to modify its own files, which is generally necessary. It’s also a good idea to install the package libssh2-php so that WordPress can SCP into your server to upgrade itself.
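
The database steps amount to a few statements at the mysql prompt. A sketch, with placeholder names and password:

CREATE DATABASE wordpress;
CREATE USER 'wordpress'@'localhost' IDENTIFIED BY 'choose-a-strong-password';
GRANT ALL PRIVILEGES ON wordpress.* TO 'wordpress'@'localhost';
FLUSH PRIVILEGES;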

Conclusion

Considering all that could have gone wrong and given me headaches, everything went pretty smoothly. I don’t know if things have gotten better since a few years ago, but every problem I encountered was easily fixed with a fairly obvious configuration change. This was pleasantly refreshing, since I’m used to pulling my hair out wondering why something won’t work. So far I’ve got a server running, configured securely, and I’m running a public web site. This was a very good first dive into dealing with server administration. I use Google for email and calendar, and Dropbox for file storage. In my next post, I’ll talk about moving my data out of their servers and into my own.
