Backups competition

I’m currently comparing two popular open source Linux disk backup solutions that support Backblaze B2 Cloud Storage – restic and duplicity.

Both deduplicate data to reduce archive size, but which is better for me? I’ve read a review that concluded duplicity took more wall clock time but used as much as 30% less total disk space.

I’d like to try to reproduce that result, using my own Linux hosts and the Backblaze B2 Cloud. Duplicity makes a distinction between full and incremental backups, and lets you remove all but the “n” most recent full backups, or remove all incrementals except those belonging to the “n” most recent fulls. I like this, because it lets me manage the disk space used by daily full backups and hourly incrementals. You can also take monthly snapshots to store offsite for long-term archival.
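Here’s a rough sketch of what that duplicity scheme could look like from cron; the bucket name, source path, and credential values are placeholders, not anything I’ve actually deployed yet.

    # Placeholders: my-backup-bucket, /home, and the credentials are made up.
    export B2_ACCOUNT_ID="..."
    export B2_APPLICATION_KEY="..."
    export PASSPHRASE="..."                 # GPG passphrase for the archives
    TARGET="b2://${B2_ACCOUNT_ID}:${B2_APPLICATION_KEY}@my-backup-bucket/host1"

    # Daily cron job: force a new full backup.
    duplicity full /home "$TARGET"

    # Hourly cron job: incremental against the most recent full.
    duplicity incremental /home "$TARGET"

    # Cleanup: keep the 7 most recent fulls, and only keep incrementals
    # belonging to the 2 most recent fulls.
    duplicity remove-all-but-n-full 7 --force "$TARGET"
    duplicity remove-all-inc-of-but-n-full 2 --force "$TARGET"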

Restic requires we ‘init’ a repo before the first full backup, and after that, every backup is incremental. That’s less convenient for disk space management, so I’m thinking of monthly repositories, starting over on the 1st of each month with a new one.
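A sketch of the monthly-repository idea with restic; again, the bucket name and credentials are placeholders.

    export B2_ACCOUNT_ID="..."
    export B2_ACCOUNT_KEY="..."
    export RESTIC_PASSWORD="..."
    # One repository per month, e.g. host1/2018-06
    export RESTIC_REPOSITORY="b2:my-backup-bucket:host1/$(date +%Y-%m)"

    # On the 1st of the month: create the new repository.
    restic init

    # Every run after that is effectively incremental.
    restic backup /home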

I wanted to start a scheme where I store daily full backups and hourly incrementals. It required writing a few shell scripts to run through cron. I got to use expect, which is fun, and it makes it easy to ignore errors from child shell scripts. I want to let a child script die and then move on with my own, so calling it via expect not only provides a tty, it handles signals the way a user might.
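As a sketch (the child script name here is made up), the expect wrapper can be as small as this:

    #!/bin/bash
    # Give the child a tty via expect, capture its exit status, and
    # deliberately ignore failures so the parent script keeps going.
    expect -c '
        spawn /usr/local/sbin/hourly-incremental.sh
        expect eof
        catch wait result
        exit [lindex $result 3]
    ' || true

    echo "child finished one way or another, moving on"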

Inter-process communications

There seem to be so many varied solutions to get one program to talk to another. From SOAP and REST over HTTP, to Apache ActiveMQ, Apache Camel, Etcd, Zookeeper, and Kafka, it’s all about sending information to, or fetching information from, another program for processing. It’s amazing how many unique and specialized solutions there are.

ActiveMQ is mostly used as a (multi-)point to (multi-)point queuing system, but, like Kafka, it also supports publish/subscribe data transfers. Etcd and Zookeeper are just key/value databases, mostly used for cluster management. Camel is sort of like a language interpreter, allowing connections between programs that use different technologies (e.g. converting SOAP calls into ActiveMQ JSON requests, and interpreting the results that come back). Camel is useful for bridging disparate systems.
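For the key/value flavor, here’s about all there is to basic etcd usage from the command line (the key names are made up):

    # store and fetch a value
    ETCDCTL_API=3 etcdctl put /cluster/web01/status "healthy"
    ETCDCTL_API=3 etcdctl get /cluster/web01/status

    # watch a prefix, the way a cluster manager would
    ETCDCTL_API=3 etcdctl watch --prefix /cluster/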

40 Years Of Blake’s 7 – time flies

Wow, the old British sci-fi campy fun show, Blake’s 7, launched 40 years ago, in 1978. It only lasted four seasons (well, Blake himself really only lasted the first two) but it was still a fun romp, a good campy laugh. Gosh, TV in the late 70s was bizarrely bad, overacted, overhyped. Such low budgets too. But back then, cable TV cost $15 per month, and a 1 bedroom apartment could be rented for just $320/month.

God mode – we always suspected it was there, and we were right

This blows me away.

2FA – more than just an acronym

Two Factor Authentication is critical to protecting our online identities, especially for things like desktop/laptop home/work access, bank accounts, email accounts, and social media accounts. I’m not in favor of making it harder to log in everywhere, quite the opposite. There are standard ways to make it easier, by learning to trust whom we say to trust. And whoever we trust should require two ways to identify and authenticate me whenever I try to log in from a new device, that’s all. If I’ve never used that laptop or cell phone before, ask twice, for two different things, and really make sure it’s me, and not some hacker or stalker.

Apple phones often use a fingerprint or a faceprint to unlock, and that’s just fine, but that’s still only one-factor authentication. Why not make 2FA an option? Require a password AND a fingerprint. I can understand not making it the default, I mean, grandma, c’mon. Grandpa will never get it, so not the default, obviously. But make 2FA an option, please. And make USB keys, like the YubiKey, useful on cell phones, which currently only have a Lightning or micro-USB port. I’m reasonably confident you can make a waterproof USB-A port by now.

2FA requires two of the standard three methods of identifying yourself: 1) something you are, like a fingerprint or retina, 2) something you know, like a password or PIN, 3) something you have, like a YubiKey USB key, a physical SecurID card, or access to a software program, e.g. Google Authenticator, that generates correctly encoded time-based one-time passwords. If you think about it hard enough, you’ll quickly figure out why that first one raises so many issues, so we really only focus on the other two, know and have.

Since Google made YubiKeys standard for their engineers, they’ve bragged about how it virtually eliminated successful account takeovers and spear phishing campaigns. Honestly, I don’t see how that’s possible. Sure, 2FA is great, but people are going to be fallible. If a social-engineering message convinces one person that it came from someone they trust, somebody somewhere is going to think “oh, nerd shit, okay, whatever. Just make Instagram work.”

I’ve used the Google Authenticator and SecurID apps on my iPhone for years, for 2FA into multiple sites. Surprisingly, none of them are my banks. Banks mostly rely on second factors that are well known to be hackable, such as SMS text messages to your cell phone number. It’s trivial to set up 2FA like Google Authenticator on most Linux OSs. The only real obstacles to widespread deployment are the hardware cost, training, and the common limitations of all time-based one-time password generators, like SecurID and Google Authenticator.
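For example, on a Debian/Ubuntu box (package names and PAM details vary by distro), adding TOTP to ssh logins is roughly:

    sudo apt-get install libpam-google-authenticator

    # Run as the user who will be logging in; prints a QR code to scan
    # with the Google Authenticator app.
    google-authenticator

    # Make sshd ask PAM for the TOTP code:
    echo "auth required pam_google_authenticator.so" | sudo tee -a /etc/pam.d/sshd

    # and in /etc/ssh/sshd_config:
    #   ChallengeResponseAuthentication yes
    sudo systemctl restart sshd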

If you only generate a new password once every 60 seconds, and you obviously do not allow reuse of any one-time password, then network administrators cannot log in to more than one server or network device every minute. When you have to troubleshoot a cluster of 4-128 app servers, you can see how this may appear to be tossing a proverbial monkey wrench into the gearbox of IT operations.

If I need to run a simple “ls” on each server, it can take forever using my own account and 2FA. So what does that lead to, if not accounts shared by a large group, with no accountability? Or worse, trusted root access.

I’ve never experienced a successful Kerberos setup, I’ve only read about it, so I have no idea how to make it work the way it was intended. With Kerberos, one makes a remote procedure call to a server to authenticate oneself, and in response is granted a “ticket”, which one presents when logging into other servers or accessing remote file servers, printers, or storage. The ticket is valid for a limited length of time, and any two-factor authentication happens at registration and renewal. Access to resources is governed by network and/or local rulesets. It’s a nice theory, but like I said, I’ve never seen a working model to learn the dos and don’ts from.
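From the reading, the user-facing flow is supposed to go roughly like this (the realm and hostnames here are made up):

    kinit alice@EXAMPLE.COM     # authenticate once, get the time-limited ticket
    klist                       # show the ticket and when it expires

    # later logins ride on the ticket, no new password prompt,
    # assuming the servers accept GSSAPI/Kerberos
    ssh -o GSSAPIAuthentication=yes app01.example.com

    kdestroy                    # throw the tickets away when done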

One other “something you have” can be an X.509 certificate, like an SSL client certificate, used to identify you to internal websites. They can still request your network/LDAP password, but they also have to accept that you already presented something identifying you, issued by an authority they trust, and supplied automatically.

When I restructured authentication in the Employease datacenter, I created a BSD jail that served as the Certificate Authority for our internal.eease.com domain. That server had a few bash scripts that created server and client certificates for various things: the LDAPS server certificates, HTTPS webserver certs, and client certs for OpenVPN authentication, which made autostarting the VPN easy to support on macOS and Linux.
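A minimal sketch of a CA like that, done with openssl; the file names and subject fields below are placeholders for illustration, not what those scripts actually used:

    # one time: the internal CA key and self-signed CA cert
    openssl genrsa -out ca.key 4096
    openssl req -new -x509 -days 3650 -key ca.key -out ca.crt \
        -subj "/CN=internal.eease.com CA"

    # per server or user: key, signing request, CA-signed cert
    openssl genrsa -out client.key 2048
    openssl req -new -key client.key -out client.csr \
        -subj "/O=internal.eease.com/CN=alice"
    openssl x509 -req -days 365 -in client.csr \
        -CA ca.crt -CAkey ca.key -CAcreateserial -out client.crt

    # OpenVPN clients then get ca.crt, client.crt, and client.key in their
    # config, which is what makes autostart on macOS and Linux simple.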