Tuesday, July 7, 2009

Conquer Your Data (Part III)

“Conquered Data” - Data that is optimally organized and managed according to an overall strategic plan.
Caveats:
  1. This is not the perfect solution. In fact, a perfect solution probably doesn't exist: technology changes so quickly that a data-conquering system must evolve over time.
  2. This is a solution I have come up with to fit my data needs; it might not satisfy yours.
  3. Don't think that using this strategy will solve all of your data management problems. Conquering data also requires the personal discipline to follow the strategy you set forth.
  4. I have done my best to make my solution operating system agnostic.
  5. I have only tried to solve the problem of managing data, not applications (as these differ widely across operating systems).
  6. The system is not meant to be set up in a day. It should take serious thought, effort, and time on your part to get one up and running. Hopefully, I've cleared some of the bigger hurdles.
  7. I have already implemented everything I write about here in my own system, to make sure it is practical. If you have more specific questions about the details, feel free to ask.

To conquer your data you must be able to properly manage it, which means it must be organized, backed up, synchronized across devices, and secured. Organization is the critical step in this process because it determines what files need to be backed up, synced, and secured. Thus, I will spend most of this post describing my organizational scheme.

Organization:


I'm not the first person to organize files. In fact, a great Lifehacker article got me on the right track.

The goal of organizing my data was to make it:
  • Canonical – any file in my system has a single logical location.
  • Fast – retrieving information should be quick, and so should storing it.
  • Robust – the system easily accommodates changes to existing data sets and the introduction of new ones.

Four properties of a file govern how it is organized:
  • Activity – Is the file accessed regularly? Is it an old high school paper, or a to-do list you use every day?
  • Ownership – Who owns this information? Is it an original piece of work such as your college thesis, a purchased mp3 song, or part of your company's confidential papers?
  • Privacy – Can the file be made public, or is it sensitive and/or confidential information that no one but you should be able to access?
  • Size – More of a practical attribute, but currently a big factor in how a file is managed.
These attributes help determine the location and level of security for each file. For example, all older files should be in one place, and currently active files in another place, and identifying sensitive information helps protect it.

Here is the high-level organizational structure of the system, named after real-world objects to make it more logically meaningful:
  • Archives [properties: low activity, original content]
  • Briefcase [properties: high activity]
  • Media [properties: purchased content, usually large]
  • Vault [properties: sensitive information]
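The mapping from properties to top-level directories can be sketched in a few lines of Python. This is only an illustration: the names `FileTraits` and `top_level_directory` are hypothetical, and the order in which the checks run (sensitive first, then active, then purchased) is an assumption, since the scheme itself does not say which property wins when several apply. Size is left out because it mostly matters for backup decisions.

```python
from dataclasses import dataclass


@dataclass
class FileTraits:
    """Three of the four properties, reduced to simple yes/no flags."""
    high_activity: bool  # Activity: is the file accessed regularly?
    original: bool       # Ownership: your own work, rather than purchased content?
    sensitive: bool      # Privacy: must nobody but you be able to read it?


def top_level_directory(traits: FileTraits) -> str:
    """Pick a top-level directory for a file.

    The precedence below is an assumption made for this sketch.
    """
    if traits.sensitive:
        return "Vault"
    if traits.high_activity:
        return "Briefcase"
    if not traits.original:
        return "Media"
    return "Archives"


if __name__ == "__main__":
    old_paper = FileTraits(high_activity=False, original=True, sensitive=False)
    print(top_level_directory(old_paper))
```

With these rules an old high school paper lands in Archives, a daily to-do list in the Briefcase, and a purchased song in Media.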
Archives encompass all historic files, the ones you rarely look at. A good test for the archivability of a set of data is whether it can be compressed. Over your life you will accumulate a lot of data, so I have created a directory structure to handle the variety of files one encounters (at least the ones I have encountered so far):
  • Career – a place for your past resumes, employers, etc.
  • Communications – email history, contact information, etc.
  • Education – your schoolwork over the years
  • Life Management – real world type stuff such as: housing papers, legal documents, insurance, financial documents, etc.
  • Projects – past projects that you have worked on
The Media directory contains all forms of digital entertainment and applications. The structure is as follows:
  • Audio – all types of audio files from music to audio books
  • Games – Video games for the desktop or other devices
  • Images – Where pictures and art go
  • Programs – A place for computer programs
  • Text – electronic books and magazines can be found here
  • Video – podcasts, movies, and TV shows live here
The Vault is where sensitive information is stored and secured to the user's satisfaction.

The Briefcase consists of all regularly accessed files. The directory structure of the Briefcase should mirror the system's high-level structure, consisting of documents (as opposed to archives), media, and vault directories. In fact, the Archives directory should also mirror the system with similar sub-directories.

An optional directory one might consider is a “projects,” “workspace,” or “office space” directory: a place to put sets of related files too substantial to store in a briefcase. Since I am a software engineer, most of my coding projects fall into this category, but other things could apply, such as the current year's classwork. Once completed, these projects can migrate to the archives directory.
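Migrating a finished project can be automated. The sketch below is one possible approach, not a description of my actual setup: it compresses the project folder into a zip file inside an archives location and then removes the working copy. The function name `archive_project` and the choice of zip are assumptions made for illustration; only standard-library calls are used.

```python
import shutil
from pathlib import Path


def archive_project(project_dir: Path, archives_projects: Path) -> Path:
    """Compress a finished project into the archives, then remove the original."""
    archives_projects.mkdir(parents=True, exist_ok=True)
    # make_archive appends ".zip" to base_name and returns the archive's path.
    archive_path = shutil.make_archive(
        base_name=str(archives_projects / project_dir.name),
        format="zip",
        root_dir=str(project_dir.parent),
        base_dir=project_dir.name,
    )
    # Delete the working copy only once the archive exists on disk.
    if Path(archive_path).is_file():
        shutil.rmtree(project_dir)
    return Path(archive_path)
```

Calling `archive_project(Path("Projects/thesis"), Path("Archives/Projects"))` would leave `Archives/Projects/thesis.zip` behind and remove `Projects/thesis`. It is worth opening the archive once to confirm it is readable before trusting the cleanup step with real data.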

The entire top level hierarchy can be found at the end of the post.

Synchronization [data property: high activity]:

With this type of data hierarchy, it is clear what information needs to be synchronized across all of your devices: the Briefcase. As the most accessed directory by definition, the Briefcase should always be synced across your devices. I use Dropbox to sync to all of my computers, and it works wonderfully.

Security [data property: sensitive]:

Knowing that the files in the Vault are sensitive, it is important to secure them against unauthorized eyes. A sensible security scheme for me* is to encrypt the content, especially when backing it up over the Internet. I use TrueCrypt for this purpose.

Backup [data property: owned content]:

Whether data is critical to back up really comes down to whether it is original content. If you lose the first paper from your freshman-year Philosophy class, you can't email your teacher for a backup copy; it's gone forever. For everything else, copies exist. However, I still like keeping my own records of important data, including email, tax forms, and legal papers. Essentially, anything I am willing to archive should have a backup. So what does this leave out? Mostly media. Although it took hours of downloading to amass a music collection, it's not the end of the world if it is lost. It might hurt financially, but it can be restored. Since media files are usually large, it's quite a relief not to keep multiple backup copies of them anyway.

I back up my files using a scheme based on a Lifehacker article. A local external hard drive receives automated daily, weekly, and monthly backups made with SyncBack. I also have an automated daily remote backup to the Mozy web service. My local backup stores all of my data excluding the Media directory, but due to storage limits on my free Mozy account, I only back up my Briefcase there.

Since backup settings can be complicated, I recommend writing a backup readme file, containing your specific backup instructions. A couple of years down the road this can come in handy when trying to upgrade your system. The same goes for the organization, synchronization, and encryption schemes.
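One way to keep such a readme in step with the real settings is to generate it from a small table describing the plan. The sketch below encodes the scheme described above (local backups of everything except Media, a remote backup of the Briefcase only). The names `TOP_LEVEL`, `BACKUP_PLAN`, and `render_readme` are hypothetical, and the code does not talk to any backup tool; it only produces text.

```python
TOP_LEVEL = ["Archives", "Briefcase", "Media", "Projects", "Vault"]

# Destination -> (schedule, directories to include), following the scheme above.
BACKUP_PLAN = {
    "local external drive": (
        "daily, weekly, monthly",
        [d for d in TOP_LEVEL if d != "Media"],
    ),
    "remote service": ("daily", ["Briefcase"]),
}


def render_readme(plan):
    """Turn the backup plan into plain text suitable for a readme file."""
    lines = ["Backup plan", "==========="]
    for destination, (schedule, directories) in plan.items():
        lines.append("")
        lines.append(f"{destination} ({schedule}):")
        for directory in directories:
            lines.append(f"  - {directory}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_readme(BACKUP_PLAN))
```

Saving the output next to the backup configuration means the readme changes whenever the plan does, rather than drifting out of date.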

Conclusion:

I can report that it feels great to use a conquered data management system. Many everyday tasks on my machine have become simpler, and I would not be completely frightened if my hard drive crashed tomorrow. However, my mission is not complete, and it may never be, as systems are constantly in flux. I must remain vigilant, keep the system up to date, and allow it to evolve over time.

I recommend you do the same.

*This is not claiming to be an unbreakable system. Absolute security requires a lot of sophistication and would require a blog in itself to explain.

Top Level Hierarchy:
  • David
    • Archives
      • Documents
        • Career
        • Communications
        • Education
        • Life Management
      • Media
      • Projects
      • Vault
    • Briefcase
      • Documents
      • Media
      • Projects
      • Vault
    • Media
      • Audio
        • Music
        • Podcasts
        • Books
      • Games
      • Images
      • Programs
      • Text
        • Articles
        • Books
      • Video
    • Projects
    • Vault
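
If you would rather not create these folders by hand, a short Python script can build the skeleton. This is a minimal sketch: `HIERARCHY` simply transcribes the list above as nested dictionaries, and `create_tree` is a hypothetical helper name. Pass whatever root folder plays the role of "David" on your machine.

```python
from pathlib import Path

# The hierarchy above as nested dicts; an empty dict is a leaf directory.
HIERARCHY = {
    "Archives": {
        "Documents": {
            "Career": {},
            "Communications": {},
            "Education": {},
            "Life Management": {},
        },
        "Media": {},
        "Projects": {},
        "Vault": {},
    },
    "Briefcase": {"Documents": {}, "Media": {}, "Projects": {}, "Vault": {}},
    "Media": {
        "Audio": {"Music": {}, "Podcasts": {}, "Books": {}},
        "Games": {},
        "Images": {},
        "Programs": {},
        "Text": {"Articles": {}, "Books": {}},
        "Video": {},
    },
    "Projects": {},
    "Vault": {},
}


def create_tree(root: Path, tree: dict) -> list:
    """Create every directory in `tree` under `root` and return the paths."""
    created = []
    for name, children in tree.items():
        path = root / name
        # exist_ok=True makes the script safe to run more than once.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
        created.extend(create_tree(path, children))
    return created
```

For example, `create_tree(Path.home() / "David", HIERARCHY)` builds the whole structure under your home folder and leaves any existing directories untouched.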