Software Stewardship
One Service to Rule Them All

I signed up for dotdotdot last week. The people behind the service want to tackle long-form reading by allowing the import of eBooks, storage of comments and so on (plus a bunch of social networking features I care little about). Then a couple of days ago I got a personal email from one of their co-founders asking for feedback. I replied with what I think of their current level of service and what I want to see from such a service. Here is an expanded version of my reasoning on the subject of reading services.

I have been using Instapaper for several years now and I have generally been very satisfied with it. Its biggest downside is its lack of annotations. Its developer, Marco Arment, has said that the problem is the changing nature of the pages, but frankly I don't care. You are already capturing the pages as they are, so just allow me to annotate over that captured state and keep it around as long as I'm paying for the service. I don't read just to read; I read to learn or to have fun. I would love to be able to count on Instapaper to help me remember the stuff I found notable. But it doesn't, so it's not complete for my purposes.

I tried using Pinboard in combination with Instapaper under the misguided impression that it would help me with annotations. What gave me that idea, I don't know. So last year I paid extra for capturing and storing entire pages, not just bookmarks. But it doesn't have annotations, and it didn't offer automated access to my data so that it could be backed up. The web is such a malleable medium (as this site is my witness) that storing bookmarks without pages is close to useless. And even if sites didn't shut down and links didn't break, I want to see what I read when I first read it - not the content that is currently available at the same link. It would be nice to be able to compare historic and current content, but I can do that manually. What I can't do manually, cheaply and easily, is save all the pages that I find interesting.

I'm an Amazon customer for books and eBooks, and I read most of my eBooks on my Kindle. So for me it's again very important to be able to automatically extract annotations from the Kindle into some larger data repository where I can cross-reference them. Plugging in the Kindle to copy “My Clippings.txt” and parse it is not what I think of as a smooth experience. But even more importantly, Kindle eBooks are in a proprietary format, so I can't store them anywhere but in the Kindle ecosystem. I should have thought about that before, but my thinking on the subject has been evolving only slowly.
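For the record, the parsing part of that chore is small. Below is a minimal sketch, assuming the layout commonly seen in “My Clippings.txt”: entries separated by a line of ten `=` characters, a `Title (Author)` line, a metadata line starting with `- Your Highlight`, `- Your Note` or `- Your Bookmark`, a blank line and then the clipped text. Kindle firmware versions and locales vary, so treat the format, the regexes and the `Clipping` record as assumptions rather than a spec.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Assumed entry separator: a line of ten "=" characters.
SEPARATOR = "=========="

# "Title (Author)": the author is the last parenthesised group on the line.
TITLE_RE = re.compile(r"^(?P<title>.*?)\s*\((?P<author>[^()]*)\)\s*$")
# "- Your Highlight on Location ..." -> "Highlight"
KIND_RE = re.compile(r"^- Your (?P<kind>\w+)")


@dataclass
class Clipping:
    title: str
    author: Optional[str]
    kind: str   # "highlight", "note", "bookmark" or "unknown"
    meta: str   # raw metadata line (location, date added)
    text: str   # clipped text; empty for bookmarks


def parse_clippings(raw: str) -> List[Clipping]:
    """Split a My Clippings.txt dump into Clipping records."""
    clippings = []
    for block in raw.split(SEPARATOR):
        lines = block.strip().splitlines()  # splitlines also handles \r\n
        if len(lines) < 2:
            continue  # empty trailing block or malformed entry
        # Kindle files may carry a UTF-8 BOM, which str.strip() keeps.
        header = lines[0].lstrip("\ufeff").strip()
        meta = lines[1].strip()
        m = TITLE_RE.match(header)
        title, author = (m["title"], m["author"]) if m else (header, None)
        k = KIND_RE.match(meta)
        kind = k["kind"].lower() if k else "unknown"
        text = "\n".join(lines[2:]).strip()
        clippings.append(Clipping(title, author, kind, meta, text))
    return clippings
```

The output is plain records that can be dumped to JSON or a database; the point is that the hard part is not the parsing but getting the file off the device in the first place.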

Finally, I have also been using Evernote, which I consider useful for what it was designed for. It has the right idea in capturing pages and data rather than bookmarks, and it has some cool features (e.g. OCR of images). But it's not for reading (nor was it designed for it), it has strange issues (e.g. no HTTPS scraping!) and it feels cumbersome, at least to me. I have to add, though, that I'm now trying out its Mac client and it seems to be more useful.

Anyway, this is just the top layer - what I have been using and how I find it lacking, not what I actually want. This is what I want from a reading/note-taking/journaling service:

  1. Open access to my data. I want to be able to automatically download all the data and related metadata that I generated (e.g. when something was added, read, commented on, etc.) through interactions with the service. When and where I read something, when I read it again, when a particular annotation was made, all the versions of it - everything.
  2. Scraping of web pages à la Instapaper, inclusion of eBooks like dotdotdot and inclusion of other data like Evernote. Inclusion of books I buy and any other material that I want (e.g. images, stuff that I write, etc.)
  3. Free annotations and associations on articles, tags, texts, highlights, images, etc. including other annotations, associations and actions. I want to be able to make new annotations on old annotations, edit old annotations, add new associations, delete them and annotate the delete action and so on.
  4. Automatic association of data and building of association networks. This can be done through relationship extraction, text analytics or something else entirely. It doesn't matter how it's done as long as it achieves a solid level of accuracy. Of course, every time the extraction algorithm is improved the network is regenerated, but the previous version of the network can still be loaded.
  5. Comprehensive search of all the data. That’s already working well in most of the services that I’m currently using.
  6. Everything is versioned so that the development of the data and its association network can be accurately tracked. The development is also data.
  7. Sound business model not based on selling my data to advertisers.
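Points 3 and 6 fit together naturally if every change is recorded as an immutable event. Here is a minimal sketch of that idea; the `Store`/`Event` model and its method names are hypothetical, not any existing service's API. Annotations, edits and deletions are all appended to one log, an annotation can target any earlier event (including another annotation or a delete action), and the state at any past version is rebuilt by replaying the log up to that point.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Event:
    event_id: int
    action: str            # "add", "edit" or "delete"
    target: Optional[int]  # id of the event this one refers to, if any
    body: str              # item text, new text, or reason for deletion
    at: datetime           # when the change was made (kept as data too)


@dataclass
class Store:
    """Hypothetical append-only store: nothing is ever overwritten."""
    log: List[Event] = field(default_factory=list)

    def _append(self, action: str, target: Optional[int], body: str) -> int:
        event = Event(len(self.log), action, target, body,
                      datetime.now(timezone.utc))
        self.log.append(event)
        return event.event_id

    def add(self, body: str, target: Optional[int] = None) -> int:
        """Add an item; with a target it annotates another item or event."""
        return self._append("add", target, body)

    def edit(self, item_id: int, body: str) -> int:
        return self._append("edit", item_id, body)

    def delete(self, item_id: int, reason: str = "") -> int:
        return self._append("delete", item_id, reason)

    def state(self, version: Optional[int] = None) -> Dict[int, Tuple[Optional[int], str]]:
        """Replay the first `version` events (all if None) into {id: (target, body)}."""
        items: Dict[int, Tuple[Optional[int], str]] = {}
        for e in self.log[:version]:
            if e.action == "add":
                items[e.event_id] = (e.target, e.body)
            elif e.action == "edit" and e.target in items:
                items[e.target] = (items[e.target][0], e.body)
            elif e.action == "delete":
                items.pop(e.target, None)
        return items
```

For example, adding an article, annotating it, editing and then deleting that annotation, and finally annotating the delete event leaves the full history in `store.log`, while `store.state(version=n)` shows what the annotations looked like after the first `n` changes. Exporting the log as-is would also cover point 1.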

The way I see it, having open access to my data not only avoids vendor lock-in but also allows me to back it up, store it in different services offering different things and, in general, hack on it. It also means that it's future-proofed, as I'm sure that future algorithms for data association and semantic analysis will be better than what we have today. Couple that with versioning to give me insight into how my own thinking has been changing over the years and that's it.

Piece of cake.


Last modified on 2013-03-23