Jeremy Cherfas

Further adventures in extricating Kindle notes and highlights

2 min read

Having taken a look inside the My Clippings text file extracted by Calibre, I could see that as a way to move forward. Then Chris Aldrich commented:

“If you sync your books/notes to the Kindle desktop app, you can also alternately export your notes and highlights from individual texts as raw html with one of their interface buttons from the Kindle app”.

Alas, I can see no such button. At least not on the desktop.

A little disappointed, I futzed around on the internet for a while, where the collected wisdom is not very wise. A lot of the advice riffs on “a) Copy the notes and highlights you want to share, and paste them into a txt. file or a docx file.” Either that, or it requires Evernote.

Harrumphing, I tried the Bookcision bookmarklet, on Chrome, and it works beautifully. I downloaded three different packages for the data: text, XML and JSON. The one slight problem is that I have no idea how to turn the JSON file into presentable HTML. I could do it for the XML, but while thinking about that I suddenly remembered that I do also have the Kindle app on my iPhone. No idea why, as I don't believe I'd ever read any book on the phone. (The only document in it was a podcast script, which suggests that I may have tried using it for that purpose before I discovered the brilliant Teleprompter.css in Marked 2.)
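
For the record, turning JSON into HTML need not be hard. Here is a minimal Python sketch, not a tested converter for the real export. The field names (`title`, `highlights`, and within each highlight `text`, `note` and `location.value`) are my assumptions about what a Bookcision-style file contains, and the function name and CSS classes are invented for illustration. Check them against an actual download before relying on any of this.

```python
import html
import json


def bookcision_json_to_html(raw):
    """Render a Bookcision-style JSON export as a small HTML fragment.

    Every field name read here is an assumption about the export format,
    so missing fields are tolerated rather than treated as errors.
    """
    data = json.loads(raw)
    title = html.escape(str(data.get("title", "Untitled")))
    parts = [f"<h1>{title}</h1>"]
    for item in data.get("highlights", []):
        text = html.escape(item.get("text") or "")
        note = html.escape(item.get("note") or "")
        location = (item.get("location") or {}).get("value")
        parts.append('<blockquote class="highlight">')
        if text:
            parts.append(f"  <p>{text}</p>")
        if note:
            parts.append(f'  <p class="note">{note}</p>')
        if location is not None:
            parts.append(f'  <footer class="location">Location {location}</footer>')
        parts.append("</blockquote>")
    return "\n".join(parts)
```

Reading the downloaded file, passing its contents to `bookcision_json_to_html` and saving the result with an `.html` extension would give something a browser can open and a stylesheet can prettify.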

So I fired up Kindle on the phone, downloaded one of the books I'd marked up, looked at the marginalia and there was an option to share an HTML file as email. And lo, untouched, it works beautifully.

So, I need first to figure out some quiet little changes to the CSS for those notes (or not) and then I have a workflow: read the book on Kindle (or, I suppose, on the phone, if I'm desperate), open the Kindle app on the phone, export notes as HTML, prettify (or not) and copy notes to a new post, which could incorporate additional thoughts, review-like stuff etc.

Jeremy Cherfas

Extracting notes from Kindle may be easier than I thought.

2 min read

Further to my earlier post, I dusted off Calibre and plugged in the Kindle (two things I don't do very often).

Calibre, when connected to the Kindle, shows a text file called My Clippings. The file is very well structured, with book title, author, Note or Highlight details and the note or highlight itself.

There is no separation between books, but of course the title will have changed from one to the next. That would make it possible to break the single file into individual files, one per book.

That could be automated, but in the end it is probably quicker to do it by hand. The first time would take a while, but from then on I can just scroll to the bottom of the file, because the clippings are in chronological order.

Breaking up the notes and formatting them must be automated, but that should be reasonably simple with decent text manipulation.

  • line 0 is ==========
  • Next line is the title, which will also be the name of the file and can be ignored.
  • Next line is - Your Highlight/Note on page xxx | location xxxx-xxxx | Added on Monday, 2 May 2016 19:22:20
    • Four pieces of information:
      • Highlight or note
      • Page in book
      • Location in Kindle file
      • Date (irrelevant)
  • So it should be easy to wrap those in HTML with appropriate classes
  • Next line is blank
  • Next line is the note itself

All I need to do now is make a script to do all that. Ha.
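
A minimal sketch of what such a script might look like, in Python. It assumes the layout listed above (a `==========` line, then title, then the details line, a blank line and the note itself) and that the details line always has the `page | location | Added on` shape shown; a real My Clippings file may vary, so treat the regular expression as a starting point. The function names and HTML class names are invented for illustration.

```python
import html
import re
from collections import defaultdict

SEPARATOR = "=========="

# Matches the details line described above, for example:
# - Your Highlight on page 12 | location 180-182 | Added on Monday, 2 May 2016 19:22:20
META = re.compile(
    r"- Your (?P<kind>Highlight|Note) on "
    r"(?:page (?P<page>\S+) \| )?"
    r"[Ll]ocation (?P<location>[\d-]+) \| "
    r"Added on (?P<added>.+)"
)


def parse_clippings(text):
    """Group clippings by book title, keeping the file's chronological order."""
    books = defaultdict(list)
    for chunk in text.split(SEPARATOR):
        # Strip whitespace and any stray byte-order mark from each line.
        lines = [line.strip().lstrip("\ufeff") for line in chunk.strip().splitlines()]
        if len(lines) < 2:
            continue  # empty chunk before the first or after the last separator
        match = META.match(lines[1])
        if not match:
            continue  # details line in a shape this sketch does not know
        note = "\n".join(lines[2:]).strip()
        books[lines[0]].append({**match.groupdict(), "text": note})
    return dict(books)


def to_html(title, clippings):
    """Wrap one book's clippings in HTML with a class for each piece of information."""
    parts = [f"<h1>{html.escape(title)}</h1>"]
    for clip in clippings:
        kind = clip["kind"].lower()  # "highlight" or "note"
        location = html.escape(clip["location"])
        where = f'<span class="location">location {location}</span>'
        if clip["page"]:
            page = html.escape(clip["page"])
            where = f'<span class="page">page {page}</span> ' + where
        parts.append(f'<blockquote class="{kind}">')
        parts.append(f"  <p>{html.escape(clip['text'])}</p>")
        parts.append(f"  <footer>{where}</footer>")
        parts.append("</blockquote>")
    return "\n".join(parts)
```

Calling `parse_clippings` on the contents of the file gives a dictionary keyed by title, which takes care of breaking the single file into one set of notes per book; `to_html` then produces the markup for any one of them. The date is captured but deliberately left out of the output.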


Jeremy Cherfas

What to do about bookmarked pages?

1 min read

The recent outage at Instapaper made me realise that I currently have no secure backup for either the things I've read or my notes on them. Notes live in Kindle and at Instapaper, while "things I've bookmarked" live at Instapaper and Pinboard. Good though those are, I don't have local copies, and I should.

This could take some thought. I can see no way of downloading the Kindle notes and highlights; although the page is only HTML, on the surface it looks like pretty complex HTML. Instapaper is probably the same.

I should also try to have some sort of system that leaves unmarked things in Pinboard and uses Instapaper specifically for things with notes.

This stuff is hard, but empowering.

Later ... Bookcision will let me get a clean copy of Kindle notes, one book at a time. Use in Chrome for preference. Calibre is also supposed to be able to do this, which might in the end be simpler.

Jeremy Cherfas

The Darkest Town In America

1 min read

Cracking good story, so well reported.

Makes me wonder about the cost -- human and financial -- of doing this sort of thing in this day and age. I don't give 538 any support, but I definitely would if there were an easy way to do so on a per-article basis.

Jeremy Cherfas

Transcribing one's own tape is a thankless, dull and sometimes embarrassing task, but the results are always worth it

1 min read

Google speech to text is a great first step, but that needs follow up with something like f5 transcription. It isn't cheap, but it does one job really well, which makes it worthwhile for me.

Jeremy Cherfas

The stream restored.

1 min read

Thanks again to OleVik for quickly fixing a problem with the Grav caching system. As a result, a complete stream of everything here now appears in the sidebar there.

Now I need to think about managing precisely what is transferred there -- no Instagrams seems an obvious one -- and how it is displayed.

Next, perhaps, will be to investigate bringing fornacalia over into a sub-domain.

Jeremy Cherfas

This stream now on the mothership

1 min read

Update: 6:00 PM It seems there's a problem with the Grav cache. If caching is on, then the stream shows in the sidebar the first time the sidebar appears, but not on subsequent times. Until that's sorted, I'm disabling the plugin. Baby steps. I'm still content.

I continue to be astonished by the generosity of strangers. No sooner had I asked on the Grav forum for help to display an RSS feed than OleVik had shared his plugin to Parse RSS and Atom feeds with Twig. And a couple of hours later, there it is in my sidebar.

Of course there's still plenty to be done. At the moment I am displaying only item.title because item.content is often identical and I haven't quite figured out how to do a comparison in order to show meaningful content. But it's a start. I could also restrict the feed to certain types of post here, or even set up more than one feed.

I probably also ought to add some specifically indieweb classes, though I'll have to ask about that.

Nevertheless, very content.


Jeremy Cherfas

Putting my house in order: Phase 1

5 min read

For a while now I've been concerned about owning my own data, in the spirit of IndieWeb. In June 2015 I started an experiment in the indieweb using a CMS called Known, and bits of that worked well enough. Trouble is, I actually have almost no control over the details of the CMS, which has meant that whenever I come across a little problem that might be within my capacity to solve, I generally can't even try. This frustration has finally reached the point where I'm prepared to do something about it, like host my own copy of Known rather than rely on Indiehosters.

I've also been hanging around in the Indieweb Slack channel, where I'm both amazed at what people are doing and increasingly convinced that it is beyond me. But I'm determined to give it a proper try.

The first step is to figure out just how to organise myself, and this post is intended to describe how things are currently and why, in an effort to clarify my own thoughts and maybe get some advice from the indieweb gurus.

The properties

This is the site I currently view as the mothership. It has been through many incarnations, from NucleusCMS to WordPress to Octopress to its current platform Grav. I dumped WordPress because it was just too complex, slow and hard to fix for what was essentially a very simple site. I'm only about 10% of the way through transferring old posts from Octopress to Grav, because I insist on doing it by hand to catch broken links and stuff. The big downside of being on Grav is there doesn't seem to be a huge amount of indieweb interest in that community.

There's nothing really social associated with this site; I have the same username on ADN (for now) and on Flickr (maybe also not long for this world) and on 10Centuries. Also Facebook, but I hardly use that except for promoting episodes of ...

Where my food podcasts live. This is a WordPress site. Why? Because it was relatively easy to set up for podcasting, and that part of it works very well. Why a separate domain? Because I think it is quite likely that people who are interested in that podcast might not be that interested in everything else I do, and it seems natural to keep it separate. The vast majority of posts are podcast episodes, although there are also copies of the email newsletter and occasional other posts related to topics that have been covered in podcast episodes. I doubt that it would be worth moving this to another CMS.

This site has accounts at Twitter and Instagram. Posts there go beyond the strict confines of the podcast, but generally stay in the area of food studies in the widest sense.

(Got to fix that www thing.)

Like Eat This Podcast, this is a WordPress site. It is dedicated to my various breadmaking activities, again kept separate because I wasn't sure whether people interested in my breadmaking would be interested in my other activities.

No social activity, except that I post breadmaking things to the Eat This Podcast account at Instagram.

This is the site that could most easily become a category of if I wanted to get rid of sites, but I rather like the URI.

The indieweb experiment, running on Known CMS but hosted at Indiehosters, which gives me very little freedom to tinker. [^1] Bits of this work, and work well. Now, however, I think I'm ready to declare an end to the first experimental phase and start to embrace the indieweb in earnest.

I should note that used to have a much more interesting website associated with it, and that I would eventually like to get that back (I have all the content). I used this domain for my experiment because it was one I already owned. In retrospect, that was a mistake. There is a Twitter account called Vaviblog but I have used it very little recently. If I got the old content back, I might use it more often.

[^1]: That's perhaps not fair. It doesn't give me any kind of FTP access, so I can't use that route to add a new plug-in or fiddle beneath the hood. I recently learned that there is a way I could use Git to make changes and have Indiehosters pull those changes, but I haven't actually bitten that bullet. If I'm going to go that route, I may as well save myself a bit of cash and host it where I host other sites.

The site for a short-lived project, hosted at 10Centuries. Although I've almost stopped posting longer things there, it is possibly my most active social site, but for a small society.

What I'm thinking

One approach I've seen and liked is the way Chris Aldrich has implemented his "primary hub" in WordPress with his "social stream" in Known in a sub-domain (though I'm not entirely sure what makes a post there different from a post in the hub). I don't really want to migrate my main site back to WordPress, but maybe I can achieve a similar sort of thing in Grav. Mind you, I still haven't enabled comments on Grav; how on Earth am I going to manage pulling them in from other sites? My main worry there is that because Grav developers are not all that interested in indieweb, it will be beyond my abilities. So maybe back to WordPress really is the best option.

I'm open to any and all suggestions, and I'm going to crosspost to because that should be able to receive replies from elsewhere.

Jeremy Cherfas

Review of H is for Hawk by Helen Macdonald

1 min read

What a wonderful book. When it first came out and got lots of praise I stupidly decided that it wasn't for me, possibly because the praise tended to focus on lyrical nature writing and that's not something I enjoy. However, my friend Nicola Davies was adamant that I give it a shot. I did, and was entranced from the word go. The writing isn't lyrical as I understand it. It is sharp, penetrating, incisive and eye-opening. Helen Macdonald has a gift for metaphors and similes that I envy. Her writing is clear-eyed but not harsh, and I marvelled at how she wove the three threads of her father and his death, TH White and his gos and her own adventures with Mabel into a tight, seamless braid.

Haven't quite managed to work out how to POSSE from here to Goodreads (or bring Goodreads in here) so a standard cut and paste -- rather than embedding from Goodreads -- will have to do.