Data mining Peace Corps blogs

The blogs written by overseas expats, past and present, are an amazing resource. They let the international community convey what it’s like to live with a community halfway across the world, in real time or close to it.

With the explosion in popularity of blogs over the last few years, they have gone from being difficult to find to being difficult to keep up with!

There are literally thousands of blogs out there, and there is no single authoritative repository of overseas blogs, particularly for organizations like VSO and the Peace Corps.

That said, it’s worth taking a shot at finding them.

To date, there have been a few attempts to highlight some of the more notable blogs – everything from “John Coyne Babbles - The 40 Best Peace Corps Blogs” to Peace Corps’ very own Blog-It-Home annual contest initiative.

But at the end of the day, periodic Google searches end up being the simplest method of pulling in new and exciting blogs.

I was able to use Excel / Google Sheets formulas to automatically extract blog titles and feed URLs.

Like so for blog titles:


and for feed URLs (with a little error-correcting):

  =iferror(ImportXML("https://"&F2,"/html/head/link[@rel='alternate'][1]/@href"), ImportXML("http://"&F2,"/html/head/link[@rel='alternate'][1]/@href"))
Blogs as a Spreadsheet.
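
For anyone who would rather script this step than do it in a spreadsheet, here is a rough Python sketch of the same idea. It is not what I actually used. It pulls the page title and the first link tag with rel="alternate" (mirroring the formula above, which does not check the link type either), and it tries https before falling back to http. The names HeadInfoParser, extract_title_and_feed and fetch_blog_info are made up for this example.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class HeadInfoParser(HTMLParser):
    """Collects the first <title> text and the first <link rel="alternate"> href."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.feed_url = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title" and not self.title:
            self._in_title = True
        elif tag == "link" and self.feed_url is None:
            # rel can hold several space-separated values
            rel = (attrs.get("rel") or "").lower().split()
            if "alternate" in rel and attrs.get("href"):
                self.feed_url = attrs["href"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_title_and_feed(html):
    """Return (title, feed_url) for one page; feed_url is None if none is found."""
    parser = HeadInfoParser()
    parser.feed(html)
    return parser.title.strip(), parser.feed_url


def fetch_blog_info(domain):
    """Try https first, then http, like the iferror() fallback in the formula."""
    for scheme in ("https://", "http://"):
        try:
            with urlopen(scheme + domain, timeout=10) as response:
                html = response.read().decode("utf-8", errors="replace")
            return extract_title_and_feed(html)
        except OSError:  # covers URLError, HTTPError and timeouts
            continue
    return "", None
```

Calling fetch_blog_info with a bare domain (the same kind of value that sits in column F of the sheet) should give back a title and a feed URL, or an empty title and None when the site cannot be reached.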

I spent a while trying to compile all of these into a usable OPML file, which is the format that RSS feed readers use and recognize, with tools like ‘OPMLBuilder’ and ‘OPML Generator’, but I didn’t have much luck.

In the end, I just decided to reverse-engineer Feedly’s OPML export file, then rebuild it with my data in Excel using formulas and a ton of concatenation. It sounds like a lot, but it probably only took about 30 minutes.

Blogs in raw XML format.
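
The same rebuild could also be scripted. Below is a minimal Python sketch, not the Excel approach I used. It assumes the common OPML subscription-list layout (one outer outline element per folder, with one inner outline element per feed carrying xmlUrl and htmlUrl attributes) rather than an exact copy of Feedly's export. The function name build_opml and the (country, title, site URL, feed URL) row layout are my own choices for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_opml(blogs, list_title="Peace Corps blogs"):
    """Build an OPML string from (country, title, site_url, feed_url) tuples."""
    # Group the rows by country so each country becomes one folder
    by_country = defaultdict(list)
    for country, title, site_url, feed_url in blogs:
        by_country[country].append((title, site_url, feed_url))

    opml = ET.Element("opml", {"version": "1.0"})
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = list_title
    body = ET.SubElement(opml, "body")

    for country in sorted(by_country):
        folder = ET.SubElement(body, "outline", {"text": country, "title": country})
        for title, site_url, feed_url in by_country[country]:
            ET.SubElement(folder, "outline", {
                "type": "rss",
                "text": title,
                "title": title,
                "xmlUrl": feed_url,
                "htmlUrl": site_url,
            })

    return ET.tostring(opml, encoding="unicode")
```

One thing a library gets you over string concatenation is escaping: a blog title with an ampersand or a quote in it is written out as valid XML without any extra work.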

Now I had a good OPML file that I could import into Feedly, all pre-sorted by country.

Blogs on Feedly.

This is just the start for me. I think that with the Pro features in Feedly, for example, I could make this a lot more streamlined, and I could benefit from search histories (going back further than just when I imported the blogs, which would be great), but I’ll save those for another day.