200 Posts Interlinked

I've reached a big milestone on my blog: this is the 200th post. I thought this would be a good time to look into how well-connected various posts are. The list of links at the bottom of this page makes a good starting point for a dive into the archives.

To determine how the posts in my archive are interlinked, I wrote a Ruby script (called interlinktally.rb). It opens each post from a text file of URLs, one per line, and uses a regular expression to find links to other posts. I use relative URLs for links to other posts, so they are easy to distinguish. The script lists the matches in a file:

#!/usr/bin/env ruby

require 'open-uri'

# open file with list of URLs and file for keeping track of internal links found
source = File.open("AllPosts.txt", "r")
tally = File.open("full_out.txt", "a")

# save the list of URLs to a variable
postList = source.read

# Regular expression for finding links that are written in relative terms (i.e. href="/link/")
links = /(href=\"\/)[a-z0-9\-]*(\/\")/
# quotes and forward slashes have to be escaped

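# Find "words" matching the regular expression in each URL in the source file
# (URI.open is needed on Ruby 3.0 and later, where plain open no longer fetches URLs)
postList.each_line do |post|
  URI.open(post.chomp).read.split(' ').each {|word| tally.puts "#{post.chomp} : #{word}" if word.match(links)}
end

# close both files so that everything written to the tally is flushed to disk
source.close
tally.close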

##Explanations:
# .chomp is used to remove newlines that are kept by .each_line
# .split(' ') separates the html at every space, to feed discrete words to the regular expression
# because URLs (and other html mark-up) don't have spaces they are treated as single words
# tally.puts prints to the output file (saved in the 'tally' variable)
# 'if word.match(links)' applies the regular expression to the html from each post

Next, I put the links to/from each post into a table in Excel (I did this part manually, although a script would have been more convenient). This is the bottom section of the table, showing links between recent posts.

Table of recent blog links

Notice that most of the links are below the diagonal (every post links to itself, which fills in the diagonal). That's because I only go back and add links from an older post to a newer one if they are part of a series (like the block of links among the entries in my series on Seven Languages in Seven Weeks) or if I've mentioned an upcoming post. You can also see that the Year in Review post links to every post from the previous year.
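I built the table by hand, but the step could probably be scripted. Here is a minimal sketch, assuming the output file holds lines in the "URL : word" format written by the script above and that each post URL ends in its slug. The sample lines, the example.com domain, and the helper names build_link_table and count_links are made up for illustration:

```ruby
require 'set'

# Same pattern as the script above, with a capture group around the slug.
LINK_PATTERN = %r{href="/([a-z0-9\-]*)/"}

# Made-up sample in the "URL : word" format; not real data from the blog.
SAMPLE_LINES = [
  'https://example.com/post-b/ : href="/post-a/">earlier',
  'https://example.com/post-c/ : href="/post-a/">first',
  'https://example.com/post-c/ : href="/post-b/"',
  'https://example.com/post-c/ : href="/post-c/"'
].freeze

# Build a hash of source slug => set of target slugs (one row of the table per post).
def build_link_table(lines)
  table = Hash.new { |hash, key| hash[key] = Set.new }
  lines.each do |line|
    source_url, word = line.chomp.split(' : ', 2)
    next if word.nil?
    target = word[LINK_PATTERN, 1]
    next if target.nil? || target.empty?
    # Assumption: the last path segment of the post URL is its slug.
    source = source_url.split('/').last
    table[source] << target
  end
  table
end

# Count links from and to each post. Self-links are counted too,
# since they are what fills in the diagonal of the table.
def count_links(table)
  links_out = Hash.new(0)
  links_in = Hash.new(0)
  table.each do |source, targets|
    targets.each do |target|
      links_out[source] += 1
      links_in[target] += 1
    end
  end
  [links_out, links_in]
end

table = build_link_table(SAMPLE_LINES)
links_out, links_in = count_links(table)
(links_out.keys | links_in.keys).sort.each do |slug|
  puts "#{slug}: #{links_out[slug]} out, #{links_in[slug]} in"
end
```

Using a set for each row means a post that links to the same target twice is only counted once, which matches a yes/no table cell. To run it on real output, the sample array could be swapped for File.readlines("full_out.txt").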

I graphed the number of links to and from each post (the ones going off the scale are the Year in Review posts) as well as the geometric average of the two numbers (to give a sense of how much area the links cover within the space of the whole set of posts). Taking the average evens out the fact that earlier posts have more time to accumulate links back to them, and later posts have a larger pool of previous posts on similar topics to link to.

Graph of links to and from the 200 posts to date
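For reference, the geometric average of two numbers is the square root of their product. A small sketch, with a helper name (geometric_mean) and example counts that are only for illustration:

```ruby
# Geometric average of the number of links to a post and the number of links from it.
# It is the side of a square with the same area as a links_in-by-links_out rectangle.
def geometric_mean(links_in, links_out)
  Math.sqrt(links_in * links_out)
end

# A hypothetical post with 4 links in and 9 links out:
puts geometric_mean(4, 9)  # 6.0

# A post with no links in scores zero, however many links out it has.
puts geometric_mean(0, 12) # 0.0
```

That second case is why this measure favours posts that are connected in both directions over posts that only link out or are only linked to.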

Aside from the Year in Review posts, a post listing some important rivers had the most links back to it, and one about parks had the most links to other posts.

If I'd had time to do more with these results, I could have tried to analyze the adjacency matrix or to draw the network graph of the best-connected posts.

Here are the top twenty posts by geometric average (i.e. the best-connected 10 percent of the 200). I've excluded the Year in Review posts, links posts, a previous milestone, and multiple entries from the same series:

  1. A post on Erlang from my series on Seven Languages in Seven Weeks
  2. The aforementioned list of rivers
  3. A post about UNESCO sites
  4. The Tao of Water (a book review, etc.)
  5. A post about "seasteading"
  6. A water-related book review
  7. One on monastic orders
  8. A review of a book called Waves and Beaches, about the mechanics and dynamics of those things
  9. About a paper I read on the Water-Energy-Food nexus
  10. About nature-related art from New Brunswick
  11. A pair of book reviews on mushrooms and mosses
  12. From my trip to France in 2016, specifically the portion in Paris
  13. Footbridges and boardwalks are fun trail features
  14. A review of a book about free market environmentalism
  15. A .gif of the Bay of Fundy
  16. About tiling, in a mathematical sense
  17. Some timelines of MENA history
  18. Another book review about conservation strategies
  19. A visit to Kouchibouguac National Park
  20. A visit to Olympic National Park
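A ranking like the one above could be produced along these lines. This is only a sketch: the counts are invented, the names COUNTS and top_posts are hypothetical, and it does not apply the manual exclusions (Year in Review posts, series duplicates, and so on) that I made for the list:

```ruby
# Invented [links_in, links_out] counts per post slug; not the real data.
COUNTS = {
  'post-a' => [9, 4],
  'post-b' => [1, 1],
  'post-c' => [0, 12],
  'post-d' => [2, 8],
  'post-e' => [3, 3]
}.freeze

# Geometric average of links in and links out.
def geometric_mean(links_in, links_out)
  Math.sqrt(links_in * links_out)
end

# Return the slugs of the best-connected posts, highest score first,
# keeping the given percentage of the whole set (rounded up).
def top_posts(counts, percent)
  keep = (counts.size * percent / 100.0).ceil
  counts
    .map { |slug, (links_in, links_out)| [slug, geometric_mean(links_in, links_out)] }
    .sort_by { |_slug, score| -score }
    .first(keep)
    .map(&:first)
end

puts top_posts(COUNTS, 40)
```

With 200 posts, top_posts(counts, 10) would keep twenty.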
