April 26, 2021

Importing external posts into Nanoc

I recently gave a talk on the origins and evolution of the Arkency blogging platform. While researching material for the talk, it struck me that I had stopped blogging on my own site right after the company blog started. I've been blogging there since 2013 instead.

Given an author's inalienable rights, the fact that you can write once and publish in many places, and otherwise favorable circumstances, there was simply no reason not to import the articles I had written into this blog. Here's how I did it.

I'm using the nanoc static-site generator on this site. This is important because nanoc can pull in data from external sources. The source of the company blog posts is publicly available on GitHub. It so happens that I've also written a convenient nanoc extension that fetches remote posts from GitHub. One could perhaps use git submodule trickery for that, but I found it cumbersome, especially with build-and-deployment environments such as Netlify.

The first step is to fetch the external posts. The relevant data source configuration:

# Gemfile

gem 'nanoc-github'

# lib/default.rb

require 'nanoc/github'

# nanoc.yaml

data_sources:
  - type: filesystem
    encoding: utf-8
  - type: github
    items_root: /external/posts
    encoding: utf-8
    repository: arkency/posts
    path: posts/

This means I have two sources of posts: the local ones, originally in /content/posts, and the remote ones, fetched into /external/posts.

The next step is to exclude all external posts from processing and then dynamically add them to the internal post collection. The reasons for this are:

not all remote posts are written by me, requiring filtering by author
YAML front matter metadata of remote posts differs slightly and needs to be adapted

# lib/default.rb

def external_articles
  @items
    .find_all("/external/**/*.md")
    .select { |item| item[:author] == "Paweł Pacana" }
    .select { |item| item[:publish] }
end

# this overrides the articles method from Nanoc::Helpers::Blogging
def articles
  @items.find_all("/posts/**/*.md")
end

# Rules

ignore '/external/**/*'

preprocess do
  external_articles.each do |article|
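    # last path component, expected to look like "YYYY-MM-DD-slug.md"
    filename = article.identifier.components.last
    # strip the 11-character "YYYY-MM-DD-" prefix and the ".md" suffix
    url_path = filename[11...-3]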
    items.create(
      article.raw_content,
      {
        title:         article[:title] || item_title(article),
        created_at:    article[:created_at],
        kind:          "article",
        canonical_url: "https://blog.arkency.com/#{url_path}"
      },
      "/posts/#{filename}"
    )
  end
end
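
The slice filename[11...-3] is the least obvious part of the block above. As a quick illustration outside nanoc, here is a minimal sketch of how it turns a filename into a canonical URL. The helper names slug_from and canonical_url_for and the sample filename are made up for this example; only the slice and the URL prefix come from the rules above.

```ruby
# Hypothetical helper: derives the slug from a post filename that follows
# the "YYYY-MM-DD-slug.md" convention.
def slug_from(filename)
  # Skip the 11-character date prefix and drop the 3-character ".md" suffix.
  filename[11...-3]
end

# Hypothetical helper: builds the canonical URL on the company blog.
def canonical_url_for(filename)
  "https://blog.arkency.com/#{slug_from(filename)}"
end

# The filename below is made up for illustration.
puts canonical_url_for("2021-04-26-importing-external-posts.md")
```

Note that the slice relies on a fixed-width, zero-padded date prefix and on the .md extension. A filename that breaks either assumption would be cut in the wrong place.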

All of the above makes external posts behave as if they were present in /posts/*. The regular processing rules for posts apply:

# Rules

compile '/posts/**/*.md' do
  filter :redcarpet,
    options: {
      fenced_code_blocks: true,
      autolink: true,
      tables: true,
      no_intra_emphasis: true,
      lax_spacing: true
    },
    renderer: ::MarkdownWithSyntax
  layout '/post.*'

  # the identifier looks like /posts/YYYY-MM-DD-slug; the day is not used in the output path
  y, m, _d, slug =
    /([0-9]+)-([0-9]+)-([0-9]+)-([^\/]+)/
      .match(item.identifier.without_ext)
      .captures

  write "/#{y}/#{m}/#{slug}/index.html"
end
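
To see what the routing part of this rule produces, the regular expression can be tried in plain Ruby. This is only a sketch: the output_path helper and the sample identifier are hypothetical, while the pattern and the path template mirror the compile rule above.

```ruby
# Hypothetical helper: maps a post identifier (without extension) to the
# output path, using the same pattern as the compile rule.
def output_path(identifier_without_ext)
  pattern = /([0-9]+)-([0-9]+)-([0-9]+)-([^\/]+)/
  # Captures are year, month, day and slug; the day is deliberately unused.
  y, m, _d, slug = pattern.match(identifier_without_ext).captures
  "/#{y}/#{m}/#{slug}/index.html"
end

# The identifier below is made up for illustration.
puts output_path("/posts/2021-04-26-importing-external-posts")
```

An identifier without a date prefix would not match the pattern, so match would return nil and the call to captures would raise NoMethodError. Every post filename, local or imported, therefore has to follow the dated naming convention.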

You may wonder: why all that effort when you could simply copy those posts into the site content once? And that's a good question!

My justification is that I'd like to receive future post updates. I'll keep blogging elsewhere too. Having an external source keeps my blog up to date with just a webhook ping. No human intervention.

To paraphrase Scott Hanselman: the only way to scale is to do less.