Moving from htmly to hugo for a static site blog

For years, I’ve used htmly for both my personal and work blogs. And it was fine: a flat-file blogging system that met my needs. But it has not been updated in a long time and, with the sudden influx of hits when a post does the rounds on the fediverse, I’ve been looking for a static site alternative.

I’ve settled on hugo and, so far, so good.

Setting up hugo

I just used the Debian package for it. This gave me hugo v0.111.3+extended.

Configuring hugo

There are a lot of options. For now, I’ve gone with a very simple hugo.toml file:

baseURL = 'https://neilzone.co.uk/'
languageCode = 'en-gb'
title = "Neil's blog"
theme = 'etch'
canonifyURLs = true
publishDir = '/var/www/neilzone.co.uk/public_html/'
enableRobotsTXT = true

[params]
  copyright = "All posts CC BY-NC-SA 4.0 or any later version, unless otherwise stated."
  disclaimer = "This blog is for interest only; nothing here is legal advice."

Most of this is, I think, self-explanatory.

publishDir is the directory into which hugo publishes the static site, once it has built it. Because I am running hugo on my webserver, this means I can just publish directly into the webroot.

Update: I’ve had some good advice (thanks, bert!) that doing this is a bad idea, in case hugo does not run correctly and leaves a broken site in the webroot. So instead, I am now publishing to hugo’s default public directory, and using rsync to copy that into place:

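#!/bin/bash

HUGOPATH="/home/neil/blog_neilzone"
WEBROOT="/var/www/neilzone.co.uk/public_html/"

# Start from an empty public directory, so nothing stale gets copied across
rm -rf "$HUGOPATH"/public ; mkdir "$HUGOPATH"/public

# Only sync if hugo builds successfully. The trailing slash on public/ copies the
# directory's contents (dotfiles included) and lets --delete remove files from the
# webroot that hugo no longer generates, which a public/* glob would not do.
# The greps hide entries where only the timestamp changed.
cd "$HUGOPATH" && hugo && rsync -ali --delete "$HUGOPATH"/public/ "$WEBROOT" | grep -v 'd\.\.t' | grep -v 'f\.\.t'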
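The two grep -v filters are there to trim rsync’s itemized output (-i). Because the script deletes and rebuilds public/ every time, every file gets a fresh modification time, so rsync reports most files as changed even when only the timestamp differs (lines like `>f..t...... index.html`), and the same goes for directories (`.d..t...... posts/`). Below is a sketch of the same filter as a single function; `filter_rsync_noise` is a hypothetical name, and anchoring the pattern to the start of the line is a small tweak so that a filename which happens to contain "f..t" isn’t hidden by mistake.

```bash
#!/bin/bash
# Read rsync's itemized (-i) output on stdin and drop lines where a file (f)
# or directory (d) differs only in its modification time (t). The first column
# is the update type (>, ., c, *), the second the file type, then checksum,
# size and time flags.
filter_rsync_noise() {
    grep -vE '^.[df]\.\.t'
}
```

In the deploy script, the end of the pipeline would then read `rsync -ali --delete "$HUGOPATH"/public/ "$WEBROOT" | filter_rsync_noise`. New files, files whose size changed, and `*deleting` lines still show up.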

[params] are bits specific to the site/theme, and vary according to theme. For now, I’ve just stuck with some basic bits, but I’d probably like to get a reference to my Mastodon accounts in there somewhere.

Importing posts from htmly

This was the trickiest bit, by a long way.

I had a lot of posts on both blogs. Far too many to contemplate moving manually, and I couldn’t find a readily available export script. So I wrote a rough-and-ready bash script to do the job. Don’t laugh: it (mostly) worked.

#!/bin/bash

# Set these before running it.

# This is where the "old"/"current" htmly blog posts are stored
EXISTINGDIR="/var/www/decoded.legal/backup_of_old_blog/content/neil/blog"

# This is the path to the hugo site
BLOGROOT="/home/neil/blog_decoded.legal"

HUGODIR="/tmp/blogposts_for_hugo"
# Working directory (not called TMPDIR, which other tools use for their own temp files)
WORKDIR="/tmp/blogposts_for_processing"

# Delete and recreate the working directories (-f: no error if they don't exist yet)
rm -rf "$HUGODIR" "$WORKDIR"
mkdir -p "$WORKDIR" "$HUGODIR"

# Process every htmly blog post (i.e. .md files); -print0 / read -d '' copes
# with any odd characters in filenames
find "$EXISTINGDIR" -type f -name "*.md" -print0 | while IFS= read -r -d '' FILE; do

    FILENAME=$(basename "$FILE")
    echo "Working with $FILENAME"

    # Copy the file to $WORKDIR, so the original is never modified
    cp "$FILE" "$WORKDIR/$FILENAME"

    # htmly filenames start with the date, e.g.
    # 2023-05-06-09-35-32_debian,linux,bookworm,ssh,dropbear,luks_unlocking-a-luks-encrypted-partition-via-ssh-on-debian-12-bookworm.md
    POSTDATE="${FILENAME:0:10}"
    YEAR="${FILENAME:0:4}"
    MONTH="${FILENAME:5:2}"
    echo "The date is: $POSTDATE (year $YEAR, month $MONTH)"

    # Get the title from the original file, e.g. <!--t My title t-->
    TITLE=$(grep -oP '(?<=<!--t ).*?(?= t-->)' "$WORKDIR/$FILENAME")

    # Remove &quot; and &#039;
    TITLE=$(echo "$TITLE" | sed -e 's/\(&quot;\|&#039;\)//g')
    echo "The title is $TITLE"

    # Extract the tags, e.g. <!--tag netbooks,Linux,computer tag-->
    TAGS=$(grep -oP '(?<=<!--tag ).*?(?= tag-->)' "$WORKDIR/$FILENAME")

    # Turn netbooks,Linux,computer into "netbooks","Linux","computer"
    TAGSWITHALLQUOTES="\"$(echo "$TAGS" | sed 's/,/","/g')\""
    echo "The tags are: $TAGSWITHALLQUOTES"

    # Remove the old htmly header (lines beginning with <!)
    sed -i '/^<!/d' "$WORKDIR/$FILENAME"

    # Name of the file to create in $HUGODIR
    NEWTITLE="$POSTDATE-$TITLE.md"
    echo "New title is $NEWTITLE"

    # Preserve the post date, to keep the order; ignore the time as it doesn't matter to me
    POSTDATEANDTIME="${POSTDATE}T09:00:00-00:00"

    # Build the URL slug: spaces to hyphens, lowercase, remove "...",
    # then remove some punctuation
    URLTITLE="${TITLE// /-}"
    URLTITLE="${URLTITLE,,}"
    URLTITLE="${URLTITLE//.../}"
    URLTITLE=$(echo "$URLTITLE" | tr -d '?&.;:,()#')

    # Add the hugo header (front matter). To try to keep the same URLs as with
    # htmly, I'm using the per-post url parameter. This works other than where I
    # created a post in htmly, and then changed the title after publication.
    tee "$HUGODIR/$NEWTITLE" <<EOF
---
title: "$TITLE"
date: $POSTDATEANDTIME
url: /$YEAR/$MONTH/$URLTITLE
tags: [$TAGSWITHALLQUOTES]
draft: false
---
EOF

    # Then add the content from the original file
    cat "$WORKDIR/$FILENAME" >> "$HUGODIR/$NEWTITLE"

    # And done!
done

# Deploy stuff

# Remove any blogposts from previous testing (-f: no error if there are none)
rm -f "$BLOGROOT"/content/posts/*

# Copy the newly-created blogposts into the hugo site directory
cp "$HUGODIR"/* "$BLOGROOT"/content/posts/

# Build and deploy the site
cd "$BLOGROOT" && hugo

Note that shellcheck returns some warnings, and there is definitely scope for improvement. But it did its job: as I say, this mostly worked. I had to finesse it a few times to sort out rogue punctuation, and I’m still not sure it is 100%. But it did enough for me to be happy with the outcome.
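Since the header parsing took a few rounds of finessing, it can help to pull those steps out into small functions and check them against a single post before re-running the whole import. A sketch, assuming (as the script does) that htmly keeps the title and tags in one-line `<!--t ... t-->` and `<!--tag ... tag-->` comments; `htmly_title` and `htmly_tags` are hypothetical names, and this version uses sed rather than grep -P so it doesn’t depend on grep being built with PCRE support.

```bash
#!/bin/bash
# Print the title from an htmly header line like <!--t My title t-->,
# dropping &quot; and &#039; entities as the import script does.
htmly_title() {
    sed -n 's/.*<!--t \(.*\) t-->.*/\1/p' "$1" | sed -e 's/&quot;//g' -e 's/&#039;//g'
}

# Print the tags from a header line like <!--tag netbooks,Linux,computer tag-->
# as quoted, comma-separated YAML items: "netbooks","Linux","computer"
htmly_tags() {
    local tags
    tags=$(sed -n 's/.*<!--tag \(.*\) tag-->.*/\1/p' "$1")
    printf '"%s"\n' "$(printf '%s' "$tags" | sed 's/,/","/g')"
}
```

Running `htmly_title` and `htmly_tags` on the post that caused trouble shows exactly what would end up in its front matter.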

Preserving URLs from htmly to hugo

I was quite keen to preserve, as far as I reasonably could, the same URL structure, so that (a) internal links continue to work, and (b) if someone has shared a post of mine somewhere online, clicking on that link should still work.

To do this, I’m using hugo’s per-post url front matter parameter. The script above set this automatically for the imported posts and, for new posts, I’ll script it so it does it automatically for me too.

This works other than where I created a post in htmly, and then changed the title after publication. I’ll monitor the webserver logs to see if there are any obviously-related 404s showing.
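One possible way around the changed-title problem would be to take the slug from the htmly filename rather than rebuilding it from the title. This sketch rests on two assumptions worth checking first: that htmly filenames follow the pattern of the example in the import script’s comments (date, then tags, then the slug after the last underscore, e.g. `2023-05-06-09-35-32_debian,linux,bookworm,ssh,dropbear,luks_unlocking-a-luks-encrypted-partition-via-ssh-on-debian-12-bookworm.md`), and that htmly keeps the published slug there when a title is edited later. `htmly_url_from_filename` is a hypothetical name.

```bash
#!/bin/bash
# Derive the /YYYY/MM/slug URL from an htmly filename of the (assumed) form
# YYYY-MM-DD-HH-MM-SS_tag1,tag2_slug.md. Assumes the slug itself contains
# no underscores, so it is everything after the last one.
htmly_url_from_filename() {
    local name slug
    name=$(basename "$1" .md)   # strip any directory and the .md suffix
    slug="${name##*_}"          # remove everything up to the last underscore
    printf '/%s/%s/%s\n' "${name:0:4}" "${name:5:2}" "$slug"
}
```

In the import loop, the front matter line could then become `url: $(htmly_url_from_filename "$FILE")`, in place of the URLTITLE steps.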

Overall, I’m quite pleased with this approach.
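For new posts, one way to script the url parameter is a small function that applies the same slug rules as the import script and prints the front matter. A minimal sketch: `new_post_front_matter` is a hypothetical name, the 09:00 time mirrors the import script, and the date is passed in rather than read from `date`, so the output is predictable.

```bash
#!/bin/bash
# Print hugo front matter for a new post, with a url in the same
# /YYYY/MM/slug form that htmly used.
# Usage: new_post_front_matter "Post title" YYYY-MM-DD
new_post_front_matter() {
    local title="$1" postdate="$2" slug
    slug="${title// /-}"                              # spaces to hyphens
    slug="${slug,,}"                                  # lowercase
    slug=$(printf '%s' "$slug" | tr -d '?&.;:,()#')   # remove some punctuation
    cat <<EOF
---
title: "$title"
date: ${postdate}T09:00:00-00:00
url: /${postdate:0:4}/${postdate:5:2}/$slug
tags: []
draft: false
---
EOF
}
```

For example, `new_post_front_matter "My new post" "$(date +%F)" > content/posts/my-new-post.md` starts a post whose URL follows the old htmly pattern.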

Theming the sites

I wanted a simple, clean theme, with dark mode, and which is entirely self-hosted (i.e. no remote resources).

The suggested default, ananke, is fine, but didn’t have dark mode.

I am now using the very basic etch theme, which has a decent-enough dark mode by default.

One of the bits I didn’t like about it was that it lacked any sort of “Related posts” functionality. I don’t know how many, if any, visitors want to get a list of related posts, but it sounded like the kind of thing that might be useful, so I added it.

I found hugo’s documentation hard to follow for this, so here’s what I’ve done.

In the theme’s /layouts/partials directory (i.e. themes/etch/layouts/partials/), I created a new file, related.html:

{{ $related := .Site.RegularPages.Related . | first 15 }}

{{ with $related }}
<div class="bg-light-gray pa3 nested-list-reset nested-copy-line-height nested-links">
    <h2 class="f5 b mb3">You might also like:</h2>
    <ul class="pa0 list">
        {{ range . }}
        <li class="mb2">
            <a href="{{ .RelPermalink }}">
                {{- .Title -}}
            </a>
        </li>
        {{ end }}
    </ul>
</div>
{{ end }}

This will show a heading of “You might also like”, and a bulleted list of up to 15 related posts, which might be too many.

In the theme’s single.html file (e.g. themes/etch/layouts/_default/single.html), I added a reference to this “partial” near the end:

{{ define "main" }}
<article>
    <header id="post-header">
        <h1>{{ .Title }}</h1>
        <div>
        {{- if isset .Params "date" -}}
            {{ if eq .Lastmod .Date }}
                <time>{{ .Date | time.Format (i18n "post.created") }}</time>
            {{ else }}
                <time>{{ .Lastmod | time.Format (i18n "post.updated") }}</time>
            {{ end }}
        {{- end -}}
        </div>
    </header>
    {{- .Content -}}
    {{ partial "related.html" . }}
</article>
{{ end }}

And so now, when I build the site, it generates and brings in a list of related posts. I think this is based on tags, but I’m not 100% sure.

Again, good enough for now.

Disclaimers / licensing in the footer

The theme already handled a copyright notice / licensing information in its footer.

You control this using a custom parameter in the [params] of your site’s hugo.toml:

  copyright = "All posts CC BY-NC-SA 4.0 or any later version, unless otherwise stated."

I also wanted a simple disclaimer for the work blog. To do this, I edited themes/etch/layouts/partials/footer.html to add another field, controlled by a new custom parameter, disclaimer, in the [params] section of the config file:

<footer id="footer">
    {{ .Site.Params.copyright }}
    {{ .Site.Params.disclaimer }}
</footer>

And so, in the hugo.toml config file, I can add whatever disclaimer I want:

  disclaimer = "This blog is for interest only; nothing here is legal advice."

In time, I’ll work out how to add a hyperlink back to our work contact page here but, for now, it’s just a simple text statement.

Analytics / viewing stats

There are none. This is fine by me, as it doesn’t matter a huge amount how popular a post is - that’s not an important metric to me.

Cookies, javascript etc.

Nope! Just a simple page of html, with a bit of css to liven it up a little.

Tinkering with the theme’s CSS

Some themes have a way to add custom css by using a custom.css file. I’m afraid to say that I just edited the theme’s normal, main, css file, and made the changes I wanted - basically, to change the font so that the blogs tie in with our main site. I’ve left everything else as is for now.

rss

The theme has default rss support, which is great. But it isn’t the same rss URL as the previous blog, so people will need to update their readers. I should probably consider some sort of redirect within the server, to help with that…

Update: I’ve used a basic rewrite rule in each site’s .htaccess file for this:

RewriteEngine on
RewriteRule ^feed/rss$ index.xml 

Content-Security-Policy and other webserver-y bits

hugo deals with generating the site, which you can then move to wherever you want. Since the webserver is currently running apache, I just created the headers I wanted, including CSP, in the site’s .htaccess file.

And that was it

It probably took me a solid half-day of tinkering to get this far. I’m happy with that.