Moving from htmly to hugo for a static site blog
For years, I’ve used htmly for both my personal and work blogs. And it was fine: a flat-file blogging system that met my needs. But it has not been updated in a long time, and the sudden influx of hits when a post rolls around the fediverse means that I’ve been looking for a static site alternative.
I’ve settled on hugo and, so far, so good.
Setting up hugo
I just used the Debian package for it. This gave me hugo v0.111.3+extended.
Configuring hugo
There are a lot of options. For now, I’ve gone with a very simple hugo.toml file:
baseURL = 'https://neilzone.co.uk/'
languageCode = 'en-gb'
title = "Neil's blog"
theme = 'etch'
CanonifyURLs=true
publishDir = '/var/www/neilzone.co.uk/public_html/'
enableRobotsTXT = true
[params]
copyright = "All posts CC BY-SA-NC 4.0 or any later version, unless otherwise stated."
disclaimer = "This blog is for interest only; nothing here is legal advice."
Most of this is, I think, self-explanatory.
publishDir is the directory into which hugo publishes the static site once it has built it. Because I am running hugo on my webserver, this means I can just publish directly into the webroot.
Update: I’ve had some good advice (thanks, bert!) that doing this is a bad idea, in case hugo does not run correctly. So instead, I am now publishing to the normal public directory, and using rsync to copy that into place:
#!/bin/bash
HUGOPATH="/home/neil/blog_neilzone"
WEBROOT="/var/www/neilzone.co.uk/public_html/"
rm -rf "$HUGOPATH"/public ; mkdir "$HUGOPATH"/public
cd "$HUGOPATH" && hugo && rsync -ali --delete "$HUGOPATH"/public/* "$WEBROOT"/ | grep -v d\\.\\.t | grep -v f\\.\\.t
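One way to make the "only copy if the build worked" intent explicit is sketched below. It is an illustration rather than the script in use on the site: deploy_site is a hypothetical helper name, and the paths in the usage line are simply the ones from the script above. It also passes the public directory to rsync with a trailing slash instead of a public/* wildcard; with a wildcard, the shell expands the file list and --delete cannot remove stale top-level files from the webroot.

```shell
#!/bin/bash
# Sketch only: rebuild the site, and sync it into place only if hugo succeeds.
# deploy_site is a hypothetical helper name, not part of hugo or rsync.
deploy_site() {
    local hugopath="$1" webroot="$2"

    # Start from a clean output directory so stale files are not published.
    rm -rf "${hugopath:?}/public"

    # Build first. If hugo exits non-zero, stop before touching the webroot.
    if ! (cd "$hugopath" && hugo); then
        echo "hugo build failed; webroot left untouched" >&2
        return 1
    fi

    # The trailing slash on the source means "the contents of this directory",
    # which lets --delete remove top-level files that are no longer built.
    rsync -a --delete "$hugopath"/public/ "$webroot"/
}

# Example call, using the paths from the script above:
# deploy_site "/home/neil/blog_neilzone" "/var/www/neilzone.co.uk/public_html"
```

Because the function returns non-zero when the build fails, it can be chained with other commands (for example, a notification) without risking a half-built site being copied into place.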
[params] are bits specific to the site/theme, and vary according to theme. For now, I’ve just stuck with some basic bits, but I’d probably like to get a reference to my Mastodon accounts in there somewhere.
Importing posts from htmly
This was the trickiest bit, by a long way.
I had a lot of posts for both sites. Far too many to contemplate moving manually, and I couldn’t find a readily-available export script. So I wrote a rough-and-ready bash script to do the job. Don’t laugh: it (mostly) worked.
#!/bin/bash
# Set these before running it.
# This is where the "old"/"current" blog posts are stored
EXISTINGDIR="/var/www/decoded.legal/backup_of_old_blog/content/neil/blog"
# This is the path to the hugo site
BLOGROOT="/home/neil/blog_decoded.legal"
HUGODIR="/tmp/blogposts_for_hugo"
TMPDIR="/tmp/blogposts_for_processing"
# Delete and recreate the working directories
rm -rf "$HUGODIR"
rm -rf "$TMPDIR"
mkdir -p "$TMPDIR"
mkdir -p "$HUGODIR"
# get all the current files containing blog posts (i.e. .md files)
for FILE in $(find "$EXISTINGDIR" -type f -name "*.md"); do
FILENAME=$(basename "$FILE")
echo "Working with $FILENAME"
# copy file to $TMPDIR
cp "$FILE" "$TMPDIR/$FILENAME"
echo "$TMPDIR/$FILENAME"
# from file name
# get first 10 chars as date
# date="${filename:0:10}" ;
POSTDATE="${FILENAME:0:10}"
YEAR="${FILENAME:0:4}"
MONTH="${FILENAME:5:2}"
echo "Year is $YEAR"
echo "Month is $MONTH"
echo "The date is: $POSTDATE"
# get title from original file
TITLE=$(cat "$TMPDIR/$FILENAME" | grep -oP "(?<=<\!--t ).*?(?= t-->)")
#remove " and '
TITLE=$(echo "$TITLE" | tr -d "\"'")
echo ""
echo "The title is $TITLE"
echo ""
# Extract the tags
# <!--tag netbooks,Linux,computer tag-->
# echo '<!--tag netbooks,Linux,computer tag-->' | grep -oP "(?<=<\!--tag ).*?(?= tag-->)"
#cat "2023-05-06-09-35-32_debian,linux,bookworm,ssh,dropbear,luks_unlocking-a-luks-encrypted-partition-via-ssh-on-debian-12-bookworm.md" | grep -oP "(?<=<\!--tag ).*?(?= tag-->)"
TAGS=$(cat "$TMPDIR/$FILENAME" | grep -oP "(?<=<\!--tag ).*?(?= tag-->)")
TAGSWITHQUOTES='"'$TAGS'"'
TAGSWITHALLQUOTES=$(echo $TAGSWITHQUOTES | sed 's/,/\",\"/g')
echo "The tags are: $TAGSWITHALLQUOTES"
# remove the old header
# sed -i '/^<\!/d' 2023-05-06-09-35-32_debian,linux,bookworm,ssh,dropbear,luks_unlocking-a-luks-encrypted-partition-via-ssh-on-debian-12-bookworm.md
sed -i '/^<\!/d' "$TMPDIR/$FILENAME"
# create the file in $HUGODIR
NEWTITLE="$POSTDATE-$TITLE.md"
echo "New title is $NEWTITLE"
touch "$HUGODIR/$NEWTITLE"
# preserve the post date, to keep the order; ignore the time as it doesn't matter to me
POSTDATEANDTIME="$POSTDATE"T09:00:00-00:00
#convert spaces to hyphens
URLTITLE="${TITLE// /-}"
#convert to lowercase
URLTITLE="${URLTITLE,,}"
# remove three dots
URLTITLE="${URLTITLE//.../}"
#remove some punctuation
URLTITLE=$(echo "$URLTITLE" | tr -d '?&.;:,()#')
# add the header
# +++
# title= "My First Post"
# date= 2022-11-20T09:03:20-08:00
# tags=
# +++
# To try to keep the same URLs as with htmly, I'm using the per-post URL parameter. This works other than where I created a post in htmly, and then changed the title after publication
tee "$HUGODIR/$NEWTITLE" <<EOF
---
title: "$TITLE"
date: $POSTDATEANDTIME
url: /$YEAR/$MONTH/$URLTITLE
tags: [$TAGSWITHALLQUOTES]
draft: "false"
---
EOF
# then add the content from the original file
cat "$TMPDIR/$FILENAME" >> "$HUGODIR/$NEWTITLE"
# And done!
done
# Deploy stuff
# Remove any blogposts from previous testing
rm "$BLOGROOT"/content/posts/*
# Move the newly-created blogposts into the hugo site directory
cp "$HUGODIR"/* "$BLOGROOT"/content/posts/
# Build and deploy the site
cd "$BLOGROOT"
hugo
Note that shellcheck returns some warnings, and there is definitely scope for improvement. But it did its job: as I say, this mostly worked. I had to finesse it a few times to sort out rogue punctuation, and I’m still not sure it is 100%. But it did enough for me to be happy with the outcome.
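As an illustration of the header handling, here is a minimal sketch of the two extraction steps as small functions. The names htmly_field and quote_tags are hypothetical; the markers are the ones the script above matches. It assumes each header sits on a single line, as in the examples in the script's comments.

```shell
#!/bin/bash
# Sketch only: pull one htmly header field out of a post file.
# htmly_field and quote_tags are hypothetical helper names.
htmly_field() {
    local marker="$1" file="$2"
    # Print the text between "<!--MARKER " and " MARKER-->", first match only.
    # The sed expression is built from single-quoted pieces so that "!" is
    # always treated literally.
    sed -n 's/.*<!--'"$marker"' \(.*\) '"$marker"'-->.*/\1/p' "$file" | head -n 1
}

# Turn a comma-separated tag string into a quoted list for front matter,
# e.g. debian,linux becomes "debian", "linux"
quote_tags() {
    local -a tags
    local tag out=""
    IFS=',' read -ra tags <<< "$1"
    for tag in "${tags[@]}"; do
        out+="${out:+, }\"$tag\""
    done
    printf '%s\n' "$out"
}

# Example use on a single post file:
# TITLE="$(htmly_field t "$FILE")"
# TAGS="$(quote_tags "$(htmly_field tag "$FILE")")"
```

Splitting the steps out like this makes it easier to try them against one awkward post before running the whole import again.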
Preserving URLs from htmly to hugo
I was quite keen to preserve, as far as I reasonably could, the same URL structure, so that (a) internal links continue to work, and (b) if someone has shared a post of mine somewhere online, clicking on that link should still work.
To do this, I’m using the per-post URL parameter. The script above did this automatically for the imported posts and, for new posts, I’ll script it so it does it automatically for me too.
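A minimal sketch of what such a new-post helper might look like follows. The function names slugify and new_post_front_matter are hypothetical, the slug rules mirror the import script above, and the date is written in UTC so that a post created early in the day is not treated as a future post. Titles containing a double quote would need extra escaping, which this sketch does not attempt.

```shell
#!/bin/bash
# Sketch only: build an htmly-style URL for a new post and print front matter.
# slugify and new_post_front_matter are hypothetical helper names.
slugify() {
    local slug="$1"
    slug="${slug// /-}"   # spaces to hyphens
    slug="${slug,,}"      # lowercase (needs bash 4 or later)
    # Strip quotes and the same punctuation the import script removes.
    printf '%s\n' "$slug" | tr -d "\"'?&.;:,()#"
}

new_post_front_matter() {
    local title="$1" now year month
    now="$(date -u +%Y-%m-%dT%H:%M:%SZ)"
    year="${now:0:4}"
    month="${now:5:2}"
    cat <<EOF
---
title: "$title"
date: $now
url: /$year/$month/$(slugify "$title")
tags: []
draft: false
---
EOF
}

# Example: write the front matter to a new file, then add the post below it.
# new_post_front_matter "Moving to hugo" > content/posts/moving-to-hugo.md
```

Keeping the slug logic in one function means new posts and imported posts get their URLs from the same rules.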
This works other than where I created a post in htmly, and then changed the title after publication. I’ll monitor the webserver logs to see if there are any obviously-related 404s showing.
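For the log monitoring, a rough sketch along these lines would list the paths that most often return a 404. It assumes Apache's common or combined log format, in which the request path is the seventh whitespace-separated field and the status code the ninth; top_404s is a hypothetical name, and the log location will depend on how the server is configured.

```shell
#!/bin/bash
# Sketch only: list the most-requested paths that returned a 404.
# Assumes Apache common/combined log format: field 7 is the request path
# and field 9 is the status code. top_404s is a hypothetical helper name.
top_404s() {
    local logfile="$1" limit="${2:-20}"
    awk '$9 == 404 { print $7 }' "$logfile" \
        | sort | uniq -c | sort -rn | head -n "$limit"
}

# Example (the log path is an assumption; adjust to the site's own log):
# top_404s /var/log/apache2/access.log 10
```

Paths that look like an old post title with a year and month prefix are the likely candidates for posts whose titles changed after publication.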
Overall, I’m quite pleased with this approach.
Theming the sites
I wanted a simple, clean theme, with dark mode, and which is entirely self-hosted (i.e. no remote resources).
The suggested default, ananke, is fine, but didn’t have dark mode.
I am now using the very basic etch theme, which has a decent-enough dark mode by default.
One of the bits I didn’t like about it was that it lacked any sort of “Related posts” functionality. I don’t know how many, if any, visitors want to get a list of related posts, but it sounded like the kind of thing that might be useful, so I added it.
I found hugo’s documentation hard to follow for this, so here’s what I’ve done.
In the theme’s /layouts/partials directory (i.e. themes/etch/layouts/partials/), I created a new file, related.html:
{{ $related := .Site.RegularPages.Related . | first 15 }}
{{ with $related }}
<div class="bg-light-gray pa3 nested-list-reset nested-copy-line-height nested-links">
<h1 class="f5 b mb3">You might also like:</h1>
<p class="f5 b mb3">{{ i18n "related" }}</p>
<ul class="pa0 list">
{{ range . }}
<li class="mb2">
<a href="{{ .RelPermalink }}">
{{- .Title -}}
</a>
</li>
{{ end }}
</ul>
</div>
{{ end }}
This will show a heading of “You might also like”, and a bulleted list of up to 15 related posts, which might be too many.
In the theme’s single.html file (e.g. themes/etch/layouts/_default/single.html), I added a reference to this “partial” near the end:
{{ define "main" }}
<article>
<header id="post-header">
<h1>{{ .Title }}</h1>
<div>
{{- if isset .Params "date" -}}
{{ if eq .Lastmod .Date }}
<time>{{ .Date | time.Format (i18n "post.created") }}</time>
{{ else }}
<time>{{ .Lastmod | time.Format (i18n "post.updated") }}</time>
{{ end }}
{{- end -}}
</div>
</header>
{{- .Content -}}
{{ partial "related.html" . }}
</article>
{{ end }}
And so now, when I build the site, it generates and brings in a list of related posts. I think this is based on tags, but I’m not 100% sure.
Again, good enough for now.
Disclaimers / licensing in the footer
The theme already handled a copyright notice / licensing information in its footer.
You control this using a custom parameter in the [params] section of your site’s hugo.toml:
copyright = "All posts CC BY-SA-NC 4.0 or any later version, unless otherwise stated."
I also wanted a simple disclaimer for the work blog. To do this, I edited themes/etch/layouts/partials/footer.html to add another field, controlled by a new custom parameter, disclaimer, in the [params] section of the config file:
<footer id="footer">
{{ .Site.Params.copyright }}
{{ .Site.Params.disclaimer }}
</footer>
And so, in the hugo.toml config file, I can add whatever disclaimer I want:
disclaimer = "This blog is for interest only; nothing here is legal advice."
In time, I’ll work out how to add a hyperlink back to our work contact page here but, for now, it’s just a simple text statement.
Analytics / viewing stats
There are none. This is fine by me, as it doesn’t matter a huge amount how popular a post is - that’s not an important metric to me.
Cookies, javascript etc.
Nope! Just a simple page of html, with a bit of css to liven it up a little.
Tinkering with the theme’s CSS
Some themes have a way to add custom css by using a custom.css file. I’m afraid to say that I just edited the theme’s normal, main, css file, and made the changes I wanted - basically, to change the font so that the blogs tie in with our main site. I’ve left everything else as is for now.
rss
The theme has default rss support, which is great. But it isn’t the same rss URL as the previous blog, so people will need to update their readers. I should probably consider some sort of redirect within the server, to help with that…
Update: I’ve used a basic rewrite rule in each site’s .htaccess file for this:
RewriteEngine on
RewriteRule ^feed/rss$ index.xml
Content-Security-Policy and other webserver-y bits
hugo deals with generating the site, which you can then move to wherever you want. Since the webserver is currently running apache, I just created the headers I wanted, including CSP, in the site’s .htaccess file.
And that was it
It probably took me a solid half-day of tinkering to get this far. I’m happy with that.