Yesterday I decided to try blogging again. I started writing a post
at blogger.com, but that was like wading through a rotting whale
corpse. Instead I decided to use GitHub Pages
with the static blog/site generator Nikola,
editing reStructuredText (ReST) files.
I wrote my first post and it was good! Using ReST again was much
better than editing in a GUI like blogger.com, and having it hosted by
GitHub Pages was more restful than running a machine hosting a
website.
But then I thought of all the posts I had in my old blog, before I
stopped running a machine hosting a website. They were all written in
ReST — maybe I could put them up on my new blog?
I took a couple-three hours and wrote a shell script to find the old
PyBlosxom files and feed them into a Python script that I also wrote.
Along the way I made sure the files all had #published and #tags
lines, in that order, immediately following the title line.
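For reference, the first three lines of each old entry looked like
this (the title and values here are made up for illustration):

My First Post
#published 2019-11-05 20:32:24
#tags Computing, Blogging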
Here's the shell script:
drive-pyblox-to-nikola:
#! /usr/bin/env bash
# Feed every old .rst entry, relative to the blog root, to the converter.
(cd ~/myblog &&
 find notentries/ entries/ -type f -name \*.rst |
 ~/comp/tkbtools/Scripts/pyblox-to-nikola)
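The find paths are deliberately relative to ~/myblog: the Python
script derives each post's category from the leading entries/ or
notentries/ directory component, so the filenames arriving on its
stdin look like this (the subdirectories here are hypothetical):

entries/computing/some-post.rst
notentries/drafts/another-post.rst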
Here's the Python script:
pyblox-to-nikola:
#! /usr/bin/env python3.7

import os
import os.path
import sys
from datetime import datetime

# Old format:    datetime.strptime ('2019-11-05 20:32:24', '%Y-%m-%d %H:%M:%S')
# Nikola format: dt.strftime ('%Y-%m-%d %H:%M:%S UTC-05:00')

entries_prefix = 'entries/'
notentries_prefix = 'notentries/'
published_prefix = '#published '
tags_prefix = '#tags '

files_read = 0
for filename in sys.stdin:
    filename = filename.rstrip ()
    basename = os.path.basename (filename)
    dirname = os.path.dirname (filename)
    # The category is the directory path under entries/ or notentries/.
    if dirname.startswith (entries_prefix):
        category = dirname[len(entries_prefix):]
    elif dirname.startswith (notentries_prefix):
        category = dirname[len(notentries_prefix):]
    else:
        category = ''
    (slug, _) = os.path.splitext (basename)
    print ('filename: %s\nbasename: %s\ndirname: %s\ncategory: %s\nslug: %s' %
           (filename, basename, dirname, category, slug))
    inf = open (filename, 'r')
    files_read = files_read + 1
    # The first three lines of each old entry are the title, the
    # #published line, and the #tags line, in that order.
    title = inf.readline ()
    title = title.rstrip ()
    published = inf.readline ()
    published = published.strip ()
    if published.startswith (published_prefix):
        published = published[len(published_prefix):]
    else:
        raise ValueError ('Unknown line, should be #published: %s' % published)
    published_date = datetime.strptime (published, '%Y-%m-%d %H:%M:%S')
    nikola_date = published_date.strftime ('%Y-%m-%d %H:%M:%S UTC-05:00')
    # File each post into a posts/YYYY/MM/DD/ directory by its date.
    datepath = published_date.strftime ('%Y/%m/%d')
    newdir = os.path.join ('/Users/tkb/nikola/newblog/posts', datepath)
    os.makedirs (newdir, exist_ok=True)
    tags = inf.readline ()
    tags = tags.rstrip ()
    if tags.startswith (tags_prefix):
        tags = tags[len(tags_prefix):]
    else:
        raise ValueError ('Unknown line, should be #tags: %s' % tags)
    tags = tags.lower ()
    outfname = os.path.join (newdir, basename)
    print ('outfname: %s' % outfname)
    # Write the Nikola metadata header, then copy the rest of the post.
    outf = open (outfname, 'w')
    outf.write ('.. title: %s\n' % title)
    outf.write ('.. slug: %s\n' % slug)
    outf.write ('.. date: %s\n' % nikola_date)
    outf.write ('.. tags: %s\n' % tags)
    outf.write ('.. category: %s\n' % category)
    outf.write ('.. link: \n')
    outf.write ('.. description: \n')
    outf.write ('.. type: text\n')
    outf.write ('\n')
    for line in inf:
        outf.write (line)
    outf.close ()
    inf.close ()

print ('\n\nFiles Read: %d' % files_read)
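For the made-up entry sketched earlier, the generated Nikola metadata
block comes out looking like this:

.. title: My First Post
.. slug: my-first-post
.. date: 2019-11-05 20:32:24 UTC-05:00
.. tags: computing, blogging
.. category: computing
.. link: 
.. description: 
.. type: text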
There were 810 reStructuredText files to process. Once that was
done, I had to work through those files multiple times finding all the
broken internal links, since many of them were absolute links to my
old blog or to other pages on my old website. I ran grep-find in Emacs
multiple times to find all the occurrences of my old website's hostname
(which went through a couple of variations over time), then looked for
site-relative links that started with /~tkb, a tedious but not too
difficult process.
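The searches were all variations on one grep, something like this
(with old.example.net standing in for the real hostnames):

grep -rn -e 'old\.example\.net' -e '/~tkb' ~/nikola/newblog/posts/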