My blog in English
Here you can find the blog articles I've written. The usual topics are technology and random tinkering. I also write about my life in Finnish.
Cumbersome migration of my old blog
I have spent the last three days migrating my 52 old blog postings from Saunablog to Codegrove. I don't want to be dependent on my mobile operator, so I decided to move the blog to a site I host together with my friends.
Saunablog is hosted by my mobile operator and is running on Nucleus CMS. Codegrove, on the other hand, is running on Plone 4.
Migrating the postings shouldn't be too hard: just copy and paste 52 articles. Well, that isn't the point. I could copy and paste forever and learn nothing from it. Doing a real migration is the art I want to get better at.
Please note that the migration scripts I provide are tailored to my own setup. If you need to do something similar, you will have to adapt them to your needs.
Looting of old data
The first bump was that I have no access to the Saunablog database. The site is hosted somewhere else and all I've got are ordinary user privileges, which allow just posting and editing. I'm calling this looting because I don't really own the servers, and the same techniques could be used to grab things you don't even own.
I started with wget. I logged in to my Saunablog in Firefox to get the cookie. Then I used a cookie-exporting script to produce cookies.txt, which can be used with wget.
I wrote hallintasivu.xsl and download-uri.xsl for that purpose. Together they take the blog's administrative interface page as input and produce one wget line per posting. The output can be directed to a script file, which is run after the first pass. In a fresh directory, run:
$ xsltproc --encoding 'ISO 8859-1' hallintasivu.xsl index.php.html >index.xml
$ xsltproc download-uri.xsl index.xml >tmp-download.sh
$ sh tmp-download.sh
Now you have all the raw messages. But that's only the first step.
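As a rough illustration of what the generated tmp-download.sh might contain, here is a minimal sketch. The blog URL, the itemid query parameter and the sample ids are assumptions made up for this example, not what my stylesheets actually emit. The --load-cookies option is how wget reads a cookies.txt file.

```shell
# Hypothetical base URL; replace with the real address of the old blog.
BLOG_URL='http://blog.example/index.php'

# Print one wget command for a posting.
# $1 = numeric id of the posting (assumed to be passed as ?itemid=<id>)
make_wget_line() {
    printf "wget --load-cookies cookies.txt -O msg-%s.html '%s?itemid=%s'\n" \
        "$1" "$BLOG_URL" "$1"
}

# Two made-up ids, just to show the shape of the output.
for id in 101 102; do
    make_wget_line "$id"
done
```

Each printed line saves one posting as msg-<id>.html, which is the naming the later steps expect.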
About importing to Plone
Plone supports WebDAV, which makes it much simpler to transfer data to and from a Plone instance. You don't need to hack Plone; you just mount the site into your directory tree. Well, in theory, yes.
Importing data to Plone is like throwing a loaded die: you get the desired result most of the time, but there is always a chance of failure. Plone's WebDAV support is barely documented. It was much easier to think of Plone as a black box and to figure out a way to pull a rabbit out of it. After two days of trial and error I managed to produce something I was happy with.
So, we need loads of configuration and a mysterious XSLT script to do the trick. Details follow.
Enabling WebDAV on Plone
This is quite straightforward. I assume you have Plone in a directory with a buildout configuration. Let's follow the instructions Epeli found and add the following to the [instance] section of buildout.cfg:
zope-conf-additional =
    enable-ms-author-via on
    <webdav-source-server>
    address localhost:1337
    force-connection-close off
    </webdav-source-server>
After editing the file, you need to re-run buildout, of course.
$ sudo bin/buildout
$ sudo bin/instance restart
After that, you should have WebDAV running on port 1337. Quite elite, huh?
Mounting Plone instance
You can install WebDAV support for your Debian or Ubuntu box straight from the package manager. First of all, get davfs2.
$ apt-get install davfs2
You can mount the site as an ordinary user but for me it's much better to do it system-wide. I added the following to my local /etc/fstab:
https://my.plone.site/ /mnt/codegrove davfs uid=joell,noauto 0 0
Getting that done was easier than I thought.
Converting data to Plone format
This is the trickiest part of the work. As I mentioned earlier, the results are based on trial and error, so I don't have any sources of information to cite.
Getting the blog postings to keep their timestamps was trickier than I thought. It's difficult to "forge" the modification or creation date, but setting the effective date (a.k.a. publishing date) is a feasible solution. I also wanted to enable comments and hide the blog postings from the navigation.
The following headers were optimal for me. The example is taken from one of my postings:
title: Warshavjanka 2.0
description: Kirjoitettu 07.06.2008 klo 15.24
effectiveDate: 2008-06-07 15:24
subject: saunablogi
  finnish
  ajatukset
allowDiscussion: True
language: fi
excludeFromNav: True
Content-Type: text/html
I also added the publishing date to the description field because that makes it easy to show dates in listings. The subject field holds the tags; additional tags are listed on new, indented lines.
I wrote a script called plonefy.xsl to do the dirty work of the transformation. I ran it like this:
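The description and effectiveDate headers carry the same timestamp in two notations. Here is a minimal sketch of that conversion in shell, assuming the source timestamp always looks like the one in the example ("DD.MM.YYYY klo HH.MM"). The function name to_effective_date is made up for this illustration; plonefy.xsl does its own thing.

```shell
# Convert "DD.MM.YYYY klo HH.MM" to the "YYYY-MM-DD HH:MM" form
# used in the effectiveDate header. Input that does not match the
# assumed pattern is printed unchanged.
to_effective_date() {
    printf '%s\n' "$1" |
        sed 's/^\([0-9][0-9]\)\.\([0-9][0-9]\)\.\([0-9]\{4\}\) klo \([0-9][0-9]\)\.\([0-9][0-9]\)$/\3-\2-\1 \4:\5/'
}

# Print both headers for the timestamp from the example posting.
stamp='07.06.2008 klo 15.24'
printf 'description: Kirjoitettu %s\n' "$stamp"
printf 'effectiveDate: %s\n' "$(to_effective_date "$stamp")"
```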
$ for JOO in saunablog/msg-*; do
    xsltproc --encoding 'ISO 8859-1' plonefy.xsl "$JOO" \
      >"plone_raw/$(echo "$JOO" | sed 's/saunablog\/msg-\(.*\)\.html/sb-\1/')"
  done
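The sed call in that loop only renames saunablog/msg-<id>.html to sb-<id>. The same mapping can be sketched with plain shell parameter expansion, which avoids starting sed once per file. This is an alternative I'm showing for illustration, not what I actually ran, and the function name plone_name is made up.

```shell
# Map a path such as saunablog/msg-123.html to the output name sb-123.
plone_name() {
    base=${1##*/msg-}                  # strip everything up to and including "msg-"
    printf 'sb-%s\n' "${base%.html}"   # drop the trailing .html
}

plone_name saunablog/msg-123.html
```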
Migration of in-line pictures
After some guru meditation I figured out how to fetch all the pictures related to my blog. I used the following incantation:
$ cat msg-*|grep -o '<%[^%]*%>'|sed 's/<%image(\([^|]*\)|.*/http:\/\/path.to\/my\/blog\/\1/' > pics.txt
$ cd pics
$ wget -i ../pics.txt
As you can see, sed is a write-only language.
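To make the incantation a bit more readable, here is the same grep and sed pair wrapped in a function and applied to one sample line. The sample tag and its file name are invented for this sketch, and http://path.to/my/blog/ is the same placeholder as above.

```shell
# Read postings on stdin and print one picture URL per Nucleus image tag.
image_urls() {
    grep -o '<%[^%]*%>' |                                        # isolate each <%...%> tag
        sed 's/<%image(\([^|]*\)|.*/http:\/\/path.to\/my\/blog\/\1/'   # keep only the first field
}

# Made-up sample line containing one image tag.
printf '%s\n' 'Text <%image(sauna.jpg|400|300|Sauna by the lake)%> more text' | image_urls
```

The first field of the tag (everything up to the first |) is appended to the blog's base address, and the rest of the tag is thrown away.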
Nucleus has its own inline image tags, which take the form of a URL, width, height and title separated by |'s. To convert those tags to ordinary XHTML image tags, the following spell can be cast:
$ for MSG in *; do
    sed 's/<%image(\([^|]*\)|\([^|]*\)|\([^|]*\)|\([^)]*\))%>/<img src="pics\/\1" alt="\4" title="\4" width="\2" height="\3" \/>/g' <"$MSG" >"../plone_final/$MSG"
  done
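Here is that substitution applied to a single made-up tag, so you can see which field ends up where. The sample file name and title are invented, and the function name convert_image_tags is only for this sketch.

```shell
# Rewrite Nucleus tags <%image(file|width|height|title)%> on stdin as XHTML <img> tags.
convert_image_tags() {
    sed 's/<%image(\([^|]*\)|\([^|]*\)|\([^|]*\)|\([^)]*\))%>/<img src="pics\/\1" alt="\4" title="\4" width="\2" height="\3" \/>/g'
}

printf '%s\n' '<p><%image(sauna.jpg|400|300|Sauna)%></p>' | convert_image_tags
```

The title is used for both alt and title, and the file name gets the pics/ prefix. Note that a title containing a closing parenthesis would not match the pattern.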
Sending the content to Plone
Now we have the final documents ready for uploading to the Plone instance. I just copied the files in plone_final to the blog folder of my Plone instance using cp.
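Since copying over WebDAV can fail now and then, it may help to copy the files one by one and report each result. A minimal sketch follows; the function name upload_all and the destination path in the usage comment are assumptions for illustration, and I used plain cp myself.

```shell
# Copy every regular file from a source directory to a destination
# directory, reporting each file. Succeeds only if all copies succeed.
# $1 = source directory, $2 = destination directory
upload_all() {
    src=$1
    dest=$2
    failed=0
    for f in "$src"/*; do
        [ -f "$f" ] || continue          # skip subdirectories and an unmatched glob
        if cp "$f" "$dest/"; then
            echo "ok: ${f##*/}"
        else
            echo "FAILED: ${f##*/}" >&2
            failed=$((failed + 1))
        fi
    done
    [ "$failed" -eq 0 ]
}

# Usage (hypothetical path under the davfs mount point):
#   upload_all plone_final /mnt/codegrove/blog
```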
Remember to publish the postings to make them visible.
It may have something to do with cp or WebDAV, but the order of the postings seems to be quite random. I worked around it by creating a collection and sorting it in descending order of effective date. Then I limited the collection to the current location (..) to hide all the other files on the site from the collection view.
To make it perfect, I set the collection to show 3 postings per page (from Edit...Number of Items) and to show the postings on the collection page (from Display...All Content).
Now we are getting somewhere. All the old data has been imported and it's time to start writing new postings!
PS. Thanks to everybody at #codegrove for their patience and help while I was going mad with this migration.