Thursday, 19 August 2010

Convert OpenOffice.org Calc FODS XML to something meaningful

The code in this blog post is now available at:


For a while now I've been receiving data in Excel files that needed to be integrated into some better structured XML data. Originally I was using a little xls2csv from the catdoc package in Arch Linux to pull the data from the horrible bloated XLS files into something usable.

The trouble is the conversion to CSV was a little error prone and it's really hard to see such errors in a 40MB CSV file. What I really needed was something a bit more structured to work with. I decided I would try processing the data directly from an XML version of the spreadsheet. In OpenOffice.org 3 I have a few XML options (Microsoft XLSX, Open Document Format, and FODS) - I wasn't aware of FODS XML before so I took a look. What FODS seems to be is a single file equivalent of the XML content that usually spans multiple files in the Zipped up folder that is an Open Document Format file. (I may be wrong - but this is my naive view of what I saw).

With the handy FODS verison of the spreadsheet I set about writing a two stage XSLT conversion that strips out the data and results in a meaningful XML file. This is all quick and dirty XSLT 2.0 - I've only tested in Saxon 9.2.0.6 and I wouldn't put it forwards as an exemplary piece of XSLT 2.0, but I figured someone might find is useful.

My code assumes the following:

  • The data is arranged in tabular format
  • There is only one table per tab in the sheet
  • There is no extraneous data around the tab
  • The first row contains column titles that are also valid XML element names (i.e. no spaces or other restricted characters).
Stage 1 simplified the FODS data too a simple tabular XML format. The trickiest thing here is the expansion of repeated columns. Note as well, this XSLT 2.0 takes a parameter "collimit" that tells it to ignore columns that fall after that number - this makes the XSLT run a lot faster.

fods2tbl.xsl looks like this:



<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet
version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0"
xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0"
xmlns:draw="urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"
xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"
xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0"
xmlns:number="urn:oasis:names:tc:opendocument:xmlns:datastyle:1.0"
xmlns:presentation="urn:oasis:names:tc:opendocument:xmlns:presentation:1.0"
xmlns:svg="urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0"
xmlns:chart="urn:oasis:names:tc:opendocument:xmlns:chart:1.0"
xmlns:dr3d="urn:oasis:names:tc:opendocument:xmlns:dr3d:1.0"
xmlns:math="http://www.w3.org/1998/Math/MathML"
xmlns:form="urn:oasis:names:tc:opendocument:xmlns:form:1.0"
xmlns:script="urn:oasis:names:tc:opendocument:xmlns:script:1.0"
xmlns:config="urn:oasis:names:tc:opendocument:xmlns:config:1.0"
xmlns:ooo="http://openoffice.org/2004/office"
xmlns:ooow="http://openoffice.org/2004/writer"
xmlns:oooc="http://openoffice.org/2004/calc"
xmlns:dom="http://www.w3.org/2001/xml-events"
xmlns:xforms="http://www.w3.org/2002/xforms"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:rpt="http://openoffice.org/2005/report"
xmlns:of="urn:oasis:names:tc:opendocument:xmlns:of:1.2"
xmlns:xhtml="http://www.w3.org/1999/xhtml"
xmlns:grddl="http://www.w3.org/2003/g/data-view#"
xmlns:tableooo="http://openoffice.org/2009/table"
xmlns:field="urn:openoffice:names:experimental:ooo-ms-interop:xmlns:field:1.0"
xmlns:formx="urn:openoffice:names:experimental:ooxml-odf-interop:xmlns:form:1.0"
xmlns:css3t="http://www.w3.org/TR/css3-text/"
xmlns:saxon="http://icl.com/saxon"
extension-element-prefixes="saxon">

<xsl:output indent="yes"
method="xml"
encoding="UTF-8"
saxon:character-representation="native;decimal"/>

<xsl:param name="collimit" select="100"/>

<xsl:template match="*">
<!-- Do Nothing -->
</xsl:template>

<xsl:template match="office:document">
<document>
<xsl:apply-templates />
</document>
</xsl:template>


<xsl:template match="office:body">
<xsl:apply-templates/>
</xsl:template>

<xsl:template match="office:spreadsheet">
<xsl:apply-templates/>
</xsl:template>

<xsl:template match="table:table">
<table>
<xsl:attribute name="name">
<xsl:value-of select="@table:name"/>
</xsl:attribute>
<xsl:apply-templates />
</table>
</xsl:template>

<xsl:template match="table:table-row">
<row>
<xsl:attribute name="rownum">
<xsl:value-of select="count(preceding-sibling::table:table-row) + 1"/>
</xsl:attribute>
<xsl:apply-templates />
</row>
</xsl:template>

<xsl:template name="repeat-cells">
<xsl:param name="index" select="0"/>
<xsl:param name="cellnum" select="1"/>
<xsl:if test="$collimit >= $cellnum">
<xsl:if test="$index > 0">
<cell>
<xsl:attribute name="cellnum">
<xsl:value-of select="$cellnum"/>
</xsl:attribute>
</cell>
<xsl:call-template name="repeat-cells">
<xsl:with-param name="index" select="$index - 1"/>
<xsl:with-param name="cellnum" select="$cellnum + 1"/>
</xsl:call-template>
</xsl:if>
</xsl:if>
</xsl:template>

<xsl:template match="table:table-cell">
<xsl:variable name="repeats"
select="sum(preceding-sibling::table:table-cell/@table:number-columns-repeated) -
count(preceding-sibling::table:table-cell[@table:number-columns-repeated > 0])"/>
<xsl:variable name="cellnum" select="count(preceding-sibling::table:table-cell) + $repeats + 1"/>
<xsl:if test="$collimit >= $cellnum">
<cell>
<xsl:attribute name="cellnum">
<xsl:value-of select="$cellnum"/>
</xsl:attribute>
<xsl:apply-templates />
</cell>
<xsl:if test="@table:number-columns-repeated > 0">
<xsl:call-template name="repeat-cells">
<xsl:with-param name="index" select="@table:number-columns-repeated -1"/>
<xsl:with-param name="cellnum" select="$cellnum + 1"/>
</xsl:call-template>
</xsl:if>
</xsl:if>
</xsl:template>

<xsl:template match="text:p">
<p>
<xsl:value-of select="."/>
</p>
</xsl:template>

</xsl:stylesheet>


The second stage is to take the resulting tabular XML file and convert it to meaningful XML. The output essentially wraps each cell in a XML element with the name of the column it was taking from (extracted from the first row).

tbl2data.xsl looks like this:



<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output indent="yes"
method="xml"
encoding="UTF-8"
xmlns:saxon="http://icl.com/saxon"
saxon:character-representation="native;decimal"/>

<xsl:template match="*">
<!-- Do Nothing -->
</xsl:template>

<xsl:template match="document">
<subjects>
<xsl:apply-templates select="table/row[@rownum > 1]" />
</subjects>
</xsl:template>

<xsl:template match="row">
<record>
<xsl:apply-templates />
</record>
</xsl:template>

<xsl:template match="cell">
<xsl:variable name="cellnum">
<xsl:value-of select="@cellnum"/>
</xsl:variable>
<xsl:variable name="table">
<xsl:value-of select="ancestor::table/@name"/>
</xsl:variable>
<xsl:variable name="cellname">
<xsl:value-of select="/document/table[@name=$table]/row[@rownum=1]/cell[@cellnum=$cellnum]/p[1]"/>
</xsl:variable>
<xsl:choose>
<xsl:when test="string-length($cellname) > 0">
<xsl:element name="{$cellname}">
<xsl:apply-templates />
</xsl:element>
</xsl:when>
<xsl:otherwise>
<xsl:element name="OOPS">
<xsl:apply-templates />
</xsl:element>
</xsl:otherwise>
</xsl:choose>
</xsl:template>

<xsl:template match="p">
<xsl:value-of select="."/>
</xsl:template>

</xsl:stylesheet>


I hope someone finds that helpful.


Monday, 5 April 2010

Project greyhound: My Granddad's Camera

A year or so ago, whilst I was visiting England on a business trip I took the time to go down and visit my parents. During this visit my dad indulged my interest in old cameras by digging out a few of his own old cameras and a couple of his fathers. The oldest two in this group were a Kodak Number 2 Brownie, in very good order, and an extremely battered looking Ensign Greyhound folding camera.

Since that trip these cameras have been sitting in our camera display case, here in our hall at home in Munich. I had written off the Greyhound as being unusable. The case was battered, the bellows must surely be cracked and the rails and shutter were both jammed up with dirt. That was until Friday night. I had an urge to try out something different and looking through the collection I saw an old Agfa Isolette 6x6 folding camera - it's was in generally good condition, but the screw focus mechanism was jammed. With a little careful cleaning, a tad of brute force and a drop or two of oil I was able to bring it back to full working order. It's a very satisfying feeling to get a camera up and running that's been abandoned for years and I briefly considered running a film through it, but I had the bug to work on a camera. It was then that I started thinking of my Granddad's Ensign Greyhound.

The Ensign Greyhound is, as far as I can establish, a late 1920's brass bodied folding camera. It uses 120 roll film and delivers eight 6cm x 9cm negatives per roll. They were built in London, and the brand would later go on to claim, with more than a little hubris, that they were better than Zeiss. The Greyhound is certainly no match for a Zeiss, but it is definitely well built.

My Granddad's Greyhound was in pretty bad shape. Granddad smoked a pipe, and perhaps this accounts for the thick, black sooty substance that coated the entire camera. Here are a few shots of the camera before I started work cleaning it. They don't really do justice to how filthy it was:

Greyhound Before 1
Greyhound Before 2
Greyhound Before 3

This filth was really ground in. I tried all manner of cloths and cleaning materials. Eventually I had to tentatively rub it down with Silvo silver polishing wadding. That worked wonders, but it still took some 4 hours to get the camera anything like clean. The best indication of the sort of filth I was getting off can be seen by the state of my hands. Despite regular washing to avoid reapplying the dirt to the body they looked like this every couple of minutes or so:

Greyhound During 1

The pay off for all this work is a camera that looks like this:

Greyhound After 1
Greyhound After 2
Greyhound After 7

Still far from show condition - but without actively taking steps to restore it this is as good as it's going to get in terms of the look. Next I turned to the function. I performed a a check of the bellows, miraculously they are in perfect working order, oiling the rails and hinges left me with a faultless folding mechanism which clicks smoothly into both focus positions. Though "9 feet" and "near objects" are far from ideal focussing options!

I cleaned and serviced the lens and shutter and again everything was in working order. There are three aperture settings - Small, Medium and Large - that are achieved by a rotating ring of circular holes, much as in some of the 60's plastic cameras (i.e. the Lomo Diana). The shutter has three settings - a short exposure (perhaps 1/200th of a second based on results, but it's really hard to say), a typical "bulb" exposure and press to open, press to close mode. Clearly this camera is geared for the long exposure.

At some point in it's life it had been converted from 6x9 to 6x6. Presumably this was done for reasons of economy. This conversion would increase the camera from 8 to 12 shots per roll of 120 film. The means of conversion were rather brutal however, two 1.5cm strips of some kind of coated plywood were glued into either side of the frame window inside the body.

Greyhound During 3

Worse still, in order to show the 6x6 frame numbers on the back of the film instead of the 6x9 ones a new window had been cut out of the brass back of the camera, a small red plastic insert was glued in to act as a new window and the original window had been covered with Elastoplast.

Greyhound During 2

As the damage was already done I had no hesitation in removing the modifications and taking the camera back by to it's original 6x9 frame size. I cut the improvised frame blinds out with a penknife and removed the Elastoplast and the plastic window. I resealed the cut window with heavy duty electrical tape and ensured it didn't allow any light through.

Finally I thoroughly cleaned the interior of the camera and tidied up any bare or reflective spots that might effect an image.
At this point the camera was in principal ready to use. The next day I loaded it with Kodak T-Max 100 black and white negative 120 roll film and headed into central Munich. Today I unloaded the film, reeled it onto a Jobo spool, put it in a daylight tank and developed it in Ilford LC-29 for seven and a half minutes. Here are the results:

Ensign Grayhound 1
Ensign Grayhound 2
Ensign Grayhound 3
Ensign Grayhound 5

Nothing spectacular. The camera is very hard to hand-hold and the primitive controls are really not up to any really serious work, but still it feel satisfying to have coaxed this old camera back to life, and there is a certain sentimental feel to taking photos with your Granddad's 80 year old camera.

No bicycle maintenance

Still waiting on parts to show up from the USA. Will update here when I have made progress.

Tuesday, 23 March 2010

Bicylcle maintenance .. Part II

I've been working too hard this last week or so (most evenings and a little at the weekend too). Nasty little data conversion tasks are an ongoing theme - to much human controlled data means lots of special cases and a never ending stream of little problems. Ho hum.

However, I am happy to report that DerailleurHanger.com came through on their online offer and identified the correct part for my bike from my photos. The internet is a wonderful place - it allows companies to exist that are incredibly specialised by extending their catchment area to a global market.

So my experiences so far are good. I've found two online shops I'm very happy with, but I am afraid I hope that I don't need to be a customer again any time soon. Chain rivets and derailleur hanger ordered and my bike is preped and ready to have that new rear deraileur fitted when they arrive.

Sunday, 21 March 2010

Bicylcle maintenance

Those of you who follow me on Twitter will be well aware that back in February (on my birthday no less!) my bike was vandalised. Some enterprising mind burdoned, no doubt, by the unending pressures of modern life, decided to releave a little stress by kicking my rear deraileur through the spokes of my back wheel.

I thought breifly about having a bike shop repair the resulting damage but instead I decided I would repair it myself. I bought a new Shimano deraileur unit, a set of Super-B bike tools and a Haynes bike repair manual.

Today I finally got round to working on the bike. Dilligently I cleaned the bike and then used the chain rivet remover to break my chain. Problem number 1. Shimano UG, HG and IG chains use rivets that leave a bigger hole when removed than when put in - I need to buy some special replacement rivets to rejoing the chain. Bugger... still, I pressed on and removed the bucked and broken deraileur, only to find that the deraileur hanger is both hideously bent and the thread holding the deraileur has shredded as a result. So now I need a new deraileur hanger - trouble is, these seem to be very hard to find - in principle every bike can have a differently shapped hanger and the manufacturer of my bike seems not to bother to sell parts.

Big fat arse!

So.. my next best bet is to contact http://deraileurhanger.com - they specialise in selling just this part. Even they don't list my bike (even though it's one of Germany's biggest brands), however, they do let you send them pictures and see if they can find a match.

On the upside I am learning a lot about how modern bikes are constructed and how to maintain them. As a new father in a household where bikes are the primary means of transport for spring, summer and autumn time, I doubt this will be the last time I need to be involved in bike repair work.

Tuesday, 2 March 2010

Voigtländer Vitomatic II


Voigtländer Vitomatic II, originally uploaded by de ney.

I've purchased one of these on E-bay for the princely sum of 20€. This is essentially the same as the Voigtlaender Vito CL I have been using recently but with a coupled rangefinder and with the meter mirrored in the viewfinder.

Ebay listing suggests it's fully functional, and given the classic West-German build quality of these late 1950s / early 1960s Voitglaenders I have high hopes.

To date I haven't really found the "perfect" 35mm camera. In 120 format I think my Bronica RF645 is the one. I suspect the 35mm camera would be similar. To be a really comfortable, natural experience I need a lightmeter, preferably visible in the view finder, a decent rangefinder and an excellent lens - this Vitomatic might fit the bill. An East German Zeiss Werra IV might also fit the bill, but they're a little rarer and command a higher price so I have not yet had the funds or opportunity to find out. Talking of which, Zeiss-Ikon Contax and Leica M's (and Nikon and Canon copies) are obvious contenders but alas they have names and reputations that carry an unreasonable price. At a push I might one day find the funds to buy one of the new-fangled Besa R4M or R3M models.

Wednesday, 13 January 2010

Occams Razor

We are amused:




Doubt Occam (a monk!) would have appreciated this use, though it is a perfectly good application of the idea.

Sunday, 10 January 2010

If I had a garden this is what I'd grow - 2010-01-09 at 11-28-51

This photo seems to be getting a rather staggering amount of attention on Flickr. Once again I suspect that the key to getting Flickr hits is SEO not the photographs inate qualities.

Last time I got this much attention was for a picture that had Fedora and Thinkpad in it's comments. This time around I think it's Lisp and Emacs that are driving the hits.

HTML5 and all that

So..

Firstly a quick update re: clojure. I'm still using it, productively and occasionally frustraitedly. Most frustration comes from my lack of available time and the relative newness of the language. Now Clojure is 2 years old there are often libraries are available to do things in a way idiomatic to Clojure rather than just using Java libraries directly, however there's no a great deal of documentation of examples for these libraries. I suspect these frustrations will only disapear with time. XML parsing and processing in particular seems a little trickier than it should be given the good built in support and the given that XML is almost a LISP datastructure to start with. Ho hum..

Anyhow, right now I am taking some time to make myself familiar with HTML5. So far so good - it seems a sensible, pragmatic and useful upgrade to the the markup and API standards, with several intreging operunities to move away from annoying plugin technology (Flash and Google Gears spring to mind). I come more from the XHTML camp than the HTML 4 camp (though I have been writing HTML since the mid 1990s) so I am a little disappointed that HTML5 seems to move us to a situation where HTML as valid XML is seen a backwards compatibility issue (please let me know if that's a false impression!).

Naturally when looking into these things I am working on small example files, and by extension that means I am using nxml mode in Emacs. That means I need to get a relax-ng setup for HTML5, I started doing this hit a bump in the first few seconds and in the first google search I hit a ready made solution. Edward O'Connor has put together html5-el for which I am extremely grateful. Thanks Edward!