Shell scripting in Clojure part 2: A concrete example

This is a follow-up to “Shell scripting with Clojure”, which described how to make Clojure scripts easily runnable from the command line. In this part I would like to show the actual script that @hanneshaataja and I came up with in our afterwork Clojure session.

After taking a second look at the script a few days after the session, though, I realized that it was pretty damn ugly. We had naturally run with the first thing that worked in the quick hackathon, so I wanted to refactor it quite a bit before posting it here for all to see. However, I think it might be interesting to show both the original and improved versions and see how they differ.

Purpose of the script

As a recap from the previous post, the purpose of the script was to go through a directory containing test result files generated by the Selenium testing framework and output the names of every failed test case.

Each test result file is an HTML document that contains a summary table that lists the test cases in a particular test suite. Rows of failed test cases have a “status_failed” HTML class attribute:

<tr class="status_failed"><td><a href="#testresult1">TestCaseX</a></td></tr>

So, the script should just go through each file, find the rows with the class status_failed and pick up the test case names which are nested in the <td> and <a> elements.

Version 1 – Not like this

Here is the first version of the script. Now my sense of what is idiomatic Clojure is not very strong yet, but I’m pretty sure this is not proper. After the code I’ll list what I thought was wrong with it.

(By the way, I’m posting the code as embedded Gists because the Clojure syntax highlighting of WordPress doesn’t seem to work that well. The comments are especially annoying, as some words in them get highlighted. Unfortunately some content aggregators like Planet Clojure don’t seem to display Gists.)

;; Require lein-exec plugin so that we can run this as a standalone
;; script
;; Usage:
;; - 'lein exec anyfailures.clj' (looks for tests under ./test/system/results/)
;; - 'lein exec anyfailures.clj foo' (looks for tests under ./foo/)
(require 'leiningen.exec)

;; Add and require Enlive so we can parse HTML
(leiningen.exec/deps '[[enlive/enlive "1.1.4"]])
(require '[net.cgrand.enlive-html :as html])

;; Read the input directory from command line arguments
(def input-dir (second *command-line-args*))

;; Read all the files from the user-specified (or default)
;; directory
(def files (file-seq (clojure.java.io/file
                      (if (nil? input-dir)
                        "test/system/results"
                        input-dir))))

;; Initialize the output file by writing an empty string to it
;; (so that any previous results are cleared)
(spit "failed-tests.txt" "")

;; Go through the files. Filter only actual files since the file-seq
;; will contain directories also
(doseq [f (filter #(.isFile %) files)]
  ;; For each file f
  (let [
        ;; read the file contents with `slurp` (got to love that name)
        contents (slurp f)
        ;; parse the html into an Enlive html-resource data structure
        parsed-content (html/html-resource (java.io.StringReader. contents))
        ;; create a sequence containing the failed test names:
        ;; the html/select function finds every <a> element inside
        ;; a <td> element inside a <tr> element with a class "status_failed"
        ;; in the parsed-content data structure
        ;; We get a sequence of maps each representing an <a> element, whose
        ;; text content is under a key :content so we call
        ;; (map :content seq-of-maps-with-key-:content) to get the contents of
        ;; each <a> element.
        ;; Finally, because the contents of each element is in a list already,
        ;; we call flatten on the sequence of lists to get just a single list
        ;; with the content strings as elements
        failed-tests (flatten
                      (map :content
                           (html/select
                            parsed-content [:tr.status_failed :td :a])))]
    ;; Then for each failed test case name, we write the name to the file
    ;; failed-tests.txt using spit (opposite of slurp..) with the :append true
    ;; flag so that it appends to an existing file rather than always creating
    ;; the file from scratch
    (doseq [f failed-tests]
      (spit "failed-tests.txt" (str f "\n") :append true))))

My first problem with the first version was that it wrote the results into a file. Since this is a command-line script it makes more sense to write the results into standard output so that the user of the script can then decide what to do with the results: write them to a file or e.g. pipe them to another program.

Secondly, the code itself feels pretty imperative, reflecting our non-FP mindset: first we set up “variables” like input-dir and files with def, then iterate over the files and do something with them. I guess this is not a big problem in a quick one-off script like this, but I nevertheless wanted to try to rewrite it in a more functional way.

A third, smaller issue was that the script would try to parse any file in the directory it was given, whether it was an HTML file or not. In our case the directory always contains only HTML files, but it would be nice to have at least some sanity checking there.

Version 2 – Better I hope

(require 'leiningen.exec)
(leiningen.exec/deps '[[enlive/enlive "1.1.4"]])
(require '[net.cgrand.enlive-html :as html])

(defn result-file-loc
  "Returns the location of the result files (either provided as the first
   command line argument or a default value of 'test/system/results')"
  []
  (let [first-cmd-line-arg (second *command-line-args*)]
    (if (or
         (nil? first-cmd-line-arg)
         (= ":headless" first-cmd-line-arg))
      ;; Default dir
      "test/system/results"
      ;; user provided dir
      first-cmd-line-arg)))

(defn read-html-files
  "Given a directory, returns a sequence of File instances, one for each
   .html file in the directory"
  [dir]
  ;; Include only files that end with .html
  (filter #(and (.isFile %) (.endsWith (.getName %) ".html"))
          (file-seq (clojure.java.io/file dir))))

(defn extract-failed-test-names
  "Given an Enlive html-resource of a Selenium result file, returns names of the
   failed test cases"
  [selenium-html-resource]
  (map html/text
       (html/select selenium-html-resource [:tr.status_failed :td :a])))

(defn find-failed-tests
  "Given a collection of Selenium test result files, returns the names of the
   failed test cases in the result files"
  [files]
  (reduce (fn [all-failed-tests file]
            (let [failed-tests
                  (->> file
                       (slurp)
                       (java.io.StringReader.)
                       (html/html-resource)
                       (extract-failed-test-names))]
              (if-not (empty? failed-tests)
                (apply conj all-failed-tests failed-tests)
                all-failed-tests))) [] files))

(defn run
  "Finds and prints the names of failed test cases from the Selenium result
   files in the user provided or default directory"
  []
  (doseq [failed-test-case
          (->> (result-file-loc) (read-html-files) (find-failed-tests))]
    (println failed-test-case)))

;; Run when executed from the command line
(run)

The second version feels more idiomatic. The input directory and the files are no longer stored in vars using def; instead, everything is a function.

Let’s go through the parts of the script in more detail:

(defn run
  "Finds and prints the names of failed test cases from the Selenium result
   files in the user provided or default directory"
  []
  (doseq [failed-test-case
          (->> (result-file-loc) (read-html-files) (find-failed-tests))]
    (println failed-test-case)))

The goal of the script is achieved by calling the following three functions in sequence, always passing the result of one to the next:

  • result-file-loc returns the directory of the files to read, which is passed to read-html-files
  • read-html-files creates a file seq of the HTML files in the directory and passes the sequence to find-failed-tests
  • find-failed-tests extracts the failed test case names from each file and returns them in a collection

The wrapping doseq iterates over the failed test case names and prints each one out.

Here are the individual functions in more detail:

(defn result-file-loc
  "Returns the location of the result files (either provided as the first
   command line argument or a default value of 'test/system/results')"
  []
  (let [first-cmd-line-arg (second *command-line-args*)]

    (if (or
         (nil? first-cmd-line-arg)
         (= ":headless" first-cmd-line-arg))

      ;; Default dir
      "test/system/results"

      ;; user provided dir
      first-cmd-line-arg)))

Nothing too special here. When the script is executed from the command line with lein-exec, the second element in the global *command-line-args* contains the first command line argument (the first element of *command-line-args* is the name of the script file). I also noticed that when evaluating the script in a REPL, the second element contains the value “:headless”, so result-file-loc returns the default directory in that case as well.
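The same decision is easier to try out in a REPL if the argument list is passed in explicitly instead of being read from the global. Here is a minimal sketch of that idea; dir-from-args and default-dir are hypothetical names that are not part of the original script, and the default directory is the one mentioned in the docstring above:

```clojure
;; Hypothetical helper, not in the original script: the same decision as
;; result-file-loc, but with the argument list passed in explicitly.
(def default-dir "test/system/results")

(defn dir-from-args
  "Returns the directory given as the first real command line argument,
  or default-dir when there is none (or when it is \":headless\")."
  [args]
  (let [first-arg (second args)]
    (if (or (nil? first-arg) (= ":headless" first-arg))
      default-dir
      first-arg)))

(dir-from-args ["anyfailures.clj"])        ;; => "test/system/results"
(dir-from-args ["anyfailures.clj" "foo"])  ;; => "foo"
(dir-from-args ["repl" ":headless"])       ;; => "test/system/results"
```

With something like this, result-file-loc could simply call (dir-from-args *command-line-args*).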

(defn read-html-files
  "Given a directory, returns a sequence of File instances, one for each
   .html file in the directory"
  [dir]

  ;; Include only files that end with .html
  (filter #(and (.isFile %) (.endsWith (.getName %) ".html"))
          (file-seq (clojure.java.io/file dir))))

read-html-files is quite similar to the first version. I just added an additional condition to the filter predicate so that only files ending with “.html” are included. This of course doesn’t stop the script from trying to parse, for example, non-HTML files that just happen to have an .html extension. For the purposes of this script this is enough though.
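To see what the predicate lets through, the filter can be tried against a throwaway directory. This is only a sketch under my own assumptions: html-file?, html-file-names and demo-dir are made-up names, and the demo directory is created under the JVM's temp directory:

```clojure
(require '[clojure.java.io :as io])

(defn html-file?
  "True for regular files whose name ends with .html."
  [f]
  (and (.isFile f) (.endsWith (.getName f) ".html")))

(defn html-file-names
  "Sorted names of the .html files found anywhere under dir."
  [dir]
  (->> (file-seq (io/file dir))
       (filter html-file?)
       (map #(.getName %))
       sort))

;; Build a throwaway directory with one result file, one unrelated file
;; and one sub-directory whose name happens to end with .html
(def demo-dir
  (doto (io/file (System/getProperty "java.io.tmpdir")
                 (str "html-filter-demo-" (System/nanoTime)))
    (.mkdirs)))

(spit (io/file demo-dir "suite1.html") "<html></html>")
(spit (io/file demo-dir "notes.txt") "not a result file")
(.mkdirs (io/file demo-dir "archive.html"))

(html-file-names demo-dir) ;; => ("suite1.html")
```

The .isFile check is what keeps the archive.html directory out of the result even though its name matches.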

(defn find-failed-tests
  "Given a collection of Selenium test result files, returns the names of the
  failed test cases in the result files"
  [files]

  (reduce (fn [all-failed-tests file]
            (let [failed-tests
                  (->> file
                       (slurp)
                       (java.io.StringReader.)
                       (html/html-resource)
                       (extract-failed-test-names))]

              (if-not (empty? failed-tests)
                (apply conj all-failed-tests failed-tests)
                all-failed-tests))) [] files))

find-failed-tests does the main work. Given the sequence of HTML files, it goes over them using reduce and accumulates a collection of the failed test case names which is finally returned.

In the first version we just iterated over the files and spat out the test case names ‘on the fly’ as they were encountered. In this version I wanted to gather the names into a collection which could then be used for further processing (in this case just to print them out).

reduce calls the anonymous function for each file: the contents of each file are read in and converted into a character stream (a java.io.StringReader), and the stream is fed to Enlive’s html-resource, which parses the HTML. Finally extract-failed-test-names (explained below) is called with the parsed HTML contents to get any failed test case names.

If a particular file had failed test cases, their names are added to the vector all-failed-tests which is initially empty. If the test suite of the file didn’t have any failures, all-failed-tests is just passed on as-is for the next round of reduce.
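As a side note, the accumulation step can also be written with into, which adds zero or more items and therefore needs no emptiness check. The sketch below is not the script's code: to stay runnable without Enlive it uses made-up stand-in data (parsed-files) where each "file" is a map that already lists its failed tests:

```clojure
;; Stand-in data: each "file" is a map that already lists its failed tests.
;; In the real script these names would come from parsing a result file.
(def parsed-files
  [{:suite "login"  :failed ["TestCaseX" "TestCaseY"]}
   {:suite "search" :failed []}
   {:suite "report" :failed ["TestCaseZ"]}])

(defn collect-failed
  "Accumulates the failed test names of every file into one vector."
  [files]
  (reduce (fn [all-failed file]
            ;; into appends every name; for an empty :failed it returns
            ;; all-failed unchanged, so no explicit empty? check is needed
            (into all-failed (:failed file)))
          []
          files))

(collect-failed parsed-files)
;; => ["TestCaseX" "TestCaseY" "TestCaseZ"]
```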

(defn extract-failed-test-names
  "Given an Enlive html-resource of a Selenium result file, returns names of the
   failed test cases"
  [selenium-html-resource]
  (map html/text
       (html/select selenium-html-resource [:tr.status_failed :td :a])))

The code to extract the failed test case names using Enlive selectors is simplified a bit. Rather than doing (flatten (map :content (html/select ...))) you can use Enlive’s text function to achieve the same result: (map html/text (html/select ...))

So there you have it. Any ideas for improvement? I’m wondering whether it would make sense (at least for the sake of exercise) to make find-failed-tests return a lazy sequence rather than eagerly parse each file. Would you use something other than Enlive for the HTML parsing?
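One possible direction for the lazy variant is mapcat, which returns a lazy sequence of the concatenated per-file results. This is just a sketch with stand-in data instead of real result files (failed-names-of is a hypothetical placeholder for the slurp / html-resource / extract-failed-test-names pipeline):

```clojure
;; Stand-in for the real per-file work: here a "file" is a map that
;; already carries its failed test names under :failed
(defn failed-names-of [file]
  (:failed file))

(defn find-failed-tests-lazy
  "Returns a lazy sequence of failed test names from all the files."
  [files]
  (mapcat failed-names-of files))

(find-failed-tests-lazy
 [{:failed ["TestCaseX" "TestCaseY"]}
  {:failed []}
  {:failed ["TestCaseZ"]}])
;; => ("TestCaseX" "TestCaseY" "TestCaseZ")
```

Note that lazy does not necessarily mean one file at a time: Clojure may realize several elements of a sequence in one go, so this is more about not building the whole vector up front than about precise control over when each file is read.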

Shell scripting with Clojure

We had a nice afterwork Clojure learning session with a colleague the other day during which we quickly hacked together a shell script that reads a directory containing Selenium test result files and outputs all the failed test cases. (Why we haven’t automated the Selenium test running and result reporting with e.g. Jenkins is another story..)

In the session I learned at least about the following:

  • making standalone command-line executable Clojure scripts (as long as you have Leiningen installed) that still use Leiningen to handle the dependencies
  • reading and writing files
  • using Enlive to parse and look up particular data from HTML files

In this post, I’ll describe how to create command-line executable Clojure scripts with lein-exec. In a later post I will describe the test result parsing script we put together.

Command-line executable Clojure scripts with lein-exec

The only prerequisite here is that you have Leiningen installed but it should be pretty safe to assume this as lein seems to be the de facto build tool for Clojure.

Now, when you want to do quick standalone Clojure scripts without having to create a complete lein project, but still want to use lein to manage the dependencies to any 3rd party libraries, there is a nice lein plugin called lein-exec.

The installation is simple (the following assumes Leiningen 2, see the lein-exec readme for 1.x installation instructions). A global installation (meaning you can use lein-exec any time you call lein, rather than specifying it as a project-specific plugin) of lein-exec is done by adding the following to your ~/.lein/profiles.clj (if profiles.clj doesn’t exist, create it first).

;; as of writing this, 0.3.1 was the latest version
{:user {:plugins [[lein-exec "0.3.1"]]}}  

Now we can quickly create and run scripts:

(println (+ 1 2))

running foo.clj:

$ lein exec foo.clj

This isn’t that impressive yet, but once you add dependencies to third-party libraries, the usefulness becomes more apparent:

(require 'leiningen.exec)

;; Add a dependency to the classpath on the fly
(leiningen.exec/deps '[[enlive/enlive "1.1.4"]])

(require '[net.cgrand.enlive-html :as html])

;; Grab and print the title element from the Google front page using Enlive
(println (html/select (html/html-resource (java.net.URL. "http://www.google.com")) [:title]))

and let’s run it:

$ lein exec bar.clj
({:tag :title, :attrs nil, :content (Google)})

Now lein and lein-exec handle the enlive dependency behind the scenes. This is nice as now you have all the power of the Clojure ecosystem available for you even in quick one-off scripts.

It is also possible to make the scripts executable themselves if you are in a *nix environment. You need to download a lein-exec script and save it somewhere in your path. Also, if your lein executable is called something other than `lein`, you have to modify the lein-exec script to reflect that.

$ wget # download
$ chmod a+x lein-exec # make the script executable
$ mv lein-exec ~/bin # move to ~/bin, assuming it is in PATH

Now if you add the following to the first line of your script:

#!/usr/bin/env lein-exec
(println (+ 1 2))

you can execute the script directly (as long as you give it the execute permission):

$ chmod +x foo.clj
$ ./foo.clj
3

That’s it for now. I will follow up with the Selenium test result parsing script soon.


Simple infix math calculation

This is my last post in a series in which I go through the exercises presented at the introductory Clojure workshop at Reaktor Dev Day 2013. Check out part 1 and part 2 as well.

I didn’t have time to even look at the last exercise during the workshop so for this blog post I decided to try and solve it without looking at the solution first, and then compare mine with the organizers’ solution.

The task was to implement a function that calculates math expressions given in infix notation (that is ‘1+1’ instead of the prefix notation (+ 1 1) we are used to in Clojure). The task was highly simplified as there were no requirements for operator precedence or handling of parentheses.

So here is my solution with and without detailed commentary:

(defn calc [& symbols]
  (loop [acc (first symbols)
         syms (rest symbols)]
    (if (empty? syms)
      acc
      (recur ((first syms) acc (second syms)) (nthrest syms 2)))))

;; Note that this calculates naively from left
;; to right without considering order of operations
(= 7  (calc 2 + 5))
(= 42 (calc 38 + 48 - 2 / 2))
(= 8  (calc 10 / 2 - 1 * 2))
(= 72 (calc 20 / 2 + 2 + 4 + 8 - 6 - 10 * 9))
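To make the missing operator precedence concrete, here is a small example of my own (not from the workshop material) that repeats the definition of calc so it can be pasted into a REPL as-is:

```clojure
(defn calc [& symbols]
  (loop [acc (first symbols)
         syms (rest symbols)]
    (if (empty? syms)
      acc
      (recur ((first syms) acc (second syms)) (nthrest syms 2)))))

;; calc applies the operators in the order they appear: (2 + 3) * 4
(calc 2 + 3 * 4)   ;; => 20

;; With the usual precedence rules the multiplication would come
;; first: 2 + (3 * 4)
(+ 2 (* 3 4))      ;; => 14
```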

And here is the more detailed walk-through of the solution:

;; Define a function `calc` that takes a variable number of arguments which will
;; be available in the sequence `symbols`
;; (No sanity checking of input here, you could call (calc 20 +) and break it)

(defn calc [& symbols]

  ;; Loop works like `let` (we can bind values to local names and then use them
  ;; inside expressions within the loop). In addition, from inside the loop, we can
  ;; call `recur` with the same number of arguments as the loop has bindings,
  ;; which will jump the execution back to the beginning of the loop with the
  ;; values `recur` was called with bound to the names

  (loop

      ;; Our loop has two bindings `acc` for accumulator and `syms` for
      ;; the remaining symbols to be processed. `acc` is initialized with the
      ;; first symbol from the input symbols (this should always be a number
      ;; in the kinds of infix expressions we support).
      ;; `syms` is initialized with the `rest` (everything but the first)
      ;; of the input symbols

      [acc (first symbols)
       syms (rest symbols)]

    ;; Inside the loop we always do a simple `if` on whether there are remaining
    ;; symbols or not

    (if (empty? syms)

      ;; If there are no symbols left to process, we return what the value of the
      ;; accumulator is

      acc

      ;; If there are symbols left, we `recur` back to the beginning of the loop,
      ;; with the following parameters

      (recur

       ;; To get the next acc value, we take the first element from the
       ;; remaining symbols, treating it as a function, and give it the current
       ;; value of `acc` and the next remaining symbol as parameters.
       ;; E.g. if `acc` were currently 20 and syms were [+ 3 * 4 ...], the
       ;; first parameter to recur would be (+ 20 3) or 23

       ((first syms) acc (second syms))

       ;; The second parameter is the rest of the remaining symbols. Because
       ;; each pass of the loop processes two elements from the sequence of 
       ;; symbols, we use `(nthrest coll 2)` which with the parameter 2 returns 
       ;; the rest of the given collection except for the first two elements

       (nthrest syms 2)))))

And then for comparison, the workshop organizers’ solution:

(defn calculator [& xs]
  (loop [res (first xs) ops (rest xs)]
    (if (empty? ops)
      res
      (recur ((first ops) res (second ops))
             (rest (rest ops))))))

Amusingly the solutions turn out to be basically identical. The only difference besides naming is the use of `(nthrest coll 2)` versus `(rest (rest coll))`.

The solution and the whole problem are quite simple: take the first number from the input as the starting accumulated value. Then process the remaining arguments in chunks of two, always applying the first of the chunk as a function to the current accumulated value and the second value of the chunk, storing the result as the new accumulated value. Repeat this until there are no elements left and the accumulator will contain the final result.

It actually took me quite a while to arrive at the solution. Although I had some faint gut feeling that this calls for recursion, I kind of avoided going there at first. I tried different ways of separating the numbers from the operators and then “mapping the operators” over pairs of numbers, but that started to feel too complicated pretty quickly.

After a while it just hit me how to get it done with loop and recur since I had recently used them for the first time in another example. I think I partly tried to avoid going to loop/recur because in Clojure Programming Emerick et al. say that:

recur is a very low-level looping and recursion operation that is usually not necessary:

When “iterating” over a collection or sequence, functional operations like map, reduce, for, and so on are almost always preferable.

So I assume this problem could probably be solved with reduce (or some combination of map, reduce, .. ) as well. I might try that and add the additional solution to this post later unless someone wants to beat me to it and include one in the comments. 🙂
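For what it's worth, here is one way a reduce-based version could look. Treat it as a sketch rather than a reference solution: calc-reduce is a name I made up, and it pairs each operator with its operand using partition before reducing:

```clojure
(defn calc-reduce
  "Evaluates an infix expression strictly left to right using reduce."
  [& symbols]
  (reduce (fn [acc [op operand]]
            ;; apply the operator to the running result and the next number
            (op acc operand))
          ;; the first number is the initial accumulated value
          (first symbols)
          ;; the remaining symbols are split into (operator operand) pairs
          (partition 2 (rest symbols))))

(calc-reduce 2 + 5)                ;; => 7
(calc-reduce 38 + 48 - 2 / 2)      ;; => 42
(calc-reduce 10 / 2 - 1 * 2)       ;; => 8
```

One difference from the loop/recur version: partition silently drops an incomplete trailing pair, so (calc-reduce 20 +) returns 20 instead of failing.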

Edit: Looks like this exercise was taken from 4Clojure: Infix Calculator

Parsing the rank and suit of a playing card’s string representation

I’ll follow the previous post with a similar one: this was the second exercise in the Introduction to Clojure workshop at Reaktor Dev Day. The task was to parse a textual representation of a standard playing card (e.g. “H5” for five of hearts) into a map of the form {:suit :heart :rank 5}.

The workshop was nearing its end when I got to work on this exercise so I didn’t have that much time to figure out a pretty solution. What I came up with is probably quite unidiomatic but at least it works.

For comparison I’ll also post the cleaner example solution by the workshop organizers.

The task:

;; A standard deck of playing cards has four suits -
;; spades, hearts, diamonds, and clubs - and thirteen cards in each suit.
;; Two is the lowest rank, followed by other integers up to ten;
;; then the jack, queen, king, and ace.

;; It's convenient for humans to represent these cards as suit/rank pairs,
;; such as H5 or DQ: the heart five and diamond queen respectively.
;; But these forms are not convenient for programmers, so to write a card
;; game you need some way to parse an input string into meaningful components.
;; For purposes of determining rank, we will define the cards to be valued
;; from 0 (the two) to 12 (the ace)

;; Write a function which converts (for example) the string "SJ"
;; into a map of {:suit :spade, :rank 9}. A ten will always be represented
;; with the single character "T", rather than the two characters "10".

(= {:suit :diamond :rank 10} (parse-card "DQ"))
(= {:suit :heart :rank 3} (parse-card "H5"))

My solution:

(defn parse-card [s]
  (let [
        ;; First I define two lookup maps that match the string representations
        ;; of the suits and ranks to their corresponding keyword and numeric
        ;; values
        suits {"D" :diamond "H" :heart "C" :club "S" :spade}

        ranks {"2" 0 "3" 1 "4" 2 "5" 3 "6" 4 "7" 5 "8" 6
               "9" 7 "T" 8 "J" 9 "Q" 10 "K" 11 "A" 12}

        ;; Then I look up the suit from the suits map using the first character
        ;; from the input `s` 
        suit (suits (str (first s)))

        ;; And the rank is looked up using the second character from the input
        ;; `s`
        rank (ranks (str (second s)))]

    ;; Finally suit and rank are returned as the expected map 
    {:suit suit :rank rank}))

And the “official” solution:

(defn as-rank-map [[s r]]
  {:suit (get {\S :spade \H :heart \D :diamond \C :club} s)
   :rank (.indexOf (seq "23456789TJQKA") r)})

Much shorter. Most of the shortness comes from not using `let` and the intermediate local names for e.g. the suit map. Everything is neatly inlined here while I was more into an imperative style: “first create these lookup maps, do a lookup and store the values, then create a map out of them”.

One important point that I missed is the destructuring of the input parameter: the string is already split into the suit and rank parts in the parameter vector of the function definition whereas I used `first` and `second` functions in my solution.
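Here is a tiny illustration of that destructuring on its own. The function names are made up for this example; sequential destructuring works on strings because their characters can be looked up by position:

```clojure
;; [[s r]] binds s to the first character of the argument and r to the second
(defn suit-and-rank [[s r]]
  {:suit-char s :rank-char r})

(suit-and-rank "H5") ;; => {:suit-char \H, :rank-char \5}

;; The same thing spelled out with first and second
(defn suit-and-rank-explicit [card]
  {:suit-char (first card) :rank-char (second card)})

(= (suit-and-rank "DQ") (suit-and-rank-explicit "DQ")) ;; => true
```

Note that the destructured values are characters (\H), not one-character strings ("H"), which is why the official solution uses character keys in its suit map.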

And obviously the way of determining the rank is different. I’m not sure if I like it though.. to me it seems a bit too clever. I might prefer a more unified approach for both suits and ranks:

{:suit (get {\S :spade \H :heart \D :diamond \C :club} s)
 :rank (get {\2 0 \3 1 ... \A 12} r)}

Although more verbose I think this would be more immediately obvious to someone reading the code.
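A runnable take on that unified approach could look like the following. It is only a sketch: suit-by-char and rank-by-char are names I invented, and instead of typing out all thirteen rank entries by hand I build the rank table with zipmap, which is my own shortcut and not something from the workshop:

```clojure
(def suit-by-char
  {\S :spade \H :heart \D :diamond \C :club})

;; Pairs each rank character with its position: {\2 0, \3 1, ... \A 12}
(def rank-by-char
  (zipmap "23456789TJQKA" (range)))

(defn parse-card [[s r]]
  {:suit (get suit-by-char s)
   :rank (get rank-by-char r)})

(parse-card "DQ") ;; => {:suit :diamond, :rank 10}
(parse-card "H5") ;; => {:suit :heart, :rank 3}
(parse-card "SJ") ;; => {:suit :spade, :rank 9}
```

Both lookups now behave the same way for unknown input as well: an unrecognized character gives nil, whereas the .indexOf approach gives -1 for the rank.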

Finding the largest multiple of five consecutive digits from a 500 digit string in Clojure

My first Clojure post will be an annotated solution to a specific job interview-like problem:

Given a long string of numbers such as:


how do you elegantly find the largest multiple of five consecutive single digit numbers?

I recently attended the awesome Reaktor Dev Day conference, which held an introductory hands-on workshop on Clojure. This problem was the first of the problems in an exercise set we worked on. I found it to be of just the right difficulty for my skill level at that time – at first I was lost but once I learned about the `partition` function, everything clicked and I got a kick out of solving the problem.

So here’s my solution as an embedded Gist with step-by-step comments. This is pretty basic stuff but hopefully the commentary is helpful to some beginner out there.

;; A helper function that converts a single digit character into a numeric value
(defn as-int [char]
  (Character/getNumericValue char))

;; The solution as a one-liner: function that returns the largest multiple of 5
;; consecutive digits in the given string of digits
(defn max-of-5-multiply-v1 [str]
  (apply max (map #(apply * %) (partition 5 1 (map as-int str)))))

;; Example of use with the given input string
(def numbers-in-string (str "37900490610897696126265185408732594047834333441947"
                            ;; ... the remaining lines of the 500 digit string
                            ))
(max-of-5-multiply-v1 numbers-in-string)
;; Returns 34992

;; The annotated solution

;; Defines a function named 'max-of-5-multiply-v1'
(defn max-of-5-multiply-v1

  ;; The function takes one argument: `str` - this is the string of numbers
  [str]

  ;; We `apply` the function `max` to the sequence that is created by the subsequent
  ;; function calls.
  ;; Applying a function to a sequence (rather than calling the function directly) 'unrolls'
  ;; the elements of the sequence into individual arguments to the applied function.
  ;; You cannot call max with e.g. a vector: (max [1 2 3]) does not work, but calling
  ;; (apply max [1 2 3]) is equivalent to (max 1 2 3).
  ;; max simply returns the largest of the given arguments. Note that we can apply max to
  ;; a sequence of numbers of any length because max works with any number of arguments.
  (apply max

         ;; The `map` function applies the function given as its first parameter to each element
         ;; of the collection given as the second parameter and returns a sequence containing
         ;; these 'transformed' elements.
         (map

          ;; Here we define the function as the first argument to map above. #() is a shorthand
          ;; syntax for defining anonymous functions. The % is a place holder for the single
          ;; input argument to the anonymous function.
          ;; This function will be called for each element of the collection that is passed as
          ;; the second argument to the map call above. In this case the anonymous function
          ;; applies the multiplication function to each value it receives. Here we expect to
          ;; always get collections of values since `apply` needs a collection to unroll.
          ;; Again `apply` unrolls the collection into individual arguments so if an element of
          ;; the input collection were [1 2 3], (apply * [1 2 3]) would result in the
          ;; call (* 1 2 3).
          #(apply * %)

          ;; The function `partition` takes a sequence and splits it into partitions of length n
          ;; (here n = 5).
          ;; The optional second argument defines how much the input sequence is 'offset' for
          ;; each partition.
          ;; Examples:
          ;; (partition 2 '(1 2 3 4)) results in ((1 2) (3 4))
          ;; (partition 2 1 '(1 2 3 4)) results in ((1 2) (2 3) (3 4))
          ;; So in this particular problem we need (partition 5 1 ...) which splits the input
          ;; sequence into partitions of five consecutive elements
          (partition 5 1

                     ;; The string of digits `str` is converted into a sequence of numeric values by
                     ;; calling the `as-int` function for each character in the string separately.
                     ;; Since strings are sequences, they can be transformed with `map` like any other
                     ;; sequence.
                     (map as-int str)))))

;; To follow the flow of execution you have to start from the inner most (map as-int str)
;; call in the end:
;; 1) The string "37900490610..." is transformed into a sequence of numeric values:
;; '(3 7 9 0 0 4 ...)
;; 2) The sequence of numeric values is partitioned into sequences containing five
;; consecutive numbers (always offset by one) from the input sequence:
;; ((3 7 9 0 0) (7 9 0 0 4) (9 0 0 4 9) ...)
;; 3) The sequence of sequences is given to map which calls the anonymous #(apply * %) for
;; each 5 element sequence resulting in a sequence containing the results of multiplying
;; the numbers in each subsequence together: (0 0 0 ... 27216). We get a lot of zeroes in
;; the beginning because the first five number sequence without a zero takes a while to
;; come up.
;; 4) max is applied to the sequence of multiples which yields the largest of them.

;; An easier to read version of the same function can be achieved with the double arrow or
;; 'thread last' macro `->>`.
;; This macro 'threads' the first argument expression (in this case `str`) through each
;; form given as subsequent arguments: str is inserted as the last argument to the first
;; form, that form is inserted as the last argument of the second form and so on:
(defn max-of-5-multiply-v2 [str]
  (->> str
       (map as-int)
       (partition 5 1)
       (map #(apply * %))
       (apply max)))

;; In this example `str` goes as the last argument to the map call: (map as-int str).
;; The result of this then is inserted as the last argument to (partition 5 1):
;; (partition 5 1 (map as-int str)) and so on. So this will eventually yield exactly the same
;; code as the original solution but can be read in a more natural way.

The example solution by the workshop organizers used

(map #(reduce * %))

where I used

(map #(apply * %))

Both work – I’m not sure if there is some performance or other difference between them in this case.
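A quick way to convince yourself that the two are interchangeable here is to pass the multiplication step in as a parameter and run both variants on a short input. This is a sketch of my own: max-product is a hypothetical generalization of the function above, with the window size n also made a parameter:

```clojure
(defn max-product
  "Largest product of n consecutive digits in digit-string. multiply-all is
  the function used to multiply the numbers of one window together."
  [multiply-all n digit-string]
  (->> digit-string
       (map #(Character/getNumericValue %))
       (partition n 1)
       (map multiply-all)
       (apply max)))

;; Windows of two digits in "12345": (1 2) (2 3) (3 4) (4 5)
(max-product #(apply * %) 2 "12345")   ;; => 20
(max-product #(reduce * %) 2 "12345")  ;; => 20
```

Since * accepts any number of arguments, apply hands the whole window to a single call, while reduce multiplies pairwise from the left; for five small digits either way gives the same result.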

If anyone has alternative solutions, please post them in the comments. Also my use of Clojure terms might be a bit off or ambiguous at times – feel free to correct! 🙂

(By the way, I’m trying to find a WordPress theme that is wide enough to show the code snippets without horizontal scrolling.)

New beginning

Well, the posting frequency on this blog has been quite low. It seems that I don’t run into the kinds of programming related problems that are interesting and general enough to blog about.

At work I’ve been coding mostly Java (~98% of the time) with a little bit of JavaScript on the side (amount of JS has increased a bit lately with the whole JS MV* framework trend). Recently I’ve become more and more interested in Clojure however.

I’ve been growing increasingly bored of Java even though there’s a lot to learn there as well: things I don’t get to use in the day-to-day work. Rather than specializing in Java and getting to know every nook and cranny of it, I feel it will serve me better to really step out of the comfort zone and look into different programming paradigms. It will probably be more valuable to learn new ways of thinking about and building software rather than knowing every feature of a single programming language.

Anyway, functional programming is trending and that seems like the first obvious paradigm to dive into after OOP. Since I’ve lived on the JVM thus far and have been passively interested in Lisps, Clojure seems like a good choice.

I’ve been slowly practising Clojure in my free time and haven’t started any real or even toy projects yet, but I feel I’m getting ready to start working on something substantial soon – lots of ideas brewing at the back of my mind.

My plan is to try and blog about the issues I face learning and working with Clojure and how I’ve solved them – and also about any projects I might start to work on.

Upgrading Hibernate from 3.2.4 SP1 to 3.6.10

This is a short summary of what I had to do to upgrade Hibernate from version 3.2.4 SP1 to 3.6.10 in our reporting/analysis type of application built on an aging Struts 1.2.x / Spring 3.2 (recently upgraded) / Hibernate / MySQL stack.

I was initially worried I would run into a demoralizing number of problems because, although our product is not that large in terms of LOC, Hibernate is used a lot and in many ways (HQL queries, Criteria queries, native MySQL queries, somewhat complex object mappings, etc). However, the upgrade went surprisingly smoothly. The jump from 3.x to 4.x might be a bigger step, but that will be another story.

This case was probably quite a simple upgrade and upgrading other apps might face different issues but hopefully this will help someone.

Here are the steps I took:

  • Downloaded the release bundles for 3.6.10
  • Threw away the old 3.2.4 hibernate3.jar (and also hibernate-annotations.jar and hibernate-jpa-2.0-api-x.jar which we had because of dependencies from a shared library we use)
  • Added the new 3.6.10 hibernate3.jar and the jars from the required/ directory in the 3.6.10 release bundle. Also added the hibernate-jpa-x.jar from under jpa/

At this point I’m getting a compilation error. We have a custom enum UserType (based on this example), which calls TypeFactory.basic(..), which no longer exists. I find a question about this on StackOverflow which fortunately has an answer that contains an updated version of the custom type that works in 3.6.10.

Now the app compiles and deploys to Tomcat without errors. I get to the login page but get an exception thrown in my face immediately on login. The error is

org.hibernate.QueryException: query specified join fetching, but the 
owner of the fetched association was not present in the select list

The offending query looks something like this:

select f from Foo f
left join fetch
left join fetch f.baz as x
left join fetch x.y

SO comes to my rescue again, as someone else has already asked about this:

“Use regular join instead of join fetch (by the way, it’s inner by default)”,  “As error message tells you, join fetch doesn’t make sense here, because it’s a performance hint that forces eager loading of collection.”

So changing the last ‘left join fetch’ to just ‘left join’ fixed the issue.

Now I can login and poke around the app manually for a few minutes and to my surprise all the major features seem to work at least on the surface. We have a pretty good integration and system test coverage so if I get those to pass I can be quite confident that nothing major is broken.

First I run the integration tests: 15 failures, all of the type:

[Ljava.lang.Object; cannot be cast to

Turns out there are a few HQL queries of the form

select distinct foo,
from Foo foo
inner join fetch

which for some reason previously returned List<Foo> without problems, but now return a list of Object tuples where the first element is an instance of Foo and the second element is an instance of Bar. Looking at the query this makes total sense, and I’m wondering why it worked before.

Since the code using these queries obviously has not needed the second element before, I change the queries to only select foo and all the failed tests pass.
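To illustrate why the casts failed, here is a minimal sketch in plain Java with no Hibernate involved. The Foo and Bar classes, the TupleResultDemo class and all of its methods are hypothetical stand-ins made up for this example; the list is built by hand to mimic a result where each row is an Object[] of (Foo, Bar). It is not the fix I used (I changed the select list instead), it only shows what the calling code was suddenly receiving.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the mapped entities; the real classes are not shown here.
final class Foo {
    final String name;
    Foo(String name) { this.name = name; }
}

final class Bar {
    final String name;
    Bar(String name) { this.name = name; }
}

final class TupleResultDemo {

    // Hand-built imitation of a result with two items in the select list:
    // every row is an Object[] holding a Foo and a Bar.
    static List<Object> tupleRows() {
        List<Object> rows = new ArrayList<>();
        rows.add(new Object[] { new Foo("foo-1"), new Bar("bar-1") });
        rows.add(new Object[] { new Foo("foo-2"), new Bar("bar-2") });
        return rows;
    }

    // Returns true if treating the row as a Foo throws a ClassCastException,
    // which is what code expecting List<Foo> runs into.
    static boolean castToFooFails(Object row) {
        try {
            Foo ignored = (Foo) row;
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    // Unpacks the first element of each tuple, i.e. the Foo instances.
    static List<Foo> firstElements(List<Object> rows) {
        List<Foo> foos = new ArrayList<>();
        for (Object row : rows) {
            Object[] tuple = (Object[]) row;
            foos.add((Foo) tuple[0]);
        }
        return foos;
    }

    public static void main(String[] args) {
        List<Object> rows = tupleRows();
        System.out.println("Cast to Foo fails: " + castToFooFails(rows.get(0)));
        System.out.println("First Foo: " + firstElements(rows).get(0).name);
    }
}
```

Unpacking the tuples like this would also have worked, but selecting only foo keeps the calling code unchanged.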

Next up are the system tests. To my surprise everything passes on the first go.

At this point I’m quite sure all the major features work. However, there are some smaller, less important features that are unfortunately not covered by tests and still need to be checked manually (if there are breakages, tests will be added).

Extending the export functionality of the Display tag library

Display tag

Display tag is a library we’ve been using at work when developing web applications in Java. It provides a fast and easy way to generate simple HTML tables from collections of objects.


The tables can also be exported into several alternative formats like CSV, XML, PDF or Excel. The exporting functionality is very basic and especially the output of the default Excel export implementation looks quite crappy:

Display tag Excel export output

How the default excel output looks

Another problem we had is that the export only exports the table (can’t complain though, that’s what the library is supposed to do, tables). Many of the tables in our product are generated based on selections the user makes and we want these selections to be visible in the exports they make. By default you can only add a simple one line caption and a footer to the exported table. We could just stuff all the selection information there on that one line but it would not look good because we have many types of selections all of which can have multiple options.

This is how we would like the export to look (manually sketched in Excel):

Export result mock up

The way we want it to look


Implement the BinaryExportView

According to the Display tag website I need to write my own implementation of the BinaryExportView interface. There is also a TextExportView interface, but because we want to output Excel data with style information like font sizes, font weights etc, we need the binary export. Since I had no idea how to go about generating Excel, I looked at the DefaultHssfExportView, which is one of the default Excel export classes that come with the library. (If you are wondering about the “Hssf”, Display tag’s binary Excel export uses the Apache POI project’s library called POI-HSSF to access and modify Excel workbooks.)

Looking at the DefaultHssfExportView’s doExport() method, it doesn’t do any of the Excel generation itself but delegates the task to a HssfTableWriter, in which the table generation is nicely separated into methods like writeCaption(), writeTableHeader() and so on. So I clearly need my own writeCaption() in the HssfTableWriter and need to generate my custom caption there.

My first thought was to extend the DefaultHssfExportView and HssfTableWriter and just override the essential methods, since the rest of the export works well enough for us. However, I couldn’t really do this because all the fields in those classes were private, so I couldn’t for example access the HSSFWorkbook wb field in my subclass. That’s why I ended up copy/pasting (*cringe*) the classes and replacing the appropriate methods and imports with my own implementation (and fixing a bug* in the original HssfTableWriter).

Get the caption data to the new BinaryExportView

I still had one open question: how to get the caption information (header, selections) easily to my HssfTableWriter, since the caption field in the TableModel is the only way to pass information. So I needed a neat way to pass structured information in plain text and put it back together for easy access in the HssfTableWriter.

Of course my initial primal instinct was to reinvent the wheel and design some neat text format and some kind of parser for it. Then I came to my senses and realized this would be a nice place to try out JSON. A quick search on StackOverflow led me to google-gson, which seemed pleasantly simple to use.

I created a simple data transfer class for the caption data:

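import java.util.Map;

public class ExcelExportCaptionData {

	// Main header shown above the exported table
	private String header;
	// Named groups of selections, each mapping a label to its value
	private Map<String,Map<String,String>> sublists;

	// Getters and setters omitted..
}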

Now I can create an ExcelExportCaptionData instance in my controller / action with the selections and headers and turn it into a JSON string for the Display tag table in the JSP page:

ExcelExportCaptionData data = new ..
String jsonOutput = new Gson().toJson(data);

This results in a string like

  "sublists": {

I put this string into the display tag table’s caption with the <display:caption> tag in the JSP page:

  <display:caption media="excel">${excelCaptionData}</display:caption>

Finally, in my HssfTableWriter, I can access the JSON string from the TableModel and turn it back into an ExcelExportCaptionData instance:

ExcelExportCaptionData data = new Gson().fromJson(tableModel.getCaption(), ExcelExportCaptionData.class);

Generate the spreadsheet

Now I have all the caption data in an easy-to-access object and I can generate the Excel caption before the actual table using the POI-HSSF API. I’m not going into details on that; just look at the original HssfTableWriter to see how it’s done or refer to the POI-HSSF documentation.


There’s one final thing to do in order to enable the new export classes. In the file, if you already used some other Excel export class, replace the


with the fully qualified name of our new BinaryExportView implementation. Make sure you also have


otherwise the new export option won’t show up in the tables.


Here’s a screenshot of the actual output with our custom export classes:

Final result

Final result of our custom export (private data obfuscated)


When I started looking into this problem I noticed that some of our original Excel exports were crashing with an error like

Exception: [.DefaultHssfExportView] !DefaultHssfExportView.errorexporting! Cause: The 'to' row (0) must not be less than the 'from' row (24)

Some googling led me to the display tag issue tracker and this known issue. The reason was that the underlying POI-HSSF API’s cell merging feature had changed to use a different class. The display tag’s HssfTableWriter had been updated to reflect this change, but they had failed to notice that the order of the row and column number parameters had changed in the new API class:

Old API parameter order:

firstRow, firstCol, lastRow, lastCol

New API parameter order:

firstRow, lastRow, firstCol, lastCol
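To make the mix-up concrete, here is a minimal sketch in plain Java. MergeRange is a hypothetical stand-in I made up for the newer range class, not the real POI class; only its parameter order is taken from the list above, and its validation is an assumption modelled on the error message quoted earlier. MergeOrderDemo and its methods are likewise invented for illustration.

```java
// Hypothetical stand-in for the newer range class. Only the parameter order
// (firstRow, lastRow, firstCol, lastCol) comes from the text above; the
// validation is an assumption modelled on the quoted error message.
final class MergeRange {
    final int firstRow;
    final int lastRow;
    final int firstCol;
    final int lastCol;

    MergeRange(int firstRow, int lastRow, int firstCol, int lastCol) {
        if (lastRow < firstRow) {
            throw new IllegalArgumentException("The 'to' row (" + lastRow
                    + ") must not be less than the 'from' row (" + firstRow + ")");
        }
        if (lastCol < firstCol) {
            throw new IllegalArgumentException("The 'to' col (" + lastCol
                    + ") must not be less than the 'from' col (" + firstCol + ")");
        }
        this.firstRow = firstRow;
        this.lastRow = lastRow;
        this.firstCol = firstCol;
        this.lastCol = lastCol;
    }
}

final class MergeOrderDemo {

    // Buggy call: arguments still in the old order
    // (firstRow, firstCol, lastRow, lastCol).
    static MergeRange mergeWithOldOrder(int row, int firstCol, int lastCol) {
        return new MergeRange(row, firstCol, row, lastCol);
    }

    // Fixed call: arguments in the new order
    // (firstRow, lastRow, firstCol, lastCol).
    static MergeRange mergeWithNewOrder(int row, int firstCol, int lastCol) {
        return new MergeRange(row, row, firstCol, lastCol);
    }

    public static void main(String[] args) {
        // Merge a caption across columns 0..5 on row 24.
        MergeRange ok = mergeWithNewOrder(24, 0, 5);
        System.out.println("Merged row " + ok.firstRow + ", cols "
                + ok.firstCol + ".." + ok.lastCol);
        try {
            mergeWithOldOrder(24, 0, 5);
        } catch (IllegalArgumentException e) {
            System.out.println("Old order fails: " + e.getMessage());
        }
    }
}
```

With the old argument order, the first column (0) lands in the lastRow slot, so any merge on a row other than the first trips the row check. That would fit the crash only showing up for captions and footers written further down the sheet.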

This issue hasn’t been fixed in the display tag source as of this writing, so anyone using the default Excel export should be aware of it. If I remember correctly, it only occurs if you use captions or footers in your export though.