Shell scripting in Clojure part 2: A concrete example

This is a follow-up to “Shell scripting with Clojure”, which described how to make Clojure scripts easily runnable from the command line. In this part I would like to show the actual script that @hanneshaataja and I came up with in our afterwork Clojure session.

After taking a second look at the script a few days after the session, though, I realized that it was pretty damn ugly. We had naturally run with the first thing that worked in the quick hackathon, so I wanted to refactor it quite a bit before posting it here for all to see. However, I think it might be interesting to show both the original and improved versions and see how they differ.

Purpose of the script

As a recap from the previous post, the purpose of the script was to go through a directory containing test result files generated by the Selenium testing framework and output the names of every failed test case.

Each test result file is an HTML document that contains a summary table that lists the test cases in a particular test suite. Rows of failed test cases have a “status_failed” HTML class attribute:

<tr class="status_failed"><td><a href="#testresult1">TestCaseX</a></td></tr>

So, the script should just go through each file, find the rows with the class status_failed and pick up the test case names which are nested in the <td> and <a> elements.

Version 1 – Not like this

Here is the first version of the script. Now my sense of what is idiomatic Clojure is not very strong yet, but I’m pretty sure this is not proper. After the code I’ll list what I thought was wrong with it.

(By the way, I’m posting the code as embedded Gists because the Clojure syntax highlighting of WordPress doesn’t seem to work that well. The comments are especially annoying, as some words in them get highlighted. Unfortunately some content aggregators like Planet Clojure don’t seem to display Gists.)

My first problem with the first version was that it wrote the results into a file. Since this is a command-line script it makes more sense to write the results into standard output so that the user of the script can then decide what to do with the results: write them to a file or e.g. pipe them to another program.

Secondly, the code itself feels pretty imperative, reflecting our non-FP mindset: first we set up “variables” like input-dir and files with def, then iterate over the files and do something with them. I guess this is not a big problem in a quick one-off script like this, but I nevertheless wanted to try to rewrite it in a more functional way.

A third, smaller issue was that the script would try to parse any file in the directory it was given, whether or not it was an HTML file. In our case the directory always contains only HTML files, but it would be nice to have at least some sanity checking there.

Version 2 – Better I hope

The second version feels more idiomatic. The input directory and the files are no longer stored in vars using def; instead, everything is a function.

Let’s go through the parts of the script in more detail:

(defn run
  "Finds and prints the names of failed test cases from the Selenium result
   files in the user provided or default directory"
  []
  (doseq
      [failed-test-case
       (->> (result-file-loc) (read-html-files) (find-failed-tests))]
    (println failed-test-case)))

The goal of the script is achieved by the subsequent calls of the following three functions, always passing the result of one to the next:

  • result-file-loc returns the directory of the files to read, which is passed to read-html-files
  • read-html-files creates a file seq of the HTML files in the directory and passes the sequence to find-failed-tests
  • find-failed-tests extracts the failed test case names from each file and returns them in a collection

The wrapping doseq iterates over the failed test case names and prints each one out.

Here are the individual functions in more detail:

(defn result-file-loc
  "Returns the location of the result files (either provided as the first
   command line argument or a default value of 'test/system/results')"
  []
  (let [first-cmd-line-arg (second *command-line-args*)]

    (if (or
         (nil? first-cmd-line-arg)
         (= ":headless" first-cmd-line-arg))

      ;; Default dir
      "test/system/results"

      ;; user provided dir
      first-cmd-line-arg)))

Nothing too special here. When the script is executed from command line with lein-exec, the second element in the global *command-line-args* contains the first command line argument (the first of *command-line-args* is the name of the script file). I also noticed that when evaluating the script in a REPL, the second element contains the value “:headless” so result-file-loc returns the default directory in that case also.
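To make the defaulting logic easier to poke at in a REPL, the same check could be written as a pure function that takes the argument list as a parameter instead of reading the global directly. This is only a sketch: the names `default-result-dir` and `result-dir-from` are made up for illustration and are not part of the script above.

```clojure
;; Hypothetical helper names; the default path is the one used in the script.
(def default-result-dir "test/system/results")

(defn result-dir-from
  "Given a seq shaped like *command-line-args* under lein-exec (script name
   first, then the user's arguments), returns the directory to read."
  [args]
  (let [first-arg (second args)]
    (if (or (nil? first-arg) (= ":headless" first-arg))
      default-result-dir
      first-arg)))

(result-dir-from ["script.clj"])                ;; => "test/system/results"
(result-dir-from ["script.clj" "/tmp/results"]) ;; => "/tmp/results"
(result-dir-from ["repl" ":headless"])          ;; => "test/system/results"
```

With this shape, result-file-loc would simply call `(result-dir-from *command-line-args*)`.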

(defn read-html-files
  "Given a directory, returns a sequence of java.io.File instances of each .html
   file in the directory"
  [dir]

  ;; Include only files that end with .html
  (filter #(and (.isFile %) (.endsWith (.getName %) ".html"))
          (file-seq (clojure.java.io/file dir))))

read-html-files is quite similar to the first version. I just added the additional condition to the filter predicate so that only files ending with “.html” are included. This of course doesn’t stop the script from trying to parse, for example, non-HTML files that just happen to have an .html extension. For the purposes of this script this is enough though.

(defn find-failed-tests
  "Given a collection of Selenium test result files, returns the names of the
  failed test cases in the result files"
  [files]

  (reduce (fn [all-failed-tests file]
            (let [failed-tests
                  (->> file
                       (slurp)
                       (java.io.StringReader.)
                       (html/html-resource)
                       (extract-failed-test-names))]

              (if-not (empty? failed-tests)
                (apply conj all-failed-tests failed-tests)
                all-failed-tests)))
          []
          files))

find-failed-tests does the main work. Given the sequence of HTML files, it goes over them using reduce and accumulates a collection of the failed test case names which is finally returned.

In the first version we just iterated over the files and spat out the test case names ‘on the fly’ as they were encountered. In this version I wanted to gather the names into a collection which could then be used for further processing (in this case just to print them out).

reduce calls the anonymous function for each file: contents of each file is read in, converted into a character stream (java.io.StringReader), the stream is fed to Enlive’s html-resource which parses the HTML. Finally extract-failed-test-names (explained below) is called with the parsed HTML contents to get any failed test case names.

If a particular file had failed test cases, their names are added to the vector all-failed-tests which is initially empty. If the test suite of the file didn’t have any failures, all-failed-tests is just passed on as-is for the next round of reduce.

(defn extract-failed-test-names
  "Given an Enlive html-resource of a Selenium result file, returns names of the
   failed test cases"
  [selenium-html-resource]
  (map html/text
       (html/select selenium-html-resource [:tr.status_failed :td :a])))

The code to extract the failed test case names using Enlive selectors is simplified a bit. Rather than doing (flatten (map :content (html/select ...))) you can use Enlive’s text function to achieve the same result: (map html/text (html/select ...))

So there you have it. Any ideas for improvement? I’m wondering would it make sense (at least for sake of exercise) to make find-failed-tests return a lazy sequence rather than eagerly parse each file? Would you use something other than Enlive for the HTML parsing?
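One possible answer to the laziness question, as a rough sketch: `mapcat` concatenates the per-file results and returns a lazy sequence, so neither the explicit accumulator nor the empty check is needed. To keep the example self-contained, the parsing step is passed in as a function. Here `failed-names-in` is a hypothetical stand-in for the slurp / html-resource / extract-failed-test-names chain, and the maps are fake “files” rather than real Selenium results.

```clojure
(defn find-failed-tests-lazy
  "Returns a lazy seq of failed test names. `failed-names-in` is a function
   that takes one file and returns a (possibly empty) seq of names."
  [failed-names-in files]
  (mapcat failed-names-in files))

;; Stand-in data instead of real result files (an assumption for illustration)
(def fake-files
  [{:failed ["TestCaseX"]}
   {:failed []}
   {:failed ["TestCaseY" "TestCaseZ"]}])

;; A keyword works as the per-file function for the fake data
(find-failed-tests-lazy :failed fake-files)
;; => ("TestCaseX" "TestCaseY" "TestCaseZ")
```

Note that `mapcat` may realize a few elements ahead of what is actually consumed, so this is “mostly lazy” rather than strictly one file at a time.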

Shell scripting with Clojure

We had a nice afterwork Clojure learning session with a colleague the other day during which we quickly hacked together a shell script that reads a directory containing Selenium test result files and outputs all the failed test cases. (Why we haven’t automated the Selenium test running and result reporting with e.g. Jenkins is another story..)

In the session I learned at least about the following:

  • making standalone command-line executable Clojure scripts (as long as you have Leiningen installed) that still use Leiningen to handle the dependencies
  • reading and writing files
  • using Enlive to parse and look up particular data from HTML files

In this post, I’ll describe how to create command-line executable Clojure scripts with lein-exec. In a later post I will describe the test result parsing script we put together.

Command-line executable Clojure scripts with lein-exec

The only prerequisite here is that you have Leiningen installed but it should be pretty safe to assume this as lein seems to be the de facto build tool for Clojure.

Now, when you want to do quick standalone Clojure scripts without having to create a complete lein project, but still want to use lein to manage the dependencies to any 3rd party libraries, there is a nice lein plugin called lein-exec.

The installation is simple (the following assumes Leiningen 2, see the lein-exec readme for 1.x installation instructions). A global installation (meaning you can use lein-exec any time you call lein, rather than specifying it as a project-specific plugin) of lein-exec is done by adding the following to your ~/.lein/profiles.clj (if profiles.clj doesn’t exist, create it first).

;; as of writing this, 0.3.1 was the latest version
{:user {:plugins [[lein-exec "0.3.1"]]}}  

Now we can quickly create and run scripts:

(println (+ 1 2))

running foo.clj:

$ lein exec foo.clj
3

This isn’t that impressive yet, but once you add dependencies to third-party libraries, the usefulness becomes more apparent:

(require 'leiningen.exec)

;; Add a dependency to the classpath on the fly
(leiningen.exec/deps '[[enlive/enlive "1.1.4"]])

(require '[net.cgrand.enlive-html :as html])

;; Grab and print the title element from the Google front page using Enlive
(println (html/select (html/html-resource (java.net.URL. "http://google.com")) [:title]))

and let’s run it:

$ lein exec bar.clj
({:tag :title, :attrs nil, :content (Google)})

Now lein and lein-exec handle the enlive dependency behind the scenes. This is nice as now you have all the power of the Clojure ecosystem available for you even in quick one-off scripts.

It is also possible to make the scripts executable themselves if you are in a *nix environment. You need to download a lein-exec script and save it somewhere in your path. Also, if your lein executable is called something other than `lein`, you have to modify the lein-exec script to reflect that.

$ wget https://raw.github.com/kumarshantanu/lein-exec/master/lein-exec # download
$ chmod a+x lein-exec # make the script executable
$ mv lein-exec ~/bin # move to ~/bin, assuming it is in PATH

Now if you add the following to the first line of your script:

#!/usr/bin/env lein-exec
(println (+ 1 2))

you can execute the script directly (as long as you give it the execute permission):


$ chmod +x foo.clj
$ ./foo.clj
3

That’s it for now. I will follow up with the Selenium test result parsing script soon.


Simple infix math calculation

This is my last post in a series in which I go through the exercises presented at the introductory Clojure workshop at Reaktor Dev Day 2013. Check out part 1 and part 2 as well.

I didn’t have time to even look at the last exercise during the workshop so for this blog post I decided to try and solve it without looking at the solution first, and then compare mine with the organizers’ solution.

The task was to implement a function that calculates math expressions given in infix notation (that is ‘1+1’ instead of the prefix notation (+ 1 1) we are used to in Clojure). The task was highly simplified as there were no requirements for operator precedence or handling of parentheses.

So here is my solution with and without detailed commentary:

(defn calc [& symbols]
  (loop [acc (first symbols)
         syms (rest symbols)]

    (if
        (empty? syms) acc
        (recur ((first syms) acc (second syms)) (nthrest syms 2)))))

;; Note that this calculates naively from left
;; to right without considering order of operations
(= 7  (calc 2 + 5))
(= 42 (calc 38 + 48 - 2 / 2))
(= 8  (calc 10 / 2 - 1 * 2))
(= 72 (calc 20 / 2 + 2 + 4 + 8 - 6 - 10 * 9))

And here is the more detailed walk-through of the solution:

;; Define a function `calc` that takes a variable number of arguments which will
;; be available in the sequence `symbols`
;; (No sanity checking of input here, you could call (calc 20 +) and break it)

(defn calc [& symbols]

  ;; Loop works like `let`: we can bind values to local names and then use them
  ;; inside expressions within the loop. In addition, from inside the loop, we can
  ;; call `recur` with the same number of arguments as the loop has bindings,
  ;; which will jump the execution back to the beginning of the loop with the
  ;; values `recur` was called with bound to the names

  (loop

      ;; Our loop has two bindings `acc` for accumulator and `syms` for
      ;; the remaining symbols to be processed. `acc` is initialized with the
      ;; first symbol from the input symbols (this should always be a number
      ;; in the kinds of infix expressions we support).
      ;; `syms` is initialized with the `rest` (everything but the first)
      ;; of the input symbols

      [acc (first symbols)
       syms (rest symbols)]

    ;; Inside the loop we always do a simple `if` on whether there are remaining
    ;; symbols or not

    (if (empty? syms)

      ;; If there are no symbols left to process, we return what the value of the
      ;; accumulator is

      acc

      ;; If there are symbols left, we `recur` back to the beginning of the loop,
      ;; with the following parameters

      (recur

       ;; To get the next acc value, we take the first element from the
       ;; remaining symbols, treating it as a function, and give it the current
       ;; value of `acc` and the next remaining symbol as parameters.
       ;; E.g. if `acc` were currently 20 and syms were [+ 3 * 4 ...], the
       ;; first parameter to recur would be (+ 20 3) or 23

       ((first syms) acc (second syms))

       ;; The second parameter is the rest of the remaining symbols. Because
       ;; each pass of the loop processes two elements from the sequence of 
       ;; symbols, we use `(nthrest coll 2)` which with the parameter 2 returns 
       ;; the rest of the given collection except for the first two elements

       (nthrest syms 2)))))

And then for comparison, the workshop organizers’ solution:

(defn calculator [& xs]
  (loop [res (first xs) ops (rest xs)]
    (if (empty? ops)
      res
      (recur ((first ops) res (second ops))
             (rest (rest ops))))))

Amusingly, the solutions turn out to be basically identical. The only difference besides naming is the use of `(nthrest coll 2)` versus `(rest (rest coll))`.

The solution and the whole problem are quite simple: take the first number from the input as the starting accumulated value. Then process the remaining arguments in chunks of two, always applying the first of the chunk as a function to the current accumulated value and the second value of the chunk, storing the result as the new accumulated value. Repeat this until there are no elements left and the accumulator will contain the final result.

It actually took me quite a while to arrive at the solution. Although I had some faint gut feeling that this calls for recursion, I kind of avoided going there at first, trying different ways of separating the numbers from the operators and then trying to “map the operators” over pairs of numbers, but that started to feel too complicated pretty quickly.

After a while it just hit me how to get it done with loop and recur since I had recently used them for the first time in another example. I think I partly tried to avoid going to loop/recur because in Clojure Programming Emerick et al. say that:

recur is a very low-level looping and recursion operation that is usually not necessary:

When “iterating” over a collection or sequence, functional operations like map, reduce, for, and so on are almost always preferable.

So I assume this problem could probably be solved with reduce (or some combination of map, reduce, .. ) as well. I might try that and add the additional solution to this post later unless someone wants to beat me to it and include one in the comments. 🙂
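As a first stab at the reduce idea, here is a minimal sketch. It starts from the first number and folds over the remaining arguments in (operator, operand) pairs produced by `partition`. The name `calc-reduce` is just for illustration.

```clojure
(defn calc-reduce
  "Evaluates a flat infix expression strictly left to right using reduce."
  [& symbols]
  (reduce (fn [acc [op operand]]      ; destructure each (operator operand) pair
            (op acc operand))
          (first symbols)              ; initial accumulator: the first number
          (partition 2 (rest symbols))))

(calc-reduce 2 + 5)               ;; => 7
(calc-reduce 38 + 48 - 2 / 2)     ;; => 42
(calc-reduce 10 / 2 - 1 * 2)      ;; => 8
```

One difference from the loop/recur version: `partition` silently drops an incomplete trailing pair, so `(calc-reduce 20 +)` returns 20 instead of blowing up.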

Edit: Looks like this exercise was taken from 4Clojure: Infix Calculator

Parsing the rank and suit of a playing card’s string representation

I’ll follow the previous post with a similar one: this was the second exercise in the Introduction to Clojure workshop at Reaktor Dev Day. The task was to parse a textual representation of a standard playing card (e.g. “H5” for five of hearts) into a map of the form {:suit :heart :rank 5}.

The workshop was nearing its end when I got to work on this exercise so I didn’t have that much time to figure out a pretty solution. What I came up with is probably quite unidiomatic but at least it works.

For comparison I’ll also post the cleaner example solution by the workshop organizers.

The task:


;; A standard deck of playing cards has four suits -
;; spades, hearts, diamonds, and clubs - and thirteen cards in each suit.
;; Two is the lowest rank, followed by other integers up to ten;
;; then the jack, queen, king, and ace.

;; It's convenient for humans to represent these cards as suit/rank pairs,
;; such as H5 or DQ: the heart five and diamond queen respectively.
;; But these forms are not convenient for programmers, so to write a card
;; game you need some way to parse an input string into meaningful components.
;; For purposes of determining rank, we will define the cards to be valued
;; from 0 (the two) to 12 (the ace)

;; Write a function which converts (for example) the string "SJ"
;; into a map of {:suit :spade, :rank 9}. A ten will always be represented
;; with the single character "T", rather than the two characters "10".

(= {:suit :diamond :rank 10} (parse-card "DQ"))
(= {:suit :heart :rank 3} (parse-card "H5"))

My solution:

(defn parse-card [s]
  (let [
        ;; First I define two lookup maps that match the string representations
        ;; of the suits and ranks to their corresponding keyword and numeric
        ;; values
        suits {"D" :diamond "H" :heart "C" :club "S" :spade}

        ranks {"2" 0 "3" 1 "4" 2 "5" 3 "6" 4 "7" 5 "8" 6
               "9" 7 "T" 8 "J" 9 "Q" 10 "K" 11 "A" 12}

        ;; Then I look up the suit from the suits map using the first character
        ;; from the input `s` 
        suit (suits (str (first s)))

        ;; And the rank is looked up using the second character from the input
        ;; `s`
        rank (ranks (str (second s)))]

    ;; Finally suit and rank are returned as the expected map 
    {:suit suit :rank rank}))

And the “official” solution:


(defn as-rank-map [[s r]]
  {:suit (get {\S :spade \H :heart \D :diamond \C :club} s)
   :rank (.indexOf (seq "23456789TJQKA") r)})

Much shorter. Most of the shortness comes from not using `let` and the intermediate local names for e.g. the suit map. Everything is neatly inlined here while I was more into an imperative style: “first create these lookup maps, do a lookup and store the values, then create a map out of them”.

One important point that I missed is the destructuring of the input parameter: the string is already split into the suit and rank parts in the parameter vector of the function definition whereas I used `first` and `second` functions in my solution.

And obviously the way of determining the rank is different. I’m not sure if I like it though.. to me it seems a bit too clever. I might prefer a more unified approach for both suits and ranks:

{:suit (get {\S :spade \H :heart \D :diamond \C :club} s)
 :rank (get {\2 0 \3 1 ... \A 12} r)}

Although more verbose I think this would be more immediately obvious to someone reading the code.
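Spelled out in full, that unified approach might look like the sketch below. To avoid typing all thirteen rank entries by hand, the rank map is built with `zipmap` from the same rank string the official solution uses. That is my own shortcut, not part of either solution. The var names `suits` and `ranks` are illustrative.

```clojure
(def suits {\S :spade \H :heart \D :diamond \C :club})

;; Pairs each rank character with its position: {\2 0, \3 1, ... \A 12}
(def ranks (zipmap "23456789TJQKA" (range)))

(defn parse-card
  "Parses a two-character card string such as \"DQ\" into a suit/rank map."
  [[s r]]                          ; destructure the string into two characters
  {:suit (get suits s)
   :rank (get ranks r)})

(parse-card "DQ") ;; => {:suit :diamond, :rank 10}
(parse-card "H5") ;; => {:suit :heart, :rank 3}
```

A side effect of using map lookups for both parts is that an unknown rank character yields nil rather than the -1 that `.indexOf` would return.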

Finding the largest product of five consecutive digits from a 500 digit string in Clojure

My first Clojure post will be an annotated solution to a specific job interview-like problem:

Given a long string of numbers such as:

37900490610897696126265185408732594047834333441947
01850393807417064181700348379116686008018966949867
75587222482716536850061657037580780205386629145841
06964490601037178417735301109842904952970798120105
47016802197685547844962006690576894353336688823830
22913337214734911490555218134123051689058329294117
83011983450277211542535458190375258738804563705619
55277740874464155295278944953199015261800156422805
72771774460964310684699893055144451845092626359982
79063901081322647763278370447051079759349248247518

how do you elegantly find the largest product of five consecutive single digit numbers?

I recently attended the awesome Reaktor Dev Day conference, which held an introductory hands-on workshop on Clojure. This problem was the first of the problems in an exercise set we worked on. I found it to be of just the right difficulty for my skill level at that time – at first I was lost but once I learned about the `partition` function, everything clicked and I got a kick out of solving the problem.

So here’s my solution as an embedded Gist with step-by-step comments. This is pretty basic stuff but hopefully the commentary is helpful to some beginner out there.

The example solution by the workshop organizers used

(map #(reduce * %))

where I used

(map #(apply * %))

Both work – I’m not sure if there is some performance or other difference between them in this case.
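For reference, a minimal sketch of the `partition`-based approach could look like the following. The names `digits-of` and `largest-product` are illustrative and not necessarily those used in the Gist. The digits are pulled out with a regex so that newlines in the input string don’t matter.

```clojure
(defn digits-of
  "Extracts the digits of string `s` as numbers, ignoring any other characters."
  [s]
  (map #(Long/parseLong %) (re-seq #"\d" s)))

(defn largest-product
  "Largest product of `n` consecutive digits in the string `s`."
  [s n]
  (->> (digits-of s)
       (partition n 1)        ; sliding windows of n digits, moving one step at a time
       (map #(apply * %))     ; product of each window
       (apply max)))

(largest-product "1234567890" 5) ;; => 15120 (5 * 6 * 7 * 8 * 9)
```

For the problem above you would call it with the long digit string and 5. Note that this sketch throws if the input has fewer than `n` digits, because `max` is then applied to no arguments.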

If anyone has alternative solutions, please post them in the comments. Also my use of Clojure terms might be a bit off or ambiguous at times – feel free to correct! 🙂

(By the way, I’m trying to find a WordPress theme that is wide enough to show the code snippets without vertical scrolling.)

New beginning

Well, the posting frequency on this blog has been quite low. It seems that I don’t run into the kinds of programming related problems that are interesting and general enough to blog about.

At work I’ve been coding mostly Java (~98% of the time) with a little bit of JavaScript on the side (amount of JS has increased a bit lately with the whole JS MV* framework trend). Recently I’ve become more and more interested in Clojure however.

I’ve been growing increasingly bored of Java even though there’s a lot to learn there as well: things I don’t get to use in the day-to-day work. Rather than specializing in Java and getting to know every nook and cranny of it, I feel it will serve me better to really step out of the comfort zone and look into different programming paradigms. It will probably be more valuable to learn new ways of thinking about and building software rather than knowing every feature of a single programming language.

Anyway, functional programming is trending and that seems like the first obvious paradigm to dive into after OOP. Since I’ve lived on the JVM thus far and have been passively interested in Lisps, Clojure seems like a good choice.

I’ve been slowly practising Clojure in my free time and haven’t started any real or even toy projects yet, but I feel I’m getting ready to start working on something substantial soon – lots of ideas brewing at the back of my mind.

My plan is to try and blog about the issues I face learning and working with Clojure and how I’ve solved them – and also about any projects I might start to work on.

Upgrading Hibernate from 3.2.4 SP1 to 3.6.10

This is a short summary of what I had to do to upgrade Hibernate from version 3.2.4 SP1 to 3.6.10 in our reporting/analysis type of application built on an aging Struts 1.2.x / Spring 3.2 (recently upgraded) / Hibernate / MySQL -stack.

I was initially worried I would run into a demoralizing number of problems because although our product is not that large in terms of LOC, Hibernate is used a lot and in many ways (HQL queries, Criteria queries, native MySQL queries, somewhat complex object mappings, etc). However, the upgrade went surprisingly smoothly. The jump from 3.x to 4.x might be a bigger step, but that will be another story.

This case was probably quite a simple upgrade, and other apps might face different issues when upgrading, but hopefully this will help someone.

Here are the steps I took:

  • Downloaded the release bundles for 3.6.10 
  • Threw away the old 3.2.4 hibernate3.jar (and also hibernate-annotations.jar and hibernate-jpa-2.0-api-x.jar which we had because of dependencies from a shared library we use)
  • Added the new 3.6.10 hibernate3.jar and the jars from the required/ directory in the 3.6.10 release bundle. Also added the hibernate-jpa-x.jar from under jpa/

At this point I’m getting a compilation error. We have a custom enum UserType (based on this example), which calls TypeFactory.basic(..), which no longer exists. I find a question about this on StackOverflow which fortunately has an answer that contains an updated version of the custom type which works in 3.6.10.

Now the app compiles and deploys to Tomcat without errors. I get to the login page but get an exception thrown in my face immediately on login. The error is

org.hibernate.QueryException: query specified join fetching, but the 
owner of the fetched association was not present in the select list

The offending query looks something like this:

select f from Foo f
left join fetch f.bar
left join fetch f.baz as x
left join fetch x.y
...

SO comes to my rescue again, someone else has asked about this:

“Use regular join instead of join fetch (by the way, it’s inner by default)”,  “As error message tells you, join fetch doesn’t make sense here, because it’s a performance hint that forces eager loading of collection.”

So changing the last ‘left join fetch’ to just ‘left join’ fixed the issue.

Now I can log in and poke around the app manually for a few minutes, and to my surprise all the major features seem to work at least on the surface. We have pretty good integration and system test coverage, so if I get those to pass I can be quite confident that nothing major is broken.

First I run the integration tests: 15 failures, all of the type:

java.lang.ClassCastException: 
[Ljava.lang.Object; cannot be cast to com.foo.Bar

Turns out there are a few HQL queries of the form

select distinct foo, foo.bar
from Foo foo
inner join fetch foo.bar
...

which have previously for some reason returned List<Foo> without problems, but now return a list of Object tuples where the first element is an instance of Foo and second element is an instance of Bar. Looking at the query this makes total sense and I’m wondering why it worked before..

Since the code using these queries obviously has not needed the second element before, I change the queries to only select foo and all the failed tests pass.

Next up are the system tests. To my surprise everything passes on the first go.

At this point I’m quite sure all the major features work. However there are some smaller less important features that are unfortunately not covered by tests which still need to be checked manually (if there are breakages, tests will be added).