Shell scripting in Clojure part 2: A concrete example

This is a follow-up to “Shell scripting with Clojure”, which described how to make Clojure scripts easily runnable from the command line. In this part I would like to show the actual script I came up with together with @hanneshaataja in our afterwork Clojure session.

After taking a second look at the script a few days after the session, though, I realized that it was pretty damn ugly. We had naturally run with the first thing that worked in the quick hackathon, so I wanted to refactor it quite a bit before posting it here for all to see. However, I think it might be interesting to show both the original and improved versions and see how they differ.

Purpose of the script

As a recap from the previous post, the purpose of the script was to go through a directory containing test result files generated by the Selenium testing framework and output the names of every failed test case.

Each test result file is an HTML document that contains a summary table that lists the test cases in a particular test suite. Rows of failed test cases have a “status_failed” HTML class attribute:

<tr class="status_failed"><td><a href="#testresult1">TestCaseX</a></td></tr>

So, the script should just go through each file, find the rows with the class status_failed and pick up the test case names which are nested in the <td> and <a> elements.

Version 1 – Not like this

Here is the first version of the script. Now my sense of what is idiomatic Clojure is not very strong yet, but I’m pretty sure this is not proper. After the code I’ll list what I thought was wrong with it.

(By the way, I’m posting the code as embedded Gists because the Clojure syntax highlighting of WordPress doesn’t seem to work that well. The comments are especially annoying, as some words in them get highlighted. Unfortunately some content aggregators like Planet Clojure don’t seem to display Gists.)

My first problem with the first version was that it wrote the results into a file. Since this is a command-line script it makes more sense to write the results into standard output so that the user of the script can then decide what to do with the results: write them to a file or e.g. pipe them to another program.

Secondly, the code itself feels pretty imperative, reflecting our non-FP mindset: first we set up “variables” like input-dir and files with def, then iterate over the files and do something with them. I guess this is not a big problem in a quick one-off script like this, but I nevertheless wanted to try to rewrite it in a more functional way.

A third, smaller issue was that the script would try to parse every file in the directory it was given, whether it was an HTML file or not. In our case the directory always contains only HTML files, but it would be nice to have at least some sanity checking there.

Version 2 – Better I hope

The second version feels more idiomatic. The input directory and the files are no longer stored in vars using def; instead, everything is a function.

Let’s go through the parts of the script in more detail:

(defn run
  "Finds and prints the names of failed test cases from the Selenium result
   files in the user provided or default directory"
  []
  (doseq
      [failed-test-case
       (->> (result-file-loc) (read-html-files) (find-failed-tests))]
    (println failed-test-case)))

The goal of the script is achieved by the subsequent calls of the following three functions, always passing the result of one to the next:

  • result-file-loc returns the directory of the files to read, which is passed to read-html-files
  • read-html-files creates a file seq of the HTML files in the directory and passes the sequence to find-failed-tests
  • find-failed-tests extracts the failed test case names from each file and returns them in a collection

The wrapping doseq iterates over the failed test case names and prints each one out.
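The ->> form in run is just nested function calls written in reading order. Here is a minimal sketch of that equivalence; step-a, step-b and step-c are hypothetical stand-ins, not part of the script:

```clojure
;; Hypothetical stand-ins for result-file-loc, read-html-files and
;; find-failed-tests; only the shape of the calls matters here.
(defn step-a [] "dir")
(defn step-b [dir] [(str dir "/a.html") (str dir "/b.html")])
(defn step-c [files] (map #(str "failed in " %) files))

;; ->> threads each result in as the LAST argument of the next form...
(def threaded (->> (step-a) (step-b) (step-c)))

;; ...so it is equivalent to the nested form, read inside-out.
(def nested (step-c (step-b (step-a))))

(println (= threaded nested))
```

The threaded version reads top to bottom in the same order the data flows, which is why it suits a "find, read, extract" pipeline like this one.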

Here are the individual functions in more detail:

(defn result-file-loc
  "Returns the location of the result files (either provided as the first
   command line argument or a default value of 'test/system/results')"
  []
  (let [first-cmd-line-arg (second *command-line-args*)]

    (if (or
         (nil? first-cmd-line-arg)
         (= ":headless" first-cmd-line-arg))

      ;; Default dir
      "test/system/results"

      ;; user provided dir
      first-cmd-line-arg)))

Nothing too special here. When the script is executed from the command line with lein-exec, the second element of the global *command-line-args* contains the first command line argument (the first element of *command-line-args* is the name of the script file). I also noticed that when evaluating the script in a REPL, the second element contains the value “:headless”, so result-file-loc returns the default directory in that case too.
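One possible refinement, which is not part of the script above: taking the argument list as a parameter makes the defaulting logic easy to try out in a REPL. The names dir-from-args and default-dir are made up for this sketch, and it assumes the argument layout described above (script name first, then the user's arguments).

```clojure
(def default-dir "test/system/results")

(defn dir-from-args
  "Returns the directory given as the first user argument in args, or
   default-dir when it is missing or is the REPL's :headless marker.
   args is assumed to look like *command-line-args* under lein-exec:
   script name first, then the user's arguments."
  [args]
  (let [first-arg (second args)]
    (if (or (nil? first-arg) (= ":headless" first-arg))
      default-dir
      first-arg)))

(println (dir-from-args ["script.clj"]))
(println (dir-from-args ["script.clj" "/tmp/results"]))
```

In the script itself this would be called as (dir-from-args *command-line-args*).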

(defn read-html-files
  "Given a directory, returns a sequence of java.io.File instances of each .html
   file in the directory"
  [dir]

  ;; Include only files that end with .html
  (filter #(and (.isFile %) (.endsWith (.getName %) ".html"))
          (file-seq (clojure.java.io/file dir))))

read-html-files is quite similar to the first version. I just added an additional condition to the filter predicate so that only files ending with “.html” are included. This of course doesn’t stop the script from trying to parse e.g. non-HTML files that just happen to have an .html extension. For the purposes of this script this is enough, though.
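The predicate can also be tried out on its own. The sketch below pulls it into a named function (html-file? is a hypothetical name) and, as an assumption that goes beyond the original, also accepts upper-case extensions:

```clojure
(require '[clojure.string :as str])

(defn html-file?
  "True when f is a regular file whose name ends with .html (any case)."
  [^java.io.File f]
  (and (.isFile f)
       (.endsWith (str/lower-case (.getName f)) ".html")))

;; Try it against a temporary file so the example touches nothing else.
(def tmp (java.io.File/createTempFile "result" ".HTML"))
(def file-result (html-file? tmp))                 ; a real file with .HTML
(def dir-result (html-file? (.getParentFile tmp))) ; a directory
(.delete tmp)

(println file-result dir-result)
```

With this in place, the filter in read-html-files could be written as (filter html-file? (file-seq (clojure.java.io/file dir))).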

(defn find-failed-tests
  "Given a collection of Selenium test result files, returns the names of the
  failed test cases in the result files"
  [files]

  (reduce (fn [all-failed-tests file]
            (let [failed-tests
                  (->> file
                       (slurp)
                       (java.io.StringReader.)
                       (html/html-resource)
                       (extract-failed-test-names))]

              (if-not (empty? failed-tests)
                (apply conj all-failed-tests failed-tests)
                all-failed-tests)))
          []
          files))

find-failed-tests does the main work. Given the sequence of HTML files, it goes over them using reduce and accumulates a collection of the failed test case names which is finally returned.

In the first version we just iterated over the files and spat out the test case names ‘on the fly’ as they were encountered. In this version I wanted to gather the names into a collection which could then be used for further processing (in this case just to print them out).

reduce calls the anonymous function for each file: the contents of the file are read in and wrapped in a character stream (java.io.StringReader), and the stream is fed to Enlive’s html-resource, which parses the HTML. Finally extract-failed-test-names (explained below) is called with the parsed HTML contents to get any failed test case names.

If a particular file had failed test cases, their names are added to the vector all-failed-tests which is initially empty. If the test suite of the file didn’t have any failures, all-failed-tests is just passed on as-is for the next round of reduce.
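The accumulation step can be seen in isolation on plain data. The per-file lists below are made-up sample data, and the into variant is an alternative I am suggesting, not what the script uses:

```clojure
;; Made-up sample data: failed test names found in three result files,
;; the second file having no failures.
(def per-file-failures [["TestCaseA" "TestCaseB"] [] ["TestCaseX"]])

;; Same shape as the reduce in find-failed-tests.
(def with-conj
  (reduce (fn [all-failed failed]
            (if-not (empty? failed)
              (apply conj all-failed failed)
              all-failed))
          []
          per-file-failures))

;; into copes with empty collections by itself, so no check is needed.
(def with-into (reduce into [] per-file-failures))

(println with-conj)
(println (= with-conj with-into))
```

Both versions keep the names in file order, because conj and into append to the end of a vector.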

(defn extract-failed-test-names
  "Given an Enlive html-resource of a Selenium result file, returns names of the
   failed test cases"
  [selenium-html-resource]
  (map html/text
       (html/select selenium-html-resource [:tr.status_failed :td :a])))

The code to extract the failed test case names using Enlive selectors is simplified a bit. Rather than doing (flatten (map :content (html/select ...))) you can use Enlive’s text function to achieve the same result: (map html/text (html/select ...))
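The two approaches agree as long as each selected element contains only plain text. The sketch below shows where they would differ, using hand-written maps shaped like Enlive nodes and a simplified stand-in for html/text (node-text is a hypothetical helper, written here so the example runs without the Enlive dependency):

```clojure
;; Hand-written nodes shaped like the maps Enlive works with. The
;; second one has nested markup inside the link.
(def selected
  [{:tag :a, :attrs {:href "#testresult1"}, :content ["TestCaseX"]}
   {:tag :a, :attrs {:href "#testresult2"},
    :content ["TestCase" {:tag :b, :attrs nil, :content ["Y"]}]}])

(defn node-text
  "Simplified stand-in for html/text: concatenates all text inside node."
  [node]
  (if (string? node)
    node
    (apply str (map node-text (:content node)))))

;; One string per selected element, nested markup included.
(println (map node-text selected))

;; flatten only splices the :content sequences together, so the nested
;; :b element is left in the result as a map.
(println (flatten (map :content selected)))
```

For the Selenium result rows shown earlier the link contains just the test case name, so either form would do there.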

So there you have it. Any ideas for improvement? I’m wondering would it make sense (at least for sake of exercise) to make find-failed-tests return a lazy sequence rather than eagerly parse each file? Would you use something other than Enlive for the HTML parsing?
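As a starting point for the lazy variant, here is a minimal sketch, assuming the per-file work is passed in as a function. lazy-failed-tests and fake-extract are hypothetical names; in the real script extract-fn would be the slurp, html-resource and extract-failed-test-names pipeline.

```clojure
(defn lazy-failed-tests
  "Returns a lazy seq of failed test names. extract-fn is called on a
   file only when the seq is consumed up to that file's names."
  [extract-fn files]
  (lazy-seq
    (when-let [[file & more] (seq files)]
      (concat (extract-fn file)
              (lazy-failed-tests extract-fn more)))))

;; A fake extractor that records which files it was asked to parse.
(def parsed (atom []))
(defn fake-extract [file]
  (swap! parsed conj file)
  [(str file "-failure")])

(def failures
  (lazy-failed-tests fake-extract ["a.html" "b.html" "c.html"]))

(println @parsed)          ; nothing has been parsed yet
(println (first failures))
(println @parsed)          ; only the first file has been parsed
```

Whether this is worth it for a directory of small result files is debatable, since the script prints every name anyway, but it would let a caller stop early, for example with (take 10 ...).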


Shell scripting with Clojure

We had a nice afterwork Clojure learning session with a colleague the other day during which we quickly hacked together a shell script that reads a directory containing Selenium test result files and outputs all the failed test cases. (Why we haven’t automated the Selenium test running and result reporting with e.g. Jenkins is another story..)

In the session I learned at least about the following:

  • making standalone command-line executable Clojure scripts (as long as you have Leiningen installed) that still use Leiningen to handle the dependencies
  • reading and writing files
  • using Enlive to parse and look up particular data from HTML files

In this post, I’ll describe how to create command-line executable Clojure scripts with lein-exec. In a later post I will describe the test result parsing script we put together.

Command-line executable Clojure scripts with lein-exec

The only prerequisite here is that you have Leiningen installed but it should be pretty safe to assume this as lein seems to be the de facto build tool for Clojure.

Now, when you want to do quick standalone Clojure scripts without having to create a complete lein project, but still want to use lein to manage the dependencies to any 3rd party libraries, there is a nice lein plugin called lein-exec.

The installation is simple (the following assumes Leiningen 2, see the lein-exec readme for 1.x installation instructions). A global installation (meaning you can use lein-exec any time you call lein, rather than specifying it as a project-specific plugin) of lein-exec is done by adding the following to your ~/.lein/profiles.clj (if profiles.clj doesn’t exist, create it first).

;; as of writing this, 0.3.1 was the latest version
{:user {:plugins [[lein-exec "0.3.1"]]}}  

Now we can quickly create and run scripts. For example, save the following as foo.clj:

(println (+ 1 2))

running foo.clj:

$ lein exec foo.clj
3

This isn’t that impressive yet, but once you add dependencies to third-party libraries, the usefulness becomes more apparent:

(require 'leiningen.exec)

;; Add a dependency to the classpath on the fly
(leiningen.exec/deps '[[enlive/enlive "1.1.4"]])

(require '[net.cgrand.enlive-html :as html])

;; Grab and print the title element from the Google front page using Enlive
(println (html/select (html/html-resource (java.net.URL. "http://google.com")) [:title]))

and let’s run it:

$ lein exec bar.clj
({:tag :title, :attrs nil, :content (Google)})

Now lein and lein-exec handle the enlive dependency behind the scenes. This is nice as now you have all the power of the Clojure ecosystem available for you even in quick one-off scripts.

It is also possible to make the scripts executable themselves if you are in a *nix environment. You need to download the lein-exec script and save it somewhere in your path. Also, if your lein executable is called something other than `lein`, you have to modify the lein-exec script to reflect that.

$ wget https://raw.github.com/kumarshantanu/lein-exec/master/lein-exec # download
$ chmod a+x lein-exec # make the script executable
$ mv lein-exec ~/bin # move to ~/bin, assuming it is in PATH

Now if you add the following to the first line of your script:

#!/usr/bin/env lein-exec
(println (+ 1 2))

you can execute the script directly (as long as you give it the execute permission):


$ chmod +x foo.clj
$ ./foo.clj
3

That’s it for now. I will follow up with the Selenium test result parsing script soon.


Finding the largest multiple of five consecutive digits from a 500 digit string in Clojure

My first Clojure post will be an annotated solution to a specific job interview-like problem:

Given a long string of numbers such as:

37900490610897696126265185408732594047834333441947
01850393807417064181700348379116686008018966949867
75587222482716536850061657037580780205386629145841
06964490601037178417735301109842904952970798120105
47016802197685547844962006690576894353336688823830
22913337214734911490555218134123051689058329294117
83011983450277211542535458190375258738804563705619
55277740874464155295278944953199015261800156422805
72771774460964310684699893055144451845092626359982
79063901081322647763278370447051079759349248247518

how do you elegantly find the largest product of five consecutive single-digit numbers?

I recently attended the awesome Reaktor Dev Day conference, which held an introductory hands-on workshop on Clojure. This problem was the first of the problems in an exercise set we worked on. I found it to be of just the right difficulty for my skill level at that time – at first I was lost but once I learned about the `partition` function, everything clicked and I got a kick out of solving the problem.

So here’s my solution as an embedded Gist with step-by-step comments. This is pretty basic stuff but hopefully the commentary is helpful to some beginner out there.
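For readers who cannot see the embedded Gist, here is a minimal sketch of the partition-based approach. It is not necessarily identical to the Gist, and the name largest-product and the digit parsing are my own choices for the example:

```clojure
(defn largest-product
  "Largest product of n consecutive digits in s. Non-digit characters
   such as line breaks are ignored. Assumes s has at least n digits."
  [n s]
  (->> (re-seq #"\d" s)            ; each digit as a one-character string
       (map #(Long/parseLong %))   ; "7" -> 7
       (partition n 1)             ; sliding windows of n digits, step 1
       (map #(apply * %))          ; product of each window
       (apply max)))

(println (largest-product 5 "123456"))
```

For "123456" the two windows are (1 2 3 4 5) and (2 3 4 5 6), with products 120 and 720, so the call prints 720. The step argument of 1 is what makes partition produce overlapping windows; without it the windows would not overlap.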

The example solution by the workshop organizers used

(map #(reduce * %))

where I used

(map #(apply * %))

Both work – I’m not sure if there is some performance or other difference between them in this case.
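A quick REPL-style check that the two forms agree. The sample window is made up; as far as I know, the variadic arity of * in clojure.core is itself implemented with a reduce, so any difference should be small:

```clojure
(def window [3 7 9 4 2])

(println (apply * window))
(println (reduce * window))

;; Both also agree on an empty window: (*) with no arguments is 1.
(println (apply * []) (reduce * []))
```

For five-element windows the choice is mostly a matter of taste.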

If anyone has alternative solutions, please post them in the comments. Also my use of Clojure terms might be a bit off or ambiguous at times – feel free to correct! 🙂

(By the way, I’m trying to find a WordPress theme that is wide enough to show the code snippets without vertical scrolling.)