Shell scripting in Clojure part 2: A concrete example

This is a follow-up to “Shell scripting with Clojure“, which described how to make Clojure scripts easily runnable from the command line. In this part I would like to show the actual script we came up with @hanneshaataja in our afterwork Clojure session.

After taking a second look at the script few days after the session though, I realized that it was pretty damn ugly. We had naturally ran with the first thing that worked in the quick hackathon, so I wanted to refactor it quite a bit before posting it here for all to see. However I think it might Be interesting to show both the original and improved versions and see how they differ.

Purpose of the script

As a recap from the previous post, the purpose of the script was to go through a directory containing test result files generated by the Selenium testing framework and output the names of every failed test case.

Each test result file is an HTML document that contains a summary table that lists the test cases in a particular test suite. Rows of failed test cases have a “status_failed” HTML class attribute:

<tr class="status_failed"><td><a href="#testresult1">TestCaseX</a></td></tr>

So, the script should just go through each file, find the rows with the class status_failed and pick up the test case names which are nested in the <td> and <a> elements.

Version 1 – Not like this

Here is the first version of the script. Now my sense of what is idiomatic Clojure is not very strong yet, but I’m pretty sure this is not proper. After the code I’ll list what I thought was wrong with it.

(By the way, I’m posting the code as embedded Gists because the Clojure syntax highlighting of WordPress doesn’t seem to work that well. Especially the comments are annoying some as some words get highlighted. Unfortunately some content aggregators like Planet Clojure don’t seem to show to display Gists)

My first problem with the first version was that it wrote the results into a file. Since this is a command-line script it makes more sense to write the results into standard output so that the user of the script can then decide what to do with the results: write them to a file or e.g. pipe them to another program.

Secondly the code itself feels pretty imperative, reflecting our non-FP-mindset: first we set up “variables” like input-dir and files with def, then iterate over the files and do something with them. I guess this is not a big problem in a quick one-off script like this but I nevertheless wanted try to rewrite it in a more functional way.

Third small issue was that the script would try to parse any file in the directory it was given whether they were HTML files or not. In our case the directory always contains only html files but it would be nice to have at least some sanity checking there.

Version 2 – Better I hope

The second version feels more idiomatic. The input directory and the files are no longer stored into a var using def, instead everything is a function.

Let’s go through the parts of the script in more detail:

(defn run
  "Finds and prints the names of failed test cases from the Selenium result
   files in the user provided or default directory"
  []
  (doseq
      [failed-test-case
       (->> (result-file-loc) (read-html-files) (find-failed-tests))]
    (println failed-test-case)))

The goal of the script is achieved by the subsequent calls of the following three functions, always passing the result of one to the next:

  • result-file-loc returns the directory of the files to read, which is passed to read-html-files
  • read-html-files creates a file seq of the HTML files in the directory and passes the sequence to find-failed-tests
  • find-failed-tests extracts the failed test case names from each file and returns them in a collection

The wrapping doseq iterates over each test case file name and prints it out.

Here are the individual functions in more detail:

(defn result-file-loc
  "Returns the location of the result files (either provided as the first
   command line argument or a default value of 'test/system/results'"
  []
  (let [first-cmd-line-arg (second *command-line-args*)]

    (if (or
         (nil? first-cmd-line-arg)
         (= ":headless" first-cmd-line-arg))

      ;; Default dir
      "test/system/results"

      ;; user provided dir
      first-cmd-line-arg)))

Nothing too special here. When the script is executed from command line with lein-exec, the second element in the global *command-line-args* contains the first command line argument (the first of *command-line-args* is the name of the script file). I also noticed that when evaluating the script in a REPL, the second element contains the value “:headless” so result-file-loc returns the default directory in that case also.

(defn read-html-files
  "Given a directory, returns a sequence of java.io.File instances of each .html
   file in the directory"
  [dir]

  ;; Include only files that end with .html
  (filter #(and (.isFile %) (.endsWith (.getName %) ".html"))
          (file-seq (clojure.java.io/file dir))))

read-html-files is quite similar to the first version. I just added the additional condition to the filter predicate so that only files ending with “.html” are included. This of course doesn’t stop the script from trying to e.g. parse non-HTML files that just happen to have an .html extension and so on. For the purposes of this script this enough though.

(defn find-failed-tests
  "Given a collection of Selenium test result files, returns the names of the
  failed test cases in the result files"
  [files]

  (reduce (fn [all-failed-tests file]
            (let [failed-tests
                  (->> file
                       (slurp)
                       (java.io.StringReader.)
                       (html/html-resource)
                       (extract-failed-test-names))]

              (if-not (empty? failed-tests)
                (apply conj all-failed-tests failed-tests)
                all-failed-tests)))
          []
          files))

find-failed-tests does the main work. Given the sequence of HTML files, it goes over them using reduce and accumulates a collection of the failed test case names which is finally returned.

In the first version we just iterated over the files and spat out the test case names ‘on the fly’ as they were encountered. In this version I wanted gather the names into a collection which could them be used for further processing (in this case just to print them out).

reduce calls the anonymous function for each file: contents of each file is read in, converted into a character stream (java.io.StringReader), the stream is fed to Enlive’s html-resource which parses the HTML. Finally extract-failed-test-names (explained below) is called with the parsed HTML contents to get any failed test case names.

If a particular file had failed test cases, their names are added to the vector all-failed-tests which is initially empty. If the test suite of the file didn’t have any failures, all-failed-tests is just passed on as-is for the next round of reduce.

(defn extract-failed-test-names
  "Given an Enlive html-resource of a Selenium result file, returns names of the
   failed test cases"
  [selenium-html-resource]
  (map html/text
       (html/select selenium-html-resource [:tr.status_failed :td :a])))

The code to extract the failed test case names using Enlive selectors is simplified a bit. Rather than doing (flatten (map :content (html/select … ) you can use Enlive’s text function to achieve the same result: (map html/text (html/select ... )

So there you have it. Any ideas for improvement? I’m wondering would it make sense (at least for sake of exercise) to make find-failed-tests return a lazy sequence rather than eagerly parse each file? Would you use something other than Enlive for the HTML parsing?

Advertisements

One comment


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s