A ‘Phone Romance

July 27th, 2016


Thinking in Elm (and not cutting corners)

July 10th, 2016

So, I’m processing this list of js data which, for this evening, I know
contains exactly one thingummy. I’m I want to wire up some Elm and get things
to type check before I start to worry about handling lists of different sizes.
Calling upon my conventional (python) wisdom, I set about processing the
singleton in the list.

  case msg of
    UpdateThing msg' ->
    (thing, cmd) = Thingummy.update msg'  ... singleton element from model.things ...

How do I get that sucker? I’m thinking python: `model.things[0]` but, of
course, both `model.things !! 1`, and `List.head` return `Maybe a`. So now I’m

    ... (thing, cmd) ... = Maybe.map (Thingummy.update msg') (List.head model.things) ...

But now the lhs pattern is wrong, I’m using `map` and, ultimately, I need to
turn that updated singleton back into a list anyways.

  case msg of
    UpdateThing msg' ->
    (things, cmds) = List.unzip (List.map (Thingummy.update msg') model.things) 

Job done. (I should qualify that the job here is just the quick hack for
my singleton list. It’s applying the same change to all elements)

Moral: it isn’t any less complex to deal with a list of things than
with a thing taken from a list (Maybe). I can’t ask Elm to accept my
assumption that a list will contain exactly one item. Elm wants me to model
it. If there’s exactly one, just use a variable. If there’s zero or one,
use Maybe. Or just use a list, it’s just as simple!

The Fridge-brilliance of Maybe

May 30th, 2016

I’ve been tinkering with Elm recently. It’s a typed functional programming language that reminds me of the Haskell I wrote during my PhD. And, it’s great fun. I wrote a toy application, and recently upgraded it to 0.17. Along the way I had an insight about the Maybe type which I try and to explain in context.

My generic text input field needs to be able to respond to the Enter and Escape keys by triggering Latch and Reset messages respectively.

My onKeyDown function generates an on-keydown handler attribute that maps Int key codes to messages.

onKeyDown : (Int -> msg) -> Attribute msg
onKeyDown mapping =
    on "keydown" <| Json.map mapping keyCode

This excerpt from my view function shows how I use it; I define enter and escape as constant functions, Latch and Reset are constructors from my Msg type:

view model =
    div []
	[ input
	    [ type' "text"
	    , highlightStyle
	    , placeholder model.name
	    , value model.input
	    , autofocus True
	    , name model.name
	    , onInput UpdateInput
	    , onKeyDown
		&lt;| keyMsg
		[ ( enter, Latch )
		, ( escape, Reset )

The helper-function keyMsg takes a list of pairs (in lieu of literal dict syntax) from which it builds a dict of key codes and the corresponding messages

keyMsg : List ( Int, Msg ) -> Int -> Msg
keyMsg mapping keycode =
    Dict.fromList mapping
        |> Dict.get keycode
        |> Maybe.withDefault NoOp

Now we get to the interesting part. While writing this I wondered how to return the default value NoOp if the key code isn’t in the dictionary. In python, dict.get(key [, default]) lets you specify the default, so I started looking in the dict module. But it isn’t there. It’s in Maybe.

At which point I saw the Fridge Brilliance of Maybe. Putting the default behaviour inside dict.get means it has to be duplicated in any other data structures that require default behaviour. Making it part of Maybe demonstrates why Maybe so much more than just an abstraction of pointers that might be null. It is the right place to define default-value behaviour.


If I bring the names into scope, keyMsg function shortens nicely like this:

keyMsg mapping keycode =
    Dict.fromList mapping |> get keycode |> withDefault NoOp

Whoa! Even more brilliance with the ? operator!

keyMsg mapping keycode =
    Dict.fromList mapping |> get keycode ? NoOp

Get list of worksheets on the command line

April 14th, 2016

Today I need to get a list of worksheets from a very large spreadsheet (~100 worksheets, many thousands of cells on each) quickly from the command line. My ruby script tried to parse the entire workbook and exploded all my RAM. Then I discovered grep -0 which prints matching occurrences only (each on a separate line) which is very useful with regular expression patterns. The “sheet name” entries are in the “workbook.xml” zipped up in the Excel file.

$ unzip -c your-large-spreadsheet.xlsx xl/workbook.xml | grep -o sheet\ name=\"[^\"]*\" | cut -d = -f 2

Getting round the limitations of Excel VBA’s UsedRange function

April 5th, 2016

Turns out that the Excel VBA function UsedRange doesn’t always give the expected results. It gives you the range of cells that have ever been used. If you’ve formatted them, or they used to have values in that you’ve since deleted, you still get those cells. Today I wanted to iterate all cells in a lot of sheets. Some of those sheets had lots of cells in their UsedRange. I hit upon a solution I’ve not seen anywhere else: take the union of constants and formulas in a sheet. This  is quick, because you can use SpecialCells to do the heavy lifting. And you can refine the results to exclude errors using SpecialCells Value argument (the default 23 below selects everything). Here’s the code

Function myUsedRange(ws As Worksheet, Optional lookfor As Long = 23) As Range
    Dim c As Range, f As Range
    On Error Resume Next
    With ws.Range("A1")
        Set c = .SpecialCells(xlCellTypeConstants, lookfor)
        Set f = .SpecialCells(xlCellTypeFormulas, lookfor)
        If c Is Nothing Then Set c = .Cells
        If f Is Nothing Then Set f = .Cells
    End With
    Set myUsedRange = Union(c, f)
End Function

This returns a non-contiguous range containing only the cells you want. This version is a bit lazy and includes [A1] if there are no constants or no formulas matching your search. I didn’t mind because I was using lookfor=2 to search for text which reduced the number of cells by thousands. Hundreds of thousands in some cases. A better version would take a default as a parameter and return it only if neither  constants or formulas gave any results.

pip install rx

March 22nd, 2016

I just installed RxPy the Reactive extensions for Python, as you may guess from the title of this article. Well, that was easy. What next? Here’s a delightfully short program which does — ahem — something:

import sys
import rx


It echoes stdin to stdout. Kinda.

  • There are more newlines than I was expecting — an extra one after each line is echoed
  • The single line is actually an expression returning a Disposable that you can use to unsubscribe. I’m discarding it.
  • I have syntastic installed in Vim and pylint complains of a syntax error when I pass print to subscribe so perhaps I need to configure it to know about python3.

Excel excess: how to remove unwanted styles with VBA

March 11th, 2016

Long story short, I just opened a workbook with 37,000 styles :(. Apparently this is a common problem). Allen Wyatt of ExcelTips suggests deleting non-builtin styles. But I had over 6,000 of those. And some of my user-defined styles are important!

My solution is to remove any style whose name ends with a number, keeping one of each base-name. Cells with deleted styles get “Normal”.

The code uses regular expressions to trim trailing numbers from style names, and uses a dictionary for the set of base names.

Option Explicit

Sub DeduplicateStyles()
  Call DeleteDupStyles(Workbooks("NEW_multi_sector.xlsx"))
End Sub

Sub DeleteDupStyles(wb As Workbook)
  Dim sty As style
  Dim intRet As Integer
  Dim dict As New Scripting.Dictionary
  Dim count As Integer: count = 0

  For Each sty In wb.styles
    Dim n As String: n = trimTrailingNumbers(sty.Name)
    If dict.Exists(n) Then
      ' Debug.Print ("deleting: " & sty.Name)
      On Error Resume Next
      If Err.Number <> 0 Then
        Debug.Print ("ERROR deleting: " & sty.Name)
      End If
      Debug.Print ("keeping: " & sty.Name)
      dict.Add n, 1
    End If

    count = count + 1
    If count Mod 100 = 0 Then
    End If
  Next sty
End Sub

Function regexp(pattern As String) As regexp
  Dim rx As New regexp
  With rx
    .Global = True
    .MultiLine = False
    .IgnoreCase = False
    .pattern = pattern
  End With
  Set regexp = rx
End Function

Function trimTrailingNumbers(s As String) As String
  trimTrailingNumbers = regexp("( \d+)+$").Replace(s, "")
End Function

Keeping named files in bash

February 1st, 2016

It just took me way too long to solve this, so I’m going to write it up quickly:

The Problem

  • I have a file containing the basenames of spreadsheets that I want to keep in a directory structure. I want to delete all the other spreadsheets.
  • There are levels of nested directories and the files are distributed among them
  • The file names, and the directory names contain spaces (don’t look at me like that, it’s other people’s work on Windoze — at least I’m using bash
  • I’m using bash on Linux, so I have proper find, etc.

A Solution


find . -name \*.xls\?  | while read  f
    if ! grep -q "${f##*/}" $modelfiles
        rm "$f"

I’m sure there are more efficient ways. Contact me if you have ideas. Meanwhile I’ve got to get back to work so I’m sticking with “it works”.

UPDATE: that didn’t work when I tried to re-run it just now. So switched from grep -v to if ! …

Waiting on the platform

October 10th, 2015

A recent conversation with Lauren Serota reminded me of a challenge we often face building small-to-medium sized digital solutions for our clients: the right platform.

We were talking about building custom phone apps in the context of Myanmar’s smartphone revolution, and the challenge of finding the right partners to implement innovative mobile service delivery systems. She told a story of one such development which looked great but cost too much to make small changes to. That entailed going through another development cycle with the software company; for changes that felt like they should be under the service team’s control.
“They wrote a custom native app.”, she said, “instead of building a platform.”
This started me revisiting similar stories from Aptivate, where we were the software company.

The IT Specialists

I’m thinking about AuthorAID, which helps support researchers from developing countries preparing academic articles for publication in peer-reviewed journals. It is a fairly complex web application which functions not entirely unlike a dating site (matching mentors with mentees) with a way to track collaboration through research publishing tasks. We built it with Django, a powerful framework rapid web development.

Django was a great choice, it enabled us to build AuthorAID at a cost within INASP‘s budget; an impressive achievement because AuthorAID is essentially a fairly complex, bespoke web application serving a relatively small audience. That’s why it’s our web platform of choice. But neither AuthorAID, nor Django are the platform I think Lauren was talking about.

As we maintain and develop AuthorAID for INASP, some of the changes we make, like adding faceted search, feel like they belong squarely with the “IT specialists”. Others, like adding fields to the form that users fill when they register, feel like they belong with the team running the site.

A lot of Django’s power for rapidly building web applications comes from its focus on meta-data. To add those new fields to the sign-up form, we edit the Model file which is written in Python, the programing language Django is written in. So, you have to be a programmer to change them. But once that’s done, changes to web-forms can be generated automatically from the models. The new field values will need to be stored in Django’s relational database, but the necessary changes to the database schema can also be generated from the models, and theres a whole system to manage applying those database changes. Django is a great tool for the “IT specialists”. Which makes sense: it was created by web developers at a Newspaper.

This works out well for INASP who retain a relationship with Aptivate that enables us to act as their “IT specialists”, for the website stuff. But some of the web applications we build aren’t commissioned by organisations like INASP, but as part of change programmes with a limited lifetime and budget. They can’t afford to keep us on as IT specialists, in case their web application needs to be tweaked or extended. If they want to change the data they gather or report on, they need to be able to make those changes themselves. They don’t want a bespoke application built with a rapid-development framework; I think they need the “Platform”.

The General Solution

I’m thinking now of another project that, in the end, we didn’t do. We were written into the budget proposal for a three year programme with institutional funding, to provide a database for monitoring and evaluation (M&E) data. Budget was small and the requirements weren’t very clear until the programme started and a team assembled. We did our discovery process soon found out that our clients had high hopes for their M&E database, but they could predict neither the structure of the data they would be storing, nor the work-flows they would want it to support.

Trying to manage the risk of Building The Wrong Thing, we pushed for specifics specific examples.
“It doesn’t have to be an all singing, all dancing system.”, they told us in one meeting.
“Okay”, I replied, “but you have to tell us exactly which songs, and which dances you will want it to do.
Eventually they told us not to keep trying to understand their specific requirements, and “just deliver the generic version”.

It’s much harder (read more expensive) to build a generic tool than a specific one. It’s very hard to get the abstractions right, especially without passing through specific examples on the way. But our client had had something specific in mind: a low-cost, generic solution. I think I understand where they were coming from: Excel. The generic solution, for fifty bucks, built on billions of dollars of R&D and decades of getting the abstractions right.

Most people have never bought bespoke software, and don’t know how to. If we were one of those people, the closest thing we can imagine to bespoke software might be a customised spreadsheet; with some protected cells, nice colouring, some predefined formulae and some VBA macros behind the scenes. I’m actually working on one of these at the moment, for another client. If I don’t do all the customisation work, what will be left? The full awesome power of Excel. Users would have to work harder to use it but, with the right skills, they could do so. Excel is a cheap product and all that customization effort is expensive IT specialist consultancy. I understand it must feel frustrating When the IT specialists start asking ‘which songs and which dances?’. Just give us The Thing. The thing we can live in and work it out for ourselves. The web-based Excel* for data-driven applications.

* I know this sounds like Google sheets, but they weren’t actually looking for an on-line spreadsheet.

Compared to evolving a customised spreadsheet, building a bespoke web application in, say, Django is a completely different proposition. It takes significant work to get to an MVP and when you get there the only data and workflow it supports are those the product designers have chosen to include. There are no blank workbooks to fall back on.

I’m wondering if there is something awesome we should be using as a platform for web applications. And if there isn’t shouldn’t we be designing it?

Deadlock: BDD testing a python Tornado app with py.test and Splinter

September 23rd, 2015

I’m only a day into my new asynchronous web development project in Tornado, and I already have a deadlock. And this is remarkable, I think, because all I have done so far is a copy-and-paste the Hello World example app from the Tornado website. An example which “does not use any of Tornado’s asynchronous features”. So, how did I manage to get deadlock?

Well, on top of Hello World, I have a test. A functional test, written with pytest-bdd that uses pytest-splinter makes sure here is some sort of web server up and running. Now pytest-splinter provides a browser object as a py.test fixture. Tests can use this to access the site, so
long as a server is running. Since I’m shopping for py.test plugins, I added pytest-tornado whose http_server fixture starts a tornado HTTP server, serving the application I provide by defining an app fixture.

All set to go? I thought I was. But I got the big crunch. My splinter browser starts but no page loads. What’s up? Turns out the tests aren’t asynchronous. The splinter driver is blocking and the asynchronous web server never gets a look-in to serve the content. Pytest-tornado wants me
to write tests using Tornado’s asynchronous http client, so it and the server can play nicely together.

That’s all fair, but I want to use Splinter on this project so I’m going to want to start my server in a thread. I created an apprunner module that can start my Tornado app in a thread (and return the url to access it). And also provides a function to instruct the server to stop, and exit the

import threading
import tornado
from .application import Application

ioloop = tornado.ioloop.IOLoop.instance()

def start(TESTING_PORT=8192):
    """ Start Application runing on given port. Retrn the listening url"""
    global thread
    app = Application()
    thread = threading.Thread(target=ioloop.start)
    return "http://localhost:{}".format(TESTING_PORT)

def stop():
    """ Ask IOLoop to stop running, and wait for its thread to complete"""
    global thread

Then I define my own fixture to start and stop the server using pytest

def app():
    yield start()

And now I can write lovely clean BDD steps using splinter browser and my
own app running in a thread, like this:

@when("I visit the site")
def visit_site(browser, app):