I was recently at a FP-ish meetup group in Newcastle, and one topic of discussion is how to TDD the Roman Numerals kata.
Note: for regular readers, TDD is generally going to mean “test driven development” in this article. I suppose it could also stretch to include “test driven design” too. When instead I mean “type-directed development” I’ll mention it in full.
Of course, I remain sceptical about approaching this kata with TDD. My concern is that it’s hard to include much insight about the problem domain in the code, and from there, hard to produce clear (or even “beautiful”) code. See my earlier piece Uncle Bob and Functional Programming for some discussion on such points.
So, what alternatives are there? Maybe several, but here I’m going to use a flavour of type directed development. The core principles are to consider the data structures we want to use, particularly which ones will help to simplify the code. Code is the stuff between the types, and if we do a good job on the types, the code will become much easier.
Externally (and as a first approximation), the function we want has type “Int -> String”, but this tells us very little. It also encourages “Stringly programming” – doing too much with strings which we’ll then need to unpick when trying to understand / review / test the code. Put another way: strings lose information, and it’s this kind of info which we will want to rely on later.
We can do better. It’s almost certain that we’re going to be doing something non-trivial with values that have a reliable structure, and such structures can be rendered to string form when needed.
By the way, what are the roman numeral representation of zero or negative integers? I don’t know. The type “Int -> String” suggests it is going to handle any integer. Let’s sidestep that for now by saying the input is going to be “a suitable number” and revisit later. To recap, we have these ideas/targets
- some suitable intermediate type T
- something that will convert T values to a string, ie of type T -> String
- something that converts “a suitable number” to T, ie of type “a suitable number” -> T
- and we can compose these functions together if we want etc
What is T? Basically, we need it to encode some details of what roman numerals are, and how they work – and the more info we can pack in, the better.
Is it a string? hell, no. We’re not stringly programmers.
Only certain symbols are allowed – so how about a list of these? better, but is [V,M, L, L] allowed? it’s a valid list but not a valid roman numeral – so this is still not good enough.
What are the rules anyway? Let’s assume the core rules:
- M can appear up to four times
- then D at most once
- then C at most four times
- then L at most once
- then X at most four times
- then V at most once
- then I at most four times.
- there are certain “subtraction” rules to avoid four symbols in a row, but let’s keep it simple and assume we can do this simplification when rendering to a string
- apparently, the Romans weren’t that strict on rules – carvings exist with numerals that don’t follow the usual pattern; but let’s keep to the BBC version for now.
So how about a tuple of 7 integers? eg (0,0,2,1,3,1,3) to represent CCLXXXVIII (= 288). Better, but it’s still too wide – it has no protection against invalid values like (0,-3,10,2,1,1,-2). Nope, not good enough. Large tuples are also a smell, too easy to get numbers in the wrong place. At the very least, we’d want something record-like with clearly named fields.
One way to represent numeric constraints is with enumerations of the required size. Booleans have two values True and False, so we can use this or invent our own. We need something for 0 – 4 anyway.
data UptoOne = Zero_of_one | One_of_one data UpToFour = Zero_of_four | One_of_four | Two_of_four | Three_of_four | Four_of_four
So we can now have a record containing seven such values – much better, and it is starting to constrain the problem space. We can no longer represent invalid values. As an exercise, you might like to try writing a show function for such values, then maybe a function to add two such numbers. Then have a think about how to convert “a suitable value” into this type.
What about tests? depends what you aren’t confident about. I’ve not written any code yet (in the sense of stuff that transforms values into other values), but already I’m confident that the underlying type is a good basis for developing this code – it’s reducing the risk for working with bad values, and hence both simplifying the code and reducing the scope of what needs to be tested.
Is there anything test-like we can use to check our type definition? I’ll leave that for later discussion. Ditto on which tests you want to have when developing the code. What are the risks? In particular, with the kind of code you’ll be writing, what kind of mistakes are you afraid of making? Accidentally confusing field names is one, eg accidentally using the count for M when you want the count for X.
Note that we’re still in Haskell. No funny trickery, no advanced stuff. You could even write this in scala, I guess. (Send me the code if you do – I’m curious.)
Can we do even better? Forget for a moment the limitations on the various tools you are using, and the style and level of programming that they are forcing you to work at. Instead, what is the program you would like to write?
Also, how certain do you want to be about the code being bulletproof? Or about removing some of the risks of making errors?
I’ll follow up soon with a discussion of how & where the powerful features of dependently-typed languages can start to address these questions.
As a sneak peek, it’s going to involve type definitions which simultaneously allow only valid numerals to be constructed, explain how those numerals map to arabic numbers, and also make it easier to prove key properties of the code. We can probably get the type checker to write some of our code too. All this without writing a single test.