This article first appeared in PragPub back in 2016.
This piece – the one you hold in your hand – was almost the twelfth article in my series on modern functional programming, which appeared in PragPub from August 2012. See https://free-variable.org/2017/08/02/my-older-articles-in-pragpub/ for the current whereabouts of that material. Some articles also appeared in this book.
Despite the title, it’s not about social issues, at least not directly! It is about programming, and about some recent ideas about abbreviations and articulacy in programming with a dash of “clever” (semantic) editing tools.
From the context of OO languages, you all know about “subtyping”, or know some of the various ideas that come under that umbrella. What we will explore here is an interesting and powerful generalisation of such ideas. We’ll see how far we can push the envelope and start to think about what this new tech buys us. As we move into more advanced type systems, some very interesting ideas appear.
The theory here is based on recent research work that a while back I had the good fortune to be a part of. The core ideas derive from work by my former boss, mentor and good friend, Professor Zhaohui Luo. These ideas have been developed and studied by Luo’s research group, my contributions to which included development of prototypes, exploring various classes of examples and research on practical applications – including programming. A few more details will be given at the end.
So, let’s start with what you think you know about subtyping.
“Traditional” subtyping
I’m labelling this “traditional” because there is a certain amount of folklore around what many people understand as subtyping in OO languages. (If you want a more detailed discussion, take a look at Kim Bruce’s “Foundations of Object Oriented Languages”, https://mitpress.mit.edu/books/foundations-object-oriented-languages: as a prelude to theoretical explorations, Bruce gives an entertaining, interesting and practical survey of the ideas in current languages. I really enjoyed this book. Does it show?)
Roughly, you know that if Y is a subtype (or subclass) of X, e.g. Dog is a subclass of Animal, then some code which expects an instance of X will be fine working on an instance of Y. For example, some code which works on any Animal instance, e.g. pretty-printing the number of legs, will be fine working on a Dog instance. Another way to look at it is this: you can use a Dog value whenever an Animal value is expected – the Dog value will fit in quite nicely.
We can summarise this succinctly in rule form:
Y < X SomeType some_method(X m) foo.is_a?(Y)
---------------------------------------------------
some_method(foo)
Which means, if we know or have the things on the top, then we get the thing on the bottom. So if we know Y is a subtype of X, and some_method takes an X value, and we have some instance foo of Y, then calling some_method on foo is fine.
Fine, that is, in the sense of it should work if the other OO-ish bits are in place. What we mean here is that type-wise, some_method(foo) is allowed in the rules of our language, and it is safe to execute the method call. How it runs, and whether it crashes or not – which involve a mix of inheritance and late binding – is quite another story. We’re not going to talk about those OO run-time details here!
Now, why is this rule or construct acceptable? What justifies its use in various OO languages? Basically, it works because a subtype instance generally contains “more stuff” and we can temporarily forget it is there. As far as the calling method is concerned, the input looks like an X and responds to messages as an X should.
To recap, the “traditional” style of subtyping basically works by forgetting: forgetting that an instance of a subtype may have extra fields or may respond to a wider range of messages. This allows us a certain amount of flexibility in using simpler code with more complex instances, for example we don’t have to add explicit conversions, and we can use these implicit conversions safely. This is probably what you understand from your own experience, and you use it all the time in programming.
Before we move on, it is worth asking whether this form of subtyping is what we actually want. Kim Bruce’s Foundations book goes further, even asking the question of whether OO languages really need subtyping! And Bruce’s answer is effectively negative, for several good reasons. (The short version is that when you make an OO language stronger in one direction, you often find you need to lean less on the others. I also like to visualize it as a language having features X, Y and Z, where feature X has weaknesses. When you fix X, you may then discover that Y and Z had been compensating for X’s deficiencies and that they aren’t needed so much now…)
Generalising the idea
Traditional subtyping is based on forgetting any extra structure or functionality being passed in. A compiler or interpreter typically implements this amnesia by ignoring the extra stuff (and this step has zero run-time cost). But this operation isn’t particularly magical – we can write a method or function ourselves which does the forgetting, e.g. by reconstructing the “smaller” value explicitly.
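To see that there is no magic here, this is roughly what a hand-written forgetting function looks like in Haskell, with invented record types for the running Animal/Dog example:

data Animal = Animal { name :: String, legs :: Int }
data Dog    = Dog { dogName :: String, dogLegs :: Int, breed :: String }

-- forget the Dog-specific structure, keeping only what an Animal needs
forgetDog :: Dog -> Animal
forgetDog d = Animal (dogName d) (dogLegs d)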
Now, what if the “bridge” between supertype and subtype wasn’t this zero cost forgetting step but actually could be some other piece of code? Would it work? What would it look like?
We’re talking about this kind of rule now (switching to a Haskell-y notation too):
jump : Y -> X some_method : X -> Z foo : Y
-------------------------------------------------
some_method(foo) = some_method(jump(foo))
In the top line, we have a value foo of type Y, and a function some_method expecting a value of type X, plus another function jump which can convert Y’s to X’s. Then on the bottom line, it’s ok to apply some_method to foo, and what it means (or “computes to”) is that jump is first applied to foo and the result is passed to some_method.
Let’s try a slightly less abstract though obviously contrived example. Suppose you have a function which works on oranges, maybe something like make_into_juice : Orange -> Juice . But, you only have an apple, perhaps called egremont_russet : Apple. Clearly, apples and oranges do not mix, and it does not seem possible to call make_into_juice egremont_russet . But, if we had the rule above in our language, and knew of a jump function which could convert an Apple to an Orange, then we’re in business.
Where does that jump function come from? Basically, the programmer supplies it when declaring that Y is a subtype of X, justifying that fact by providing a function which does that precise conversion. That is, to declare or specify that some Y is a subtype of X, you just need to provide a function which bridges the gap.
So, what kind of function could convert apples to oranges? I have no idea – it is supposed to be a contrived example. Though if you could find something sensible that works for you, then go ahead and use it!
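For the record, here is the contrived example in plain Haskell, with the jump inserted by hand – all the types and the conversion itself are inventions for the sake of the example:

data Apple  = Apple
data Orange = Orange
data Juice  = Juice

makeIntoJuice :: Orange -> Juice
makeIntoJuice _ = Juice

-- our jump function, however implausible its implementation
appleToOrange :: Apple -> Orange
appleToOrange _ = Orange

egremontRusset :: Apple
egremontRusset = Apple

-- with the rule in the language, we could just write: makeIntoJuice egremontRusset
juice :: Juice
juice = makeIntoJuice (appleToOrange egremontRusset)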
A very real example
Something like this conversion mechanism is already present in the handling of numbers in languages like C, e.g. in 100.25 * 43 . Because the first argument is a floating point number, the multiplication should (usually) be translated to floating point multiplication – but here the second argument is an integer literal. Moreover, modern C will interpret 100.25 as a double precision literal so the multiplication should be double precision rather than anything else. The language rules require the integer to be converted to a double precision value to match the first argument, so it’s basically going to be executing multiply(100.25, int_to_double(43)).
If you want to check this for yourself, you could try looking at the output of gcc -S with a short C program, but be aware that the compiler will probably perform this constant-value computation at compile time and just use the resulting value in the code. (You can try fiddling with compiler options to disable such optimizations, or try making the integer value conditional on the contents of argv if you want to avoid this.)
So in C, there’s a set of rules which prescribe how to treat certain expressions involving mixed types, primarily the various numeric (scalar) ones. You can check the C reference manual for the details, or look at reliable online material like “C in a Nutshell” (https://www.safaribooksonline.com/library/view/c-in-a/0596006977/ch04.html).
But hang on, what does a literal like 43 actually mean? What would we like it to mean? Think about that for a moment.
Haskell takes a different tack by using its type classes and polymorphism. We don’t need to go into the deep details here, but it’s effectively postponing the decision about specific numeric types until it’s clear which specific type we do need. What does this mean? Well, the literal 43 is just that – a literal, and the only thing we do know about it is that it has no fractional part. It could be literally any numeric type – fixed size int, arbitrarily big integer, double precision, rational number (a pair of integers, numerator and denominator), complex number, quaternion – any type which “looks” numeric and supports the key numeric operations. Thus Haskell gives the type signature as Num a => a . Behind the scenes, it magically wraps the literal in the overloaded function fromInteger :: Num a => Integer -> a to do the actual conversion when the time comes, which is when we know what other types are involved in the computation. (Its general-purpose cousin fromIntegral :: (Integral a, Num b) => a -> b is the one we call when converting explicitly, as in the next examples.)
For example, in length [] + (fromIntegral 43) , length returns a conventional Int, and addition (also overloaded) requires two values of the same numeric type, so we now choose the version of fromIntegral which returns an Int – and this version does essentially no work (it just fixes the representation). Alternatively, in (100.2 :: Float) + (fromIntegral 43) we have fixed the first argument to be a single precision Float, which leads to us requiring the version of fromIntegral that converts the underlying integer to a Float.
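Written out as compilable code, those two examples look like this (my arrangement; the comments note the chosen result types):

asInt :: Int
asInt = length [] + fromIntegral 43              -- 43, fromIntegral lands at Int

asFloat :: Float
asFloat = (100.2 :: Float) + fromIntegral 43     -- ~143.2, fromIntegral lands at Float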
Haskell’s number system has other good features. For example, the expression (10 / 3) :: Int is an error because the result of (fractional) division isn’t compatible with Ints. You need to either use integer division operators like div or add an explicit conversion, like truncate (10 / 3) . The bottom line is, you get some flexibility but are also prevented from various common errors. And if you don’t like the standard numeric definitions, you can prevent the import of the relevant modules and then define your own system of numbers.
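A quick check of those claims (the commented-out definition is the one the type checker rejects):

-- bad :: Int
-- bad = (10 / 3)             -- rejected: Int has no fractional division

goodDiv :: Int
goodDiv = 10 `div` 3          -- integer division: 3

goodTrunc :: Int
goodTrunc = truncate (10 / 3) -- fractional division, then truncate: 3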
Too complicated though? You have a point there, but numbers are complicated! And a source of subtle errors when you aren’t careful enough. And that’s why, after a few years of teaching Haskell, very few of my lectures and lab exercises in the first half of the year ever mentioned numbers! Explaining the intricacies of numbers got in the way of introducing the really new & interesting language features.
If you thought numbers were bad
Recall the main idea: we have some context which is expecting a value of some type X (like a function taking an argument of that type), and we only have a value of type Y – but we know a way to convert from Y to X. Then it’s ok to use that value in that context – the jump function bridges the gap and allows our value to be used appropriately.
Let’s talk units. If numbers are a rich source of programming errors, then adding units and different scales into the mix is asking for trouble. Conversions between metric and imperial units, like kilograms vs pounds, appear more than once on lists of notable system failures – the loss of the Mars Climate Orbiter being perhaps the best-known example. Can we reduce such problems?
Clear labelling is important: never have a number by itself, without some clear indication of what it means. For example, we could pair a number with some enumeration which indicates the flavour of unit. Something like this maybe, assuming we want to use double-precision numbers throughout.
data DistanceUnit = Metres | Feet | Cubits | ...

type Distance = (DistanceUnit, Double)
The enumeration type gives a set of constants to act as labels, and we tie a label to a value. Then we can have code like this:
-- basic conversions, cases include:
(Metres, v) `convertTo` Metres = (Metres, v)
(Metres, v) `convertTo` Feet   = (Feet, v * 3.28)
...

add (ua, va) (ub, vb)
    -- add identical units directly
  | ua == ub  = (ua, va + vb)
    -- else convert second value to first unit, then add the numbers
  | otherwise = (ua, va + snd ((ub, vb) `convertTo` ua))
It probably helps, but the connection between unit and number feels a bit loose – it feels like we could do better. For example, we know that adding two metre values produces a value in metres, but that detail is buried in the code – we have to do a paranoia check each time before we use the value. Compare with C: it would be weird if you added two ints and could not be sure you got an int back.
What other options are there? Let’s change the concrete numeric type into an arbitrary type to make it polymorphic, thus making it easier to switch to rational numbers or other numeric types. We can also try a tighter link between unit and value by switching to a form of union type. How about this, with resulting code?
data Distance v = Metres v | Feet v | Cubits v | ...

convertToMetres (Metres x) = Metres x
convertToMetres (Feet x)   = Metres (x / 3.28)

add (Metres x) (Metres y) = Metres (x + y)
add (Metres x) other      = add (Metres x) (convertToMetres other)
add (Feet x)   (Feet y)   = Feet (x + y)
add (Feet x)   other      = add (Feet x) (convertToFeet other)
It might look a bit prettier but it’s probably not much of an improvement. We are faced with a choice between needing N^2 patterns or a collection of N functions like convertToMetres, convertToCubits etc. How about using the type system a bit more? Metres and feet are kind of different things, so it’s not too strange to have them as separate types.
data Metres v = Metres v
data Feet v   = Feet v

metresToFeet (Metres v) = Feet (v * 3.28)
feetToMetres (Feet v)   = Metres (v / 3.28)
So far, so good. Adding distance values gets more interesting too. We now want different functions for adding metres, adding feet, etc – so it’s a good case for using overloading. In Haskell, that would look something like this.
class Distance d where
  add :: Num v => d v -> d v -> d v

instance Distance Metres where
  add (Metres x) (Metres y) = Metres (x + y)

instance Distance Feet where
  add (Feet x) (Feet y) = Feet (x + y)
Don’t get sidetracked by the technical fine detail. The key points are that we can use types like this, and it’s a nice fit with overloading too. We get type-level confirmation that adding metre values gives us metre values. It’s also harder to accidentally mix up incompatible units.
However (and this is the important bit now), we’ve got more detail floating around, and where we gain in conceptual clarity and precision, it looks like we’re losing by making our programs longer and more verbose. Maybe we can do something about that now.
Enter Coercive Subtyping
It’s time to get back to the main point! We’re going to use our implicit conversion mechanism to help reduce how much code we need to write. The idea is that we declare the conversions between distance types as jump functions, and then use the mechanism to insert the conversions where they are needed.
I’m going to switch to the terminology used in the research work now. We use the term “coercion” for the jump functions, and refer to the wider mechanism as Coercive Subtyping, in the sense that it is a generalization of subtyping which works via insertion of coercions. (It is probably more than just subtyping, but we’ll discuss that later.)
For example, add (Metres 10) (Feet 20) would normally be a type error because the types of the operands do not match. However, after declaring the feet-to-metres conversion as a coercion, the above will compute to add (Metres 10) (feetToMetres (Feet 20)). Because we have useful type information around, we can determine which conversion we need and then apply it.
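Standard Haskell doesn’t provide this mechanism, but to make the idea concrete, here is a rough simulation using type classes – a connection we return to below. The Coerce class and addM function are my own inventions for illustration, not the research implementation:

{-# LANGUAGE MultiParamTypeClasses, FlexibleInstances #-}

-- a hand-rolled coercion class: an instance says how to bridge from a to b
class Coerce a b where
  jump :: a -> b

instance Coerce (Metres v) (Metres v) where
  jump = id

instance Fractional v => Coerce (Feet v) (Metres v) where
  jump (Feet x) = Metres (x / 3.28)

-- addition which first coerces its second argument into metres
addM :: (Num v, Coerce a (Metres v)) => Metres v -> a -> Metres v
addM (Metres x) y = case jump y of Metres z -> Metres (x + z)

-- addM (Metres 10) (Feet 20) now computes to Metres 16.09756...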
We might even get away with a coercion from bare numbers to metres, hence add (Metres 10) 20 becomes valid and computes to (Metres 30). And if we know the final result should be in metres, then add 10 20 will also be fine and duly compute to (Metres 30).
To underline the key point: we can declare a number of conversion functions as coercions, and use the coercive subtyping mechanism to fill in the gaps. This mechanism has already shown its worth by helping to develop some very complex mechanical proofs (e.g. a proof of the four color theorem). We’ve also used it in some formal linguistics work. The potential payoff for programmers is to support use of more complex types but reduce the associated boilerplate, and it also gives an alternative view for overloading. It might help us to eat some of the cake we acquire.
Let’s continue our exploration.
Turning it up to 11
Suppose we have a function that expects a list of measurements in metres, but a contractor has provided a list in cubits. What do we do? You’re probably thinking that you need to map the conversion function (cubits to metres) over the list before handing it to the function. This is a pain. But can we do better? Of course.
The underlying idea is that if we have a coercion from A to B, then calling the usual map function with that coercion gives us a coercion from a list of A values to a list of B values.
What about needing a list of lists of metre values, but getting a list of lists of cubits? No problem: use the same idea on itself – our new coercion from List(A) to List(B) can itself be lifted to a coercion from List(List(A)) to List(List(B)). And so on.
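In plain Haskell we would have to write these lifted conversions ourselves; the point of the mechanism is that it calculates them for us. A sketch, reusing the Metres type from above and inventing a Cubits type (taking a cubit to be 18 inches):

data Cubits v = Cubits v

cubitsToMetres :: Fractional v => Cubits v -> Metres v
cubitsToMetres (Cubits x) = Metres (x * 0.4572)

-- the lifted coercions the mechanism would infer for us:
listCoercion :: Fractional v => [Cubits v] -> [Metres v]
listCoercion = map cubitsToMetres

listListCoercion :: Fractional v => [[Cubits v]] -> [[Metres v]]
listListCoercion = map listCoercion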
To make this work, the compiler just needs some machinery to infer or calculate suitable coercions. Because we know our start point and end point when looking for a coercion, this inference is usually straightforward. If you know about Haskell’s type system, you might recognise some similarities here with instance inference. It is basically the same kind of problem: given some inference rules and a target, find a solution. (I guess it is an open question how the type class system compares with coercive subtyping. You can simulate aspects of one via the other, and the jury is still out on other details.)
What about dependent types?
The list example is a relatively simple case. What happens when we add in dependent types? A popular example in recent papers is the conversion between conventional lists and the sized vector type from dependently-typed programming. The USP of this vector type is that it carries its size in its type, and we can use this information to be more precise when manipulating vectors.
Instead of a type like List(A), we have types like Vector(A, 0) – for a vector of zero size – or Vector(A, 10) for vectors of size precisely 10. There are only two ways to build a vector: the nil case vnil : Vector(A, 0), which is the only way to build an empty vector, and the step case of adding an element to a vector of size n to get a vector of size n + 1, viz vcons : (n : Nat) -> A -> Vector(A, n) -> Vector(A, n + 1) . We can then be very precise about sizes when working on vectors; for example, appending one vector to another must give us a new vector of the appropriate size, i.e. v_append : Vector(A, n) -> Vector(A, m) -> Vector(A, n + m).
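Haskell isn’t fully dependently typed, but GHC can approximate this picture with a few extensions. Here is a fairly standard sketch of sized vectors – an approximation for flavour, not the notation of the research work:

{-# LANGUAGE DataKinds, GADTs, KindSignatures, TypeFamilies #-}

data Nat = Z | S Nat

data Vector (n :: Nat) a where
  VNil  :: Vector 'Z a
  VCons :: a -> Vector n a -> Vector ('S n) a

-- type-level addition, so vappend can state its size guarantee
type family Add (n :: Nat) (m :: Nat) :: Nat where
  Add 'Z     m = m
  Add ('S n) m = 'S (Add n m)

vappend :: Vector n a -> Vector m a -> Vector (Add n m) a
vappend VNil         ys = ys
vappend (VCons x xs) ys = VCons x (vappend xs ys)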
However, more detail can mean more fluff to write, say compared to lists. But if we can have some magic conversions from one to another, some of this goes away. We just need to declare the relevant conversions as coercions and let the coercion mechanism do the rest. It’s now feasible to write v_append [1,2,3] [14,16,18,20] and get a vector of size 7 back.
I’d like to highlight one idea here and leave you to chew on it. Dependent types do have many advantages, though they can sometimes lead to verbosity and key details being lost under the infrastructure. The coercions shown here do help to cut down some of the verbosity by providing bridges from “simple” data to more complex data.
With great power comes responsibility
It’s not all win, of course. There are various technical issues to consider. Also remember that this is something of an experimental feature, from current research where we are still exploring the issues.
There are various issues around how powerful the inference mechanism should be. To cut a long story short, with many possible coercions, it’s possible to find multiple ways to convert from some type X to some type Y – and for the sake of understanding what the coercions are doing, we need these different ways to be indistinguishable. This is the “coherence problem”.
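To make that concrete with the earlier unit types: suppose both of the routes below were declared as coercions (both functions are my own illustration). Thanks to the rounded 3.28 factor, the two routes disagree slightly – exactly the kind of mismatch the coherence requirement rules out:

-- route 1: direct, taking a cubit as 18 inches (1.5 feet)
feetToCubits :: Fractional v => Feet v -> Cubits v
feetToCubits (Feet x) = Cubits (x / 1.5)

-- route 2: via metres, composing the two earlier conversion factors
feetToCubitsViaMetres :: Fractional v => Feet v -> Cubits v
feetToCubitsViaMetres (Feet x) = Cubits ((x / 3.28) / 0.4572)

-- feetToCubits (Feet 100)          ==> Cubits 66.666...
-- feetToCubitsViaMetres (Feet 100) ==> Cubits 66.683...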
Another set of issues appear when coercions from X to Y combine with coercions from Y to X, e.g. is going from X to Y to X the same as staying at X? And vice versa? Understanding and trusting the results is another big issue. If you have lots of coercions and interactions between them, are you sure that the implicit conversions being inserted are relevant and correct?
There may be implementation overheads too. If done naively, you might end up repeating the same conversions. Some work has been done on avoiding a few of these overheads (including a contribution from me on treating coercion as a transformation of how computation acts on certain types).
Finally, how does this mechanism compare to other mechanisms in programming language research? I’ve mentioned links to Haskell-style type classes, but there are other mechanisms, like implicit inference of function arguments and proof search. What can this mechanism do that the others can’t, or vice versa, and how useful is it in a wider context?
We have answers to some aspects of such questions, studying families of cases or trying restrictions on the power of the mechanism, but there’s much more to explore!
Some discussion
The point of this article was to introduce you to a different view of subtyping and how it allows certain abbreviations in programs. It’s not finished work, but it does have some interesting features and potential benefits.
It’s about time I explained the title: “Fitting in”! I meant this in the sense of your program having some context to fill, such as wanting to call a function when you don’t immediately have the input in the right form. Rather than crafting something that fits exactly, you can instead try supplying what you think should work and let the implicit conversions work out the details. That is, you provide what you think rather than what the system requires.
I’ve talked before about type-directed development, where you set up the types of important operations and the system uses that richer information to guide you to a program that meets that type. During the editing process you will have various “holes” to fill, and the coercion mechanism can help to fill those holes intelligibly. In particular, I believe coercions can help bridge the gap between simple data (that you program with) and dependently typed programs (that use that data).
The abbreviation angle is important too. You use forms of abbreviation all the time when communicating with other people, leaving out details which the listener or reader can reasonably infer from the context. Languages like Haskell are able to skip some verbiage through being able to infer certain details from a program. And as with people, if the compiler couldn’t work out what was meant, it can ask for more information. How much more can we use abbreviations in programming, reducing the obvious(-ish) information and thus focusing more on the key content?
Coercions might also be an alternative to some kinds of overloading. Instead of trying to write some code in a very generic way, you could maybe instead write a more concrete version and rely on coercions to convert other inputs appropriately. For example, having a rich set of functions on vectors but relatively few for lists, so when you want to do something to a list you do it via the code for vectors – there’s no need for a separate version for lists.
Finally, there’s the point about Using More Types. I’m an advocate of using the type system to encode rich information about a problem domain, and to help avoid certain programming errors. Languages like Haskell promote such wider uses of types – for example, there’s little ceremony to defining a new type compared to some languages I could mention. (If you grew up in the UK, you might enjoy a trip down memory lane and can then consider which programming language(s) match the description. I give you “Worzelese”.) Ideas like coercions make it easier to work with richer types.