GHC Development: OutsideIn

2016-05-10

Here are some notes I took while attempting to contribute to the GHC Haskell compiler. I came to the project as an outsider, but with considerable experience with the language.

1 Positives

make

I expected the build system to be the biggest pain point in the process, but was quite impressed by how well it works. While getting up and running can be a challenge, it can also go very smoothly if you follow a set of instructions that work for you. The big win comes after the first long compile (if you've never built GHC before, expect it to take around 20 minutes): you make a change, and the subsequent rebuild is really rather quick, provided you run make in the right place.

The Code

GHC is an old project. Think how incomprehensible your code from last year can sometimes be, and imagine what it would be like to live with all your code from twenty years ago. Few of us have had the opportunity to work on a particular code base for such a long period of time, and it is a striking testament to the developers' discipline and engineering sensibilities that not only has the whole thing not collapsed under its own weight, but that it still admits significant change.

2 Negatives

The Code

Let's get this out of the way: testament to solid engineering notwithstanding, the code is, in places, quite unusual. There are clear signs of changes in convention regarding how code should be spaced, how punctuation should be used, how identifiers should be chosen, and whether the use of comments is actually forbidden or just uncouth.

Here is an example of one coding style found in GHC,

tcLookupGlobal :: Name -> TcM TyThing
-- The Name is almost always an ExternalName, but not always
-- In GHCi, we may make command-line bindings (ghci> let x = True)
-- that bind a GlobalId, but with an InternalName
tcLookupGlobal name
  = do  {    -- Try local envt
          env <- getGblEnv
        ; case lookupNameEnv (tcg_type_env env) name of {
                Just thing -> return thing ;
                Nothing    ->

                -- Should it have been in the local envt?
          if nameIsLocalOrFrom (tcg_mod env) name
          then notFound name  -- Internal names can happen in GHCi
          else

           -- Try home package table and external package table
    do  { mb_thing <- tcLookupImported_maybe name
        ; case mb_thing of
            Succeeded thing -> return thing
            Failed msg      -> failWithTc msg
        }}}

You won't find too many examples of Haskell code in the wild with do vertically aligned like that, or three closing braces stacked up on each other.

In most open source projects, these kinds of things are fixable. Today, it is truly easy to clone a repository, make a change, solicit feedback on that change from as wide a set of concerned parties as the project can sustain, and ask a maintainer to merge those changes. Not so with GHC. In part this is due to a tendency to avoid changes that aren't strictly necessary, so as to ease the maintenance of long-standing branches representing the work of people who are not outsiders. And in part this is due to GHC development not hewing to the overwhelming trend of open source hosting on GitHub.

Notes

Code documentation is another area where GHC blazes its own trail. While many definitions are uncommented, anything Simon finds suitably non-obvious in his or anyone else's code gets a Note.

GHC has a coding style that specifies, among other things, how comments are to be used. These guidelines are aspirational more than anything else. The use of long block comments headed by the word Note is valuable, but not terribly well supported by auxiliary tooling, nor is the content or style particularly predictable. Reading the code, you will scroll past long Notes you do not care about, you will encounter code you wish had a Note, and you will read Notes that probably answer a subtle question you can't ask without being as immersed in the code as the Note author must have been.

Tooling to fold Notes in your editor, and to extract Notes to easily-linkable HTML would be very useful. Unfortunately, such tooling will be unique to GHC because this is a non-standard approach to documentation.
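To make the idea concrete, here is a minimal sketch of the first step such tooling would need: building an index of Note headings with their line numbers, which a folding plugin or an HTML extractor could then build on. It assumes a heading is a line that starts with "Note [Title]", optionally preceded by a "{-" comment opener; that shape is my assumption from reading the code, not a specification. The names noteTitle and noteIndex are hypothetical, and nothing here exists in GHC.

```haskell
module Main where

import Data.List (stripPrefix)

-- | The title of a Note if the line looks like a Note heading,
-- e.g. "Note [Foo bar]" or "{- Note [Foo bar]".
-- The heading shape is an assumption, not a GHC-defined format.
-- Titles that contain nested brackets are cut at the first ']'.
noteTitle :: String -> Maybe String
noteTitle line = do
  rest <- stripPrefix "Note [" (dropOpener (dropWhile (== ' ') line))
  case break (== ']') rest of
    (title, ']' : _) | not (null title) -> Just title
    _                                   -> Nothing
  where
    -- Skip a leading block-comment opener, if there is one.
    dropOpener s = maybe s (dropWhile (== ' ')) (stripPrefix "{-" s)

-- | Pair each Note heading in a source file with its 1-based line number.
noteIndex :: String -> [(Int, String)]
noteIndex src =
  [ (n, title)
  | (n, l) <- zip [1 ..] (lines src)
  , Just title <- [noteTitle l]
  ]

main :: IO ()
main = mapM_ print (noteIndex sample)
  where
    sample = unlines
      [ "{- Note [Looking up names]"
      , "~~~~~~~~~~~~~~~~~~~~~~~~~~"
      , "Some explanation."
      , "-}"
      , "f x = x  -- See Note [Looking up names]"
      ]
```

Run on the sample, this reports only the heading on line 1; the "See Note [...]" reference on the last line is skipped because it does not start the line. A real tool would also need to find where each Note ends, which this sketch does not attempt.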

Stewardship

On darker days, GHC feels like a "source is available" project. We see this most commonly with projects run by large companies: they make code available, but the project effectively evolves behind closed doors.

GHC is definitely not developed in secret by any company, but you as an outsider will eventually hear that a decision was made on some phone call, and realize that you do not have a seat at that table, you do not have access to the discussion at that table, and those who do have a seat do not represent the community.

The problem with this process is that it is inconsistent. Internally spawned ideas appear to be implemented on a whim, while outside contributions must climb a mountain made of quicksand and moving goalposts.

Submitting Changes

GHC is a project with a long history and a healthy number of contributors, and the development process is geared more towards making existing contributors comfortable than courting new ones. This is not a black-and-white issue, and I honestly do not mean that pejoratively. To this end, code review and preservation of the issue database have been prioritized. Regarding the former, this has led to the choice to rely upon Phabricator for code review, and arcanist for submitting patches.

A benefit of GitHub's popularity is that one can, say, tweet a link to an issue or proposed change to gather input: virtually everyone is already logged into GitHub, and can click a 👍 button to show support, while far fewer people are set up to use GHC's specific project management tool. While this bit of friction is far from fatal, a worse consequence is the loss of some genuinely good aspects to a typical GitHub workflow due to the use of arc.

In short, the use of GitHub does not require any change to a developer's use of git. A contributor works on their development branch, can share it with other developers with a work-in-progress pull request, and even get comments from non-maintainer interested parties.

In contrast, arc is not git, flattens commits, and enjoys eating your commit messages. The result of this is that it felt to me like my development work was entirely subservient to the master repository, whereas a more usual git workflow feels as though I am maintaining an equal-rights clone of another repository.

If you are old and jaded, you might scoff at the notion of non-maintainer interested parties for anything but the largest of projects. Here is an anecdatum for you: a change to a submode of a mode of a text editor had patches applied to it, and several people chimed in with questions (that were answered!) and encouragement, before the maintainer merged the updated change request. Wow!

Familiarity is a huge factor here. For better or worse, though, I need to be familiar with git to contribute to most open source projects, while arc is much less commonly encountered.

Wiki

GHC development makes use of Wikis to keep track of proposals, discussions, and instructions. This system does not work because there is no holistic authorship.

Features

If you google for a GHC feature you've heard about, you will be sent to a wiki page whose contents represent a subset of the sides of a debate that may have happened years ago, and don't actually tell you what came of it all. It is not just useless, but harmful, because it will tell you what is, today, the wrong thing.

Instructions

Instructions for doing things fall out of date and get duplicated. This is tragic because somebody has taken the time to add updated instructions, but you don't realize you're reading old ones. One thing I think would help is a process for extracting Wiki contents into a traditional manual. It is not that a manual is more useful, but that the effort of deciding where in a hypothetical manual a wiki page belongs can reveal overlap, and perhaps offer an opportunity for removing out-of-date information.

Here is a minor example: Getting the Sources lists five git config commands a user should enter if cloning from the GitHub mirror; the Newcomers page has a single git config line. It is not hard to figure out what is going on, but for a curious user who opens a few browser tabs when trying to get up and running, having to think about whether two superficially different sets of instructions for doing ostensibly the same thing are semantically the same is needless friction.

The Wiki has had a lot of work put into it, and is far from the worst it has ever been. But it still feels like a series of islands that don't tell a cohesive story.

The Process

If you do not have a personal relationship with someone on the GHC development conference calls, you will have to do the following to make a change or add a feature to GHC:

  • Sign up to one or more of the haskell-cafe, ghc-devs, libraries, haskell-prime, or many other email lists; send a description of your idea
  • Either receive no response or suffer the questionable wit of longtime list members; realize email is a terrible way for people to determine if a community of thousands likes the broad strokes of an idea
  • Author a post to /r/haskell for a different subset of the community
  • Create an account on trac to describe your change in a different markup syntax to a different (tiny) subset of the community
  • Get a wiki account, then write another description on a Wiki page primarily for Simon as that is what he prefers
  • Implement your change, use arc to submit it to phab where another tiny audience comments on things
  • If you are not able to commit your change, you will be told to take a long walk off a short pier, or rather, to go back to step one

That is completely unacceptable. Any proposal needs a lightweight way to gauge broad support, then a period of constructive refinement. What exists today is shabby treatment of those who are giving their time to help others.

Other open source projects nurture contributors by valuing their contributions. GHC developers do not stand up to defend contributors from bad behavior on haskell.org-hosted mailing lists; they offer pedantic warnings and cautions while an idea is still seeking support (e.g. the potential to update syntax highlighting rules in text editors is often brought up before any substantive feedback), and they demand the bike shed be painted their choice of color.

Open source projects that are not aiming for Torvaldsian social norms welcome changes and new opinions.

3 Outside In

There are awesome people working on GHC, and their efforts are greatly appreciated. I will specifically call out Ben Gamari and Austin Seipp who are both smart, generous, and tremendously talented programmers.

Perhaps due to its long history, however, I think GHC is slightly dysfunctional as an open source project. That said, maybe the benefits outweigh the costs. It is a bitter pill to swallow, but I do not know of a compiler I would rather use even if I cannot affect its course.