The Nerden of Dorking Paths

All the hot takes have been hotly taken, so this is a meditation about some stuff I've been working on. And I have a new phrase for you: Opportunistic Generativity.

The thing about a two-week cadence on the internet is that contemporaneous material comes and goes, and by the time you’re ready to write again, all the takes have been taken. So I won’t be saying much about Basecamp, DogeCoin, Elon Musk hosting Saturday Night Live, or Chinese rocket detritus. Instead I’ll have to write about mundane, quotidian stuff, like what I’ve been working on.

I am less on the fence than I was a couple months ago about whether I should carry on the ordinary tree-shaking with my existing repertoire, plough some hours into a post-pandemic service portfolio, or whether I should aim instead for a jay-oh-bee. Then a couple things happened. One was that the company that I was pretty sure was wasting my time with their elaborate and terribly contrived coding test was indeed wasting my time, and a lot of it. The silver lining there, I suppose, is that I scraped a decade’s worth of barnacles off of my Python skills: something I needed to do anyway, though in pretty much any other context I could have gotten paid for that.

What I sent this prospect was testing them as much as they were testing me. I wrote before—and even told them—that I had apprehensions about spending real effort on a fake product, so I had a little fun with it—after I had met their requirements. Anybody but the most joyless misanthrope would have at least wanted to chat about it. What I got in response instead—after keeping me waiting for two weeks—was some gaslighting, Frankfurtian bullshit. These people literally complained about the formatting. Of Python. The most eminently automatically-formattable programming language in existence, with automated formatters out the ying, which I ran. Talk about out-of-whack priorities. Guess I dodged a bullet, huh?

(If I get the motivation later on I might just put this thing on GitHub and let you in the audience pick it apart. It’s not like they paid for it, and nobody asked me not to.)

Method acting a prospective tech employee also gave me a good gut feeling for what is an appropriate order-of-magnitude effort considering the risk involved, which I’d say is maybe around ten or twelve hours per company, including the initial scouting. Any more than that is a freelance gig.

The main influence steering my current direction is the fact that over the last year and change, I have identified two projects: one for a product and another for a tool to help deliver a service. These are both indefinite as to when they will be “done enough” to be merchantable. They are, to refresh our memories:

  • The IBIS tool, roughly a planning and design-problem-articulating infrastructure based on the mid-20th century work of Horst Rittel and collaborators. This will, as it already does, take the form of a (to be blown up and replaced) Web app, and potentially later on, a desktop, tablet, and mobile application.

  • The content Swiss Army knife, which marshals pages, structure, and metadata of arbitrarily large websites (I use it on my own) and affords sophisticated bulk views and operations on Web content. Because of what this thing does, I can’t see it ever being more than a command-line tool that is realistically wieldable by more than a very small cadre of experts, which is why I consider it to be more of a piece of equipment to aid in delivering a service than a shrinkwrapped product in its own right.

What these two things have in common, aside from the fact that for very irritating structural reasons I have no idea (could be weeks, could be months, could be years) when either of them will be in a position to earn me any money, is that neither is getting worked on nearly as much as it could be. So the ideal short-to-medium-term goal, for the sake of my bastard offspring, is to get into a position where I can at the very least reliably reclaim my evenings and weekends to work on them.

Why you gotta keep coming up with weird words?

Now to turn to what I have been working on between wrangling prospective employers: this damn thing I’ve shown you half a dozen times already, which is actually approaching finished! What’s left to do is to draw the sample timelines, probably change one of the input parameters, and clean up the rest of the visualizations. For this last thrust I want to take a step back and approach it from an outside-in, art-direction-y perspective, rather than the bottom-up programming problem treatment it has been getting heretofore. I expect to have it up and out the door by next newsletter.

What I wanted to talk about with respect to this little project (and, come to think of it, the two others aforementioned) is something I’ll maybe call opportunistic generativity. It goes a little like this: the ostensible goal is only kinda sorta the actual goal. The actual goal is to define a real-world problem that, through the process of solving it, generates a bunch of useful byproducts.

It is worth remarking that the IBIS tool story is backwards: it began life as a concrete way to test out an extremely generic protocol that I had designed, then became useful in its own right, and then became something of a laboratory for a number of techniques for taking advantage of metadata embedded in markup. The Swiss Army knife started in the middle, as a data vocabulary for content inventories that I blasted out back in 2010 and didn’t touch for years, only to pick it back up again to work out some technique for modeling things like website migrations, as well as other desirable behaviours of content management systems, like SEO metadata, and eliminating 404 errors by tracking URLs across renames. I use it, in its current breadboard state, to generate my own website.

This particular project started life as a kind of thought experiment from observing my younger brother’s idiosyncratic self-employment scenario to the extent that it was superficially very risky-looking, but also statistically well-behaved enough to work out on average. I figured if I could parametrize the inputs, I could capture the essence of a livelihood that included pitching gigs, scoring them, working on them, and getting paid for them, in all the ways that could vary. The proximate objective is to use mathematics and computation to generate plausible scenarios—in this case circa-one-year timelines—to help imagine what life would be like under one uncertainty regime or another, and figure out which ranges of parameters are viable and which are not.

The goal of this thing is also to smoke out boring practical questions like “how much credit do I need to carry?” and “what would be the unambiguous signal that I need to get out of this enterprise?” To be sure you’re still going to need a spreadsheet, but the posited timelines of the simulation, ensconced in friendly, conversational text, are going to give you better situational awareness than looking at a bunch of statistics in Excel.
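To make the shape of such a simulation concrete, here is a minimal sketch in Python. It is not the code behind the project: every parameter name, default value, and distribution below is an assumption made up for illustration (a fixed number of pitches per week, exponentially distributed gig lengths, a flat weekly rate, a fixed payment delay). It generates one-year cash timelines and reads off how deep into the red each one goes, which is one crude way of approaching the "how much credit do I need to carry?" question.

```python
import random


def simulate_year(rng, pitches_per_week=2, win_rate=0.15, mean_gig_weeks=3.0,
                  weekly_rate=2500.0, pay_delay_weeks=4,
                  weekly_expenses=1000.0, weeks=52):
    """Return the week-by-week cash balance for one simulated year.

    All parameters are hypothetical stand-ins, not the project's real inputs.
    """
    balance = 0.0
    busy_until = 0   # first week in which we are free to pitch again
    payments = {}    # week number -> money arriving that week
    timeline = []
    for week in range(weeks):
        balance -= weekly_expenses
        balance += payments.pop(week, 0.0)
        if week >= busy_until:
            # Only pitch while idle; each pitch is an independent coin flip.
            won = any(rng.random() < win_rate for _ in range(pitches_per_week))
            if won:
                duration = max(1, round(rng.expovariate(1.0 / mean_gig_weeks)))
                busy_until = week + duration
                due = busy_until + pay_delay_weeks
                payments[due] = payments.get(due, 0.0) + weekly_rate * duration
        timeline.append(balance)
    return timeline


def credit_needed(timeline):
    """Deepest dip below zero, i.e. the credit that run would have required."""
    return max(0.0, -min(timeline))


def summarize(runs=1000, seed=1, **params):
    """Run many simulated years and report the median and 90th-percentile dip."""
    rng = random.Random(seed)
    needs = sorted(credit_needed(simulate_year(rng, **params))
                   for _ in range(runs))
    return {"median": needs[runs // 2], "p90": needs[runs * 9 // 10]}
```

Calling `summarize(win_rate=0.1)` and then `summarize(win_rate=0.3)` is the spreadsheet-free version of asking which parameter ranges are viable. Payments that fall due after the last simulated week are simply dropped, which is one of several simplifications a real model would want to revisit.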

This brings me to the second layer of opportunistic generativity: the thing that took most of the effort on this project by far was determining the control surface for the parameters. True to form in any design problem, the bulk of the work was just trying to decide on an optimal configuration from arbitrarily many candidates. How, for instance, should I choose to transmit the parameters from the probability distribution sparklines back to the Web form? How should I represent inflections, singulars and plurals, and other conditional text?

This second-order goal is to be able to just plunk down a bunch of ordinary declarative markup and then call in a script on top and have everything Just Work™, with the aim of shrinking the overhead of creating these interfaces by at least an order of magnitude. I could further see potentially encapsulating this behaviour into a product like D3. This would serve the even farther-out goal of promulgating a form of computational rhetoric, or if you prefer, model-driven debate.

I will fully admit this project is a white elephant. If this was a client project, I probably would have underbilled and underbudgeted the time it would take to do (though hard to tell because I’m working maybe one day a week on this thing). But I consider it to be training, and offering people a glimpse into otherwise opaque and/or invisible work. I began by writing out what I wanted to see on the page, and I did whatever it took to make that happen. This is the scoping equivalent of painting yourself into a corner.

Biting off more math than I can chew

Naïvely, one of the features I had decided was essential was a display of individual simulation runs that were considered “typical”, to the extent that the outcome was “similar to the majority of”—for some rigorous mathematical definition thereof—the rest of the other outcomes. So here’s where I LARP being a data scientist (although it’s not even LARPing because my data is fake).

Now, the data that I generate currently has 11 dimensions. I had remembered a thing called multiple correspondence analysis from a PhD dissertation-turned-book called Cybertext, which used a method of squashing the dimensionality of a bunch of data containing a bunch of yes-no features—in that case it was some (hyper, cyber and dead-tree) texts—down to something more manageable. It turns out, unsurprisingly, that there is an analogous process for continuous data called principal component analysis, that will take whatever you throw at it and reorganize it in terms of “the most important differences” and do so in order of said importance.

To help me out, I found an excellent lecture series by some people at the University of Washington, ultimately intended for biologists. In addition to being high-quality instruction, they have a remarkably slick setup, especially since they recorded those videos in 2015, before people went Zoom-crazy.

Watching this series is also the closest I have come to giving a crap about machine learning. I am not a fervent AI skeptic per se; this is just the first time I could see jobs that I care about for which machine learning would genuinely be the right tool.

It turned out in this case that two dimensions was more than enough to describe the important differences in my eleven-dimensional data set. Determining typicality at that point was a matter of finding the coordinate at the centre of all the points and drawing a circle around it✱ to net the closest 80% of them.

✱ More accurately, it’s a matter of measuring the Euclidean distance of every data point from the centroid, sorting by that distance, and taking the closest 80% of them. This is not quite the same thing as drawing a circle and saying whatever is inside it is “typical”. The points are selected first and the circle (which will usually look like an ellipse in drawings anyway because the coordinate grid is squashed) gets drawn on afterward.

Also worth remarking that even just sitting here writing this aside, I can think of a few ways this definition of “typicality” is actually kind of sketchy, and can imagine other kinds of datasets for which it wouldn’t work and other strategies which would be better. But that’s for another time.
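The selection step described in the aside above can be sketched in a few lines of Python. This assumes the points have already been projected down to their first two principal components by whatever PCA routine is at hand; the function name `typical_points` and the `fraction` parameter are made up for this illustration.

```python
import math


def typical_points(points, fraction=0.8):
    """Pick the `fraction` of points nearest the centroid.

    `points` is a list of equal-length coordinate tuples (for example, 2-D
    PCA scores). Returns (centroid, chosen, radius), where radius is the
    distance of the farthest chosen point: the circle is derived from the
    selection, not the other way around.
    """
    n = len(points)
    dims = len(points[0])
    centroid = tuple(sum(p[i] for p in points) / n for i in range(dims))
    # Sort by Euclidean distance from the centroid, nearest first.
    ranked = sorted(points, key=lambda p: math.dist(p, centroid))
    keep = max(1, round(fraction * n))
    chosen = ranked[:keep]
    radius = math.dist(chosen[-1], centroid)
    return centroid, chosen, radius
```

Because the centroid is a plain mean, a handful of extreme outliers will drag it toward themselves, which is one of the ways this notion of "typical" can turn out sketchy.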

The fractal digression that keeps on fractaling

Another second-order outcome of this project stems from the fact that I have to programmatically draw a lot of graphics, which involves a crapload of typing if you do it the standard way. Solution? Port the terse markup generator I had previously designed over to JavaScript.

This little script—which, I know, I have yet to package up properly—is a worked example of a third-order initiative: my conviction that in many cases the penultimate description of a piece of software is arguably more valuable than any one chunk of actual code. In this case, it fits in a single paragraph. If your programming language has a particular dirt-common native datatype, typically called:

  • hash,

  • associative array,

  • dictionary,

  • object (in the special case of JavaScript),

  • map…

…then you can use these along with regular arrays and strings to construct a precursor to Web markup that can be represented and manipulated by ordinary syntax and operations, and subsequently “baked” by a single, easy-to-write-down function. The abstract definition of said function is slightly longer than this description, but with it (you can also sneak a look at any of the now three reference implementations), in a few hours you can make another one in any language you want. Because this very small piece of software is specified in the abstract, it will behave the same and support the same set of expectations. What this means is that if you encounter one of these things in a language you had not previously seen it written in, you will already know how to use it.
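To give a flavour of how small such a function can be, here is a rough sketch in Python. It does not follow the actual abstract specification or any of the reference implementations: the conventions here (a dictionary with a "tag" key naming the element, an optional "children" key holding the content, and every other key treated as an attribute) are assumptions invented for this example, as is the name `bake`.

```python
from html import escape


def bake(node):
    """Turn nested dicts, lists, and strings into a markup string.

    Hypothetical conventions: a dict is an element whose "tag" key names it
    and whose "children" key holds its content; all other keys become
    attributes. A list is a sequence of nodes, a string is text.
    """
    if node is None:
        return ""
    if isinstance(node, str):
        return escape(node, quote=False)
    if isinstance(node, (list, tuple)):
        return "".join(bake(child) for child in node)
    if isinstance(node, dict):
        tag = node["tag"]
        attrs = "".join(
            f' {name}="{escape(str(value), quote=True)}"'
            for name, value in node.items()
            if name not in ("tag", "children")
        )
        inner = bake(node.get("children"))
        if inner:
            return f"<{tag}{attrs}>{inner}</{tag}>"
        return f"<{tag}{attrs}/>"  # empty elements self-close
    raise TypeError(f"cannot bake {type(node).__name__}")
```

With that in place, `bake({"tag": "circle", "cx": 5, "cy": 5, "r": 2})` yields `<circle cx="5" cy="5" r="2"/>`, and because the precursor is just ordinary data, building a few hundred of those in a loop or a comprehension takes no special API at all.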

If this silly little project to simulate the gig-pitching process with slightly-more-complex-than-spreadsheet math was only evaluated for its proximate goals, it would be decidedly in the red. If all I was after was the result of a simulation, this could have been over and done with in under a day. Since I treated the goal as a kernel of opportunistic generativity, here’s what I got out of it as well:

Not to mention this and other writeups. I guess the moral of this story is, if you start projects without really worrying about finishing them, you may finish them eventually, but you will also have a mountain of byproduct that is arguably more valuable.

Though for Actual Paying Work™, you might want to be a little bit more conservative.

It’s not the worst strategy in the world; I can think of at least a couple billionaires who were made this way.

Nerd bonus

I wrote a rantifesto last week about spreadsheets, about how they are not very good, and how to change them to make them better. And if you think I can help you at your company or on your project, or know somebody who might, do not hesitate to say so.