What I’m working on at Recurse Center
I’m almost 3/4 of the way through my batch at Recurse Center. I’ve been working on a bunch of things, and thought it might be useful to go over them in some detail.
I came here with the goal of re-launching Ghilbert. This is a very ambitious project, basically a new language for representing formal proofs. I got quite a bit done on it, but my motivation has not been consistently strong.
One of the challenges (which I’m realizing is a bad sign), is that I haven’t been able to articulate the goals of the project very well, especially who it’s for. I made an attempt at an essay, but am not very satisfied by it. If this is going to be a viable replacement for a first course in formal logic, then it’s also going to need the equivalent of a textbook, written specifically to be accessible to a broader audience. That’s a huge time investment, and being able to leverage theorems translated from Metamath only helps with a small part.
Ultimately, I have decided to put this project on the back burner. I still think the ideas are good, but it feels like to really accomplish the goals will take a year or more of full-time work. Some of what I want to do, in particular redesigning the module system so that theorems can be written independent of foundational axioms, requires work at the cutting edge of language design (I think dependent pair types are promising, but the details are tricky at best).
Xi syntax highlighting performance
While I haven’t made a goal of focusing on xi during my batch, I did start trying to use it as my day-to-day editor. I quickly found that performance was totally unacceptable because of the batch highlighting. That’s more than a little ironic, because performance is a stated goal of the project.
Implementing rope science 11 was a juicy and fun project, and it improved performance tremendously, back into usable territory. There’s lots more to be done, but just this was an encouraging step.
The core of the algorithm is a cache eviction strategy designed to balanced local access patterns with keeping “gaps” reasonably small (all this is to keep the data structures small and nimble when highlighting large documents). I gave a little talk and, preparing for that, realized that showing the cache policy interactively was the best way to communicate that. The visualization currently lives as a pull request for the xi documentation directory.
I hope to polish up the talk and the visualization, and publish to a wider audience, as I think it’s a great showcase for xi.
I also made some progress in doing a more detailed explanation, intending it to have an interactive visualization showing the construction step-by-step, but didn’t follow through. I can imagine coming back to this, though.
I think getting rid of the pseudo-randomness, and just doing fill breadth-first, would be an improvement (discussions with Marcus Klass de Vries helped illuminate this question).
Apple 2 bitmap text rendering
The Recurse Center has an Apple //e in the space, and the second I laid eyes on it I thought there would be a pretty good chance I’d code something for it. The KIM-1, a primitive 6502-based computer, was the very first computer I ever programmed, so I was drawn to coding something in 6502 assembler.
Guided by the thought of maybe putting an emulator on my webpage, I wrote some fairly simple code to render a proportionally spaced bitmap font. It worked:
The code is not yet published.
Along the way I made some improvements to appletoo, which was written at Recurse a few years ago. This is a good emulator to put on a homepage because it’s pretty simple and can be readily adapted as needed.
Apple 2 cassette loading
Getting code moved over to the Apple hardware is a challenge. The best supported (but slow) technique is to use the cassette interface, which rips along at around 1300 bits per second. There is an open source project called c2t which increases this to around 8k with a “hi-fi” mode. I looked at the details of the encoding and analog path and came to the conclusion that much higher data rates are possible. My original goal was 40kbps.
I used the “retro and physical computing” weekend to finish the project (aided greatly by an oscilloscope purchased for the weekend), and ultimately achieved 23k. I haven’t published this code yet either, but a bit of a writeup exists as an issue on c2t.
Keiran King has been working on Phil, a tool to help construct crosswords, as the main project of his batch. One goal is to automatically fill the remaining grid with partial words. As it turns out, there is some literature on this, and an extremely promising approach is to use a SAT solver.
I’ll be writing lots more about this, but this is now my main project, and I’m very excited about it. We’ll run the solver in the browser (using wasm and emscripten), and are collaborating on refining the user experience. I’ve also figured out ways to prune the search space so the solver can run much faster.
Modern SAT solvers (such as glucose) are almost magical in their ability to find solutions even in large problems (a typical crossword translates to hundreds of thousands of variables and millions of clauses).
There’s also a lot of interesting work in refining the wordlist, as this is key to quality results. We’re starting from Saul Pwanson’s xd corpus, and thinking of a bunch of ways to filter out low-scoring words as well as augment it with fresh new ones. A promising approach to the latter is collocations, and as we speak I’m replicating those results (pairing with JB Rubinovitz on the analysis code).
I pushed my experiment running the FM synthesizer in a browser (using emscripten). I mentored some work towards the rewrite of pulldown-cmark to a better algorithm. I’ve also been doing minor maintenance on xi and fancy-regex.
Thanks to the Recurse Center for providing the space, and for the many talented and kind Recursers who have shared stimulating conversation, encouragement, and pairing.