Andrew Chan


Things I did in May 2024

Diffusion models, Skew, and NYC.

May was less building, more experimenting, and even more reading and writing.

The Skew Programming Language

Skew doesn't have a logo, so I made this in Figma.

In early May, the company blog published a post about one of the most interesting projects I worked on at Figma - the quest to migrate away from our custom programming language.

I originally wrote some thoughts about Skew here, but it was getting long, so I split it into its own blog post: The Skew Programming Language.

Diffusion Models

Fourier features in a diffusion model

Left: Dataset of 2D points in the shape of a T-Rex. Right: 1000 samples from a diffusion model trained on the dataset.

I also published a long explainer blog post about diffusion models, covering background, theory, advances, and applications. It got some attention on Hacker News, and I learned that the HN moderators will actually re-up posts they decide have good content if they don't make it to the front page the first time around.

I wrote this because I was really dissatisfied with how the fast.ai “Build Stable Diffusion from Scratch” course explained the theory behind diffusion models.

To its credit, it's a practical course rather than a theoretical one, but I found it frustrating that we were frequently implementing formulas with only a vague explanation of what they were doing rather than a solid understanding of why the formulas looked the way they did and what each piece meant. I also wasn't satisfied with the "intuitive" explanation of diffusion that Jeremy gave, which went something like:

Sampling from a distribution with score matching

Sampling from a distribution by following the score. Via Calvin Luo's blog.

I now recognize this as an explanation of denoising score matching, but I also think it leaves out a lot of context, and it left me with a lot of questions at the time, like:

  1. Why is this better than other approaches people have tried for generating images in the past, like GANs? Answer: GANs have mode collapse and are hard to train. Flow-based models don't have mode collapse, but aren't as expressive and are difficult to scale up.
  2. How does this lead to the training objective of predicting noise added to an image? Answer: Long derivation in the post! The resulting objective is sketched below.
  3. If we're following a gradient, why do we need to use this arbitrary-looking formula for sampling rather than a pre-existing gradient-based optimizer like Adam? Answer: We're not only following a gradient. If we did that, we'd end up with a form of mode collapse during sampling; see the Langevin sketch after this list.
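For question 2, the post walks through the full derivation; the destination, for reference, is the standard DDPM "simple" objective, where a network \(\epsilon_\theta\) is trained to predict the noise that was mixed into a clean image:

\[
L_{\text{simple}} = \mathbb{E}_{t,\, x_0,\, \epsilon \sim \mathcal{N}(0, I)} \left[ \left\| \epsilon - \epsilon_\theta\!\left( \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon,\ t \right) \right\|^2 \right]
\]

Here \(\bar{\alpha}_t\) comes from the noise schedule, so the network's input is a clean image \(x_0\) corrupted by a known amount of noise at timestep \(t\).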

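To make the answer to question 3 concrete, here's a minimal sketch (my own toy example, not code from the post) of unadjusted Langevin dynamics: we repeatedly step along the score of a known 2D Gaussian mixture and inject Gaussian noise at every step. Drop the noise term and the procedure degenerates into plain gradient ascent, with every chain collapsing onto the nearest mode - exactly the sampling-time mode collapse mentioned above.

```python
import numpy as np

# Toy 2D target: an equal-weight mixture of two Gaussians.
# A closed-form stand-in for the score a diffusion model would learn.
MEANS = np.array([[-2.0, 0.0], [2.0, 0.0]])
SIGMA = 1.0

def score(x):
    """Gradient of log p(x) for the mixture, in closed form."""
    diffs = MEANS - x                                   # shape (2, 2)
    logps = -np.sum(diffs ** 2, axis=1) / (2 * SIGMA ** 2)
    w = np.exp(logps - logps.max())                     # component responsibilities
    w /= w.sum()
    return (w[:, None] * diffs).sum(axis=0) / SIGMA ** 2

def langevin_sample(steps=1000, eps=0.01, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.normal(size=2) * 4.0                        # broad initialization
    for _ in range(steps):
        # Without the noise term this is plain gradient ascent, which
        # collapses every chain onto the nearest mode. The injected noise
        # is what makes the chain actually sample from p(x).
        x = x + eps * score(x) + np.sqrt(2 * eps) * rng.normal(size=2)
    return x

samples = np.array([langevin_sample(seed=s) for s in range(200)])
print("mean:", samples.mean(axis=0))  # roughly (0, 0): both modes get covered
```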

Figuring out the answers to the above questions was rough. There are many resources online explaining diffusion models, but I found them all lacking in one way or another.

I ended up writing the resource I needed myself: one which filled in all the context for a software engineer who likes understanding the why and how of new technology at an intuitive level (connecting things to my existing knowledge graph), but who has only an undergraduate background in math and stats from years ago. There are lots of folks out there like me, and I was happy to see that many people found the post, with its visual aids and example modeling task, helpful.

One personal thing I realized when writing the post was that I have forgotten an embarrassing amount of basic math, particularly algebra. For example, I had to stop and think for a second about whether \(\sqrt{\frac{a}{b}}=\frac{\sqrt{a}}{\sqrt{b}}\). This really slowed down my understanding of proofs; it was a lot easier in undergrad, when I routinely exercised these skills. Memorization and repetition really are huge when it comes to learning.
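(For the record, the identity does hold whenever both sides are defined:)

\[
\sqrt{\frac{a}{b}} = \frac{\sqrt{a}}{\sqrt{b}}, \qquad a \ge 0,\ b > 0.
\]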

Other

I started organizing an informal ML paper reading group for South Park Commons. It's nice to be forced to read papers and catch up on hot topics. For instance, coding agents are all the rage these days, so we recently read SWE-Agent.

I also started participating in the run club at SPC. I got injured after my last race and was out of commission for a while, but I'm getting back into running. I like running because you don't need any special equipment or facilities for it - just your own two feet and a safe path to run on! It's something everyone can do, too. My dad runs marathons; one day I'll run one too.

Visiting NYC

I traveled to New York for the first time in a while to visit friends and family. I hung out a lot at South Park Commons NYC, went up the Empire State Building for the first time, and ate a bunch of pizza. Some observations:

Some clear signs of age I noticed:

Some modernizations and advantages over other systems: