Andrew Chan


Contents

Notes From Figma II: Engineering Learnings

This is part II of my notes on Figma, which will focus on my technical and non-technical learnings from Figma - problems in our domain, how I grew as an engineer, and how Figma's culture shaped our work. The first part, which talks about what brought me to Figma and my take on why Figma succeeded is here.

What technical things did I learn at Figma?

Constraints of the web

The web's unparalleled distribution is a double-edged sword. It's a single platform with vast reach: not only is it write once, run anywhere for developers, but it's zero install, instant play for end users. But its universality means it moves slowly. In many ways it's a highly constrained, backwards platform only now catching up to decade-old developments in hardware and low-level technologies. For example:

What's interesting is that most software - even native software - doesn't take advantage of the hardware features listed above, and Figma was able to exploit this. It came out in 2015 using a web standard based on 8-year old technology (WebGL). Rendering 2D vectors on the GPU wasn't a new idea either (Loop-Blinn was a decade old by then) yet Sketch - its MacOS native competitor - barely even used the GPU!

Design tools have multiplicative complexity

My colleague Rudi pointed this out a while back. Basically, when you're designing a set of primitives that all go in the same space (the canvas) which people are going to combine with each other to make things, everything needs to work with everything else.

To illustrate this, let's imagine a simple version of Figma where we let people put rectangles and text on the canvas, change their color, and directly manipulate (select, move, and resize) them with the mouse. In the image below, our app is shown with the layers panel on the left, the toolbar in the middle, and properties of the currently selected object (Text 1) on the right:

Simple Figma

Already we have some questions:

We only have two types of objects and a handful of operations and info panels, yet we already need to consider how resizing can behave differently given different object types, or how the right-hand panel might behave differently given heterogeneous selections. Some of these questions are more product-related (how resizing a selection of multiple objects should behave), while others are more engineering-related (how we might efficiently handle reflowing large amounts of text). And this is all before we throw in multiplayer, auto-layout, and components!

What non-technical things did I learn at Figma?

Crucial re-architectures

I joined Figma at a unique time in the codebase's health where - a few years after the initial launch and the team building as fast as possible - the business was beginning to really take off but the accumulated tech debt was on the verge of catching up to us.

The codebase had a foundation of tight, well-documented systems which came from the earliest years of the company (before Figma was released, when the team was small and had time to internally iterate), but beyond that, accidental complexity abounded. When newcomers asked “why does X system work this way”, a common refrain was: it's just like that, because we were building under pressure, that's what we came up with, and we never had time to rethink it!

The underlying reason was, of course, competitive pressure. Our tech had given us a head start at launch, but our competitors had taken notice, and by 2019 were furiously chasing after us in collaboration features and threatening to pull ahead in editor features like auto-layout and prototyping animations.

So the prevailing sense in the editor teams was urgency, and while we did strive to perfect the design of our features before releasing them, the engineering behind them often took on considerable debt. This led to some weird code: for example, to ship auto-layout, we built two separate layout engines managing different parts of the document with some overlap, which was a constant source of subtle bugs and performance issues for years after. It also meant that learning all the weird quirks of the engine became perhaps more valuable than being a fast problem solverI'm biased, because I spent years as a platform engineer, and probably was better at doing the former than the latter., because at times the former was needed to get anything done at all.

One time where this came to a head was in the architecture behind our components feature: our original architecture had a ton of accidental complexity and made important operations very difficult to reason about. This was blocking several key features, so the team decided to bite the bullet and completely redesign it, which required two huge combined refactoring-and-migration projects, and the creation of a system for file-wide migrations in our backend, which we had somehow gone for years without having. It was a heroic effort, and succeeded I think because of engineers like Joey Liaw, who led the project and had also been part of the healthcare.gov rescue team.

That said, some re-architectures failed, even the ones with the same hero engineers working on them. For instance, a large project to rework the engine's core datastructures in 2018 failed. But these failures were not critical, partly because we always undertook them in a way that didn't disrupt ongoing product development.

So what happened? Did we just get lucky? If I had to pick a dividing line between the failures and successes, it's that the successes always had a product goal in mind ahead-of-time. For instance, both the components re-architecture and another mobile viewer re-architecture I worked on aimed at unlocking a product roadmap, with the latter also improving performance. The failures, on the other hand, aimed to make code easier to reason about - meaning “reduce bugs”Or equivalently, a vague notion of making it easier to build new features without an explanation of which features would be easier. - but without a strong notion of why the bugs being reduced were so important that a couple senior engineers needed to be tied up for a few months. It's also no coincidence that the product goal was usually easier to objectively measure while goals around reducing bugs often weren't.

Here's what I'd claim: these product-driven re-architectures were crucial to our success, and while they sometimes required heroic efforts to succeed, this wasn't a symptom of dysfunction; instead, in the early days it was optimal for us to go through these cycles of building fast and taking on debt, then attempt refactors as needed, because we were racing to stay ahead of our competitors in a fast-moving space.

The Mythical Stakeholder Alignment

I spent my first couple years as a product engineer. I was not a very good product engineer.

I was ignorant of the urgent dynamics in the editor product at the time, and had a habit of misunderstanding or interpreting ambiguous requirements in ways that increased scope (because of the same instincts that would lead me to work on the platform - everything had to be clean and performant!). But because I was good at fixing bugs fast and soaking up trivia about the codebase, I was trusted by the team to take on larger projects and maybe mistakenly seen as a high performer.

In my first year, this led to me working solo on a refactor which had to be scrapped after a few months because I thought it needed to support more than it actually needed to. At the time it felt catastrophic, but looking back it wasn't all that bad - the failure didn't significantly impact the rest of the company but was a big learning moment for me.

On the other hand, I suspect my stint as a poor product engineer made me a better platform engineer: I knew what it was like to interface with product and design folks, what they valued and how they communicated it, and how to use this to flush out the biggest risks and distill the highest priority requirements.

For instance, I led a later project that overhauled our whiteboarding renderer to improve FPS, and our initial idea (throttling canvas rendering to improve overall responsiveness) seemed concerning to basically everyone working on on-canvas features. However, we were able to quickly establish that the optimization could activate selectively and usefully, that the critical times where it had to be disabled were few and far between, and that risks like system de-synchronization either weren't issues or could be easily mitigated.

How did we do it? We talked to people! We held meetings and wrote docs, but also, since folks at Figma are big on seeing and trying things out, we built a demo, recorded videos demonstrating what magical terms like system de-synchronization actually looked like, and later when we landed it in staging, we built in a subtle UI indicator that would appear to let folks know when our optimization was active.

In other words, I had learned how do this mythical thing that people call “building alignment”: in a large company, where both the software and its use cases are too complex to fit in any one person's head, I had enough of an understanding of the system being optimized to know when an idea (that someone else came up with!) was locally possible, and of the product and org (the connected surfaces and the folks who understood them) to know or find out if it was globally possible, too.

Figma's engineering culture

Importance of side projects. At Figma, we had a culture that encouraged side projects - toys or programs that folks would create which nobody asked them to, which did not necessarily have an impact on the bottom line, but which could be used to explore new ideas or technologies. Many features and innovations came out of side projects. Just off the top of my head:

Two things made it work for us:

WebGL Water

The sculpture of the "WebGL Water" demo seen in the Figma office before the pandemic.

Why were side projects important to us?

It's interesting contrasting our culture with the one that many of my ex-Dropboxer colleagues came from, where (I was told that) side projects were frowned upon because it meant you weren't spending enough time working on your main project.

Another interesting question is when side project culture doesn't add value anymore. As with so many things in company cultures, I think side projects don't really scale with company size. In the early days of a startup, folks are much more incentivized to spend their free time on exploring ideas that are genuinely relevant to the business, since they have a greater stake in its success. But larger companies are often more averse to revolutionary new ideas due to having well-established businesses, and I'd guess that the ideas folks explore if given the opportunity would be more driven by what is better for their own careers, and only relevant to the company insofar as they fulfill "hack time" requirements and come from opportunities in the company's systems.

Geeks attracted to a weird tech stack. We had (and still have) a stellar team with lots of browser and graphics hackers who had cut their teeth in the 90s and 00s. Partly we made intentional hiring decisions early on, but I think we were attractive because of our app's unique tech stack which made it like a browser within a browser or an MMO. For example, among my colleagues were a teammate who interned at Netscape and worked at Mozilla through the birth of the open web platform, and multiple other teammates who'd built videogames during the "masters of doom" early 3D era.

PRs as documentation. PRs in my corner of Figma were specifically written with code archaeology in mind and providing context to changes line-by-line via git blame. For a bugfix, a PR might include in its description:

Not every team did this, but my team did, and it was very convenient because months or years later, I could reconstruct the whole history of a module, including the reasoning for all the little hairs and bugfixes that made little sense on first glance. Most of the time it was also possible to reproduce the bug with the steps contained, and even decide that whatever fix was there could be deleted if needed (for example, as part of a refactor).

Another thing we did that I haven't seen elsewhere is structure larger PRs so that they were readable commit-by-commit. No matter what order we had done our original changes in, we would commit them in a logical sequence that made it clear what the PR was doing using a tool like git add --interactive. So for example if we were creating a new version of a module that had a slightly different API and worked differently under-the-hood, our commits might look like so:

        
1111111: Copy-paste old library code into new files
2222222: Make changes to code and function signatures in new files
3333333: Add feature flag and update call-sites
        
    

This made changes extremely easy to review and doubled as a way for team members to learn how senior engineers thought.

3 things. We had a beloved traditionSome folks even got together and filmed a documentary about it over maker week, though I think they still haven't finished editing all the footage they took! called 3 things where a person would talk about 3 things that made them who they are. These used to be company-wide but at a certain scale, it gets weird to be vulnerable, so we stopped doing them (although my team reintroduced them internally).

The prompt was intentionally vague - it didn't have to be the 3 biggest or deepest things, it could be any 3 things. Some people reached down deep and revealed formative experiences of heartfelt pieces of themselves, while others talked about recent hobbies they were spending lots of time on or favorite books. I attended a bunch of these and felt that I really got to see beyond the professional veil to a more personal side of my coworkers, and I regret a bit not having participated.

A common question for company cultures is whether the company is more like a "team" or a "family", and it's also a matter of opinion which one is better. During my time there, Figma was at least on the surface much more like a family. Whereas companies like Netflix emphasize extraordinary candor, professionalism, and performance, Figma's culture emphasized empathy, mutual support, and growth, and 3 things more than anything else fostered empathy.

Did it work for us? I think it did, and miraculously it did so without sacrificing performance: the folks I worked with are still some of the most brilliant and productive I've met, while being some of the kindest (Jamie, who is one of these people, calls them "humble wizards"). Yet it wasn't without tradeoffs: there were certainly times when I felt that we had over-emphasized support over candor in communication, which made it hard to give or receive valuable feedback. But if I had to point at one reason why this culture worked so well for us, I think it was because the design tool space requires creativity, and you need an empathetic, supportive environment to foster thatAlthough maybe early Apple with "brilliant jerk" Steve Jobs is a counterexample. But while the early days might have been different, AFAIK we didn't have a brilliant jerk during my time at Figma. So maybe the learning is: you can have either a brilliant jerk or a team of empathetic, supportive peers..

Acknowledgements

Thanks to Rudi Chen, Ryan Kaplan, and Jamie Wong for valuable feedback, quotes, and suggestions on multiple revisions of this post.