Hacker News — vinext + Cloudflare Workers

new
past
show
ask
show
jobs
submit

▲Writing a C Compiler, in Zig (2025) (ar-ms.me)

172 points by tosh 2 days ago | 62 comments

fuhsnn 2 days ago [-]

Looking at the repo, the author seemed a little fed up [1] with the nature of lower level language and quitted.

[1] https://github.com/asibahi/paella/blob/main/writeup/c19.md#u...

MatthiasPortzel 2 days ago [-]

I’ve just read the two functions there by that footnote, `reaching_copies_meet`. I have so much code review feedback just on code style, before we even get into functionality. And it’s like 20 lines. (The function shouldn’t return an error set, it should take an allocator, the input parameter slices should be const, the function shouldn’t return either the input slice or a newly allocated slice.)

It’s interesting how Zig clicked for me pretty quickly (although I have been writing it for a couple of years now). But some of the strategies of ownership and data oriented design I picked up writing JavaScript. Sometimes returning a new slice and sometimes returning the same slice is a problem for memory cleanup, but I wouldn’t do it even in JavaScript because it makes it difficult for the caller to know whether they can mutate the slice safely.

I suspect that there’s a way to write this algorithm without allocating a temporary buffer for each iteration. If I’m right that it’s just intersecting N sets, then I would start by making a copy of the first set, and on each iteration, removing items that don’t appear in the new set. I suspect the author is frustrated that Zig doesn’t have an intersect primitive for arrays, but usually when the Zig standard library doesn’t have something, it’s intentionally pushing you to a different algorithm.

z0ltan 1 days ago [-]

[dead]

dwroberts 2 days ago [-]

Feels like maybe something lost in translation with their explanation - they say they were fed up of data structures etc. but they returned to Rust? I’m assuming there’s something a bit more nuanced about what they got tired of with Zig

namr2000 2 days ago [-]

Rust is a world away from Zig as far as being low-level. Rust does not have manual memory management and revolves around RAII which hides a great deal of complexity from you. Moreover it is not unusual for a Rust project to have 300+ dependencies that deal with data structures, synchronization, threading etc. Zig has a rich std lib, but is otherwise very bare and expects you to implement the things you actually want.

moveoverhal 22 hours ago [-]

Rust does have manual memory management. You can allocate memory and get a simple pointer back, and then it is on you to free, just like C.

It is just that, it also has conveniences that make it feel like a language where you don't even have to think about memory. I think that is pretty cool.

Ygg2 2 days ago [-]

This depends on what you mean by low level. Commonly it means, how much you need to take care about minute, low-level issues. In that way C, Rust, and Zig are about the same.

Dependencies have nothing to do with low-level vs. high-level but just package management, how well the language composes, and how rich the standard library is. Are assumptions in package A able to affect package B. In C that's almost impossible to avoid, because different people have different ideas about how long their objects live.

Having a rich standard library isn't just a pure positive. More code means more maintenance.

namr2000 2 days ago [-]

I agree with you that package management has nothing to do with how low-level a language is.

That being said Rust is definitely a much higher level language than either C or Zig. The availability of `Arc` and `Box`, the existence and reliance on `drop`, and all of `async` are things that just wouldn't exist in Zig and allow Rust programmers to think at higher levels of abstraction when it comes to memory management.

> Having a rich standard library isn't just a pure positive. More code means more maintenance.

I would argue it's much worse to rely on packages that are not in the standard library since its harder to gain trust on maintenance and quality of the code you rely on. I do agree that more code is almost always just more of a burden though.

Ygg2 2 days ago [-]

> That being said Rust is definitely a much higher level language than either C or Zig. The availability of `Arc` and `Box`, the existence and reliance on `drop`

I mean, C++ have RAII and stuff like unique pointer, does that make it higher level than Zig?

And what if you don't use Arc or Box? Is your program now lower level than baseline Rust?

As I said, depends a lot about what you mean by low level.

resonancel 1 days ago [-]

IMO "level" roughly corresponds to the amount of runtime control flow hidden by abstractions. Zig is famous for having almost no hidden runtime control flow, this appears pretty "low level" to many. OTOH, Zig can have highly non-trivial hidden compile time control flow thanks to comptime reflection, but hardly anyone identifies Zig as a "high level" metaprogramming language.

namr2000 2 days ago [-]

It depends on the facilities the language offers to you by default right?

C++ offers much higher level primitives out of the box compared to Zig, so I'd say its a higher level language. Of course you can ignore all the features of C++ and just write C, but that's not why people are picking the language.

zamadatix 2 days ago [-]

I'd say so. Zig is aiming to be a bit smarter than C while staying at roughly the same level. C++ more sought/seeks to support C but offer higher level things with it.

tialaramex 1 days ago [-]

And in practice the maintenance just doesn't get done. That's why Python's "rich standard library" with batteries included not only periodically has to throw out "dead batteries" because parts of its stdlib are now obsolete, but also has an ecosystem where good Python programmers don't use parts of the stdlib "everybody knows" just aren't good enough.

You see that in C++ too. The provided hash tables aren't good enough so "everybody knows" to use replacements, the provided regular expression features aren't good enough, there's the 1970s linear algebra library that somebody decided must be part of your stdlib, here's somebody's "my first optimized string buffer" type named string...

For now Zig is young enough that all the bitrot can be excused as "Don't worry, we'll tidy that up before 1.0" but don't rely on that becoming a reality.

z0ltan 1 days ago [-]

[dead]

ecshafer 2 days ago [-]

I think Rust is "higher level" than C or Zig in the sense that there are most abstractions than C or Zig. Its not Javascript, but it is possible to program Rust without worrying too much about low level concerns.

NooneAtAll3 2 days ago [-]

> in the sense that there are most abstractions

is it a typo for more abstractions? or is there some different meaning?

ecshafer 2 days ago [-]

yes its a typo

Ygg2 2 days ago [-]

Except if you need to expose or consume a C API, or you need to use some obscure performance improvement.

Pay08 1 days ago [-]

Fuck it, by that logic C and Python are the same language.

flykespice 2 days ago [-]

Which is still a crazy claim considering Rust is often told about having strong bureaucracy around even sharing variables (borrow checker).

namr2000 2 days ago [-]

The languages trade complexity in different areas. Rust tries to prevent a class of problems that appear in almost all languages (i.e two threads mutating the same piece of data at the same time) via a strict type system and borrow checker. Zig won't do any of that but will force you to think about the allocator that you're using, when you need to free memory, the exact composition of your data structures, etc. Depending on the kind of programmer you are you may find one of these more difficult to work with than the other.

slopinthebag 2 days ago [-]

There are some cases in Rust where the borrow checker rejects valid programs, in those cases it may be because of a certain data structure in which case you probably have many crates available to solve the issue, or you can solve it yourself with boxing, cloning, or whatever. The vast majority of the time (imo) the borrow checker is just checking invariants you have to otherwise hold and maintain in your head, which is harder and more error prone.

The actual hard part of Rust is dealing with async, especially when building libraries. But thats the cost of a zero-cost async abstraction I suppose.

mrkeen 2 days ago [-]

The author was fed up with not having data structures already provided, and needing to roll his own

resonancel 1 days ago [-]

Then it's actually the immature zig ecosystem that rubbed the author the wrong way, not zig the language itself. Not that the ecosystem isn't important, but IMO a language only truly fails you when it doesn't offer the composability and performance characteristics necessary for your solution.

dwroberts 2 days ago [-]

Not really understanding what this would be though, zig has all the basic stuff you would expect in its stdlib (hashmap, queues, lists etc) just like Rust

Pay08 1 days ago [-]

This is from last year, the stdlib was much more bare back then.

slopinthebag 2 days ago [-]

While you can obviously write low level code in Rust and manage allocations, memory, use pointers etc, you can also write much higher level code leveraging abstractions both in Rust itself and its' rich ecosystem. If you're coming from higher level languages it's much friendlier than C/C++ or Zig. I think I would struggle to write C or Zig effectively but I have no issues with Rust and I really enjoy the language.

zengid 2 days ago [-]

Quite a footnote [0]:

> I do not know if it is me being bored with the project, or annoyed with having to build and design a data structure, that has soured me on this project. But I have really at this point lost most motivation to continue this chapter. The way Zig is designed, it makes me deal with the data structure and memory management complexity head on, and it is tiresome. It is not "simpler" than, say, Rust: it just leaves the programmer to deal with the complexity, <strike-through>gaslighting the user</strike-through> claiming it is absolutely necessary.

[0] https://github.com/asibahi/paella/blob/main/writeup/c19.md#u...

groos 2 days ago [-]

I'm not sure why people seem to be under the impression that writing a compiler means that the language the compiler is implemented in should have "low level" features. A compiler is just a text -> text translation tool if you can leverage other tools such as an assembler and never needs to access machine level instructions. E.g., Pascal compilers have traditionally been written in Pascal, hardly a language which conjures up a "low level" image. Even when an assembler isn't available, all your implementation language needs to support, in terms of "low level" features, is writing of bytes to a file.

But manipulating instruction and file formats and such can be tedious if your language doesn't have the right capabilities but it's not impossible.

munificent 2 days ago [-]

> I'm not sure why people seem to be under the impression that writing a compiler means that the language the compiler is implemented in should have "low level" features.

Performance.

You definitely can write a compiler in a high-level language and given the choice I certainly prefer to on my hobby projects. Having a garbage collector makes so many compiler algorithms and data structures easier.

But I also accept that that choice means there's an upper limit to how fast my compiler will. If you're writing a compiler that will be used to (at least aspirationally) compile huge programs, then performance really matters. Users hate waiting on the compiler.

When you want to squeeze every ounce of speed you can get out of the hardware, a low-level language that gives you explicit control over things like memory layout matters a lot.

Philpax 1 days ago [-]

For context, this is the author of https://craftinginterpreters.com/ , and https://gameprogrammingpatterns.com/ , and https://journal.stuffwithstuff.com/2015/09/08/the-hardest-pr... .

I promise that he knows a thing or two about compilers and performance!

For what it's worth, I agree with him. A recent example is the porting of the TypeScript compiler to Go: it hasn't been fully released yet, but people are already going wild for its performance improvement over the original in-TS compiler.

Of course, it took them over a decade to reach the point where a port was necessary - so it's up to you to decide when that decision makes sense for your language.

titzer 2 days ago [-]

I think once you get the design of the IR right and implement it relatively efficiently, an optimizing compiler is going to be complicated enough that tweaking the heck out of low-level data structures won't help much. (For a baseline compiler, maybe...but).

E.g. when I ported C1 from C++ to Java for Maxine, straightforward choices of modeling the IR the same and basic optimizations allowed me to make it even faster than C1. C1X was a basic SSA+CFG design with a linear scan allocator. Nothing fancy.

The Virgil compiler is written in Virgil. It's a very similar SSA+CFG design. It compiles plenty fast without a lot of low-level tricks. Though, truth be told I went overboard optimizing[1] the x86 backend and it's significantly faster (maybe 2x) than the nicer, more pretty x86-64 backend. I introduced a bunch of fancy representation optimizations for Virgil since then, but they don't really close the gap.

[1] It's sad that even in the 2020s the best way to make something fast is to give up on abstractions and use integers and custom encodings into integers for everything. Trying to fix that though!

LeFantome 2 days ago [-]

Does “low level” translate to performance? Is Rust a “low level” language?

Take C#. You can write a compiler in it that is very fast. It gives you explicit control over memory layout of data structures and of course total control over what you wrote to disk. It is certainly not “low level”.

munificent 16 hours ago [-]

> It gives you explicit control over memory layout of data structures

Some with structs, yes. But overall it doesn't give you much control over where things up in memory once references get involved compared to C, C++, or Rust.

rayneorshine 1 days ago [-]

>Having a garbage collector makes so many compiler algorithms and data structures easier.

Does it really? Compilers tend to be programs that just appends a bunch of data to lists, hashmaps, queues and trees, processes it, then shuts down. So you can just make append-only data structures and not care too much about freeing stuff.

I never worry about memory management when I write compilers in C.

munificent 8 hours ago [-]

> Compilers tend to be programs that just appends a bunch of data to lists, hashmaps, queues and trees, processes it, then shuts down.

This is true if you're writing a batch mode compiler. But if you're writing a compiler that is integrated into an IDE where it is continuously updating the semantic state based on user edits, there is a whole lot more mutability going on and no clear time when you can free everything.

bsder 2 days ago [-]

> But I also accept that that choice means there's an upper limit to how fast my compiler will.

Don't buy it.

A decent OCaml version of a C or Zig compiler would almost certainly not be 10x slower. And it would be significantly easier to parallelize without introducing bugs so it might even be quite a bit faster on big codebases.

Actually designing your programming language to be processed quickly (can definitively figure things out with local parsing, minimizing the number of files that need to be touched, etc.) is WAY more important than the low-level implementation for overall compilation speed.

And I suspect that the author would have gotten a lot further had he been using a GC language and not had to deal with all the low-level issues and debugging.

I like Zig, and I use it a lot. But it is NOT my general purpose language. I'm definitely going to reach for Python first unless I absolutely know that I'm going to be doing systems programming. Python (or anything garbage collected with solid libraries) simply is way more productive on short time scales for small codebases.

baranul 11 hours ago [-]

> I suspect that the author would have gotten a lot further had he been using a GC language and not had to deal with all the low-level issues

Agree, that many people are using languages out of context to what they are actually trying to do. In many cases, using a GC language would be far more productive for them. Though I do think we should distinguish between compiled and interpreted GC languages, as often there is a significant gap in performance, that can be wanted or appreciated.

bsder 7 hours ago [-]

> Though I do think we should distinguish between compiled and interpreted GC languages, as often there is a significant gap in performance, that can be wanted or appreciated.

Sure, that is tautologically true.

However, I maintain that the original author would have gotten much further even with a pathologically slow Python implementation. In particular, munging all the low-level stuff like linking is going to have full-on libraries that you could pass off the task to. You can then come back and do it yourself later.

For me, reaching a point that helps reinforce my motivation is BY FAR the most relevant consideration for projects. Given the original article, it seems like I'm not alone.

pjmlp 1 days ago [-]

Any AOT compiled language offers enough performance for writing full compiler toolchain.

cxr 2 days ago [-]

This comment started out strong, but then:

> Pascal compilers have traditionally been written in Pascal, hardly a language which conjures up a "low level" image.

It may be the case that it doesn't conjure up such an image, but Pascal is approximately on the same rung as Zig or D—lower level than Go, higher level than assembly. If folks have a different impression, the problem is just that: their impression.

groos 19 hours ago [-]

Pascal, as defined by Wirth, had no "low level" features. E.g., no control over memory allocation other than the language provided new/dispose, no bit operators, clunky strings of fixed size, no access to system calls, no access to assembly, not even any hex or octal constants, all features which a language allowing "low level" access is expected to have (e.g. Ada, Modula-2/3, Oberon, all Pascal-derived languages). Things like conformant array parameters showed up much later in the ISO version but were not widely adopted. No modules either but this is not a low level feature. Turbo Pascal attempted to fix all this on the PC later on and it was deservedly well loved. Still, Wirth successfully wrote Pascal compilers in Pascal without --- obviously -- having a Pascal compiler available. [Link](https://en.wikipedia.org/wiki/Pascal_(programming_language)#...)

baranul 11 hours ago [-]

Many people seem to forget that Pascal evolved, and Wirth was very involved in that evolution. Wirth (as a consultant) helped create Object Pascal (from Apple), which then influenced Turbo Pascal and Delphi. Modula and Oberon are often referred to as influences on earlier or certain versions of Turbo Pascal (versions 4 to 5.5) as well.

userbinator 1 days ago [-]

It's because a compiler is supposed to be high-level to low-level; you already have a lower-level language to write in it, and not a higher-level one. Writing a C compiler in a higher-level language than C is going backwards.

E.g., Pascal compilers have traditionally been written in Pascal, hardly a language which conjures up a "low level" image.

How could the first Pascal compiler be compiled if it was written in Pascal, but a Pascal compiler didn't yet exist?

groos 19 hours ago [-]

> How could the first Pascal compiler be compiled if it was written in Pascal, but a Pascal compiler didn't yet exist?

Therein lies the magic of bootstrapping a language compiler written in the same language. Look it up.

userbinator 17 hours ago [-]

You need a different compiler to compile the first one. You are the one who needs to "look it up".

flossly 2 days ago [-]

I thought Zig has a C compiler built in? Or is it just the Zig build system that's able to compile C, but uses an external compiler for that?

Still a proper programmer-flex to build another one.

spiffyk 2 days ago [-]

Zig actually bundles LLVM's Clang, which it uses to compile C with the `zig cc` command. But the long term goal seems to not be so tightly coupled to LLVM, so I'm expecting that to move elsewhere. They still do some clever stuff around compiler-rt, allowing it to be better at cross-compilation than raw Clang, but the bulk of it is mostly just Clang.

There is also another C compiler written in Zig, Aro[1], which seems to be much more complete than TFA. Zig started using that as a library for its TranslateC functionality (for translating C headers into Zig, not whole programs) in 0.16.

[1]: https://github.com/Vexu/arocc

Pay08 2 days ago [-]

They're not planning on dropping Clang.

Meneth 2 days ago [-]

They kinda are: "This issue is to fully eliminate LLVM, Clang, and LLD libraries from the Zig project." https://github.com/ziglang/zig/issues/16270

Pay08 2 days ago [-]

Yes, as a backend. Clang as the `zig cc` frontend will stay (and become optional) to my knowledge.

ulbu 2 days ago [-]

libraries, not processes.

flykespice 2 days ago [-]

I find that a very bold move, how will they reivent the wheel on the man-years of optimization work went into LLVM to their own compiler infrastructure?

dnautics 2 days ago [-]

They're just removing the obligate dependency. I'm pretty sure they will keep it around as a first-class supported backend target for compilation.

jibal 1 days ago [-]

No, the whole point is to eliminate dependencies that they have to maintain. "not obligate" really doesn't mean anything if it's available as a backend--the obligation is on the Zig developers to keep it working, and they want to eliminate that obligation.

And the original question was "how will they reivent the wheel on the man-years of optimization work went into LLVM to their own compiler infrastructure?" -- the answer is that Andrew naively believes that they can recreate comparable optimization.

There are a whole lot of misstatements about Zig and other matters in the comments here by people who don't have much knowledge about what they are talking about--much of the discussion of using low-level vs high-level languages for writing compilers is nonsense. And one person wrote of "Zig and D" as if those languages are comparable, when D is at least as high level as C++, which it was intended to replace.

dnautics 1 days ago [-]

> the answer is that Andrew naively believes that they can recreate comparable optimization.

That's exactly wrong.

> There are a whole lot of misstatements about Zig and other matters in the comments here by people who don't have much knowledge about what they are talking about.

Well spoken. You should look in the mirror.

jibal 6 hours ago [-]

To clarify, my statement was based on comments I have seen and heard from Andrew Kelley when discussing this subject. I can't locate those at the moment, but here is https://news.ycombinator.com/item?id=39156426 by mlugg, a primary member of the Zig development team (emphasis added):

"To be clear, we aren't saying it will be easy to reach LLVM's optimization capabilities. That's a very long-term plan, and one which will unfold over a number of years. The ability to use LLVM is probably never going away, because there might always be some things it handles better than Zig's own code generation. However, trying to get there seems a worthy goal; at the very least, we can get our self-hosted codegen backends to a point where they perform relatively well in Debug mode without sacrificing debuggability."

The current interim plan (which I think was developed after the comments that I heard from Andrew, perhaps in recognition of their naivete) is for Zig to generate LLVM binary files that can be passed to a separate LLVM instance as part of the build process. Is that "a first-class supported backend target for compilation"? I suppose it's a matter of semantics, but that certainly won't be the current LLVM backend that does LLVM API calls.

P.S. It may be helpful to read through https://github.com/ziglang/zig/issues/13265

bsder 2 days ago [-]

Proebsting's Law: Compiler Advances Double Computing Power Every 18 Years

You need to implement very few optimizations to get the vast majority of compiler improvements.

Many of the papers about this suggest that we would be better off focusing on making quality of life improvements for the programmer (like better debugger integration) rather than abstruse and esoteric compiler optimizations that make understanding the generated code increasingly difficult.

convolvatron 2 days ago [-]

as a comment about a particular project and its goals and timelines, this is fine. as a general statement that we should never revisit things its pretty offensive. llvm makes a lot of assumptions about the structure of your code and the way its manipulated. if I were working on a language today I would try my best to avoid it. the back ends are where most of the value is and why I might be tempted to use it.

we should really happy that language evolution has started again. language monoculture was really dreary and unproductive.

20 years ago you would be called insane for throwing away all the man-years of optimization baked into oracle, and I guess postgres or mysql if you were being low rent. and look where we are today, thousands of people can build databases.

geodel 2 days ago [-]

All that will still be available just not in main zig repo. Someone may have asked same question about LLVM when GNU compiler exist.

spiffyk 1 days ago [-]

I expressly said "not be so tightly coupled to LLVM" because I know they're not planning on dropping it altogether. But it is the plan for LLVM and Clang not to be compiled into the Zig binary anymore, because that has proven to be very burdensome. Instead, the plan seems to be to "side-car" it somehow.

scatbot 2 days ago [-]

Cool project. Feels like writing a C compiler in Zig aligns nicely with the old "maintain it in Zig" idea that was part of Zig's early value proposition. Is that still considered a relevant goal today?

Longer term it also makes me wonder whether something like this could eventually reduce reliance on Clang/LLVM for the C frontend in zig's toolchain.

spiffyk 2 days ago [-]

There is actually another C compiler written in Zig, Aro[1], which Zig started using since 0.16 for its TranslateC module.

[1]: https://github.com/Vexu/arocc

mastermage 1 days ago [-]

Cool

0x1da49 2 days ago [-]

[dead]

Rendered at 12:35:38 GMT+0000 (Coordinated Universal Time) with Cloudflare Workers.