I’m building a new startup and file metadata plays an important role. There are thousands of file formats, each format may have dozens of versions, and each stores metadata differently.
While I would love to one day invest in creating a library to handle this monumental task, I think many would agree the best tool for this job is ExifTool[1] by Phil Harvey.
Problem solved - throw it in a Docker container, run it when a file is uploaded, and call it a day? Not quite.
For some file formats, certain metadata can only be read by OS tooling. On macOS, for example, `mdls` is capable of reading vector embedding information from the QuickTime container, which is useful for RAG applications. Our use-case also needs metadata to be present when a file is uploaded: extracting the data on our servers means we add considerable overhead to upload post-processing and we lose data that is useful to customers.
So we need to extract metadata client-side and staple it to the upload. Herein begins a journey of self-inflicted pain and suffering.
ExifTool is written in Perl.
Undefined Behavior
No matter what programming language you choose to adopt for your new ideas, if you plan to grow and scale, there is a universal certainty that at some point your stack will come to depend on one piece of tooling written long ago in Python or Perl.
Python is objectively the most popular language, and as such a lot more community effort has been made to make embedding Python really easy. Perl hasn't received as much love.
To give Perl credit, there is first-party documentation on how to embed Perl in a C program[2]. However, this is through dynamic linking, so Perl still needs to be installed on the user’s system. On macOS Perl is installed by default[3], but we have no guarantees that the dependencies we need are installed, or that the prefix[4] has been modified in a compatible way. Ultimately we shouldn’t be modifying the user’s system, especially since our SDK or CLI could be run in an environment where changes aren’t persisted, so we’d have to bootstrap each time.
On Windows, we would need to ship the entirety of Perl (which is what ExifTool does via PAR::Packer[5]), sign every binary, and pray no dynamic dependencies are missing.
Other options such as perlcc[6] just don’t work. It is often said that only Perl can parse Perl. This is not true: Perl cannot be parsed at all, it can only be executed. Perl has various built-in ambiguities that can only be resolved at runtime, so writing a self-contained interpreter is also nothing more than a fever dream.
But we’re already tugging on this thread, so let’s see where it leads. If we do ship our own build of Perl, that build needs to be statically linked. So how hard could that be?
Very.
There is essentially no documentation on how to achieve this; all we have are scattered forum posts from venerable ghosts. If they could do it, so could I.
Could we? Should we? Configure
Perl’s build system leans on a script called Configure[7], which is essentially a massive (and I do mean massive) shell script that probes your system and figures out how it should compile Perl. We will get more into that later.
Unfortunately, I use an M-series Mac, and no matter what I tried, Perl just wouldn’t build—not even dynamically. I’m sure it’s user error on my part (perhaps I failed to properly appease the camel). Regardless, local compilation on macOS wasn’t in the cards. Even with Rosetta and a Linux VM, everything was just broken enough to halt progress.
Picture the rest of this saga as a game of “golf” with a GitHub Action (endless commits, pushes, and guesswork), because that’s exactly how it played out.[8]
After many hours, this was the magical incantation that finally produced a static build of Perl on Linux:
Let’s just look at the artif-
There’s little point in explaining why this path turned out to be a dead end. Suffice it to say, I nearly gave up. Then I recalled an experiment where I compiled .NET to WebAssembly, which produced really small modules. Granted, the tooling there actually existed, but it gave me enough hope to think, “Maybe I can pull off something similar here.”
Emscripten is Magic
I’m hardly the first to toy with compiling Perl to WebAssembly. Projects like Perl.js[9] forked the now-deprecated microperl[10], and WebPerl[11] used an older version of mainline Perl. Both involved extensive patching to make Emscripten cooperate, something I definitely don’t want to repeat. They were products of their time, when Emscripten was still maturing and heavy modification was unavoidable.
And the work done to get a static build was not entirely a waste, because only Perl can compile Perl, so we will utilize that in our workflow to build our WebAssembly version.
Remember that giant `Configure` script we talked about? It doesn’t just sniff around your system; it can be coaxed into building Perl in just about any configuration under the sun.[12]
All you need is a deep reservoir of patience and the willingness to juggle endless environment variables, flags, and a hint file.
My hunch was that Emscripten’s NODERAWFS feature would theoretically let Perl’s file-system calls just work—no heavy patching needed. If I could build a WebAssembly version of Perl that still saw the world as a regular filesystem, then tools like ExifTool might just function out of the box. Astoundingly, after enough trial and error, I ended up with a working Perl build in WebAssembly form. Even more surprising? ExifTool ran! Two major takeaways:
1. It is possible to compile Perl to WebAssembly without patching the source.[13]
2. ExifTool can function in that environment, albeit with the Emscripten-generated JavaScript glue.
But (of course there’s a “but”): NODERAWFS isn’t truly raw filesystem access. The underlying data structures assume V8’s internal layout, so if you were hoping to simply drop this build into some other engine or reimplement the I/O layer, you’re in for a world of hurt. Replacing Emscripten’s JavaScript glue turned out to be unrealistic.
Still, this is a major win. I proved to myself that you can, in principle, compile Perl to WebAssembly and run ExifTool. But if I want a more engine-agnostic build that doesn’t rely so heavily on Emscripten’s JavaScript glue, the next logical step is WASI.
Given we now have a Configure hint file that produces a working Emscripten build, I should be able to replace emcc with the wasi-sdk without any iss-
Once More Into the Fray
Emscripten isn’t magic just because it makes native code run in the browser; it’s magic because it does so while shielding you from the dark horrors of a 37-year-old codebase. I’m not just talking about labyrinthine lines of C; I mean the unyielding force that is Perl’s `Configure` script, which insists on compiling and running little test executables, even when you explicitly tell it, “We’re cross-compiling here, please don’t do that!”
The workaround was to reproduce[14] the Python scripts Emscripten uses to sidestep those tests, because apparently that’s how life works.
At this point, my workflow was simple (and maddening):
1. Tweak the hint file.
2. Lob new arguments at clang.
3. Wait for the build to fail on CI.
4. Repeat.
All the while, I’m chanting, “I do not want to patch Perl. I will not patch Perl.” I had to patch Perl. Larry Wall, the inventor of the patch tool, also created Perl, so in a way, it’s poetic.
The first patch was relatively harmless: fix a bug in `Configure`, which ironically was itself causing build failures. But the real trouble started with `setjmp`/`longjmp`. Traditionally, these are used for exceptions, but Perl also uses them to run scripts. So, to compile successfully, we need `-lsetjmp` plus the LLVM flag `-mllvm -wasm-enable-sjlj`.
Here’s the rub: no major WASM runtime supports the latest exception-handling proposal. Browsers partially support the phase-3 version, but neither Wasmtime nor Wasmer implements the `exnref` side of things. Meanwhile, WASI-SDK’s documentation suggests that if your build references these libraries, your only choice is to run:
wasm-opt --translate-to-exnref -all -o your_app.exnref.wasm your_app.wasm
…and then you’re stuck with exactly one tool that can run it: toywasm.[15]
Thankfully, a bit of luck arrived in the form of WACS[16], a C#-based WebAssembly runtime published by Kelvin Nishikawa. It didn’t initially support the current exception-handling proposal, but after I opened an issue, he added support in just two days. Once I had contributed some fixes to the WASIP1 filesystem implementation in WACS, I was unblocked.
Then came my first Perl bug: a bizarre preprocessor glitch. I still haven’t figured out the real fix, but I worked around it by setting the environment variable `LC_ALL=C`. Next up were integer overflows[17] in file system calls. There is no elegant solution so far, but this quick patch resolves them.
And then, against all odds, it finally worked.
After applying wasm-opt, the build is 6.9 MB. We’re not there yet though.[18]
The above build needs a host path to our “prefix” directory passed in as an argument, which we grant through WASI preopens. In theory, that’s fine: you drop the prefix files on disk, configure WASI, and Perl can open them. In practice, I noticed painful slowdowns. Perl scans a bunch of directories that don’t exist (it thinks it’s still on our CI machine with absolute paths), and every file read has to cross the host boundary. Also, what happens if those files get deleted?
So I figured, “Why not bundle the prefix inside the module and keep everything in guest memory?” That way, Perl can still open the modules it needs, but we don’t rely on the host filesystem for them—and for any other files, we fall back to WASI’s real I/O.
At first, I found wasi-vfs. It does exactly what I need. Unfortunately, it’s just a wrapper around wizer, which uses wasmtime, which doesn’t support the new exception-handling proposal. Crashed right out of the gate.
Then I saw a mention of using WASI Preview2 with something called WASI-Virt. I gave it a go, but it’s basically a haunted forest: no docs, half-implemented specs, Rust-only examples, and a WASI-SDK that just coughed up broken binaries whenever I tried enabling preview2 and working with ‘worlds’. That got old fast.
So I just rolled my own simple file system:
1. A script that slurps all the files in the prefix directory, concatenates them into a single binary blob, and spits out a C header + source file enumerating each file’s offsets in that blob.
2. I compile and link with flags like `-Wl,--wrap=open`, `-Wl,--wrap=read`, and so on. That means whenever Perl tries to call the real `open()`, it actually calls our function `__wrap_open()` first. We check if the requested path is in the “builtin” prefix. If it is, we serve the file directly from memory (via `fmemopen()`); if not, we pass the call along to the original function (like `__real_open`).
3. A mini VFS table that keeps track of file descriptors, in-memory file sizes, and `FILE*` pointers. The code sets aside a little array (`sfs_table`), which we populate when we “open” a builtin file. The actual data for that file comes from the blob we generated in step 1, so we just carve out the correct slice of bytes and treat it like a read-only buffer. If Perl tries to `read()` from that file descriptor, it calls our `__wrap_read()`, which checks the table to see if it’s one of our in-memory handles; if yes, it just does an `fread()` from the buffer. Otherwise, it calls `__real_read()`.
4. We do the same trick with `stat()`, `fstat()`, `lseek()`, etc., always checking if the file is “ours” first and falling back if it isn’t. That keeps Perl’s scripts that rely on file checks from freaking out.
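The packing and lookup steps above can be modelled roughly as follows. This is a minimal Python sketch, not the actual zeroperl script or its C code: the function names, the `/builtin/` mount point, and the shape of the emitted C initializer lines are all assumptions made for illustration. The real dispatch happens in C through the `--wrap` linker flag, which Python cannot reproduce, so `open_bundled` only imitates the "ours first, otherwise fall back" decision.

```python
import io
import os
import tempfile

# Hypothetical mount point under which bundled files appear to the guest.
BUILTIN_MOUNT = "/builtin/"


def pack_prefix(prefix_dir):
    """Concatenate every file under prefix_dir into a single blob.

    Returns (blob, index), where index maps a relative POSIX-style path
    to its (offset, size) inside the blob.
    """
    blob = bytearray()
    index = {}
    for root, dirs, files in os.walk(prefix_dir):
        dirs.sort()  # fix the traversal order so offsets are reproducible
        for name in sorted(files):
            full = os.path.join(root, name)
            rel = os.path.relpath(full, prefix_dir).replace(os.sep, "/")
            with open(full, "rb") as fh:
                data = fh.read()
            index[rel] = (len(blob), len(data))
            blob += data
    return bytes(blob), index


def emit_c_entries(index):
    """Render the index as C initializer lines (assumed layout: path, offset, size)."""
    lines = []
    for rel, (offset, size) in index.items():
        escaped = rel.replace("\\", "\\\\").replace('"', '\\"')
        lines.append('    {"%s", %d, %d},' % (escaped, offset, size))
    return "\n".join(lines)


def open_bundled(path, blob, index):
    """Model of a wrapped open(): serve bundled paths from memory, else fall back."""
    if path.startswith(BUILTIN_MOUNT):
        entry = index.get(path[len(BUILTIN_MOUNT):])
        if entry is not None:
            offset, size = entry
            # Carve the file's slice out of the blob as a read-only buffer.
            return io.BytesIO(blob[offset:offset + size])
    # Not one of ours: defer to the real filesystem (stands in for __real_open).
    return open(path, "rb")
```

Calling `pack_prefix("path/to/prefix")` returns the blob and an index that `emit_c_entries` turns into table rows; a C build would embed the blob bytes alongside those rows. Keeping the index keyed by relative path means the bundled tree can be mounted under any guest path without regenerating the blob.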
Once all that’s in place, Perl’s clueless that it’s not dealing with a “real” filesystem. I did find some weird bug where Perl panics if a file descriptor is > 16, so our simple file system shares the possible FD space with ones from the WASI host, but otherwise it works and the prefix is entirely bundled.
But, with every solution comes another problem: the build is now 50 MB.
Trimming the Build
To reduce the size of the build, I’d remove a file from the prefix, create a new build, see if anything exploded, rinse and repeat. That might not sound too bad, but remember, Perl has a lot of files (docs, network modules we’ll never use), and you can’t assume how Perl interacts with Perl. If the build still worked, I’d record that filename in a `delete.txt` for the CI pipeline to clean up automatically.
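The remove-build-check loop can be sketched as a greedy pass over candidate files. This is an illustration under assumptions, not the process actually used: `build_passes` is a hypothetical callable standing in for "run the CI build with these files removed and see if ExifTool still works", and only the `delete.txt` name comes from the text above.

```python
def prune_prefix(candidates, build_passes):
    """Greedy one-at-a-time pruning of the prefix.

    candidates: iterable of file paths that might be removable.
    build_passes: hypothetical callable that takes the set of removed
        paths and returns True if a build without them still works.
    Returns the paths that were safe to delete, in the order they were found.
    """
    deleted = []
    for path in candidates:
        trial = set(deleted) | {path}
        if build_passes(trial):
            deleted.append(path)  # build survived, keep this file removed
        # otherwise the file stays in the prefix and we move on
    return deleted


def write_delete_list(deleted, out_path="delete.txt"):
    """Write one path per line so a CI step can remove the files."""
    with open(out_path, "w", encoding="utf-8") as fh:
        for path in deleted:
            fh.write(path + "\n")
```

Each candidate costs one full build, which is why this approach is slow. Because every trial includes the files already removed, a file is only recorded if the build works with it gone on top of everything deleted before it.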
After purging everything I could, each `.pm` and `.pl` file still contained large amounts of whitespace and inline docs. I used Perl::Strip to fix that. I ran it across what remained in the prefix, and although the CI time jumped from ten minutes to nearly an hour, I ended up with a 9.1 MB WebAssembly build of Perl that is fully sandboxed and self-contained.
Wrapping Up
I achieved what I set out to do. But (and this is the last one), because there is no broad support for the new exception proposal in WASI/WebAssembly runtimes, zeroperl is useless to you.
At least until wasmtime or wasmer actually supports the proposal, if ever. But in the spirit of open source, you can find the full code here. I’m gonna go put ExifTool in Docker now.
And now every WebAssembly runtime is supported. More details here.
Credits:
Vadim Kantorov: for providing a great starting point with Emscripten and for trading notes with me over email.
Kelvin Nishikawa: for the amazing work on WACS and for allowing me to collaborate on it.
Leon Timmermans: for attempting to diagnose the integer overflow cause
Karl Williamson: for the Configure patch
When people refer to the “Perl prefix,” they usually mean the installation directory (or “prefix”) under which Perl and its related libraries, scripts, and documentation live. In other words, it’s the top-level path that tells your system where to look for Perl and all of its components.
Did you know that every time you add a boolean option, you double your configuration space? Add enough of them, and you end up with more possible configurations than atoms in the universe.
This is a perfect illustration of how committee-driven development can stall an otherwise amazing technology. No one wants to implement unfinished proposals, but without a reference implementation, who’s going to test it?
I stepped away from this project for three weeks and returned to a CI pipeline that had apparently forgotten how to build. Same pinned versions, but suddenly the produced prefix directory structure was all out of whack. I never did pinpoint the exact cause. I spent a full day chasing phantom changes before accepting this was just the new structure.
Did you publish an npm package for that? I think it could be very useful.
What do you think it would take to get your proposal accepted?
In addition, what, if anything, could be improved in Perl itself to make this job easier?