How Docker Image Lazy Loading Made My Airflow Webserver 80% Faster
---
Update (2025-07-16):
I realized I made a mistake in my earlier Airflow benchmark: I was testing port availability, which isn’t a reliable indicator since a port can be occupied as soon as a pod runs—even if the service inside hasn’t started yet. So, the previous “24x faster” comparison wasn’t accurate.
To fix this, I ran a more meaningful test by executing `airflow version` inside the pod. This requires the Airflow binary and all its dependencies to be ready, making the comparison much more accurate. The new results? The measured time dropped by about 80%, from 10 seconds to just 2 seconds! (see here) 🚀
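For anyone who wants to reproduce this kind of measurement, here is a minimal sketch of the idea: keep re-running a command until it exits successfully and record how long that took. It is not the script from my repo. The function name `time_until_success` and its parameters are made up for illustration, and the `kubectl exec` line in the comment is an assumption about how you might reach the pod.

```python
import subprocess
import sys
import time


def time_until_success(cmd, timeout_s=60.0, interval_s=0.1):
    """Re-run cmd until it exits with status 0 and return the elapsed seconds.

    Raises TimeoutError if cmd has not succeeded once timeout_s has passed.
    """
    start = time.perf_counter()
    while True:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        elapsed = time.perf_counter() - start
        if result.returncode == 0:
            return elapsed
        if elapsed >= timeout_s:
            raise TimeoutError(f"{cmd!r} did not succeed within {timeout_s}s")
        time.sleep(interval_s)


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere. Against a real cluster it
    # could be something like (assumption, <pod-name> is a placeholder):
    #   ["kubectl", "exec", "<pod-name>", "--", "airflow", "version"]
    elapsed = time_until_success([sys.executable, "-c", "pass"])
    print(f"command succeeded after {elapsed:.2f}s")
```

The point of polling a real command instead of a port is the one made above: the command only succeeds once the binary and its dependencies can actually be loaded.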
I also started testing Spark Connect and experimented with another technique called Nydus (similar to stargz) to see how it affects performance. Early results are promising, but since Nydus implements Docker lazy loading differently (and needs more infrastructure customization), I’ll need more time to fully evaluate it.
Stay tuned for more updates as I dig deeper into these optimizations!
---
Ever stared at a loading screen, waiting for your container to start, and thought, "There must be a faster way"? That was me a couple of days ago, frustrated by how long it took to spin up an Airflow webserver in my dev environment.
Then I stumbled upon two awesome reads:
Databricks' 7x faster serverless boot: https://www.databricks.com/blog/booting-databricks-vms-7x-faster-serverless-compute
Depot's deep dive on estargz: https://depot.dev/blog/booting-containers-faster-with-estargz
And I thought: what if I could bring that "instant boot" magic to my own workflow?
What is Docker Image Lazy Loading?
Lazy loading means your container only pulls and unpacks the parts of the image it actually needs, when it needs them. No more waiting for the whole image to download and extract before your app can start.
Default (overlayfs): Downloads and extracts the entire image before starting.
Lazy loading (estargz, stargz): Starts the container before the whole image has been downloaded, fetching the remaining files on demand.
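The difference between the two modes is easier to see in a toy model. The sketch below is only an illustration of the idea, not how overlayfs or the stargz snapshotter is implemented (real implementations are more involved). The class names, file paths, and sizes are all hypothetical.

```python
class ToyImage:
    """A pretend image: maps file path -> size in bytes held by a 'registry'."""

    def __init__(self, files):
        self.files = dict(files)


class EagerMount:
    """Fetches every file up front, like a full pull and extract."""

    def __init__(self, image):
        self.local = dict(image.files)  # everything is local before start
        self.bytes_fetched = sum(image.files.values())

    def read(self, path):
        return self.local[path]


class LazyMount:
    """Fetches a file only the first time it is read, then caches it."""

    def __init__(self, image):
        self.image = image
        self.local = {}
        self.bytes_fetched = 0

    def read(self, path):
        if path not in self.local:
            size = self.image.files[path]  # simulated on-demand fetch
            self.local[path] = size
            self.bytes_fetched += size
        return self.local[path]


# Hypothetical image contents: made-up paths and sizes.
image = ToyImage({
    "/usr/bin/python": 5_000_000,
    "/opt/app/main.py": 20_000,
    "/usr/share/unused-data.bin": 300_000_000,
})

eager = EagerMount(image)
lazy = LazyMount(image)

# Pretend the app only touches two files while starting up.
for path in ("/usr/bin/python", "/opt/app/main.py"):
    eager.read(path)
    lazy.read(path)

print(eager.bytes_fetched)  # 305020000
print(lazy.bytes_fetched)   # 5020000
```

In this toy setup the lazy mount never pays for the large file the app does not read during startup, which is where the time savings come from.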
My Experiment: Airflow Webserver Startup
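I set up a quick test that measured how long it took for the webserver port to become available:
Default overlayfs: Airflow webserver startup = 7.4 seconds
Lazy loading (estargz): Airflow webserver startup = 0.3 seconds
On paper that's a 24x speedup! 🚀 But as the update at the top of this post explains, port availability isn't a reliable readiness signal, so the roughly 80% reduction from the `airflow version` test is the number to trust.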
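Since this post quotes one result as a multiplier and another as a percentage, here is a small sketch that puts both on the same footing. It uses only the timings quoted in this post (7.4 s vs 0.3 s from the port test, 10 s vs 2 s from the `airflow version` test in the update). The helper names are just for illustration.

```python
def speedup_factor(before_s, after_s):
    """How many times shorter the new duration is than the old one."""
    return before_s / after_s


def percent_reduction(before_s, after_s):
    """Percentage by which the duration shrank."""
    return (before_s - after_s) / before_s * 100.0


# (label, before seconds, after seconds), all taken from this post.
measurements = [
    ("port availability test", 7.4, 0.3),
    ("airflow version test", 10.0, 2.0),
]

for label, before_s, after_s in measurements:
    factor = speedup_factor(before_s, after_s)
    reduction = percent_reduction(before_s, after_s)
    print(f"{label}: {factor:.1f}x faster, {reduction:.1f}% less time")

# port availability test: 24.7x faster, 95.9% less time
# airflow version test: 5.0x faster, 80.0% less time
```

So the more reliable test still shows a 5x improvement, just not the 24x that the port check suggested.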
You can check out my code and setup in the repo linked at the end of this post.
Why Does This Matter?
Faster dev/test cycles: No more waiting for containers to boot.
Better scaling: Spin up new services in a flash.
Happier engineers: Less time staring at loading bars, more time building cool stuff.
Cost savings: Faster startup means less compute time wasted, which can reduce your cloud bills—especially at scale.
Instant-on user experience: Just like Databricks Serverless, lazy loading enables super fast startup for user-facing workloads—perfect for interactive or on-demand use cases where waiting isn't an option.
Key Takeaway
Docker image lazy loading isn't just a neat trick—it's a game changer for anyone who wants faster, more efficient container workflows.
Curious to try it?
https://github.com/hoaihuongbk/lazydockerload-exploration
Check out my repo, or drop a comment if you've got questions or want to share your own results. And if you know someone who loves fast containers, send this their way!
-----------------
Connect with Huong Vuong:
💼 LinkedIn: linkedin.com/in/hoaihuongbk
📘 Facebook: facebook.com/hoaihuongbk